Error estimation and adaptive tuning for unregularized robust M-estimator

Authors: Pierre C. Bellec, Takuya Koriyama

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 is devoted to numerical simulations, and Section 5 gives an outline of the proof. The rigorous proofs are provided in the appendix. See Figure 1a for a simulation. The formal statement of the consistency result (5) is in Theorem 1 below.
Researcher Affiliation | Academia | Pierre C. Bellec (EMAIL), Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA; Takuya Koriyama (EMAIL), Booth School of Business, The University of Chicago, Chicago, IL 60637, USA
Pseudocode | No | The paper presents mathematical proofs, theorems, and derivations. While it gives detailed steps and formulas, it does not include any explicitly labeled pseudocode or algorithm blocks in a structured, code-like format.
Open Source Code | No | The paper mentions using 'scipy.optimize.fsolve from scipy (Virtanen et al., 2020)' for solving a nonlinear system in the numerical simulations. However, there is no explicit statement about releasing the source code for the authors' own methodology, and no link to a repository.
Open Datasets | No | The paper does not use any pre-existing public datasets. Instead, it generates synthetic data for its numerical simulations, specifying parameters such as '(n, p) = (4000, 1200), Fϵ = t-dist(df = 2), Σ = I_p and β = 0_p' and varying the noise distribution as 'Fϵ := σ · t-dist(df = 2) where σ ∈ [1, 3]'.
Dataset Splits | No | The paper uses synthetically generated data for its numerical simulations rather than a pre-existing dataset, so the concept of training/validation/test splits for an external dataset does not apply, and no such splits are described.
Hardware Specification | No | The paper details the parameters and setup of its numerical simulations (e.g., sample size, noise distribution) but does not specify the hardware (e.g., CPU or GPU models, or cloud resources) used to run them.
Software Dependencies | Yes | In the numerical simulations presented in Section 4 and Appendix D, this system was solved using the solver scipy.optimize.fsolve from scipy (Virtanen et al., 2020).
Experiment Setup | Yes | We set (n, p) = (4000, 1200), Fϵ = t-dist(df = 2), Σ = I_p and β = 0_p. Once we generate (y, X), we compute (R(λ), R̂(λ)) for each λ in a finite grid. We repeat the above procedure 100 times and plot (R(λ), R̂(λ)) in Figure 3a, and the relative error |R̂(λ)/R(λ) − 1| in Figure 3b. We also plot α²(λ) in Figure 3a by solving the nonlinear system of equations (15) (see Remark 3 for details). Next, we conduct the adaptive tuning of the scale parameter λ > 0. Let us take I = [1, 10] and the finite grid I_N as {λ_i = 10^(i/100) : i ∈ {0, 1, . . . , 100}} ⊂ I. Then, Corollary 1 and Theorem 3 imply... Below, we verify (26) as we change the scale of the noise distribution Fϵ in the following way: Fϵ := σ · t-dist(df = 2) where σ ∈ [1, 3]. For each σ in a finite grid over [1, 3], we generate a dataset (X, y) and calculate R(λ̂_N) and min_{λ ∈ I_N} R(λ). We repeat the above procedure 100 times.
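The reported simulation design (synthetic data with heavy-tailed noise and a geometric grid of tuning parameters) can be sketched in a few lines. This is a minimal reconstruction from the quoted parameters only: the M-estimator fit and the risk curves R(λ), R̂(λ) from the paper are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Design quoted in the paper: (n, p) = (4000, 1200), Sigma = I_p, beta = 0_p,
# and noise drawn from a t-distribution with 2 degrees of freedom.
n, p = 4000, 1200
X = rng.standard_normal((n, p))      # rows ~ N(0, I_p) since Sigma = I_p
beta = np.zeros(p)                   # beta = 0_p
eps = rng.standard_t(df=2, size=n)   # heavy-tailed noise F_eps
y = X @ beta + eps                   # here y = eps because beta = 0_p

# Tuning grid I_N = {10^(i/100) : i = 0, ..., 100}, a geometric grid in I = [1, 10].
lambdas = 10 ** (np.arange(101) / 100)
print(X.shape, lambdas[0], lambdas[-1])
```

The geometric grid makes λ_0 = 1 and λ_100 = 10, matching the interval I = [1, 10] stated in the setup.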
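The paper reports solving its nonlinear system of equations (15) with scipy.optimize.fsolve. The actual system involves quantities not reproduced in this report, so the two-equation system below is a hypothetical stand-in used only to illustrate the fsolve call pattern.

```python
import numpy as np
from scipy.optimize import fsolve

# Hypothetical stand-in for a nonlinear system: the paper's equations (15)
# are different; this simply demonstrates how fsolve is invoked.
def system(z):
    a, b = z
    return [a**2 + b - 2.0,  # equation 1: a^2 + b = 2
            a - b**2]        # equation 2: a = b^2

root = fsolve(system, x0=[1.0, 1.0])   # initial guess (1, 1)
residual = np.abs(system(root)).max()  # how well the root satisfies the system
print(root, residual)
```

fsolve returns the argument at which all equations are (numerically) zero; checking the residual afterward is a cheap sanity test that the solver actually converged.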