Outlier Robust and Sparse Estimation of Linear Regression Coefficients

Authors: Takeyuki Sasai, Hironori Fujisawa

JMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we present numerical experiments on our methods. In all of the experiments, we used CVXPY (Diamond and Boyd, 2016). The data were generated from y_i = x_i^T β + ξ_i + √n θ_i, i = 1, …, 80, where {ξ_i}_{i=1}^{80} was drawn from Student's t-distribution with two degrees of freedom, {x_i}_{i=1}^{80} was drawn from a 120-dimensional i.i.d. Gaussian, β = (20, 10, 0, …, 0), and {θ_i, ϱ_i}_{i=1}^{80} is a sequence of outliers. We note that, due to the high computational cost of WEIGHT (Algorithm 2) in Algorithm 1, the sample size and dimensionality in the experiments were restricted. We increased the number of outliers o from 0 to 20 and conducted 10 experiments for each value. For outliers, we randomly chose a set of indices O ⊂ {1, …, 80} such that |O| = o, and we tried two patterns of outlier values: (θ_i, ϱ_i)_{i∈O} = (10000, 1) and (1, 10000). For both patterns, we set (θ_i, ϱ_i)_{i∈{1,…,80}\O} = (0, 0). We compared the estimator from Algorithm 1, the estimator from Algorithm 6, and the estimator from the standard Lasso. The estimator of the standard Lasso is defined as follows: ... The results are shown in Figure 1 and Figure 2.
Researcher Affiliation Academia 1. The Graduate Institute for Advanced Studies, SOKENDAI, Tokyo, Japan; 2. The Institute of Statistical Mathematics, Tokyo, Japan; 3. Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
Pseudocode Yes Algorithm 1 OUTLIER-ROBUST-AND-SPARSE-ESTIMATION
Input: {y_i, X_i}_{i=1}^n, Σ (= E[xx^T]), and tuning parameters τ_cut, ε, r_1, r_2, λ_o, λ_s
Output: β̂
1: {ŵ_i}_{i=1}^n ← WEIGHT({X_i}_{i=1}^n, τ_cut, ε, r_1, r_2, Σ)
2: {ŵ'_i}_{i=1}^n ← TRUNCATION({ŵ_i}_{i=1}^n)
3: β̂ ← WEIGHTED-PENALIZED-HUBER-REGRESSION({y_i, X_i}_{i=1}^n, {ŵ'_i}_{i=1}^n, λ_o, λ_s)
Open Source Code No The paper does not provide explicit links to source code repositories or a statement affirming the release of the code for the methodology described. It mentions using "CVXPY (Diamond and Boyd, 2016)" for experiments, which is a third-party tool, not the authors' own code release.
Open Datasets No The data were generated from y_i = x_i^T β + ξ_i + √n θ_i, i = 1, …, 80, where {ξ_i}_{i=1}^{80} was drawn from Student's t-distribution with two degrees of freedom, {x_i}_{i=1}^{80} was drawn from a 120-dimensional i.i.d. Gaussian, β = (20, 10, 0, …, 0), and {θ_i, ϱ_i}_{i=1}^{80} is a sequence of outliers.
Dataset Splits No The paper describes data generation for numerical experiments but does not specify any training/test/validation dataset splits. The number of outliers `o` is varied from 0 to 20, and 10 experiments were conducted for each value, but this is a parameter variation for evaluation, not a data splitting strategy.
Hardware Specification No The paper mentions running "numerical experiments" and discusses "computational cost" but does not provide any specific details about the hardware used (e.g., CPU, GPU models, memory, or cloud instances).
Software Dependencies No In all of the experiments, we used CVXPY (Diamond and Boyd, 2016).
Experiment Setup Yes For Algorithm 1, we set (λ_o, λ_s, r_1, r_2, ε) = (√o, √s λ_s, √s λ_s, …) and, for simplicity, we omitted the if-part of WEIGHT. For Algorithm 6 and the standard Lasso, we set...
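The synthetic data-generating process quoted in the Research Type and Open Datasets rows can be sketched in NumPy. This is a minimal sketch, not the authors' code: the √n scaling of θ_i follows our reading of the garbled model formula, ϱ_i is omitted because its role in the model is not recoverable from the extraction, and only the first outlier pattern, (θ_i, ϱ_i)_{i∈O} = (10000, 1), is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, o = 80, 120, 10  # sample size, dimension, #outliers (o varied 0..20 in the paper)

# Covariates: rows drawn from a 120-dimensional i.i.d. standard Gaussian.
X = rng.standard_normal((n, d))

# True coefficient vector beta = (20, 10, 0, ..., 0): sparse with 2 active entries.
beta = np.zeros(d)
beta[:2] = [20.0, 10.0]

# Heavy-tailed noise: Student's t with two degrees of freedom.
xi = rng.standard_t(df=2, size=n)

# Outliers: a random index set O with |O| = o; theta_i = 10000 on O (pattern 1).
theta = np.zeros(n)
O = rng.choice(n, size=o, replace=False)
theta[O] = 10000.0

# Responses: y_i = x_i^T beta + xi_i + sqrt(n) * theta_i (sqrt(n) is our assumption).
y = X @ beta + xi + np.sqrt(n) * theta
```

The second pattern swaps the outlier magnitudes, placing the large value on ϱ_i instead of θ_i.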
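Step 3 of Algorithm 1 can be sketched directly in CVXPY, the solver the paper reports using. This is a hedged sketch, not the paper's exact formulation: the weights w are assumed to be supplied by WEIGHT and TRUNCATION (neither reproduced here), and using λ_o as the Huber transition point is our assumption about how the tuning parameter enters the loss.

```python
import cvxpy as cp
import numpy as np

def weighted_penalized_huber_regression(y, X, w, lam_o, lam_s):
    """Sketch of WEIGHTED-PENALIZED-HUBER-REGRESSION (step 3 of Algorithm 1).

    Minimizes a weighted Huber loss on the residuals plus an l1 penalty
    that induces sparsity in the estimated coefficients.
    """
    n, d = X.shape
    beta = cp.Variable(d)
    resid = y - X @ beta
    # Elementwise Huber loss with threshold lam_o, weighted by w (assumption:
    # lam_o plays the role of the Huber transition point).
    loss = cp.sum(cp.multiply(w, cp.huber(resid, M=lam_o))) / n
    problem = cp.Problem(cp.Minimize(loss + lam_s * cp.norm1(beta)))
    problem.solve()
    return beta.value
```

With all weights equal to one this reduces to ordinary penalized Huber regression; the role of WEIGHT/TRUNCATION is to downweight samples whose covariates look contaminated before this convex program is solved.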
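The standard Lasso baseline the paper compares against can likewise be written as a small CVXPY program. The paper's exact definition is elided in the extraction ("..."), so the 1/(2n) scaling of the squared loss below is an assumption of this sketch.

```python
import cvxpy as cp
import numpy as np

def lasso(y, X, lam):
    """Standard Lasso baseline: least squares with an l1 penalty.

    The 1/(2n) loss scaling is an assumed convention, since the paper's
    definition of the estimator is not recoverable from the extraction.
    """
    n, d = X.shape
    beta = cp.Variable(d)
    obj = cp.sum_squares(y - X @ beta) / (2 * n) + lam * cp.norm1(beta)
    cp.Problem(cp.Minimize(obj)).solve()
    return beta.value
```

Because the squared loss grows quadratically in the residuals, a single gross outlier of magnitude 10000 can dominate this objective, which is the behavior the robust estimators in Figures 1 and 2 are designed to avoid.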