Outlier Robust and Sparse Estimation of Linear Regression Coefficients

Authors: Takeyuki Sasai, Hironori Fujisawa

JMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we present numerical experiments on our methods. In all of the experiments, we used CVXPY (Diamond and Boyd, 2016). The data were generated from y_i = x_i^T β + ξ_i + √n θ_i, i = 1, …, 80, where {ξ_i}_{i=1}^{80} was drawn from Student's t-distribution with two degrees of freedom, {x_i}_{i=1}^{80} was drawn from a 120-dimensional i.i.d. Gaussian, β = (20, 10, 0, …, 0), and {θ_i, ϱ_i}_{i=1}^{80} is a sequence of outliers. We note that, due to the high computational cost of WEIGHT (Algorithm 2) in Algorithm 1, the sample size and dimensionality in the experiments were restricted. We increased the number of outliers o from 0 to 20 and conducted 10 experiments for each value. For outliers, we randomly chose a set of indices O ⊂ {1, …, 80} such that |O| = o, and we tried two patterns of outlier values: (θ_i, ϱ_i)_{i∈O} = (10000, 1) and (1, 10000). For both patterns, we set (θ_i, ϱ_i)_{i∈{1,…,80}\O} = (0, 0). We compared the estimator from Algorithm 1, the estimator from Algorithm 6, and the estimator from the standard Lasso. The estimator of the standard Lasso is defined as follows: ... The results are shown in Figure 1 and Figure 2.
Researcher Affiliation Academia 1. The Graduate Institute for Advanced Studies, SOKENDAI, Tokyo, Japan; 2. The Institute of Statistical Mathematics, Tokyo, Japan; 3. Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
Pseudocode Yes Algorithm 1 OUTLIER-ROBUST-AND-SPARSE-ESTIMATION
Input: {y_i, X_i}_{i=1}^n, Σ (= E[xx^T]), and tuning parameters τ_cut, ε, r_1, r_2, λ_o, λ_s
Output: β̂
1: {ŵ_i}_{i=1}^n ← WEIGHT({X_i}_{i=1}^n, τ_cut, ε, r_1, r_2, Σ)
2: {ŵ'_i}_{i=1}^n ← TRUNCATION({ŵ_i}_{i=1}^n)
3: β̂ ← WEIGHTED-PENALIZED-HUBER-REGRESSION({y_i, X_i}_{i=1}^n, {ŵ'_i}_{i=1}^n, λ_o, λ_s)
Open Source Code No The paper does not provide explicit links to source code repositories or a statement affirming the release of the code for the methodology described. It mentions using "CVXPY (Diamond and Boyd, 2016)" for experiments, which is a third-party tool, not the authors' own code release.
Open Datasets No The data were generated from y_i = x_i^T β + ξ_i + √n θ_i, i = 1, …, 80, where {ξ_i}_{i=1}^{80} was drawn from Student's t-distribution with two degrees of freedom, {x_i}_{i=1}^{80} was drawn from a 120-dimensional i.i.d. Gaussian, β = (20, 10, 0, …, 0), and {θ_i, ϱ_i}_{i=1}^{80} is a sequence of outliers.
Dataset Splits No The paper describes data generation for numerical experiments but does not specify any training/test/validation dataset splits. The number of outliers `o` is varied from 0 to 20, and 10 experiments were conducted for each value, but this is a parameter variation for evaluation, not a data splitting strategy.
Hardware Specification No The paper mentions running "numerical experiments" and discusses "computational cost" but does not provide any specific details about the hardware used (e.g., CPU, GPU models, memory, or cloud instances).
Software Dependencies No In all of the experiments, we used CVXPY (Diamond and Boyd, 2016).
Experiment Setup Yes For Algorithm 1, we set (λ_o, λ_s, r_1, r_2, ε) = (√o, √s λ_s, √s λ_s, …) and, for simplicity, we omitted the if-part of WEIGHT. For Algorithm 6 and the standard Lasso, we set...
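The synthetic data-generating process quoted in the Research Type and Open Datasets rows can be sketched in NumPy. This is a minimal sketch, not the authors' code: the √n scaling of θ_i follows our reading of the garbled model formula, ϱ_i is omitted because its role in the model is not recoverable from the extraction, and only the first outlier pattern, (θ_i, ϱ_i)_{i∈O} = (10000, 1), is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, o = 80, 120, 10  # sample size, dimension, #outliers (o varied 0..20 in the paper)

# Covariates: rows drawn from a 120-dimensional i.i.d. standard Gaussian.
X = rng.standard_normal((n, d))

# True coefficient vector beta = (20, 10, 0, ..., 0): sparse with 2 active entries.
beta = np.zeros(d)
beta[:2] = [20.0, 10.0]

# Heavy-tailed noise: Student's t with two degrees of freedom.
xi = rng.standard_t(df=2, size=n)

# Outliers: a random index set O with |O| = o; theta_i = 10000 on O (pattern 1).
theta = np.zeros(n)
O = rng.choice(n, size=o, replace=False)
theta[O] = 10000.0

# Responses: y_i = x_i^T beta + xi_i + sqrt(n) * theta_i (sqrt(n) is our assumption).
y = X @ beta + xi + np.sqrt(n) * theta
```

The second pattern swaps the outlier magnitudes, placing the large value on ϱ_i instead of θ_i.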
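Step 3 of Algorithm 1 can be sketched directly in CVXPY, the solver the paper reports using. This is a hedged sketch, not the paper's exact formulation: the weights w are assumed to be supplied by WEIGHT and TRUNCATION (neither reproduced here), and using λ_o as the Huber transition point is our assumption about how the tuning parameter enters the loss.

```python
import cvxpy as cp
import numpy as np

def weighted_penalized_huber_regression(y, X, w, lam_o, lam_s):
    """Sketch of WEIGHTED-PENALIZED-HUBER-REGRESSION (step 3 of Algorithm 1).

    Minimizes a weighted Huber loss on the residuals plus an l1 penalty
    that induces sparsity in the estimated coefficients.
    """
    n, d = X.shape
    beta = cp.Variable(d)
    resid = y - X @ beta
    # Elementwise Huber loss with threshold lam_o, weighted by w (assumption:
    # lam_o plays the role of the Huber transition point).
    loss = cp.sum(cp.multiply(w, cp.huber(resid, M=lam_o))) / n
    problem = cp.Problem(cp.Minimize(loss + lam_s * cp.norm1(beta)))
    problem.solve()
    return beta.value
```

With all weights equal to one this reduces to ordinary penalized Huber regression; the role of WEIGHT/TRUNCATION is to downweight samples whose covariates look contaminated before this convex program is solved.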
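The standard Lasso baseline the paper compares against can likewise be written as a small CVXPY program. The paper's exact definition is elided in the extraction ("..."), so the 1/(2n) scaling of the squared loss below is an assumption of this sketch.

```python
import cvxpy as cp
import numpy as np

def lasso(y, X, lam):
    """Standard Lasso baseline: least squares with an l1 penalty.

    The 1/(2n) loss scaling is an assumed convention, since the paper's
    definition of the estimator is not recoverable from the extraction.
    """
    n, d = X.shape
    beta = cp.Variable(d)
    obj = cp.sum_squares(y - X @ beta) / (2 * n) + lam * cp.norm1(beta)
    cp.Problem(cp.Minimize(obj)).solve()
    return beta.value
```

Because the squared loss grows quadratically in the residuals, a single gross outlier of magnitude 10000 can dominate this objective, which is the behavior the robust estimators in Figures 1 and 2 are designed to avoid.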