Outlier Robust and Sparse Estimation of Linear Regression Coefficients
Authors: Takeyuki Sasai, Hironori Fujisawa
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present numerical experiments about our methods. In all of the experiments, we used CVXPY (Diamond and Boyd, 2016). The data were generated from y_i = x_i^⊤ β + ξ_i + √n θ_i, i = 1, …, 80, where {ξ_i}_{i=1}^{80} was drawn from Student's t-distribution with two degrees of freedom, {x_i}_{i=1}^{80} was drawn from a 120-dimensional i.i.d. Gaussian, β = (20, 10, 0, …, 0), and {θ_i, ϱ_i}_{i=1}^{80} is a sequence of outliers. We note that due to the high computational cost of WEIGHT (Algorithm 2) in Algorithm 1, the sample size and dimensionality in the experiments were restricted. We increased the number of outliers o from 0 to 20 and conducted 10 experiments for each value. For outliers, we randomly chose a set of indices O ⊂ {1, …, 80} such that \|O\| = o, and we tried two patterns of outlier values: (θ_i, ϱ_i)_{i∈O} = (10000, 1) and (1, 10000). For both patterns, we set (θ_i, ϱ_i)_{i∈{1,…,80}\O} = (0, 0). We compared the estimator from Algorithm 1, the estimator from Algorithm 6, and the estimator from the standard Lasso. The estimator of the standard Lasso is defined as follows: ... The results are shown in Figure 1 and Figure 2. |
| Researcher Affiliation | Academia | 1 The Graduate Institute for Advanced Studies, SOKENDAI, Tokyo, Japan 2 The Institute of Statistical Mathematics, Tokyo, Japan 3 Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan |
| Pseudocode | Yes | Algorithm 1 OUTLIER-ROBUST-AND-SPARSE-ESTIMATION. Input: {y_i, X_i}_{i=1}^{n}, Σ (= E[xx^⊤]), and tuning parameters τ_cut, ε, r_1, r_2, λ_o, λ_s. Output: β̂. 1: {ŵ_i}_{i=1}^{n} ← WEIGHT({X_i}_{i=1}^{n}, τ_cut, ε, r_1, r_2, Σ). 2: {ŵ'_i}_{i=1}^{n} ← TRUNCATION({ŵ_i}_{i=1}^{n}). 3: β̂ ← WEIGHTED-PENALIZED-HUBER-REGRESSION({y_i, X_i}_{i=1}^{n}, {ŵ'_i}_{i=1}^{n}, λ_o, λ_s). |
| Open Source Code | No | The paper does not provide explicit links to source code repositories or a statement affirming the release of the code for the methodology described. It mentions using "CVXPY (Diamond and Boyd, 2016)" for experiments, which is a third-party tool, not the authors' own code release. |
| Open Datasets | No | The data were generated from y_i = x_i^⊤ β + ξ_i + √n θ_i, i = 1, …, 80, where {ξ_i}_{i=1}^{80} was drawn from Student's t-distribution with two degrees of freedom, {x_i}_{i=1}^{80} was drawn from a 120-dimensional i.i.d. Gaussian, β = (20, 10, 0, …, 0), and {θ_i, ϱ_i}_{i=1}^{80} is a sequence of outliers. |
| Dataset Splits | No | The paper describes data generation for numerical experiments but does not specify any training/test/validation dataset splits. The number of outliers `o` is varied from 0 to 20, and 10 experiments were conducted for each value, but this is a parameter variation for evaluation, not a data splitting strategy. |
| Hardware Specification | No | The paper mentions running "numerical experiments" and discusses "computational cost" but does not provide any specific details about the hardware used (e.g., CPU, GPU models, memory, or cloud instances). |
| Software Dependencies | No | In all of the experiments, we used CVXPY (Diamond and Boyd, 2016). |
| Experiment Setup | Yes | For Algorithm 1, we set (λ_o, λ_s, r_1, r_2, ε) = ( o , sλ_s, sλ_s, ) and for simplicity we omitted the if part of WEIGHT. For Algorithm 6 and the standard Lasso, we set... |
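The synthetic setup quoted in the table above can be sketched in NumPy. This is a reconstruction under stated assumptions: the √n scaling on the outlier term θ_i and the omission of the ϱ_i contamination (only the pattern (θ_i, ϱ_i) = (10000, 1) is shown, and ϱ_i is not applied to X) are guesses from the garbled quote, not details confirmed by the report.

```python
import numpy as np

# Reconstructed data generation: n = 80 samples, d = 120 dimensions,
# beta = (20, 10, 0, ..., 0), heavy-tailed noise, additive outliers on y.
rng = np.random.default_rng(0)
n, d = 80, 120
beta = np.zeros(d)
beta[:2] = [20.0, 10.0]               # beta = (20, 10, 0, ..., 0)

X = rng.standard_normal((n, d))       # rows x_i: 120-dim i.i.d. Gaussian
xi = rng.standard_t(df=2, size=n)     # Student's t noise, 2 degrees of freedom

o = 10                                # number of outliers (paper varies o from 0 to 20)
O = rng.choice(n, size=o, replace=False)
theta = np.zeros(n)
theta[O] = 10000.0                    # outlier pattern (theta_i, rho_i) = (10000, 1)

# Assumed sqrt(n) scaling of the outlier term (common in this line of work).
y = X @ beta + xi + np.sqrt(n) * theta
```

A baseline comparison would then fit the standard Lasso (whose definition is elided in the report) and the paper's Algorithms 1 and 6 on (X, y), repeating 10 times per value of o.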