Rank-one Convexification for Sparse Regression
Authors: Alper Atamturk, Andres Gomez
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our computational experiments with benchmark datasets, the proposed conic formulations are solved within seconds and result in near-optimal solutions (with 0.4% optimality gap on average) for non-convex ℓ0-problems. Moreover, the resulting estimators also outperform alternative convex approaches, such as lasso and elastic net regression, from a statistical perspective, achieving high prediction accuracy and good interpretability. |
| Researcher Affiliation | Academia | Alper Atamturk, Department of Industrial Engineering & Operations Research, University of California, Berkeley, CA 94720, USA; Andres Gomez, Department of Industrial & Systems Engineering, University of Southern California, Los Angeles, CA 90089, USA |
| Pseudocode | No | The paper describes mathematical formulations, theorems, and proofs but does not contain any explicit pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper states: "The processed datasets before standardization can be downloaded from http://atamturk.ieor.berkeley.edu/data/sparse.regression." This provides access to datasets, not the source code for the methodology described in the paper. There is no explicit statement or link indicating that the code for their proposed methods is open-source or publicly available. |
| Open Datasets | Yes | We use the benchmark datasets in Table 1. The first five were first used by Miyashiro and Takano (2015) in the context of MIO algorithms for best subset selection, and later used by Gómez and Prokopyev (2021). The diabetes dataset with all second interactions was introduced by Efron et al. (2004) in the context of lasso, and later used by Bertsimas et al. (2016). A few datasets require some manipulation to eliminate missing values and handle categorical variables. The processed datasets before standardization can be downloaded from http://atamturk.ieor.berkeley.edu/data/sparse.regression. |
| Dataset Splits | Yes | In addition to the training set of size n, a validation set of size n is generated with the same parameters, matching the precision of leave-one-out cross-validation. |
| Hardware Specification | Yes | All computations are performed on a laptop with a 1.80GHz Intel Core i7-8550U CPU and 16 GB main memory. |
| Software Dependencies | Yes | Semidefinite optimization problems are solved with the MOSEK 8.1 solver, and conic quadratic optimization problems (continuous and mixed-integer) are solved with the CPLEX 12.8 solver. |
| Experiment Setup | Yes | All solver parameters were set to their default values. We let α = 0.1ℓ for integer 0 ≤ ℓ ≤ 10, we generated 50 values of λ ranging from λmax = ∥X⊤y∥∞ to λmax/200 on a log scale, and we used the pair that results in the best prediction error on the validation set. A total of 500 (α, λ) pairs are tested. |
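The hyperparameter grid described in the setup row can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NumPy, random placeholder data `X` and `y`, and the standard convention that λmax = ∥X⊤y∥∞ (the smallest lasso penalty that zeros out all coefficients).

```python
import numpy as np

# Placeholder data standing in for a standardized benchmark dataset (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = rng.standard_normal(100)

# lambda_max = ||X^T y||_inf, per the usual lasso convention (assumption).
lam_max = np.max(np.abs(X.T @ y))

# 50 values of lambda from lam_max down to lam_max/200, log-spaced.
lambdas = np.logspace(np.log10(lam_max), np.log10(lam_max / 200), num=50)

# alpha = 0.1 * l for integer 0 <= l <= 10.
alphas = [0.1 * l for l in range(11)]

# Candidate (alpha, lambda) pairs; the best pair would be chosen by
# prediction error on the validation set.
grid = [(a, lam) for a in alphas for lam in lambdas]
```

Each pair in `grid` would be evaluated on the validation set and the best-performing one selected; the model fitting itself (the conic formulations solved with MOSEK/CPLEX) is outside the scope of this sketch.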