Rank-one Convexification for Sparse Regression

Authors: Alper Atamturk, Andres Gomez

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our computational experiments with benchmark datasets, the proposed conic formulations are solved within seconds and result in near-optimal solutions (with 0.4% optimality gap on average) for non-convex ℓ0-problems. Moreover, the resulting estimators also outperform alternative convex approaches, such as lasso and elastic net regression, from a statistical perspective, achieving high prediction accuracy and good interpretability.
Researcher Affiliation | Academia | Alper Atamturk, Department of Industrial Engineering & Operations Research, University of California, Berkeley, CA 94720, USA; Andres Gomez, Department of Industrial & Systems Engineering, University of Southern California, Los Angeles, CA 90089, USA
Pseudocode | No | The paper describes mathematical formulations, theorems, and proofs but does not contain any explicit pseudocode or algorithm blocks with structured steps.
Open Source Code | No | The paper states: "The processed datasets before standardization can be downloaded from http://atamturk.ieor.berkeley.edu/data/sparse.regression." This provides access to datasets, not the source code for the methodology described in the paper. There is no explicit statement or link indicating that the code for their proposed methods is open-source or publicly available.
Open Datasets | Yes | We use the benchmark datasets in Table 1. The first five were first used by Miyashiro and Takano (2015) in the context of MIO algorithms for best subset selection, and later used by Gómez and Prokopyev (2021). The diabetes dataset with all second-order interactions was introduced by Efron et al. (2004) in the context of lasso, and later used by Bertsimas et al. (2016). A few datasets require some manipulation to eliminate missing values and handle categorical variables. The processed datasets before standardization can be downloaded from http://atamturk.ieor.berkeley.edu/data/sparse.regression.
Dataset Splits | Yes | In addition to the training set of size n, a validation set of size n is generated with the same parameters, matching the precision of leave-one-out cross-validation.
Hardware Specification | Yes | All computations are performed on a laptop with a 1.80GHz Intel Core i7-8550U CPU and 16 GB main memory.
Software Dependencies | Yes | Semidefinite optimization problems are solved with the MOSEK 8.1 solver, and conic quadratic optimization problems (continuous and mixed-integer) are solved with the CPLEX 12.8 solver.
Experiment Setup | Yes | All solver parameters were set to their default values. We let α = 0.1ℓ for integer 0 ≤ ℓ ≤ 10, generated 50 values of λ ranging from λmax = ∥X⊤y∥∞ to λmax/200 on a log scale, and selected the pair (α, λ) that results in the best prediction error on the validation set. A total of 500 (α, λ) pairs are tested.
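The hyperparameter grid quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the data X, y are synthetic stand-ins, and λmax = ∥X⊤y∥∞ is assumed from the standard lasso convention rather than taken verbatim from the paper.

```python
import numpy as np

# Synthetic stand-in data (the paper uses benchmark datasets instead).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = rng.standard_normal(100)

# lambda_max = ||X^T y||_inf, assumed from the usual lasso convention.
lam_max = np.max(np.abs(X.T @ y))

# 50 values of lambda from lam_max down to lam_max/200 on a log scale.
lambdas = np.geomspace(lam_max, lam_max / 200, num=50)

# alpha = 0.1 * ell for integer 0 <= ell <= 10.
alphas = [0.1 * ell for ell in range(11)]

# Each (alpha, lambda) pair would be scored on the validation set and the
# best-performing pair retained. Note this full grid has 11 * 50 = 550
# pairs, while the report states 500 were tested, so the exact alpha
# range in the paper may differ slightly from this sketch.
grid = [(a, lam) for a in alphas for lam in lambdas]
```

This only builds the grid; the selection step would loop over `grid`, fit the estimator for each pair, and keep the pair minimizing validation prediction error.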