Rank-one Convexification for Sparse Regression
Authors: Alper Atamturk, Andres Gomez
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our computational experiments with benchmark datasets, the proposed conic formulations are solved within seconds and result in near-optimal solutions (with 0.4% optimality gap on average) for non-convex ℓ0-problems. Moreover, the resulting estimators also outperform alternative convex approaches, such as lasso and elastic net regression, from a statistical perspective, achieving high prediction accuracy and good interpretability. |
| Researcher Affiliation | Academia | Alper Atamturk, Department of Industrial Engineering & Operations Research, University of California, Berkeley, CA 94720, USA; Andres Gomez, Department of Industrial & Systems Engineering, University of Southern California, Los Angeles, CA 90089, USA |
| Pseudocode | No | The paper describes mathematical formulations, theorems, and proofs but does not contain any explicit pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper states: "The processed datasets before standardization can be downloaded from http://atamturk.ieor.berkeley.edu/data/sparse.regression." This provides access to datasets, not the source code for the methodology described in the paper. There is no explicit statement or link indicating that the code for their proposed methods is open-source or publicly available. |
| Open Datasets | Yes | We use the benchmark datasets in Table 1. The first five were first used by Miyashiro and Takano (2015) in the context of MIO algorithms for best subset selection, and later used by Gómez and Prokopyev (2021). The diabetes dataset with all second interactions was introduced by Efron et al. (2004) in the context of lasso, and later used by Bertsimas et al. (2016). A few datasets require some manipulation to eliminate missing values and handle categorical variables. The processed datasets before standardization can be downloaded from http://atamturk.ieor.berkeley.edu/data/sparse.regression. |
| Dataset Splits | Yes | In addition to the training set of size n, a validation set of size n is generated with the same parameters, matching the precision of leave-one-out cross-validation. |
| Hardware Specification | Yes | All computations are performed on a laptop with a 1.80GHz Intel Core i7-8550U CPU and 16 GB main memory. |
| Software Dependencies | Yes | Semidefinite optimization problems are solved with the MOSEK 8.1 solver, and conic quadratic optimization problems (continuous and mixed-integer) are solved with the CPLEX 12.8 solver. |
| Experiment Setup | Yes | All solver parameters were set to their default values. We let α = 0.1ℓ for integer 0 ≤ ℓ ≤ 10, we generated 50 values of λ ranging from λmax = ∥X⊤y∥∞ to λmax/200 on a log scale, and we used the pair that results in the best prediction error on the validation set. A total of 500 (α, λ) pairs are tested. |
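The hyperparameter grid described in the setup row can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NumPy, random placeholder data `X` and `y`, and the standard convention that λmax = ∥X⊤y∥∞ (the smallest lasso penalty that zeros out all coefficients).

```python
import numpy as np

# Placeholder data standing in for a standardized benchmark dataset (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = rng.standard_normal(100)

# lambda_max = ||X^T y||_inf, per the usual lasso convention (assumption).
lam_max = np.max(np.abs(X.T @ y))

# 50 values of lambda from lam_max down to lam_max/200, log-spaced.
lambdas = np.logspace(np.log10(lam_max), np.log10(lam_max / 200), num=50)

# alpha = 0.1 * l for integer 0 <= l <= 10.
alphas = [0.1 * l for l in range(11)]

# Candidate (alpha, lambda) pairs; the best pair would be chosen by
# prediction error on the validation set.
grid = [(a, lam) for a in alphas for lam in lambdas]
```

Each pair in `grid` would be evaluated on the validation set and the best-performing one selected; the model fitting itself (the conic formulations solved with MOSEK/CPLEX) is outside the scope of this sketch.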