Simplex Constrained Sparse Optimization via Tail Screening

Authors: Peng Chen, Jin Zhu, Junxian Zhu, Xueqin Wang

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the statistical and computational efficiency of PERMITS with both synthetic and real data. The implementation of the proposed method can be found in https://github.com/abess-team/PERMITS. In this section, we perform numerical experiments using both synthetic and real data to verify the self-regularization property and the advantage of PERMITS. Statistical performance. We first present the statistical performance of methods in the following. Two metrics of statistical performance are considered here: accuracy and error. Computation performance. With respect to the computational performance, we mainly focus on the running time of different methods and their dependence on the dimension p. Real data: DJIA constituents detection.
Researcher Affiliation | Academia | Peng Chen (EMAIL), Department of Statistics and Finance, School of Management, University of Science and Technology of China; Jin Zhu (EMAIL), Department of Statistics, London School of Economics and Political Science; Junxian Zhu (EMAIL), Saw Swee Hock School of Public Health, National University of Singapore; Xueqin Wang (EMAIL), Department of Statistics and Finance/International Institute of Finance, School of Management, University of Science and Technology of China
Pseudocode | Yes | Algorithm 1: Projected Gradient (PG) Method; Algorithm 2: Projected Gradient Method with Tail Screening (PERMITS); Algorithm 3: Backtracking procedure for finding M
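To make the building block concrete, here is a minimal sketch of a projected gradient step on the probability simplex for a quadratic loss 0.5·||Xw − y||², the kind of subroutine Algorithm 1 describes. This is an illustration under our own assumptions (fixed step 1/L, sort-based simplex projection of Duchi et al., 2008), not the authors' exact implementation; the tail-screening step of Algorithm 2 (dropping coordinates whose weight falls below the threshold ϵ between sweeps) is omitted.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {w : w >= 0, sum(w) = 1}, via the sort-based algorithm."""
    u = np.sort(v)[::-1]                      # sort in decreasing order
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)    # shift that renormalizes mass
    return np.maximum(v - theta, 0.0)

def projected_gradient(X, y, n_iter=2000):
    """Minimize 0.5*||Xw - y||^2 over the simplex by projected gradient
    with constant step 1/L, where L = ||X||_2^2 is a Lipschitz constant
    of the gradient."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2
    w = np.full(p, 1.0 / p)                   # feasible start: uniform weights
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = project_simplex(w - grad / L)
    return w
```

The fixed step 1/L stands in for the backtracking search of Algorithm 3, which adapts the step size when L is unknown.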
Open Source Code | Yes | We demonstrate the statistical and computational efficiency of PERMITS with both synthetic and real data. The implementation of the proposed method can be found in https://github.com/abess-team/PERMITS.
Open Datasets | No | No specific link, DOI, repository name, or formal citation for public access to the real-world dataset is provided. The paper states: "We collect daily returns data of the year 2022, which contains n = 251 samples." and "The daily return of the DJIA index yt is a price-weighted daily return of s = 30 prominent companies. Here, we choose the stock pool to be the constituents of the S&P 500, which contains p = 490 stocks after dropping those with missing data." This indicates the data were collected by the authors rather than drawn from an openly accessible, cited dataset.
Dataset Splits | No | The paper describes how synthetic data is generated and uses 50 repeated simulations, but it does not specify train/test/validation splits for either synthetic or real data. For the real data, it states "We collect daily returns data of the year 2022, which contains n = 251 samples," but does not detail any splits of these samples for experimentation.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions "CVXPY (Diamond and Boyd, 2016), an open-source Python library for convex optimization problems" and the "OSQP solver (Stellato et al., 2020)", but does not provide specific version numbers for these software components or for Python itself.
Experiment Setup | Yes | In our implementation, ϵ = 10⁻⁴ p log(p)/n is the default choice. In our implementation, we set L0 = 1, γ = 2, which performed well across all numerical experiments. Tmin is an additional hyper-parameter for the minimum number of iterations that is needed for the proof of statistical properties, and a small value (e.g., Tmin = 5 in our numerical experiments) is usually sufficient. We compare the performance of PERMITS with the following methods: 1) Oracle: the oracle estimator, obtained by solving (1) subject to the constraint supp(w) = S. 2) IHT*: the iterative hard thresholding estimator proposed by Kyrillidis et al. (2013). 3) H-Lasso: the heuristic Lasso estimator that first solves the non-negative Lasso problem and then divides the solution by its ℓ1 norm to satisfy the simplex constraint. The penalty parameter λ of H-Lasso is selected via cross-validation. For IHT*, the true sparsity s is provided since, otherwise, selecting the sparsity is time-consuming. Here, we show the results of two other choices, ϵ = 10⁻³ p log(p)/n and ϵ = 10⁻⁵ p log(p)/n; these methods are referred to as PERMITS(e-3), PERMITS(e-4), and PERMITS(e-5), respectively. We use the OSQP solver's default hyperparameters: a maximum of 10⁴ iterations, an absolute accuracy tolerance of 10⁻⁵, and a relative accuracy tolerance of 10⁻⁵.
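The H-Lasso baseline described above (fit a non-negative Lasso, then rescale by the ℓ1 norm to land on the simplex) can be sketched in a few lines with scikit-learn. The function name and the fixed penalty are our own choices for illustration; the paper selects λ by cross-validation instead (e.g., via `LassoCV(positive=True)`).

```python
import numpy as np
from sklearn.linear_model import Lasso

def h_lasso(X, y, alpha=0.01):
    """H-Lasso heuristic: fit a non-negative Lasso, then divide the
    solution by its l1 norm so the weights satisfy the simplex constraint.

    Note: scikit-learn's Lasso minimizes (1/(2n))*||y - Xw||^2 + alpha*||w||_1,
    so alpha here is on a per-sample scale.
    """
    model = Lasso(alpha=alpha, positive=True, fit_intercept=False,
                  max_iter=10_000)
    model.fit(X, y)
    coef = model.coef_
    s = coef.sum()                 # coef >= 0, so the sum equals the l1 norm
    if s == 0.0:
        raise ValueError("all coefficients are zero; decrease alpha")
    return coef / s
```

Unlike PERMITS, this heuristic enforces the simplex constraint only after the fact, which is why the paper treats it as a baseline rather than a solver for the constrained problem itself.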