A Constructive Approach to $L_0$ Penalized Regression

Authors: Jian Huang, Yuling Jiao, Yanyan Liu, Xiliang Lu

JMLR 2018

Reproducibility Variable Result LLM Response
Research Type Experimental Simulation studies demonstrate that SDAR outperforms Lasso, MCP and two greedy methods in accuracy and efficiency.
Researcher Affiliation Academia Jian Huang EMAIL Department of Applied Mathematics The Hong Kong Polytechnic University Hung Hom, Kowloon Hong Kong, China; Yuling Jiao EMAIL School of Statistics and Mathematics Zhongnan University of Economics and Law Wuhan, 430063, China; Yanyan Liu EMAIL School of Mathematics and Statistics Wuhan University Wuhan, 430072, China; Xiliang Lu EMAIL School of Mathematics and Statistics Wuhan University Wuhan, 430072, China
Pseudocode Yes Algorithm 1 Support detection and root finding (SDAR); Algorithm 2 Adaptive SDAR (ASDAR)
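To make the pseudocode entry concrete, here is a minimal sketch of one plausible reading of SDAR's alternation between support detection (thresholding $|\beta + d|$ at the $T$ largest entries, where $d$ is the scaled gradient) and root finding (least squares restricted to the detected support). This is our own illustration in Python/NumPy, not the authors' Matlab package; the function name `sdar` and the stopping rule on a stabilized support are our choices.

```python
import numpy as np

def sdar(X, y, T, max_iter=50):
    """Sketch of SDAR: alternate support detection and root finding.

    Support detection picks the T largest entries of |beta + d|,
    with d = X'(y - X beta)/n; root finding solves least squares
    restricted to the detected support.
    """
    n, p = X.shape
    beta = np.zeros(p)
    d = X.T @ (y - X @ beta) / n      # gradient-like dual variable
    support = None
    for _ in range(max_iter):
        # Support detection: indices of the T largest |beta + d|
        new_support = np.sort(np.argsort(np.abs(beta + d))[-T:])
        if support is not None and np.array_equal(new_support, support):
            break                      # support has stabilized
        support = new_support
        # Root finding: least squares on the detected support
        beta = np.zeros(p)
        beta[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        d = X.T @ (y - X @ beta) / n
    return beta
```

In the noiseless, well-conditioned case the support typically stabilizes after one or two iterations, at which point the restricted least-squares fit is exact.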
Open Source Code Yes We have implemented SDAR in a Matlab package sdar, which is available at http://homepage.stat.uiowa.edu/~jian/.
Open Datasets No To generate the design matrix $X$, we first generate an $n \times p$ random Gaussian matrix $\bar{X}$ whose entries are i.i.d. $N(0, 1)$ and then normalize its columns to $\sqrt{n}$ length. Then $X$ is generated with $X_1 = \bar{X}_1$, $X_j = \bar{X}_j + \rho(\bar{X}_{j+1} + \bar{X}_{j-1})$, $j = 2, \ldots, p-1$, and $X_p = \bar{X}_p$. The underlying regression coefficient $\beta$ is generated with the nonzero coefficients uniformly distributed in $[m, M]$, where $m = \sigma\sqrt{2\log(p)/n}$ and $M = 100m$. Then the observation vector $y = X\beta + \eta$ with $\eta_1, \ldots, \eta_n$ generated independently from $N(0, \sigma^2)$. The paper describes how data is generated for simulation studies, but does not provide access to a pre-existing public dataset.
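The simulated-data recipe quoted in this row can be sketched directly. The following Python/NumPy function is our own rendering of that description (the function name, seeding, and random support placement are our assumptions; the paper does not specify how the support is chosen).

```python
import numpy as np

def generate_data(n, p, K, rho, sigma, seed=0):
    """Sketch of the paper's simulation design (our naming)."""
    rng = np.random.default_rng(seed)
    # n x p Gaussian matrix, columns normalized to length sqrt(n)
    Xbar = rng.standard_normal((n, p))
    Xbar *= np.sqrt(n) / np.linalg.norm(Xbar, axis=0)
    # Correlated design: X_j = Xbar_j + rho*(Xbar_{j+1} + Xbar_{j-1})
    X = Xbar.copy()
    X[:, 1:p-1] = Xbar[:, 1:p-1] + rho * (Xbar[:, 2:p] + Xbar[:, 0:p-2])
    # K nonzero coefficients drawn uniformly from [m, M]
    m = sigma * np.sqrt(2 * np.log(p) / n)
    M = 100 * m
    beta = np.zeros(p)
    support = rng.choice(p, size=K, replace=False)  # placement assumed random
    beta[support] = rng.uniform(m, M, size=K)
    # Observations with i.i.d. Gaussian noise
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta
```

The paper's setting corresponds to calls like `generate_data(5000, 50000, 400, 0.2, 1.0)`; the small shapes used below are only for quick checks.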
Dataset Splits No The paper uses synthetically generated data and does not describe any train/validation/test splits.
Hardware Specification No For the examples we consider in our simulation studies with (n, p) = (5000, 50000), it can find the solution in seconds on a personal laptop computer. This mention is too general to be considered specific hardware details.
Software Dependencies No We implemented SDAR/ASDAR, FoBa, GraDes and MCP in MatLab. For FoBa, our MatLab implementation follows the R package developed by Zhang (2011a). Our implementation of MCP uses the iterative thresholding algorithm (She, 2009) with warm starts. Publicly available Matlab packages for LARS (included in the SparseLab package) are used. The paper mentions software names but no specific version numbers.
Experiment Setup Yes We consider a moderately large scale setting with $n = 5000$ and $p = 50000$. The number of nonzero coefficients is set to be $K = 400$. ... We set $R = 100$, $\sigma = 1$ and $\rho = 0.2, 0.4$ and $0.6$. ... In GraDes, ... we set $s_k = 1/3$ ... We stop GraDes when the residual norm is smaller than $\varepsilon = \sqrt{n}\sigma$, or the maximum number of iterations is greater than $n/2$. ... In ASDAR (Algorithm 2), we set $\tau = 50$ and we stop the iteration if the residual $\|y - X\beta^k\|$ is smaller than $\varepsilon = \sqrt{n}\sigma$ or $k\tau > L = n/\log(n)$.
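The ASDAR stopping rule described in this row (grow the sparsity level in steps of $\tau$, halt once the residual norm falls below $\sqrt{n}\sigma$ or the level exceeds $L = n/\log(n)$) can be sketched as a tuning loop around an inner SDAR-style update. This is our own hedged illustration, not the authors' implementation; the fixed inner iteration count and the warm start across levels are our assumptions.

```python
import numpy as np

def asdar(X, y, sigma, tau=50, max_inner=20):
    """Sketch of the ASDAR tuning loop: run SDAR-style updates at
    sparsity levels T = tau, 2*tau, ... and stop once the residual
    norm drops below eps = sqrt(n)*sigma or T exceeds L = n/log(n)."""
    n, p = X.shape
    eps = np.sqrt(n) * sigma
    L = int(n / np.log(n))
    beta = np.zeros(p)
    T = tau
    while T <= L:
        # Inner SDAR-style loop at level T, warm-started at beta
        for _ in range(max_inner):
            d = X.T @ (y - X @ beta) / n
            support = np.argsort(np.abs(beta + d))[-T:]
            beta = np.zeros(p)
            beta[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        if np.linalg.norm(y - X @ beta) <= eps:
            break                      # residual criterion met
        T += tau                       # otherwise enlarge the support budget
    return beta
```

With the paper's values ($n = 5000$, $\tau = 50$) the loop tries at most $L/\tau \approx 11$ sparsity levels, which is what keeps the procedure fast without a separate cross-validation stage.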