A Constructive Approach to $L_0$ Penalized Regression

Authors: Jian Huang, Yuling Jiao, Yanyan Liu, Xiliang Lu

JMLR 2018

Reproducibility Variable Result LLM Response
Research Type Experimental Simulation studies demonstrate that SDAR outperforms Lasso, MCP and two greedy methods in accuracy and efficiency.
Researcher Affiliation Academia Jian Huang EMAIL Department of Applied Mathematics The Hong Kong Polytechnic University Hung Hom, Kowloon Hong Kong, China; Yuling Jiao EMAIL School of Statistics and Mathematics Zhongnan University of Economics and Law Wuhan, 430063, China; Yanyan Liu EMAIL School of Mathematics and Statistics Wuhan University Wuhan, 430072, China; Xiliang Lu EMAIL School of Mathematics and Statistics Wuhan University Wuhan, 430072, China
Pseudocode Yes Algorithm 1 Support detection and root finding (SDAR); Algorithm 2 Adaptive SDAR (ASDAR)
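To make the pseudocode entry concrete, here is a minimal sketch of one plausible reading of SDAR's alternation between support detection (thresholding $|\beta + d|$ at the $T$ largest entries, where $d$ is the scaled gradient) and root finding (least squares restricted to the detected support). This is our own illustration in Python/NumPy, not the authors' Matlab package; the function name `sdar` and the stopping rule on a stabilized support are our choices.

```python
import numpy as np

def sdar(X, y, T, max_iter=50):
    """Sketch of SDAR: alternate support detection and root finding.

    Support detection picks the T largest entries of |beta + d|,
    with d = X'(y - X beta)/n; root finding solves least squares
    restricted to the detected support.
    """
    n, p = X.shape
    beta = np.zeros(p)
    d = X.T @ (y - X @ beta) / n      # gradient-like dual variable
    support = None
    for _ in range(max_iter):
        # Support detection: indices of the T largest |beta + d|
        new_support = np.sort(np.argsort(np.abs(beta + d))[-T:])
        if support is not None and np.array_equal(new_support, support):
            break                      # support has stabilized
        support = new_support
        # Root finding: least squares on the detected support
        beta = np.zeros(p)
        beta[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        d = X.T @ (y - X @ beta) / n
    return beta
```

In the noiseless, well-conditioned case the support typically stabilizes after one or two iterations, at which point the restricted least-squares fit is exact.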
Open Source Code Yes We have implemented SDAR in a Matlab package sdar, which is available at http://homepage.stat.uiowa.edu/~jian/.
Open Datasets No To generate the design matrix $X$, we first generate an $n \times p$ random Gaussian matrix $\bar{X}$ whose entries are i.i.d. $N(0, 1)$ and then normalize its columns to $\sqrt{n}$ length. Then $X$ is generated with $X_1 = \bar{X}_1$, $X_j = \bar{X}_j + \rho(\bar{X}_{j+1} + \bar{X}_{j-1})$, $j = 2, \ldots, p-1$, and $X_p = \bar{X}_p$. The underlying regression coefficient $\beta$ is generated with the nonzero coefficients uniformly distributed in $[m, M]$, where $m = \sigma\sqrt{2\log(p)/n}$ and $M = 100m$. Then the observation vector $y = X\beta + \eta$ with $\eta_1, \ldots, \eta_n$ generated independently from $N(0, \sigma^2)$. The paper describes how data is generated for simulation studies, but does not provide access to a pre-existing public dataset.
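The simulated-data recipe quoted in this row can be sketched directly. The following Python/NumPy function is our own rendering of that description (the function name, seeding, and random support placement are our assumptions; the paper does not specify how the support is chosen).

```python
import numpy as np

def generate_data(n, p, K, rho, sigma, seed=0):
    """Sketch of the paper's simulation design (our naming)."""
    rng = np.random.default_rng(seed)
    # n x p Gaussian matrix, columns normalized to length sqrt(n)
    Xbar = rng.standard_normal((n, p))
    Xbar *= np.sqrt(n) / np.linalg.norm(Xbar, axis=0)
    # Correlated design: X_j = Xbar_j + rho*(Xbar_{j+1} + Xbar_{j-1})
    X = Xbar.copy()
    X[:, 1:p-1] = Xbar[:, 1:p-1] + rho * (Xbar[:, 2:p] + Xbar[:, 0:p-2])
    # K nonzero coefficients drawn uniformly from [m, M]
    m = sigma * np.sqrt(2 * np.log(p) / n)
    M = 100 * m
    beta = np.zeros(p)
    support = rng.choice(p, size=K, replace=False)  # placement assumed random
    beta[support] = rng.uniform(m, M, size=K)
    # Observations with i.i.d. Gaussian noise
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta
```

The paper's setting corresponds to calls like `generate_data(5000, 50000, 400, 0.2, 1.0)`; the small shapes used below are only for quick checks.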
Dataset Splits No The paper uses synthetically generated data and does not describe any train/validation/test splits.
Hardware Specification No For the examples we consider in our simulation studies with (n, p) = (5000, 50000), it can find the solution in seconds on a personal laptop computer. This mention is too general to be considered specific hardware details.
Software Dependencies No We implemented SDAR/ASDAR, FoBa, GraDes and MCP in MatLab. For FoBa, our MatLab implementation follows the R package developed by Zhang (2011a). Our implementation of MCP uses the iterative thresholding algorithm (She, 2009) with warm starts. Publicly available Matlab packages for LARS (included in the SparseLab package) are used. The paper mentions software names but no specific version numbers.
Experiment Setup Yes We consider a moderately large scale setting with $n = 5000$ and $p = 50000$. The number of nonzero coefficients is set to be $K = 400$. ... We set $R = 100$, $\sigma = 1$ and $\rho = 0.2, 0.4$ and $0.6$. ... In GraDes, ... we set $s_k = 1/3$ ... We stop GraDes when the residual norm is smaller than $\varepsilon = \sqrt{n}\sigma$, or the maximum number of iterations is greater than $n/2$. ... In ASDAR (Algorithm 2), we set $\tau = 50$ and we stop the iteration if the residual $\|y - X\beta^k\|$ is smaller than $\varepsilon = \sqrt{n}\sigma$ or $k\tau > L = n/\log(n)$.
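The ASDAR stopping rule described in this row (grow the sparsity level in steps of $\tau$, halt once the residual norm falls below $\sqrt{n}\sigma$ or the level exceeds $L = n/\log(n)$) can be sketched as a tuning loop around an inner SDAR-style update. This is our own hedged illustration, not the authors' implementation; the fixed inner iteration count and the warm start across levels are our assumptions.

```python
import numpy as np

def asdar(X, y, sigma, tau=50, max_inner=20):
    """Sketch of the ASDAR tuning loop: run SDAR-style updates at
    sparsity levels T = tau, 2*tau, ... and stop once the residual
    norm drops below eps = sqrt(n)*sigma or T exceeds L = n/log(n)."""
    n, p = X.shape
    eps = np.sqrt(n) * sigma
    L = int(n / np.log(n))
    beta = np.zeros(p)
    T = tau
    while T <= L:
        # Inner SDAR-style loop at level T, warm-started at beta
        for _ in range(max_inner):
            d = X.T @ (y - X @ beta) / n
            support = np.argsort(np.abs(beta + d))[-T:]
            beta = np.zeros(p)
            beta[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        if np.linalg.norm(y - X @ beta) <= eps:
            break                      # residual criterion met
        T += tau                       # otherwise enlarge the support budget
    return beta
```

With the paper's values ($n = 5000$, $\tau = 50$) the loop tries at most $L/\tau \approx 11$ sparsity levels, which is what keeps the procedure fast without a separate cross-validation stage.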