A minimax optimal approach to high-dimensional double sparse linear regression
Authors: Yanhang Zhang, Zhifan Li, Shixiang Liu, Jianxin Yin
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate the superiority of our method by comparing it with several state-of-the-art algorithms on both synthetic and real-world datasets. Keywords: double sparsity, iterative hard thresholding, minimax optimality, fully adaptive procedure, oracle estimation rate. |
| Researcher Affiliation | Academia | Yanhang Zhang EMAIL School of Statistics, Renmin University of China 100872 Beijing, China Zhifan Li EMAIL Beijing Institute of Mathematical Sciences and Applications 101408 Beijing, China Shixiang Liu EMAIL School of Statistics, Renmin University of China 100872 Beijing, China Jianxin Yin EMAIL Center for Applied Statistics and School of Statistics, Renmin University of China 100872 Beijing, China |
| Pseudocode | Yes | Algorithm 1 Double Sparse IHT (DSIHT) algorithm with known s, s0 and σ. Algorithm 2 Double Sparse IHT (DSIHT) algorithm with known s0. Algorithm 3 Adaptive Double Sparse IHT (ADSIHT) algorithm. |
| Open Source Code | Yes | We have implemented our proposals in an open-source R package named ADSIHT. |
| Open Datasets | Yes | The TRIM32 dataset, which pertains to the Bardet-Biedl syndrome gene expression, was initially presented by Scheetz et al. (2006) and has been extensively studied in various statistical works (Huang et al., 2010; Fan et al., 2011; Zhang et al., 2023). |
| Dataset Splits | Yes | In our analysis, the 120 rats are randomly split into a training set with 100 samples and a test set with the remaining 20 samples. We repeat these random splitting procedures 200 times and compute the average of the numbers of selected variables and groups and the prediction mean square error (PMSE) in the test set. |
| Hardware Specification | Yes | All numerical experiments are conducted in R and executed on a personal laptop (AMD Ryzen 9 5900HX, 3.30 GHz, 16.00GB of RAM). |
| Software Dependencies | No | Our algorithms are implemented in R package ADSIHT. We compare against several state-of-the-art methods: sparse group Lasso (SGLasso, Simon et al. (2013)), which is fitted by R package sparsegl (Liang et al., 2024), group bridge (GBridge, Huang et al. (2009)), group exponential Lasso (GEL, Breheny (2015)) and composite minimax concave penalty (CMCP, Breheny and Huang (2009)), which are computed by R package grpreg (Breheny, 2015). |
| Experiment Setup | Yes | For SGLasso, we determine the tuning parameter by five-fold cross-validation. For the other comparison methods, we select the optimal solution using EBIC (Chen and Chen, 2008). For ADSIHT, we use our proposed DSIC with K = 5 to select the optimal model. Moreover, we leave the remaining hyper-parameters to their default values in sparsegl and grpreg. |
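The evaluation protocol quoted in the Dataset Splits row (120 samples split 100/20, repeated 200 times, averaging the prediction mean square error) can be sketched as follows. This is an illustrative Python reconstruction on synthetic data, not the authors' R code: the paper fits the double sparse estimator via the ADSIHT R package, whereas here an ordinary least-squares fit stands in for the estimator, and the dimensions `p`, the sparse coefficient vector, and the noise level are assumptions for the demo.

```python
# Sketch of the repeated random-split PMSE protocol described above.
# Synthetic data; OLS stands in for the double sparse estimator.
import numpy as np

rng = np.random.default_rng(0)
n, p = 120, 10                       # 120 samples as in the TRIM32 study; p is illustrative
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 1.0                       # sparse signal (assumption for the demo)
y = X @ beta + rng.normal(scale=0.5, size=n)

pmses = []
for _ in range(200):                 # 200 random splits, as in the paper
    idx = rng.permutation(n)
    train, test = idx[:100], idx[100:]
    # fit on the 100 training samples (least squares here, ADSIHT in the paper)
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    # prediction mean square error on the 20 held-out samples
    pmses.append(float(np.mean((y[test] - X[test] @ coef) ** 2)))

avg_pmse = float(np.mean(pmses))
print(f"average PMSE over 200 splits: {avg_pmse:.3f}")
```

The reported number is the average of the 200 per-split PMSEs; the same loop would also accumulate the numbers of selected variables and groups when a sparse estimator is used.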