A minimax optimal approach to high-dimensional double sparse linear regression
Authors: Yanhang Zhang, Zhifan Li, Shixiang Liu, Jianxin Yin
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate the superiority of our method by comparing it with several state-of-the-art algorithms on both synthetic and real-world datasets. Keywords: double sparsity, iterative hard thresholding, minimax optimality, fully adaptive procedure, oracle estimation rate. |
| Researcher Affiliation | Academia | Yanhang Zhang EMAIL School of Statistics, Renmin University of China 100872 Beijing, China Zhifan Li EMAIL Beijing Institute of Mathematical Sciences and Applications 101408 Beijing, China Shixiang Liu EMAIL School of Statistics, Renmin University of China 100872 Beijing, China Jianxin Yin EMAIL Center for Applied Statistics and School of Statistics, Renmin University of China 100872 Beijing, China |
| Pseudocode | Yes | Algorithm 1 Double Sparse IHT (DSIHT) algorithm with known s, s0 and σ. Algorithm 2 Double Sparse IHT (DSIHT) algorithm with known s0. Algorithm 3 Adaptive Double Sparse IHT (ADSIHT) algorithm. |
| Open Source Code | Yes | We have implemented our proposals in an open-source R package named ADSIHT. |
| Open Datasets | Yes | The TRIM32 dataset, which pertains to the Bardet-Biedl syndrome gene expression, was initially presented by Scheetz et al. (2006) and has been extensively studied in various statistical works (Huang et al., 2010; Fan et al., 2011; Zhang et al., 2023). |
| Dataset Splits | Yes | In our analysis, the 120 rats are randomly split into a training set with 100 samples and a test set with the remaining 20 samples. We repeat these random splitting procedures 200 times and compute the average of the numbers of selected variables and groups and the prediction mean square error (PMSE) in the test set. |
| Hardware Specification | Yes | All numerical experiments are conducted in R and executed on a personal laptop (AMD Ryzen 9 5900HX, 3.30 GHz, 16.00GB of RAM). |
| Software Dependencies | No | Our algorithms are implemented in R package ADSIHT. We compare against several state-of-the-art methods: sparse group Lasso (SGLasso, Simon et al. (2013)), which is fitted by R package sparsegl (Liang et al., 2024), group bridge (GBridge, Huang et al. (2009)), group exponential Lasso (GEL, Breheny (2015)) and composite minimax concave penalty (CMCP, Breheny and Huang (2009)), which are computed by R package grpreg (Breheny, 2015). |
| Experiment Setup | Yes | For SGLasso, we determine the tuning parameter by five-fold cross-validation. For the other comparison methods, we select the optimal solution using EBIC (Chen and Chen, 2008). For ADSIHT, we use our proposed DSIC with K = 5 to select the optimal model. Moreover, we leave the remaining hyper-parameters to their default values in sparsegl and grpreg. |
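The evaluation protocol quoted in the Dataset Splits row (120 samples split 100/20, repeated 200 times, averaging the prediction mean square error) can be sketched as follows. This is an illustrative Python reconstruction on synthetic data, not the authors' R code: the paper fits the double sparse estimator via the ADSIHT R package, whereas here an ordinary least-squares fit stands in for the estimator, and the dimensions `p`, the sparse coefficient vector, and the noise level are assumptions for the demo.

```python
# Sketch of the repeated random-split PMSE protocol described above.
# Synthetic data; OLS stands in for the double sparse estimator.
import numpy as np

rng = np.random.default_rng(0)
n, p = 120, 10                       # 120 samples as in the TRIM32 study; p is illustrative
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 1.0                       # sparse signal (assumption for the demo)
y = X @ beta + rng.normal(scale=0.5, size=n)

pmses = []
for _ in range(200):                 # 200 random splits, as in the paper
    idx = rng.permutation(n)
    train, test = idx[:100], idx[100:]
    # fit on the 100 training samples (least squares here, ADSIHT in the paper)
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    # prediction mean square error on the 20 held-out samples
    pmses.append(float(np.mean((y[test] - X[test] @ coef) ** 2)))

avg_pmse = float(np.mean(pmses))
print(f"average PMSE over 200 splits: {avg_pmse:.3f}")
```

The reported number is the average of the 200 per-split PMSEs; the same loop would also accumulate the numbers of selected variables and groups when a sparse estimator is used.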