Optimal Minimax Variable Selection for Large-Scale Matrix Linear Regression Model
Authors: Meiling Hao, Lianqiang Qu, Dehan Kong, Liuquan Sun, Hongtu Zhu
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The finite sample performance of the method is examined via extensive simulation studies, and a real data application from the Alzheimer's Disease Neuroimaging Initiative study is provided. |
| Researcher Affiliation | Academia | Meiling Hao (EMAIL), School of Statistics, University of International Business and Economics, Beijing, 100029, China; Lianqiang Qu (EMAIL), School of Mathematics and Statistics, Central China Normal University, Wuhan, Hubei, 430079, China; Dehan Kong (EMAIL), Department of Statistical Sciences, University of Toronto, Toronto, Ontario, M5G 1X6, Canada; Liuquan Sun (EMAIL), Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China; Hongtu Zhu (EMAIL), Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, 27599, USA |
| Pseudocode | Yes | Algorithm 1 (Iterative Hard-Thresholding Algorithm) Step 1. Choose an initial value for B^(0), such as B^(0) = 0; [...] Algorithm 2 (Smoothing Estimator) Step 1. Let m = 0, and input A, M0, M and the initial value B_j^(0) = (b^(0)_{j,sk}) with b^(0)_{j,sk} = b̂_{j,sk}, where A is a tolerance parameter and M0 is a positive integer. |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the methodology described is publicly available. It mentions using 'MACH-Admix software' but does not provide its own code. |
| Open Datasets | Yes | Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). [...] The 1000 Genomes Project Consortium (2015) was used as a reference panel. |
| Dataset Splits | No | The paper describes the sample sizes for simulation studies (n = 100 and 200) and the number of subjects for the real data application (735 subjects). However, it does not specify any explicit training, validation, or test dataset splits or their percentages/counts for either the simulated or real data. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory amounts, or cloud instance types) used to run the experiments or simulations. |
| Software Dependencies | No | The paper mentions 'MACH-Admix software' and refers to '1000G Phase I Integrated Release Version 3 haplotypes' as a reference panel, but it does not specify version numbers for any software, libraries, or solvers used by the authors for their implementation. |
| Experiment Setup | Yes | We set ϱ = 10⁻³, ϵ = 10⁻³ and L = 1000 in Algorithm 1. The initial selection of λ^(l) in Step 2.1 is vital to the success of the iterative hard-thresholding algorithm. [...] In practice, we set τ = [n^{1/5} log(n)], where [a] denotes the largest integer part of a. [...] Here we adopt c_n = log{log(d_n)}/3 in the simulation studies. [...] The total number of predictors is d_n = 1000. The sample sizes are n = 100 and 200. We consider (p, q) = (50, 50) and (p, q) = (150, 150). [...] Here we set M = 10, M0 = 5, A = 10⁻³, and h = ϖ n^{-1/5} with ϖ = 1/2. |
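The paper's Algorithm 1 is an iterative hard-thresholding (IHT) scheme: take a gradient step on the least-squares loss, then keep only the largest-magnitude coefficients. The sketch below is a generic vector-regression illustration of that idea, not the authors' implementation (their method operates on matrix coefficients and uses the λ^(l)/τ selection rules quoted above); the function name `iht`, the step-size rule, and the stopping tolerance mirroring ϵ = 10⁻³ and L = 1000 are illustrative assumptions.

```python
import numpy as np

def iht(X, y, tau, step=None, max_iter=1000, tol=1e-3):
    """Generic iterative hard-thresholding for sparse least squares.

    At each iteration: gradient step on ||y - Xb||^2 / 2, then keep
    only the tau largest-magnitude entries of b (hard threshold).
    Illustrative sketch only; not the paper's matrix-valued algorithm.
    """
    n, d = X.shape
    if step is None:
        # Conservative step size: 1 / spectral norm of X squared,
        # which guarantees the gradient step is non-expansive.
        step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(d)
    for _ in range(max_iter):
        grad = X.T @ (X @ b - y)          # gradient of the squared loss
        b_new = b - step * grad
        keep = np.argsort(np.abs(b_new))[-tau:]  # indices of top-tau entries
        b_thr = np.zeros(d)
        b_thr[keep] = b_new[keep]          # zero out everything else
        if np.linalg.norm(b_thr - b) <= tol:   # stop when iterates stabilize
            return b_thr
        b = b_thr
    return b
```

On a noiseless sparse problem this recovers the true support; in the paper's setting the analogous thresholding level is chosen data-adaptively via λ^(l) rather than a fixed sparsity count.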