Optimal Minimax Variable Selection for Large-Scale Matrix Linear Regression Model

Authors: Meiling Hao, Lianqiang Qu, Dehan Kong, Liuquan Sun, Hongtu Zhu

JMLR 2021

Reproducibility: Variable | Result | LLM Response
Research Type: Experimental. The finite sample performance of the method is examined via extensive simulation studies, and a real data application from the Alzheimer's Disease Neuroimaging Initiative study is provided.
Researcher Affiliation: Academia. Meiling Hao (School of Statistics, University of International Business and Economics, Beijing, 100029, China); Lianqiang Qu (School of Mathematics and Statistics, Central China Normal University, Wuhan, Hubei, 430079, China); Dehan Kong (Department of Statistical Sciences, University of Toronto, Toronto, Ontario, M5G 1X6, Canada); Liuquan Sun (Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China); Hongtu Zhu (Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, 27599, USA)
Pseudocode: Yes. Algorithm 1 (Iterative Hard-Thresholding Algorithm): Step 1. Choose an initial value for B[0], such as B[0] = 0; [...] Algorithm 2 (Smoothing Estimator): Step 1. Let m = 0, and input A, M0, M and the initial value B[0]_j = (b[0]_{j,sk}) with b[0]_{j,sk} = b̂_{j,sk}, where A is a tolerance parameter and M0 is a positive integer.
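The hard-thresholding iteration quoted above can be sketched generically as follows. This is a minimal vector-valued stand-in, not the paper's exact Algorithm 1 for the matrix regression model: the least-squares loss, the unit step size, and the change-based stopping rule are all our assumptions.

```python
import numpy as np

def iht(X, y, tau, step=1.0, max_iter=1000, tol=1e-3):
    """Generic iterative hard-thresholding for sparse least squares.

    After each gradient step, keep only the tau largest-magnitude
    coefficients and zero out the rest (the hard-thresholding step).
    """
    n, d = X.shape
    b = np.zeros(d)  # initial value B[0] = 0, as in Step 1
    for _ in range(max_iter):
        grad = X.T @ (X @ b - y) / n          # gradient of (1/2n)||y - Xb||^2
        b_new = b - step * grad
        if tau < d:
            # cutoff is the tau-th largest magnitude; smaller entries -> 0
            cutoff = np.partition(np.abs(b_new), d - tau)[d - tau]
            b_new[np.abs(b_new) < cutoff] = 0.0
        if np.linalg.norm(b_new - b) < tol:   # assumed stopping rule
            b = b_new
            break
        b = b_new
    return b
```

On a noise-free sparse problem with a well-conditioned design, this sketch recovers the true support; the paper's actual algorithm additionally tunes the threshold level λ[l] across iterations.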
Open Source Code: No. The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available. It mentions using the MACH-Admix software but does not provide its own code.
Open Datasets: Yes. Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). [...] The 1000 Genomes Project Consortium (2015) was used as a reference panel.
Dataset Splits: No. The paper reports the sample sizes for the simulation studies (n = 100 and 200) and the number of subjects in the real data application (735). However, it does not specify explicit training, validation, or test splits, or their percentages/counts, for either the simulated or real data.
Hardware Specification: No. The paper does not provide any details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments or simulations.
Software Dependencies: No. The paper mentions the MACH-Admix software and the 1000G Phase I Integrated Release Version 3 haplotypes as a reference panel, but it does not specify version numbers for any software, libraries, or solvers used in the authors' implementation.
Experiment Setup: Yes. We set ϱ = 10^-3, ϵ = 10^-3, and L = 1000 in Algorithm 1. The initial selection of λ[l] in Step 2.1 is vital to the success of the iterative hard-thresholding algorithm. [...] In practice, we set τ = [n^{1/5} log(n)], where [a] denotes the integer part of a. [...] Here we adopt c_n = log{log(d_n)}/3 in the simulation studies. [...] The total number of predictors is d_n = 1000. The sample sizes are n = 100 and 200. We consider (p, q) = (50, 50) and (p, q) = (150, 150). [...] Here we set M = 10, M0 = 5, A = 10^-3, and h = ϖ n^{-1/5} with ϖ = 1/2.
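The quoted tuning parameters can be collected into one place for a reimplementation attempt. The helper and its variable names are ours; the numeric values and formulas (τ = [n^{1/5} log(n)], c_n = log{log(d_n)}/3, h = ϖ n^{-1/5} with ϖ = 1/2) are transcribed from the quoted setup.

```python
import math

def simulation_settings(n, d_n):
    """Collect the paper's reported simulation tuning parameters.

    n   : sample size (the paper uses 100 and 200)
    d_n : total number of predictors (the paper uses 1000)
    """
    return {
        "rho": 1e-3,                           # ϱ in Algorithm 1
        "eps": 1e-3,                           # ϵ in Algorithm 1
        "L": 1000,                             # iteration cap in Algorithm 1
        "tau": int(n ** 0.2 * math.log(n)),    # τ = [n^{1/5} log(n)]
        "c_n": math.log(math.log(d_n)) / 3,    # c_n = log{log(d_n)}/3
        "M": 10, "M0": 5, "A": 1e-3,           # Algorithm 2 inputs
        "h": 0.5 * n ** (-0.2),                # h = ϖ n^{-1/5}, ϖ = 1/2
    }
```

For example, n = 100 and d_n = 1000 give τ = 11, matching the paper's rule that τ grows slowly with the sample size.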