Optimal Minimax Variable Selection for Large-Scale Matrix Linear Regression Model

Authors: Meiling Hao, Lianqiang Qu, Dehan Kong, Liuquan Sun, Hongtu Zhu

JMLR 2021

Reproducibility: Variable | Result | LLM Response
Research Type: Experimental. The finite sample performance of the method is examined via extensive simulation studies, and a real data application from the Alzheimer's Disease Neuroimaging Initiative study is provided.
Researcher Affiliation: Academia. Meiling Hao (School of Statistics, University of International Business and Economics, Beijing, 100029, China); Lianqiang Qu (School of Mathematics and Statistics, Central China Normal University, Wuhan, Hubei, 430079, China); Dehan Kong (Department of Statistical Sciences, University of Toronto, Toronto, Ontario, M5G 1X6, Canada); Liuquan Sun (Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China); Hongtu Zhu (Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, 27599, USA)
Pseudocode: Yes. Algorithm 1 (Iterative Hard-Thresholding Algorithm): Step 1. Choose an initial value for B[0], such as B[0] = 0; [...] Algorithm 2 (Smoothing Estimator): Step 1. Let m = 0, and input A, M0, M and the initial value B[0]_j = (b[0]_{j,sk}) with b[0]_{j,sk} = b̂_{j,sk}, where A is a tolerance parameter and M0 is a positive integer.
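The hard-thresholding iteration quoted above can be sketched generically as follows. This is a minimal vector-valued stand-in, not the paper's exact Algorithm 1 for the matrix regression model: the least-squares loss, the unit step size, and the change-based stopping rule are all our assumptions.

```python
import numpy as np

def iht(X, y, tau, step=1.0, max_iter=1000, tol=1e-3):
    """Generic iterative hard-thresholding for sparse least squares.

    After each gradient step, keep only the tau largest-magnitude
    coefficients and zero out the rest (the hard-thresholding step).
    """
    n, d = X.shape
    b = np.zeros(d)  # initial value B[0] = 0, as in Step 1
    for _ in range(max_iter):
        grad = X.T @ (X @ b - y) / n          # gradient of (1/2n)||y - Xb||^2
        b_new = b - step * grad
        if tau < d:
            # cutoff is the tau-th largest magnitude; smaller entries -> 0
            cutoff = np.partition(np.abs(b_new), d - tau)[d - tau]
            b_new[np.abs(b_new) < cutoff] = 0.0
        if np.linalg.norm(b_new - b) < tol:   # assumed stopping rule
            b = b_new
            break
        b = b_new
    return b
```

On a noise-free sparse problem with a well-conditioned design, this sketch recovers the true support; the paper's actual algorithm additionally tunes the threshold level λ[l] across iterations.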
Open Source Code: No. The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available. It mentions using the MACH-Admix software but does not provide its own code.
Open Datasets: Yes. Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). [...] The 1000 Genomes Project Consortium (2015) was used as a reference panel.
Dataset Splits: No. The paper reports the sample sizes for the simulation studies (n = 100 and 200) and the number of subjects in the real data application (735). However, it does not specify explicit training, validation, or test splits, or their percentages/counts, for either the simulated or real data.
Hardware Specification: No. The paper does not provide any details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments or simulations.
Software Dependencies: No. The paper mentions the MACH-Admix software and the 1000G Phase I Integrated Release Version 3 haplotypes as a reference panel, but it does not specify version numbers for any software, libraries, or solvers used in the authors' implementation.
Experiment Setup: Yes. We set ϱ = 10^-3, ϵ = 10^-3, and L = 1000 in Algorithm 1. The initial selection of λ[l] in Step 2.1 is vital to the success of the iterative hard-thresholding algorithm. [...] In practice, we set τ = [n^{1/5} log(n)], where [a] denotes the integer part of a. [...] Here we adopt c_n = log{log(d_n)}/3 in the simulation studies. [...] The total number of predictors is d_n = 1000. The sample sizes are n = 100 and 200. We consider (p, q) = (50, 50) and (p, q) = (150, 150). [...] Here we set M = 10, M0 = 5, A = 10^-3, and h = ϖ n^{-1/5} with ϖ = 1/2.
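The quoted tuning parameters can be collected into one place for a reimplementation attempt. The helper and its variable names are ours; the numeric values and formulas (τ = [n^{1/5} log(n)], c_n = log{log(d_n)}/3, h = ϖ n^{-1/5} with ϖ = 1/2) are transcribed from the quoted setup.

```python
import math

def simulation_settings(n, d_n):
    """Collect the paper's reported simulation tuning parameters.

    n   : sample size (the paper uses 100 and 200)
    d_n : total number of predictors (the paper uses 1000)
    """
    return {
        "rho": 1e-3,                           # ϱ in Algorithm 1
        "eps": 1e-3,                           # ϵ in Algorithm 1
        "L": 1000,                             # iteration cap in Algorithm 1
        "tau": int(n ** 0.2 * math.log(n)),    # τ = [n^{1/5} log(n)]
        "c_n": math.log(math.log(d_n)) / 3,    # c_n = log{log(d_n)}/3
        "M": 10, "M0": 5, "A": 1e-3,           # Algorithm 2 inputs
        "h": 0.5 * n ** (-0.2),                # h = ϖ n^{-1/5}, ϖ = 1/2
    }
```

For example, n = 100 and d_n = 1000 give τ = 11, matching the paper's rule that τ grows slowly with the sample size.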