Optimality of Graphlet Screening in High Dimensional Variable Selection
Authors: Jiashun Jin, Cun-Hui Zhang, Qi Zhang
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a small-scale simulation study to investigate the numerical performance of Graphlet Screening and compare it with the lasso and the UPS. Subset selection is not included for comparison since it is computationally NP-hard. We consider experiments for both random design and fixed design, where, as before, the parameters (ϵp, τp) are tied to (ϑ, r) by ϵp = p^(−ϑ) and τp = √(2r·log(p)) (we assume σ = 1 for simplicity in this section). |
| Researcher Affiliation | Academia | Jiashun Jin EMAIL Department of Statistics Carnegie Mellon University Pittsburgh, PA 15213, USA Cun-Hui Zhang EMAIL Department of Statistics Rutgers University Piscataway, NJ 08854, USA Qi Zhang EMAIL Department of Biostatistics & Medical Informatics University of Wisconsin-Madison Madison, WI 53705, USA |
| Pseudocode | Yes | Table 1: Graphlet Screening Algorithm. GS-step: list the G\*,δ-connected submodels I0,k with \|I0,1\| ≤ \|I0,2\| ≤ … ≤ m0; initialize U\*p = ∅ and k = 1; test H0 : I0,k ∩ U\*p against H1 : I0,k with the χ² test (8); update U\*p ← U\*p ∪ I0,k if H0 is rejected, and set k ← k + 1. GC-step: as a subgraph of G\*,δ, U\*p decomposes into many components I0; use the L0-penalized test (9) to select a subset Î0 of each I0; return the union of the Î0 as the selected model. |
| Open Source Code | Yes | The presented algorithm is implemented as the R-CRAN package ScreenClean and in Matlab (available at http://www.stat.cmu.edu/~jiashun/Research/software/GS-matlab/). |
| Open Datasets | No | We conduct a small-scale simulation study to investigate the numerical performance of Graphlet Screening and compare it with the lasso and the UPS... We generate a vector b = (b1, b2, . . . , bp) such that bi ~iid Bernoulli(ϵp), and set β = b ∘ µ. 2. Fix κ and let n = n_p = p^κ. Generate an n × p matrix with iid rows from N(0, (1/n)Ω). 3. Generate Y ∼ N(Xβ, In), and apply the iterative Graphlet Screening, the refined UPS and the lasso. |
| Dataset Splits | No | The paper describes generating synthetic data for simulations based on specific parameters (p, ϑ, r, µ, Ω) and repeating these simulations multiple times (e.g., 'across 40 repetitions' or 'across 40 runs'). It does not define or use explicit training/validation/test splits of a pre-existing dataset. |
| Hardware Specification | No | The research was supported in part by the computational resources on Pitt Grid. However, this does not specify particular CPU or GPU models, memory, or other detailed hardware specifications. |
| Software Dependencies | No | The presented algorithm is implemented as the R-CRAN package ScreenClean and in Matlab... We use the glmnet package (Friedman et al., 2010) to perform the lasso. The paper names its software (R-CRAN, Matlab, glmnet) but does not provide version numbers for any of it. |
| Experiment Setup | Yes | GS uses tuning parameters (m0, Q, ugs, vgs). We set m0 = 3 for our experiments... we set (ugs, vgs) as (√(2·log(1/ϵp)), τp)... For the whole experiment, we choose β the same as in Experiment 1, and Ω the same as in Experiment 4b. We use a fixed design model in Experiments 5a-5c, and a random design model in Experiment 5d. For each sub-experiment, the results are based on 40 independent repetitions... In Experiment 5a, we choose ϑ ∈ {0.35, 0.6} and r ∈ {1.5, 3}... In Experiment 5b, we mis-specify (ϵp, τp) by a reasonably small amount... We take ϑ\*/ϑ ∈ {0.85, 0.925, 1, 1.075, 1.15, 1.225} for the experiment. |
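The simulation setup quoted in the table (β = b ∘ µ with Bernoulli(ϵp) sparsity, a random design with iid rows from N(0, Ω/n), and the parameter tying ϵp = p^(−ϑ), τp = √(2r·log(p))) can be sketched as below. This is an illustrative NumPy sketch, not the paper's R/Matlab code: the values of (p, ϑ, r, κ), the choice Ω = I, and the constant signal µ = τp are assumptions made here for concreteness.

```python
import numpy as np

# Illustrative sketch of the random-design simulation; sigma = 1 throughout.
# (p, vartheta, r, kappa) below are example values, not the paper's settings.
rng = np.random.default_rng(0)

p, vartheta, r, kappa = 500, 0.35, 1.5, 0.975
eps_p = p ** (-vartheta)              # sparsity level: eps_p = p^(-vartheta)
tau_p = np.sqrt(2 * r * np.log(p))    # signal strength: tau_p = sqrt(2 r log p)
n = int(p ** kappa)                   # sample size: n = p^kappa

# GS tuning parameters as quoted in the table:
# (u_gs, v_gs) = (sqrt(2 log(1/eps_p)), tau_p)
u_gs = np.sqrt(2 * np.log(1 / eps_p))
v_gs = tau_p

# Step 1: beta = b ∘ mu, with b_i iid Bernoulli(eps_p); here mu = tau_p * 1_p
b = rng.binomial(1, eps_p, size=p)
beta = b * tau_p

# Step 2: n x p design with iid rows from N(0, Omega / n); Omega = I here
Omega = np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Omega / n, size=n)

# Step 3: Y ~ N(X beta, I_n)
Y = X @ beta + rng.standard_normal(n)
```

With Ω = I the Gram matrix X'X concentrates near the identity, which is the easiest regime; the paper's Experiments 4-5 also vary Ω away from the identity.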
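The two-step structure of Table 1 (a χ²-based screening pass over small submodels of the thresholded Gram-matrix graph, followed by L0-penalized cleaning within each surviving component) can be sketched in heavily simplified form. This is not the paper's implementation (that is the ScreenClean package): `t_screen`, `u`, `v`, and `delta` here are illustrative stand-ins for the tuning parameters (Q, ugs, vgs, δ), screening requires a clique rather than mere connectivity for simplicity, and the exhaustive enumeration is feasible only at toy sizes.

```python
import numpy as np
from itertools import combinations

def graphlet_screen(X, Y, m0=3, t_screen=9.0, u=2.0, v=1.5, delta=0.1):
    """Simplified sketch of the GS-step / GC-step pair from Table 1.

    t_screen, u, v, delta are illustrative stand-ins for the paper's
    tuning parameters; the real algorithm screens over all connected
    submodels of the thresholded graph, not just cliques.
    """
    n, p = X.shape
    G = X.T @ X          # Gram matrix
    Yt = X.T @ Y         # sufficient statistic X'Y

    # Thresholded graph G*,delta: nodes i, j linked when |G[i, j]| >= delta
    adj = (np.abs(G) >= delta) & ~np.eye(p, dtype=bool)

    # GS-step: chi^2-type screening over small submodels of size <= m0
    survivors = set()
    for size in range(1, m0 + 1):
        for I in combinations(range(p), size):
            if size > 1 and not all(adj[i, j] for i, j in combinations(I, 2)):
                continue  # simplification: keep only fully linked submodels
            idx = list(I)
            GII = G[np.ix_(idx, idx)]
            stat = Yt[idx] @ np.linalg.solve(GII, Yt[idx])
            if stat > t_screen:       # survives the screening test
                survivors.update(I)

    # GC-step: L0-penalized cleaning within each surviving component
    beta_hat = np.zeros(p)
    for comp in _components(sorted(survivors), adj):
        best, best_coef, best_score = (), np.array([]), np.inf
        for size in range(len(comp) + 1):
            for J in combinations(comp, size):
                idx = list(J)
                if idx:
                    coef = np.linalg.lstsq(X[:, idx], Y, rcond=None)[0]
                    resid = Y - X[:, idx] @ coef
                else:
                    coef, resid = np.array([]), Y
                score = resid @ resid + (u ** 2) * size  # L0 penalty u^2 |J|
                if score < best_score:
                    best, best_coef, best_score = J, coef, score
        for j, c in zip(best, best_coef):
            beta_hat[j] = c if abs(c) >= v else 0.0  # final thresholding at v
    return beta_hat

def _components(nodes, adj):
    # Connected components of the surviving set within the thresholded graph
    nodes, seen = list(nodes), set()
    for s in nodes:
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            node = stack.pop()
            comp.append(node)
            for w in nodes:
                if w not in seen and adj[node, w]:
                    seen.add(w)
                    stack.append(w)
        yield comp
```

On an orthogonal toy design the screening pass reduces to per-coordinate χ² tests, and the cleaning pass recovers the signal coordinates exactly; the interesting regime in the paper is a sparse, non-orthogonal Gram matrix, where the graph structure keeps each cleaning subproblem small.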