Optimality of Graphlet Screening in High Dimensional Variable Selection
Authors: Jiashun Jin, Cun-Hui Zhang, Qi Zhang
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a small-scale simulation study to investigate the numerical performance of Graphlet Screening and compare it with the lasso and the UPS. Subset selection is not included for comparison since it is computationally NP-hard. We consider experiments for both random design and fixed design, where, as before, the parameters (ϵp, τp) are tied to (ϑ, r) by ϵp = p^(−ϑ) and τp = √(2r·log(p)) (we assume σ = 1 for simplicity in this section). |
| Researcher Affiliation | Academia | Jiashun Jin EMAIL Department of Statistics Carnegie Mellon University Pittsburgh, PA 15213, USA Cun-Hui Zhang EMAIL Department of Statistics Rutgers University Piscataway, NJ 08854, USA Qi Zhang EMAIL Department of Biostatistics & Medical Informatics University of Wisconsin-Madison Madison, WI 53705, USA |
| Pseudocode | Yes | Table 1: Graphlet Screening Algorithm. GS-step: list the G\*,δ-connected submodels I0,k with \|I0,1\| ≤ \|I0,2\| ≤ … ≤ m0; initialize U\*p = ∅ and k = 1; test H0 : I0,k ∩ U\*p against H1 : I0,k with the χ² test (8); update U\*p ← U\*p ∪ I0,k if H0 is rejected, and set k ← k + 1. GC-step: as a subgraph of G\*,δ, U\*p decomposes into many components I0; use the L0-penalized test (9) to select a subset Î0 of each I0; return the union of the Î0 as the selected model. |
| Open Source Code | Yes | The presented algorithm is implemented as the R-CRAN package ScreenClean and in Matlab (available at http://www.stat.cmu.edu/~jiashun/Research/software/GS-matlab/). |
| Open Datasets | No | We conduct a small-scale simulation study to investigate the numerical performance of Graphlet Screening and compare it with the lasso and the UPS... We generate a vector b = (b1, b2, . . . , bp) such that bi ~iid Bernoulli(ϵp), and set β = b ∘ µ. 2. Fix κ and let n = n_p = p^κ. Generate an n × p matrix with iid rows from N(0, (1/n)Ω). 3. Generate Y ∼ N(Xβ, In), and apply the iterative Graphlet Screening, the refined UPS and the lasso. |
| Dataset Splits | No | The paper describes generating synthetic data for simulations based on specific parameters (p, ϑ, r, µ, Ω) and repeating these simulations multiple times (e.g., 'across 40 repetitions' or 'across 40 runs'). It does not define or use explicit training/validation/test splits of a pre-existing dataset. |
| Hardware Specification | No | The research was supported in part by the computational resources on Pitt Grid. However, this does not specify particular CPU or GPU models, memory, or other detailed hardware specifications. |
| Software Dependencies | No | The presented algorithm is implemented as the R-CRAN package ScreenClean and in Matlab... We use the glmnet package (Friedman et al., 2010) to perform the lasso. The paper names its software (R-CRAN, Matlab, glmnet) but does not provide version numbers for any of it. |
| Experiment Setup | Yes | GS uses tuning parameters (m0, Q, ugs, vgs). We set m0 = 3 for our experiments... we set (ugs, vgs) as (√(2·log(1/ϵp)), τp)... For the whole experiment, we choose β the same as in Experiment 1, and Ω the same as in Experiment 4b. We use a fixed design model in Experiments 5a-5c, and a random design model in Experiment 5d. For each sub-experiment, the results are based on 40 independent repetitions... In Experiment 5a, we choose ϑ ∈ {0.35, 0.6} and r ∈ {1.5, 3}... In Experiment 5b, we mis-specify (ϵp, τp) by a reasonably small amount... We take ϑ\*/ϑ ∈ {0.85, 0.925, 1, 1.075, 1.15, 1.225} for the experiment. |
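The simulation setup quoted in the table (β = b ∘ µ with Bernoulli(ϵp) sparsity, a random design with iid rows from N(0, Ω/n), and the parameter tying ϵp = p^(−ϑ), τp = √(2r·log(p))) can be sketched as below. This is an illustrative NumPy sketch, not the paper's R/Matlab code: the values of (p, ϑ, r, κ), the choice Ω = I, and the constant signal µ = τp are assumptions made here for concreteness.

```python
import numpy as np

# Illustrative sketch of the random-design simulation; sigma = 1 throughout.
# (p, vartheta, r, kappa) below are example values, not the paper's settings.
rng = np.random.default_rng(0)

p, vartheta, r, kappa = 500, 0.35, 1.5, 0.975
eps_p = p ** (-vartheta)              # sparsity level: eps_p = p^(-vartheta)
tau_p = np.sqrt(2 * r * np.log(p))    # signal strength: tau_p = sqrt(2 r log p)
n = int(p ** kappa)                   # sample size: n = p^kappa

# GS tuning parameters as quoted in the table:
# (u_gs, v_gs) = (sqrt(2 log(1/eps_p)), tau_p)
u_gs = np.sqrt(2 * np.log(1 / eps_p))
v_gs = tau_p

# Step 1: beta = b ∘ mu, with b_i iid Bernoulli(eps_p); here mu = tau_p * 1_p
b = rng.binomial(1, eps_p, size=p)
beta = b * tau_p

# Step 2: n x p design with iid rows from N(0, Omega / n); Omega = I here
Omega = np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Omega / n, size=n)

# Step 3: Y ~ N(X beta, I_n)
Y = X @ beta + rng.standard_normal(n)
```

With Ω = I the Gram matrix X'X concentrates near the identity, which is the easiest regime; the paper's Experiments 4-5 also vary Ω away from the identity.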
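The two-step structure of Table 1 (a χ²-based screening pass over small submodels of the thresholded Gram-matrix graph, followed by L0-penalized cleaning within each surviving component) can be sketched in heavily simplified form. This is not the paper's implementation (that is the ScreenClean package): `t_screen`, `u`, `v`, and `delta` here are illustrative stand-ins for the tuning parameters (Q, ugs, vgs, δ), screening requires a clique rather than mere connectivity for simplicity, and the exhaustive enumeration is feasible only at toy sizes.

```python
import numpy as np
from itertools import combinations

def graphlet_screen(X, Y, m0=3, t_screen=9.0, u=2.0, v=1.5, delta=0.1):
    """Simplified sketch of the GS-step / GC-step pair from Table 1.

    t_screen, u, v, delta are illustrative stand-ins for the paper's
    tuning parameters; the real algorithm screens over all connected
    submodels of the thresholded graph, not just cliques.
    """
    n, p = X.shape
    G = X.T @ X          # Gram matrix
    Yt = X.T @ Y         # sufficient statistic X'Y

    # Thresholded graph G*,delta: nodes i, j linked when |G[i, j]| >= delta
    adj = (np.abs(G) >= delta) & ~np.eye(p, dtype=bool)

    # GS-step: chi^2-type screening over small submodels of size <= m0
    survivors = set()
    for size in range(1, m0 + 1):
        for I in combinations(range(p), size):
            if size > 1 and not all(adj[i, j] for i, j in combinations(I, 2)):
                continue  # simplification: keep only fully linked submodels
            idx = list(I)
            GII = G[np.ix_(idx, idx)]
            stat = Yt[idx] @ np.linalg.solve(GII, Yt[idx])
            if stat > t_screen:       # survives the screening test
                survivors.update(I)

    # GC-step: L0-penalized cleaning within each surviving component
    beta_hat = np.zeros(p)
    for comp in _components(sorted(survivors), adj):
        best, best_coef, best_score = (), np.array([]), np.inf
        for size in range(len(comp) + 1):
            for J in combinations(comp, size):
                idx = list(J)
                if idx:
                    coef = np.linalg.lstsq(X[:, idx], Y, rcond=None)[0]
                    resid = Y - X[:, idx] @ coef
                else:
                    coef, resid = np.array([]), Y
                score = resid @ resid + (u ** 2) * size  # L0 penalty u^2 |J|
                if score < best_score:
                    best, best_coef, best_score = J, coef, score
        for j, c in zip(best, best_coef):
            beta_hat[j] = c if abs(c) >= v else 0.0  # final thresholding at v
    return beta_hat

def _components(nodes, adj):
    # Connected components of the surviving set within the thresholded graph
    nodes, seen = list(nodes), set()
    for s in nodes:
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            node = stack.pop()
            comp.append(node)
            for w in nodes:
                if w not in seen and adj[node, w]:
                    seen.add(w)
                    stack.append(w)
        yield comp
```

On an orthogonal toy design the screening pass reduces to per-coordinate χ² tests, and the cleaning pass recovers the signal coordinates exactly; the interesting regime in the paper is a sparse, non-orthogonal Gram matrix, where the graph structure keeps each cleaning subproblem small.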