Confidence Intervals and Hypothesis Testing for High-Dimensional Regression

Authors: Adel Javanmard, Andrea Montanari

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our method on synthetic data and a high-throughput genomic data set about riboflavin production rate, made publicly available by Bühlmann et al. (2014). Keywords: hypothesis testing, confidence intervals, LASSO, high-dimensional models, bias of an estimator. Section 5 illustrates the above results through numerical simulations both on synthetic and on real data.
Researcher Affiliation | Academia | Adel Javanmard (EMAIL), Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA. Andrea Montanari (EMAIL), Department of Electrical Engineering and Department of Statistics, Stanford University, Stanford, CA 94305, USA.
Pseudocode | Yes | Algorithm 1: Unbiased estimator for $\theta_0$ in high-dimensional linear regression models. Input: measurement vector $y$, design matrix $X$, parameters $\lambda$, $\mu$. Output: unbiased estimator $\hat{\theta}^u$.
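Algorithm 1 can be sketched in Python with NumPy, assuming the construction as we understand it from the paper: fit the LASSO $\hat{\theta}^n$ minimizing $\frac{1}{2n}\|y - X\theta\|_2^2 + \lambda\|\theta\|_1$, build a matrix $M$ whose rows $m_i$ keep $\|\hat{\Sigma} m_i - e_i\|_\infty \le \mu$ with $\hat{\Sigma} = X^\top X / n$, and return $\hat{\theta}^u = \hat{\theta}^n + \frac{1}{n} M X^\top (y - X\hat{\theta}^n)$. Two choices here are assumptions, not the authors' R code. Both solvers are plain coordinate descent. Each row of $M$ minimizes the Lagrangian surrogate $\frac{1}{2} m^\top \hat{\Sigma} m - m_i + \mu\|m\|_1$ instead of the paper's constrained program; its optimality conditions imply the constraint above. The function names `lasso_cd`, `m_row` and `debiased_lasso` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for (1/(2n)) ||y - X theta||_2^2 + lam * ||theta||_1."""
    n, p = X.shape
    theta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # residual y - X theta (theta starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * theta[j]  # remove coordinate j from the fit
            theta[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * theta[j]  # add the updated coordinate back
    return theta


def m_row(Sigma, i, mu, n_iter=500):
    """Coordinate descent for 0.5 * m' Sigma m - m_i + mu * ||m||_1.

    Assumed surrogate for row i of M: at its minimizer,
    |(Sigma m - e_i)_j| <= mu for every j, so the paper's constraint holds.
    """
    p = Sigma.shape[0]
    m = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if Sigma[j, j] == 0:
                continue
            e_ij = 1.0 if j == i else 0.0
            partial = Sigma[j] @ m - Sigma[j, j] * m[j]  # sum over k != j
            m[j] = soft_threshold(e_ij - partial, mu) / Sigma[j, j]
    return m


def debiased_lasso(X, y, lam, mu):
    """Return (theta_u, M, Sigma_hat) for the debiased LASSO estimate."""
    n, p = X.shape
    Sigma = X.T @ X / n
    theta_n = lasso_cd(X, y, lam)
    M = np.vstack([m_row(Sigma, i, mu) for i in range(p)])
    # Debiasing step: theta_u = theta_n + (1/n) M X' (y - X theta_n)
    theta_u = theta_n + M @ X.T @ (y - X @ theta_n) / n
    return theta_u, M, Sigma
```

For example, `debiased_lasso(X, y, lam=0.1, mu=0.05)` on a well-conditioned design with $n > p$ returns an $M$ close to $\hat{\Sigma}^{-1}$. When $p > n$ and $\mu$ is too small, the surrogate can be unbounded below (the paper's program is then infeasible). This sketch does not detect that case, which is one reason the choice of $\mu$ matters.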
Open Source Code | Yes | In the interest of reproducibility, an R implementation of our algorithm is available at http://www.stanford.edu/~montanar/sslasso/.
Open Datasets | Yes | We test our method on synthetic data and a high-throughput genomic data set about riboflavin production rate, made publicly available by Bühlmann et al. (2014). As a real data example, we consider a high-throughput genomic data set concerning riboflavin (vitamin B2) production rate. This data set is made publicly available by Bühlmann et al. (2014).
Dataset Splits | No | The paper uses generated synthetic data and a real genomic data set (the riboflavin example) with n = 71 samples and p = 4,088 covariates. However, it does not explicitly describe how these data sets were split into training, validation, or test sets for the reported experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used to run its experiments.
Software Dependencies | No | The paper mentions an "R implementation of our algorithm" and refers to the "R-package hdi" and the "R package glmnet (Friedman et al., 2010)". However, it does not specify version numbers for R itself or for the mentioned R packages.
Experiment Setup | Yes | We use the regularization parameter $\lambda = 4\hat{\sigma}\sqrt{(2\log p)/n}$, where $\hat{\sigma}$ is given by the scaled LASSO as per equation (31) with $\tilde{\lambda} = 10\sqrt{(2\log p)/n}$. Furthermore, the parameter $\mu$ (cf. Equation 4) is set to $\mu = 2.5\sqrt{(\log p)/n}$. This choice of $\mu$ is guided by Theorem 7 (b). Throughout, we set the significance level $\alpha = 0.05$.
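The tuning rules quoted in the Experiment Setup row can be written out directly. The Python sketch below computes $\lambda$, $\tilde{\lambda}$ and $\mu$ for given $n$ and $p$. It then forms a per-coordinate confidence interval and two-sided p-value, assuming the Gaussian approximation used in the paper as we understand it: half-width $z_{1-\alpha/2}\,\hat{\sigma}\sqrt{[M\hat{\Sigma}M^\top]_{ii}/n}$. The function names are hypothetical. $\hat{\sigma}$ must come from the scaled LASSO, and the value 1.0 in the usage line is a placeholder, not an estimate for the riboflavin data.

```python
import math
from statistics import NormalDist


def tuning_parameters(n, p, sigma_hat):
    """Return (lambda, lambda_tilde, mu) following the quoted setup."""
    lam_tilde = 10 * math.sqrt(2 * math.log(p) / n)       # scaled-LASSO parameter
    lam = 4 * sigma_hat * math.sqrt(2 * math.log(p) / n)  # LASSO penalty
    mu = 2.5 * math.sqrt(math.log(p) / n)                 # constraint level for M
    return lam, lam_tilde, mu


def confidence_interval(theta_u_i, sigma_hat, n, v_ii, alpha=0.05):
    """Interval theta_u_i +/- z_{1-alpha/2} * sigma_hat * sqrt(v_ii / n).

    v_ii is assumed to be the diagonal entry [M Sigma_hat M']_{ii}.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma_hat * math.sqrt(v_ii / n)
    return theta_u_i - half_width, theta_u_i + half_width


def two_sided_p_value(theta_u_i, sigma_hat, n, v_ii):
    """p-value for H0: theta_{0,i} = 0 under the same Gaussian approximation."""
    z = abs(theta_u_i) * math.sqrt(n) / (sigma_hat * math.sqrt(v_ii))
    return 2 * (1 - NormalDist().cdf(z))


# Riboflavin dimensions (n = 71, p = 4088); sigma_hat = 1.0 is a placeholder.
lam, lam_tilde, mu = tuning_parameters(n=71, p=4088, sigma_hat=1.0)
print(f"lambda={lam:.3f}, lambda_tilde={lam_tilde:.3f}, mu={mu:.3f}")
```

Only $\lambda$ depends on $\hat{\sigma}$, so $\tilde{\lambda}$ and $\mu$ are fixed once $n$ and $p$ are known. Because $\lambda$ scales linearly with $\hat{\sigma}$, it can be rescaled after the scaled-LASSO fit.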