Confidence Intervals and Hypothesis Testing for High-Dimensional Regression
Authors: Adel Javanmard, Andrea Montanari
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method on synthetic data and a high-throughput genomic data set about riboflavin production rate, made publicly available by Bühlmann et al. (2014). Keywords: hypothesis testing, confidence intervals, LASSO, high-dimensional models, bias of an estimator. Section 5 illustrates the above results through numerical simulations both on synthetic and on real data. |
| Researcher Affiliation | Academia | Adel Javanmard EMAIL Department of Electrical Engineering Stanford University Stanford, CA 94305, USA. Andrea Montanari EMAIL Department of Electrical Engineering and Department of Statistics Stanford University Stanford, CA 94305, USA |
| Pseudocode | Yes | Algorithm 1 Unbiased estimator for $\theta_0$ in high-dimensional linear regression models. Input: Measurement vector $y$, design matrix $X$, parameters $\lambda$, $\mu$. Output: Unbiased estimator $\hat{\theta}^u$. |
| Open Source Code | Yes | In the interest of reproducibility, an R implementation of our algorithm is available at http://www.stanford.edu/~montanar/sslasso/. |
| Open Datasets | Yes | We test our method on synthetic data and a high-throughput genomic data set about riboflavin production rate, made publicly available by Bühlmann et al. (2014). As a real data example, we consider a high-throughput genomic data set concerning riboflavin (vitamin B2) production rate. This data set is made publicly available by Bühlmann et al. (2014). |
| Dataset Splits | No | The paper uses synthetic data which is generated, and a real genomic dataset (riboflavin example) with n=71 samples and p=4,088 covariates. However, it does not explicitly provide details about how these datasets were split into training, validation, or test sets for the experiments presented in the paper. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions an "R implementation of our algorithm", and refers to "R-package hdi" and "R package glmnet (Friedman et al., 2010)". However, it does not specify version numbers for R itself or for the mentioned R packages. |
| Experiment Setup | Yes | We use the regularization parameter $\lambda = 4\hat{\sigma}\sqrt{(2\log p)/n}$, where $\hat{\sigma}$ is given by the scaled LASSO as per equation (31) with $\tilde{\lambda} = 10\sqrt{(2\log p)/n}$. Furthermore, parameter $\mu$ (cf. Equation 4) is set to $\mu = 2.5\sqrt{(\log p)/n}$. This choice of $\mu$ is guided by Theorem 7 (b). Throughout, we set the significance level $\alpha = 0.05$. |
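
The tuning parameters quoted in the Experiment Setup row can be evaluated numerically. The following is a minimal sketch, not the authors' R implementation. It assumes the square-root form of the quoted formulas, and the function name `tuning_parameters` and the value `sigma_hat=1.0` are hypothetical: in the paper the noise estimate comes from the scaled LASSO, which is not reproduced here. The values `n=71` and `p=4088` are the riboflavin dimensions listed in the Dataset Splits row.

```python
import math


def tuning_parameters(n, p, sigma_hat):
    """Evaluate the tuning parameters quoted in the Experiment Setup row.

    n         -- number of samples
    p         -- number of covariates
    sigma_hat -- noise-level estimate (placeholder here; the paper
                 obtains it from the scaled LASSO)
    """
    root = math.sqrt(2.0 * math.log(p) / n)
    lam = 4.0 * sigma_hat * root               # LASSO regularization parameter
    lam_tilde = 10.0 * root                    # scaled-LASSO parameter
    mu = 2.5 * math.sqrt(math.log(p) / n)      # parameter mu (cf. Equation 4)
    return lam, lam_tilde, mu


# Riboflavin dimensions from the table; sigma_hat = 1.0 is an assumed placeholder.
lam, lam_tilde, mu = tuning_parameters(n=71, p=4088, sigma_hat=1.0)
print(f"lambda = {lam:.4f}, lambda_tilde = {lam_tilde:.4f}, mu = {mu:.4f}")
```

Because `lam` scales linearly with `sigma_hat`, replacing the placeholder with an actual noise estimate only rescales the first value; `lam_tilde` and `mu` depend on `n` and `p` alone.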