Spectral Deconfounding via Perturbed Sparse Linear Models

Authors: Domagoj Ćevid, Peter Bühlmann, Nicolai Meinshausen

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The performance of the methodology is illustrated on simulated data and a genomic dataset. Keywords: confounding, data transformation, Lasso, latent variables, principal components
Researcher Affiliation | Academia | Domagoj Ćevid (EMAIL), Seminar für Statistik, ETH Zürich, 8092 Zürich, Switzerland; Peter Bühlmann (EMAIL), Seminar für Statistik, ETH Zürich, 8092 Zürich, Switzerland; Nicolai Meinshausen (EMAIL), Seminar für Statistik, ETH Zürich, 8092 Zürich, Switzerland
Pseudocode | No | The paper discusses various algorithms (Lasso, Lava, Trim transform, PCA adjustment) and their theoretical properties and empirical performance, but it does not present any of them in a structured pseudocode or algorithm block format.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide any links to a code repository.
Open Datasets | Yes | We have obtained data from the GTEx Portal (http://gtexportal.org). The GTEx project provides large-scale data with an aim to help the scientific community to study gene expression, gene regulation and their relationship to genetic variation.
Dataset Splits | No | For the simulated data, the paper specifies parameters like 'sample size is set to be n = 200 and the dimensionality of the predictors is p = 600', but it does not specify how this data is split into training, validation, or test sets. For the genomic dataset, it describes applying methods to the original and deconfounded data and measuring dissimilarity of supports, but no standard dataset splits are mentioned.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions statistical methods and models but does not specify any software names with version numbers (e.g., Python, R, specific libraries, or frameworks with their versions) that were used for implementation.
Experiment Setup | Yes | We take $\Sigma_E = \sigma_E^2 I_p$, where $\sigma_E = 2$ and $\beta = (1, 1, 1, 1, 1, 0, \ldots, 0)$, so $s = 5$. For a fixed number $q$ of hidden confounders, we sample the coefficients $\Gamma_{ij}$ and $\delta_i$ independently as standard normal random variables. By default, we take $q = 6$. Unless stated otherwise, we use the noise level $\sigma = 1$ as the standard deviation of $\epsilon$. Finally, the sample size is set to be $n = 200$ and the dimensionality of the predictors is $p = 600$ as the default value. All results are based on $N = 2^{12} = 4096$ independent simulations.
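
The default simulation settings quoted under Experiment Setup can be sketched in a few lines of NumPy. This is an illustration only, not the authors' code (the summary reports that none was released). The summary does not state the model equations, so the sketch assumes the confounded linear model X = HΓ + E, Y = Xβ + Hδ + ε with standard normal hidden confounders H. The function name `simulate_confounded_data` and its `seed` argument are hypothetical.

```python
import numpy as np


def simulate_confounded_data(n=200, p=600, q=6, s=5, sigma_E=2.0, sigma=1.0, seed=0):
    """Draw one data set under the default settings quoted above.

    Assumed model (not spelled out in the summary):
        X = H @ Gamma + E,    Y = X @ beta + H @ delta + eps
    """
    rng = np.random.default_rng(seed)

    # Hidden confounders; i.i.d. standard normal entries are an assumption.
    H = rng.standard_normal((n, q))

    # Coefficients Gamma_ij and delta_i are independent standard normals.
    Gamma = rng.standard_normal((q, p))
    delta = rng.standard_normal(q)

    # Unconfounded part of X with covariance Sigma_E = sigma_E^2 * I_p.
    E = sigma_E * rng.standard_normal((n, p))

    # Response noise with standard deviation sigma.
    eps = sigma * rng.standard_normal(n)

    # Sparse coefficient vector beta = (1, 1, 1, 1, 1, 0, ..., 0), so s = 5.
    beta = np.zeros(p)
    beta[:s] = 1.0

    X = H @ Gamma + E
    Y = X @ beta + H @ delta + eps
    return X, Y, beta
```

Calling the function with 4096 different seeds would mimic the N = 2^12 independent simulations. Whether Γ and δ are redrawn in every repetition is not stated in the summary; the sketch redraws them each time.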