reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Spectral Deconfounding via Perturbed Sparse Linear Models

Authors: Domagoj Ćevid, Peter Bühlmann, Nicolai Meinshausen

JMLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The performance of the methodology is illustrated on simulated data and a genomic dataset. Keywords: confounding, data transformation, Lasso, latent variables, principal components
Researcher Affiliation	Academia	Domagoj Ćevid EMAIL Seminar für Statistik ETH Zürich 8092 Zürich, Switzerland Peter Bühlmann EMAIL Seminar für Statistik ETH Zürich 8092 Zürich, Switzerland Nicolai Meinshausen EMAIL Seminar für Statistik ETH Zürich 8092 Zürich, Switzerland
Pseudocode	No	The paper discusses various algorithms (Lasso, Lava, Trim transform, PCA adjustment) and their theoretical properties and empirical performance, but it does not present any of them in a structured pseudocode or algorithm block format.
Open Source Code	No	The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide any links to a code repository.
Open Datasets	Yes	We have obtained data from the GTEx Portal (http://gtexportal.org). The GTEx project provides large-scale data with an aim to help the scientific community to study gene expression, gene regulation and their relationship to genetic variation.
Dataset Splits	No	For the simulated data, the paper specifies parameters like 'sample size is set to be n = 200 and the dimensionality of the predictors is p = 600', but it does not specify how this data is split into training, validation, or test sets. For the genomic dataset, it describes applying methods to the original and deconfounded data and measuring dissimilarity of supports, but no standard dataset splits are mentioned.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies	No	The paper mentions statistical methods and models but does not specify any software names with version numbers (e.g., Python, R, specific libraries, or frameworks with their versions) that were used for implementation.
Experiment Setup	Yes	We take $\Sigma_E = \sigma_E^2 I_p$, where $\sigma_E = 2$ and $\beta = (1, 1, 1, 1, 1, 0, \ldots, 0)$, so $s = 5$. For a fixed number $q$ of hidden confounders, we sample the coefficients $\Gamma_{ij}$ and $\delta_i$ independently as standard normal random variables. By default, we take $q = 6$. Unless stated otherwise, we use the noise level $\sigma = 1$ as the standard deviation of $\epsilon$. Finally, the sample size is set to be $n = 200$ and the dimensionality of the predictors is $p = 600$ as the default value. All results are based on $N = 2^{12} = 4096$ independent simulations.
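The simulation setup quoted in the Experiment Setup row can be sketched in code. This is a minimal Python sketch, not the authors' code. It assumes the hidden-confounding linear model X = HΓ + E, Y = Xβ + Hδ + ε, and it assumes the confounders H are drawn as standard normal, which the quoted passage does not state. The function name simulate_confounded is hypothetical; its defaults mirror the quoted values (n = 200, p = 600, q = 6, s = 5, σ_E = 2, σ = 1).

```python
import random


def simulate_confounded(n=200, p=600, q=6, s=5, sigma_e=2.0, sigma=1.0, seed=None):
    """Draw one dataset (X, Y) from an assumed hidden-confounding linear model.

    Assumed model (illustrative sketch, not the authors' implementation):
        X = H Gamma + E,    Y = X beta + H delta + eps
    H (n x q): hidden confounders, standard normal (ASSUMPTION, not in the quote).
    Gamma (q x p), delta (q): i.i.d. standard normal, as quoted.
    E: rows with covariance sigma_e^2 * I_p; eps ~ N(0, sigma^2).
    """
    rng = random.Random(seed)
    # Sparse coefficients: beta = (1, ..., 1, 0, ..., 0) with s ones.
    beta = [1.0] * s + [0.0] * (p - s)
    # Confounder loadings on the predictors (Gamma) and on the response (delta).
    gamma = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(q)]
    delta = [rng.gauss(0.0, 1.0) for _ in range(q)]

    X, Y = [], []
    for _ in range(n):
        # One row of hidden confounders H (assumed N(0, I_q)).
        h = [rng.gauss(0.0, 1.0) for _ in range(q)]
        # x_j = sum_k h_k * Gamma_kj + E_j, with E_j ~ N(0, sigma_e^2).
        x = [
            sum(h[k] * gamma[k][j] for k in range(q)) + rng.gauss(0.0, sigma_e)
            for j in range(p)
        ]
        # y = x . beta + h . delta + eps, with eps ~ N(0, sigma^2).
        y = (
            sum(xj * bj for xj, bj in zip(x, beta))
            + sum(hk * dk for hk, dk in zip(h, delta))
            + rng.gauss(0.0, sigma)
        )
        X.append(x)
        Y.append(y)
    return X, Y, beta


X, Y, beta = simulate_confounded(seed=0)
print(len(X), len(X[0]), len(Y))
```

Each call returns one simulated dataset; the reported results would correspond to N = 4096 such independent draws. Plain Python loops keep the sketch dependency-free at the cost of speed; an array library would be the natural choice for repeating it thousands of times.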