reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Meta-analysis of heterogeneous data: integrative sparse regression in high-dimensions

Authors: Subha Maity, Yuekai Sun, Moulinath Banerjee

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we compare the performance of Mr Lasso to several other global parameters considered in Section 2 on simulated data: the Mean (1.2), the Median (2.1), the square and absolute error trade-off in (2.5) and the Huber estimator in (2.2). The datasets are generated from a linear model with d = 2000 covariates. ... The Cancer Cell Line Encyclopedia is a database of gene expression, genotype, and drug sensitivity data for human cancer cell lines. We use our method to study the sensitivity of cancer cell lines to certain anti-cancer drugs. ... The lower left panels in Figures 6, 7 and 8 give comparative plots for the prediction accuracy of drug-response of the three global parameter estimates (for the three drugs).
Researcher Affiliation	Academia	Subha Maity, Yuekai Sun, Moulinath Banerjee; Department of Statistics, University of Michigan, Ann Arbor, MI
Pseudocode	Yes	Algorithm 1: Mr Lasso({ηj}, t, {tk}) ... Algorithm 2: Cross-validation
Open Source Code	Yes	Codes are available in https://github.com/smaityumich/MrLasso.
Open Datasets	Yes	The Cancer Cell Line Encyclopedia is a database of gene expression, genotype, and drug sensitivity data for human cancer cell lines. ... As covariates, we use the RNAseq TPM gene expression data DepMap (2021) for just protein coding genes (DepMap 21Q2 Public release) and the pharmacologic profiles for 24 anticancer drugs Consortium et al. (2015) across 504 cell lines as the responses. ... These data files are publicly available in the DepMap portal (https://depmap.org/portal/).
Dataset Splits	Yes	In the simulation, we hold out 1/5 of the dataset as a validation set and we pick those parameters to minimize test error on the validation set. ... The regularization parameter for each of these cancer types is chosen using 7-fold cross-validation.
Hardware Specification	No	The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. It discusses computational results but does not specify the underlying hardware.
Software Dependencies	No	The paper does not explicitly provide specific software dependencies with version numbers. While it mentions methods like 'lasso' and 'logistic regression', it does not list the specific software packages or libraries and their versions used for implementation.
Experiment Setup	Yes	The datasets are generated from a linear model with d = 2000 covariates. ... The regularization parameter for each of these cancer types is chosen using 7-fold cross-validation. ... To this end the appropriate pair (η, t) is again chosen via cross-validation (see the discussion in the last paragraph of Section 3).
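The Research Type row lists the Mean, Median, and Huber global parameters as baselines for Mr Lasso. One way to picture these baselines, assuming each acts as a coordinate-wise location estimate over K per-dataset (local) coefficient vectors, is the minimal sketch below. The function names, the Huber threshold `delta`, and the iteratively reweighted solver are illustrative assumptions rather than the paper's code; the square/absolute trade-off in (2.5) and Mr Lasso itself are not reproduced.

```python
import numpy as np

def huber_location(B, delta=1.0, n_iter=100, tol=1e-8):
    """Coordinate-wise Huber location estimate of the rows of B (shape K x d).

    Solves sum_k psi(B[k] - theta) = 0 per coordinate by iteratively
    reweighted averaging: residuals with |r| <= delta get weight 1,
    larger residuals get weight delta / |r|.
    """
    theta = np.median(B, axis=0)  # robust starting point
    for _ in range(n_iter):
        abs_r = np.abs(B - theta)
        # Guard the division; that branch is only used where abs_r > delta.
        w = np.where(abs_r <= delta, 1.0, delta / np.maximum(abs_r, 1e-12))
        new_theta = (w * B).sum(axis=0) / w.sum(axis=0)
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta
        theta = new_theta
    return theta

def global_aggregates(B, delta=1.0):
    """Hypothetical helper: mean, median, and Huber aggregates of local estimates."""
    return {
        "mean": B.mean(axis=0),
        "median": np.median(B, axis=0),
        "huber": huber_location(B, delta),
    }

# Four hypothetical datasets, d = 2; the fourth is an outlier in coordinate 0.
B = np.array([[1.0, 0.0], [1.2, 0.1], [0.9, -0.1], [10.0, 0.0]])
agg = global_aggregates(B)
```

In this toy input the outlying dataset pulls the mean of the first coordinate far from the bulk, while the Huber estimate stays between the median and the mean, close to the median.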
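The Dataset Splits row describes two selection schemes: a 1/5 hold-out validation set in the simulations and 7-fold cross-validation on the cancer data. The index bookkeeping for both can be sketched as follows, assuming a random shuffle with a fixed seed; the function names and seeding are hypothetical, and the authors' splitting code may differ.

```python
import numpy as np

def holdout_split(n, frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and hold out `frac` of them for validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_val = int(round(frac * n))
    return perm[n_val:], perm[:n_val]  # (train, validation)

def kfold_indices(n, k=7, seed=0):
    """Yield (train, test) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)  # near-equal folds
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

With `frac=0.2`, `holdout_split(100)` returns 80 training and 20 validation indices; `np.array_split` lets the 7 folds absorb sample sizes that are not multiples of 7.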
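The Experiment Setup row notes that the pair (η, t) is chosen by cross-validation. A grid search over candidate pairs can be sketched as below. `fit_and_score` is a hypothetical callable standing in for fitting the estimator on the training indices and returning its validation error; the candidate grids and the default fold count are placeholders, not values from the paper.

```python
import itertools
import numpy as np

def select_eta_t(n, etas, ts, fit_and_score, k=7, seed=0):
    """Return the (eta, t) pair with the lowest mean k-fold validation error.

    fit_and_score(eta, t, train_idx, test_idx) is a hypothetical callable that
    fits a model on train_idx and returns its error on test_idx.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    best_pair, best_err = None, np.inf
    for eta, t in itertools.product(etas, ts):
        errs = []
        for i in range(k):
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            errs.append(fit_and_score(eta, t, train, folds[i]))
        mean_err = float(np.mean(errs))
        if mean_err < best_err:  # strict: keep the first minimizer on ties
            best_pair, best_err = (eta, t), mean_err
    return best_pair, best_err
```

For a quick check, a toy scorer such as `lambda eta, t, tr, te: (eta - 0.5) ** 2 + (t - 2.0) ** 2` makes the grid search return `(0.5, 2.0)` whenever both values are on the grid. The same folds are reused for every pair so that candidates are compared on identical splits.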