Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Meta-analysis of heterogeneous data: integrative sparse regression in high-dimensions

Authors: Subha Maity, Yuekai Sun, Moulinath Banerjee

JMLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In this section, we compare the performance of MrLasso to several other global parameters considered in Section 2 on simulated data: the Mean (1.2), the Median (2.1), the square and absolute error trade-off in (2.5) and the Huber estimator in (2.2). The datasets are generated from a linear model with d = 2000 covariates. ... The Cancer Cell Line Encyclopedia is a database of gene expression, genotype, and drug sensitivity data for human cancer cell lines. We use our method to study the sensitivity of cancer cell lines to certain anti-cancer drugs. ... The lower left panels in Figures 6, 7 and 8 give comparative plots for the prediction accuracy of drug-response of the three global parameter estimates (for the three drugs). |
| Researcher Affiliation | Academia | Subha Maity, Yuekai Sun, Moulinath Banerjee; Department of Statistics, University of Michigan, Ann Arbor, MI |
| Pseudocode | Yes | Algorithm 1: MrLasso({η_j}, t, {t_k}) ... Algorithm 2: Cross-validation |
| Open Source Code | Yes | Codes are available in https://github.com/smaityumich/MrLasso. |
| Open Datasets | Yes | The Cancer Cell Line Encyclopedia is a database of gene expression, genotype, and drug sensitivity data for human cancer cell lines. ... As covariates, we use the RNAseq TPM gene expression data DepMap (2021) for just protein coding genes (DepMap 21Q2 Public release) and the pharmacologic profiles for 24 anticancer drugs Consortium et al. (2015) across 504 cell lines as the responses. ... These data files are publicly available in the DepMap portal (https://depmap.org/portal/). |
| Dataset Splits | Yes | In the simulation, we hold out 1/5 of the dataset as a validation set and we pick those parameters to minimize test error on the validation set. ... The regularization parameter for each of these cancer types is chosen using 7 fold cross-validation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. It discusses computational results but does not specify the underlying hardware. |
| Software Dependencies | No | The paper does not explicitly provide specific software dependencies with version numbers. While it mentions methods like 'lasso' and 'logistic regression', it does not list the specific software packages or libraries and their versions used for implementation. |
| Experiment Setup | Yes | The datasets are generated from a linear model with d = 2000 covariates. ... The regularization parameter for each of these cancer types is chosen using 7 fold cross-validation. ... To this end the appropriate pair (η, t) is again chosen via cross validation (see the discussion in last paragraph of Section 3). |
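
The Dataset Splits and Experiment Setup entries mention two tuning schemes: a held-out fifth of the data in the simulations, and 7-fold cross-validation on the real data. The following is a minimal Python sketch of both, assuming a plain lasso fitted by proximal gradient (ISTA) as a stand-in estimator. It is not the authors' MrLasso code: the helper names (`lasso_ista`, `holdout_select`, `kfold_select`), the toy dimensions (much smaller than d = 2000), and the penalty grid are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Approximately minimize (1/2n)||y - X b||^2 + lam * ||b||_1 with ISTA."""
    n, d = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

def mse(X, y, beta):
    """Mean squared prediction error."""
    return float(np.mean((y - X @ beta) ** 2))

def holdout_select(X, y, lams, rng):
    """Hold out 1/5 of the rows and pick the penalty with the lowest validation error."""
    perm = rng.permutation(X.shape[0])
    n_val = X.shape[0] // 5
    val, train = perm[:n_val], perm[n_val:]
    errors = [mse(X[val], y[val], lasso_ista(X[train], y[train], lam)) for lam in lams]
    return lams[int(np.argmin(errors))], errors

def kfold_select(X, y, lams, rng, k=7):
    """Pick the penalty with the lowest error averaged over k folds (k = 7 here)."""
    folds = np.array_split(rng.permutation(X.shape[0]), k)
    mean_errors = []
    for lam in lams:
        fold_errors = []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            fold_errors.append(mse(X[val], y[val], lasso_ista(X[train], y[train], lam)))
        mean_errors.append(float(np.mean(fold_errors)))
    return lams[int(np.argmin(mean_errors))], mean_errors

# Toy sparse linear model (assumed sizes, chosen only to keep the run fast).
rng = np.random.default_rng(0)
n, d, s = 140, 50, 5
beta_true = np.zeros(d)
beta_true[:s] = 1.0
X = rng.standard_normal((n, d))
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lams = [0.01, 0.05, 0.1, 0.5, 2.0]  # hypothetical penalty grid
lam_holdout, holdout_errors = holdout_select(X, y, lams, rng)
lam_cv, cv_errors = kfold_select(X, y, lams, rng)
print("hold-out choice:", lam_holdout)
print("7-fold CV choice:", lam_cv)
```

The two selectors differ only in how the validation error is estimated: the hold-out version uses a single split, while the k-fold version averages over seven splits, which uses the data more fully when samples are scarce.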