Top-$k$ Feature Importance Ranking

Authors: Eric Chen, Tiffany Tang, Genevera I. Allen

TMLR 2025

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental. Evidence: "We provide theoretical guarantees showing that RAMPART achieves the correct top-k ranking with high probability under mild conditions, and demonstrate through extensive simulation studies that RAMPART consistently outperforms popular feature importance methods, concluding with two high-dimensional genomics case studies."
Researcher Affiliation: Academia. Evidence: "Yuxi Chen (EMAIL), Carnegie Mellon University; Tiffany Tang (EMAIL), University of Notre Dame; Genevera Allen (EMAIL), Columbia University"
Pseudocode: Yes. Evidence: "Algorithm 1: Ranked Attributions with Mini Patches (RAMP); Algorithm 2: Ranked Attributions with Mini Patches And Recursive Trimming (RAMPART)"
Open Source Code: Yes. Evidence: "Our code is available at https://github.com/DataSlingers/TopK."
Open Datasets: Yes. Evidence: "For our first case study, we predict response to PD-0325901 (an MEK inhibitor) across N = 259 human cancer cell lines using their RNASeq gene expression profiles (M = 1104 genes) from the Cancer Cell Line Encyclopedia (CCLE) (Barretina et al., 2012) (see Appendix C.1 for details). ... The raw CCLE data used in this case study can be downloaded from the DepMap Portal (https://depmap.org/portal/download/) (version 18Q3). We next consider a high-dimensional multi-class classification problem based on The Cancer Genome Atlas (TCGA) Breast Cancer (BRCA) cohort. Using RNA-seq data from the TCGA BRCA study (The Cancer Genome Atlas Network, 2012) and PAM50 subtype labels (Parker et al., 2009), we obtain a gene-expression matrix with N = 758 primary tumors and M = 5000 genes after preprocessing (see Appendix C.2 for details). ... We obtained the data from the TCGAbiolinks R package."
Dataset Splits: Yes. Evidence: "split the data into a 70/30 train-test split ... For all methods, we obtain feature rankings by sorting importance scores by magnitude. ... assessed the model's test prediction performance as the top-ranked features are progressively added in the model in order of their estimated importance rankings. In Figure 2, we illustrate how classification error decreases as the top-ranked features are progressively added as predictors in the model."
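The evaluation protocol quoted above (70/30 split, rank features by importance-score magnitude, then refit as top-ranked features are added one at a time) can be sketched as follows. The synthetic data, the absolute-covariance scores, and the OLS refit are illustrative assumptions for the sketch, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: only the first 5 of 50 features carry signal.
N, M, k_true = 400, 50, 5
X = rng.normal(size=(N, M))
beta = np.zeros(M)
beta[:k_true] = 2.0
y = X @ beta + rng.normal(size=N)

# 70/30 train-test split, as in the case studies.
n_train = int(0.7 * N)
Xtr, Xte, ytr, yte = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

# Rank features by the magnitude of an importance score
# (here: |sample covariance| with the response, as a simple stand-in).
scores = np.abs(Xtr.T @ (ytr - ytr.mean())) / n_train
ranking = np.argsort(-scores)

# Refit OLS on the top-j ranked features and record test MSE as j grows;
# the resulting curve is the analogue of the paper's Figure 2 error curve.
errors = []
for j in range(1, 11):
    cols = ranking[:j]
    coef, *_ = np.linalg.lstsq(Xtr[:, cols], ytr, rcond=None)
    errors.append(np.mean((Xte[:, cols] @ coef - yte) ** 2))
```

Test error should drop sharply while true signal features are being added and then flatten out, which is exactly the diagnostic the quoted protocol relies on.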
Hardware Specification: Yes. Evidence: "Experiments were run on a 16-inch MacBook Pro (Apple M3 Max, 48 GB RAM, macOS 14.3)."
Software Dependencies: No. Evidence: "We employ task-appropriate prediction models: OLS and logistic regression for the linear regression and classification settings, respectively, and random forests (100 trees) for the nonlinear regression and classification tasks. Additionally, we employ neural networks across all settings ... We also evaluate two popular model-agnostic approaches. First, we apply SHAP with architecture-specific variants (Linear SHAP, Tree SHAP, or Gradient SHAP) ... We obtained the data from the TCGAbiolinks R package."
Experiment Setup: Yes. Evidence: "We construct four distinct settings: linear regression, nonlinear additive regression, linear classification, and nonlinear additive classification. ... random forests (100 trees) for the nonlinear regression and classification tasks. Additionally, we employ neural networks across all settings, configured as regressors for regression tasks and classifiers (with final sigmoidal activation) for classification tasks. All neural networks have a consistent two-layer architecture with M hidden units and ReLU activation trained to convergence. Both RAMP and RAMPART use minipatches with n = 125 observations and m = 10 features. For our experiments with M = 500 dimensions and k = 10 target features, RAMPART requires 6 halving iterations with 2000 minipatches per iteration, while RAMP uses 10000 total minipatches, maintaining comparable computational budgets. Our baseline approach uses a random forest regressor (200 trees) with Mean Decrease in Impurity (MDI). We also compute Tree SHAP values on the same random forest model and evaluate permutation importance by measuring prediction error changes over 100 random permutations on a 50/50 train-test split. For RAMP and RAMPART, we use MDI with regression trees as the minipatch ranking procedure M, with minipatch parameters m = 10 and n = 100. RAMPART uses 4000 minipatches per iteration, while RAMP used 20000 total minipatches. All hyperparameters are identical to Case Study 4.2.1, except that we replace decision-tree/random-forest regressors with their classification counterparts."
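The minipatch scheme described in the quoted setup — sample n observations by m features per minipatch, aggregate an importance score across minipatches, and (for RAMPART) recursively halve the candidate feature set each iteration — can be sketched as below. Absolute covariance with the response stands in for the paper's MDI ranking procedure, and all function names and the synthetic data are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data mirroring the quoted scale: M = 500 features, k = 10 signal.
N, M, k = 1000, 500, 10
X = rng.normal(size=(N, M))
y = X[:, :k] @ np.full(k, 1.5) + rng.normal(size=N)

def minipatch_scores(X, y, candidates, n_mp, n=125, m=10):
    """Average an importance score per candidate feature over n_mp
    minipatches of n observations x m features. |covariance with y|
    stands in for the paper's MDI ranking procedure."""
    total = np.zeros(len(candidates))
    count = np.zeros(len(candidates))
    for _ in range(n_mp):
        obs = rng.choice(X.shape[0], size=n, replace=False)
        idx = rng.choice(len(candidates), size=m, replace=False)
        Xs, ys = X[np.ix_(obs, candidates[idx])], y[obs]
        total[idx] += np.abs(Xs.T @ (ys - ys.mean())) / n
        count[idx] += 1
    return total / np.maximum(count, 1)

# RAMPART-style recursive trimming: halve the candidate set each
# iteration (2000 minipatches per iteration, as quoted) until only
# the k top-ranked features remain.
candidates = np.arange(M)
while len(candidates) > k:
    scores = minipatch_scores(X, y, candidates, n_mp=2000)
    keep = max(len(candidates) // 2, k)
    candidates = candidates[np.argsort(-scores)[:keep]]

top_k = set(candidates.tolist())
```

Starting from M = 500 and halving down to k = 10 takes 6 trimming iterations (500 → 250 → 125 → 62 → 31 → 15 → 10), matching the iteration count quoted in the setup; RAMP would instead score all M features with a single large pool of minipatches and skip the trimming loop.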