Generative Intervention Models for Causal Perturbation Modeling

Authors: Nora Schneider, Lars Lorch, Niki Kilbertus, Bernhard Schölkopf, Andreas Krause

ICML 2025

Reproducibility
Research Type: Experimental. "On synthetic data and scRNA-seq drug perturbation data, GIMs achieve robust out-of-distribution predictions on par with unstructured approaches, while effectively inferring the underlying perturbation mechanisms, often better than other causal inference methods."
Researcher Affiliation: Academia. "1 Technical University of Munich and Helmholtz Munich, Germany; 2 Department of Computer Science, ETH Zurich, Switzerland; 3 Munich Center for Machine Learning, Germany; 4 MPI for Intelligent Systems, Tübingen, Germany; 5 ELLIS Institute Tübingen, Germany. Correspondence to: Nora Schneider <EMAIL>."
Pseudocode: No. The paper describes its methods and derivations mathematically but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. "Our code is available at https://github.com/NoraSchneider/gim."
Open Datasets: Yes. sci-Plex3 drug perturbation data: "We also evaluate the predictive performance of GIMs on scRNA-seq data by Srivatsan et al. (2020)."
Dataset Splits: Yes. "The training dataset consists of a total of 160 perturbational datasets (corresponding to 40 Hill functions considered at 4 different dosages), where each one has n_k = 50 samples, and one observational dataset with n_0 = 800 samples. The partially out-of-distribution test dataset consists of 200 perturbational contexts corresponding to the same 40 Hill functions and intervention targets from the training dataset, but evaluated at 5 different dosages, c ∈ {0.25, 0.75, 1.25, 1.75, 2.25}. Finally, the fully out-of-distribution test dataset consists of 80 perturbational contexts corresponding to 20 newly sampled Hill functions, which are evaluated at dosages c ∈ {0.5, 1, 1.5, 2}. ... For evaluation, we create test sets by holding out, one at a time, the highest dosage (10 µM) of each drug, resulting in four unique training-test splits per cell type."
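The synthetic split sizes quoted above are just context counts (Hill functions × dosages). A minimal sketch that reproduces them; the training dosage values here are an assumption, since the excerpt only says "4 different dosages" without listing them:

```python
from itertools import product

n_train_fns, n_new_fns = 40, 20
train_dosages = [0.5, 1.0, 1.5, 2.0]                  # assumption: values illustrative, count from the quote
partial_ood_dosages = [0.25, 0.75, 1.25, 1.75, 2.25]  # from the quote
full_ood_dosages = [0.5, 1.0, 1.5, 2.0]               # from the quote

# a "context" is one (Hill function, dosage) pair
train_ctx = list(product(range(n_train_fns), train_dosages))
partial_ood_ctx = list(product(range(n_train_fns), partial_ood_dosages))
full_ood_ctx = list(product(range(n_train_fns, n_train_fns + n_new_fns), full_ood_dosages))

assert len(train_ctx) == 160        # 40 functions x 4 dosages
assert len(partial_ood_ctx) == 200  # 40 functions x 5 dosages
assert len(full_ood_ctx) == 80      # 20 new functions x 4 dosages
```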
Hardware Specification: No. The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies: No. The paper mentions several software components and methods (e.g., Adam optimization, Gumbel-sigmoid, NO-BEARS, the OTT library) but does not provide specific version numbers for any of them.
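For reference, the Gumbel-sigmoid mentioned here usually denotes the Binary-Concrete relaxation of a Bernoulli draw, used to make discrete intervention-target masks differentiable. A generic sketch, not necessarily the paper's exact parameterization (the `temperature` value is a free choice):

```python
import math
import random

def gumbel_sigmoid(logit, temperature=0.5, rng=random):
    """Binary-Concrete / Gumbel-sigmoid draw: a differentiable relaxation of
    sampling Bernoulli(sigmoid(logit)); lower temperature pushes toward {0, 1}."""
    u = rng.random()                                  # u ~ Uniform(0, 1)
    logistic_noise = math.log(u) - math.log(1.0 - u)  # Logistic(0, 1) sample
    return 1.0 / (1.0 + math.exp(-(logit + logistic_noise) / temperature))
```

Replacing the relaxed output with a hard threshold at evaluation time recovers a binary mask.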
Experiment Setup: Yes. "We employ gradient-based optimization to obtain MAP estimates for Mα and ϕ. In Appendix B.2 we provide the gradients of the posterior, allowing us to use Adam optimization (Kingma & Ba, 2015) with a learning rate of 0.001. On synthetic data, we use 30000 steps and on drug perturbation data, we use 100000 steps. For the Monte Carlo approximations, we use a sample size of n_MC = 128. We apply a cosine annealing schedule to the coefficient β_I, which controls the sparsity of the intervention targets. ... Table 1 and Table 2 list hyperparameter search ranges for the experiments."
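The cosine annealing of the sparsity coefficient β_I can be sketched as below; the endpoints `beta_max`/`beta_min` and the decay direction are assumptions (the actual hyperparameter search ranges are in the paper's Table 1 and Table 2):

```python
import math

def beta_I_schedule(step, total_steps, beta_max=1.0, beta_min=0.0):
    """Cosine-anneal the intervention-target sparsity coefficient beta_I
    from beta_max at step 0 down to beta_min at total_steps."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return beta_min + 0.5 * (beta_max - beta_min) * (1.0 + math.cos(math.pi * frac))

# e.g. with the paper's 30000 synthetic-data steps:
#   beta_I_schedule(0, 30000)     -> beta_max
#   beta_I_schedule(30000, 30000) -> beta_min
```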