Counterfactual Generative Modeling with Variational Causal Inference
Authors: Yulun Wu, Louis McConnell, Claudia Iriondo
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we demonstrate the advantage of our framework compared to state-of-the-art models in counterfactual generative modeling on multiple benchmarks. We present experiment results of our framework on two datasets with vector outcomes (single-cell perturbation datasets) as well as two datasets with image outcomes (facial imaging and handwritten digit datasets), and compare against state-of-the-art models in the two domains. The results show that ours outperformed the state of the art in both domains with notable margins. |
| Researcher Affiliation | Collaboration | Yulun Wu (University of California, Berkeley); Louis McConnell (Genentech); Claudia Iriondo (Genentech) |
| Pseudocode | No | The paper describes its methodology using mathematical formulations and prose, and includes figures illustrating workflows. However, there are no explicitly labeled pseudocode blocks or algorithms with structured steps in the provided text. |
| Open Source Code | Yes | For complete information on hyperparameter settings, see our codebase at https://github.com/yulun-rayn/variational-causal-inference. |
| Open Datasets | Yes | We present experiment results of our framework on two datasets with vector outcomes (the sci-Plex dataset from Srivatsan et al. (2020) (Sciplex) and the CRISPRa dataset from Schmidt et al. (2022) (Marson)) and two datasets with image outcomes: Morpho-MNIST (Castro et al., 2019) and CelebA-HQ (Karras et al., 2017). |
| Dataset Splits | Yes | Data with certain treatment-covariate combinations are held out as the out-of-distribution (OOD) set, and the rest are split into training and validation sets at a four-to-one ratio. ... We used the original Morpho-MNIST training set for model training, and the original Morpho-MNIST testing set as the observed samples for model testing. ... The train-test split of the CelebA-HQ dataset is inherited from the original CelebA dataset. |
| Hardware Specification | Yes | Models are trained on Amazon Web Services accelerated-computing EC2 G4dn instances, which contain 2nd Generation Intel Xeon Scalable Processors (Cascade Lake P-8259CL) and up to 8 NVIDIA T4 Tensor Core GPUs. ... Models are trained on Amazon Web Services accelerated-computing EC2 P3 instances, which contain high-frequency Intel Xeon Scalable Processors (Broadwell E5-2686 v4) and up to 8 NVIDIA Tesla V100 GPUs, each pairing 5,120 CUDA Cores with 640 Tensor Cores. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers. It mentions a codebase link where such information might be available, but the text itself lacks these details. |
| Experiment Setup | Yes | All common hyperparameters of all models are set to the defaults of CPA (Lotfollahi et al., 2021): a universal number of hidden dimensions of 128; 6 layers (encoder 3, decoder 3); a universal learning rate of 3 × 10−4 and weight decay rate of 4 × 10−7. Contrary to CPA, we use step-based learning rate decay instead of epoch-based learning rate decay; the decay step size is set to 400,000 while the decay rate remains 0.1. Batch size is 64 for Marson and 128 for Sciplex. ... Training is conducted with a batch size of 32, a universal learning rate of 1 × 10−4, and a weight decay rate of 4 × 10−5. The learning rate starts decaying at epoch 100 and reaches 0 linearly at epoch 200. |
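The split protocol quoted in the Dataset Splits row (hold out certain treatment-covariate combinations as the OOD set, then split the remainder into training and validation sets at a four-to-one ratio) can be sketched in plain Python. This is a minimal illustration only: the record fields `treatment` and `covariate` and the function name are assumptions for the sketch, not taken from the authors' codebase.

```python
import random

def split_with_ood(records, ood_combos, ratio=4, seed=0):
    """Hold out records whose (treatment, covariate) pair is in ood_combos,
    then split the rest into training/validation at a ratio-to-one ratio."""
    ood = [r for r in records if (r["treatment"], r["covariate"]) in ood_combos]
    rest = [r for r in records if (r["treatment"], r["covariate"]) not in ood_combos]
    random.Random(seed).shuffle(rest)
    n_val = len(rest) // (ratio + 1)  # one part validation, `ratio` parts training
    return rest[n_val:], rest[:n_val], ood  # train, validation, OOD
```

With `ratio=4`, roughly 80% of the non-OOD records land in training and 20% in validation, matching the four-to-one split described in the paper.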
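The step-based learning-rate decay quoted in the Experiment Setup row (decay step size 400,000, decay rate 0.1) amounts to multiplying the base rate by 0.1 once per 400,000 optimizer steps, rather than per epoch. A minimal sketch; the function name is a hypothetical helper, not from the authors' code:

```python
def stepped_lr(base_lr, step, decay_step_size=400_000, decay_rate=0.1):
    """Step-based decay: multiply base_lr by decay_rate once per decay_step_size steps."""
    return base_lr * decay_rate ** (step // decay_step_size)

# With the vector-outcome settings (base learning rate 3e-4):
print(stepped_lr(3e-4, 100_000))  # no decay yet, still 3e-4
print(stepped_lr(3e-4, 400_000))  # decayed once, ~3e-5
```

This is the same schedule as PyTorch's `torch.optim.lr_scheduler.StepLR` with `step_size=400_000` and `gamma=0.1`, except counted in optimizer steps instead of epochs.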