Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Modeling Causal Mechanisms with Diffusion Models for Interventional and Counterfactual Queries

Authors: Patrick Chao, Patrick Blöbaum, Sapan Kirit Patel, Shiva Kasiviswanathan

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluations demonstrate significant improvements over existing state-of-the-art methods for answering causal queries. Furthermore, we provide theoretical results that offer a methodology for analyzing counterfactual estimation in general encoder-decoder models, which could be useful in settings beyond our proposed approach. ... We evaluate the performance of DCM on a range of synthetic datasets generated with various structural equation types for all three forms of causal queries. We find that DCM consistently outperforms existing state-of-the-art methods (Sánchez-Martin et al., 2022; Khemakhem et al., 2021).
Researcher Affiliation | Collaboration | Patrick Chao (EMAIL, University of Pennsylvania); Patrick Blöbaum (EMAIL, Amazon); Sapan Patel (EMAIL, Amazon); Shiva Prasad Kasiviswanathan (EMAIL, Amazon)
Pseudocode | Yes | Algorithm 1: DCM Training; Algorithm 2: Enc_i(X_i, X_pa_i); Algorithm 3: Dec_i(Z_i, X_pa_i); Algorithm 4: Observational/Interventional Sampling; Algorithm 5: Counterfactual Estimation
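The algorithm names above suggest the standard abduction-action-prediction pattern for counterfactuals in encoder-decoder models. A minimal sketch of that pattern, assuming `enc_i`/`dec_i` stand in for the paper's learned diffusion encode/decode passes (they are illustrative placeholders, not the authors' implementation):

```python
# Hypothetical skeleton of how an Enc_i/Dec_i pair is used for
# counterfactual estimation (abduction-action-prediction).
# enc_i and dec_i are stand-ins for learned encode/decode passes.

def counterfactual_estimate(x_factual, pa_factual, pa_intervened, enc_i, dec_i):
    """Abduct node i's latent noise from the factual sample, then
    decode it under the intervened parent values."""
    z_i = enc_i(x_factual, pa_factual)   # abduction: recover exogenous noise
    return dec_i(z_i, pa_intervened)     # prediction under the intervention
```

For instance, with a toy additive mechanism x = z + pa, the encoder would be x - pa and the decoder z + pa, so a factual sample x = 5 with parent 2 yields counterfactual 6 under the intervention pa := 3.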
Open Source Code | Yes | Code for reproducibility: https://github.com/patrickrchao/DiffusionBasedCausalModels
Open Datasets | Yes | We evaluate the performance of DCM on a range of synthetic datasets generated with various structural equation types for all three forms of causal queries. ... on an interventional query experiment conducted on fMRI data. ... We evaluate DCM on interventional real world data by evaluating our model on the electrical stimulation interventional fMRI data from (Thompson et al., 2020), using the experimental setup from (Khemakhem et al., 2021). ... To further evaluate the effectiveness of DCM, we explore a semi-synthetic experiment based on the Sachs dataset (Sachs et al., 2005). ... Infant Health and Development Program Dataset. The dataset aims at predicting the effect of specialized childcare on cognitive test scores of infants (Hill, 2011). ... Lalonde Dataset. The dataset contains different demographic variables (e.g., gender, age, education, etc.) with the goal to see if training programs increase earnings (LaLonde, 1986).
Dataset Splits | No | Each simulation generates n = 5000 samples as training data. ... We generate 1000 samples from both the fitted and true observational distribution and report the MMD between the two. ... For each intervention do(Xi := γj), we generate 100 values from the fitted model and true causal model, X̂ and X for the samples from the fitted model and true model respectively. ... Similarly to interventional evaluation, we consider interventions of individual nodes and for node i, we choose 20 intervention values γ1, ..., γ20, linearly interpolating between the 10% and 90% quantiles of the marginal distribution of node i to represent realistic interventions. Then for each intervention do(Xi := γj), we generate 100 non-intervened factual samples x^F.
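The quoted evaluation protocol (20 intervention values linearly interpolating between the 10% and 90% quantiles of a node's marginal distribution) can be sketched as follows; the variable names and the synthetic stand-in data are illustrative assumptions, not the authors' code:

```python
import numpy as np

# Illustrative sketch of the intervention grid described above:
# 20 values gamma_1, ..., gamma_20 linearly interpolating between the
# 10% and 90% quantiles of node i's marginal distribution.
rng = np.random.default_rng(0)
x_i = rng.normal(size=5000)  # stand-in for node i's n = 5000 training samples

q10, q90 = np.quantile(x_i, [0.10, 0.90])
gammas = np.linspace(q10, q90, num=20)  # realistic intervention values
```

Each gamma_j would then define one intervention do(Xi := gamma_j) under which 100 samples are drawn from the fitted and true models.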
Hardware Specification | No | We present the training times in minutes for one seed on the ladder graph using the default implementation and parameters. For a fair comparison, these are all evaluated on a CPU. Note that ANM is the fastest as it uses standard regression models, and our proposed DCM approach is about 7-9x faster than CAREFL and VACA.
Software Dependencies | No | The paper mentions specific models and implementations, such as the "ε_θ model in DCM", the "Adam" optimizer, "VACA", "CAREFL", and "ANM", with references to their respective papers or default implementations, but does not provide version numbers for software libraries, programming languages, or environments (e.g., Python version, PyTorch/TensorFlow version, CUDA version).
Experiment Setup | Yes | For our implementation of the ε_θ model in DCM, we use a simple fully connected neural network with three hidden layers of size [128, 256, 256] and SiLU activation (Elfwing et al., 2018). We fit the model using Adam with a learning rate of 1e-4, batch size of 64, and train for 500 epochs. ... For DCM, we use T = 100 total time steps with a linear β_t schedule interpolating between 1e-4 and 0.1, i.e., β_t = 1e-4 + (0.1 − 1e-4)(t − 1)/(T − 1) for t ∈ [T]. ... For VACA, we use the default implementation, training for 500 epochs, with a learning rate of 0.005; the encoder and decoder have hidden dimensions of size [8, 8] and [16] respectively, a latent vector dimension of 4, and a parent dropout rate of 0.2. ... For CAREFL, we also use the default implementation ... training for 500 epochs with a learning rate of 0.005, four flows, and ten hidden units.
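The linear β_t schedule quoted above can be sketched as follows, under the assumption that "interpolating between 1e-4 and 0.1" means endpoints β_1 = 1e-4 and β_T = 0.1 over T = 100 steps; the derived ᾱ_t quantities are the standard DDPM-style products, included only for illustration:

```python
import numpy as np

# Sketch of the linear beta_t schedule described in the setup:
# T = 100 total time steps, endpoints 1e-4 and 0.1 (an assumption
# about the exact parameterization).
T = 100
betas = np.linspace(1e-4, 0.1, num=T)  # beta_1, ..., beta_T

# Standard DDPM-style quantities derived from the schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative signal retention after t steps
```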