Amortized Inference of Causal Models via Conditional Fixed-Point Iterations

Authors: Divyat Mahajan, Jannes Gladrow, Agrin Hilmkil, Cheng Zhang, Meyer Scetbon

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical results show that our amortized procedure performs on par with baselines trained specifically for each dataset on both in- and out-of-distribution problems, and also outperforms them in scarce-data regimes.
Researcher Affiliation Collaboration Divyat Mahajan1, Jannes Gladrow2, Agrin Hilmkil2, Cheng Zhang2, Meyer Scetbon2; 1 Mila, Université de Montréal; 2 Microsoft Research
Pseudocode Yes A.4 Pseudo Code: Algorithm 1 Cond-FiP Part 1: Dataset Encoder µ(D_X, G) ... Algorithm 2 Cond-FiP Part 2: Conditional Fixed-Point Decoder T(z, D_X, G)
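The two-part structure quoted above (a dataset encoder producing an embedding, and a decoder applied as a conditional fixed-point iteration) can be sketched generically. Everything below, including the function name `fixed_point_iterate`, the tolerance, and the example contraction, is an illustrative assumption, not the paper's actual algorithm:

```python
import numpy as np

def fixed_point_iterate(step_fn, z0, tol=1e-6, max_iters=100):
    """Generic fixed-point loop: repeat z <- step_fn(z) until convergence.

    In the paper's setting, step_fn would play the role of the conditional
    decoder T(., D_X, G) with the dataset embedding and graph held fixed;
    here it is simply any contraction mapping on z.
    """
    z = z0
    for _ in range(max_iters):
        z_next = step_fn(z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Toy example: the contraction z -> 0.5*z + 1 has unique fixed point z = 2.
z_star = fixed_point_iterate(lambda z: 0.5 * z + 1.0, np.zeros(3))
```

Because the map is a contraction, the iterates converge geometrically to the unique fixed point regardless of the starting value.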
Open Source Code Yes The code is available on GitHub: microsoft/causica.
Open Datasets Yes We use the synthetic data generation procedure proposed by Lorch et al. (2022) to generate SCMs... We further evaluate Cond-FiP on test datasets generated using C-Suite (Geffner et al., 2022)... experiments on real-world instances using the flow cytometry dataset (Sachs et al., 2005) and the Ecoli dataset (Scutari, 2010).
Dataset Splits Yes Test Datasets. We evaluate the model's generalization both in-distribution and out-of-distribution by sampling test datasets from P_IN and P_OUT, respectively... For each SCM we generate n_test = 800 samples, split equally into a task context and queries for evaluation... We split this into context D_X^context ∈ R^{n_context × d} and queries D_X^query ∈ R^{n_query × d}, each of size n_context = n_query = 400.
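The quoted split (800 test samples per SCM, divided equally into a 400-sample context and 400 evaluation queries) can be sketched as follows. The function name, the shuffling step, and the seed handling are illustrative choices; only the sizes come from the text:

```python
import numpy as np

def split_context_query(data, n_context=400, n_query=400, seed=0):
    """Split test samples into a task context and evaluation queries.

    `data` is an (n_test, d) array with n_test = n_context + n_query.
    Rows are shuffled before splitting so the two halves are exchangeable.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    context = data[idx[:n_context]]                    # D_X^context in R^{400 x d}
    query = data[idx[n_context:n_context + n_query]]   # D_X^query in R^{400 x d}
    return context, query
```

With n_test = 800 and d features, this yields two disjoint (400, d) arrays covering all test samples.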
Hardware Specification Yes We trained Cond-Fi P on a single L40 GPU with 48GB of memory, using an effective batch size of 8 with gradient accumulation.
Software Dependencies No The paper mentions "Adam optimizer (Paszke et al., 2017)", which points to the PyTorch publication, but no explicit version numbers for PyTorch or other libraries are provided. Specific ancillary software details with versions are therefore missing.
Experiment Setup Yes For both the dataset encoder and Cond-FiP, we set the embedding dimension to d_h = 256 and the hidden dimension of MLP blocks to 512. Both of our transformer-based models contain 4 attention layers, and each layer consists of 8 attention heads. The models were trained for a total of 10k epochs with the Adam optimizer (Paszke et al., 2017), using a learning rate of 1e-4 and a weight decay of 5e-9. Each epoch contains 400 randomly generated datasets.
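The quoted hyperparameters (d_h = 256, MLP hidden size 512, 4 attention layers, 8 heads, Adam with lr 1e-4 and weight decay 5e-9) and the hardware note (effective batch size 8 via gradient accumulation) can be combined into a minimal PyTorch sketch. The module itself is a generic stand-in, not the paper's architecture, and the accumulation split (4 micro-batches of 2) is an illustrative assumption:

```python
import torch
import torch.nn as nn

# Quoted hyperparameters; the TransformerEncoder is a hypothetical stand-in.
d_h, mlp_hidden, n_layers, n_heads = 256, 512, 4, 8
layer = nn.TransformerEncoderLayer(
    d_model=d_h, nhead=n_heads, dim_feedforward=mlp_hidden, batch_first=True
)
model = nn.TransformerEncoder(layer, num_layers=n_layers)

# Adam with the quoted learning rate and weight decay.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-9)

# Effective batch size of 8 via gradient accumulation (per the hardware note):
# here, 4 accumulation steps with micro-batches of 2 samples each.
accum_steps = 4
optimizer.zero_grad()
for step in range(accum_steps):
    x = torch.randn(2, 10, d_h)        # (micro-batch, seq_len, d_h) dummy data
    loss = model(x).pow(2).mean()      # placeholder loss for illustration
    (loss / accum_steps).backward()    # scale so accumulated grads average
optimizer.step()
```

Dividing each micro-batch loss by `accum_steps` makes the accumulated gradient equal to the gradient of the mean loss over the full effective batch, so the update matches what a single batch of 8 would produce.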