Amortized Inference of Causal Models via Conditional Fixed-Point Iterations
Authors: Divyat Mahajan, Jannes Gladrow, Agrin Hilmkil, Cheng Zhang, Meyer Scetbon
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that our amortized procedure performs on par with baselines trained specifically for each dataset on both in- and out-of-distribution problems, and also outperforms them in scarce data regimes. |
| Researcher Affiliation | Collaboration | Divyat Mahajan¹, Jannes Gladrow², Agrin Hilmkil², Cheng Zhang², Meyer Scetbon²; ¹Mila, Université de Montréal, ²Microsoft Research |
| Pseudocode | Yes | A.4 Pseudo Code: Algorithm 1 Cond-FiP Part 1: Dataset Encoder µ(DX, G) ... Algorithm 2 Cond-FiP Part 2: Conditional Fixed-Point Decoder T(z, DX, G) |
| Open Source Code | Yes | The code is available on GitHub: microsoft/causica. |
| Open Datasets | Yes | We use the synthetic data generation procedure proposed by Lorch et al. (2022) to generate SCMs... We further evaluate Cond-FiP on test datasets generated using C-Suite (Geffner et al., 2022)... experiments on real-world instances using the flow cytometry dataset (Sachs et al., 2005) and ecoli dataset (Scutari, 2010). |
| Dataset Splits | Yes | Test Datasets. We evaluate the model's generalization both in-distribution and out-of-distribution by sampling test datasets from P_IN and P_OUT, respectively... For each SCM we generate n_test = 800 samples, split equally into task context DX and queries DX for evaluation... We split this into context D^context_X ∈ R^(n_context × d) and queries D^query_X ∈ R^(n_query × d), each of size n_context = n_query = 400. |
| Hardware Specification | Yes | We trained Cond-FiP on a single L40 GPU with 48GB of memory, using an effective batch size of 8 with gradient accumulation. |
| Software Dependencies | No | The paper mentions "Adam optimizer (Paszke et al., 2017)", which points to a PyTorch publication, but no explicit version numbers for PyTorch or other libraries are provided. Therefore, specific ancillary software details with versions are missing. |
| Experiment Setup | Yes | For both the dataset encoder and Cond-FiP, we set the embedding dimension to dh = 256 and the hidden dimension of MLP blocks to 512. Both of our transformer-based models contain 4 attention layers, and each attention layer consists of 8 attention heads. The models were trained for a total of 10k epochs with the Adam optimizer (Paszke et al., 2017), where we used a learning rate of 1e-4 and a weight decay of 5e-9. Each epoch contains 400 randomly generated datasets. |
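The reported setup can be summarized in a short, hedged sketch: the hyperparameters quoted above collected into a plain config, plus the 400/400 context-query split applied to each 800-sample test dataset. The function names (`make_config`, `split_context_query`) are illustrative and not taken from the paper's codebase; only the numeric values come from the quoted text.

```python
def make_config():
    """Hyperparameters as reported in the paper (values only; keys are illustrative)."""
    return {
        "embedding_dim": 256,       # d_h for both the dataset encoder and Cond-FiP
        "mlp_hidden_dim": 512,      # hidden width of MLP blocks
        "attention_layers": 4,      # per transformer-based model
        "attention_heads": 8,       # per attention layer
        "epochs": 10_000,
        "datasets_per_epoch": 400,  # randomly generated datasets per epoch
        "learning_rate": 1e-4,      # Adam optimizer
        "weight_decay": 5e-9,
        "effective_batch_size": 8,  # reached via gradient accumulation on one L40 GPU
    }


def split_context_query(samples, n_context=400, n_query=400):
    """Split one n_test = 800 sample test dataset into task context and queries."""
    assert len(samples) == n_context + n_query, "expected an 800-sample test dataset"
    return samples[:n_context], samples[n_context:]
```

For example, `split_context_query(list(range(800)))` yields a 400-sample context and a 400-sample query set, matching the equal split described in the Dataset Splits row.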