Sample, estimate, aggregate: A recipe for causal discovery foundation models
Authors: Menghua Wu, Yujia Bao, Regina Barzilay, Tommi Jaakkola
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on biological and synthetic data confirm that this model generalizes well beyond its training set, runs on graphs with hundreds of variables in seconds, and can be easily adapted to different underlying data assumptions. |
| Researcher Affiliation | Collaboration | Menghua Wu (rmwu{at}mit.edu), Department of Computer Science, Massachusetts Institute of Technology; Yujia Bao (yujia.bao{at}accenture.com), Center for Advanced AI, Accenture; Regina Barzilay (regina{at}csail.mit.edu), Department of Computer Science, Massachusetts Institute of Technology; Tommi S. Jaakkola (tommi{at}csail.mit.edu), Department of Computer Science, Massachusetts Institute of Technology |
| Pseudocode | Yes | Algorithm 1 (Resolve marginal estimates of f ∈ F): 1: Input: data D_G faithful to G. 2: Initialize E ← K_N, the complete undirected graph on N nodes. 3: for S ∈ S_{d+2} do 4: compute Ê_S = f(D_G[S]); 5: for (i, j) ∉ Ê_S do 6: remove (i, j) from E; 7: end for; 8: end for. 9: for Ê_S ∈ {Ê_S}_{S ∈ S_{d+2}} do 10: for each v-structure i → j ← k in Ê_S do 11: if {i, j}, {j, k} ∈ E and {i, k} ∉ E then 12: assign orientation i → j ← k in E; 13: end if; 14: end for; 15: end for. 16: Propagate orientations in E (optional). |
| Open Source Code | Yes | 1Our code is available at https://github.com/rmwu/sea. |
| Open Datasets | Yes | We pretrained Sea models on 6,480 synthetic datasets... To assess generalization and robustness, we evaluate on unseen in-distribution and out-of-distribution synthetic datasets, as well as two real biological datasets (Sachs et al., 2005; Replogle et al., 2022), using the versions from Wang et al. (2017); Chevalley et al. (2025). |
| Dataset Splits | Yes | We generated 90 training, 5 validation, and 5 testing datasets for each combination. |
| Hardware Specification | Yes | The models were trained across 2 NVIDIA RTX A6000 GPUs and 60 CPU cores. |
| Software Dependencies | No | Observational algorithm implementations were provided by the causal-learn library (Zheng et al., 2024). |
| Experiment Setup | Yes | Our model was implemented with 4 layers with 8 attention heads and hidden dimension 64. Our model was trained using the AdamW optimizer with a learning rate of 1e-4 (Loshchilov & Hutter, 2017). See Appendix B.4 for additional details about hyperparameters. |
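The aggregation step in Algorithm 1 quoted above can be illustrated with a minimal Python sketch. This is not the paper's released implementation; the function and variable names (`resolve_marginal_estimates`, `estimate_fn`) are illustrative, and it assumes each marginal estimate is represented as a set of undirected edges plus a set of v-structure triples.

```python
from itertools import combinations

def resolve_marginal_estimates(nodes, estimate_fn, subset_size):
    """Sketch of Algorithm 1: aggregate marginal estimates over subsets.

    estimate_fn(S) -> (edges, v_structures), where edges is a set of
    frozensets {i, j} present in the marginal estimate over subset S,
    and v_structures is a set of (i, j, k) triples meaning i -> j <- k.
    """
    # Start from the complete undirected graph K_N on the node set.
    E = {frozenset(pair) for pair in combinations(nodes, 2)}
    orientations = set()

    subsets = list(combinations(nodes, subset_size))
    estimates = {S: estimate_fn(S) for S in subsets}

    # Skeleton: remove any edge absent from some marginal estimate
    # over a subset that contains both endpoints.
    for S, (edges, _) in estimates.items():
        for i, j in combinations(S, 2):
            if frozenset((i, j)) not in edges:
                E.discard(frozenset((i, j)))

    # Orientation: keep v-structures consistent with the skeleton.
    for S, (_, v_structures) in estimates.items():
        for i, j, k in v_structures:
            if (frozenset((i, j)) in E and frozenset((j, k)) in E
                    and frozenset((i, k)) not in E):
                orientations.add((i, j))  # i -> j
                orientations.add((k, j))  # k -> j

    return E, orientations
```

The final "propagate orientations" step (line 16, e.g. applying Meek's rules) is omitted here, since the quoted pseudocode marks it as optional.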