Sample, estimate, aggregate: A recipe for causal discovery foundation models

Authors: Menghua Wu, Yujia Bao, Regina Barzilay, Tommi Jaakkola

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on biological and synthetic data confirm that this model generalizes well beyond its training set, runs on graphs with hundreds of variables in seconds, and can be easily adapted to different underlying data assumptions.
Researcher Affiliation | Collaboration | Menghua Wu (rmwu{at}mit.edu), Department of Computer Science, Massachusetts Institute of Technology; Yujia Bao (yujia.bao{at}accenture.com), Center for Advanced AI, Accenture; Regina Barzilay (regina{at}csail.mit.edu), Department of Computer Science, Massachusetts Institute of Technology; Tommi S. Jaakkola (tommi{at}csail.mit.edu), Department of Computer Science, Massachusetts Institute of Technology
Pseudocode | Yes | Algorithm 1: Resolve marginal estimates of f ∈ F
1: Input: Data D_G faithful to G
2: Initialize E ← K_N as the complete undirected graph on N nodes.
3: for S ∈ S_{d+2} do
4:   Compute Ê_S = f(D_G[S])
5:   for (i, j) ∉ Ê_S do
6:     Remove (i, j) from E
7:   end for
8: end for
9: for Ê_S ∈ {Ê_S}_{S ∈ S_{d+2}} do
10:   for each v-structure i → j ← k in Ê_S do
11:     if {i, j}, {j, k} ∈ E and {i, k} ∉ E then
12:       Assign orientation i → j ← k in E
13:     end if
14:   end for
15: end for
16: Propagate orientations in E (optional).
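The aggregation step of Algorithm 1 can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes each sampled subset S maps to a dict holding the undirected edges and the v-structures (triples i → j ← k) that the expert algorithm f estimated on the data restricted to S.

```python
from itertools import combinations


def aggregate_marginal_estimates(n, estimates):
    """Merge per-subset estimates into one graph (sketch of Algorithm 1).

    `estimates` maps each subset S (tuple of node indices) to a dict with:
      "edges": set of frozenset({i, j}) undirected edges estimated on S,
      "v_structures": list of (i, j, k) triples meaning i -> j <- k.
    Returns the aggregated skeleton E and a set of oriented arcs (a, b).
    """
    # Start from the complete undirected graph K_N on n nodes.
    E = {frozenset(pair) for pair in combinations(range(n), 2)}

    # Remove every edge within a subset S that f did not estimate there.
    for S, est in estimates.items():
        for i, j in combinations(S, 2):
            if frozenset((i, j)) not in est["edges"]:
                E.discard(frozenset((i, j)))

    # Keep a v-structure i -> j <- k only if it is consistent with E:
    # both arms survive in E and the shielding edge {i, k} does not.
    oriented = set()
    for S, est in estimates.items():
        for i, j, k in est["v_structures"]:
            if (frozenset((i, j)) in E and frozenset((j, k)) in E
                    and frozenset((i, k)) not in E):
                oriented.add((i, j))
                oriented.add((k, j))
    return E, oriented
```

On a three-node collider 0 → 1 ← 2, a single marginal estimate over S = (0, 1, 2) with edges {0, 1} and {1, 2} yields the skeleton {{0, 1}, {1, 2}} and the oriented arcs {(0, 1), (2, 1)}.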
Open Source Code | Yes | Our code is available at https://github.com/rmwu/sea.
Open Datasets | Yes | We pretrained Sea models on 6,480 synthetic datasets... To assess generalization and robustness, we evaluate on unseen in-distribution and out-of-distribution synthetic datasets, as well as two real biological datasets (Sachs et al., 2005; Replogle et al., 2022), using the versions from Wang et al. (2017); Chevalley et al. (2025).
Dataset Splits | Yes | We generated 90 training, 5 validation, and 5 testing datasets for each combination.
Hardware Specification | Yes | The models were trained across 2 NVIDIA RTX A6000 GPUs and 60 CPU cores.
Software Dependencies | No | Observational algorithm implementations were provided by the causal-learn library (Zheng et al., 2024).
Experiment Setup | Yes | Our model was implemented with 4 layers with 8 attention heads and hidden dimension 64. Our model was trained using the AdamW optimizer with a learning rate of 1e-4 (Loshchilov et al., 2017). See B.4 for additional details about hyperparameters.
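The quoted hyperparameters (4 layers, 8 attention heads, hidden dimension 64, AdamW at lr 1e-4) can be sketched as a configuration in PyTorch. This is a hypothetical stand-in built only from those numbers, not the authors' actual architecture:

```python
import torch
from torch import nn

# Hypothetical sketch: a generic transformer encoder matching the
# reported hyperparameters, not the SEA model itself.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=64,       # hidden dimension 64
    nhead=8,          # 8 attention heads
    batch_first=True,
)
model = nn.TransformerEncoder(encoder_layer, num_layers=4)  # 4 layers

# AdamW optimizer with learning rate 1e-4, as reported.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```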