Deriving Causal Order from Single-Variable Interventions: Guarantees & Algorithm
Authors: Mathieu Chevalley, Patrick Schwab, Arash Mehrjou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we demonstrate, both theoretically and empirically, that such datasets contain a wealth of causal information that can be effectively extracted under realistic assumptions about the data distribution. More specifically, we introduce a novel variant of interventional faithfulness... To empirically verify our theory, we introduce INTERSORT, an algorithm designed to infer the causal order from datasets... INTERSORT outperforms baselines (GIES, DCDI, PC and EASE) on almost all simulated data settings... Our empirical evaluations on diverse simulated datasets... confirm our theoretical results. |
| Researcher Affiliation | Collaboration | Mathieu Chevalley (GSK.ai, ETH Zürich), Patrick Schwab (GSK.ai), Arash Mehrjou (GSK.ai, MPI for Intelligent Systems). "Correspondence to EMAIL, EMAIL" |
| Pseudocode | Yes | Appendix C (Pseudocodes): Algorithm 1 (Complete algorithm), Algorithm 2 (Finding an initial solution), Algorithm 3 (Local search optimization) |
| Open Source Code | No | The paper mentions using other implementations: "We run the implementation of Zheng et al. (2024) (MIT License, causal-learn v0.1.3)." and "run the package implementation of Gamella (2022) (BSD 3-Clause License, v0.0.1)." and "using the code from Lorch et al. (2022) (MIT License, v1.0.5)." and "using the implementation of Nazaret et al. (2023) (MIT License, v0.1.0)." For their own algorithm, it states: "For EASE, we run our own Python implementation of the algorithm." but it does not provide any statement of public release or a link for INTERSORT or their EASE implementation. |
| Open Datasets | Yes | Our empirical evaluations on diverse simulated datasets (linear, random Fourier features (Lorch et al., 2022), neural network (Nazaret et al., 2023; Brouillard et al., 2020) and single cell (Dibaeinia & Sinha, 2020) with various noise distributions)... Output of Intersort on the flow cytometry data set of the Sachs et al. (2005) dataset |
| Dataset Splits | Yes | We simulate 5000 samples for the observational datasets and 100 samples for each intervention |
| Hardware Specification | Yes | The simulations of fig. 1 were run on an Apple M1 MacBook Pro. The experiments for figs. 2 and 7 were run on a cluster with 20 CPUs and 16 GB of memory per CPU. |
| Software Dependencies | Yes | We run the implementation of Zheng et al. (2024) (MIT License, causal-learn v0.1.3). We use the Gaussian BIC score, and run the package implementation of Gamella (2022) (BSD 3-Clause License, v0.0.1). We compute the Wasserstein distances with the SciPy Python package (Virtanen et al., 2020). |
| Experiment Setup | Yes | For Intersort, we use ϵ = 0.3 for the linear, RFF and neural network domains, and ϵ = 0.5 for the single-cell domain. We set c = 0.5. We simulate 5000 samples for the observational datasets and 100 samples for each intervention... The noise distribution is chosen uniformly at random from uniform Gaussian (noise scale independent of the parents), heteroscedastic Gaussian (noise scale functionally dependent on the parents), and Laplace. For the neural network domain, the noise distribution is Gaussian with a fixed variance. |
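To make the reported setup concrete, the following is a minimal sketch of the kind of interventional-faithfulness signal the paper describes: comparing each variable's observational marginal against its marginal under a single-variable intervention via the Wasserstein distance (computed with SciPy, as the paper states), and thresholding with ϵ. The function names, the toy two-variable SCM, and the edge-indicator output are illustrative assumptions, not the authors' INTERSORT implementation; sample sizes (5000 observational, 100 per intervention) and ϵ = 0.3 follow the quoted setup.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def detected_effects(obs, interventions, epsilon=0.3):
    """For each intervened variable i, flag every other variable j whose
    marginal shifts (in Wasserstein distance) by more than epsilon.
    Returns a boolean matrix E with E[i, j] = True iff intervening on i
    visibly moves the distribution of j. Hypothetical helper, not the
    authors' code."""
    d = obs.shape[1]
    effects = np.zeros((d, d), dtype=bool)
    for i, samples in interventions.items():
        for j in range(d):
            if j != i:
                dist = wasserstein_distance(obs[:, j], samples[:, j])
                effects[i, j] = dist > epsilon
    return effects

# Toy linear SCM with edge X0 -> X1 (illustrative, matching the paper's
# linear domain only in spirit): 5000 observational samples, 100 per
# intervention, as in the quoted setup.
n_obs, n_int = 5000, 100
x0 = rng.normal(0.0, 1.0, n_obs)
x1 = 2.0 * x0 + rng.normal(0.0, 1.0, n_obs)
obs = np.column_stack([x0, x1])

# do(X0 = 3): downstream X1 shifts; do(X1 = 3): upstream X0 is unchanged.
x0_do = np.full(n_int, 3.0)
int_on_0 = np.column_stack([x0_do, 2.0 * x0_do + rng.normal(0.0, 1.0, n_int)])
int_on_1 = np.column_stack([rng.normal(0.0, 1.0, n_int), np.full(n_int, 3.0)])

E = detected_effects(obs, {0: int_on_0, 1: int_on_1}, epsilon=0.3)
print(E[0, 1], E[1, 0])  # intervening on X0 affects X1, but not vice versa
```

From such a matrix of detected effects, an ordering that places X0 before X1 would be consistent with the data, which is the flavor of constraint INTERSORT's score (with its parameters ϵ and c) optimizes over; the actual algorithm and its local-search procedure are given in the paper's Appendix C.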