Recursive Causal Discovery
Authors: Ehsan Mokhtarian, Sepehr Elahi, Sina Akbari, Negar Kiyavash
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present a series of simulations to compare the RCD algorithms with other widely used causal discovery algorithms for learning the skeleton of a causal graph. We study the effect of varying the number of variables, graph density, and sample size in both linear and non-linear settings. We also include experiments on real-world networks. |
| Researcher Affiliation | Academia | Ehsan Mokhtarian, School of Computer and Communication Sciences, EPFL, 1015 Lausanne, Switzerland; Sepehr Elahi, School of Computer and Communication Sciences, EPFL, 1015 Lausanne, Switzerland; Sina Akbari, School of Computer and Communication Sciences, EPFL, 1015 Lausanne, Switzerland; Negar Kiyavash, College of Management of Technology, EPFL, 1015 Lausanne, Switzerland |
| Pseudocode | Yes | Algorithm 1: Recursive framework for learning the skeleton of GV; Algorithm 2: Finding a removable variable; Algorithm 3: Learning Gπ; Algorithm 4: MARVEL; Algorithm 5: MARVEL functions; Algorithm 6: L-MARVEL; Algorithm 7: RSLω; Algorithm 8: RSLω without side information; Algorithm 9: RSLD; Algorithm 10: ROLHC; Algorithm 11: Updates Markov boundaries |
| Open Source Code | Yes | Another contribution of this paper is the release of RCD, a Python package that efficiently implements these algorithms. This package is designed for practitioners and researchers interested in applying these methods in practical scenarios. The package is available at github.com/ban-epfl/rcd, with comprehensive documentation provided at rcdpackage.com. |
| Open Datasets | Yes | For the ground-truth DAGs, we consider both synthetic random graphs generated from Erdős–Rényi (ER) models and real-world networks available at https://www.bnlearn.com/bnrepository. |
| Dataset Splits | No | For each value of n, we generate 20 DAGs, and for each DAG, we generate 10 data sets, each containing 50n samples. / For each graph, we generate a data set following the data generation procedure described earlier. The paper does not specify train/validation/test splits; each experimental run generates fresh datasets (or samples from existing networks) for evaluation, without explicit partitioning. |
| Hardware Specification | Yes | All simulations were conducted on a PC equipped with two Intel Xeon E5-2680 v3 CPUs, 256 GB of RAM, and running Ubuntu 20.04.4 LTS. |
| Software Dependencies | Yes | RCD uses only four packages: NetworkX, NumPy, Pandas, and SciPy, all commonly used in causal discovery. / We ran all algorithms using Python 3.10, except for fGES, which was run using JPype to connect to Tetrad running on the Amazon Corretto 22 JDK. |
| Experiment Setup | Yes | For the significance level in the conditional independence tests, we set α = 2/n² (Pellet and Elisseeff, 2008b). / We set a time limit of 100 seconds for each algorithm, after which the execution was terminated. / Inputs to Algorithm 10 are the observational data Data(V) and two parameters maxIter and maxSwap. |
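The synthetic setup quoted above (Erdős–Rényi DAGs, 50n samples per dataset, significance level α = 2/n²) can be sketched as follows. This is a minimal illustration using only NumPy, not the paper's actual generation code; the edge probability, weight range, and Gaussian noise are assumptions standing in for unspecified details of the linear setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_er_dag(n, p, rng):
    """ER-style DAG: keep only upper-triangular edges, so the
    column order is a topological order and the graph is acyclic."""
    adj = (rng.random((n, n)) < p).astype(float)
    return np.triu(adj, k=1)

def linear_gaussian_data(adj, n_samples, rng):
    """Sample from a linear SEM with standard Gaussian noise.
    Edge-weight range [0.5, 1.5] is an illustrative assumption."""
    n = adj.shape[0]
    weights = adj * rng.uniform(0.5, 1.5, size=adj.shape)
    X = np.zeros((n_samples, n))
    for j in range(n):  # columns are already in topological order
        parents = np.nonzero(weights[:, j])[0]
        X[:, j] = X[:, parents] @ weights[parents, j] + rng.standard_normal(n_samples)
    return X

n = 20
adj = random_er_dag(n, p=0.2, rng=rng)
data = linear_gaussian_data(adj, n_samples=50 * n, rng=rng)  # 50n samples
alpha = 2 / n**2  # CI-test significance level from the paper
print(data.shape, alpha)  # (1000, 20) 0.005
```

The resulting data matrix would then be passed to a skeleton-learning algorithm (e.g. one of the RCD variants) together with a CI test run at level α.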