Recursive Causal Discovery

Authors: Ehsan Mokhtarian, Sepehr Elahi, Sina Akbari, Negar Kiyavash

JMLR 2025

Reproducibility assessment: each entry lists the variable, the result, and the supporting LLM response.
Research Type: Experimental. "In this section, we present a series of simulations to compare the RCD algorithms with other widely used causal discovery algorithms for learning the skeleton of a causal graph. We study the effect of varying the number of variables, graph density, and sample size in both linear and non-linear settings. We also include experiments on real-world networks."
Researcher Affiliation: Academia. Ehsan Mokhtarian, Sepehr Elahi, and Sina Akbari (School of Computer and Communication Sciences, EPFL, 1015 Lausanne, Switzerland); Negar Kiyavash (College of Management of Technology, EPFL, 1015 Lausanne, Switzerland).
Pseudocode: Yes. Algorithm 1: Recursive framework for learning the skeleton of GV; Algorithm 2: Finding a removable variable; Algorithm 3: Learning Gπ; Algorithm 4: MARVEL; Algorithm 5: MARVEL functions; Algorithm 6: L-MARVEL; Algorithm 7: RSLω; Algorithm 8: RSLω without side information; Algorithm 9: RSLD; Algorithm 10: ROLHC; Algorithm 11: Updates Markov boundaries.
Open Source Code: Yes. "Another contribution of this paper is the release of RCD, a Python package that efficiently implements these algorithms. This package is designed for practitioners and researchers interested in applying these methods in practical scenarios. The package is available at github.com/ban-epfl/rcd, with comprehensive documentation provided at rcdpackage.com."
Open Datasets: Yes. "For the ground-truth DAGs, we consider both synthetic random graphs generated from Erdős–Rényi (ER) models and real-world networks available at https://www.bnlearn.com/bnrepository."
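The ER ground-truth DAGs mentioned above can be sketched in a few lines. A minimal illustration, assuming (as is standard for ER-style DAG models) that each edge consistent with a random node ordering is included independently with probability p; the function name `random_er_dag` and the density 0.2 are hypothetical choices, not taken from the paper:

```python
import numpy as np
import networkx as nx

def random_er_dag(n, p, rng=None):
    """Sample a DAG from an Erdos-Renyi-style model: fix a random
    ordering of n nodes and keep each forward edge with probability p."""
    rng = np.random.default_rng(rng)
    order = rng.permutation(n)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                # Edge goes from earlier to later in the ordering,
                # so the resulting graph is acyclic by construction.
                adj[order[i], order[j]] = 1
    return nx.DiGraph(adj)

g = random_er_dag(20, 0.2, rng=0)
assert nx.is_directed_acyclic_graph(g)
```

Acyclicity is guaranteed by construction rather than by rejection sampling, which keeps generation cheap even for dense graphs.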
Dataset Splits: No. "For each value of n, we generate 20 DAGs, and for each DAG, we generate 10 data sets, each containing 50n samples." / "For each graph, we generate a data set following the data generation procedure described earlier." The paper does not specify train/validation/test splits; each experiment generates fresh data for evaluation rather than partitioning a single dataset.
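The quoted protocol (20 DAGs per setting, 10 datasets per DAG, 50n samples each) can be sketched for the linear-Gaussian case. The edge weights, the density 0.3, and the helper `sample_linear_sem` below are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def sample_linear_sem(w, n_samples, rng=None):
    """Draw samples from the linear-Gaussian SEM X = X W + E,
    where W is the weighted adjacency matrix of a DAG."""
    rng = np.random.default_rng(rng)
    n = w.shape[0]
    noise = rng.standard_normal((n_samples, n))
    # For a DAG, W is nilpotent, so (I - W) is invertible and
    # X = E (I - W)^{-1} solves the structural equations exactly.
    return noise @ np.linalg.inv(np.eye(n) - w)

rng = np.random.default_rng(0)
n = 10                                   # number of variables in this setting
datasets = []
for _ in range(20):                      # 20 random DAGs per setting
    w = np.triu(rng.uniform(0.5, 1.5, (n, n)), k=1)  # weighted upper-tri DAG
    w *= rng.random((n, n)) < 0.3        # keep each edge with probability 0.3
    for _ in range(10):                  # 10 datasets per DAG
        datasets.append(sample_linear_sem(w, 50 * n, rng))
```

The closed-form solve via `(I - W)^{-1}` replaces per-node ancestral sampling and works for any topological ordering, since nilpotency of W is all that is required.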
Hardware Specification: Yes. "All simulations were conducted on a PC equipped with two Intel Xeon E5-2680 v3 CPUs, 256 GB of RAM, and running Ubuntu 20.04.4 LTS."
Software Dependencies: Yes. "RCD uses only four packages: NetworkX, NumPy, Pandas, and SciPy, all commonly used in causal discovery." / "We ran all algorithms using Python 3.10, except for fGES, which was run using JPype to connect to Tetrad running on the Amazon Corretto 22 JDK."
Experiment Setup: Yes. "For the significance level in the conditional independence tests, we set α = 2/n² (Pellet and Elisseeff, 2008b)." / "We set a time limit of 100 seconds for each algorithm, after which the execution was terminated." / "Inputs to Algorithm 10 are the observational data Data(V) and two parameters maxIter and maxSwap."
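To make the α = 2/n² choice concrete, here is a minimal sketch of a Fisher-z partial-correlation independence test thresholded at that level; the helper `fisher_z_indep` is a hypothetical illustration, not code from the RCD package:

```python
import math

def fisher_z_indep(r, n_samples, cond_size, alpha):
    """Fisher z-transform test for (conditional) independence from a
    partial correlation r, with a conditioning set of size cond_size.
    Returns True when independence is NOT rejected at level alpha."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n_samples - cond_size - 3) * abs(z)
    # Two-sided normal p-value via the complementary error function.
    p = math.erfc(stat / math.sqrt(2))
    return p > alpha

n_vars = 20
alpha = 2 / n_vars**2   # significance level 2/n^2 from the paper's setup
```

Scaling α as 2/n² shrinks the per-test level as the number of variables (and hence the number of tests) grows, which limits spurious edges from multiple testing.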