Recursive Causal Discovery

Authors: Ehsan Mokhtarian, Sepehr Elahi, Sina Akbari, Negar Kiyavash

JMLR 2025

Reproducibility assessment: each entry lists the variable, the result, and the supporting LLM response.
Research Type: Experimental. "In this section, we present a series of simulations to compare the RCD algorithms with other widely used causal discovery algorithms for learning the skeleton of a causal graph. We study the effect of varying the number of variables, graph density, and sample size in both linear and non-linear settings. We also include experiments on real-world networks."
Researcher Affiliation: Academia. Ehsan Mokhtarian, Sepehr Elahi, and Sina Akbari (School of Computer and Communication Sciences, EPFL, 1015 Lausanne, Switzerland); Negar Kiyavash (College of Management of Technology, EPFL, 1015 Lausanne, Switzerland).
Pseudocode: Yes. Algorithm 1: Recursive framework for learning the skeleton of GV; Algorithm 2: Finding a removable variable; Algorithm 3: Learning Gπ; Algorithm 4: MARVEL; Algorithm 5: MARVEL functions; Algorithm 6: L-MARVEL; Algorithm 7: RSLω; Algorithm 8: RSLω without side information; Algorithm 9: RSLD; Algorithm 10: ROLHC; Algorithm 11: Updates Markov boundaries.
Open Source Code: Yes. "Another contribution of this paper is the release of RCD, a Python package that efficiently implements these algorithms. This package is designed for practitioners and researchers interested in applying these methods in practical scenarios. The package is available at github.com/ban-epfl/rcd, with comprehensive documentation provided at rcdpackage.com."
Open Datasets: Yes. "For the ground-truth DAGs, we consider both synthetic random graphs generated from Erdős–Rényi (ER) models and real-world networks available at https://www.bnlearn.com/bnrepository."
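The ER ground-truth DAGs mentioned above can be sketched in a few lines. A minimal illustration, assuming (as is standard for ER-style DAG models) that each edge consistent with a random node ordering is included independently with probability p; the function name `random_er_dag` and the density 0.2 are hypothetical choices, not taken from the paper:

```python
import numpy as np
import networkx as nx

def random_er_dag(n, p, rng=None):
    """Sample a DAG from an Erdos-Renyi-style model: fix a random
    ordering of n nodes and keep each forward edge with probability p."""
    rng = np.random.default_rng(rng)
    order = rng.permutation(n)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                # Edge goes from earlier to later in the ordering,
                # so the resulting graph is acyclic by construction.
                adj[order[i], order[j]] = 1
    return nx.DiGraph(adj)

g = random_er_dag(20, 0.2, rng=0)
assert nx.is_directed_acyclic_graph(g)
```

Acyclicity is guaranteed by construction rather than by rejection sampling, which keeps generation cheap even for dense graphs.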
Dataset Splits: No. "For each value of n, we generate 20 DAGs, and for each DAG, we generate 10 data sets, each containing 50n samples." / "For each graph, we generate a data set following the data generation procedure described earlier." The paper does not specify train/validation/test splits; each experiment generates fresh data for evaluation rather than partitioning a single dataset.
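The quoted protocol (20 DAGs per setting, 10 datasets per DAG, 50n samples each) can be sketched for the linear-Gaussian case. The edge weights, the density 0.3, and the helper `sample_linear_sem` below are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def sample_linear_sem(w, n_samples, rng=None):
    """Draw samples from the linear-Gaussian SEM X = X W + E,
    where W is the weighted adjacency matrix of a DAG."""
    rng = np.random.default_rng(rng)
    n = w.shape[0]
    noise = rng.standard_normal((n_samples, n))
    # For a DAG, W is nilpotent, so (I - W) is invertible and
    # X = E (I - W)^{-1} solves the structural equations exactly.
    return noise @ np.linalg.inv(np.eye(n) - w)

rng = np.random.default_rng(0)
n = 10                                   # number of variables in this setting
datasets = []
for _ in range(20):                      # 20 random DAGs per setting
    w = np.triu(rng.uniform(0.5, 1.5, (n, n)), k=1)  # weighted upper-tri DAG
    w *= rng.random((n, n)) < 0.3        # keep each edge with probability 0.3
    for _ in range(10):                  # 10 datasets per DAG
        datasets.append(sample_linear_sem(w, 50 * n, rng))
```

The closed-form solve via `(I - W)^{-1}` replaces per-node ancestral sampling and works for any topological ordering, since nilpotency of W is all that is required.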
Hardware Specification: Yes. "All simulations were conducted on a PC equipped with two Intel Xeon E5-2680 v3 CPUs, 256 GB of RAM, and running Ubuntu 20.04.4 LTS."
Software Dependencies: Yes. "RCD uses only four packages: NetworkX, NumPy, Pandas, and SciPy, all commonly used in causal discovery." / "We ran all algorithms using Python 3.10, except for fGES, which was run using JPype to connect to Tetrad running on the Amazon Corretto 22 JDK."
Experiment Setup: Yes. "For the significance level in the conditional independence tests, we set α = 2/n² (Pellet and Elisseeff, 2008b)." / "We set a time limit of 100 seconds for each algorithm, after which the execution was terminated." / "Inputs to Algorithm 10 are the observational data Data(V) and two parameters maxIter and maxSwap."
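To make the α = 2/n² choice concrete, here is a minimal sketch of a Fisher-z partial-correlation independence test thresholded at that level; the helper `fisher_z_indep` is a hypothetical illustration, not code from the RCD package:

```python
import math

def fisher_z_indep(r, n_samples, cond_size, alpha):
    """Fisher z-transform test for (conditional) independence from a
    partial correlation r, with a conditioning set of size cond_size.
    Returns True when independence is NOT rejected at level alpha."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n_samples - cond_size - 3) * abs(z)
    # Two-sided normal p-value via the complementary error function.
    p = math.erfc(stat / math.sqrt(2))
    return p > alpha

n_vars = 20
alpha = 2 / n_vars**2   # significance level 2/n^2 from the paper's setup
```

Scaling α as 2/n² shrinks the per-test level as the number of variables (and hence the number of tests) grows, which limits spurious edges from multiple testing.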