DCILP: A Distributed Approach for Large-Scale Causal Structure Learning

Authors: Shuyu Dong, Michele Sebag, Kento Uemura, Akito Fujii, Shuang Chang, Yusuke Koyanagi, Koji Maruhashi

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section reports on the experimental validation of DCILP, referring to the extended version (Dong et al. 2025) for more details and complementary results. 4.1 Experimental Setting. Goals. The primary goal of the experiments is to evaluate the performance of DCILP according to the standard SHD, TPR, FDR, and FPR indicators for causal learning, together with its computational efficiency. A second goal is to assess how the causal learner used in DCILP Phase-2 influences the overall performance. We report the performance of DCILP-ges and DCILP-dagma, corresponding to DCILP using GES and DAGMA, respectively, during Phase-2. ... Benchmarks. Following (Zheng et al. 2018), we consider synthetic and real-world datasets.
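The row above refers to the SHD, TPR, FDR, and FPR indicators. As a hedged illustration (the excerpt does not spell out the paper's exact metric conventions), one common way to compute them from binary directed adjacency matrices is the following sketch, where `B[j, k] = 1` encodes an edge X_j → X_k:

```python
import numpy as np

def causal_metrics(B_true, B_est):
    """One common set of definitions for SHD, TPR, FDR, FPR.

    Hedged sketch, not the paper's implementation. SHD counts edge
    insertions, deletions, and flips needed to turn B_est into B_true;
    a flipped edge counts once.
    """
    B_true = (np.asarray(B_true) != 0).astype(int)
    B_est = (np.asarray(B_est) != 0).astype(int)
    d = B_true.shape[0]

    tp = int(((B_est == 1) & (B_true == 1)).sum())   # correctly oriented edges
    fp = int(((B_est == 1) & (B_true == 0)).sum())   # spurious (or reversed) edges
    fn = int(((B_est == 0) & (B_true == 1)).sum())   # missed edges
    neg = d * (d - 1) - (tp + fn)                    # non-edges among ordered pairs

    # A mismatched unordered pair contributes exactly 1 to SHD,
    # whether the edge is missing, extra, or flipped.
    diff = np.abs(B_true - B_est)
    shd = int((np.triu(diff + diff.T, 1) > 0).sum())

    return {
        "shd": shd,
        "tpr": tp / max(tp + fn, 1),
        "fdr": fp / max(tp + fp, 1),
        "fpr": fp / max(neg, 1),
    }
```

For example, with a true graph 0→1→2 and an estimate that keeps 0→1, reverses 1→2, and adds 0→2, the SHD is 2 (one flip, one deletion).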
Researcher Affiliation | Collaboration | 1 INRIA, LISN, Université Paris-Saclay, 91190 Gif-sur-Yvette, France; 2 Fujitsu Limited, Kanagawa 211-8588, Japan. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1 DCILP. Require: observational data X ∈ R^{n×d}. 1: (PHASE-1) Divide: estimate the Markov blanket MB(X_i) for i ∈ [d]. 2: (PHASE-2) for i ∈ [d] do in parallel: 3: A^(i) ← causal discovery on S_i := {X_i} ∪ MB(X_i); 4: B̂^(i)_{j,k} ← A^(i)_{j,k} if j = i or k = i, and 0 otherwise. 5: (PHASE-3) Conquer: B ← reconciliation from {B̂^(i), i ∈ [d]} via ILP.
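The divide/parallel/conquer structure of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Markov-blanket estimator and the local causal learner below are crude placeholders (the paper uses a proper MB estimator in Phase-1 and GES or DAGMA in Phase-2), and Phase-3's ILP reconciliation via Gurobi is replaced here by simple vote aggregation:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def estimate_markov_blanket(X, i, thresh=0.3):
    """Phase-1 placeholder: crude MB proxy via marginal correlation."""
    corr = np.corrcoef(X, rowvar=False)[i]
    return [j for j in range(X.shape[1]) if j != i and abs(corr[j]) > thresh]

def local_causal_discovery(X_sub):
    """Phase-2 placeholder: DCILP plugs GES or DAGMA in here.
    This stub just returns an empty local graph."""
    k = X_sub.shape[1]
    return np.zeros((k, k), dtype=int)

def dcilp_sketch(X):
    """Structural sketch of Algorithm 1 over data X of shape (n, d)."""
    n, d = X.shape
    votes = np.zeros((d, d))

    def solve(i):
        mb = estimate_markov_blanket(X, i)
        idx = [i] + mb                       # subproblem S_i = {X_i} ∪ MB(X_i)
        A = local_causal_discovery(X[:, idx])
        B = np.zeros((d, d))
        for a, j in enumerate(idx):
            for b, k in enumerate(idx):
                # Keep only edges incident to X_i (Algorithm 1, line 4).
                if A[a, b] and (j == i or k == i):
                    B[j, k] = 1
        return B

    # Phase-2: the d per-node subproblems are independent, hence parallel.
    with ThreadPoolExecutor() as ex:
        for B in ex.map(solve, range(d)):
            votes += B

    # Phase-3 in the paper reconciles {B^(i)} into a DAG via ILP (Gurobi);
    # here we only return the aggregated vote matrix.
    return votes
```

The design point the sketch makes concrete is that Phase-2 touches each variable's local neighborhood only, which is what allows the hundreds of subproblems to run on separate CPU cores.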
Open Source Code | Yes | https://github.com/shuyu-d/dcilp-exp
Open Datasets | Yes | The real-world dataset is generated using the so-called MUNIN model (Gu and Zhou 2020; Andreassen et al. 1989), a Bayesian network with d = 1,041 nodes that models a medical expert system based on electromyography (EMG) to assist the diagnosis of neuromuscular disorders.
Dataset Splits | No | The paper specifies how the synthetic datasets are generated, with ratios such as the number n of samples set to 5d, 10d, or 50d, and n/d = 5 for the MUNIN model, but it does not specify explicit training, validation, or test splits of the kind typically used for model evaluation.
Hardware Specification | No | The paper mentions that each experiment uses at most 400 CPU cores and that DCILP Phase-3 runs on four CPU cores. It also states that CPU specifications are detailed in (Dong et al. 2025, Appendix D.2). However, it does not provide specific CPU models or types in the main text.
Software Dependencies | No | The paper mentions using Gurobi's ILP solver (Gurobi Optimization 2025) and the R package pcalg. While Gurobi is a specific solver, "Gurobi Optimization 2025" is a citation rather than a software version number, and no version is given for pcalg either. Specific version numbers for key software components are therefore missing.
Experiment Setup | No | The paper details the number of variables d and samples n, the distributions of edge weights and noise variables for synthetic data generation, and the number of runs over which results are averaged. However, it does not specify concrete hyperparameters or system-level settings for the Phase-2 causal learning algorithms (GES or DAGMA), such as learning rates, batch sizes, or parameters of the conditional independence tests.