When Faithfulness Fails: The Performance Limits of Neural Causal Discovery

Authors: Mateusz Olko, Mateusz Gajewski, Joanna Wojciechowska, Mikołaj Morzy, Piotr Sankowski, Piotr Miłoś

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our systematic evaluation highlights significant room for improvement in the accuracy of the evaluated methods when uncovering causal structures. We identify a fundamental limitation: unavoidable likelihood score estimation errors prevent distinguishing the true structure, even for small graphs and relatively large sample sizes. Furthermore, we identify the faithfulness property as a critical bottleneck: (i) it is likely to be violated across any reasonable dataset size range, and (ii) its violation directly undermines the performance of neural penalized-likelihood discovery methods.
Researcher Affiliation | Collaboration | (1) IDEAS NCBR, Warsaw, Poland; (2) University of Warsaw, Warsaw, Poland; (3) Poznan University of Technology, Poznan, Poland; (4) MIM Solutions, Warsaw, Poland; (5) Research Institute IDEAS, Warsaw, Poland; (6) Institute of Mathematics, Polish Academy of Sciences, Warsaw, Poland; (7) deepsense.ai, Warsaw, Poland.
Pseudocode | Yes | Pseudocode of the method described in Sec. 3 is provided in Algorithm 1 ("Algorithm 1: Overview of NN-Opt").
Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the methodology described, nor does it provide a direct link to a code repository.
Open Datasets | No | We generate synthetic data with a known ground-truth causal structure. We consider causal DAGs with only five nodes V = {1, . . . , 5}. We generate these DAGs using the Erdős–Rényi model with an expected number of 5 edges.
Dataset Splits | No | The paper discusses evaluating causal discovery on "subsets of varying sizes" and "datasets with varying number of observational samples, ranging from 20 to 8,000 observations", but does not provide specific training, validation, or test dataset splits in terms of percentages, counts, or predefined files.
Hardware Specification | No | We gratefully acknowledge Polish high-performance computing infrastructure PLGrid (HPC Center: ACK Cyfronet AGH) for providing computer facilities and support within computational grant no. PLG/2024/016906.
Software Dependencies | No | The paper mentions several neural causal discovery methods like DCDI, DiBS, BayesDAG, and SDCD, and discusses hyperparameter tuning for them, but does not provide specific version numbers for these or any other software libraries or dependencies used.
Experiment Setup | Yes | To ensure a fair comparison across all methods, we perform systematic hyperparameter tuning, selecting the best-performing parameters for each method. We employ a grid search approach based on the parameter ranges suggested by the original authors. This process optimizes key variables such as regularization coefficients, sparsity controls, and kernel configurations. Details can be found in Appendix E.2. DCDI grid search: ...; selected: regularization coefficient = 1, learning rate = 0.001, Augmented Lagrangian tolerance = 10^-8.
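The synthetic setup described in the Open Datasets row (five-node DAGs drawn from an Erdős–Rényi model with 5 expected edges) can be sketched as follows. This is an illustrative reconstruction, not the authors' released code; the function name and sampling details are assumptions:

```python
import numpy as np

def sample_er_dag(n_nodes=5, expected_edges=5, rng=None):
    """Sample a DAG from an Erdos-Renyi-style model.

    Edges are drawn independently along a random node ordering, so the
    result is acyclic by construction. The edge probability is chosen so
    that the expected number of edges matches `expected_edges`.
    """
    rng = np.random.default_rng(rng)
    n_pairs = n_nodes * (n_nodes - 1) // 2      # C(n, 2) possible edges
    p = expected_edges / n_pairs
    order = rng.permutation(n_nodes)            # random topological order
    adj = np.zeros((n_nodes, n_nodes), dtype=int)
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < p:
                adj[order[i], order[j]] = 1     # edge follows the order -> no cycles
    return adj

adj = sample_er_dag()
```

With 5 nodes there are 10 candidate edges, so an expected edge count of 5 corresponds to edge probability 0.5.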
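The Dataset Splits row notes evaluation on subsets of varying sizes, from 20 to 8,000 observations. One simple way to build such evaluation subsets is sketched below; the intermediate sizes and the nesting scheme are assumptions, since the paper states only the overall range:

```python
import numpy as np

def nested_subsets(data, sizes=(20, 50, 200, 1000, 8000), seed=0):
    """Return nested prefixes of a shuffled dataset, one per requested size.

    Nesting keeps each smaller evaluation set contained in the larger ones,
    so performance curves reflect sample size rather than resampling noise.
    """
    rng = np.random.default_rng(seed)
    shuffled = data[rng.permutation(len(data))]
    return {n: shuffled[:n] for n in sizes if n <= len(data)}
```

Sizes larger than the available data are silently skipped, so the same call works for datasets of any length.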
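The exhaustive sweep described in the Experiment Setup row can be sketched generically as below. Here `train_fn`, `score_fn`, and the candidate grid values are hypothetical placeholders, not the paper's actual search space:

```python
import itertools

def grid_search(train_fn, score_fn, grid):
    """Exhaustive grid search: train on every combination, keep the best score.

    `grid` maps hyperparameter names to lists of candidate values, mirroring
    a sweep over e.g. regularization coefficients and learning rates.
    """
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = score_fn(train_fn(**params))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

For instance, a grid like `{"reg_coeff": [0.1, 1, 10], "lr": [0.001, 0.01]}` (values hypothetical) would train six models and return the combination with the highest validation score.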