Efficient and Trustworthy Causal Discovery with Latent Variables and Complex Relations

Authors: Xiuchuan Li, Tongliang Liu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first use four causal graphs shown as Fig. 9 to generate synthetic data. For each graph, we draw 10 sample sets of size 2k, 5k, 10k respectively. Each causal strength is sampled from a uniform distribution over [-2.0, -0.5] ∪ [0.5, 2.0] and each noise is generated from the seventh power of uniform distribution. We compare our methods with GIN (Xie et al., 2020), LaHME (Xie et al., 2022), and PO-LiNGAM (Jin et al., 2024). We use 3 metrics to evaluate the performance, including (i) Error in Latent Variables, the absolute difference between the estimated number of latent variables and the ground-truth one; (ii) Correct-Ordering Rate, the number of correctly estimated causal ordering divided by the number of causal ordering in the ground-truth graph; (iii) F1-Score of causal edges. The results are summarized in Tab. 1, where we also report the running time.
Researcher Affiliation | Academia | Xiu-Chuan Li, Tongliang Liu. Sydney AI Centre, University of Sydney. Correspondence to Tongliang Liu (EMAIL).
Pseudocode | Yes | An overview of our algorithm in stage 1 is shown as Alg. 1 while a detailed version is deferred to Alg. 3 in App. E. It has $O(R \lvert O_0 \rvert^3)$ complexity where $R$ is the number of iterations. An overview of our algorithm in stage 2 is shown as Alg. 2 while a detailed version is deferred to Alg. 4 in App. E. It has $O(\lvert V_a \rvert^3)$ time complexity.
Open Source Code | Yes | Our code is available at: https://github.com/XiuchuanLi/ICLR2025-ETCD
Open Datasets | No | We first use four causal graphs shown as Fig. 9 to generate synthetic data. For each graph, we draw 10 sample sets of size 2k, 5k, 10k respectively. Each causal strength is sampled from a uniform distribution over [-2.0, -0.5] ∪ [0.5, 2.0] and each noise is generated from the seventh power of uniform distribution. We also apply our proposed algorithm to real-world data; more details are deferred to App. D. (Appendix D refers to 'multitasking behavior model' but does not provide public access to the data.)
Dataset Splits | No | For each graph, we draw 10 sample sets of size 2k, 5k, 10k respectively. (This describes dataset sizes for synthetic data but no training/validation/test splits. The paper does not mention dataset splits for real-world data.)
Hardware Specification | No | The paper does not contain any specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not contain any specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | No | Each causal strength is sampled from a uniform distribution over [-2.0, -0.5] ∪ [0.5, 2.0] and each noise is generated from the seventh power of uniform distribution. (This describes the data generation process for synthetic data but not specific hyperparameters or training configurations for the proposed algorithm itself.)
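The synthetic-data recipe and the metrics quoted above can be illustrated with a minimal Python sketch. It is not taken from the authors' repository. Several details are assumptions made only for illustration: a linear structural equation model, a base uniform range of U(-1, 1) for the noise (the quoted text does not state the range), the small example graph, the reading of a "causal ordering" as an ordered pair of variables, and every function name.

```python
import random


def sample_strength(rng):
    """One causal strength drawn uniformly from [-2.0, -0.5] U [0.5, 2.0]."""
    magnitude = rng.uniform(0.5, 2.0)
    return magnitude if rng.random() < 0.5 else -magnitude


def sample_noise(rng):
    """Seventh power of a uniform draw; the U(-1, 1) range is an assumption."""
    return rng.uniform(-1.0, 1.0) ** 7


def generate_linear_sem(parents, n_samples, seed=0):
    """Sample from a linear SEM (assumed model form).

    parents[j] lists the direct causes of variable j. Every parent index
    must be smaller than j, so the indices follow a causal order.
    """
    rng = random.Random(seed)
    weights = {}
    for j, pa in enumerate(parents):
        for i in pa:
            if i >= j:
                raise ValueError("parents must precede their children")
            weights[(i, j)] = sample_strength(rng)
    samples = []
    for _ in range(n_samples):
        row = []
        for j, pa in enumerate(parents):
            # Each variable is its own noise plus weighted parent values.
            row.append(sample_noise(rng) + sum(weights[(i, j)] * row[i] for i in pa))
        samples.append(row)
    return samples, weights


def latent_error(n_estimated, n_true):
    """Error in Latent Variables: absolute difference of the two counts."""
    return abs(n_estimated - n_true)


def correct_ordering_rate(estimated_pairs, true_pairs):
    """Share of ground-truth ordered pairs (a before b) that were recovered."""
    true_set = set(true_pairs)
    return len(set(estimated_pairs) & true_set) / len(true_set)


def edge_f1(estimated_edges, true_edges):
    """F1-score over directed edges given as (cause, effect) tuples."""
    est, true = set(estimated_edges), set(true_edges)
    tp = len(est & true)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(est), tp / len(true)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical graph: variable 0 is a latent parent of 1, 2, 3; 1 -> 2.
    parents = [[], [0], [0, 1], [0]]
    for n in (2000, 5000, 10000):
        samples, weights = generate_linear_sem(parents, n, seed=n)
        observed = [row[1:] for row in samples]  # hide the latent column
        print(n, len(observed), len(observed[0]))
```

Drawing a magnitude and a random sign keeps the two intervals equally likely, which matches a uniform distribution over their union because both intervals have the same length.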