Recovery of Causal Graph Involving Latent Variables via Homologous Surrogates
Authors: Xiuchuan Li, Jun Wang, Tongliang Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Building on these theoretical results, we derive an algorithm that fully leverages the properties of homologous surrogates for causal graph recovery. Also, we validate its efficacy through experiments. Our code is available at: https://github.com/XiuchuanLi/ICLR2025-CDHS. We first use four causal graphs shown as Fig. 5 to generate synthetic data. For each causal graph, we draw 10 sample sets of size 5k, 10k, 20k respectively. Each direct causal strength is sampled from a uniform distribution over [-2.0, -0.5] ∪ [0.5, 2.0] and each exogenous noise is generated from exponential distribution. We compare our methods with GIN (Xie et al., 2020), LaHME (Xie et al., 2022), and PO-LiNGAM (Jin et al., 2024). We use 3 metrics to evaluate their performances: (1) Error in Latent Variables, the absolute difference between the estimated number of latent variables and the ground-truth one; (2) Correct-Ordering Rate, the number of correctly estimated causal orderings divided by that of ground-truth causal orderings; (3) F1-Score of causal edges. Results are summarized in Tab. 1... Besides synthetic data, we also evaluate our algorithm on a real-world dataset Holzinger and Swineford 1939 (Rosseel, 2012). |
| Researcher Affiliation | Academia | Xiuchuan Li, Jun Wang, Tongliang Liu, Sydney AI Centre, University of Sydney. Correspondence to Tongliang Liu (EMAIL). |
| Pseudocode | Yes | Algorithm 1: Partial recovery of the causal graph under Asmp. 1. Input: O. Output: An(V) for each V ∈ V, ML O, MO O... Algorithm 2: Full recovery of the causal graph under Asmp. 2. Input: O, L, ML O, MO O returned by Alg. 1. Output: A |
| Open Source Code | Yes | Our code is available at: https://github.com/XiuchuanLi/ICLR2025-CDHS |
| Open Datasets | Yes | Besides synthetic data, we also evaluate our algorithm on a real-world dataset Holzinger and Swineford 1939 (Rosseel, 2012). |
| Dataset Splits | No | For each causal graph, we draw 10 sample sets of size 5k, 10k, 20k respectively. Each direct causal strength is sampled from a uniform distribution over [-2.0, -0.5] ∪ [0.5, 2.0] and each exogenous noise is generated from exponential distribution. The Holzinger and Swineford 1939 dataset consists of mental ability test scores of seventh- and eighth-grade children from two different schools (Pasteur and Grant-White). No specific training/test/validation dataset splits are mentioned in the paper for either the synthetic or real-world datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | No | The paper describes the generation of synthetic data (e.g., 'Each direct causal strength is sampled from a uniform distribution over [-2.0, -0.5] ∪ [0.5, 2.0] and each exogenous noise is generated from exponential distribution') and a specific setting for comparison methods ('we set the size of the largest atomic unit in GIN and PO-LiNGAM to 1 for a fair comparison'). However, it does not explicitly provide specific experimental setup details such as hyperparameters, training configurations, or system-level settings (e.g., learning rates, batch sizes, optimizers) for its own proposed algorithm. |
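For readers attempting to reproduce the synthetic-data protocol quoted above, the following is a minimal sketch of what it describes: a linear structural equation model whose direct causal strengths are drawn uniformly from [-2.0, -0.5] ∪ [0.5, 2.0] and whose exogenous noises are exponentially distributed, plus the edge-level F1-Score metric. All function names here are illustrative assumptions, not taken from the authors' released code.

```python
import numpy as np

def sample_strength(rng):
    # Uniform over [-2.0, -0.5] ∪ [0.5, 2.0]: draw a magnitude in
    # [0.5, 2.0] and flip its sign with probability 1/2.
    return rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 2.0)

def generate_linear_sem(parents, n_samples, rng):
    """Generate data from a linear SEM with exponential noise.

    `parents` maps each node (ints 0..d-1, assumed topologically
    ordered) to a list of its direct causes.
    """
    d = len(parents)
    X = np.zeros((n_samples, d))
    for j in range(d):
        X[:, j] = rng.exponential(scale=1.0, size=n_samples)  # exogenous noise
        for p in parents[j]:
            X[:, j] += sample_strength(rng) * X[:, p]
    return X

def edge_f1(true_edges, est_edges):
    """F1-Score of causal edges, with edges as sets of (cause, effect) pairs."""
    tp = len(true_edges & est_edges)
    if tp == 0:
        return 0.0
    precision = tp / len(est_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)

rng = np.random.default_rng(0)
# Toy 3-variable chain/collider graph; the paper's four graphs (Fig. 5)
# involve latent variables and are not reproduced here.
X = generate_linear_sem({0: [], 1: [0], 2: [0, 1]}, n_samples=5000, rng=rng)
print(X.shape)  # (5000, 3)
```

The Error-in-Latent-Variables metric is simply `abs(est_num_latents - true_num_latents)`, and the Correct-Ordering Rate divides the count of correctly estimated causal orderings by the count of ground-truth orderings; neither needs more machinery than the snippet above.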