Hybrid Local Causal Discovery
Authors: Zhaolong Ling, Honghui Peng, Yiwen Zhang, Debo Cheng, Xingyu Wu, Peng Zhou, Kui Yu
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on 14 benchmark Bayesian networks and two real datasets validate that the proposed algorithm outperforms the existing local causal discovery methods. We conducted experiments on 14 benchmark BN datasets, where each BN dataset generated samples of sizes 500 and 1000, respectively. Furthermore, we used two real datasets. |
| Researcher Affiliation | Academia | 1Anhui University 2University of South Australia 3Hong Kong Polytechnic University 4Hefei University of Technology |
| Pseudocode | Yes | Algorithm 1: Hybrid Local Causal Discovery<br>Input: D: data, T: the target variable<br>Output: Parents of T: direct causes of T; Children of T: direct effects of T<br>Initialize: V = ∅, Q (a regular queue) = {T}<br>repeat<br>&nbsp;&nbsp;/\* Step 1: Hybrid local causal skeleton construction \*/<br>&nbsp;&nbsp;Z = Q.pop();<br>&nbsp;&nbsp;if Z ∉ V then PC_Z = getPC(D, Z); V = V ∪ {Z}; end<br>&nbsp;&nbsp;for each X ∈ PC_Z do<br>&nbsp;&nbsp;&nbsp;&nbsp;if the local score of X → Z is not equal to Z → X, or S_A/B(X → Z, D) − S_A/B(X ← Z, D) < 0 then PC_Z = PC_Z \ {X}; end<br>&nbsp;&nbsp;end<br>&nbsp;&nbsp;Q.push(PC_Z \ V);<br>&nbsp;&nbsp;/\* Step 2: Hybrid local causal orientation \*/<br>&nbsp;&nbsp;for each X, Y ∈ PC_Z do<br>&nbsp;&nbsp;&nbsp;&nbsp;if the local score of X → Z ← Y is greater than that of X − Z − Y then X, Y, Z form a v-structure with Z as the collider node; end<br>&nbsp;&nbsp;end<br>&nbsp;&nbsp;Use Meek rules to orient edges between variables in V;<br>until all causal orientations of T are determined, or Q = ∅, or V contains all variables;<br>Return Parents of T, Children of T |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. The reference to [Ling et al., 2025a] is a preprint of the current paper itself, not a code repository. |
| Open Datasets | Yes | We conducted experiments on 14 benchmark BN datasets, where each BN dataset generated samples of sizes 500 and 1000, respectively. Furthermore, we used two real datasets. The first was a well-known dataset from [Sachs et al., 2005], which captures the varying expression levels of proteins and phospholipids in human cells. ... The second dataset was a pseudo-real dataset generated using the SynTReN generator [Van den Bulcke et al., 2006], which simulates synthetic transcriptional regulatory networks to approximate experimental gene expression data. |
| Dataset Splits | No | The paper mentions sample sizes for the datasets used (e.g., "samples of sizes 500 and 1000" for BN datasets, "853 samples" for Sachs, "500 samples" for SynTReN). However, it does not specify how these datasets were split into training, validation, or test sets, or reference any standard splits for the experiments performed. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions various parent and child discovery algorithms (e.g., MMPC, FCBF, HITON-PC, PCsimple) and the use of Meek-rules, but it does not specify any software names with version numbers for their implementation (e.g., Python, PyTorch, scikit-learn, with their respective versions). |
| Experiment Setup | No | The paper describes ensuring that HLCD and comparison methods use the "same parent-child identification approach" and refers to [Ling et al., 2025a] for more details on experimental setup. However, the provided text does not contain specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed training configurations for the models or algorithms. |
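The control flow of Algorithm 1 (queue-driven expansion from the target, local skeleton construction, then v-structure orientation) can be sketched in Python. This is a minimal, illustrative reconstruction, not the authors' implementation: `hlcd_sketch`, `get_pc`, and `is_collider` are hypothetical names, the two callbacks stand in for the paper's hybrid score/CI-test subroutines, and Meek-rule propagation after orientation is omitted.

```python
from collections import deque
from itertools import combinations

def hlcd_sketch(target, get_pc, is_collider):
    """Sketch of the Algorithm 1 loop, expanding outward from the target.

    get_pc(z)            -- returns the parent-child (PC) set of z; stands in
                            for the paper's hybrid skeleton subroutine.
    is_collider(x, z, y) -- True if X -> Z <- Y is a v-structure; stands in
                            for the paper's local-score comparison.
    Meek-rule propagation is omitted from this sketch.
    """
    visited = set()
    oriented = set()            # directed edges as (parent, child) pairs
    queue = deque([target])     # Q initialized to {T}
    while queue:
        z = queue.popleft()
        if z in visited:
            continue
        visited.add(z)          # V = V u {Z}
        # Step 1: local skeleton around Z; enqueue unvisited neighbors
        pc_z = set(get_pc(z))
        queue.extend(x for x in pc_z if x not in visited)
        # Step 2: orient X -> Z <- Y for each collider pair in PC(Z)
        for x, y in combinations(sorted(pc_z), 2):
            if is_collider(x, z, y):
                oriented.update({(x, z), (y, z)})
    parents = {u for (u, v) in oriented if v == target}
    children = {v for (u, v) in oriented if u == target}
    return parents, children

# Toy ground-truth DAG for illustration: A -> T <- B, T -> C <- D
edges = {("A", "T"), ("B", "T"), ("T", "C"), ("D", "C")}
neigh = {"T": {"A", "B", "C"}, "A": {"T"}, "B": {"T"},
         "C": {"T", "D"}, "D": {"C"}}
parents, children = hlcd_sketch(
    "T",
    get_pc=lambda z: neigh[z],
    is_collider=lambda x, z, y: (x, z) in edges and (y, z) in edges
                                and x not in neigh[y],
)
# parents == {"A", "B"}, children == {"C"}
```

On the toy DAG, the collider test fires at T for the non-adjacent pair (A, B) and at C for (D, T), so the sketch recovers {A, B} as direct causes of T and {C} as its direct effect, mirroring the algorithm's Parents-of-T / Children-of-T output.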