reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Emergence-Inspired Multi-Granularity Causal Learning

Authors: Hanwen Luo, Guoxian Yu, Jun Wang, Yanyu Xu, Yongqing Zheng, Qingzhong Li

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on both synthetic and real datasets demonstrate that EMCausal can identify causal graphs under the influence of causal emergence, outperforming competitive baselines in terms of accuracy and robustness. In this section, we conduct a series of experiments to study the effectiveness of EMCausal. We first compare the performance of various causal discovery algorithms using synthetic datasets and give a detailed analysis of the strengths and limitations of the compared methods.
Researcher Affiliation	Collaboration	(1) School of Software, Shandong University, Jinan, China; (2) SDU-NTU Joint Centre for AI Research, Shandong University, Jinan, China; (3) Dareway Software Co., Ltd., Jinan, China
Pseudocode	Yes	Algorithm 1: EMCausal: Emergence-inspired Multi-granularity Causal Learning
Open Source Code	Yes	The code of EMCausal is shared at http://www.sdu-idea.cn/codes.php?name=EMCausal.
Open Datasets	Yes	Experimental results on both synthetic and real datasets demonstrate that EMCausal can identify causal graphs under the influence of causal emergence, outperforming competitive baselines in terms of accuracy and robustness. Result on the Sachs Dataset: To further validate the applicability of EMCausal in real-world scenarios, we conducted experiments on the Sachs dataset (Sachs et al. 2005). The Sachs dataset, comprising 853 samples, records the expression levels of 11 proteins and phosphorylated proteins measured from human T cells in the immune system.
Dataset Splits	No	The paper mentions generating synthetic data and using the Sachs dataset, but does not provide specific train/test/validation splits. For synthetic data: "For each experiment, we sample 1000 data samples from the data generation process." For Sachs dataset: "The Sachs dataset, comprising 853 samples" without specifying how these samples were split for experimental purposes.
Hardware Specification	Yes	We implement EMCausal using PyTorch 1.13 and conduct experiments on a server with the Intel(R) Xeon(R) Gold 6248R CPU, 512G memory, 8 NVIDIA GeForce RTX 3090 GPUs, and Ubuntu 22.04.
Software Dependencies	Yes	We implement EMCausal using PyTorch 1.13 and conduct experiments on a server with the Intel(R) Xeon(R) Gold 6248R CPU, 512G memory, 8 NVIDIA GeForce RTX 3090 GPUs, and Ubuntu 22.04.
Experiment Setup	Yes	Experimental Setup: We choose several representative causal discovery algorithms as our baselines, including classical PC (Spirtes et al. 2000), which is based on conditional independence tests, and score-based optimization methods such as GES (Chickering 2002b), DAG-GNN (Yu et al. 2019), GAE (Ng et al. 2019), and MgCSL (Liang et al. 2024b). The parameter configurations of these methods are summarized in Table 2. To synthesize multi-granularity data, we set the number of micro variables that constitute each macro variable to 4 and 8 with the total number of micro variables as 20 or 40. The edge density is set to 2. We use a randomly generated two-layer MLP, whose hidden size is set to 100, to model the nonlinear relationships between variables. For each experiment, we sample 1000 data samples from the data generation process. To reduce randomness, we perform ten independent experiments and report the mean and standard deviation. Table 2: Parameter configuration of compared methods
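
The synthetic setup quoted in the Experiment Setup row can be illustrated with a rough sketch. This is not the authors' generator: it only mirrors the stated numbers (20 micro variables, 4 micro variables per macro variable, edge density 2, a random two-layer MLP with hidden size 100, 1000 samples). The following are assumptions made for illustration: the function names `random_dag` and `sample_mlp_sem`, reading "edge density" as expected edges per node, drawing the graph over micro variables, the tanh activation, the additive Gaussian noise, and the contiguous assignment of micro variables to macro groups.

```python
import numpy as np


def random_dag(num_nodes, edge_density, rng):
    """Sample a DAG with edge_density * num_nodes edges (assumed convention)."""
    max_edges = num_nodes * (num_nodes - 1) // 2
    num_edges = min(int(edge_density * num_nodes), max_edges)
    # Only strictly upper-triangular entries are used, so the graph is acyclic
    # and 0..num_nodes-1 is a valid topological order.
    rows, cols = np.triu_indices(num_nodes, k=1)
    chosen = rng.choice(len(rows), size=num_edges, replace=False)
    adj = np.zeros((num_nodes, num_nodes), dtype=int)
    adj[rows[chosen], cols[chosen]] = 1  # adj[i, j] = 1 means i -> j
    return adj


def sample_mlp_sem(adj, num_samples, hidden_size, rng):
    """Draw samples where each node is a random two-layer MLP of its parents."""
    num_nodes = adj.shape[0]
    data = np.zeros((num_samples, num_nodes))
    for j in range(num_nodes):  # topological order, see random_dag
        parents = np.flatnonzero(adj[:, j])
        noise = rng.normal(size=num_samples)  # assumed Gaussian noise
        if parents.size == 0:
            data[:, j] = noise
            continue
        w1 = rng.normal(size=(parents.size, hidden_size))
        w2 = rng.normal(size=hidden_size)
        hidden = np.tanh(data[:, parents] @ w1)  # assumed activation
        data[:, j] = hidden @ w2 / np.sqrt(hidden_size) + noise
    return data


rng = np.random.default_rng(0)
num_micro, micro_per_macro = 20, 4
adj = random_dag(num_micro, edge_density=2, rng=rng)
data = sample_mlp_sem(adj, num_samples=1000, hidden_size=100, rng=rng)
# Hypothetical grouping: micro variable i belongs to macro variable i // 4.
macro_of_micro = np.arange(num_micro) // micro_per_macro
```

Repeating this with ten different seeds and reporting the mean and standard deviation of an accuracy metric would match the reporting protocol quoted above; how the paper aggregates micro variables into macro variables is not stated in this excerpt, so the sketch stops at the group assignment.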