A Conditional Independence Test in the Presence of Discretization

Authors: Boyang Sun, Yu Yao, Guang-Yuan Hao, Qiu, Kun Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | Theoretical analysis, along with empirical validation on various datasets, rigorously demonstrates the effectiveness of our testing methods. We applied the proposed method DCT to synthetic data to evaluate its practical performance and compare it with the Fisher-Z test... The experiments investigating its robustness, performance in denser graphs, and effectiveness on a real-world dataset can be found in App. H.

Researcher Affiliation | Academia | 1 Mohamed bin Zayed University of Artificial Intelligence; 2 Carnegie Mellon University; 3 Peking University; 4 The University of Sydney

Pseudocode | Yes | The pseudocode of DCT is provided in App. D. Algorithm 1 DCT: Discretization-Aware CI Test

Open Source Code | Yes | Our code implementation can be found at https://github.com/boyangaaaaa/DCT.

Open Datasets | Yes | To further validate DCT, we employ it on a real-world dataset: Big Five Personality (https://openpsychometrics.org/), which includes 50 personality indicators and over 19,000 data samples.

Dataset Splits | No | The paper uses synthetic data generated under specific conditions to evaluate statistical tests and causal discovery algorithms. It describes the data-generation process and evaluation metrics (e.g., Type I/II error, F1, SHD) but does not specify traditional training/validation/test splits, as would be common for supervised learning tasks.

Hardware Specification | Yes | All the experiments are run using Intel(R) Xeon(R) CPU E5-2680 v4 with 55 processors.

Software Dependencies | No | The paper mentions "Causal-DAG (Chandler Squires, 2018)" and, implicitly through its GitHub repository, Python, but does not provide version numbers for these or other software dependencies.

Experiment Setup | Yes | Our experiment investigates the variations in Type I and Type II error (1 minus power) probabilities under two conditions. In the first scenario, we focus on the effects of modifying the sample size, denoted as n = (100, 500, 1000, 2000), while conditioning on a single variable. In the second, the sample size is held constant at 2000, and we vary the cardinality of the conditioning set, represented as D = (1, 2, ..., 5). ... We repeat each test 1500 times. The data are then discretized into K = (2, 4, 8, 12) levels, with boundaries randomly set based on the variable range.
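The discretization step in the setup above is easy to reproduce. As an illustrative sketch (not the authors' implementation; the helper `discretize` is hypothetical), a continuous sample can be binned into K levels using K - 1 cut points placed uniformly at random over the variable's observed range:

```python
import numpy as np

def discretize(x, k, rng):
    # Map a continuous sample to k ordinal levels using k - 1 cut points
    # drawn uniformly at random over the observed range of the variable,
    # mirroring the "boundaries randomly set based on the variable range"
    # description in the experiment setup.
    lo, hi = x.min(), x.max()
    cuts = np.sort(rng.uniform(lo, hi, size=k - 1))
    return np.digitize(x, cuts)  # integer labels in {0, ..., k - 1}

rng = np.random.default_rng(0)
x = rng.normal(size=2000)        # n = 2000, as in the second scenario
for k in (2, 4, 8, 12):          # the K values used in the paper
    levels = discretize(x, k, rng)
    assert 0 <= levels.min() and levels.max() <= k - 1
```

Because the cut points are resampled for every variable and repetition, the resulting discretizations vary in how evenly they cover the range, which is what makes the robustness comparison against the Fisher-Z test informative.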