A Sample Efficient Conditional Independence Test in the Presence of Discretization

Authors: Boyang Sun, Yu Yao, Xinshuai Dong, Zongfang Liu, Tongliang Liu, Yumou Qiu, Kun Zhang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We applied the proposed test, DCT-GMM, to synthetic datasets to evaluate its performance against baselines including DCT (Sun et al., 2024), the Fisher-z test (Fisher, 1921), and the Chi-square test (F.R.S., 2009). Specifically, we investigate its Type I and Type II errors in different scenarios and its application to causal discovery.
Researcher Affiliation | Academia | (1) Mohamed bin Zayed University of Artificial Intelligence; (2) Sydney AI Centre, The University of Sydney; (3) Carnegie Mellon University; (4) Peking University.
Pseudocode | Yes | The pseudo-code of both approaches is provided in Appendix C.
Open Source Code | Yes | Our code implementation is provided in https://github.com/boyangaaaaa/DCT.
Open Datasets | Yes | We conduct experiments on the Big Five Personality dataset, where each variable has 5 discrete values representing agreement levels (1=Disagree to 5=Agree). This dataset has been closely examined by Dong et al. (2024a) and Dong et al. (2024b).
Dataset Splits | No | We first generate Z as an independent multivariate normal distribution whose mean and variance are randomly sampled from a uniform distribution U(0, 1). We then generate corresponding X and Y using Z, structured as $\sum_{i=1}^{D} a_i Z_i + E_i$ (for the first scenario, D = 1)... The data are then discretized into three levels, with random boundaries set based on the support of each variable... The paper describes how the synthetic data for its experiments are generated; because these data are simulated rather than drawn from a fixed dataset, no dataset splits apply. No splits are mentioned for the real-world dataset either.
Hardware Specification | No | No specific hardware details (like GPU/CPU models or cloud instances) used for running experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions 'Our code implementation is provided in https://github.com/boyangaaaaa/DCT.' but does not list any specific software dependencies with version numbers in the text.
Experiment Setup | Yes | We first generate Z as an independent multivariate normal distribution whose mean and variance are randomly sampled from a uniform distribution U(0, 1). We then generate corresponding X and Y using Z, structured as $\sum_{i=1}^{D} a_i Z_i + E_i$ (for the first scenario, D = 1), where $a_i$ is a scalar sampled from a standard normal distribution and $E_i$ follows a standard normal distribution... The data are then discretized into three levels, with random boundaries set based on the support of each variable... The true DAG is generated using the Bipartite Pairing (BP) model (Asratian et al., 1998), with weights drawn from a uniform distribution U(1, 3) and incorporating noise following a standard normal distribution... The first two columns of Figure 2 show the resulting Type I error at a significance level of α = 0.05.
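The synthetic setup quoted in the Dataset Splits and Experiment Setup rows, and the Type I error measurement at α = 0.05, can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' code from the DCT repository, and it exercises only the Fisher-z baseline, not DCT-GMM. It simulates the conditionally independent case (X independent of Y given Z) and counts how often the test wrongly rejects. Assumptions: the helper names (`sample_null`, `discretize`, `fisher_z_pvalue`, `type_one_error`) are made up for illustration; $E_i$ is read as one standard normal noise term per variable; the "random boundaries based on the support" are drawn uniformly between the 10th and 90th percentiles of each variable; and X, Y and Z are all discretized.

```python
import math

import numpy as np


def sample_null(n, D, rng):
    """Hypothetical helper: draw (X, Y, Z) with X independent of Y given Z."""
    # Z: independent Gaussian components, mean and variance each drawn from U(0, 1).
    mu = rng.uniform(0, 1, size=D)
    var = rng.uniform(0, 1, size=D)
    z = rng.normal(mu, np.sqrt(var), size=(n, D))
    # X and Y: sum_i a_i Z_i plus standard normal noise, with a_i ~ N(0, 1).
    x = z @ rng.standard_normal(D) + rng.standard_normal(n)
    y = z @ rng.standard_normal(D) + rng.standard_normal(n)
    return x, y, z


def discretize(v, rng, levels=3):
    """Hypothetical helper: map a continuous variable to {0, ..., levels-1}.

    Assumption: random cut points are uniform between the 10th and 90th
    percentiles, so the lowest and highest levels are never empty.
    """
    lo, hi = np.quantile(v, [0.1, 0.9])
    cuts = np.sort(rng.uniform(lo, hi, size=levels - 1))
    return np.digitize(v, cuts).astype(float)


def fisher_z_pvalue(x, y, z):
    """Two-sided Fisher-z test of X independent of Y given Z (partial correlation)."""
    n = x.shape[0]
    design = np.column_stack([np.ones(n), z])  # intercept + conditioning set
    # Residualize X and Y on Z with ordinary least squares.
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = np.clip(np.corrcoef(rx, ry)[0, 1], -0.999999, 0.999999)  # keep atanh finite
    stat = math.sqrt(n - z.shape[1] - 3) * math.atanh(r)
    # Two-sided p-value: 2 * (1 - Phi(|stat|)) equals erfc(|stat| / sqrt(2)).
    return math.erfc(abs(stat) / math.sqrt(2))


def type_one_error(n=500, D=1, trials=200, alpha=0.05, discretized=True, seed=0):
    """Fraction of null datasets on which the Fisher-z test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(trials):
        x, y, z = sample_null(n, D, rng)
        if discretized:
            # Only the discretized values are observed by the test.
            x, y = discretize(x, rng), discretize(y, rng)
            z = np.column_stack([discretize(z[:, j], rng) for j in range(D)])
        rejections += int(fisher_z_pvalue(x, y, z) < alpha)
    return rejections / trials
```

Calling `type_one_error(discretized=False)` should give a rate close to α, since the Fisher-z test is calibrated for linear Gaussian data. Comparing it with `type_one_error(discretized=True)` shows how far, if at all, the naive test drifts once only the three-level discretized values are available, which is the situation the paper's tests are designed for.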