A Sample Efficient Conditional Independence Test in the Presence of Discretization
Authors: Boyang Sun, Yu Yao, Xinshuai Dong, Zongfang Liu, Tongliang Liu, Yumou Qiu, Kun Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied the proposed test DCT-GMM to a synthetic dataset to evaluate its performance compared with baselines including DCT (Sun et al., 2024), Fisher-z test (Fisher, 1921), Chi-square test (F.R.S., 2009). Specifically, we investigate its Type I and Type II errors in different scenarios and its application in causal discovery. |
| Researcher Affiliation | Academia | 1. Mohamed bin Zayed University of Artificial Intelligence; 2. Sydney AI Centre, The University of Sydney; 3. Carnegie Mellon University; 4. Peking University. |
| Pseudocode | Yes | The pseudo-code of both approaches is provided in Appendix C. |
| Open Source Code | Yes | Our code implementation is provided in https://github.com/boyangaaaaa/DCT. |
| Open Datasets | Yes | We conduct experiments on the Big Five Personality dataset, where each variable has 5 discrete values representing agreement levels (1=Disagree to 5=Agree). This dataset has been closely examined by Dong et al. (2024a) and Dong et al. (2024b). |
| Dataset Splits | No | We first generate Z as an independent multivariate normal distribution whose mean and variance are randomly sampled from a uniform distribution U(0, 1). We then generate corresponding X and Y using Z, structured as $\sum_{i=1}^{D} a_i Z_i + E_i$ (for the first scenario, $D = 1$)... The data are then discretized into three levels, with random boundaries set based on the support of each variable... The paper describes data generation for experiments, not dataset splits of a fixed dataset. No splits are mentioned for the real-world dataset either. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or cloud instances) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions 'Our code implementation is provided in https://github.com/boyangaaaaa/DCT.' but does not list any specific software dependencies with version numbers in the text. |
| Experiment Setup | Yes | We first generate Z as an independent multivariate normal distribution whose mean and variance are randomly sampled from a uniform distribution U(0, 1). We then generate corresponding X and Y using Z, structured as $\sum_{i=1}^{D} a_i Z_i + E_i$ (for the first scenario, $D = 1$), where $a_i$ is a scalar sampled from a standard normal distribution and $E_i$ follows a standard normal distribution... The data are then discretized into three levels, with random boundaries set based on the support of each variable... The true DAG is generated using the Bipartite Pairing (BP) model (Asratian et al., 1998), with weights drawn from a uniform distribution U(1, 3) and incorporating noise following a standard normal distribution... The first two columns of Figure 2 show the resulting Type I error at a significance level of α = 0.05. |
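
The synthetic data generation quoted under Experiment Setup can be sketched roughly as follows. This is a minimal illustration with NumPy, not the authors' implementation (see the linked repository for that). The function names `discretize` and `generate_sample` are hypothetical. Two details are assumptions, because the quoted text does not pin them down: the cut points are drawn uniformly between each variable's observed minimum and maximum, and X and Y use separate coefficient vectors.

```python
import numpy as np


def discretize(values, levels, rng):
    """Map a 1-D continuous array to integer levels 0..levels-1.

    Assumption: cut points are drawn uniformly between the observed
    minimum and maximum. The source only says the boundaries are random
    and based on the support of each variable.
    """
    cuts = np.sort(rng.uniform(values.min(), values.max(), size=levels - 1))
    return np.digitize(values, cuts)


def generate_sample(n, d=1, levels=3, seed=0):
    """Generate discretized (X, Y, Z) where X and Y depend only on Z."""
    rng = np.random.default_rng(seed)

    # Z: independent normal components, mean and variance drawn from U(0, 1).
    mean = rng.uniform(0.0, 1.0, size=d)
    var = rng.uniform(0.0, 1.0, size=d)
    z = rng.normal(mean, np.sqrt(var), size=(n, d))

    # X and Y: sum_i a_i * Z_i + E, with a_i and E standard normal.
    # Assumption: X and Y get their own coefficient vectors.
    a_x = rng.standard_normal(d)
    a_y = rng.standard_normal(d)
    x = z @ a_x + rng.standard_normal(n)
    y = z @ a_y + rng.standard_normal(n)

    # Discretize every variable into `levels` ordinal categories.
    x_d = discretize(x, levels, rng)
    y_d = discretize(y, levels, rng)
    z_d = np.column_stack([discretize(z[:, j], levels, rng) for j in range(d)])
    return x_d, y_d, z_d


if __name__ == "__main__":
    x_d, y_d, z_d = generate_sample(n=1000, d=1)
    print(x_d.shape, y_d.shape, z_d.shape)
```

Because the latent X and Y depend only on Z and on independent noise, they are conditionally independent given Z before discretization. Data built this way therefore corresponds to the null case, in which a test's rejection rate at α = 0.05 estimates its Type I error.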