Conditional Diffusion Models Based Conditional Independence Testing
Authors: Yanfeng Yang, Shuai Li, Yingjie Zhang, Zhuoran Sun, Hai Shu, Ziqi Chen, Renming Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A series of experiments on synthetic data demonstrates that our new test effectively controls both type-I and type-II errors, even in high-dimensional scenarios. |
| Researcher Affiliation | Academia | Yanfeng Yang (1), Shuai Li (1), Yingjie Zhang (1), Zhuoran Sun (1), Hai Shu (2), Ziqi Chen (1), Renming Zhang (3). (1) School of Statistics, KLATASDS-MOE, East China Normal University, Shanghai, China; (2) Department of Biostatistics, School of Global Public Health, New York University, New York, USA; (3) Department of Computer Science, Boston University, Boston, USA |
| Pseudocode | Yes | Algorithm 1: Training the conditional score matching models Algorithm 2: Sampling from score-based conditional diffusion models Algorithm 3: Conditional diffusion models based conditional independence testing (CDCIT) |
| Open Source Code | Yes | Code: https://github.com/Yanfeng-Yang-0316/CDCIT |
| Open Datasets | No | The paper uses synthetic datasets generated based on models (M1, M2, M3) described within the paper, but does not provide concrete access information (link, DOI, specific citation to an external repository) for a pre-existing public dataset. |
| Dataset Splits | Yes | For each experiment, 1000 samples are generated. We use N = 500 to train the conditional sampler and n = 500 to compute the test statistic in our CDCIT. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments. It mentions timing performance but not the underlying hardware. |
| Software Dependencies | No | The paper mentions using 'XGBoost' and 'deep neural networks' for the classifier but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We set the number of repetitions B to 100 and the significance level α to 0.05. For each experiment, 1000 samples are generated. We use N = 500 to train the conditional sampler and n = 500 to compute the test statistic in our CDCIT. We vary dz, the dimension of Z, from 10 to 100. |
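The procedure summarized in the table (train a conditional sampler on N = 500 samples, compute a test statistic on the remaining n = 500, repeat B = 100 times, test at α = 0.05) follows the conditional-randomization-test pattern. The sketch below is a minimal, hypothetical simplification of that pattern, not the paper's CDCIT implementation: it replaces the conditional diffusion sampler with a toy oracle sampler and uses a simple residual-correlation statistic, both of which are illustrative assumptions.

```python
import numpy as np

def crt_pvalue(stat_fn, x, y, z, sample_x_given_z, B=100, rng=None):
    """Generic conditional-randomization-test skeleton.

    Compares the observed statistic against B statistics computed on
    draws from the (approximate) conditional law of X given Z, and
    returns a permutation-style p-value with the +1 correction.
    """
    rng = np.random.default_rng(rng)
    t_obs = stat_fn(x, y, z)
    t_null = np.array(
        [stat_fn(sample_x_given_z(z, rng), y, z) for _ in range(B)]
    )
    return (1 + np.sum(t_null >= t_obs)) / (B + 1)

# Toy data where X and Y are conditionally independent given Z (H0 holds).
rng = np.random.default_rng(0)
n = 500  # matches the paper's test-statistic sample size
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)

def residual_corr(x, y, z):
    # |correlation| between residuals after linear regression on z
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return abs(np.corrcoef(rx, ry)[0, 1])

def oracle_sampler(z, rng):
    # Hypothetical stand-in for the trained conditional diffusion
    # sampler: exact for this toy model, where X | Z ~ N(Z, 1).
    return z + rng.normal(size=z.shape)

p = crt_pvalue(residual_corr, x, y, z, oracle_sampler, B=100, rng=1)
reject = p <= 0.05  # significance level alpha = 0.05 from the paper
```

In the paper's setting, `oracle_sampler` would be replaced by samples drawn from the score-based conditional diffusion model (Algorithms 1 and 2), and `residual_corr` by the classifier-based statistic of Algorithm 3; the p-value construction is the part this sketch illustrates.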