Conditional Diffusion Models Based Conditional Independence Testing

Authors: Yanfeng Yang, Shuai Li, Yingjie Zhang, Zhuoran Sun, Hai Shu, Ziqi Chen, Renming Zhang

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "A series of experiments on synthetic data demonstrates that our new test effectively controls both type-I and type-II errors, even in high-dimensional scenarios." |
| Researcher Affiliation | Academia | Yanfeng Yang (1), Shuai Li (1), Yingjie Zhang (1), Zhuoran Sun (1), Hai Shu (2), Ziqi Chen (1), Renming Zhang (3). (1) School of Statistics, KLATASDS-MOE, East China Normal University, Shanghai, China; (2) Department of Biostatistics, School of Global Public Health, New York University, New York, USA; (3) Department of Computer Science, Boston University, Boston, USA |
| Pseudocode | Yes | Algorithm 1: Training the conditional score matching models; Algorithm 2: Sampling from score-based conditional diffusion models; Algorithm 3: Conditional diffusion models based conditional independence testing (CDCIT) |
| Open Source Code | Yes | https://github.com/Yanfeng-Yang-0316/CDCIT |
| Open Datasets | No | The paper uses synthetic datasets generated from models (M1, M2, M3) described within the paper, but provides no concrete access information (link, DOI, or citation to an external repository) for a pre-existing public dataset. |
| Dataset Splits | Yes | "For each experiment, 1000 samples are generated. We use N = 500 to train the conditional sampler and n = 500 to compute the test statistic in our CDCIT." |
| Hardware Specification | No | The paper does not report the hardware used for its experiments (e.g., CPU or GPU model, memory); it reports timing results but not the underlying hardware. |
| Software Dependencies | No | The paper mentions using XGBoost and deep neural networks for the classifier but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "We set the number of repetitions B to 100 and the significance level α to 0.05. For each experiment, 1000 samples are generated. We use N = 500 to train the conditional sampler and n = 500 to compute the test statistic in our CDCIT. We vary dz, the dimension of Z, from 10 to 100." |