reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Causally Consistent Normalizing Flow

Authors: Qingyang Zhou, Kangjie Lu, Meng Xu

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate CCNF to answer three key questions: 1. Causal Consistency. Despite theoretical assurances of causal consistency, does CCNF maintain this consistency in practical implementations? 2. Performance on Causal Inference Tasks. In causal inference tasks, how accurately do the data instances generated by CCNF compare with those generated by actual models? Is there an observable improvement in accuracy compared to state-of-the-art models? 3. Effectiveness in Real-world Case Studies. Can CCNF be effectively applied to real-world scenarios, such as mitigating unfairness? ... Results are summarized in Table 2. In a nutshell, the practical results are consistent with theoretical expectations. ... As shown in Table 3, CCNF demonstrates superior performance compared with previous works across nearly all datasets. ... The results are summarized in Table 4. Overall, CCNF can enhance fairness while maintaining accuracy in both methods.
Researcher Affiliation	Academia	(1) University of Waterloo, Ontario, Canada; (2) University of Minnesota, Minnesota, America
Pseudocode	No	The extended version contains algorithms for all three tasks.
Open Source Code	Yes	Code https://github.com/UWCSZhou/CCNF
Open Datasets	Yes	We test given models on representative synthetic datasets: Nonlinear Triangle dataset, Nonlinear Simpson dataset, M-graph dataset, Network dataset, Backdoor dataset, and Chain dataset with 3–8 nodes, respectively. ... Applying CCNF to the German credit dataset (Hofmann 1994) ... Hofmann, H. 1994. Statlog (German Credit Data). UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5NC77.
Dataset Splits	No	We test given models on representative synthetic datasets: Nonlinear Triangle dataset, Nonlinear Simpson dataset, M-graph dataset, Network dataset, Backdoor dataset, and Chain dataset with 3–8 nodes, respectively. ... For observations, the measurement is the KL distance, for interventions, we measure the max Maximum Mean Discrepancy (MMD) distance. For counterfactuals, we measure the Root Mean-Square Deviation (RMSD) distance. More details are in the extended version. ... Like prior works (Sánchez-Martin, Rateike, and Valera 2022; Javaloy, Martin, and Valera 2023), we select the German credit dataset as a representative example.
Hardware Specification	No	The paper does not provide specific hardware details like GPU/CPU models or types of machines used for running experiments.
Software Dependencies	No	The paper does not provide specific software dependency versions (e.g., library names with version numbers) needed to replicate the experiment.
Experiment Setup	No	For observations, the measurement is the KL distance, for interventions, we measure the max Maximum Mean Discrepancy (MMD) distance. For counterfactuals, we measure the Root Mean-Square Deviation (RMSD) distance. More details are in the extended version.
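
For readers unfamiliar with the distances quoted in the Experiment Setup row, the following is a minimal sketch of two of them, RMSD and MMD. It is not the paper's evaluation code. The function names, the RBF kernel, its bandwidth, and the choice to average RMSD over all coordinates are assumptions made for illustration; the paper's exact definitions are deferred to its extended version.

```python
import math


def rmsd(truth, estimate):
    """Root-mean-square deviation between paired samples.

    Assumption: the squared error is averaged over every coordinate of
    every sample; other conventions (e.g. per-sample norms) also exist.
    """
    squared = [
        (t - e) ** 2
        for t_row, e_row in zip(truth, estimate)
        for t, e in zip(t_row, e_row)
    ]
    return math.sqrt(sum(squared) / len(squared))


def rbf(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points; bandwidth is illustrative."""
    dist2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-dist2 / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two sample sets."""

    def mean_kernel(p, q):
        total = sum(rbf(a, b, bandwidth) for a in p for b in q)
        return total / (len(p) * len(q))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

Under these assumptions, `rmsd` would compare true and generated counterfactual samples, and `mmd_squared` would compare samples from the true and the learned interventional distributions. Identical sample sets give an MMD of zero, and the value grows as the two sets move apart.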