CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense
Authors: Mingkun Zhang, Keping Bi, Wei Chen, Quanrun Chen, Jiafeng Guo, Xueqi Cheng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, CausalDiff has significantly outperformed state-of-the-art defense methods on various unseen attacks, achieving an average robustness of 86.39% (+4.01%) on CIFAR-10, 56.25% (+3.13%) on CIFAR-100, and 82.62% (+4.93%) on GTSRB (German Traffic Sign Recognition Benchmark). |
| Researcher Affiliation | Academia | Mingkun Zhang, CAS Key Laboratory of AI Safety, Institute of Computing Technology, CAS; Keping Bi, Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, CAS; Wei Chen, CAS Key Laboratory of AI Safety, Institute of Computing Technology, CAS; Quanrun Chen, School of Statistics, University of International Business and Economics; Jiafeng Guo, Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, CAS; Xueqi Cheng, CAS Key Laboratory of AI Safety, Institute of Computing Technology, CAS |
| Pseudocode | Yes | Algorithm 1 CausalDiff Algorithm; Algorithm 2 CausalDiff Pretrain Algorithm; Algorithm 3 Adversarially Robust Inference Algorithm |
| Open Source Code | Yes | The code is available at https://github.com/CAS-AISafetyBasicResearchGroup/CausalDiff. |
| Open Datasets | Yes | Our experiments utilize the CIFAR-10, CIFAR-100 [18] and GTSRB [19] datasets. |
| Dataset Splits | No | The paper mentions 'CIFAR-10 and CIFAR-100 each consists of 50,000 training images, categorized into 10 and 100 classes, respectively.' and 'GTSRB comprises 39,209 training images', but does not explicitly state the train/validation/test splits, percentages, or per-split counts for these datasets. |
| Hardware Specification | Yes | We evaluate the computational complexity of CausalDiff and DiffPure [33] as well as a discriminative model (WRN-70-16) by measuring the per-sample inference time in seconds (averaged over 100 examples from the CIFAR-10 dataset) on two types of GPUs: NVIDIA A6000 and 4090 (our experiments leverage 4 A6000 GPUs and 4 4090 GPUs). |
| Software Dependencies | No | The paper mentions using 'DDPM [17]', 'WideResNet-70-16 (WRN-70-16)', and the 'Adam optimizer', but does not specify version numbers for these or other software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Both the pretraining and joint training phases utilize a learning rate of 1e-4 and a batch size of 128. For simplicity, we follow the setting of w_t = 1 [17]. We set α = 1, γ = 1e-2, η = 1e-5, λ = 1e-2 as the weights for the loss function in Eq. (7). |
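For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is illustrative only: the field names (`CausalDiffConfig`, `learning_rate`, etc.) are assumptions, not identifiers from the paper or its repository; only the values come from the reported setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalDiffConfig:
    """Hedged sketch of the reported CausalDiff training setup.

    Field names are illustrative; only the values are taken from the
    paper's Experiment Setup description.
    """
    learning_rate: float = 1e-4  # used in both pretraining and joint training
    batch_size: int = 128
    w_t: float = 1.0             # diffusion loss weight, following DDPM [17]
    alpha: float = 1.0           # loss weights for Eq. (7)
    gamma: float = 1e-2
    eta: float = 1e-5
    lam: float = 1e-2


cfg = CausalDiffConfig()
```

Freezing the dataclass keeps the reported values immutable, which is convenient when the same configuration is shared between the pretraining and joint-training phases.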
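The timing protocol in the Hardware Specification row (per-sample inference time averaged over 100 examples) can be sketched as a small helper. `mean_inference_time` and `model_fn` are hypothetical names introduced here for illustration; the paper does not publish its benchmarking code.

```python
import time


def mean_inference_time(model_fn, samples):
    """Average per-sample wall-clock inference time in seconds.

    Mirrors the protocol described in the report: time one forward
    call per sample and average over the set of examples (100
    CIFAR-10 images in the paper's measurement).
    """
    total = 0.0
    for x in samples:
        start = time.perf_counter()
        model_fn(x)  # stand-in for a CausalDiff / DiffPure / WRN-70-16 forward pass
        total += time.perf_counter() - start
    return total / len(samples)
```

On a GPU framework, each `model_fn` call would additionally need a device synchronization before reading the clock, since kernel launches are asynchronous.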