Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective
Authors: Yiming Liu, Kezhao Liu, Yao Xiao, ZiYi Dong, Xiaogang Xu, Pengxu Wei, Liang Lin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, through empirical analysis, we propose that the intrinsic stochasticity in the DBP process is the primary factor driving robustness. To test this hypothesis, we introduce a novel Deterministic White-Box (DW-box) setting to assess robustness in the absence of stochasticity, and we analyze attack trajectories and loss landscapes. Our results suggest that DBP models primarily rely on stochasticity to avoid effective attack directions, while their ability to purify adversarial perturbations may be limited. To further enhance the robustness of DBP models, we propose Adversarial Denoising Diffusion Training (ADDT), which incorporates classifier-guided adversarial perturbations into the diffusion training process, thereby strengthening the model's ability to purify adversarial perturbations. Additionally, we propose Rank-Based Gaussian Mapping (RBGM) to improve the compatibility of perturbations with diffusion models. Experimental results validate the effectiveness of ADDT. In conclusion, our study suggests that future research on DBP can benefit from a clearer distinction between stochasticity-driven and purification-driven robustness. |
| Researcher Affiliation | Academia | Yiming Liu1, Kezhao Liu1, Yao Xiao1, Ziyi Dong1, Xiaogang Xu2, Pengxu Wei1,3, Liang Lin1,3. 1Sun Yat-Sen University, 2Chinese University of Hong Kong, 3Peng Cheng Laboratory |
| Pseudocode | Yes | Figure 4 provides an overview of ADDT, and pseudocode is available in Appendix G. The following subsections detail the components of ADDT. |
| Open Source Code | Yes | The code was developed independently by two individuals and mutually verified, with consistent results achieved through independent training and testing. The open-source code is available at https://github.com/LYMDLUT/ADDT. |
| Open Datasets | Yes | Classifier. We train a WideResNet-28-10 model for 200 epochs following the methods in Yoon et al. (2021); Wang et al. (2022), achieving an accuracy of 95.12% on CIFAR-10 and 76.66% on the CIFAR-100 dataset. (...) For Tiny-ImageNet, we trained the diffusion model from scratch for 200 epochs and then fine-tuned it with ADDT for 50 epochs, using a pretrained WRN-28-10 classifier for guidance. For ImageNet-1k, the diffusion model was trained from scratch for 12 epochs, followed by 8 epochs of ADDT fine-tuning, with guidance from a pretrained ResNet-101 classifier. |
| Dataset Splits | Yes | Due to the high computational cost of EoT attacks, we evaluate the models on the first 1024 images from the CIFAR-10 and CIFAR-100 datasets. |
| Hardware Specification | Yes | Fine-tuning DDPM and DDIM models using ADDT to achieve near-optimal performance requires 50 epochs and approximately 12 hours of training on 4 NVIDIA GeForce RTX 2080 Ti GPUs. This efficiency matches that of traditional adversarial training approaches and is notably faster than recent adversarial training techniques that utilize diffusion models for dataset augmentation (Wang et al., 2023). However, testing DP-DDPM and DP-DDIM involves significant computational expense due to the use of Expectation over Transformation (EoT). For instance, validating 1,024 images on the CIFAR-10/CIFAR-100 datasets takes approximately 5 hours on the same GPU configuration. (...) For instance, our evaluation with PGD200+EoT20 on 1,024 images at a resolution of 224×224 takes approximately 7 days of computation on 8 NVIDIA RTX 4090 GPUs. |
| Software Dependencies | No | For the CIFAR-10 dataset, we utilize the pre-trained exponential moving average (EMA) diffusion model developed by Ho et al. (2020) (converted to Huggingface Diffusers format by Fang et al. (2023)). |
| Experiment Setup | Yes | Classifier. We train a WideResNet-28-10 model for 200 epochs following the methods in Yoon et al. (2021); Wang et al. (2022), achieving an accuracy of 95.12% on CIFAR-10 and 76.66% on the CIFAR-100 dataset. (...) DBP timestep. For the diffusion forward process, we adopt the same timestep settings as DiffPure (Nie et al., 2022). In continuous-time models, such as the VP-SDE (DDPM++) variant, with the forward time parameter 0 ≤ t ≤ 1, we set t = 0.1, balancing noise introduction and computational efficiency. For discrete-time models, such as DDPM and DDIM, where t = 0, 1, ..., T, we similarly set the timestep to t = 0.1T. (...) Robustness evaluation. We assess model robustness using the PGD20+EoT10 attack (Athalye et al., 2018b). For ℓ∞-norm attacks, we set the step size α = 2/255 and the maximum perturbation ϵ = 8/255; for ℓ2-norm attacks, we use α = 0.1 and ϵ = 0.5. (...) ADDT. ADDT fine-tuning is guided by the pre-trained WideResNet-28-10 classifier. For the CIFAR-100 dataset, we fine-tune the CIFAR-10 diffusion model for 100 epochs. In CGPO, we set the hyperparameters to λunit = 0.03, λmin = 0, and λmax = 0.3, and iteratively refine the perturbation δ for 5 steps. |
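The "DBP timestep" setting quoted above (t = 0.1 for continuous-time models, t = 0.1T for discrete-time DDPM/DDIM) corresponds to the standard forward-noising step of a diffusion model. A minimal PyTorch sketch of that step, assuming the common linear beta schedule (1e-4 to 0.02 over T = 1000 steps) rather than the authors' exact code:

```python
import torch

T = 1000                     # discrete diffusion steps (assumed, standard DDPM)
t_star = int(0.1 * T)        # DBP noising depth: t = 0.1*T, as quoted above
betas = torch.linspace(1e-4, 0.02, T)          # linear beta schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product of alphas

def dbp_forward(x0: torch.Tensor, t: int = t_star) -> torch.Tensor:
    """Diffuse a clean (or adversarial) image x0 to timestep t in one shot:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I).
    Purification then runs the reverse process from x_t back to t = 0."""
    a_bar = alphas_bar[t - 1]
    noise = torch.randn_like(x0)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
```

Because the injected noise is resampled on every call, two purifications of the same input follow different trajectories; this is the stochasticity the paper's DW-box setting removes.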
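The "Robustness evaluation" row quotes a PGD20+EoT10 attack with α = 2/255 and ϵ = 8/255 under the ℓ∞ norm. A minimal PyTorch sketch of that evaluation loop, assuming `model` is a callable that runs the full stochastic purify-then-classify pipeline and returns logits (the function name and wrapper are illustrative, not the authors' implementation):

```python
import torch

def pgd_eot_attack(model, x, y, eps=8/255, alpha=2/255,
                   pgd_steps=20, eot_samples=10):
    """L-inf PGD with Expectation over Transformation (EoT): the gradient is
    averaged over several stochastic forward passes so the attack direction is
    not dominated by any single random purification trajectory."""
    x_adv = x.clone().detach()
    for _ in range(pgd_steps):
        x_adv.requires_grad_(True)
        grad = torch.zeros_like(x_adv)
        for _ in range(eot_samples):
            # each forward pass re-samples the purification noise
            loss = torch.nn.functional.cross_entropy(model(x_adv), y)
            grad = grad + torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()           # ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)      # project to eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                 # valid image range
        x_adv = x_adv.detach()
    return x_adv
```

PGD200+EoT20 (the ImageNet-scale setting quoted in the hardware row) is the same loop with 200 outer steps and 20 noise samples per gradient, which is why that evaluation is roughly two orders of magnitude more expensive.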