DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models
Authors: Yang Sui, Huy Phan, Jinqi Xiao, Tianfang Zhang, Zijie Tang, Cong Shi, Yan Wang, Yingying Chen, Bo Yuan
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluations across various diffusion models and datasets demonstrate the effectiveness of the proposed trigger detection and detection-evading attack strategy. For trigger detection, our distribution discrepancy-based solution can achieve a 100% detection rate for the Trojan triggers used in the existing works. For evading trigger detection, our proposed stealthy trigger design approach performs end-to-end learning to make the distribution of poisoned noise input approach that of benign noise, enabling nearly 100% detection pass rate with very high attack and benign performance for the backdoored diffusion models. |
| Researcher Affiliation | Academia | ¹Rutgers University, ²Temple University, ³New Jersey Institute of Technology |
| Pseudocode | Yes | Algorithm 1 The Proposed 2-Step Training Scheme |
| Open Source Code | No | The paper mentions that "The pre-trained models of CIFAR-10 and CelebA datasets are from repository pesser/pytorch_diffusion and ermongroup/ddim". These are third-party pre-trained models, not the authors' own implementation; no explicit statement or link to the authors' source code is provided. |
| Open Datasets | Yes | We evaluate the performance of the proposed detection method and the detection-evading trigger for DDPM Ho et al. (2020) and DDIM Song et al. (2020a) diffusion models on CIFAR-10 (32×32) Krizhevsky et al. (2009) and CelebA (64×64) Liu et al. (2015) datasets. |
| Dataset Splits | Yes | The benign performance is evaluated on 50K samples via measuring Fréchet Inception Distance (FID) Heusel et al. (2017), which reveals the similarity between two sets of images. A lower FID score indicates the higher quality of the generated images. The attack performance is evaluated on 10K samples in terms of Attack Success Rate (ASR). |
| Hardware Specification | Yes | All the experiments are conducted on NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions the use of "Adam optimizer Kingma & Ba (2015)" but does not specify version numbers for any software libraries, programming languages, or frameworks used for implementation (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | When training the detection-evading trigger (Phase 1), an Embedding layer with the same shape as the input noise (3×32×32 for the CIFAR-10 dataset, and 3×64×64 for the CelebA dataset) is used for trigger learning with γ = 0.6. The threshold is set as ϕ_Th = 0.01 and ϕ_Th = 0.005 for the CIFAR-10 and CelebA datasets, respectively. The training process adopts the Adam optimizer Kingma & Ba (2015) with 50k training steps, a 2×10⁻³ learning rate, and scaling factor τ = 10⁴. After that, during the training procedure for the backdoored diffusion model (Phase 2), we follow the standard training procedure using the Adam optimizer, a 2×10⁻⁴ learning rate, batch size 256, and 100k training steps. Also, the number of bins is set as 50 for both the regular histogram h(·) and the differentiable histogram h_d(·). The smoothness parameter is set as ω = 6 for the Sigmoid function in h_d(·) to approximate the step function and histogram h(·). |
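The 50-bin differentiable histogram h_d(·) in the setup above replaces the hard step function of a regular histogram h(·) with a Sigmoid of smoothness ω = 6, so a noise-distribution discrepancy can be optimized end-to-end. A minimal NumPy sketch of this idea follows; the bin range, the KL-divergence discrepancy measure, and the toy shifted "trigger" are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def soft_histogram(x, bins=50, lo=-4.0, hi=4.0, omega=6.0):
    """Differentiable histogram h_d(.): each hard bin indicator becomes a
    difference of two sigmoids, sigma(omega*(x - left)) - sigma(omega*(x - right)).
    bins=50 and omega=6 follow the reported setup; the range [lo, hi] is an
    assumption for (approximately) standard-normal noise inputs."""
    edges = np.linspace(lo, hi, bins + 1)
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    left = sig(omega * (x[:, None] - edges[None, :-1]))   # soft "x >= left edge"
    right = sig(omega * (x[:, None] - edges[None, 1:]))   # soft "x >= right edge"
    counts = (left - right).sum(axis=0)
    return counts / counts.sum()  # normalize to a probability mass function

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two histograms; a generic discrepancy, used here
    only as a stand-in for the paper's distribution-distance measure."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
ref = soft_histogram(rng.standard_normal(10_000))         # benign reference
benign = rng.standard_normal(10_000)                      # fresh clean noise
poisoned = rng.standard_normal(10_000) + 0.5              # toy shifted "trigger"
d_benign = kl_divergence(soft_histogram(benign), ref)
d_poison = kl_divergence(soft_histogram(poisoned), ref)   # clearly larger
```

Because every bin count is a smooth function of the inputs, gradients flow from the discrepancy back into the trigger parameters, which is what makes the Phase-1 end-to-end trigger learning possible; a hard histogram would have zero gradient almost everywhere.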