DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models

Authors: Yang Sui, Huy Phan, Jinqi Xiao, Tianfang Zhang, Zijie Tang, Cong Shi, Yan Wang, Yingying Chen, Bo Yuan

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluations across various diffusion models and datasets demonstrate the effectiveness of the proposed trigger detection and detection-evading attack strategy. For trigger detection, our distribution discrepancy-based solution can achieve a 100% detection rate for the Trojan triggers used in the existing works. For evading trigger detection, our proposed stealthy trigger design approach performs end-to-end learning to make the distribution of poisoned noise input approach that of benign noise, enabling nearly 100% detection pass rate with very high attack and benign performance for the backdoored diffusion models.
Researcher Affiliation | Academia | ¹Rutgers University, ²Temple University, ³New Jersey Institute of Technology
Pseudocode | Yes | Algorithm 1: The Proposed 2-Step Training Scheme
Open Source Code | No | The paper mentions that "The pre-trained models of CIFAR-10 and CelebA datasets are from repository pesser/pytorch_diffusion and ermongroup/ddim". This refers to third-party pre-trained models, not the authors' own implementation code. No explicit statement or link to the authors' source code is provided.
Open Datasets | Yes | We evaluate the performance of the proposed detection method and the detection-evading trigger for DDPM Ho et al. (2020) and DDIM Song et al. (2020a) diffusion models on the CIFAR-10 (32×32) Krizhevsky et al. (2009) and CelebA (64×64) Liu et al. (2015) datasets.
Dataset Splits | Yes | The benign performance is evaluated on 50K samples by measuring Fréchet Inception Distance (FID) Heusel et al. (2017), which reveals the similarity between two sets of images; a lower FID score indicates higher quality of the generated images. The attack performance is evaluated on 10K samples in terms of Attack Success Rate (ASR).
Hardware Specification | Yes | All the experiments are conducted on NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper mentions the use of the "Adam optimizer Kingma & Ba (2015)" but does not specify version numbers for any software libraries, programming languages, or frameworks used for implementation (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | When training the detection-evading trigger (Phase 1), an embedding layer with the same shape as the input noise (3×32×32 for the CIFAR-10 dataset, 3×64×64 for the CelebA dataset) is used for trigger learning with γ = 0.6. The threshold is set to ϕ_Th = 0.01 for CIFAR-10 and ϕ_Th = 0.005 for CelebA. The training process adopts the Adam optimizer Kingma & Ba (2015) with 50k training steps, a 2×10^-3 learning rate, and a scaling factor τ of 10^4. During the subsequent training of the backdoored diffusion model (Phase 2), the standard training procedure is followed, using the Adam optimizer, a 2×10^-4 learning rate, a batch size of 256, and 100k training steps. The number of bins is set to 50 for both the regular histogram h(·) and the differentiable histogram h_d(·). The smoothness parameter of the Sigmoid function in h_d(·) is set to ω = 6 to approximate the step function of the hard histogram h(·).
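The differentiable histogram h_d(·) mentioned in the setup can be sketched as below. This is an illustrative NumPy implementation, not the authors' code: it assumes each hard bin indicator is replaced by a difference of two sigmoids with smoothness ω (here ω = 6 and 50 bins, as in the paper), and the bin range [-4, 4] is a hypothetical choice for roughly standard-normal noise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_histogram(x, bins=50, lo=-4.0, hi=4.0, omega=6.0):
    """Differentiable histogram h_d: the hard indicator 1[left <= x < right]
    for each bin is approximated by sigmoid(omega*(x-left)) - sigmoid(omega*(x-right)).
    As omega grows, this approaches the hard histogram h."""
    edges = np.linspace(lo, hi, bins + 1)
    left, right = edges[:-1], edges[1:]
    # (n_samples, bins) soft membership of each sample in each bin
    member = sigmoid(omega * (x[:, None] - left)) - sigmoid(omega * (x[:, None] - right))
    counts = member.sum(axis=0)
    return counts / counts.sum()  # normalize to a probability mass function

rng = np.random.default_rng(0)
noise = rng.standard_normal(10_000)            # stand-in for benign Gaussian noise input
hd = soft_histogram(noise, bins=50, omega=6.0)
h, _ = np.histogram(noise, bins=np.linspace(-4.0, 4.0, 51))
h = h / h.sum()
print(np.abs(hd - h).max())                    # soft and hard histograms roughly agree
```

Because the sigmoid membership is smooth in x, gradients can flow from a histogram-based distribution-discrepancy loss back to the trigger parameters, which a hard histogram's step function would block.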