DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models
Authors: Yang Sui, Huy Phan, Jinqi Xiao, Tianfang Zhang, Zijie Tang, Cong Shi, Yan Wang, Yingying Chen, Bo Yuan
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluations across various diffusion models and datasets demonstrate the effectiveness of the proposed trigger detection and detection-evading attack strategy. For trigger detection, our distribution discrepancy-based solution can achieve a 100% detection rate for the Trojan triggers used in the existing works. For evading trigger detection, our proposed stealthy trigger design approach performs end-to-end learning to make the distribution of poisoned noise input approach that of benign noise, enabling nearly 100% detection pass rate with very high attack and benign performance for the backdoored diffusion models. |
| Researcher Affiliation | Academia | ¹Rutgers University, ²Temple University, ³New Jersey Institute of Technology |
| Pseudocode | Yes | Algorithm 1 The Proposed 2-Step Training Scheme |
| Open Source Code | No | The paper mentions that "The pre-trained models of CIFAR-10 and CelebA datasets are from repository pesser/pytorch_diffusion and ermongroup/ddim". These are third-party pre-trained models, not the authors' own implementation; no explicit statement or link to the authors' source code is provided. |
| Open Datasets | Yes | We evaluate the performance of the proposed detection method and the detection-evading trigger for DDPM Ho et al. (2020) and DDIM Song et al. (2020a) diffusion models on CIFAR-10 (32×32) Krizhevsky et al. (2009) and CelebA (64×64) Liu et al. (2015) datasets. |
| Dataset Splits | Yes | The benign performance is evaluated on 50K samples via measuring Fréchet Inception Distance (FID) Heusel et al. (2017), which reveals the similarity between two sets of images. A lower FID score indicates the higher quality of the generated images. The attack performance is evaluated on 10K samples in terms of Attack Success Rate (ASR). |
| Hardware Specification | Yes | All the experiments are conducted on NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions the use of "Adam optimizer Kingma & Ba (2015)" but does not specify version numbers for any software libraries, programming languages, or frameworks used for implementation (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | When training the detection-evading trigger (Phase 1), an Embedding layer with the same shape as the input noise (3×32×32 for the CIFAR-10 dataset, and 3×64×64 for the CelebA dataset) is used for trigger learning with γ = 0.6. The threshold is set as ϕ_Th = 0.01 and ϕ_Th = 0.005 for the CIFAR-10 and CelebA datasets, respectively. The training process adopts the Adam optimizer Kingma & Ba (2015) with 50k training steps, a 2×10⁻³ learning rate, and scaling factor τ = 10⁴. After that, during the training procedure for the backdoored diffusion model (Phase 2), we follow the standard training procedure using the Adam optimizer, a 2×10⁻⁴ learning rate, batch size 256, and 100k training steps. Also, the number of bins is set as 50 for both the regular histogram h(·) and the differentiable histogram h_d(·). The smoothness parameter is set as ω = 6 for the Sigmoid function in h_d(·) to approximate the step function and histogram h(·). |
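The 50-bin differentiable histogram h_d(·) in the setup above replaces the hard step function of a regular histogram h(·) with a Sigmoid of smoothness ω = 6, so a noise-distribution discrepancy can be optimized end-to-end. A minimal NumPy sketch of this idea follows; the bin range, the KL-divergence discrepancy measure, and the toy shifted "trigger" are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def soft_histogram(x, bins=50, lo=-4.0, hi=4.0, omega=6.0):
    """Differentiable histogram h_d(.): each hard bin indicator becomes a
    difference of two sigmoids, sigma(omega*(x - left)) - sigma(omega*(x - right)).
    bins=50 and omega=6 follow the reported setup; the range [lo, hi] is an
    assumption for (approximately) standard-normal noise inputs."""
    edges = np.linspace(lo, hi, bins + 1)
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    left = sig(omega * (x[:, None] - edges[None, :-1]))   # soft "x >= left edge"
    right = sig(omega * (x[:, None] - edges[None, 1:]))   # soft "x >= right edge"
    counts = (left - right).sum(axis=0)
    return counts / counts.sum()  # normalize to a probability mass function

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two histograms; a generic discrepancy, used here
    only as a stand-in for the paper's distribution-distance measure."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
ref = soft_histogram(rng.standard_normal(10_000))         # benign reference
benign = rng.standard_normal(10_000)                      # fresh clean noise
poisoned = rng.standard_normal(10_000) + 0.5              # toy shifted "trigger"
d_benign = kl_divergence(soft_histogram(benign), ref)
d_poison = kl_divergence(soft_histogram(poisoned), ref)   # clearly larger
```

Because every bin count is a smooth function of the inputs, gradients flow from the discrepancy back into the trigger parameters, which is what makes the Phase-1 end-to-end trigger learning possible; a hard histogram would have zero gradient almost everywhere.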