Fine-grained Prompt Screening: Defending Against Backdoor Attack on Text-to-Image Diffusion Models

Authors: Yiran Xu, Nan Zhong, Guobiao Li, Anda Cheng, Yinggui Wang, Zhenxing Qian, Xinpeng Zhang

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. 5 Experiments. 5.1 Experimental Settings. Attack Baselines. We consider two types of attack methods in our experiments, where Rickrolling [Struppek et al., 2023] and Villan Diffusion [Chou et al., 2024] are universal backdoor attacks, and Personalization [Huang et al., 2024] and Evil Edit [Wang et al., 2024a] are class-specific backdoor attacks. ... Metrics. Following the prior works on backdoor detection [Guan et al., 2024; Wang et al., 2024c], we adopt three popular metrics for evaluating the effectiveness of our detection method: Precision, Recall, and F1 Score. We also report the inference time for each method. ... 5.2 Detection Results. For each backdoor attack method, we train six backdoor models with different triggers and targets. We then evaluate the performance of our detection method on each model. ... 5.4 Ablation Study. Impact of Scaling Threshold t. In this section, we study the effect of different thresholds on detection. Figure 7 presents the average precision, recall, and F1 score of the detector at different threshold values.
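The three detection metrics quoted above (Precision, Recall, F1 Score) follow directly from per-prompt binary decisions. A minimal sketch of how they could be computed for the reported evaluation; the function name and interface are illustrative, not from the paper:

```python
def detection_metrics(y_true, y_pred):
    """Precision, recall, and F1 for binary backdoor-prompt detection.

    y_true, y_pred: iterables of 0 (clean prompt) / 1 (triggered prompt).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, with ground truth [1, 1, 0, 0] and predictions [1, 0, 1, 0], each metric evaluates to 0.5.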
Researcher Affiliation: Collaboration. Yiran Xu1, Nan Zhong1, Guobiao Li1, Anda Cheng2, Yinggui Wang2, Zhenxing Qian1 and Xinpeng Zhang1. 1School of Computer Science, Fudan University; 2Ant Group. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode: No. The paper describes the methodology in Section 4 and illustrates the pipeline in Figure 4, but it does not include a formally structured pseudocode or algorithm block.
Open Source Code: No. The paper does not contain any explicit statement about releasing code, nor does it provide a link to a code repository in the main text or supplementary materials.
Open Datasets: Yes. For the training datasets, we choose CelebA-HQ-Dialog [Jiang et al., 2021] for Villan Diffusion, Pokemon [Pinkney, 2022] for Rickrolling and Evil Edit, and Dreambooth [Ruiz et al., 2023] for Personalization. ... For evaluations, we randomly select 300 clean prompts from the MS COCO 2017 validation dataset [Lin et al., 2014] and construct 300 triggered prompts for each backdoor model. ... First, we utilized a pre-trained T2I diffusion model and randomly selected 1,000 clean prompts from the MSCOCO dataset [Lin et al., 2014].
Dataset Splits: Yes. For evaluations, we randomly select 300 clean prompts from the MS COCO 2017 validation dataset [Lin et al., 2014] and construct 300 triggered prompts for each backdoor model. ... To determine the appropriate threshold, we followed the procedure outlined below. First, we utilized a pre-trained T2I diffusion model and randomly selected 1,000 clean prompts from the MSCOCO dataset [Lin et al., 2014].
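The quoted procedure calibrates the threshold from 1,000 benign prompts. One common way to operationalize this, assuming each clean prompt yields a scalar anomaly score, is to set the threshold at a high quantile of the benign score distribution; this sketch is an assumption about the calibration step, not the paper's stated algorithm:

```python
def estimate_threshold(clean_scores, quantile=0.99):
    """Pick a screening threshold from benign-prompt anomaly scores.

    clean_scores: scores of clean prompts (higher = more suspicious).
    Choosing the given quantile means roughly (1 - quantile) of benign
    prompts would be flagged as false positives.
    """
    ordered = sorted(clean_scores)
    idx = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[idx]
```

With scores 0 through 99 and quantile 0.9, this returns 90, i.e. about 10% of benign prompts would fall above the threshold.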
Hardware Specification: No. The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud configurations) used for running the experiments.
Software Dependencies: No. The paper mentions using 'spaCy' for POS tagging, 'BERT [Kenton and Toutanova, 2019]' for similarity calibration, and 'stable diffusion v1.4 [Ramesh et al., 2022]' as the victim model. However, it does not specify version numbers for spaCy or for any other software libraries or dependencies used in the implementation, which is required for reproducibility.
Experiment Setup: Yes. Defense Settings. The defender has access to a subset of benign samples. We adopt fixed hyperparameters across all attacks and datasets: t = 0.29 (estimated from 1000 benign prompts) and m = 2.
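The fixed hyperparameters t = 0.29 and m = 2 suggest a per-token screening rule. A minimal sketch of one plausible reading, where a prompt is flagged if at least m tokens exceed the threshold t; how per-token scores are actually computed (e.g., via BERT similarity shifts) is the paper's method, and this decision rule is an assumption:

```python
def screen_prompt(token_scores, t=0.29, m=2):
    """Flag a prompt as triggered if at least m tokens score above t.

    token_scores: per-token suspicion scores for one prompt (hypothetical
    interface; the scoring function itself is not reproduced here).
    Returns True if the prompt should be rejected as backdoored.
    """
    return sum(score > t for score in token_scores) >= m
```

For instance, a prompt with token scores [0.1, 0.5, 0.6] has two tokens above 0.29 and is flagged, while [0.1, 0.5] has only one and passes.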