Fine-grained Prompt Screening: Defending Against Backdoor Attack on Text-to-Image Diffusion Models

Authors: Yiran Xu, Nan Zhong, Guobiao Li, Anda Cheng, Yinggui Wang, Zhenxing Qian, Xinpeng Zhang

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. 5 Experiments. 5.1 Experimental Settings. Attack Baselines. We consider two types of attack methods in our experiments, where Rickrolling [Struppek et al., 2023] and Villan Diffusion [Chou et al., 2024] are universal backdoor attacks, and Personalization [Huang et al., 2024] and Evil Edit [Wang et al., 2024a] are class-specific backdoor attacks. ... Metrics. Following the prior works on backdoor detection [Guan et al., 2024; Wang et al., 2024c], we adopt three popular metrics for evaluating the effectiveness of our detection method: Precision, Recall, and F1 Score. We also report the inference time for each method. ... 5.2 Detection Results. For each backdoor attack method, we train six backdoor models with different triggers and targets. We then evaluate the performance of our detection method on each model. ... 5.4 Ablation Study. Impact of Scaling Threshold t. In this section, we study the effect of different thresholds on detection. Figure 7 presents the average precision, recall, and F1 score of the detector at different threshold values.
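The three detection metrics quoted above (Precision, Recall, F1 Score) follow directly from per-prompt binary decisions. A minimal sketch of how they could be computed for the reported evaluation; the function name and interface are illustrative, not from the paper:

```python
def detection_metrics(y_true, y_pred):
    """Precision, recall, and F1 for binary backdoor-prompt detection.

    y_true, y_pred: iterables of 0 (clean prompt) / 1 (triggered prompt).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, with ground truth [1, 1, 0, 0] and predictions [1, 0, 1, 0], each metric evaluates to 0.5.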
Researcher Affiliation: Collaboration. Yiran Xu1, Nan Zhong1, Guobiao Li1, Anda Cheng2, Yinggui Wang2, Zhenxing Qian1 and Xinpeng Zhang1. 1School of Computer Science, Fudan University; 2Ant Group. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode: No. The paper describes the methodology in Section 4 and illustrates the pipeline in Figure 4, but it does not include a formally structured pseudocode or algorithm block.
Open Source Code: No. The paper does not contain any explicit statement about releasing code, nor does it provide a link to a code repository in the main text or supplementary materials.
Open Datasets: Yes. For the training datasets, we choose CelebA-HQ-Dialog [Jiang et al., 2021] for Villan Diffusion, Pokemon [Pinkney, 2022] for Rickrolling and Evil Edit, and Dreambooth [Ruiz et al., 2023] for Personalization. ... For evaluations, we randomly select 300 clean prompts from the MS COCO 2017 validation dataset [Lin et al., 2014] and construct 300 triggered prompts for each backdoor model. ... First, we utilized a pre-trained T2I diffusion model and randomly selected 1,000 clean prompts from the MSCOCO dataset [Lin et al., 2014].
Dataset Splits: Yes. For evaluations, we randomly select 300 clean prompts from the MS COCO 2017 validation dataset [Lin et al., 2014] and construct 300 triggered prompts for each backdoor model. ... To determine the appropriate threshold, we followed the procedure outlined below. First, we utilized a pre-trained T2I diffusion model and randomly selected 1,000 clean prompts from the MSCOCO dataset [Lin et al., 2014].
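The quoted procedure calibrates the threshold from 1,000 benign prompts. One common way to operationalize this, assuming each clean prompt yields a scalar anomaly score, is to set the threshold at a high quantile of the benign score distribution; this sketch is an assumption about the calibration step, not the paper's stated algorithm:

```python
def estimate_threshold(clean_scores, quantile=0.99):
    """Pick a screening threshold from benign-prompt anomaly scores.

    clean_scores: scores of clean prompts (higher = more suspicious).
    Choosing the given quantile means roughly (1 - quantile) of benign
    prompts would be flagged as false positives.
    """
    ordered = sorted(clean_scores)
    idx = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[idx]
```

With scores 0 through 99 and quantile 0.9, this returns 90, i.e. about 10% of benign prompts would fall above the threshold.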
Hardware Specification: No. The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud configurations) used for running the experiments.
Software Dependencies: No. The paper mentions using 'spaCy' for POS tagging, 'BERT [Kenton and Toutanova, 2019]' for similarity calibration, and 'stable diffusion v1.4 [Ramesh et al., 2022]' as the victim model. However, it does not specify version numbers for spaCy or for any other software libraries or dependencies used in the implementation, which is required for reproducibility.
Experiment Setup: Yes. Defense Settings. The defender has access to a subset of benign samples. We adopt fixed hyperparameters across all attacks and datasets: t = 0.29 (estimated from 1000 benign prompts) and m = 2.
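The fixed hyperparameters t = 0.29 and m = 2 suggest a per-token screening rule. A minimal sketch of one plausible reading, where a prompt is flagged if at least m tokens exceed the threshold t; how per-token scores are actually computed (e.g., via BERT similarity shifts) is the paper's method, and this decision rule is an assumption:

```python
def screen_prompt(token_scores, t=0.29, m=2):
    """Flag a prompt as triggered if at least m tokens score above t.

    token_scores: per-token suspicion scores for one prompt (hypothetical
    interface; the scoring function itself is not reproduced here).
    Returns True if the prompt should be rejected as backdoored.
    """
    return sum(score > t for score in token_scores) >= m
```

For instance, a prompt with token scores [0.1, 0.5, 0.6] has two tokens above 0.29 and is flagged, while [0.1, 0.5] has only one and passes.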