Test-Time Multimodal Backdoor Detection by Contrastive Prompting
Authors: Yuwei Niu, Shuo He, Qi Wei, Zongyu Wu, Feng Liu, Lei Feng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate that our proposed BDetCLIP is superior to state-of-the-art backdoor detection methods, in terms of both effectiveness and efficiency. (Abstract) and the entire Section 4 Experiments. |
| Researcher Affiliation | Academia | 1Chongqing University, 2Nanyang Technological University, 3Penn State University, 4University of Melbourne, 5Southeast University. Correspondence to: Lei Feng <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 BDetCLIP |
| Open Source Code | No | The official open-sourced codes for STRIP (Gao et al., 2019) can be found at: https://github.com/garrisongys/STRIP. ...The official open-sourced codes for SCALE-UP (Guo et al., 2023) can be found at: https://github.com/JunfengGo/SCALE-UP. ...The official open-sourced codes for TeCo (Liu et al., 2023) can be found at: https://github.com/CGCL-codes/TeCo. (These are for comparison methods, not the paper's own work.) The paper does not explicitly state that its own code for BDetCLIP is released or provide a link. |
| Open Datasets | Yes | In the experiment, we evaluate BDetCLIP on various downstream classification datasets including ImageNet-1K (Russakovsky et al., 2015), Food-101 (Bossard et al., 2014) and Caltech-101 (Fei-Fei et al., 2004). ... Besides, we select target backdoored samples from CC3M (Sharma et al., 2018), which is a popular multimodal pre-training dataset. |
| Dataset Splits | Yes | In our experiment, we utilized the validation set of ImageNet-1K (Russakovsky et al., 2015), along with the test sets of Food-101 (Bossard et al., 2014) and Caltech-101 (Fei-Fei et al., 2004). By using a fixed backdoor ratio (0.3) on different downstream datasets in the evaluation, there are 15,000 (out of 50,000) backdoored images on ImageNet-1K, 7,575 (out of 25,250) backdoored images on Food-101, and 740 (out of 2,465) backdoored images on Caltech-101. |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA 3090 GPUs. |
| Software Dependencies | Yes | Specifically, we first prompt GPT-4 (Achiam et al., 2023) to generate class-related (or class-perturbed random) description texts... Also, using open-source models (e.g., LLaMA3-8B (Dubey et al., 2024) and Mistral-7B-Instruct-v0.2 (Jiang et al., 2023)) as alternatives. |
| Experiment Setup | Yes | We finetune the pretrained model for 5 epochs with an initial learning rate of 1e-6 with cosine scheduling and 50 warmup steps and use AdamW as the optimizer. ... We trained for 64 epochs with a batch size of 128, an initial learning rate of 0.0005 for cosine scheduling, and 10000 warm-up steps for the AdamW optimizer. |
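The fine-tuning recipe quoted above (AdamW with linear warmup followed by cosine decay) follows a common warmup-then-cosine pattern. As a reproduction aid, here is a minimal pure-Python sketch of the resulting learning-rate curve; the total step count is assumed for illustration, since the paper reports epochs rather than steps, and the exact schedule implementation the authors used is not specified.

```python
import math

def lr_at_step(step, total_steps, base_lr=1e-6, warmup_steps=50):
    """Linear warmup to base_lr over warmup_steps, then cosine decay to 0.

    Mirrors the common warmup-plus-cosine schedule (e.g. Hugging Face's
    get_cosine_schedule_with_warmup); the paper's own code may differ.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Illustrative: 5 epochs at ~1000 optimizer steps per epoch (assumed)
total = 5 * 1000
print(lr_at_step(0, total))      # start of warmup: 0.0
print(lr_at_step(50, total))     # peak after warmup: 1e-6
print(lr_at_step(total, total))  # end of cosine decay: ~0.0
```

This schedule peaks at the reported initial learning rate of 1e-6 immediately after the 50 warmup steps and decays to zero over the remaining steps; the second quoted setup (lr 0.0005, 10,000 warmup steps) would use the same curve with those parameters.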