A Closer Look at Backdoor Attacks on CLIP

Authors: Shuo He, Zhifang Zhang, Feng Liu, Roy Ka-Wei Lee, Bo An, Lei Feng

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a comprehensive empirical study of how backdoor attacks affect CLIP by analyzing the representations of backdoored images. Specifically, following the methodology of representation decomposition, an image representation can be decomposed into a sum of representations across individual image patches, attention heads (AHs), and multi-layer perceptrons (MLPs) in different model layers. By examining the effect of backdoor attacks on these model components, we arrive at the following empirical findings. (1) Different backdoor attacks infect different model components: local patch-based backdoor attacks mainly affect AHs, while global perturbation-based backdoor attacks mainly affect MLPs. (2) Infected AHs are concentrated in the last layer, while infected MLPs are spread across several late layers. (3) Not all AHs in the last layer are infected, and some AHs still maintain their original property-specific roles (e.g., color and location). These observations motivate us to defend against backdoor attacks at inference time by detecting infected AHs and either repairing their representations or filtering out backdoor samples with too many infected AHs. Experimental results validate our empirical findings and demonstrate the effectiveness of the defense methods.
Researcher Affiliation | Collaboration | 1 Nanyang Technological University; 2 Southeast University; 3 University of Melbourne; 4 Singapore University of Technology and Design; 5 Idealism Technology (Beijing). Correspondence to: Lei Feng <EMAIL>.
Pseudocode | Yes | The pseudo-code of our methods is shown in Appendix C. Algorithm 1: Our methods of repairing representations or filtering backdoor samples.
Open Source Code | No | The paper does not provide an explicit statement or a direct link to the source code for its methodology. While it mentions open-sourced tools that were used (e.g., BadNet, BadCLIP, CleanCLIP, STRIP, SCALE-UP, TeCo, TEXTSPAN) and the open-source CLIP model, it does not state that *their own* implementation code is released.
Open Datasets | Yes | We evaluate our methods on ImageNet-1K (Russakovsky et al., 2015), Caltech-101 (Fei-Fei et al., 2004), and Oxford Pets (Parkhi et al., 2012). More details of these datasets are provided in Appendix B.1. Besides, we select clean image-text pairs from CC3M (Sharma et al., 2018) to fine-tune the backdoored CLIP.
Dataset Splits | Yes | We conduct this experiment on the ImageNet-1K validation dataset, using 20% of the images as the clean validation data. ... we select 500K image-text pairs from CC3M (Sharma et al., 2018) and poison 1,500 of them using the strategies of five backdoor attacks. ... In the proposed method, the value of ϵ is set to 0.0025, 0.002, and 0.001 on ImageNet-1K, Caltech-101, and Oxford Pets, respectively. The value of ζ is set to 5. The proportion of clean validation data is set to 0.2. ... For CleanCLIP... we randomly selected 100,000 image-text pairs from CC3M as the fine-tuning data. The learning rates were set to 5e-6 for BadNet, Blended, and BadCLIP, and 3e-6 for Blended and ISSBA on ImageNet-1K. The batch size was 64. The number of fine-tuning epochs was 10.
Hardware Specification | No | The paper mentions using "ViT-B/32 as the backbone", which refers to a model architecture, and notes "Due to the limited storage and computational resources", but no specific hardware details such as GPU models, CPU types, or memory amounts are provided.
Software Dependencies | No | The paper mentions specific optimizers (AdamW) and models (ViT-B/32, CLIP), but it does not specify any software libraries or programming languages with version numbers (e.g., Python, PyTorch/TensorFlow, or CUDA versions).
Experiment Setup | Yes | For these backdoor attacks, we utilize the AdamW optimizer with an initial learning rate of 1e-5, applying cosine scheduling over a total of five epochs with a batch size of 128. ... In the proposed method, the value of ϵ is set to 0.0025, 0.002, and 0.001 on ImageNet-1K, Caltech-101, and Oxford Pets, respectively. The value of ζ is set to 5. The proportion of clean validation data is set to 0.2. We use ViT-B/32 as the backbone. ... For CleanCLIP... The learning rates were set to 5e-6 for BadNet, Blended, and BadCLIP, and 3e-6 for Blended and ISSBA on ImageNet-1K. The batch size was 64. The number of fine-tuning epochs was 10.
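The representation-decomposition methodology quoted under Research Type can be illustrated with a minimal sketch. This is not the paper's code: the contribution tensors are random placeholders standing in for per-component measurements that would in practice be recorded during a forward pass (e.g., via hooks); ViT-B/32 dimensions (12 layers, 12 heads, width 768) are assumed for concreteness.

```python
import torch

# Assumed ViT-B/32 dimensions: L layers, H attention heads, model width d.
L_LAYERS, H_HEADS, DIM = 12, 12, 768

# Placeholder per-component contributions to the final image representation.
# In a real decomposition these would be measured from the model itself.
ah_contrib = torch.randn(L_LAYERS, H_HEADS, DIM)  # attention-head (AH) terms
mlp_contrib = torch.randn(L_LAYERS, DIM)          # MLP terms

# The decomposition writes the image representation as a sum over components.
image_rep = ah_contrib.sum(dim=(0, 1)) + mlp_contrib.sum(dim=0)

# Individual components can then be inspected in isolation, e.g., the
# contribution of head 3 in the last layer (where infected AHs concentrate).
last_layer_head3 = ah_contrib[-1, 3]
print(image_rep.shape)  # torch.Size([768])
```

Inspecting per-head terms this way is what makes it possible to ask which specific AHs a backdoor has infected.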
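Algorithm 1 itself is only summarized in the quoted text (Appendix C of the paper); the following is a hypothetical sketch of the repair-or-filter idea using the reported hyperparameters ϵ (a deviation threshold) and ζ = 5 (an infected-head budget). The detection criterion used here, ℓ2 deviation of each head's contribution from clean-validation means, is an assumption for illustration, not the paper's exact rule.

```python
import torch

def defend(ah_contrib, clean_mean, eps=0.0025, zeta=5):
    """Hypothetical sketch of the repair-or-filter defense.

    ah_contrib: (H, d) last-layer attention-head contributions of one image.
    clean_mean: (H, d) mean head contributions estimated on clean validation data.
    eps: deviation threshold for flagging a head as infected (assumed criterion).
    zeta: maximum number of infected heads before the sample is filtered.
    """
    # Flag heads whose contribution drifts too far from the clean statistics.
    deviation = torch.linalg.norm(ah_contrib - clean_mean, dim=-1)
    infected = deviation > eps

    # Filter: too many infected heads suggests a backdoor sample.
    if infected.sum().item() > zeta:
        return None, infected

    # Repair: overwrite infected head contributions with the clean mean.
    repaired = torch.where(infected.unsqueeze(-1), clean_mean, ah_contrib)
    return repaired, infected
```

A clean image whose head contributions match the validation statistics passes through unchanged, while an image that perturbs more than ζ heads is rejected outright.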
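The quoted attack-training setup (AdamW at 1e-5 with cosine scheduling over five epochs, batch size 128) maps directly onto standard PyTorch components. In this sketch the model and the steps-per-epoch count are placeholders; the real backbone would be the CLIP ViT-B/32 and the schedule length would follow the actual dataloader.

```python
import torch

# Placeholder for the real CLIP ViT-B/32 backbone being fine-tuned.
model = torch.nn.Linear(8, 8)

# AdamW with the reported initial learning rate of 1e-5.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Cosine scheduling over five epochs; steps-per-epoch is an assumed value
# (in practice, len(dataloader) with batch size 128).
EPOCHS, STEPS_PER_EPOCH = 5, 100
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS * STEPS_PER_EPOCH)
```

Stepping the scheduler once per optimizer step anneals the learning rate from 1e-5 down to zero by the end of the fifth epoch.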