Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective
Authors: Sahil Verma, Gantavya Bhatt, Avi Schwarzschild, Soumye Singhal, Arnav Mohanty Das, Chirag Shah, John P Dickerson, Pin-Yu Chen, Jeff Bilmes
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we demonstrate that the efficacy of CleanCLIP in mitigating backdoors is highly dependent on the particular objective used during model pre-training. We observe that adding a self-supervised objective to pre-training, which leads to higher zero-shot classification performance, correlates with backdoor behaviors that are harder to remove. We show this by training multimodal models on two large datasets consisting of 3 million (CC3M) and 6 million (CC6M) datapoints, under various pre-training objectives, followed by poison removal using CleanCLIP. |
| Researcher Affiliation | Collaboration | Sahil Verma (University of Washington); Gantavya Bhatt (University of Washington); Avi Schwarzschild (Carnegie Mellon University); Soumye Singhal (Nvidia); Arnav Das (University of Washington); Chirag Shah (University of Washington); John P Dickerson (University of Maryland); Pin-Yu Chen (IBM Research); Jeff Bilmes (University of Washington) |
| Pseudocode | No | The paper describes methodologies in prose and mathematical formulations, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is open-sourced at https://github.com/vsahil/attack-cleanclip. |
| Open Datasets | Yes | 1. Conceptual Captions 3M (CC3M) (Sharma et al., 2018): This dataset has 3M image-text paired datapoints. 2. Conceptual Captions 6M (CC6M): This dataset has 6M image-text paired datapoints from the CC12M dataset (Changpinyo et al., 2021)... The models are evaluated for their Top-1 zero-shot accuracy on the Imagenet-1K validation set (referred to as Imagenet hereafter). |
| Dataset Splits | Yes | Using the same settings as CleanCLIP, we introduce the trigger in 1,500 randomly sampled datapoints for the CC3M dataset and 3,000 randomly sampled datapoints for the CC6M dataset (a mere 0.05% of the training datapoints). ... We clean the poisoned model by finetuning it on 100K guaranteed-poison-free image-text pairs for 20 epochs... The models are evaluated for their Top-1 zero-shot accuracy on the Imagenet-1K validation set... |
| Hardware Specification | Yes | We train models for 64 epochs using 8 Nvidia A100 GPUs. |
| Software Dependencies | No | The paper mentions using the 'AdamW optimizer' and 'FAISS', but does not specify version numbers for these or other key software components, such as programming languages or deep learning frameworks. |
| Experiment Setup | Yes | We train models for 64 epochs using 8 Nvidia A100 GPUs. An initial learning rate of 1e-3 with cosine scheduling is used when training from scratch and 5e-7 when finetuning from a checkpoint. We use the AdamW optimizer with 10,000 warmup steps (Loshchilov & Hutter, 2017). Models trained with L^pre_MMCL use a batch size of 256, whereas models trained with L^pre_MMCL + L^pre_SSL use a batch size of 128. Please refer to Appendix A for the loss dynamics. ... We clean the poisoned model by finetuning it on 100K guaranteed-poison-free image-text pairs for 20 epochs using a batch size of 128 and AdamW as the optimizer. We perform an extensive hyperparameter search over various learning rates (as many as 8 in some experiments and 14 in others, all with cosine scheduling and 50 warmup steps). Please refer to Appendix D for the set of learning rates explored for the cleaning procedure. |
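The learning-rate recipe quoted above (warmup steps followed by cosine scheduling, e.g. 10,000 warmup steps for pre-training and 50 for the cleaning runs) can be sketched in plain Python. The function name `lr_at_step` and the linear warmup shape are illustrative assumptions; the paper specifies only the optimizer (AdamW), the warmup step counts, and cosine scheduling.

```python
import math

def lr_at_step(step, total_steps, base_lr, warmup_steps):
    """Learning rate at a given optimizer step: linear warmup from 0
    to base_lr over `warmup_steps`, then cosine decay toward 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# Shape of a cleaning run: 50 warmup steps, then cosine decay.
# The base learning rate here is illustrative; the paper sweeps
# several values (see its Appendix D).
schedule = [lr_at_step(s, total_steps=1000, base_lr=1e-4, warmup_steps=50)
            for s in range(1000)]
```

The schedule peaks at `base_lr` exactly when warmup ends and decays smoothly afterward, which is the standard pairing with AdamW in contrastive pre-training setups.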