Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective
Authors: Sahil Verma, Gantavya Bhatt, Avi Schwarzschild, Soumye Singhal, Arnav Mohanty Das, Chirag Shah, John P Dickerson, Pin-Yu Chen, Jeff Bilmes
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we demonstrate that the efficacy of CleanCLIP in mitigating backdoors is highly dependent on the particular objective used during model pre-training. We observe that adding a self-supervised objective to pre-training, which leads to higher zero-shot classification performance, correlates with backdoor behaviors that are harder to remove. We show this by training multimodal models on two large datasets consisting of 3 million (CC3M) and 6 million (CC6M) datapoints, under various pre-training objectives, followed by poison removal using CleanCLIP. |
| Researcher Affiliation | Collaboration | Sahil Verma (University of Washington); Gantavya Bhatt (University of Washington); Avi Schwarzschild (Carnegie Mellon University); Soumye Singhal (Nvidia); Arnav Das (University of Washington); Chirag Shah (University of Washington); John P Dickerson (University of Maryland); Pin-Yu Chen (IBM Research); Jeff Bilmes (University of Washington) |
| Pseudocode | No | The paper describes methodologies in prose and mathematical formulations, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is open-sourced at https://github.com/vsahil/attack-cleanclip. |
| Open Datasets | Yes | 1. Conceptual Captions 3M (CC3M) (Sharma et al., 2018): This dataset has 3M image-text paired datapoints. 2. Conceptual Captions 6M (CC6M): This dataset has 6M image-text paired datapoints from the CC12M dataset (Changpinyo et al., 2021)... The models are evaluated for their Top-1 zero-shot accuracy on the Imagenet-1K validation set (referred to as Imagenet hereafter). |
| Dataset Splits | Yes | Using the same settings as CleanCLIP, we introduce the trigger in 1,500 randomly sampled datapoints for the CC3M dataset and 3,000 randomly sampled datapoints for the CC6M dataset (a mere 0.05% of the training datapoints). ... We clean the poisoned model by finetuning it on 100K guaranteed-poison-free image-text pairs for 20 epochs... The models are evaluated for their Top-1 zero-shot accuracy on the Imagenet-1K validation set... |
| Hardware Specification | Yes | We train models for 64 epochs using 8 Nvidia A100 GPUs. |
| Software Dependencies | No | The paper mentions using the 'AdamW optimizer' and 'FAISS', but does not specify version numbers for these or other key software components, such as programming languages or deep learning frameworks. |
| Experiment Setup | Yes | We train models for 64 epochs using 8 Nvidia A100 GPUs. An initial learning rate of 1e-3 with cosine scheduling is used when training from scratch and 5e-7 when finetuning from a checkpoint. We use the AdamW optimizer with 10,000 warmup steps (Loshchilov & Hutter, 2017). Models trained with L^pre_MMCL use a batch size of 256, whereas models trained with L^pre_MMCL + L^pre_SSL use a batch size of 128. Please refer to Appendix A for the loss dynamics. ... We clean the poisoned model by finetuning it on 100K guaranteed-poison-free image-text pairs for 20 epochs using a batch size of 128 and AdamW as the optimizer. We perform an extensive hyperparameter search over various learning rates (as many as 8 in some experiments and 14 in others, all with cosine scheduling and 50 warmup steps). Please refer to Appendix D for the set of learning rates explored for the cleaning procedure. |
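The learning-rate recipe quoted above (warmup steps followed by cosine scheduling, e.g. 10,000 warmup steps for pre-training and 50 for the cleaning runs) can be sketched in plain Python. The function name `lr_at_step` and the linear warmup shape are illustrative assumptions; the paper specifies only the optimizer (AdamW), the warmup step counts, and cosine scheduling.

```python
import math

def lr_at_step(step, total_steps, base_lr, warmup_steps):
    """Learning rate at a given optimizer step: linear warmup from 0
    to base_lr over `warmup_steps`, then cosine decay toward 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# Shape of a cleaning run: 50 warmup steps, then cosine decay.
# The base learning rate here is illustrative; the paper sweeps
# several values (see its Appendix D).
schedule = [lr_at_step(s, total_steps=1000, base_lr=1e-4, warmup_steps=50)
            for s in range(1000)]
```

The schedule peaks at `base_lr` exactly when warmup ends and decays smoothly afterward, which is the standard pairing with AdamW in contrastive pre-training setups.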