Detecting Backdoor Samples in Contrastive Language Image Pretraining
Authors: Hanxun Huang, Sarah Erfani, Yige Li, Xingjun Ma, James Bailey
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments also reveal that an unintentional backdoor already exists in the original CC3M dataset and has been trained into a popular open-source model released by OpenCLIP. Based on our detector, one can clean up a million-scale web dataset (e.g., CC3M) efficiently within 15 minutes using 4 Nvidia A100 GPUs. The code is publicly available in our GitHub repository. |
| Researcher Affiliation | Academia | 1School of Computing and Information Systems, The University of Melbourne, Australia; 2School of Computing and Information Systems, Singapore Management University, Singapore; 3School of Computer Science, Fudan University, China. Emails: {yigeli}@smu.edu.sg; {xingjunma}@fudan.edu.cn. |
| Pseudocode | Yes | We present the pseudo-code in Appendix A. Algorithm 1 Backdoor Sample Detection With Local Outlier Methods |
| Open Source Code | Yes | The code is publicly available in our GitHub repository. ... Open-source code is available here*. |
| Open Datasets | Yes | We conduct our experiment on the CC3M dataset (Sharma et al., 2018). The evaluation is conducted on ImageNet (Deng et al., 2009) with the zero-shot classifier (Radford et al., 2021). ... We provide evaluations with a larger dataset, the CC12M (Changpinyo et al., 2021), in Appendix B.7, which shows a consistent performance for local outlier methods. |
| Dataset Splits | Yes | Here, we experiment to remove 10% of the data from a poisoned CC3M dataset according to the detection score using DAO. ... In Figure 3a, we present the sensitivity of the defense performance to different filtering percentages. For the patch trigger, removing 1% is sufficient to mitigate the backdoor threat. Additional results for other triggers are in Appendix B.3. They show that removing 5% is sufficient to mitigate the backdoor threat of different kinds of triggers. |
| Hardware Specification | Yes | Based on our detector, one can clean up a million-scale web dataset (e.g., CC3M) efficiently within 15 minutes using 4 Nvidia A100 GPUs. ... We conducted our experiments on Nvidia A100 GPUs with PyTorch implementation. |
| Software Dependencies | No | We conducted our experiments on Nvidia A100 GPUs with PyTorch implementation. ... We use AdamW optimizer (Loshchilov & Hutter, 2019), weight decay is set to 0.2, batch size of 1024 and train for 30 epochs. We use ResNet-50 (He et al., 2016) and ViT-B-16 (Dosovitskiy et al., 2021) for the image encoder and transformer (Vaswani et al., 2017) for the text encoder. ... For Adam (Kingma & Ba, 2014) as the optimizer for m and , the learning rate is set to 0.05, β1 and β2 are set to 0.1. |
| Experiment Setup | Yes | We use a learning rate of 0.001, with AdamW optimizer (Loshchilov & Hutter, 2019), weight decay is set to 0.2, batch size of 1024 and train for 30 epochs. ... For CD, we find that in SSL, backdoor data has a higher L1 norm of the mask instead of lower as in a supervised setting. As a result, we use a higher L1 norm of the mask to indicate that a sample is more likely to be backdoor data. We use 100 optimization steps, α set to 0.001 and β set to 100 for CD. For ABL, we use the average loss value over the first 10 epochs for each sample. For iForest (Liu et al., 2008), we set the number of trees in the ensemble to 100. For LID, k is set to 16. For k-dist, SLOF and DAO, k is set to 16. We use a batch size of 2048 for running the detection algorithm. |
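The excerpts above repeatedly mention local outlier methods (k-dist, SLOF, DAO with k = 16) run over learned embeddings. As a rough illustration of the simplest of these, here is a minimal, brute-force sketch of the k-distance score (distance to a point's k-th nearest neighbor) on toy 2-D "embeddings". This is my own simplified reconstruction, not the paper's GPU implementation: the function names `k_distance_scores` and `flag_top_fraction` are hypothetical, the real pipeline operates on million-scale CLIP embeddings with batched GPU distance computation, and the detectors in the paper (SLOF, DAO) further normalize each point's score by its neighbors' local densities.

```python
import math

def k_distance_scores(embeddings, k=3):
    """Brute-force k-distance: for each point, the distance to its
    k-th nearest neighbor. Points in an unusually dense region get
    small scores; isolated points get large scores. O(n^2) — for
    illustration only (the paper uses k = 16 on CLIP embeddings)."""
    scores = []
    for i, a in enumerate(embeddings):
        dists = sorted(
            math.dist(a, b) for j, b in enumerate(embeddings) if j != i
        )
        scores.append(dists[k - 1])
    return scores

def flag_top_fraction(scores, fraction=0.1, largest=True):
    """Indices of the `fraction` of samples with the most extreme
    scores. The direction (`largest`) depends on the detector: per
    the excerpts, SSL backdoors can invert the usual ordering, so it
    is left as an explicit parameter here."""
    n = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)),
                   key=lambda i: scores[i], reverse=largest)
    return set(order[:n])
```

With a tight cluster of near-duplicate points plus a few scattered ones, the clustered points receive markedly smaller k-distance scores, which is the kind of local-density signal the paper's detectors exploit; filtering the flagged fraction (1–10% in the excerpts) then removes the suspected backdoor samples before training.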