Backdooring Vision-Language Models with Out-Of-Distribution Data
Authors: Weimin Lyu, Jiachen Yao, Saumya Gupta, Lu Pang, Tao Sun, Lingjie Yi, Lijie Hu, Haibin Ling, Chao Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation on image captioning and visual question answering (VQA) tasks confirms the effectiveness of VLOOD, revealing a critical security vulnerability in VLMs and laying the foundation for future research on securing multimodal models against sophisticated threats. Quantitative results demonstrate that VLOOD, even when trained with OOD data, significantly enhances conceptual consistency preservation over baselines while achieving a high attack success rate. |
| Researcher Affiliation | Academia | Weimin Lyu1, Jiachen Yao1, Saumya Gupta1, Lu Pang1, Tao Sun1, Lingjie Yi1, Lijie Hu2, Haibin Ling1, Chao Chen1. 1 Stony Brook University, 2 King Abdullah University of Science and Technology |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The codebase is built upon Backdoor Bench (https://github.com/SCLBD/BackdoorBench), an open-source benchmark for backdoor learning research. This refers to a third-party codebase used by the authors, not their own source code for the methodology described in the paper. |
| Open Datasets | Yes | We evaluate the image captioning task on the Flickr8k (Hodosh et al., 2013), Flickr30k (Young et al., 2014), and COCO (Lin et al., 2014) datasets, and the VQA task on the OK-VQA (Marino et al., 2019) and VQAv2 (Goyal et al., 2017) datasets. |
| Dataset Splits | Yes | We utilize only 3000 samples of clean data D = {(I, T, O)}, and generate another 3000 poisoned data samples D̃ = {(Ĩ, T, Õ)} from D. The 3000 image-text pairs are randomly selected from the aforementioned datasets. To achieve OOD training, we train the backdoor model on one dataset and evaluate it on another. Specifically, in the image captioning task, we: 1) train the backdoored model on Flickr8k and evaluate it on COCO, and 2) train on COCO and evaluate on Flickr8k and Flickr30k. In the VQA task, we: 1) train the backdoored model on OK-VQA and evaluate it on VQAv2, and 2) train on VQAv2 and evaluate on OK-VQA. |
| Hardware Specification | Yes | The backdoored model is trained on an A6000 GPU with 48 GB of memory. |
| Software Dependencies | No | The codebase is built upon Backdoor Bench (https://github.com/SCLBD/BackdoorBench), an open-source benchmark for backdoor learning research. This mentions a codebase but does not provide specific version numbers for key software components or libraries. |
| Experiment Setup | No | The paper mentions that 'during fine-tuning, only the Q-Former adaptor is trained, while the image encoder and LLM remain frozen' and describes the dynamic adjustment mechanism for 'λ'. It also provides 'Converge Epoch' values for different 'λ' initializations in Table 11. However, specific values for common hyperparameters like learning rate, batch size, optimizer type, or other detailed training configurations are not explicitly provided in the main text. |
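The partial setup the paper does disclose, training only the Q-Former adaptor while keeping the image encoder and LLM frozen, can be sketched in PyTorch. This is a minimal illustration with small stand-in `nn.Linear` modules (the real components are BLIP-2-scale networks), and the learning rate and optimizer choice here are assumptions, since the paper does not report them:

```python
import torch
import torch.nn as nn

# Stand-in modules: in the paper these are the frozen BLIP-2 image
# encoder and LLM, with only the Q-Former adaptor left trainable.
image_encoder = nn.Linear(32, 16)  # frozen
qformer = nn.Linear(16, 16)        # trainable adaptor
llm = nn.Linear(16, 8)             # frozen

# Freeze everything except the Q-Former adaptor.
for module in (image_encoder, llm):
    for p in module.parameters():
        p.requires_grad = False

# Optimizer sees only the adaptor's parameters (lr is a placeholder).
optimizer = torch.optim.AdamW(qformer.parameters(), lr=1e-4)

# One illustrative training step on random tensors.
x = torch.randn(4, 32)
target = torch.randn(4, 8)
loss = nn.functional.mse_loss(llm(qformer(image_encoder(x))), target)
loss.backward()
optimizer.step()

# Count how many frozen-module parameters would still be updated.
frozen_trainable = sum(
    p.requires_grad
    for p in list(image_encoder.parameters()) + list(llm.parameters())
)
print(frozen_trainable)
```

Gradients accumulate only in `qformer`; the frozen modules receive none, which mirrors the parameter-efficient fine-tuning regime the paper describes.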