Few-Shot Adversarial Prompt Learning on Vision-Language Models
Authors: Yiwei Zhou, Xiaobo Xia, Zhiwei Lin, Bo Han, Tongliang Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We justify our claims through a series of experiments on 11 benchmark datasets covering multiple recognition tasks. |
| Researcher Affiliation | Academia | Yiwei Zhou, School of Automation, Beijing Institute of Technology, EMAIL; Xiaobo Xia, Sydney AI Centre, University of Sydney, EMAIL; Zhiwei Lin, School of Automation, Beijing Institute of Technology, EMAIL; Bo Han, Department of Computer Science, Hong Kong Baptist University, EMAIL; Tongliang Liu, Sydney AI Centre, University of Sydney, EMAIL |
| Pseudocode | Yes | A Pipelines of Adversarial Prompt Learning and Testing: For a better understanding of the designed algorithm, we describe our adversarial prompt learning and adversarial prompt testing pipelines in Algorithm 1 and Algorithm 2, respectively. |
| Open Source Code | Yes | Code is available at: https://github.com/lionel-w2/FAP. |
| Open Datasets | Yes | To evaluate the proposed method, we align with previous works [28, 33] and utilize 11 diverse image recognition datasets that span multiple vision tasks. Specifically, the datasets include two generic object datasets: ImageNet-1K [20] and Caltech101 [32]; a texture recognition dataset: DTD [34]; five fine-grained object recognition datasets: FGVCAircraft [35], Oxford Pets [36], Flowers102 [37], Food101 [38], and Stanford Cars [39]; a scene recognition dataset: SUN397 [40]; an action recognition dataset: UCF101 [41]; and a satellite image classification dataset: EuroSAT [42]. |
| Dataset Splits | No | The paper uses 'test dataset' and a 'few-shot dataset S' for training, but does not explicitly mention a 'validation set' or 'validation split' for hyperparameter tuning or model selection in its experimental setup details. Training is done for a fixed number of epochs. |
| Hardware Specification | Yes | Experiments of adversarial prompt tuning on the ImageNet-1K dataset are carried out on a single NVIDIA RTX A40 GPU, while experiments on the other 10 datasets are performed on a single NVIDIA RTX 4090 GPU. |
| Software Dependencies | Yes | All experiments are conducted in an environment running PyTorch 1.10.1 and CUDA 11.3 on Python 3.8. |
| Experiment Setup | Yes | All models are trained for 5 epochs in cross-dataset evaluation and 10 epochs for other benchmark settings, using an SGD optimizer with a momentum of 0.9. The initial learning rate is set at 0.0035. We apply a cosine learning rate scheduler and a warm-up strategy during the first epoch. For adversarial prompt learning, we use token prompts of size 2 in both the vision and text branches across the first 9 transformer blocks. Attacks are generated under the ℓ∞ threat model through a 2-step PGD attack, with a perturbation boundary ϵ = 1/255 and a step size α = 1/255, following the methodologies outlined in [11]. |
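The attack configuration quoted above (2-step PGD under an ℓ∞ ball with ϵ = α = 1/255) can be sketched as follows. This is a minimal, framework-agnostic NumPy sketch, not the authors' implementation: the function name `pgd_linf` and the toy linear loss are illustrative assumptions, and in the paper the gradient would come from the CLIP model's adversarial prompt-learning loss via PyTorch autograd.

```python
import numpy as np

def pgd_linf(x0, grad_fn, eps=1/255, alpha=1/255, steps=2):
    """2-step PGD under an l_inf ball of radius eps (the paper's attack setup).

    grad_fn(x) returns the gradient of the attack loss w.r.t. the input x;
    here it is supplied analytically for a toy loss.
    """
    x = x0.copy()
    for _ in range(steps):
        g = grad_fn(x)                       # gradient of the loss w.r.t. the input
        x = x + alpha * np.sign(g)           # sign-gradient ascent step
        x = np.clip(x, x0 - eps, x0 + eps)   # project back into the eps-ball around x0
        x = np.clip(x, 0.0, 1.0)             # keep a valid image range
    return x

# Toy example: loss = w . x, so the gradient w.r.t. x is just w.
w = np.array([1.0, -1.0, 0.5])
x0 = np.array([0.5, 0.5, 0.5])
x_adv = pgd_linf(x0, lambda x: w)
```

With α = ϵ, the second step is immediately projected back, so the result sits on the boundary of the ϵ-ball in the direction of the gradient sign, i.e. `x_adv = x0 + eps * sign(w)` for this toy loss.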
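The optimization schedule in the setup (initial learning rate 0.0035, cosine decay, warm-up during the first epoch) can be sketched as a per-step schedule. The helper name `lr_at` and the step counts are hypothetical; the paper presumably uses PyTorch's built-in scheduler rather than a hand-rolled one.

```python
import math

def lr_at(step, total_steps, warmup_steps, base_lr=0.0035):
    """Cosine-annealed learning rate with linear warm-up (hypothetical helper)."""
    if step < warmup_steps:
        # Linear warm-up over the first epoch's steps.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))

# Example: 10 epochs of 10 steps each, warm-up over the first epoch.
schedule = [lr_at(s, total_steps=100, warmup_steps=10) for s in range(100)]
```

The schedule ramps linearly to the peak rate of 0.0035 by the end of warm-up, then follows a half-cosine down toward zero at the final step.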