Attribute-based Visual Reprogramming for Vision-Language Models
Authors: Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, Feng Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, it achieves superior performance in 12 downstream tasks for both ViT-based and ResNet-based CLIP. The success of AttrVR facilitates more effective integration of VR from unimodal vision models into vision-language models. Our code is available at https://github.com/tmlr-group/AttrVR. Experiments conducted on 12 widely-used benchmarks demonstrate the effectiveness of AttrVR in Section 6. AttrVR consistently outperforms other VR methods when using different encoder backbones or fewer training samples. Visualizations of the embedding space and individual samples with their top-matched attributes also substantiate the efficacy of AttrVR. Additional ablation, hyper-parameter (see Section 6) and aggregation studies (see Appendix C.3) further examine the contributions of different components within AttrVR. |
| Researcher Affiliation | Collaboration | Chengyi Cai1 Zesheng Ye1 Lei Feng2,3 Jianzhong Qi1 Feng Liu1 1The University of Melbourne 2Southeast University 3Idealism Technology (Beijing) |
| Pseudocode | Yes | Algorithm 1 Training Pipeline of AttrVR |
| Open Source Code | Yes | Our code is available at https://github.com/tmlr-group/AttrVR. |
| Open Datasets | Yes | Experiments conducted on 12 widely-used benchmarks demonstrate the effectiveness of AttrVR in Section 6. ... All image datasets are publicly available. Detailed task information and the batch size used for training VR are provided in Table 4. |
| Dataset Splits | Yes | This paper establishes benchmarks for downstream classification tasks following prior work (Oh et al., 2023), employing the same methodology to split the 16-shot training, validation, and test sets. |
| Hardware Specification | Yes | Experiments are conducted on a single A100 GPU. |
| Software Dependencies | No | The paper mentions using 'GPT-3.5 (Brown, 2020)' for attribute generation and the 'SGD optimizer (Harold et al., 1997)' with a 'cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016)' for training. While these identify specific algorithms and models, the paper does not list software libraries or frameworks with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x), which are needed for exact replication. |
| Experiment Setup | Yes | Regarding hyper-parameters in AttrVR, we set k = 3 and λ = 0.5 and will discuss their impact. ... For all VR baseline methods compared in the paper, we adopted the following uniform training settings: an initial learning rate of 40, a momentum of 0.9 using the SGD optimizer (Harold et al., 1997), and a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016). The total number of learning epochs was set to 200. |
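The reported training schedule (initial learning rate 40, cosine annealing over 200 epochs, paired with SGD at momentum 0.9) can be sketched as a plain-Python helper. This is a minimal illustration of the cosine annealing rule (Loshchilov & Hutter, 2016) under the paper's stated settings; the function name and the `lr_min = 0` default are assumptions for illustration, since the paper does not state a minimum learning rate or a specific framework.

```python
import math

def cosine_annealing_lr(epoch, lr_init=40.0, total_epochs=200, lr_min=0.0):
    """Cosine annealing schedule: decays lr_init to lr_min over total_epochs.

    lr(t) = lr_min + 0.5 * (lr_init - lr_min) * (1 + cos(pi * t / total_epochs))

    lr_init=40.0 and total_epochs=200 match the paper's reported settings;
    lr_min=0.0 is an assumed default, not stated in the paper.
    """
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

# The schedule starts at the reported initial rate of 40, halves at the
# midpoint, and decays to lr_min by the final epoch.
print(cosine_annealing_lr(0))    # 40.0
print(cosine_annealing_lr(100))  # 20.0
```

In a framework like PyTorch, the same behavior would typically come from pairing `torch.optim.SGD(params, lr=40, momentum=0.9)` with `CosineAnnealingLR(optimizer, T_max=200)`, though, as noted in the Software Dependencies row, the paper does not confirm which framework was used.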