LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation

Authors: Can Jin, Ying Li, Mingyu Zhao, Shiyu Zhao, Zhenting Wang, Xiaoxiao He, Ligong Han, Tong Che, Dimitris Metaxas

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments across seven network architectures and four datasets demonstrate significant improvements in both performance and efficiency compared to state-of-the-art visual prompting methods, achieving up to 6× faster training, using 18× fewer visual prompt parameters, and delivering a 3.1% improvement in performance. We perform extensive experiments across a wide range of large-scale models and datasets.
Researcher Affiliation Collaboration Can Jin1, Ying Li2, Mingyu Zhao1, Shiyu Zhao1, Zhenting Wang1, Xiaoxiao He1, Ligong Han3,4, Tong Che5, Dimitris N. Metaxas1 1Rutgers University, 2Zhejiang University, 3Red Hat AI Innovation, 4MIT-IBM Watson AI Lab, 5NVIDIA Research
Pseudocode No The paper describes the methodology in prose and mathematical formulations (e.g., Equation 1 and 2), but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes Code is available at https://github.com/jincan333/LoR-VP.
Open Datasets Yes For pre-training, we utilize the ImageNet-1K dataset (Deng et al., 2009), which contains 1K classes and 1.3M images, the ImageNet-21K-P dataset (Ridnik et al., 2021), comprising 11K classes and 12M images, and the ImageNet-21K dataset (Deng et al., 2009), which includes 21K classes and 14M images. We evaluate the effectiveness and efficiency of LoR-VP across four downstream datasets: ImageNet-1K, Tiny-ImageNet (Le & Yang, 2015), and CIFAR-10/100 (Krizhevsky et al., 2009). To assess the out-of-distribution robustness of LoR-VP, we conduct experiments on ImageNet-R (Hendrycks et al., 2021a), ImageNet-Sketch (Wang et al., 2019), ImageNet-A (Hendrycks et al., 2021b), and ImageNet-V2 (Recht et al., 2019). Additional details about the datasets are in Table 6.
Dataset Splits Yes Additional details about the datasets are in Table 6.
Table 6: Dataset Information.
Dataset | Original Resolution | # Training Set Images | # Test Set Images | # Classes
ImageNet-1K (Deng et al., 2009) | Varies | 1.3M | 50K | 1,000
Tiny-ImageNet (Le & Yang, 2015) | 64×64 | 100K | 10K | 200
CIFAR-100 (Krizhevsky et al., 2009) | 32×32 | 50K | 10K | 100
CIFAR-10 (Krizhevsky et al., 2009) | 32×32 | 50K | 10K | 10
Hardware Specification Yes All experiments are conducted on NVIDIA Quadro RTX8000 GPUs with 48GB of memory.
Software Dependencies No The weights for these models are all publicly available through the official PyTorch Model Zoo or the Hugging Face Timm library. The paper mentions software like PyTorch and Timm but does not provide specific version numbers for these dependencies.
Experiment Setup Yes For LoR-VP, we resize all input images to 224×224 and use a rank of 4 in our VP design. As a result, the two sets of parameters in LoR-VP have dimensions of 3×224×4 and 3×4×224, respectively, meaning that the total number of parameters in the visual prompts is only ~5K. The optimal hyperparameters for LoR-VP are determined through grid search.
Table 8: Implementation Details.
Network | Pre-trained Data | Downstream Data | Resolution | Optimizer | LR | Label Mapping | LoR-VP Rank | Epochs | Batch Size
ResNet-18 | ImageNet-1K | CIFAR-100 | 224×224 | SGD | 0.02 | Linear Probing | 4 | 20 | 256
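The parameter count quoted above follows directly from the low-rank factorization: two per-channel factors of shape 3×224×4 and 3×4×224 multiply out to a full 3×224×224 prompt while storing only 3·224·4·2 = 5,376 parameters. Below is a minimal PyTorch sketch of this construction, assuming a simple additive prompt and small random initialization; the class name, init scale, and forward logic are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class LowRankVisualPrompt(nn.Module):
    """Sketch of a rank-r visual prompt: per-channel factors
    A (C x H x r) and B (C x r x W) whose product is a full
    C x H x W prompt added to every input image."""

    def __init__(self, channels: int = 3, size: int = 224, rank: int = 4):
        super().__init__()
        # Small random init is an assumption for this sketch.
        self.A = nn.Parameter(torch.randn(channels, size, rank) * 0.01)
        self.B = nn.Parameter(torch.randn(channels, rank, size) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (C, H, r) @ (C, r, W) -> (C, H, W), broadcast over the batch dim.
        prompt = torch.bmm(self.A, self.B)
        return x + prompt.unsqueeze(0)


vp = LowRankVisualPrompt()
n_params = sum(p.numel() for p in vp.parameters())
# 3*224*4 + 3*4*224 = 5,376 trainable parameters (~5K),
# versus 3*224*224 = 150,528 for a full-resolution prompt.
```

Only `vp.parameters()` would be trained during adaptation; the backbone stays frozen, which is what keeps the per-task state this small.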