Canonical Rank Adaptation: An Efficient Fine-Tuning Strategy for Vision Transformers
Authors: Lokesh Veeramacheneni, Moritz Wolter, Hilde Kuehne, Juergen Gall
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, CaRA outperforms existing Parameter-Efficient Fine-Tuning (PEFT) methods on visual classification benchmarks such as the Visual Task Adaptation Benchmark (VTAB)-1k and the Fine-Grained Visual Categorization (FGVC) benchmark. |
| Researcher Affiliation | Collaboration | 1University of Bonn 2Tuebingen AI Center 3MIT-IBM Watson AI Lab 4Lamarr Institute for Machine Learning and Artificial Intelligence. Correspondence to: Lokesh Veeramacheneni <EMAIL>. |
| Pseudocode | No | The paper provides mathematical derivations for gradients in Section 3.3 and Appendix A, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available at https://github.com/BonnBytes/CaRA. |
| Open Datasets | Yes | To evaluate the performance of CaRA, we follow the experimental setup from (Jia et al., 2022) and benchmark on all VTAB-1k datasets (Zhai et al., 2019). FGVC is a collection of five large datasets: CUB-200-2011, NABirds, Oxford Flowers, Stanford Dogs, and Stanford Cars. Following Kopiczko et al. (2024), we fine-tune on CIFAR100, Food101, Flowers102, and Resisc45 using 10 randomly sampled training examples per class. |
| Dataset Splits | Yes | CaRA is trained on a subset of 1000 samples with an 80-20 split for training and validation, while the original test set is used for evaluation. The validation split is done with statistics from (Jia et al., 2022) with seed 0. Following Kopiczko et al. (2024), we fine-tune on CIFAR100, Food101, Flowers102, and Resisc45 using 10 randomly sampled training examples per class. Evaluation is performed on the CIFAR100, Food101, and Flowers102 test sets, and on the remaining samples for Resisc45. Further implementation and hyperparameter details are provided in Section C.4. ... We use the numpy random choice with seed 6 for sampling to ensure reproducibility. |
| Hardware Specification | Yes | We fine-tuned the ViT on one Nvidia GA100 GPU for the VTAB-1k benchmark and one Nvidia H100 GPU for the FGVC benchmark. For evaluation, we use an Nvidia RTX A5000. In the case of language experiments, we use a maximum of 8 Nvidia GA100 GPUs for fine-tuning and evaluation. |
| Software Dependencies | No | The paper mentions software like PyTorch (Paszke et al., 2017), Tensorly (Kossaifi et al., 2019), and PyTorch Image Models (Wightman, 2019) but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We present the hyperparameters, such as rank, in Table 8 of the Appendix, and additional information on the datasets is further provided in Section C.2 of the Appendix. CaRA is trained with rank 32 across all the datasets. Section C.3 of the Appendix provides more details about hyperparameters. Table 8: Hyperparameter details for the VTAB-1k benchmark using the ViT-Base model. The standard deviation (std) is computed over 10 runs. Table 9: Hyperparameter details for the FGVC benchmark using the ViT-Base model. Table 10: Hyperparameter details for four image classification datasets using the ViT-Large model. The standard deviation (std) is computed over five runs. |
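The few-shot sampling protocol quoted above (10 training examples per class, drawn with numpy's random choice under seed 6) can be sketched as below. This is a minimal reconstruction for illustration only: the helper name `sample_few_shot` and the choice of numpy's `default_rng` API are assumptions, not the authors' released code.

```python
import numpy as np

def sample_few_shot(labels, per_class=10, seed=6):
    """Pick `per_class` example indices per class with a fixed seed.

    Hypothetical helper mirroring the paper's stated protocol
    (numpy random choice, seed 6); the exact RNG call in the
    authors' code may differ.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)       # all indices of class c
        picked.extend(rng.choice(idx, size=per_class, replace=False))
    return np.sort(np.array(picked))

# Toy dataset: 5 classes with 100 examples each.
labels = np.repeat(np.arange(5), 100)
subset = sample_few_shot(labels, per_class=10)
print(len(subset))  # 50 indices: 10 per class, deterministic under the seed
```

Because the seed is fixed, repeated calls return the same subset, which is what makes the reported splits reproducible across runs.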