FATE: Feature-Adapted Parameter Tuning for Vision-Language Models
Authors: Zhengqin Xu, Zelin Peng, Xiaokang Yang, Wei Shen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on 11 datasets covering a diverse set of visual recognition tasks demonstrate that FATE achieves leading performance. Additionally, FATE demonstrates remarkable acceleration compared with current prompt-engineering and PEFT methods. |
| Researcher Affiliation | Academia | Zhengqin Xu, Zelin Peng*, Xiaokang Yang, Wei Shen — MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about the availability of open-source code for the methodology described, nor does it include any links to code repositories. |
| Open Datasets | Yes | We evaluate the generalizability of our proposed FATE on 11 image classification datasets, including 2 general object recognition datasets: ImageNet (Deng et al. 2009) and Caltech101 (Fei-Fei, Fergus, and Perona 2004); 5 fine-grained image recognition datasets: OxfordPets (Parkhi et al. 2012), StanfordCars (Krause et al. 2013), Flowers102 (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Van Gool 2014), and FGVCAircraft (Maji et al. 2013); a scene understanding dataset: SUN397 (Xiao et al. 2010); a texture dataset: DTD (Cimpoi et al. 2014); a satellite-image recognition dataset: EuroSAT (Helber et al. 2019); and an action classification dataset: UCF101 (Soomro, Zamir, and Shah 2012). |
| Dataset Splits | Yes | In line with previous works (Zhou et al. 2022b,a; Khattak et al. 2023), we use a few-shot setting that randomly samples 16 shots for each class in all experiments. The model is trained using only the base classes in a few-shot setting, while evaluation is conducted on both base and novel categories to test generalizability. Cross-dataset Evaluation: As suggested in CoCoOp (Zhou et al. 2022a), we also use the 11 datasets mentioned above for cross-dataset evaluation, in which all models are trained on ImageNet with 1000 categories (each category having 16 training samples) and then directly transferred for evaluation on the other datasets. |
| Hardware Specification | Yes | All models are trained with a cosine learning rate schedule on a single NVIDIA 3090 GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks used to implement the methodology. |
| Experiment Setup | Yes | In line with previous works (Zhou et al. 2022b,a; Khattak et al. 2023), we use a few-shot setting that randomly samples 16 shots for each class in all experiments. The pre-trained ViT-B/16 CLIP model is used throughout the experiments. We train FATE for 10 epochs with a batch size of 10 and an initial learning rate of 0.002 via an SGD solver. All models are trained with a cosine learning rate schedule on a single NVIDIA 3090 GPU. To maintain robust results, we report Base and Novel class accuracy, and their harmonic mean (HM), averaged over three runs with different seeds. Table 7 shows that with α = 0.001, FATE achieves the optimal trade-off performance. |
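Since the paper releases no code, the experimental protocol in the table can only be reconstructed from the quoted hyperparameters. The sketch below illustrates the two reproducible pieces — 16-shot per-class sampling and a cosine learning-rate schedule decaying from the stated initial rate of 0.002 over 10 epochs. The function names (`cosine_lr`, `sample_few_shot`) and the schedule's exact decay formula are assumptions for illustration, not the authors' implementation.

```python
import math
import random
from collections import defaultdict

# Hyperparameters quoted from the paper's setup; everything else is assumed.
EPOCHS = 10
BASE_LR = 0.002
SHOTS = 16

def cosine_lr(epoch: int, total_epochs: int = EPOCHS, base_lr: float = BASE_LR) -> float:
    """Cosine annealing: decay base_lr toward 0 over total_epochs."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

def sample_few_shot(dataset, shots: int = SHOTS, seed: int = 0):
    """Randomly sample `shots` examples per class (the 16-shot setting).

    `dataset` is any iterable of (image, label) pairs; a fixed seed mirrors
    the paper's practice of averaging over runs with different seeds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append((image, label))
    subset = []
    for items in by_class.values():
        subset.extend(rng.sample(items, min(shots, len(items))))
    return subset

# The schedule starts at the base LR and decays smoothly to zero:
print(cosine_lr(0))   # 0.002
print(cosine_lr(10))  # 0.0
```

In a full training loop, `cosine_lr(epoch)` would set the SGD optimizer's learning rate at the start of each epoch, and `sample_few_shot` would build the base-class training subset before training begins.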