RA-TTA: Retrieval-Augmented Test-Time Adaptation for Vision-Language Models
Authors: Youngjun Lee, Doyoung Kim, Junhyeok Kang, Jihwan Bang, Hwanjun Song, Jae-Gil Lee
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 17 datasets demonstrate that the proposed RA-TTA outperforms the state-of-the-art methods by 3.01–9.63% on average. |
| Researcher Affiliation | Collaboration | 1 KAIST, 2 LG AI Research |
| Pseudocode | Yes | Its pseudocode is presented in Appendix B. |
| Open Source Code | Yes | The source code of RA-TTA is available at https://github.com/kaist-dmlab/RA-TTA. |
| Open Datasets | Yes | To evaluate the zero-shot transferability, we use standard transfer learning and natural distribution shift benchmarks, including 17 datasets that span a wide range of image classification tasks: ImageNet (Deng et al., 2009), Flowers102 (Nilsback & Zisserman, 2008), DTD (Cimpoi et al., 2014), Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), UCF101 (Soomro et al., 2012), Caltech101 (Fei-Fei et al., 2004), Food101 (Bossard et al., 2014), SUN397 (Xiao et al., 2010), FGVC Aircraft (Maji et al., 2013), RESISC45 (Cheng et al., 2017), Caltech256 (Griffin et al., 2007), and CUB200 (Wah et al., 2011) as transfer learning benchmarks, and natural distribution shift benchmarks, including ImageNet adversarial (Hendrycks et al., 2021b), ImageNet V2 (Recht et al., 2019), ImageNet rendition (Hendrycks et al., 2021a), and ImageNet sketch (Wang et al., 2019). |
| Dataset Splits | Yes | We use the dataset configuration of CoOp (Zhou et al., 2022) except for Caltech256 and RESISC45, which CoOp does not handle. We use a test split from SuS-X (Udandarao et al., 2023) for Caltech256, and we adopt a test split from (Neumann et al., 2020) for RESISC45. |
| Hardware Specification | Yes | All implementations are conducted using PyTorch 2.3.0 on an NVIDIA RTX 4090. |
| Software Dependencies | Yes | All implementations are conducted using PyTorch 2.3.0 on an NVIDIA RTX 4090. |
| Experiment Setup | Yes | We set the augmentation size M to 100. We configure K for Top-K operations at K_D = 20 and K_S = 20. We use a temperature parameter τ of 0.01, which is the default scale value of CLIP. We set p = 0.75 for transfer learning datasets except ImageNet and p = 0.90 for ImageNet-based datasets. |
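The hyperparameters quoted in the Experiment Setup row can be collected into a small config helper. This is a minimal sketch for reference only: the function name `ra_tta_config` and the dictionary keys are illustrative assumptions, not identifiers from the authors' released code; only the numeric values come from the paper.

```python
def ra_tta_config(dataset: str) -> dict:
    """Return the hyperparameters reported for RA-TTA on a given dataset.

    Hypothetical helper; names are illustrative, values are from the paper.
    """
    # p = 0.90 for ImageNet-based datasets, 0.75 for the other
    # transfer learning datasets, per the reported setup.
    imagenet_based = dataset.lower().startswith("imagenet")
    return {
        "augmentation_size_M": 100,  # M augmented views per test image
        "K_D": 20,                   # Top-K cutoff (first Top-K operation)
        "K_S": 20,                   # Top-K cutoff (second Top-K operation)
        "temperature_tau": 0.01,     # CLIP's default logit scale
        "p": 0.90 if imagenet_based else 0.75,
    }

print(ra_tta_config("imagenet_sketch")["p"])  # 0.9
print(ra_tta_config("flowers102")["p"])       # 0.75
```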