RA-TTA: Retrieval-Augmented Test-Time Adaptation for Vision-Language Models
Authors: Youngjun Lee, Doyoung Kim, Junhyeok Kang, Jihwan Bang, Hwanjun Song, Jae-Gil Lee
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 17 datasets demonstrate that the proposed RA-TTA outperforms the state-of-the-art methods by 3.01–9.63% on average. |
| Researcher Affiliation | Collaboration | 1 KAIST, 2 LG AI Research |
| Pseudocode | Yes | Its pseudocode is presented in Appendix B. |
| Open Source Code | Yes | The source code of RA-TTA is available at https://github.com/kaist-dmlab/RA-TTA. |
| Open Datasets | Yes | To evaluate the zero-shot transferability, we use standard transfer learning and natural distribution shift benchmarks, including 17 datasets that span a wide range of image classification tasks: ImageNet (Deng et al., 2009), Flowers102 (Nilsback & Zisserman, 2008), DTD (Cimpoi et al., 2014), Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), UCF101 (Soomro et al., 2012), Caltech101 (Fei-Fei et al., 2004), Food101 (Bossard et al., 2014), SUN397 (Xiao et al., 2010), FGVC Aircraft (Maji et al., 2013), RESISC45 (Cheng et al., 2017), Caltech256 (Griffin et al., 2007), and CUB200 (Wah et al., 2011) as transfer learning benchmarks, and natural distribution shift benchmarks, including ImageNet adversarial (Hendrycks et al., 2021b), ImageNet V2 (Recht et al., 2019), ImageNet rendition (Hendrycks et al., 2021a), and ImageNet sketch (Wang et al., 2019). |
| Dataset Splits | Yes | We use the dataset configuration of CoOp (Zhou et al., 2022) except for Caltech256 and RESISC45, which CoOp does not handle. We use a test split from SuS-X (Udandarao et al., 2023) for Caltech256, and we adopt a test split from (Neumann et al., 2020) for RESISC45. |
| Hardware Specification | Yes | All implementations are conducted using PyTorch 2.3.0 on an NVIDIA RTX 4090. |
| Software Dependencies | Yes | All implementations are conducted using PyTorch 2.3.0 on an NVIDIA RTX 4090. |
| Experiment Setup | Yes | We set the augmentation size M to 100. We configure K for Top-K operations at K_D = 20 and K_S = 20. We use a temperature parameter τ of 0.01, which is the default scale value of CLIP. We set p = 0.75 for transfer learning datasets except ImageNet and p = 0.90 for ImageNet-based datasets. |
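The hyperparameters quoted in the Experiment Setup row can be collected into a small config helper. This is a minimal sketch for reference only: the function name `ra_tta_config` and the dictionary keys are illustrative assumptions, not identifiers from the authors' released code; only the numeric values come from the paper.

```python
def ra_tta_config(dataset: str) -> dict:
    """Return the hyperparameters reported for RA-TTA on a given dataset.

    Hypothetical helper; names are illustrative, values are from the paper.
    """
    # p = 0.90 for ImageNet-based datasets, 0.75 for the other
    # transfer learning datasets, per the reported setup.
    imagenet_based = dataset.lower().startswith("imagenet")
    return {
        "augmentation_size_M": 100,  # M augmented views per test image
        "K_D": 20,                   # Top-K cutoff (first Top-K operation)
        "K_S": 20,                   # Top-K cutoff (second Top-K operation)
        "temperature_tau": 0.01,     # CLIP's default logit scale
        "p": 0.90 if imagenet_based else 0.75,
    }

print(ra_tta_config("imagenet_sketch")["p"])  # 0.9
print(ra_tta_config("flowers102")["p"])       # 0.75
```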