DynaPrompt: Dynamic Test-Time Prompt Tuning
Authors: Zehao Xiao, Shilin Yan, Jack Hong, Jiayin Cai, Xiaolong Jiang, Yao Hu, Jiayi Shen, Qi Wang, Cees G. M. Snoek
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we conduct experiments on fourteen benchmarks, covering typical evaluation scenarios such as domain generalization and cross-dataset. The results show the effectiveness of the proposed method. |
| Researcher Affiliation | Collaboration | 1AIM Lab, University of Amsterdam 2Xiaohongshu Inc. 3Department of Automation, Tsinghua University |
| Pseudocode | Yes | We provide an algorithm of our method in Appendix A. |
| Open Source Code | Yes | Codes are available at https://github.com/zzzx1224/DynaPrompt. |
| Open Datasets | Yes | Fifteen datasets. Following previous methods (Shu et al., 2022; Samadh et al., 2023), we conduct experiments across two settings that suffer from distribution shifts to demonstrate the effectiveness of our method: domain generalization and cross-dataset shifts. For the domain generalization setting, we evaluate the method on ImageNet (Deng et al., 2009) and its four variant datasets: ImageNet-V2 (Recht et al., 2019), ImageNet-(S)ketch (Wang et al., 2019), ImageNet-A (Hendrycks et al., 2021b), and ImageNet-R (Hendrycks et al., 2021a). For the cross-dataset setting, we evaluate our method on 10 image classification datasets covering various tasks: Caltech101 (Fei-Fei et al., 2004), Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), FGVCAircraft (Maji et al., 2013), SUN397 (Xiao et al., 2010), DTD (Cimpoi et al., 2014), EuroSAT (Helber et al., 2019), and UCF101 (Soomro et al., 2012). |
| Dataset Splits | Yes | For the domain generalization setting, we evaluate the method on ImageNet (Deng et al., 2009) and its four variant datasets: ImageNet-V2 (Recht et al., 2019), ImageNet-(S)ketch (Wang et al., 2019), ImageNet-A (Hendrycks et al., 2021b), and ImageNet-R (Hendrycks et al., 2021a). For the cross-dataset setting, we evaluate our method on 10 image classification datasets... Following TPT (Shu et al., 2022), we generate 63 augmentations by random resize crops for each individual test image to construct a batch of 64 images including the original image. |
| Hardware Specification | Yes | Our method runs on an NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions the "AdamW optimizer" and that it is "Based on the CLIP model with ViT-Base-16", but does not specify version numbers for any key software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python) used. |
| Experiment Setup | Yes | Based on the CLIP model with ViT-Base-16 (Dosovitskiy et al., 2020), we initialize our dynamic prompts with the manually crafted prompt "a photo of a" and optimize the prompts online in the text input embedding space. The prompt set optimized by one test sample is utilized for the next sample. Following TPT (Shu et al., 2022), we generate 63 augmentations by random resize crops for each individual test image to construct a batch of 64 images including the original image. During the dynamic tuning, we calculate the entropy and augmentation probability differences over these 63 augmented images as the dynamic prompt selection metrics. The thresholds are obtained in the same way based on the initial prompt. We set the maximum size of the prompt set, M, to 10. We append new prompts to the dynamic prompt set when no appropriate prompt is selected for the test sample. Once the number of prompts in the prompt set V exceeds M, we remove the prompt that has been inactive for the longest time. For optimization, we select the top 10% most confident samples in the batch and calculate the entropy of the averaged logits of the selected predictions, following Shu et al. (2022). We utilize a learning rate of 0.005 for the domain generalization setting and 0.003 for the cross-dataset setting with the AdamW optimizer. |
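The experiment-setup description involves two mechanisms: an online prompt set that appends a new prompt when none passes the selection metric and evicts the longest-inactive prompt once the set exceeds M = 10, and a TPT-style objective that keeps the most confident augmented views and minimizes the entropy of their averaged prediction. The sketch below illustrates both under stated assumptions: the class names (`DynamicPromptSet`, `tpt_entropy_loss`) and the `is_appropriate` callback are hypothetical stand-ins, and real prompts are CLIP text-embedding vectors rather than strings.

```python
# Illustrative sketch only; not the authors' implementation. Real prompts
# are learnable text-embedding vectors, and "appropriate" is decided by the
# paper's entropy / augmentation-probability-difference thresholds.
import math

def entropy(probs):
    """Shannon entropy of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

class DynamicPromptSet:
    """Holds at most max_size prompts; evicts the longest-inactive one."""
    def __init__(self, init_prompt, max_size=10):
        self.max_size = max_size
        self.prompts = [init_prompt]
        self.last_used = [0]   # step at which each prompt was last selected
        self.step = 0

    def select(self, is_appropriate):
        """Return the prompts passing the selection metric; if none does,
        append a fresh prompt (here: a copy of the initial one)."""
        self.step += 1
        chosen = [i for i, p in enumerate(self.prompts) if is_appropriate(p)]
        if not chosen:
            self.prompts.append(self.prompts[0])
            self.last_used.append(self.step)
            if len(self.prompts) > self.max_size:
                evict = self.last_used.index(min(self.last_used))
                del self.prompts[evict]
                del self.last_used[evict]
            chosen = [len(self.prompts) - 1]
        for i in chosen:
            self.last_used[i] = self.step
        return [self.prompts[i] for i in chosen]

def tpt_entropy_loss(probs_per_view, keep_frac=0.1):
    """TPT-style objective: keep the lowest-entropy (most confident)
    fraction of augmented views, average their class probabilities,
    and return the entropy of that average."""
    ranked = sorted(probs_per_view, key=entropy)
    k = max(1, int(len(ranked) * keep_frac))
    kept = ranked[:k]
    avg = [sum(col) / k for col in zip(*kept)]
    return entropy(avg)
```

Note the eviction rule: the newly appended prompt always carries the current step as its last-used time, so it is never the eviction candidate; the prompt removed is the one that has gone unselected for the most test samples, matching the "inactive for the longest time" description.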