Beyond Entropy: Region Confidence Proxy for Wild Test-Time Adaptation
Authors: Zixuan Hu, Yichun Hu, Xiaotong Li, Shixiang Tang, Lingyu Duan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate the consistent superiority of ReCAP over existing methods across various datasets and wild scenarios. The source code will be available at https://github.com/hzcar/ReCAP. |
| Researcher Affiliation | Academia | 1 School of Computer Science, Peking University, Beijing, China; 2 Peng Cheng Laboratory, Shenzhen, China; 3 The Chinese University of Hong Kong, Hong Kong, China. Correspondence to: Ling-Yu Duan <EMAIL>. |
| Pseudocode | No | The paper describes mathematical formulations and derivations, such as Lemmas 4.1 and 4.2, Propositions 4.3 and 4.4, and an 'Overall Procedure of ReCAP' describing the loss function. However, it does not contain any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | The source code will be available at https://github.com/hzcar/ReCAP. |
| Open Datasets | Yes | Datasets. We conduct our experiments on three datasets to evaluate the robustness and generalization capability of our method under diverse distribution shifts: 1) ImageNet-C (Hendrycks & Dietterich, 2019), a large-scale dataset categorized into 15 common corruption types and 5 severity levels for each type. 2) ImageNet-R (Hendrycks et al., 2021) and 3) VisDA-2021 (Bashkirova et al., 2022), two datasets which encompass diverse domain shifts due to varying styles and textures (e.g., sketch, cartoon) compared to ImageNet-C, to assess the efficacy for more challenging wild test scenarios in Appendix B. |
| Dataset Splits | Yes | In this paper, we primarily evaluate the out-of-distribution (OOD) generalization ability of all methods using a widely adopted benchmark: ImageNet-C (Hendrycks & Dietterich, 2019). ImageNet-C is derived by applying a series of corruptions to the original ImageNet (Deng et al., 2009) test set, making it a large-scale benchmark for assessing model robustness under real-world distribution shifts. |
| Hardware Specification | Yes | We assess TTA approaches for processing 50,000 images in Gaussian corruption type, using a single Nvidia RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions software components like 'timm (Wightman, 2019)' and models like 'ResNet50-GN (Wu & He, 2018) and ViT-Base-LN (Dosovitskiy et al., 2020)', but it does not provide specific version numbers for these or other software libraries/environments (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | For the optimizer, we use SGD with a batch size of 64 (except for the batch size=1 setting), a momentum of 0.9, and a learning rate of 0.00025/0.001 for ResNet/ViT. For our ReCAP, L0 and τRE in Eq. 9 are set to 0.7/1.0 ln C and 0.8/1.0 ln C (C is the number of classes) for ResNet/ViT. The hyper-parameter τ in Eq. 4 is 1.2 and λ in Eq. 9 is 0.5 by default. For trainable parameters, following common practice (Wang et al., 2020), we adapt the affine parameters of normalization layers. |
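The thresholds quoted in the Experiment Setup row are expressed as multiples of ln C, the entropy of a uniform C-way prediction. A minimal sketch of how these values could be computed (the function name, signature, and backbone labels are illustrative, not taken from the paper):

```python
import math

def recap_thresholds(num_classes: int, backbone: str = "resnet"):
    """Return (L0, tau_RE) from Eq. 9 as multiples of ln C.

    Multipliers follow the settings stated in the paper:
    0.7 ln C / 0.8 ln C for ResNet, 1.0 ln C / 1.0 ln C for ViT.
    This helper is a hypothetical sketch, not the authors' code.
    """
    max_entropy = math.log(num_classes)  # ln C
    if backbone == "resnet":
        return 0.7 * max_entropy, 0.8 * max_entropy
    return 1.0 * max_entropy, 1.0 * max_entropy

# ImageNet-C has C = 1000 classes
l0, tau_re = recap_thresholds(1000, "resnet")
print(l0, tau_re)
```

For ImageNet-scale experiments (C = 1000), ln C ≈ 6.908, so the ResNet thresholds work out to roughly 4.84 and 5.53.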