Test-Time Ensemble via Linear Mode Connectivity: A Path to Better Adaptation

Authors: Byungjai Kim, Chanho Ahn, Wissam Baddar, Kikyung Kim, Huijin Lee, Saehyun Ahn, Seungju Han, Sungjoo Suh, Eunho Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In extensive experiments, integration with TTE consistently outperformed baseline models across various challenging scenarios, demonstrating its effectiveness and general applicability. ... We conducted experiments with four benchmark datasets: ImageNet-C (Apache-2.0 License) (Hendrycks & Dietterich, 2019) assesses adaptation performance under 15 types of corruptions at five severity levels... Table 1 presents classification accuracy across 15 distributions in ImageNet-C... Table 5 shows classification accuracy as components are added sequentially.
Researcher Affiliation | Collaboration | (1) AI Center, Samsung Electronics; (2) Korea Advanced Institute of Science and Technology
Pseudocode | Yes | Algorithm 1: TTE + Tent (Wang et al., 2021) ... Algorithm 2: TTE + SAR (Niu et al., 2023) ... Algorithm 3: TTE + DeYO (Lee et al., 2024)
Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing the code of the proposed method. It mentions that implementations of comparative methods were obtained from their public repositories, but says nothing of the sort for TTE itself.
Open Datasets | Yes | We conducted experiments with four benchmark datasets: ImageNet-C (Apache-2.0 License) (Hendrycks & Dietterich, 2019) ... ImageNet-S (MIT License) (Wang et al., 2019a) and ImageNet-R (MIT License) (Hendrycks et al., 2021) ... ImageNetV2 (MIT License) (Recht et al., 2019)
Dataset Splits | Yes | We followed the three wild test scenarios from Niu et al. (2023) using ImageNet-C: Label Shifts, where batches are class-imbalanced with most samples belonging to the same class; Batch Size 1, where each batch contains only one sample, testing adaptation with minimal information; and Mix Shifts, where batches contain samples from various distributions, testing adaptation under multiple shifts simultaneously.
Hardware Specification | No | The paper reports 'GPU time' in Table 9 but gives no specifics about the GPU, CPU, or other hardware used for the experiments.
Software Dependencies | No | We utilized pre-trained ViT-Base and ResNet50-GN obtained from the publicly available PyTorch Image Models repository (Wightman, 2019). The implementations of the comparative methods were obtained from their public repositories and followed the guidelines outlined in their original papers. No specific version numbers for PyTorch, Python, or other libraries used in the authors' own implementation are provided.
Experiment Setup | Yes | We evaluated two types of architectures: Vision Transformer Base (ViT-Base) and ResNet-50 with Group Normalization (ResNet50-GN). ... For adaptation, the affine parameters of the normalization layers in each architecture were trainable. For the ensemble strategies in TTE, the temperature τ was set to 1.0 with a dropout ratio of 0.9 for ResNet50-GN, and τ = 10.0 with a dropout ratio of 0.4 for ViT-Base. The value of m0 was fixed at 1.0 for both ResNet and ViT models. For de-biased knowledge distillation, n was set to 0.99 and α to 3.0. These settings were kept consistent across all test-time scenarios and baseline methods integrated with TTE to avoid over-tuned hyperparameter configurations. ... TTE employed SGD with a momentum of 0.9 as the optimizer. The learning rate was set to 0.00025 for ResNet50-GN and 0.001 for ViT-Base. For a batch size of 1, the learning rates were adjusted to 0.00025/16 for ResNet50-GN and 0.001/32 for ViT-Base.
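The hyperparameters quoted in the setup above can be collected into a small configuration sketch. The dictionary keys and helper function below are hypothetical names for illustration; only the numeric values come from the quoted experiment setup.

```python
# Hypothetical configuration mirroring the TTE hyperparameters quoted above.
# Key names are illustrative, not taken from the authors' code.
TTE_CONFIG = {
    "resnet50_gn": {
        "temperature": 1.0,       # ensemble temperature tau
        "dropout_ratio": 0.9,
        "m0": 1.0,
        "learning_rate": 0.00025,
        "lr_batch_size_1": 0.00025 / 16,  # adjusted rate for batch size 1
    },
    "vit_base": {
        "temperature": 10.0,
        "dropout_ratio": 0.4,
        "m0": 1.0,
        "learning_rate": 0.001,
        "lr_batch_size_1": 0.001 / 32,
    },
    "distillation": {"n": 0.99, "alpha": 3.0},   # de-biased knowledge distillation
    "optimizer": {"type": "SGD", "momentum": 0.9},
}

def lr_for(arch: str, batch_size: int) -> float:
    """Return the quoted learning rate for an architecture and batch size."""
    cfg = TTE_CONFIG[arch]
    return cfg["lr_batch_size_1"] if batch_size == 1 else cfg["learning_rate"]
```

Such a single shared configuration reflects the paper's stated choice to keep settings fixed across all test-time scenarios rather than tuning per benchmark.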