Test-Time Ensemble via Linear Mode Connectivity: A Path to Better Adaptation

Authors: Byungjai Kim, Chanho Ahn, Wissam Baddar, Kikyung Kim, Huijin Lee, Saehyun Ahn, Seungju Han, Sungjoo Suh, Eunho Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In extensive experiments, integration with TTE consistently outperformed baseline models across various challenging scenarios, demonstrating its effectiveness and general applicability. ... We conducted experiments with four benchmark datasets: ImageNet-C (Apache-2.0 License) (Hendrycks & Dietterich, 2019) assesses adaptation performance under 15 types of corruptions at five severity levels... Table 1 presents classification accuracy across 15 distributions in ImageNet-C... Table 5 shows classification accuracy as components are added sequentially.
Researcher Affiliation | Collaboration | (1) AI Center, Samsung Electronics; (2) Korea Advanced Institute of Science and Technology
Pseudocode | Yes | Algorithm 1: TTE + Tent (Wang et al., 2021) ... Algorithm 2: TTE + SAR (Niu et al., 2023) ... Algorithm 3: TTE + DeYO (Lee et al., 2024)
Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing the code of the proposed method. It mentions that implementations of comparative methods were obtained from their public repositories, but says nothing of the sort for TTE itself.
Open Datasets | Yes | We conducted experiments with four benchmark datasets: ImageNet-C (Apache-2.0 License) (Hendrycks & Dietterich, 2019) ... ImageNet-S (MIT License) (Wang et al., 2019a) and ImageNet-R (MIT License) (Hendrycks et al., 2021) ... ImageNetV2 (MIT License) (Recht et al., 2019)
Dataset Splits | Yes | We followed the three wild test scenarios from Niu et al. (2023) using ImageNet-C: Label Shifts, where batches are class-imbalanced with most samples belonging to the same class; Batch Size 1, where each batch contains only one sample, testing adaptation with minimal information; and Mix Shifts, where batches contain samples from various distributions, testing adaptation under multiple shifts simultaneously.
Hardware Specification | No | The paper reports 'GPU time' in Table 9 but gives no specifics about the GPU, CPU, or other hardware used for the experiments.
Software Dependencies | No | We utilized pre-trained ViT-Base and ResNet50-GN obtained from the publicly available PyTorch Image Models repository (Wightman, 2019). The implementations of the comparative methods were obtained from their public repositories and followed the guidelines outlined in their original papers. No specific version numbers for PyTorch, Python, or other libraries used in the authors' own implementation are provided.
Experiment Setup | Yes | We evaluated two types of architectures: Vision Transformer Base (ViT-Base) and ResNet-50 with Group Normalization (ResNet50-GN). ... For adaptation, the affine parameters of the normalization layers in each architecture were trainable. For the ensemble strategies in TTE, the temperature τ was set to 1.0 with a dropout ratio of 0.9 for ResNet50-GN, and τ = 10.0 with a dropout ratio of 0.4 for ViT-Base. The value of m0 was fixed at 1.0 for both ResNet and ViT models. For de-biased knowledge distillation, n was set to 0.99 and α to 3.0. These settings were kept consistent across all test-time scenarios and baseline methods integrated with TTE to avoid over-tuned hyperparameter configurations. ... TTE employed SGD with a momentum of 0.9 as the optimizer. The learning rate was set to 0.00025 for ResNet50-GN and 0.001 for ViT-Base. For a batch size of 1, the learning rates were adjusted to 0.00025/16 for ResNet50-GN and 0.001/32 for ViT-Base.
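The hyperparameters quoted in the setup above can be collected into a small configuration sketch. The dictionary keys and helper function below are hypothetical names for illustration; only the numeric values come from the quoted experiment setup.

```python
# Hypothetical configuration mirroring the TTE hyperparameters quoted above.
# Key names are illustrative, not taken from the authors' code.
TTE_CONFIG = {
    "resnet50_gn": {
        "temperature": 1.0,       # ensemble temperature tau
        "dropout_ratio": 0.9,
        "m0": 1.0,
        "learning_rate": 0.00025,
        "lr_batch_size_1": 0.00025 / 16,  # adjusted rate for batch size 1
    },
    "vit_base": {
        "temperature": 10.0,
        "dropout_ratio": 0.4,
        "m0": 1.0,
        "learning_rate": 0.001,
        "lr_batch_size_1": 0.001 / 32,
    },
    "distillation": {"n": 0.99, "alpha": 3.0},   # de-biased knowledge distillation
    "optimizer": {"type": "SGD", "momentum": 0.9},
}

def lr_for(arch: str, batch_size: int) -> float:
    """Return the quoted learning rate for an architecture and batch size."""
    cfg = TTE_CONFIG[arch]
    return cfg["lr_batch_size_1"] if batch_size == 1 else cfg["learning_rate"]
```

Such a single shared configuration reflects the paper's stated choice to keep settings fixed across all test-time scenarios rather than tuning per benchmark.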