Ranked Entropy Minimization for Continual Test-Time Adaptation

Authors: Jisu Han, Jaemin Na, Wonjun Hwang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed method is evaluated across various benchmarks, demonstrating its effectiveness through empirical results. In this section, we extensively explore the effectiveness of our REM on the CTTA protocol (Wang et al., 2022). The analysis includes comparisons with state-of-the-art baselines, verification of its intended functionality through visualizations, and an understanding of its working mechanisms through ablation studies.
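For context on the protocol above: test-time entropy minimization, the family REM belongs to, adapts a model online by reducing the Shannon entropy of its own softmax predictions on unlabeled target batches. A minimal, framework-free sketch of the entropy objective (plain softmax entropy only, not the paper's ranked variant; all names are illustrative):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs, eps=1e-12):
    """Shannon entropy H(p) = -sum_k p_k * log(p_k), natural log."""
    return -sum(p * math.log(p + eps) for p in probs)

# Entropy minimization pushes predictions toward the confident regime:
# a peaked prediction has entropy near 0, a uniform one has the
# maximum value log(K) for K classes.
confident = entropy(softmax([8.0, 0.0, 0.0]))  # near 0
uniform = entropy(softmax([1.0, 1.0, 1.0]))    # log(3) ≈ 1.0986
```

In an actual TTA loop this entropy would be averaged over a batch and backpropagated to update (a subset of) the model parameters.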
Researcher Affiliation | Collaboration | ¹Ajou University, ²Korea Telecom, ³Korea University.
Pseudocode | No | The paper describes the method only through equations and prose, without a distinct pseudocode or algorithm block.
Open Source Code | Yes | Our code is available at https://github.com/pilsHan/rem
Open Datasets | Yes | We construct experiments on ImageNet-to-ImageNet-C, CIFAR10-to-CIFAR10C, and CIFAR100-to-CIFAR100C. The source domains are ImageNet (Deng et al., 2009) and CIFAR (Krizhevsky et al., 2009), while the corresponding robustness benchmarks (Hendrycks & Dietterich, 2018), ImageNet-C, CIFAR10C, and CIFAR100C, are used as the target domains.
Dataset Splits | Yes | We construct experiments on ImageNet-to-ImageNet-C, CIFAR10-to-CIFAR10C, and CIFAR100-to-CIFAR100C. The source domains are ImageNet (Deng et al., 2009) and CIFAR (Krizhevsky et al., 2009), while the corresponding robustness benchmarks (Hendrycks & Dietterich, 2018), ImageNet-C, CIFAR10C, and CIFAR100C, are used as the target domains. The suffix C in these datasets indicates corruption, which includes 15 types of corruptions, each with 5 levels of severity. Following (Wang et al., 2022; Liu et al., 2024b;a), we adopt target domains with level 5 severity across all 15 corruption types for sequential domains.
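The 15-corruptions × 5-severities structure described above can be made concrete. A sketch of building the level-5 sequential target-domain schedule used in CTTA (the corruption names follow Hendrycks & Dietterich's -C benchmarks; the helper itself is illustrative, not from the paper):

```python
# The 15 corruption types shared by ImageNet-C, CIFAR10C, and CIFAR100C.
CORRUPTIONS = [
    "gaussian_noise", "shot_noise", "impulse_noise",
    "defocus_blur", "glass_blur", "motion_blur", "zoom_blur",
    "snow", "frost", "fog", "brightness",
    "contrast", "elastic_transform", "pixelate", "jpeg_compression",
]
SEVERITIES = [1, 2, 3, 4, 5]

def sequential_domains(severity=5):
    """Continual TTA target schedule: every corruption type at one
    fixed severity, visited one after another."""
    assert severity in SEVERITIES
    return [(name, severity) for name in CORRUPTIONS]

# Level-5 schedule: 15 sequential target domains at the hardest severity.
schedule = sequential_domains(severity=5)
```

In the CTTA protocol the model adapts continually across this sequence without ever resetting to the source weights between domains.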
Hardware Specification | No | The paper does not provide specific hardware details such as the GPU or CPU models used for experiments.
Software Dependencies | No | The paper mentions using open-source code from Continual-MAE and ViDA, and refers to timm (Wightman, 2019) for pre-trained weights, but does not provide specific version numbers for these software components or for other libraries used in the implementation.
Experiment Setup | Yes | Table 6 provides details on the implementation of our experiments, including optimizer settings, learning rates, batch sizes, model architectures, and hyperparameters. For CTTA experiments, we follow the Continual-MAE framework. Specifically, for CIFAR datasets, we resize the input images to 384 × 384, while for all other experiments, the images are resized to 224 × 224. ... Training Parameters: Optimizer Adam, Optimizer momentum (0.9, 0.999), Learning rate 1e-3, Batch size 50, Model architecture ViT-B/16. Algorithm Parameters: λ (Eq. 5) 1.0, m (Eq. 4) 0.
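The quoted Table 6 settings can be collected into a single configuration. A sketch assuming the input resolution depends only on the dataset family, as the quote states (field names are illustrative; the values are exactly those quoted above):

```python
# Training and algorithm settings as quoted from the paper's Table 6.
CONFIG = {
    "optimizer": "Adam",
    "betas": (0.9, 0.999),   # "optimizer momentum" in the paper
    "lr": 1e-3,
    "batch_size": 50,
    "arch": "ViT-B/16",
    "lambda_eq5": 1.0,       # λ in Eq. 5
    "m_eq4": 0,              # m in Eq. 4
}

def input_size(dataset_name):
    """CIFAR targets are resized to 384x384; all others to 224x224."""
    return 384 if dataset_name.lower().startswith("cifar") else 224
```

Collecting the settings this way also makes the reproducibility gap visible: everything needed to rebuild the optimizer is present, while hardware and library versions (marked "No" above) are not.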