The Over-Certainty Phenomenon in Modern Test-Time Adaptation Algorithms

Authors: Fin Amin, Jung-Eun Kim

TMLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "In order to evaluate DEC, we conduct a series of experiments using three different backbone models across four datasets. Our primary evaluation metrics will be model accuracy, NLL, and ECE (bins = 15) on the observations, allowing us to examine both the predictive performance and the calibration quality of the models."
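The calibration metric quoted above, ECE with 15 equal-width confidence bins, follows a standard formulation. The sketch below is illustrative (function and variable names are ours, not the authors'):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """Standard binned ECE: group predictions into equal-width
    confidence bins, then average |accuracy - confidence| per bin,
    weighted by the fraction of samples in each bin."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```

A perfectly calibrated, fully confident classifier scores an ECE of 0; a single wrong prediction made with confidence 0.9 scores 0.9.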
Researcher Affiliation | Academia | "Fin Amin (EMAIL), Department of Electrical and Computer Engineering, North Carolina State University; Jung-Eun Kim (EMAIL), Department of Computer Science, North Carolina State University"
Pseudocode | Yes | "Algorithm 1: Compute Certainty Regularizer (CCR) ... Algorithm 2: Dynamic Entropy Control (DEC)"
Open Source Code | Yes | "In the interest of reproducibility, we release our code at https://github.com/FinAminToastCrunch/DynamicEntropyControl."
Open Datasets | Yes | "The following publicly available TTA datasets are used in our experiments; we selected these because they are commonly used in existing works and provide a variety of domain shifts. 1. PACS (Li et al., 2017) ... 2. Office-Home (Venkateswara et al., 2017) ... 3. Digits is a combination of three digit datasets: USPS (Hull, 1994), MNIST (LeCun et al., 2010), and SVHN (Netzer et al., 2011). ... 4. Tiny ImageNet-C (TIN-C) (Le & Yang, 2015)"
Dataset Splits | Yes | "1. PACS (Li et al., 2017) has 4 domains: photo, art, cartoon, sketch, with 7 classes. Tested using LOO. ... 2. Office-Home (Venkateswara et al., 2017) ... Tested using LOO. ... 3. Digits is a combination of three digit datasets: USPS (Hull, 1994), MNIST (LeCun et al., 2010), and SVHN (Netzer et al., 2011). ... Tested using LOO by training on the source domains' training sets and adapting to the target domain's test set. ... 4. Tiny ImageNet-C (TIN-C) (Le & Yang, 2015) has 15 domains with 200 classes. ... Backbones are trained on the corruption-free (source) training set, then adapted to and evaluated on the corrupted (target) domains. For each target domain, there are 5 tiers of corruption."
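The leave-one-out (LOO) protocol described above holds out each domain once as the adaptation target while training on the rest. A minimal sketch (the domain names are illustrative, not taken from the authors' code):

```python
def leave_one_out_splits(domains):
    """Yield (source_domains, target_domain) pairs: each domain
    is held out exactly once as the adaptation target."""
    for target in domains:
        sources = [d for d in domains if d != target]
        yield sources, target

# PACS-style example: 4 domains -> 4 LOO splits.
pacs = ["photo", "art", "cartoon", "sketch"]
splits = list(leave_one_out_splits(pacs))
```

For PACS this yields four runs, e.g. sources = ["art", "cartoon", "sketch"] with target = "photo" for the first split.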
Hardware Specification | Yes | "We used TensorFlow 2.9 (Abadi et al., 2015) with NVIDIA cuDNN version 11.3 on an RTX 3080 16GB laptop GPU with 32GB of system memory."
Software Dependencies | Yes | "We used TensorFlow 2.9 (Abadi et al., 2015) with NVIDIA cuDNN version 11.3 on an RTX 3080 16GB laptop GPU with 32GB of system memory. ... We do most initial training on the source domain using RMSProp (lr = 2e-4) (Tieleman et al., 2012) ... The small CNN is compiled and initially trained with the Adam optimizer (Kingma & Ba, 2014)."
Experiment Setup | Yes | "We do most initial training on the source domain using RMSProp (lr = 2e-4) (Tieleman et al., 2012) to minimize cross-entropy loss for epochs = {15, 15, 5, 25} for each enumerated dataset, respectively. ... For ETA, we set E_0 = 0.4 ln(C), as this was their recommended value, and ϵ = {0.6, 0.1, 0.4, 0.125} for each enumerated dataset, respectively. ... For T3A, we set the number of supports to retain, M = , as this provides the lowest calibration error. For SoTTA, we set ρ = 0.05 and C_0 = {0.33, 0.33, 0.33, 0.66} for each dataset, respectively, to help their performance, and N_SoTTA = 64 as per their recommendations. We use a batch size of 50 for our DEC in all experiments. ... All images are resized to (227, 227, 3) and scaled between [0, 255]. ... MobileNet: we set the t_min and t_max parameters to 1.20 and 2.75, respectively. ... All experiments are run three times using random_seed = 0, 1, 2, respectively."
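The ETA threshold E_0 = 0.4 ln(C) quoted above scales with the number of classes C, so the same factor transfers across datasets. A sketch of how such an entropy margin filters out unreliable (high-entropy) samples, assuming ETA's standard reliability-filtering behavior (function names are ours):

```python
import numpy as np

def entropy_filter(probs, num_classes, factor=0.4):
    """Keep only samples whose predictive entropy falls below
    E_0 = factor * ln(C), the margin ETA uses to discard
    unreliable test samples before adaptation."""
    e0 = factor * np.log(num_classes)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return entropy < e0

# A confident prediction passes the filter; a near-uniform
# prediction (entropy = ln C) is rejected.
probs = np.array([[0.97, 0.01, 0.01, 0.01],
                  [0.25, 0.25, 0.25, 0.25]])
mask = entropy_filter(probs, num_classes=4)
```

With C = 4, E_0 = 0.4 ln 4 ≈ 0.55 nats, so the uniform row (entropy ln 4 ≈ 1.39) is filtered while the confident row is kept.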