Test-Time Adaptation with Source Based Auxiliary Tasks

Authors: Motasem Alfarra, Alvaro Correia, Bernard Ghanem, Christos Louizos

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report a performance improvement over the state of the art of 1.5% and 6% on average across all corruptions on ImageNet-C under episodic and continual evaluation, respectively. To combat the second scenario of limited data, we analyze the effectiveness of combining federated adaptation with our proposed auxiliary task across different models, even when different clients observe different distribution shifts. We found that not only does federated averaging enhance adaptation, but combining it with our auxiliary task provides a notable 6% performance gain over previous TTA methods.
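The federated adaptation mentioned above builds on federated averaging. As a minimal, dependency-free sketch (function and parameter names are illustrative, not from the paper), equal-weight averaging of per-client adapted parameters looks like:

```python
def federated_average(client_params):
    """Element-wise mean of per-client parameter vectors (FedAvg, equal client weights).

    client_params: list of dicts mapping parameter name -> list of floats.
    """
    n = len(client_params)
    return {
        name: [
            sum(p[name][i] for p in client_params) / n
            for i in range(len(client_params[0][name]))
        ]
        for name in client_params[0]
    }

# Example: two clients adapted under different distribution shifts
clients = [
    {"bn.weight": [1.0, 2.0]},
    {"bn.weight": [3.0, 4.0]},
]
print(federated_average(clients))  # {'bn.weight': [2.0, 3.0]}
```

In practice only the adapted parameters (e.g., batch-norm statistics and affine terms in many TTA methods) would be exchanged and averaged, not the full model.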
Researcher Affiliation | Collaboration | Motasem Alfarra (Qualcomm AI Research), Alvaro H.C. Correia (Qualcomm AI Research), Bernard Ghanem (King Abdullah University of Science and Technology, KAUST), Christos Louizos (Qualcomm AI Research)
Pseudocode | No | The paper describes its methods using mathematical equations and descriptive text, but it does not contain a clearly labeled pseudocode block or algorithm.
Open Source Code | No | The paper neither contains an explicit statement about releasing source code for the described methodology nor provides a link to a code repository.
Open Datasets | Yes | We conduct experiments on the ImageNet-C benchmark (Hendrycks & Dietterich, 2019), where we fix fθ₀ to be a ResNet-50 (He et al., 2016) pretrained on the ImageNet dataset (Deng et al., 2009). We conduct comprehensive experimental analysis on the two standard and large-scale TTA benchmarks ImageNet-C (Hendrycks & Dietterich, 2019) and ImageNet-3DCC (Kar et al., 2022). We set Ds as ImageNet-Sketch (Wang et al., 2019).
Dataset Splits | No | The paper mentions using subsets of the ImageNet training set for source data (Ds) and the clean validation set of ImageNet for evaluation, and also discusses splitting corrupted data into client streams for federated evaluation. However, it does not provide explicit training/validation/test split percentages or a partitioning methodology for the main datasets (ImageNet, ImageNet-C, ImageNet-3DCC) that would allow the general data partitioning to be reproduced.
Hardware Specification | Yes | At last, it is worth noting that each experiment was run using a single Nvidia V100 GPU.
Software Dependencies | No | The paper mentions using an SGD optimizer, but it does not specify any software libraries or frameworks with their version numbers.
Experiment Setup | Yes | We use an SGD optimizer with a learning rate of 25 × 10⁻⁴ and a momentum of 0.9. For DISTA, we follow Niu et al. (2022) in setting ϵ = 5 × 10⁻² in equation 4 but pick a higher value for E₀; we set E₀ = 0.5 log(1000) instead of 0.4 log(1000). For Aux-Tent, we set the learning rate to 5 × 10⁻⁴. We fix the severity level in our experiments to 5. We assume that the stream S reveals batches of data with a size of 64.
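The reported hyperparameters can be gathered into a single configuration. A minimal sketch (the dictionary keys are illustrative names, not from the paper; only the numeric values are taken from the quoted setup):

```python
import math

# Hyperparameters as quoted from the paper's experiment setup.
config = {
    "optimizer": "SGD",
    "lr": 25e-4,                                # learning rate 25 x 10^-4
    "momentum": 0.9,
    "epsilon": 5e-2,                            # ϵ in the paper's equation 4 (DISTA)
    "entropy_threshold": 0.5 * math.log(1000),  # E0; Niu et al. (2022) use 0.4 * log(1000)
    "aux_tent_lr": 5e-4,                        # learning rate for Aux-Tent
    "severity": 5,                              # corruption severity level
    "batch_size": 64,                           # batch size revealed by the stream S
}
```

Note that E₀ = 0.5 log(1000) ≈ 3.454, i.e., half the maximum entropy of a uniform distribution over ImageNet's 1000 classes.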