Test-Time Adaptation with Source Based Auxiliary Tasks

Authors: Motasem Alfarra, Alvaro Correia, Bernard Ghanem, Christos Louizos

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report a performance improvement over the state of the art of 1.5% and 6% on average across all corruptions on ImageNet-C under episodic and continual evaluation, respectively. To combat the second scenario of limited data, we analyze the effectiveness of combining federated adaptation with our proposed auxiliary task across different models, even when different clients observe different distribution shifts. We found that not only does federated averaging enhance adaptation, but combining it with our auxiliary task provides a notable 6% performance gain over previous TTA methods.
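The federated adaptation mentioned above builds on federated averaging. As a minimal, dependency-free sketch (function and parameter names are illustrative, not from the paper), equal-weight averaging of per-client adapted parameters looks like:

```python
def federated_average(client_params):
    """Element-wise mean of per-client parameter vectors (FedAvg, equal client weights).

    client_params: list of dicts mapping parameter name -> list of floats.
    """
    n = len(client_params)
    return {
        name: [
            sum(p[name][i] for p in client_params) / n
            for i in range(len(client_params[0][name]))
        ]
        for name in client_params[0]
    }

# Example: two clients adapted under different distribution shifts
clients = [
    {"bn.weight": [1.0, 2.0]},
    {"bn.weight": [3.0, 4.0]},
]
print(federated_average(clients))  # {'bn.weight': [2.0, 3.0]}
```

In practice only the adapted parameters (e.g., batch-norm statistics and affine terms in many TTA methods) would be exchanged and averaged, not the full model.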
Researcher Affiliation | Collaboration | Motasem Alfarra (Qualcomm AI Research), Alvaro H.C. Correia (Qualcomm AI Research), Bernard Ghanem (King Abdullah University of Science and Technology, KAUST), Christos Louizos (Qualcomm AI Research)
Pseudocode | No | The paper describes its methods using mathematical equations and descriptive text, but it does not contain a clearly labeled pseudocode block or algorithm.
Open Source Code | No | The paper neither contains an explicit statement about releasing source code for the described methodology nor provides a link to a code repository.
Open Datasets | Yes | We conduct experiments on the ImageNet-C benchmark (Hendrycks & Dietterich, 2019), where we fix fθ₀ to be a ResNet-50 (He et al., 2016) pretrained on the ImageNet dataset (Deng et al., 2009). We conduct comprehensive experimental analysis on the two standard and large-scale TTA benchmarks ImageNet-C (Hendrycks & Dietterich, 2019) and ImageNet-3DCC (Kar et al., 2022). We set Ds as ImageNet-Sketch (Wang et al., 2019).
Dataset Splits | No | The paper mentions using subsets of the ImageNet training set for source data (Ds) and the clean validation set of ImageNet for evaluation, and also discusses splitting corrupted data into client streams for federated evaluation. However, it does not provide explicit training/validation/test split percentages or a partitioning methodology for the main datasets (ImageNet, ImageNet-C, ImageNet-3DCC) that would allow the general data partitioning to be reproduced.
Hardware Specification | Yes | At last, it is worth noting that each experiment was run using a single Nvidia V100 GPU.
Software Dependencies | No | The paper mentions using an SGD optimizer, but it does not specify any software libraries or frameworks with their version numbers.
Experiment Setup | Yes | We use an SGD optimizer with a learning rate of 25 × 10⁻⁴ and a momentum of 0.9. For DISTA, we follow Niu et al. (2022) in setting ϵ = 5 × 10⁻² in equation 4 but pick a higher value for E₀; we set E₀ = 0.5 log(1000) instead of 0.4 log(1000). For Aux-Tent, we set the learning rate to 5 × 10⁻⁴. We fix the severity level in our experiments to 5. We assume that the stream S reveals batches of data with a size of 64.
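The reported hyperparameters can be gathered into a single configuration. A minimal sketch (the dictionary keys are illustrative names, not from the paper; only the numeric values are taken from the quoted setup):

```python
import math

# Hyperparameters as quoted from the paper's experiment setup.
config = {
    "optimizer": "SGD",
    "lr": 25e-4,                                # learning rate 25 x 10^-4
    "momentum": 0.9,
    "epsilon": 5e-2,                            # ϵ in the paper's equation 4 (DISTA)
    "entropy_threshold": 0.5 * math.log(1000),  # E0; Niu et al. (2022) use 0.4 * log(1000)
    "aux_tent_lr": 5e-4,                        # learning rate for Aux-Tent
    "severity": 5,                              # corruption severity level
    "batch_size": 64,                           # batch size revealed by the stream S
}
```

Note that E₀ = 0.5 log(1000) ≈ 3.454, i.e., half the maximum entropy of a uniform distribution over ImageNet's 1000 classes.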