A Layer Selection Approach to Test Time Adaptation

Authors: Sabyasachi Sahoo, Mostafa ElAraby, Jonas Ngnawe, Yann Batiste Pequignot, Frédéric Precioso, Christian Gagné

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we demonstrate that the proposed layer selection framework improves the performance of existing TTA approaches across multiple datasets, domain shifts, model architectures, and TTA losses.
Researcher Affiliation | Academia | 1 IID, Université Laval; 2 Mila; 3 Université de Montréal; 4 Université Côte d'Azur, CNRS, INRIA, I3S, Maasai; 5 Canada CIFAR AI Chair
Pseudocode | No | The paper describes the proposed approach using mathematical equations and descriptive text, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps.
Open Source Code | No | The paper does not contain any explicit statements about releasing code, nor does it provide links to any code repositories or mention code in supplementary materials.
Open Datasets | Yes | This section compares our proposed approaches with existing baselines on Domainbed (Gulrajani and Lopez-Paz 2021), a popular benchmark with large single distribution shifts, and Continual TTA, a popular benchmark with multiple distribution shifts. TTA losses: Two popular TTA losses are considered: Pseudo-Labeling (PL) (Lee et al. 2013) and SHOT (Liang, Hu, and Feng 2020).
Dataset Splits | Yes | For the experiments on Domainbed, we follow the evaluation protocol as described in Iwasawa and Matsuo (2021), including dataset splits for the following four datasets: PACS (Li et al. 2017), VLCS (Fang, Xu, and Rockmore 2013), Terra Incognita (Beery, Van Horn, and Perona 2018), and Office-Home (Venkateswara et al. 2017).
Hardware Specification | No | Computations were made on the Cedar and Béluga supercomputers, managed by Calcul Québec and the Digital Research Alliance of Canada (the Alliance).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | Implementation details: We report results for GALA with a window size of 20 and a selection threshold of 0.75 at single-layer granularity. GALA does not appear overly sensitive to these hyperparameters, and those values work well overall; see Sec. 5 for more discussion of hyperparameter values and design choices.
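To make the quoted setup concrete, below is a minimal, hypothetical sketch of the two ingredients the report mentions: the Pseudo-Labeling (PL) loss of Lee et al. (2013), and a windowed, threshold-based layer-selection rule. Only the PL loss definition and the hyperparameter values (window size 20, threshold 0.75, single-layer granularity) come from the report; the per-layer score and the moving-average gating mechanism are assumptions for illustration, not the paper's actual GALA algorithm.

```python
import numpy as np

def pseudo_label_loss(logits):
    """Pseudo-Labeling (Lee et al. 2013): cross-entropy of the model's
    predictions against its own argmax (hard) pseudo-labels."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    hard = probs.argmax(axis=1)                      # hard pseudo-labels
    return -np.mean(np.log(probs[np.arange(len(hard)), hard] + 1e-12))

class LayerSelector:
    """Hypothetical windowed, threshold-based layer selection.

    Assumption (not from the paper): each layer receives a per-step
    score in [0, 1], and a layer is selected for adaptation when its
    mean score over the last `window` steps exceeds `threshold`."""

    def __init__(self, n_layers, window=20, threshold=0.75):
        self.window = window
        self.threshold = threshold
        self.history = [[] for _ in range(n_layers)]

    def update(self, scores):
        """Record one step of per-layer scores; return selected layer ids."""
        selected = []
        for i, s in enumerate(scores):
            self.history[i].append(s)
            self.history[i] = self.history[i][-self.window:]  # sliding window
            if np.mean(self.history[i]) > self.threshold:
                selected.append(i)
        return selected

# Toy usage: layer 0 scores consistently high, layer 1 consistently low.
sel = LayerSelector(n_layers=2)
for _ in range(20):
    chosen = sel.update([0.9, 0.1])
print(chosen)  # only layer 0 clears the 0.75 threshold -> [0]
```

With single-layer granularity, only the layers returned by `update` would be unfrozen for the TTA gradient step, while all others stay fixed at their source-model weights.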