Temporal Test-Time Adaptation with State-Space Models

Authors: Mona Schirmer, Dan Zhang, Eric Nalisnick

TMLR 2025

Reproducibility assessment (variable, result, LLM response):
Research Type: Experimental. "Through experiments on real-world temporal distribution shifts, we show that our method excels in handling small batch sizes and label shift. In Sec. 5, we conduct a comprehensive evaluation of STAD and prominent TTA baselines under authentic temporal shifts. Our results show that STAD excels in this setting (Tab. 2)..."
Researcher Affiliation: Collaboration. Mona Schirmer (UvA-Bosch Delta Lab, University of Amsterdam); Dan Zhang (Bosch Center for AI); Eric Nalisnick (Johns Hopkins University)
Pseudocode: Yes. Algorithm 1: STAD; Algorithm 2: EM for STAD-Gauss; Algorithm 3: EM for STAD-vMF
Open Source Code: No. The paper does not contain an explicit statement confirming the release of source code for the described methodology, nor does it provide a direct link to a code repository.
Open Datasets: Yes. Yearbook (Ginosar et al., 2015); EVIS (Zhou et al., 2022a); FMoW-Time (Koh et al., 2021); CIFAR-10.1 (Recht et al., 2019); ImageNetV2 (Recht et al., 2019); CIFAR-10 (Krizhevsky et al., 2009); ImageNet (Deng et al., 2009); CIFAR-10-C (Hendrycks & Dietterich, 2019)
Dataset Splits: Yes. Yearbook: images from 1930 to 1969 are used for training and images from 1970-2013 for testing. EVIS: models are trained on images from 2009-2011 and evaluated on images from 2012-2020. FMoW-Time: 141,696 images are split into a training period (2002-2012) and a testing period (2013-2017).
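The temporal splits above all follow the same pattern: partition samples by calendar year at a fixed boundary. A minimal sketch, assuming samples are `(year, image)` pairs; the function name and sample representation are illustrative, not from the paper's code:

```python
def temporal_split(samples, train_end_year, test_start_year):
    """Split (year, sample) pairs into train/test sets by calendar year."""
    train = [s for s in samples if s[0] <= train_end_year]
    test = [s for s in samples if s[0] >= test_start_year]
    return train, test

# Yearbook-style split: train on 1930-1969, test on 1970-2013
samples = [(year, f"img_{year}") for year in range(1930, 2014)]
train, test = temporal_split(samples, train_end_year=1969, test_start_year=1970)
```

The same helper covers the EVIS (2011/2012) and FMoW-Time (2012/2013) boundaries by changing the two year arguments.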
Hardware Specification: Yes. All experiments are performed on an NVIDIA RTX 6000 Ada with 48GB memory.
Software Dependencies: No. The paper mentions software such as the Adam optimizer and the timm library (Wightman, 2019) but does not provide specific version numbers for these or any other software dependencies, which are necessary for full reproducibility.
Experiment Setup: Yes. "Batch sizes are the same for all baselines. To ensure optimal performance on newly studied datasets, we conduct an extensive hyperparameter search for each baseline (see App. C.4) and report the best setting. For Yearbook, we combine all samples of a year into one batch, resulting in a batch size of 2048. To create online class imbalance, we reduce the batch size to 64. We use a batch size of 100 for EVIS, CIFAR-10.1 and CIFAR-10-C, and 64 for FMoW-Time and ImageNetV2."
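The per-dataset batch sizes quoted above can be collected into a single lookup, which makes the reported setup easy to audit. A sketch only; the dictionary and helper are illustrative and not part of the paper's released configuration:

```python
# Batch sizes as reported in the experiment setup.
BATCH_SIZES = {
    "Yearbook": 2048,               # all samples of one year per batch
    "Yearbook-label-shift": 64,     # reduced to induce online class imbalance
    "EVIS": 100,
    "CIFAR-10.1": 100,
    "CIFAR-10-C": 100,
    "FMoW-Time": 64,
    "ImageNetV2": 64,
}

def batch_size_for(dataset):
    """Return the reported evaluation batch size for a dataset."""
    return BATCH_SIZES[dataset]
```

Since batch sizes are the same for all baselines, a single table like this suffices for every method in the comparison.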