Self-Bootstrapping for Versatile Test-Time Adaptation

Authors: Shuaicheng Niu, Guohao Chen, Peilin Zhao, Tianyi Wang, Pengcheng Wu, Zhiqi Shen

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that, either independently or as a plug-and-play module, our method achieves superior results across classification, segmentation, and 3D monocular detection tasks with both transformer and CNN models.
Researcher Affiliation | Academia | 1College of Computing and Data Science, Nanyang Technological University, Singapore; 2Joint WeBank-NTU Research Institute on Fintech, Singapore; 3School of Artificial Intelligence, Shanghai Jiao Tong University, China. Correspondence to: <EMAIL>.
Pseudocode | Yes | Algorithm 1: PyTorch-style pseudocode of SPA.
Open Source Code | Yes | The source code is available at https://github.com/mr-eggplant/SPA.
Open Datasets | Yes | Datasets and Models: For classification, we conduct experiments on four benchmarks, i.e., ImageNet-C (Hendrycks & Dietterich, 2019) (corrupted images in 15 types of 4 main categories, with the most severe corruption level 5), ImageNet-R (artistic renditions of 200 ImageNet classes) (Hendrycks et al., 2021a), ImageNet-Adversarial (Hendrycks et al., 2021b), and ImageNet-Sketch (Wang et al., 2019). We use ViT-base (Dosovitskiy et al., 2021), trained on ImageNet via the timm repository (Wightman, 2019), as the source model. For 3D monocular object detection, we follow MonoTTA (Lin et al., 2024) to evaluate all methods on KITTI-C, constructed from a validation set of KITTI (Geiger et al., 2012) through the incorporation of 13 distinct types of data corruptions (Hendrycks & Dietterich, 2019). Each corruption has 3,769 images, following the original training and validation split of MonoFlex (Zhang et al., 2021). We use the model trained on KITTI by MonoFlex (Zhang et al., 2021) as the source model for TTA. For segmentation, we use the SegFormer-B5 (Xie et al., 2021) model trained on the Cityscapes dataset (Cordts et al., 2016) as the source model and perform TTA on the ACDC dataset (Sakaridis et al., 2021).
Dataset Splits | Yes | Each corruption has 3,769 images, following the original training and validation split of MonoFlex (Zhang et al., 2021). We use the model trained on KITTI by MonoFlex (Zhang et al., 2021) as the source model for TTA. For segmentation, we use the SegFormer-B5 (Xie et al., 2021) model trained on the Cityscapes dataset (Cordts et al., 2016) as the source model and perform TTA on the ACDC dataset (Sakaridis et al., 2021). ... ACDC contains four categories of images collected in adverse conditions, including fog, night, rain, and snow. Following CoTTA (Wang et al., 2022), we use 400 unlabeled images from each adverse condition for continuous TTA.
Hardware Specification | Yes | Adaptation Efficiency: Notably, though SPA involves one more forward and backward propagation, it remains efficient and operates in real time, achieving 79 FPS (vs. SPAI: 125 FPS) on a single A100 GPU with ViT-Base and ImageNet-C.
Software Dependencies | No | Algorithm 1: PyTorch-style pseudocode of SPA. ... We use ViT-base (Dosovitskiy et al., 2021), trained on ImageNet via the timm repository (Wightman, 2019), as the source model. ... Following CoTTA (Wang et al., 2022) and MonoTTA (Lin et al., 2024), we apply SGD on classification and 3D detection, and Adam on segmentation, using learning rates of 10^-2 / 5x10^-3 / 6x10^-5.
Experiment Setup | Yes | Implementation Details: We set the mask ratio m to 0.2 for all experiments. The noise factor γ is set to 0.4 for classification and 0.1 for segmentation and 3D detection. Following CoTTA (Wang et al., 2022) and MonoTTA (Lin et al., 2024), we apply SGD on classification and 3D detection, and Adam on segmentation, using learning rates of 10^-2 / 5x10^-3 / 6x10^-5. We only update norm layers, following TENT. More details of SPA and details of baseline methods are put in Appendix B.2.
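The norm-layer-only update scheme quoted above (following TENT) can be sketched in PyTorch. This is a minimal illustration, not the authors' code: the toy model, the momentum value, and the helper name `collect_norm_params` are assumptions; only the "freeze everything except normalization-layer affine parameters, then optimize them with SGD" pattern comes from the text.

```python
import torch
import torch.nn as nn

def collect_norm_params(model):
    """Gather affine parameters of normalization layers only (TENT-style)."""
    params = []
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm2d, nn.LayerNorm, nn.GroupNorm)):
            for p in (module.weight, module.bias):
                if p is not None:
                    params.append(p)
    return params

# Toy stand-in for the source model (the paper uses ViT-base / SegFormer-B5 / MonoFlex).
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8), nn.ReLU(), nn.Linear(8, 4))

# Freeze all parameters, then re-enable gradients only for norm-layer affine params.
for p in model.parameters():
    p.requires_grad_(False)
norm_params = collect_norm_params(model)
for p in norm_params:
    p.requires_grad_(True)

# SGD for classification per the paper; lr = 1e-2 matches the quoted rates,
# but momentum here is an illustrative assumption.
optimizer = torch.optim.SGD(norm_params, lr=1e-2, momentum=0.9)
```

During adaptation, each test batch would be forwarded through the model, an unsupervised loss computed, and `optimizer.step()` applied so that only the normalization layers drift from the source weights.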