COME: Test-time Adaption by Conservatively Minimizing Entropy
Authors: Qingyang Zhang, Yatao Bian, Xinke Kong, Peilin Zhao, Changqing Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our method achieves state-of-the-art performance on commonly used benchmarks, showing significant improvements in terms of classification accuracy and uncertainty estimation under various settings including standard, life-long and open-world TTA, i.e., up to 34.5% improvement on accuracy and 15.1% on false positive rate. Our code is available at: https://github.com/BlueWhaleLab/COME. |
| Researcher Affiliation | Collaboration | Qingyang Zhang¹, Yatao Bian², Xinke Kong¹, Peilin Zhao² and Changqing Zhang¹. ¹College of Intelligence and Computing, Tianjin University; ²Tencent AI Lab |
| Pseudocode | Yes | Algorithm 1: Pseudo code of COME in a PyTorch-like style. |
| Open Source Code | Yes | Our code is available at: https://github.com/BlueWhaleLab/COME. |
| Open Datasets | Yes | We conduct experiments on standard covariate-shifted distribution datasets ImageNet-C (a large-scale benchmark with 15 types of diverse corruption), ImageNet-R and ImageNet-S. Besides, we also consider the open-world test-time adaption setting, where the test data distribution P_test is a mixture of both normal covariate-shifted data P_Cov and abnormal outliers P_Outlier whose true labels do not belong to any known classes in P_train. Following previous work in the open-set OOD generalization literature (Lee et al., 2023; Bai et al., 2023; Baek et al., 2024), P_Outlier is a suite of diverse datasets introduced by Yang et al. (2022), including iNaturalist, Open-Image, NINCO and SSB-Hard. |
| Dataset Splits | Yes | Following the common practice (Niu et al., 2022; 2023), we conduct experiments on standard covariate-shifted distribution datasets ImageNet-C (a large-scale benchmark with 15 types of diverse corruption), ImageNet-R and ImageNet-S. The test batch size is 64. In open-world TTA, the test data distribution is a mixture of both normal covariate-shifted data and abnormal outliers. The mixture ratio of P_Cov to P_Outlier is 0.5 following previous work (Bai et al., 2023), i.e., P_test = 0.5 P_Cov + 0.5 P_Outlier. |
| Hardware Specification | Yes | We run all the experiments on a single NVIDIA 4090 GPU. |
| Software Dependencies | No | This can be achieved by applying the detach operation, which is a commonly used function in modern deep learning toolboxes like PyTorch and TensorFlow. |
| Experiment Setup | Yes | The test batch size is 64. Specifically, we use SGD as the update rule, with a momentum of 0.9, batch size of 64 and learning rate of 0.001/0.00025 for ViT/ResNet models. The trainable parameters are all affine parameters of layer/batch normalization layers for ViT/ResNet models. |
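The Pseudocode and Software Dependencies rows refer to an entropy-based test-time objective and the detach (stop-gradient) trick. As a frame of reference only, below is a minimal NumPy sketch of the softmax-entropy quantity that entropy-minimization TTA methods optimize; the function names and shapes are illustrative assumptions, not the authors' COME implementation, and in PyTorch the detach trick would amount to calling `.detach()` on whichever term should be excluded from the gradient.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def shannon_entropy(probs, eps=1e-12):
    # H(p) = -sum_k p_k log p_k, computed per sample; this is the quantity
    # that entropy-minimization TTA drives down on unlabeled test batches.
    return -(probs * np.log(probs + eps)).sum(axis=-1)

# A uniform prediction over 4 classes attains the maximum entropy log(4).
logits = np.zeros((1, 4))
h = shannon_entropy(softmax(logits))
assert np.allclose(h, np.log(4))
```

A confident (low-entropy) prediction yields a value near 0, which is why naively minimizing this loss can collapse to overconfident outputs; COME's "conservative" variant is motivated by exactly that failure mode.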
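The Experiment Setup row fixes SGD with momentum 0.9 and a learning rate of 0.001 (ViT) applied only to normalization-layer affine parameters. As a reference for the reported update rule, here is a plain-NumPy sketch of PyTorch-style momentum SGD; `sgd_momentum_step` is a hypothetical helper for illustration, not code from the paper.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.001, momentum=0.9):
    # PyTorch-style momentum SGD (hypothetical helper, not the authors' code):
    #   v <- momentum * v + grad
    #   w <- w - lr * v
    velocity = momentum * velocity + grad
    return w - lr * velocity, velocity

# One step on a single affine parameter with gradient 2.0.
w = np.array([1.0])
v = np.zeros_like(w)
w, v = sgd_momentum_step(w, np.array([2.0]), v)
assert np.isclose(w[0], 1.0 - 0.001 * 2.0)
```

Restricting the update to the affine (scale/shift) parameters of normalization layers, as the row states, keeps the adaptation lightweight: only a tiny fraction of the model's weights receives such steps at test time.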