Test-Time Adaptation with Binary Feedback
Authors: Taeckyung Lee, Sorn Chottananurak, Junsu Kim, Jinwoo Shin, Taesik Gong, Sung-Ju Lee
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show BiTTA achieves 13.3%p accuracy improvements over state-of-the-art baselines, demonstrating its effectiveness in handling severe distribution shifts with minimal labeling effort. The source code is available at https://github.com/taeckyung/BiTTA. |
| Researcher Affiliation | Academia | Corresponding authors. ¹KAIST, ²UNIST. Correspondence to: Taesik Gong <EMAIL>, Sung-Ju Lee <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: BiTTA Algorithm |
| Open Source Code | Yes | The source code is available at https://github.com/taeckyung/BiTTA. |
| Open Datasets | Yes | To evaluate the robustness of BiTTA across various domain shifts, we used standard image corruption datasets CIFAR10-C, CIFAR100-C, and Tiny-ImageNet-C (Hendrycks & Dietterich, 2019). Additionally, we conducted experiments on the PACS dataset (Li et al., 2017), which is commonly used for domain adaptation tasks. |
| Dataset Splits | No | The paper does not explicitly provide specific percentages, sample counts, or citations for predefined training/validation/test splits within the main text. It mentions using 'standard image corruption datasets' and details about training the source model, but without explicit split specifications for reproducibility. |
| Hardware Specification | Yes | The experiments were mainly conducted on NVIDIA RTX 3090 and TITAN GPUs. |
| Software Dependencies | No | The paper mentions 'TorchVision' and 'PyTorch' implicitly through citations but does not provide specific version numbers for these or other software libraries like Python or CUDA, which are required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | We configured BiTTA to operate with minimal labeling effort, using only three binary feedback samples within each 64-sample test batch, accounting for less than 5%. We utilize a single value of balancing hyperparameters α = 2 and β = 1 for BiTTA in all experiments. For CIFAR10-C/CIFAR100-C/Tiny-ImageNet-C, we trained the model with the source data with a learning rate of 0.1/0.1/0.001 and a momentum of 0.9, with cosine annealing learning rate scheduling for 200 epochs. For PACS, we fine-tuned the pre-trained weights from ImageNet on the selected source domains for 3,000 iterations using the Adam optimizer with a learning rate of 0.0001. During adaptation, we update all parameters, including BN stats, with an SGD optimizer with a learning rate/epoch of 0.001/3 (PACS), 0.0001/3 (CIFAR10-C, CIFAR100-C), and 0.00005/5 (Tiny-ImageNet-C) on the entire model. We applied weight decay of 0.05 to PACS and 0.0 otherwise. ... With 4 dropout instances, we apply a dropout rate of 0.3 for small-scale datasets (e.g., CIFAR10-C, CIFAR100-C, PACS) and 0.1 for large-scale datasets (e.g., Tiny-ImageNet-C, ImageNet-C). |
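The per-dataset adaptation settings quoted above can be collected into a single lookup, which makes the reported configuration easier to scan. This is a minimal sketch, not code from the BiTTA repository: the dictionary and function names (`ADAPT_CONFIG`, `DROPOUT_RATE`, `adaptation_hparams`) are illustrative, and only the numeric values come from the Experiment Setup row.

```python
# Hedged sketch of the adaptation-phase hyperparameters reported in the
# paper's Experiment Setup. Names are hypothetical; values are quoted.

# Per-dataset SGD adaptation settings: learning rate, epochs, weight decay.
ADAPT_CONFIG = {
    "PACS":            {"lr": 1e-3, "epochs": 3, "weight_decay": 0.05},
    "CIFAR10-C":       {"lr": 1e-4, "epochs": 3, "weight_decay": 0.0},
    "CIFAR100-C":      {"lr": 1e-4, "epochs": 3, "weight_decay": 0.0},
    "Tiny-ImageNet-C": {"lr": 5e-5, "epochs": 5, "weight_decay": 0.0},
}

# Dropout rate used with 4 dropout instances: 0.3 for small-scale
# datasets, 0.1 for large-scale datasets.
DROPOUT_RATE = {
    "CIFAR10-C": 0.3, "CIFAR100-C": 0.3, "PACS": 0.3,
    "Tiny-ImageNet-C": 0.1, "ImageNet-C": 0.1,
}

def adaptation_hparams(dataset: str) -> dict:
    """Return the quoted adaptation settings for one dataset, merged with
    the constants shared across all experiments: SGD over all parameters
    (including BN stats), 3 binary-feedback samples per 64-sample batch,
    and balancing hyperparameters alpha=2, beta=1."""
    cfg = dict(ADAPT_CONFIG[dataset])
    cfg.update(
        optimizer="SGD",
        batch_size=64,
        feedback_per_batch=3,   # < 5% of each test batch
        alpha=2,
        beta=1,
        dropout_rate=DROPOUT_RATE.get(dataset),
        dropout_instances=4,
    )
    return cfg
```

A caller would do `adaptation_hparams("PACS")` to recover, e.g., the 0.001 learning rate, 3 epochs, and 0.05 weight decay reported for PACS.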