PTTA: Purifying Malicious Samples for Test-Time Model Adaptation
Authors: Jing Ma, Hanlin Li, Xiang Xiang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four types of TTA tasks as well as classification, segmentation, and adversarial defense demonstrate the effectiveness of our method. |
| Researcher Affiliation | Academia | 1National Key Lab of Multispectral Info. Intelligent Processing Tech., School of Artificial Intelligence and Automation, Huazhong University of Science and Tech. (HUST), Wuhan, China. 2Peng Cheng National Lab, Shenzhen, China. 3School of Computer Science and Technology, HUST, China. Correspondence to: Xiang Xiang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 (Purification for Test-Time Adaptation, PTTA). Input: source model f_θ with parameters θ; test samples x_t = {x_ti}, i = 1..N_bs, from D_test arriving at time step t; a memory bank M; benign samples {x+_ti, y+_ti} selected by basic TTA methods; hyperparameter α; learning rate η. Output: predictions ŷ_t = {ŷ_ti}, i = 1..N_bs, for the test samples x_t. For each x_t in D_test: (1) compute predictions ŷ_t = f_θ(x_t) and the logit-saliency indicator z = ∇L_Ent(f_θ(x_t)) (Eq. 4); (2) incorporate the selected benign samples {x+_ti, y+_ti, z = ∇L_Ent(f_θ(x+_ti))} into M; (3) retrieve x_j from M using the saliency distance D_sa(x_i, x_j) (Eq. 3 and 5); (4) generate the purified image x′_ij and its pseudo-label y′_ij using Mixup (Eq. 6); (5) compute the total loss L_total = L_tta + α·L_pur(x′_ij, y′_ij) (Eq. 7); (6) update θ ← θ − η∇_θ L_total. |
| Open Source Code | Yes | Code is available at https://github.com/HAIV-Lab/PTTA. |
| Open Datasets | Yes | We employ ImageNet-C, CIFAR100-C (Hendrycks & Dietterich, 2018), ImageNet (Deng et al., 2009) and its variants: -A (Hendrycks et al., 2021b), -V2 (Recht et al., 2019), -R (Hendrycks et al., 2021a), -S (Wang et al., 2019) to construct these tasks. ImageNet-C contains 15 types of corruptions applied to the original ImageNet validation images, each having 5 severity levels. We exploit the most severe level (5th level) for experiments. The same applies to CIFAR100-C. ... Beyond image classification, we also consider the semantic segmentation task and employ the CarlaTTA dataset (Marsden et al., 2024a) for experiments. |
| Dataset Splits | Yes | In the episodic task, a single batch of test samples is used to optimize the model, and then the updated model makes predictions for the current batch. After that, the model's parameters are reset to the source. For single and continual tasks, the model is iteratively updated in static and dynamically changing environments, respectively. Furthermore, the lifelong task extends the dynamic environment indefinitely, set as 10 rounds and a total of 150 corruptions. Following previous TTA methods, we employ ImageNet-C, CIFAR100-C (Hendrycks & Dietterich, 2018), ImageNet (Deng et al., 2009) and its variants: -A (Hendrycks et al., 2021b), -V2 (Recht et al., 2019), -R (Hendrycks et al., 2021a), -S (Wang et al., 2019) to construct these tasks. ImageNet-C contains 15 types of corruptions applied to the original ImageNet validation images, each having 5 severity levels. We exploit the most severe level (5th level) for experiments. ... We run experiments on 3 random seeds and report the average accuracy and the standard deviation. |
| Hardware Specification | Yes | Hardware: CPU: Intel Xeon Silver 4210 @ 2.20GHz | GPU: NVIDIA GeForce RTX 3090 | RAM: 256GB |
| Software Dependencies | Yes | Software: PyTorch 1.9.0 | CUDA 11.1 |
| Experiment Setup | Yes | We set λ = 1/(K + 1), where K decides that the top-K samples with the largest saliency distance are retrieved from the scope. We uniformly set K = 1 and conduct an ablation study on the value of K in Sec. 4.4. Overall, the total loss function is defined as L_total = L_tta + α·L_pur, where α is a hyperparameter to balance the two loss functions, and we conduct an ablation study on α in Sec. 4.4. ... We set α = 3.0 for sample-selection-based TTA methods and α = 1.0 for selection-free TTA methods. We build a memory bank for OOD retrieval, setting it as a first-in-first-out queue and limiting its maximum length to 1,000. ... For ResNet50, we consistently use SGD with a learning rate of 0.00025, a momentum of 0.9, a batch size of 64, and no weight decay. For ViT-B/16, we consistently use SGD with a learning rate of 0.001, a momentum of 0.9, a batch size of 64, and no weight decay. |
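The per-batch loop in Algorithm 1 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the saliency indicator, the saliency distance, and the two loss values are simplified stand-ins for the paper's Eq. 3-7 (which this report does not reproduce), and all function names here are hypothetical. Only the structural choices quoted above are kept faithfully: a first-in-first-out memory bank capped at 1,000 entries, top-K retrieval by largest saliency distance, and Mixup with λ = 1/(K + 1).

```python
from collections import deque

ALPHA = 1.0          # loss weight (paper: 3.0 for selection-based TTA, 1.0 otherwise)
K = 1                # top-K retrieval; the paper sets lambda = 1 / (K + 1)
LAM = 1.0 / (K + 1)  # Mixup coefficient, = 0.5 for K = 1
BANK_MAXLEN = 1000   # first-in-first-out memory bank, as in the paper

def saliency(x):
    """Hypothetical stand-in for the logit-saliency indicator z (Eq. 4)."""
    return sum(abs(v) for v in x) / len(x)

def saliency_distance(zi, zj):
    """Hypothetical stand-in for the saliency distance D_sa (Eq. 3 and 5)."""
    return abs(zi - zj)

def retrieve_topk(bank, z_query, k=K):
    """Retrieve the top-k benign samples with the LARGEST saliency distance."""
    ranked = sorted(bank,
                    key=lambda item: saliency_distance(z_query, item["z"]),
                    reverse=True)
    return ranked[:k]

def mixup(x_test, x_benign, lam=LAM):
    """Purify a (possibly malicious) test sample by mixing it with a benign
    one (Eq. 6); pseudo-labels would be mixed with the same coefficient."""
    return [lam * a + (1 - lam) * b for a, b in zip(x_test, x_benign)]

def ptta_step(bank, x_test, benign_batch, l_tta, l_pur):
    """One time step: bank update, retrieval, purification, total loss (Eq. 7).
    l_tta and l_pur are scalar loss values supplied by the caller."""
    for x_plus, y_plus in benign_batch:        # selected by the basic TTA method
        bank.append({"x": x_plus, "y": y_plus, "z": saliency(x_plus)})
    z_t = saliency(x_test)
    purified = [mixup(x_test, nb["x"]) for nb in retrieve_topk(bank, z_t)]
    total_loss = l_tta + ALPHA * l_pur          # L_total = L_tta + alpha * L_pur
    return purified, total_loss

bank = deque(maxlen=BANK_MAXLEN)                # FIFO: old entries are evicted
```

In the real method the purified pairs (x′, y′) feed L_pur and θ is then updated by SGD with the learning rates quoted in the setup row; here the losses are just passed-through scalars to keep the control flow visible.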