Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
Authors: Huan Ma, Yan Zhu, Changqing Zhang, Peilin Zhao, Baoyuan Wu, Long-Kai Huang, Qinghua Hu, Bingzhe Wu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a comparative analysis of the proposed method against various approaches, which validates its significant superiority. Extensive experiments confirm that our approach significantly enhances model stability against decision shortcuts compared to existing state-of-the-art methods. In the experiments, we evaluate different methods in different scenarios, including the real-world image datasets TinyImageNet (Le and Yang 2015) and CUB-200 (Wah et al. 2011), the benchmark simulated dataset Waterbirds (Koh et al. 2021), and the datasets created by S2E, Camel Deer and Spider Crab. We submit a subset of these datasets in the supplementary materials, limited by the maximum file size. |
| Researcher Affiliation | Collaboration | (1) College of Intelligence and Computing, Tianjin University, Tianjin, China; (2) AI Lab, Tencent, Shenzhen, China |
| Pseudocode | No | The paper describes methods using mathematical equations and textual explanations (e.g., "We begin by presenting the basic methodology of visual-language prompt tuning. Subsequently, we will introduce our proposed approach..."), but it does not contain a clearly labeled pseudocode block or algorithm. |
| Open Source Code | Yes | Codes: https://github.com/MaHuanAAA/SEraser |
| Open Datasets | Yes | In the experiments, we evaluate different methods in different scenarios, including the real-world image datasets TinyImageNet (Le and Yang 2015) and CUB-200 (Wah et al. 2011), the benchmark simulated dataset Waterbirds (Koh et al. 2021), and the datasets created by S2E, Camel Deer and Spider Crab. We submit a subset of these datasets in the supplementary materials, limited by the maximum file size. |
| Dataset Splits | No | The paper mentions evaluating performance on the "worst group" of datasets (e.g., Waterbirds, Camel Deer, Spider Crab) and refers to "the entire test set," implying the existence of test splits. However, it does not explicitly provide specific percentages or counts for training/validation/test splits, nor does it detail the methodology for creating these splits for reproduction (e.g., 80/10/10 split, specific random seeds, or citations to standard splits for all used datasets). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using "CLIP with a pre-trained ViT-B-32 released by OpenAI (Radford et al. 2021)" and "SAM model (Wang et al. 2023b)". However, it does not provide specific version numbers for these or any other key software components, programming languages, or libraries used in their implementation. |
| Experiment Setup | No | The paper describes the general approach of optimizing a learnable prompt and minimizing Kullback-Leibler divergence as the optimization goal. However, it does not provide specific details such as learning rates, batch sizes, number of epochs, optimizer types, or other hyperparameter values that would allow for direct reproduction of the experimental setup. |
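To make concrete what "optimizing a learnable prompt with a Kullback-Leibler divergence objective" involves, the sketch below shows a toy test-time loop in PyTorch. It is an illustration under stated assumptions, not the paper's implementation: the random `img_feat`, `erased_feat`, and `txt_feat` tensors stand in for frozen CLIP image/text embeddings, the additive `prompt` is a simplification of token-space prompt tuning, and the pairing of the original view against a "spurious-erased" view as the KL target is assumed, since the paper does not report these setup details.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for frozen CLIP features (assumption: the real method uses
# CLIP ViT-B-32 encoders; random unit vectors suffice to show the loop).
num_classes, dim = 5, 32
img_feat = F.normalize(torch.randn(1, dim), dim=-1)      # original image view
erased_feat = F.normalize(torch.randn(1, dim), dim=-1)   # hypothetical "erased" view
txt_feat = F.normalize(torch.randn(num_classes, dim), dim=-1)

# Learnable additive prompt in embedding space; only this tensor is updated.
prompt = torch.zeros(num_classes, dim, requires_grad=True)
opt = torch.optim.SGD([prompt], lr=0.05)

def log_probs(image, text, scale=10.0):
    # CLIP-style scaled cosine-similarity logits over the class embeddings.
    logits = scale * image @ F.normalize(text, dim=-1).t()
    return F.log_softmax(logits, dim=-1)

# Reference distribution from the alternate view (assumed KL target).
target = log_probs(erased_feat, txt_feat).exp().detach()

initial_loss = F.kl_div(log_probs(img_feat, txt_feat + prompt), target,
                        reduction="batchmean").item()
for _ in range(100):
    opt.zero_grad()
    loss = F.kl_div(log_probs(img_feat, txt_feat + prompt), target,
                    reduction="batchmean")
    loss.backward()
    opt.step()
final_loss = loss.item()
```

With learning rate, step count, and logit scale chosen arbitrarily here, the KL loss decreases over the loop; these are exactly the hyperparameters the paper would need to report for the setup to be reproducible.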