Attention Bootstrapping for Multi-Modal Test-Time Adaptation
Authors: Yusheng Zhao, Junyu Luo, Xiao Luo, Jinsheng Huang, Jingyang Yuan, Zhiping Xiao, Ming Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the benchmarks validate the effectiveness of the proposed ABPEM in comparison with competing baselines. |
| Researcher Affiliation | Academia | 1State Key Laboratory for Multimedia Information Processing, School of Computer Science, PKU-Anker LLM Lab, Peking University, Beijing, China 2Department of Computer Science, University of California, Los Angeles, CA, USA 3Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA |
| Pseudocode | Yes | Algorithm 1: Optimization Algorithm of ABPEM |
| Open Source Code | Yes | More details can be found at https://github.com/YushengZhao/ABPEM. |
| Open Datasets | Yes | Benchmarks. The experiments are performed on two benchmarks: Kinetics50-C and VGGSound-C (Yang et al. 2024), which are based on the widely used Kinetics (Kay et al. 2017) and VGGSound (Chen et al. 2020) datasets. |
| Dataset Splits | No | The paper mentions 'pretrained on the corresponding training set (Kinetics or VGGSound)' and 'using unlabeled test data Dte', implying the existence of training and test sets. However, it does not explicitly provide specific dataset split percentages, sample counts, or direct citations for the splits used for these benchmarks. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Adam optimizer (Kingma and Ba 2014)' and 'CAVMAE (Gong et al. 2023) as the architecture of M', but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We set k in Eq. 10 to 8 for Kinetics50-C and 30 for VGGSound-C, and λ to 1 by default. Moreover, we also use a class-balancing loss in alignment with (Yang et al. 2024). For optimization, we use the Adam optimizer (Kingma and Ba 2014) and the model is optimized within a single epoch, with a learning rate of 1 × 10⁻⁴. |
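The reported optimization setup (Adam, learning rate 1 × 10⁻⁴, a single pass over the unlabeled test stream) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the model and data loader are placeholders standing in for the CAV-MAE backbone and the corrupted benchmark batches, and plain entropy minimization is used as a stand-in adaptation objective rather than the full ABPEM loss.

```python
# Hedged sketch of the paper's reported optimizer settings for test-time
# adaptation: Adam with lr = 1e-4, optimized within a single epoch.
import torch
import torch.nn as nn

def adapt_one_epoch(model, loader, lr=1e-4):
    """Run one epoch of unsupervised adaptation over unlabeled test batches."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # lr as reported
    model.train()
    for x in loader:  # single pass = one epoch, as stated in the paper
        logits = model(x)
        probs = torch.softmax(logits, dim=-1)
        # Stand-in objective: entropy minimization (NOT the ABPEM loss).
        loss = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model

# Placeholder model and data; real experiments use CAV-MAE on
# Kinetics50-C / VGGSound-C.
model = nn.Linear(16, 5)
loader = [torch.randn(4, 16) for _ in range(10)]
adapt_one_epoch(model, loader)
```

Note that the class-balancing loss and the attention-bootstrapping components described in the paper are omitted here; this sketch only pins down the optimizer, learning rate, and single-epoch schedule quoted above.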