Test-Time Selective Adaptation for Uni-Modal Distribution Shift in Multi-Modal Data

Authors: Mingcai Chen, Baoming Zhang, Zongbo Han, Wenyu Jiang, Yanmeng Wang, Shuai Feng, Yuntao Du, Bingkun Bao

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we validate the effectiveness of our proposed method through extensive experimental evaluations. Code available at https://github.com/chenmc1996/Uni-Modal-Distribution-Shift. [...] We perform experiments on multi-modal datasets with uni-modal distribution shift, and discover limited performance gains. [...] Our method is validated through extensive experiments on the uni-modal distribution shifted datasets, and the results show that our approach achieves superior performance."
Researcher Affiliation | Academia | 1 Nanjing University of Posts and Telecommunications; 2 State Key Laboratory for Novel Software Technology, Nanjing University; 3 College of Intelligence and Computing, Tianjin University; 4 Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR) & School of Software, Shandong University. Correspondence to: Yuntao Du <EMAIL>, Bingkun Bao <EMAIL>.
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, or figures with structured, code-like steps.
Open Source Code | Yes | "Code available at https://github.com/chenmc1996/Uni-Modal-Distribution-Shift."
Open Datasets | Yes | "To validate the effectiveness of our method, we perform comparison experiments on two multi-modal datasets, Kinetics50 (Kay et al., 2017) and VGGSound (Chen et al., 2020), with diverse domain shifts."
Dataset Splits | No | The paper mentions using the "training sets of Kinetics50 and VGGSound dataset" for pre-training and then evaluating on corrupted versions of these datasets. While this implies standard usage, the paper does not explicitly provide split percentages (e.g., 80/10/10) or sample counts for the training, validation, and test splits, which would be needed for reproducibility; it only details the dataset construction, corruption procedures, and video trimming.
Hardware Specification | Yes | "We implement the network on a GeForce RTX(TM) 3090 GPU and Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz."
Software Dependencies | No | "For the software information and other experimental settings, please refer to our code https://github.com/chenmc1996/Uni-Modal-Distribution-Shift." The paper does not explicitly list specific software dependencies (e.g., Python, PyTorch, CUDA) with version numbers in the main text.
Experiment Setup | Yes | The main hyper-parameters are the coefficient and threshold of the self-training loss, the softmax temperature, and the batch size. One set of hyper-parameters is used for the shift on each modality of each dataset; the temperature is kept at 0.001 and the loss coefficient at 0.5 across all experiments. For Kinetics50-C with video shift: threshold 0.9, batch size 16; with audio shift: threshold 0.9, batch size 64. For VGGSound-C with video shift: threshold 0.8, batch size 128; with audio shift: threshold 0.8, batch size 64. During test time, the model is updated using the Adam optimizer.
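The reported settings describe a confidence-thresholded self-training objective applied at test time. The snippet below is a minimal NumPy sketch of such a loss under the stated hyper-parameter values (temperature 0.001, coefficient 0.5, threshold 0.9); the loss form, function names, and pseudo-labeling details are illustrative assumptions, not the authors' released implementation — refer to their repository for the actual code.

```python
import numpy as np

# Hedged sketch (not the authors' code): cross-entropy against
# temperature-sharpened pseudo-labels, keeping only samples whose
# confidence exceeds the threshold. Values follow the paper's report.
TEMPERATURE = 0.001  # softmax temperature, fixed across all experiments
LOSS_COEF = 0.5      # self-training loss coefficient, fixed
THRESHOLD = 0.9      # confidence threshold (0.9 on Kinetics50-C, 0.8 on VGGSound-C)

def softmax(z, t=1.0):
    z = np.asarray(z, dtype=np.float64) / t
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_training_loss(logits):
    """Self-training loss over the confident subset of a test batch."""
    sharp = softmax(logits, t=TEMPERATURE)  # near one-hot at t = 0.001
    conf = sharp.max(axis=-1)
    pseudo = sharp.argmax(axis=-1)          # pseudo-labels
    mask = conf >= THRESHOLD                # keep only confident predictions
    if not mask.any():
        return 0.0                          # nothing confident in this batch
    p = softmax(logits)[mask]               # unsharpened model predictions
    ce = -np.log(p[np.arange(mask.sum()), pseudo[mask]] + 1e-12)
    return LOSS_COEF * float(ce.mean())
```

In a test-time adaptation loop this loss would be computed per incoming batch (sizes 16–128 as listed above) and minimized with Adam; the paper's actual objective may combine it with other terms.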