Test-Time Selective Adaptation for Uni-Modal Distribution Shift in Multi-Modal Data
Authors: Mingcai Chen, Baoming Zhang, Zongbo Han, Wenyu Jiang, Yanmeng Wang, Shuai Feng, Yuntao Du, Bingkun Bao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we validate the effectiveness of our proposed method through extensive experimental evaluations. Code available at https://github.com/chenmc1996/Uni-Modal-Distribution-Shift. [...] We perform experiments on multi-modal datasets with uni-modal distribution shift, and discover limited performance gains. [...] Our method is validated through extensive experiments on the uni-modal distribution shifted datasets, and the results show that our approach achieves superior performance. |
| Researcher Affiliation | Academia | 1Nanjing University of Posts and Telecommunications 2State Key Laboratory for Novel Software Technology, Nanjing University 3College of Intelligence and Computing, Tianjin University 4Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR) & School of Software, Shandong University. Correspondence to: Yuntao Du <EMAIL>, Bingkun Bao <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures with structured, code-like steps. |
| Open Source Code | Yes | Code available at https://github.com/chenmc1996/Uni-Modal-Distribution-Shift. |
| Open Datasets | Yes | To validate the effectiveness of our method, we perform comparison experiments on two multi-modal datasets, Kinetics50 (Kay et al., 2017) and VGGSound (Chen et al., 2020), with diverse domain shifts. |
| Dataset Splits | No | The paper mentions using the "training sets of Kinetics50 and VGGSound dataset" for pre-training and then evaluating on corrupted versions of these datasets. While it implies standard usage, it does not explicitly provide specific split percentages (e.g., 80/10/10) or sample counts for training, validation, and testing splits needed for reproducibility. It only details the dataset construction and corruption procedures, and video trimming. |
| Hardware Specification | Yes | We implement the network on a GeForce RTX(TM) 3090 GPU and an Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz. |
| Software Dependencies | No | For the software information and other experimental settings, please refer to our code https://github.com/chenmc1996/Uni-Modal-Distribution-Shift. The paper does not explicitly list specific software dependencies (e.g., Python, PyTorch, CUDA) with version numbers within the main text. |
| Experiment Setup | Yes | We mainly have the following hyper-parameters: the coefficient and threshold of the self-training loss, the softmax temperature, and the batch size. We use one set of hyper-parameters for the shift on each modality of each dataset (we keep the temperature at 0.001 and the loss coefficient at 0.5 across all experiments). For Kinetics50-C with video shift, the threshold is 0.9 and the batch size is 16. For Kinetics50-C with audio shift, the threshold is 0.9 and the batch size is 64. For VGGSound-C with video shift, the threshold is 0.8 and the batch size is 128. For VGGSound-C with audio shift, the threshold is 0.8 and the batch size is 64. During test time, the model is updated with the Adam optimizer. |
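The hyper-parameter settings quoted in the Experiment Setup row can be organized as a small configuration sketch. The key names and helper function below are illustrative assumptions for readability, not identifiers taken from the authors' released code:

```python
# Hyper-parameters reported in the paper, split into values shared across
# all experiments and values that vary per (dataset, shifted modality).
# All names here are hypothetical; consult the official repository for
# the actual configuration format.

SHARED = {
    "softmax_temperature": 0.001,  # fixed across all experiments
    "loss_coefficient": 0.5,       # self-training loss weight, fixed
    "optimizer": "Adam",           # used for test-time updates
}

PER_SETTING = {
    ("Kinetics50-C", "video"): {"threshold": 0.9, "batch_size": 16},
    ("Kinetics50-C", "audio"): {"threshold": 0.9, "batch_size": 64},
    ("VGGSound-C", "video"):   {"threshold": 0.8, "batch_size": 128},
    ("VGGSound-C", "audio"):   {"threshold": 0.8, "batch_size": 64},
}

def config_for(dataset: str, shifted_modality: str) -> dict:
    """Merge shared hyper-parameters with the per-setting overrides."""
    return {**SHARED, **PER_SETTING[(dataset, shifted_modality)]}
```

For example, `config_for("VGGSound-C", "video")` yields the self-training threshold 0.8 and batch size 128 alongside the shared temperature, loss coefficient, and optimizer choice.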