Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization
Authors: Hao Dong, Eleni Chatzi, Olga Fink
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across various domain shift scenarios demonstrate the efficacy and versatility of the AEO framework. Additionally, we highlight the strong performance of AEO in long-term and continual MM-OSTTA settings, both of which are challenging and highly relevant to real-world applications. |
| Researcher Affiliation | Academia | 1ETH Zürich 2EPFL EMAIL, EMAIL |
| Pseudocode | Yes | This section presents the pseudo-code for our AEO method. From Algorithm 1, test samples arrive batch by batch. |
| Open Source Code | Yes | Our source code is available at https://github.com/donghao51/AEO. |
| Open Datasets | Yes | We evaluate our proposed method across four benchmark datasets: EPIC-Kitchens and Human-Animal-Cartoon (HAC) for multimodal action recognition with domain shifts, Kinetics-100-C for multimodal action recognition under corruptions, and the nuScenes dataset for multimodal 3D semantic segmentation in Day-to-Night and USA-to-Singapore adaptation scenarios. |
| Dataset Splits | Yes | Kinetics-100-C consists of 100 classes selected from the Kinetics-600 dataset (Carreira et al., 2018), with 21,181 videos for training and validation, and 3,800 videos for testing... The scenes are split into 28,130 training frames and 6,019 validation frames. |
| Hardware Specification | Yes | Training is performed for 20 epochs on an RTX 3090 GPU, and the model with the best validation performance is selected. |
| Software Dependencies | No | The paper mentions 'The Adam optimizer (Kingma & Ba, 2015)' and various network architectures such as the 'SlowFast network (Feichtenhofer et al., 2019)' and 'ResNet-18 (He et al., 2016)', but it does not provide specific version numbers for any software libraries or frameworks such as PyTorch or TensorFlow. |
| Experiment Setup | Yes | The Adam optimizer (Kingma & Ba, 2015) is employed with a learning rate of 0.0001 and a batch size of 16. Training is performed for 20 epochs on an RTX 3090 GPU, and the model with the best validation performance is selected... We use a batch size of 64 and the Adam optimizer with a learning rate of 2e-5 for all experiments. We update the parameters of the last layer in each modality's feature encoder as well as the final classification layer. To ensure fairness, we update the same number of parameters for all baseline models. For hyperparameters in Wada, we set α to 0.8 and β to 4.0. For hyperparameters in the final loss LAEO, we set both γ1 and γ2 to 0.1. |
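The test-time update scheme quoted above (freeze everything except the last layer of each modality's feature encoder plus the final classifier, then optimize those parameters with Adam at lr=2e-5) can be sketched as follows. This is a minimal illustration, not the AEO implementation: the module names (`video_encoder`, `audio_encoder`, `classifier`) and layer sizes are hypothetical stand-ins, and only the parameter-selection and optimizer setup mirror the paper's description.

```python
import torch
import torch.nn as nn

# Hypothetical two-modality model; architecture is illustrative only.
class MultimodalModel(nn.Module):
    def __init__(self, feat_dim=256, num_classes=100):
        super().__init__()
        self.video_encoder = nn.Sequential(
            nn.Linear(512, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
        self.audio_encoder = nn.Sequential(
            nn.Linear(128, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, video, audio):
        feats = torch.cat(
            [self.video_encoder(video), self.audio_encoder(audio)], dim=-1)
        return self.classifier(feats)

model = MultimodalModel()

# Freeze all parameters, then unfreeze only the last layer of each
# modality encoder and the final classification layer, as described.
for p in model.parameters():
    p.requires_grad = False
trainable = (list(model.video_encoder[-1].parameters())
             + list(model.audio_encoder[-1].parameters())
             + list(model.classifier.parameters()))
for p in trainable:
    p.requires_grad = True

# Adam with the learning rate quoted for the test-time adaptation setup.
optimizer = torch.optim.Adam(trainable, lr=2e-5)
```

At test time, each incoming batch would be forwarded through the model, the adaptation loss computed, and only the `trainable` parameters updated via `optimizer.step()`, leaving the rest of the network fixed.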