Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization

Authors: Hao Dong, Eleni Chatzi, Olga Fink

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments across various domain shift scenarios demonstrate the efficacy and versatility of the AEO framework. Additionally, we highlight the strong performance of AEO in long-term and continual MM-OSTTA settings, both of which are challenging and highly relevant to real-world applications."
Researcher Affiliation | Academia | "1ETH Zürich 2EPFL EMAIL, EMAIL"
Pseudocode | Yes | "This section presents the pseudo-code for our AEO method. From Algorithm 1, test samples are coming batch by batch."
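The batch-by-batch processing described in Algorithm 1 can be sketched as a simple online loop. Note that `predict` and `update` below are hypothetical stand-ins for the paper's forward pass and AEO optimization step, not the authors' implementation:

```python
def adapt_stream(batches, predict, update):
    """Online test-time adaptation loop (sketch of Algorithm 1's flow).

    predict: maps a batch to per-sample softmax outputs (assumed given).
    update:  performs one parameter-update step on the current batch
             before the next batch arrives (assumed given).
    """
    predictions = []
    for batch in batches:  # test samples arrive batch by batch
        probs = predict(batch)  # forward pass on the incoming batch
        update(batch, probs)    # adapt parameters online
        predictions.extend(
            max(range(len(p)), key=p.__getitem__) for p in probs  # argmax
        )
    return predictions
```

The key property this sketch captures is that the model is updated on each incoming batch before the next one is seen, so performance depends on the order of the test stream.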
Open Source Code | Yes | "Our source code is available at https://github.com/donghao51/AEO."
Open Datasets | Yes | "We evaluate our proposed method across four benchmark datasets: EPIC-Kitchens and Human-Animal-Cartoon (HAC) for multimodal action recognition with domain shifts, Kinetics-100-C for multimodal action recognition under corruptions, and the nuScenes dataset for multimodal 3D semantic segmentation in Day-to-Night and USA-Singapore adaptation scenarios."
Dataset Splits | Yes | "Kinetics-100-C consists of 100 classes selected from the Kinetics-600 dataset (Carreira et al., 2018), with 21,181 videos for training and validation, and 3,800 videos for testing... The scenes are split into 28,130 training frames and 6,019 validation frames."
Hardware Specification | Yes | "Training is performed for 20 epochs on an RTX 3090 GPU, and the model with the best validation performance is selected."
Software Dependencies | No | The paper mentions "The Adam optimizer (Kingma & Ba, 2015)" and various network architectures such as "SlowFast network (Feichtenhofer et al., 2019)" and "ResNet-18 (He et al., 2016)", but it does not provide specific version numbers for any software libraries or frameworks such as PyTorch or TensorFlow.
Experiment Setup | Yes | "The Adam optimizer (Kingma & Ba, 2015) is employed with a learning rate of 0.0001 and a batch size of 16. Training is performed for 20 epochs on an RTX 3090 GPU, and the model with the best validation performance is selected... We use a batch size of 64 and the Adam optimizer with a learning rate of 2e-5 for all experiments. We update the parameters of the last layer in each modality's feature encoder as well as the final classification layer. To ensure fairness, we update the same number of parameters for all baseline models. For hyperparameters in Wada, we set α to 0.8 and β to 4.0. For hyperparameters in the final loss L_AEO, we set both γ1 and γ2 to 0.1."
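To make the entropy-aware idea behind the method concrete: entropy-based test-time adaptation typically weights samples by the confidence of their predictions. The `exp(-H)` weighting below is a common heuristic used here only as an illustrative stand-in; it is not the paper's exact adaptive objective:

```python
import math

def entropy(probs):
    """Shannon entropy of a softmax distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_weights(batch_probs):
    """Give confident (low-entropy) samples more influence during adaptation.

    Returns normalized per-sample weights for a batch of softmax outputs.
    The exp(-H) form is an assumption for illustration, not the AEO loss.
    """
    ws = [math.exp(-entropy(p)) for p in batch_probs]
    total = sum(ws)
    return [w / total for w in ws]
```

A fully confident prediction (entropy 0) receives the maximum unnormalized weight of 1, while near-uniform predictions, which in the open-set setting may correspond to unknown-class samples, are down-weighted.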