Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises
Authors: Zirun Guo, Tao Jin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on two public datasets show the effectiveness and superiority over existing methods under the complex noise patterns in multimodal data. Code is available at https://github.com/zrguo/SuMi. |
| Researcher Affiliation | Academia | Zirun Guo, Tao Jin; Zhejiang University |
| Pseudocode | Yes | Algorithm 1: SuMi |
| Open Source Code | Yes | Code is available at https://github.com/zrguo/SuMi. |
| Open Datasets | Yes | Datasets. We use two widely used multimodal datasets, Kinetics50 (Kay et al., 2017) and VGGSound (Chen et al., 2020), for evaluation. Following previous work (Hendrycks & Dietterich, 2019; Yang et al., 2024), we introduce 15 different types of corruptions for video and 6 types for audio to simulate the distribution shifts in real-world applications. |
| Dataset Splits | Yes | Following Yang et al. (2024), we use a subset of Kinetics which consists of 50 classes, 29,204 training pairs and 2,466 test pairs. |
| Hardware Specification | No | The paper does not provide specific hardware details for running its experiments. It mentions using a pre-trained model and an optimizer but no information about GPUs, CPUs, or other computing resources. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers. It mentions using the Adam optimizer and the pre-trained CAV-MAE model, but no versions for frameworks like PyTorch or TensorFlow, or other libraries. |
| Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 1e-4/1e-5 and a batch size of 16/64 for Kinetics50-C and VGGSound-C, respectively. The multimodal threshold γ_m in Equation 4 and the normalization factor Ent_0 in Equation 7 are set to 0.4 ln C following Niu et al. (2022) by default, where C is the number of task classes. The unimodal threshold γ_u in Equation 4 is set to e^{-1} by default. The smoothing coefficient β is set to 0.6/0.9, the weighting term λ is set to 5.0 and the unimodal assistance µ is set to 1.0 by default for Kinetics50-C and VGGSound-C. For strong OOD adaptation, we set the mutual information sharing term t_0 as iter/2. Following previous work (Niu et al., 2023; Gong et al., 2023a; Chen et al., 2024; Guo et al., 2024b), we update the affine parameters of normalization layers. |
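
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which makes the class-dependent thresholds easy to compute. The sketch below is illustrative only: the class name `TTAConfig`, its field names, and the constants `KINETICS50_C` and `VGGSOUND_C` are hypothetical and not taken from the authors' code. The class count of 309 for VGGSound is an assumption, since the table does not state it. The unimodal threshold is read as e^{-1}.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TTAConfig:
    """Hyperparameters from the Experiment Setup row (field names are hypothetical)."""

    num_classes: int   # C, the number of task classes
    lr: float          # Adam learning rate
    batch_size: int
    beta: float        # smoothing coefficient
    lam: float = 5.0   # weighting term (lambda)
    mu: float = 1.0    # unimodal assistance

    @property
    def gamma_m(self) -> float:
        # Multimodal threshold: 0.4 * ln(C)
        return 0.4 * math.log(self.num_classes)

    @property
    def ent0(self) -> float:
        # Normalization factor Ent_0, set to the same value as gamma_m
        return 0.4 * math.log(self.num_classes)

    @property
    def gamma_u(self) -> float:
        # Unimodal threshold, read here as e^{-1}
        return math.exp(-1)


# Kinetics50-C: 50 classes, learning rate 1e-4, batch size 16, beta 0.6
KINETICS50_C = TTAConfig(num_classes=50, lr=1e-4, batch_size=16, beta=0.6)

# VGGSound-C: learning rate 1e-5, batch size 64, beta 0.9.
# Assumption: 309 classes; the class count is not given in the table above.
VGGSOUND_C = TTAConfig(num_classes=309, lr=1e-5, batch_size=64, beta=0.9)

if __name__ == "__main__":
    for name, cfg in [("Kinetics50-C", KINETICS50_C), ("VGGSound-C", VGGSOUND_C)]:
        print(f"{name}: gamma_m={cfg.gamma_m:.4f}, gamma_u={cfg.gamma_u:.4f}")
```

Because γ_m and Ent_0 scale with ln C, the multimodal threshold is larger for a dataset with more classes, while γ_u stays fixed across datasets.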