Mitigating Pervasive Modality Absence Through Multimodal Generalization and Refinement

Authors: Wuliang Huang, Yiqiang Chen, Xinlong Jiang, Chenlong Gao, Teng Zhang, Qian Chen, Yifan Wang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental The paper is experimental: "Comprehensive experiments on four benchmark datasets demonstrate that the proposed MGR framework outperforms state-of-the-art methods, effectively mitigating the impact of pervasive modality absence." Its Experiments section comprises an Experimental Setup ("Benchmark Datasets. Four multimodal datasets are adopted to evaluate the MGR framework."), Comparison Results ("We compare the proposed MGR framework to the aforementioned baselines. The results are presented in Table 1 and Table 2."), and Ablation Studies and Sensitivity Analysis.
Researcher Affiliation Academia (1) Institute of Computing Technology, Chinese Academy of Sciences; (2) University of Chinese Academy of Sciences; (3) Peng Cheng Laboratory; (4) Beijing Key Laboratory of Mobile Computing and Pervasive Device; (5) Tsinghua Shenzhen International Graduate School, Tsinghua University
Pseudocode No The paper describes its methodology using mathematical equations and figures, such as Figure 2 illustrating the MGR framework and Figure 3 detailing the graph-based redistribution operation. However, there are no clearly labeled pseudocode or algorithm blocks.
Open Source Code No No explicit statement or link for open-source code release is provided. The paper states: "In future work, we will explore the potential applications of MGR in other multimodal models."
Open Datasets Yes Four multimodal datasets are adopted to evaluate the MGR framework. CMU-MOSI (Zadeh et al. 2016), CMU-MOSEI (Zadeh et al. 2018; Liang et al. 2018), and UR-FUNNY (Hasan et al. 2019) are video-based datasets encompassing three modalities (V, A, T). CMU-MOSEI and UR-FUNNY are among the largest datasets within their respective domains. AV-MNIST (Liang et al. 2021) is a synthetic noisy dataset with two modalities (V, A).
Dataset Splits Yes The data preprocessing and the partitioning of training, validation, and testing sets follow MultiBench (Liang et al. 2021).
Hardware Specification No The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for conducting the experiments.
Software Dependencies No The framework is implemented using PyTorch (Paszke et al. 2019), and the AdamW optimizer (Loshchilov and Hutter 2017) with a learning rate of 1e-4 is utilized. However, the paper does not provide specific version numbers for PyTorch or any other libraries or software components.
Experiment Setup Yes The AdamW optimizer (Loshchilov and Hutter 2017) with a learning rate of 1e-4 is utilized. Regarding modality absence, the model is evaluated at two prevalence ratios, κ = 0.7 and κ = 1, which indicate the percentage of samples with one modality absent in the training, validation, and testing phases; κ = 1 signifies that all samples exhibit one modality absence. Each experiment uses four random seeds for the absence schema and four training seeds, so each setting is replicated 16 times to ensure reliable results. The average performance and standard deviation across these 16 runs are reported.
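The replication protocol described above (4 absence-schema seeds × 4 training seeds = 16 runs per setting, reporting mean ± standard deviation) can be sketched as follows. This is a minimal illustration only: `train_and_eval` is a hypothetical stand-in for the actual MGR training pipeline, which the paper does not release.

```python
# Sketch of the 4 x 4 seed replication protocol; `train_and_eval` is a
# hypothetical placeholder, NOT the paper's released code.
import random
import statistics

def train_and_eval(absence_seed: int, train_seed: int, kappa: float) -> float:
    """Placeholder for one MGR run (PyTorch, AdamW, lr=1e-4 per the paper).

    Here it just returns a deterministic dummy accuracy derived from the seeds,
    so the aggregation logic below can be demonstrated end to end.
    """
    rng = random.Random(f"{absence_seed}-{train_seed}-{kappa}")
    return 0.80 + 0.05 * rng.random()  # dummy score in [0.80, 0.85)

def replicate(kappa: float, n_absence_seeds: int = 4, n_train_seeds: int = 4):
    """Run every (absence seed, training seed) pair and aggregate the scores."""
    scores = [
        train_and_eval(a, t, kappa)
        for a in range(n_absence_seeds)   # seeds for the modality-absence schema
        for t in range(n_train_seeds)     # seeds for model training
    ]
    return statistics.mean(scores), statistics.pstdev(scores), len(scores)

mean, std, n = replicate(kappa=0.7)
print(f"kappa=0.7: {mean:.3f} +/- {std:.3f} over {n} runs")
```

With this design, variance from the random absence schema and variance from training initialization are both folded into the reported standard deviation.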