Mitigating Pervasive Modality Absence Through Multimodal Generalization and Refinement
Authors: Wuliang Huang, Yiqiang Chen, Xinlong Jiang, Chenlong Gao, Teng Zhang, Qian Chen, Yifan Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on four benchmark datasets demonstrate that the proposed MGR framework outperforms state-of-the-art methods, effectively mitigating the impact of pervasive modality absence. The paper includes an Experimental Setup section ("Four multimodal datasets are adopted to evaluate the MGR framework."), comparison results against baselines (presented in Table 1 and Table 2), and ablation studies with sensitivity analysis. |
| Researcher Affiliation | Academia | (1) Institute of Computing Technology, Chinese Academy of Sciences; (2) University of Chinese Academy of Sciences; (3) Peng Cheng Laboratory; (4) Beijing Key Laboratory of Mobile Computing and Pervasive Device; (5) Tsinghua Shenzhen International Graduate School, Tsinghua University. EMAIL, EMAIL |
| Pseudocode | No | The paper describes its methodology using mathematical equations and figures, such as Figure 2 illustrating the MGR framework and Figure 3 detailing the graph-based redistribution operation. However, there are no clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | No explicit statement or link for open-source code release is provided. The paper states: "In future work, we will explore the potential applications of MGR in other multimodal models." |
| Open Datasets | Yes | Four multimodal datasets are adopted to evaluate the MGR framework. CMU-MOSI (Zadeh et al. 2016), CMU-MOSEI (Zadeh et al. 2018; Liang et al. 2018), and UR-FUNNY (Hasan et al. 2019) are video-based datasets encompassing three modalities (V, A, T). CMU-MOSEI and UR-FUNNY are among the largest datasets within their respective domains. AVMNIST (Liang et al. 2021) is a synthetic noisy dataset with two modalities (V, A). |
| Dataset Splits | Yes | The data preprocessing and the partitioning of training, validation, and testing sets follow MultiBench (Liang et al. 2021). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for conducting the experiments. |
| Software Dependencies | No | The framework is implemented using PyTorch (Paszke et al. 2019). The AdamW optimizer (Loshchilov and Hutter 2017) with a learning rate of 1e-4 is utilized. The paper mentions PyTorch and AdamW, but does not provide specific version numbers for PyTorch or any other libraries or software components. |
| Experiment Setup | Yes | The AdamW optimizer (Loshchilov and Hutter 2017) with a learning rate of 1e-4 is utilized. Regarding modality absence, the model is evaluated at two prevalence ratios, κ = 0.7 and κ = 1, which indicate the percentage of samples with one modality absent in the training, validation, and testing phases; κ = 1 signifies that all samples exhibit one modality absence. Each experiment uses four random seeds for the absence schema crossed with four random training seeds, so each experiment is replicated 16 times to ensure reliable results. The average performance and standard deviation across these 16 runs are reported. |
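The seed-crossing protocol in the last row (four absence-schema seeds × four training seeds = 16 runs, reported as mean ± standard deviation) can be sketched as follows. This is a minimal illustration, not the paper's code: `run_experiment` and its toy metric are hypothetical placeholders standing in for one full training run.

```python
import statistics

def run_experiment(absence_seed: int, train_seed: int) -> float:
    """Hypothetical stand-in for one training run: the absence seed fixes
    which samples lose a modality, the train seed fixes initialization and
    shuffling. Returns a scalar test metric (deterministic toy value here
    so the sketch is runnable)."""
    return 0.8 + 0.001 * absence_seed - 0.002 * train_seed

# 4 seeds for the modality-absence schema x 4 training seeds = 16 runs.
scores = [
    run_experiment(a, t)
    for a in range(4)   # absence-schema seeds
    for t in range(4)   # training seeds
]

mean = statistics.mean(scores)
std = statistics.stdev(scores)
print(f"{len(scores)} runs: {mean:.4f} +/- {std:.4f}")
```

Reporting the spread across both seed sources separates variance due to *which* modalities are dropped from variance due to training randomness, which is why the two seed loops are crossed rather than paired.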