Modalities Contribute Unequally: Enhancing Medical Multi-modal Learning through Adaptive Modality Token Re-balancing
Authors: Jie Peng, Jenna L. Ballard, Mohan Zhang, Sukwon Yun, Jiayi Xin, Qi Long, Yanyong Zhang, Tianlong Chen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on both medical and general multi-modal datasets demonstrate the effectiveness and generalizability of AMC. We demonstrate the effectiveness of AMC through extensive experiments on several real-world datasets, including the MIMIC-IV dataset, the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, and a subset of the TCGA benchmark covering five different cancer types. |
| Researcher Affiliation | Academia | 1University of Science and Technology of China 2University of Pennsylvania 3University of North Carolina at Chapel Hill. Correspondence to: Yanyong Zhang <EMAIL>. |
| Pseudocode | No | The paper describes the operations and steps of AMC within the main text (e.g., Section 4.2 Modality Importance Calculation, Section 4.3 Customized Token Fusion) and through figures (e.g., Figure 2 and 3), but it does not include a distinct, labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/PengJieb/amc. |
| Open Datasets | Yes | We demonstrate the effectiveness of AMC through extensive experiments on several real-world datasets, including the MIMIC-IV dataset, the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, and a subset of the TCGA benchmark covering five different cancer types. We select the Enhanced Rico (ENRICO) dataset (Leiva et al., 2021) to evaluate the generalizability of AMC. |
| Dataset Splits | Yes | For the dataset split, we use 70% for training, 15% for validation, and 15% for testing. |
| Hardware Specification | Yes | All experiments were conducted using RTX 3090 GPUs. |
| Software Dependencies | No | The paper describes the implementation and experimental setup but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Setup. To ensure a fair comparison with baselines, we use the best hyper-parameter settings from the original papers. If these are not available, we conduct hyper-parameter searches over learning rate, hidden dimension, and batch size, with ranges of [1e-3, 1e-4, 5e-5, 1e-5], [32, 64, 128], and [32, 64, 128], respectively. For our proposed method, we additionally search the number of experts and the weights of L_I, L_T, and the load-balancing loss of SMoE, with ranges of [4, 8, 16], [1.0, 0.1], [1.0, 0.1], and [1.0, 0.1], respectively. The final hyper-parameter settings for AMC appear in Appendix B.1 (Table 7), per dataset (ADNI, MIMIC-IV, TCGA-{UCEC, LUAD, LGG, BRCA, BLCA}, ENRICO): all datasets use 8 experts, Top-K 2, and 8 attention heads. Learning rate: 1e-4 (ADNI), 1e-3 (MIMIC-IV and all TCGA subsets), 5e-3 (ENRICO). Transformer layers: 2 everywhere except 4 for ENRICO. Training epochs: 30 with 5 warm-up epochs everywhere except 100 for MIMIC-IV (10 warm-up) and 100 for ENRICO (5 warm-up). Hidden dimension: 64 everywhere except 128 for TCGA-UCEC and ENRICO. Batch size: 32 (ADNI), 64 (MIMIC-IV and all TCGA subsets), 128 (ENRICO). |
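The search ranges quoted above imply a concrete grid size. The following sketch simply enumerates that grid to show its scale; the variable names are illustrative and not taken from the AMC codebase, and the three loss-weight factors stand in for the independently searched weights of L_I, L_T, and the SMoE load-balancing loss.

```python
# Hedged sketch: enumerating the hyper-parameter grid described in the
# paper's Setup. Ranges are quoted from the report; names are illustrative.
from itertools import product

learning_rates = [1e-3, 1e-4, 5e-5, 1e-5]
hidden_dims = [32, 64, 128]
batch_sizes = [32, 64, 128]
num_experts = [4, 8, 16]          # AMC-specific search range
loss_weights = [1.0, 0.1]          # searched for L_I, L_T, and SMoE load balancing

# Full Cartesian product of all searched settings.
grid = list(product(learning_rates, hidden_dims, batch_sizes,
                    num_experts, loss_weights, loss_weights, loss_weights))
print(len(grid))  # 4 * 3 * 3 * 3 * 2 * 2 * 2 = 864 configurations
```

In practice a search of this size is usually pruned (e.g., tuning the base hyper-parameters first and the AMC-specific ones afterwards), which is consistent with the final settings in Table 7 sharing most values across datasets.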