Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach
Authors: Changdae Oh, Zhen Fang, Shawn Im, Xuefeng Du, Yixuan Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on real benchmark datasets, spanning 61 shift scenarios, empirically validate our theoretical insights. Code: [...] Finally, we conduct comprehensive validation for the theoretical framework and show that our theorems empirically hold in real-world benchmarks. |
| Researcher Affiliation | Academia | 1Department of Computer Sciences, University of Wisconsin Madison, WI, USA 2Faculty of Engineering & Information Technology, University of Technology Sydney, Sydney, Australia. Correspondence to: Yixuan Li <EMAIL>. |
| Pseudocode | No | The paper describes methods and theoretical derivations using mathematical formulas and prose but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Extensive experiments on real benchmark datasets, spanning 61 shift scenarios, empirically validate our theoretical insights. Code: |
| Open Datasets | Yes | Specifically, we adopt LLaVA-1.5 (Liu et al., 2023) and LLaVA-NeXT (Liu et al., 2024a) in 7B and 13B sizes as our target MLLM, with LLaVA-Bench COCO (Liu et al., 2023) serving as the ID dataset [...] We adopt LLaVA-Bench Wild (Liu et al., 2023) to vary visual input semantics [...] we adopt LLaVA-Med instruction dataset (Li et al., 2024) as a domain-specific open-ended benchmark on the medical images and corresponding questions. |
| Dataset Splits | No | The paper describes how out-of-distribution scenarios were constructed and evaluated (e.g., "34 synthetic and 27 natural shifts spanning 61 shift scenarios in total"), but it does not provide specific train/test/validation splits (e.g., percentages or exact counts) for the underlying datasets like LLaVA-Bench COCO, nor does it refer to standard predefined splits with citations for reproducibility of data partitioning. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU or CPU models, memory details) used to run the experiments. |
| Software Dependencies | No | The paper mentions several tools and models like CLUB (Cheng et al., 2020), RJSD (Hoyos-Osorio & Sanchez-Giraldo, 2023), CLIP-ViT-B/32 (Radford et al., 2021), XLM-RoBERTa-Base (Conneau, 2019), and GPT-4 (Hurst et al., 2024). However, it does not provide specific version numbers for these or other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | we parameterize qψ as a multivariate Gaussian distribution and estimate the mean and variance parameters of the Gaussian with two separate two-layer MLPs with a hidden dimension of 250. During mini-batch training, those MLPs consume the concatenated input and response embeddings {[zxi, zyi]}N i=1 to produce a scalar estimate of MI, and they are simultaneously optimized by the AdamW optimizer with learning rate 0.001 and batch size 1,024 for 5,000 iterations. |
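The setup quoted above can be sketched as a CLUB-style variational MI estimator. This is a minimal illustration, not the authors' code: the class name, the exact wiring of the mean/log-variance heads (here each MLP maps the input embedding to parameters of a diagonal Gaussian over the response embedding), and the toy dimensions are all assumptions; only the hidden size (250), optimizer (AdamW, lr 0.001), and batch size (1,024) come from the paper.

```python
import torch
import torch.nn as nn

class GaussianMIEstimator(nn.Module):
    """Sketch of a CLUB-style MI estimator with a diagonal-Gaussian q_psi.

    Two separate two-layer MLPs (hidden size 250, per the paper) produce the
    mean and log-variance of q_psi(z_y | z_x). The head/input wiring is an
    illustrative assumption, not the authors' exact architecture.
    """

    def __init__(self, x_dim: int, y_dim: int, hidden: int = 250):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))
        self.logvar = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))

    def log_q(self, zx: torch.Tensor, zy: torch.Tensor) -> torch.Tensor:
        # log-density of zy under the diagonal Gaussian q_psi(. | zx),
        # dropping the constant log(2*pi) term
        mu, logvar = self.mu(zx), self.logvar(zx)
        return (-0.5 * ((zy - mu) ** 2 / logvar.exp() + logvar)).sum(-1)

    def nll(self, zx: torch.Tensor, zy: torch.Tensor) -> torch.Tensor:
        # the variational net is fit by maximum likelihood on paired samples
        return -self.log_q(zx, zy).mean()

    def mi_upper_bound(self, zx: torch.Tensor, zy: torch.Tensor) -> torch.Tensor:
        # CLUB bound: E_p(x,y)[log q(y|x)] - E_p(x)p(y)[log q(y|x)];
        # shuffling zy within the batch approximates the product of marginals
        positive = self.log_q(zx, zy)
        negative = self.log_q(zx, zy[torch.randperm(zy.size(0))])
        return (positive - negative).mean()

# Toy run with hyperparameters from the paper (lr 0.001, batch 1,024);
# embedding dim 16 is arbitrary for illustration.
est = GaussianMIEstimator(x_dim=16, y_dim=16)
opt = torch.optim.AdamW(est.parameters(), lr=0.001)
zx, zy = torch.randn(1024, 16), torch.randn(1024, 16)
loss = est.nll(zx, zy)
opt.zero_grad()
loss.backward()
opt.step()
mi = est.mi_upper_bound(zx, zy).item()
```

In the paper's protocol this training step would repeat for 5,000 iterations before the scalar MI estimate is read off.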