MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Authors: Ziyu Liu, Yuhang Zang, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Haodong Duan, Conghui He, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments (Fig. 1(b)) demonstrate that MIA-DPO is agnostic to different LVLM architectures (LLaVA-v1.5 (Liu et al., 2024a) and InternLM-XC2.5 (Zhang et al., 2024)), boosts the performance on multiple multi-image benchmarks while maintaining the original single-image understanding capabilities. [...] We evaluate our method on the following representative benchmarks. First, we select five multi-image benchmarks: MMMU (Yue et al., 2024), BLINK (Fu et al., 2024), Mantis (Jiang et al., 2024), NLVR2 (Suhr et al., 2018), and MVBench (Li et al., 2024c). [...] We also test the model on several single-image benchmarks: MMStar (Chen et al., 2024a), ScienceQA (Lu et al., 2022), MMVet (Yu et al., 2023), POPE (Li et al., 2023c), MMBench (Liu et al., 2023), MathVista (Lu et al., 2023), AI2D (Kembhavi et al., 2016), and OCRBench (Liu et al., 2024c).
Researcher Affiliation | Collaboration | Ziyu Liu1,2, Yuhang Zang2B, Xiaoyi Dong2, Pan Zhang2, Yuhang Cao2, Haodong Duan2, Conghui He2, Yuanjun Xiong4, Dahua Lin2,3,6, Jiaqi Wang2,5B 1 SJTU, 2 Shanghai AI Laboratory, 3 CUHK, 4 MThreads, Inc, 5 Shanghai Innovation Institute, 6 CPII under InnoHK EMAIL, EMAIL
Pseudocode | No | The paper describes the MIA-DPO framework through textual explanations, diagrams (Figures 1, 3, 4), and mathematical formulations (Equations 1, 2, 3, 5). However, it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | GitHub: https://github.com/Liuziyu77/MIA-DPO
Open Datasets | Yes | we efficiently convert existing single-image datasets, such as LLaVA-665k (Liu et al., 2024a). [...] We evaluate our method on the following representative benchmarks. First, we select five multi-image benchmarks: MMMU (Yue et al., 2024), BLINK (Fu et al., 2024), Mantis (Jiang et al., 2024), NLVR2 (Suhr et al., 2018), and MVBench (Li et al., 2024c). [...] Subsequently, we also test the model on several single-image benchmarks: MMStar (Chen et al., 2024a), ScienceQA (Lu et al., 2022), MMVet (Yu et al., 2023), POPE (Li et al., 2023c), MMBench (Liu et al., 2023), MathVista (Lu et al., 2023), AI2D (Kembhavi et al., 2016), and OCRBench (Liu et al., 2024c).
Dataset Splits | No | In constructing our MIA-DPO dataset with three types of multi-image data (Sequence Data, Grid Collage Data, and Pic-in-Pic Data), we used the LLaVA-665k (Liu et al., 2024b) dataset as the foundational single-image data. [...] The final data volume used for DPO is summarized in Tab. 8. [...] We constructed a VQA test set of 500 questions using images and questions from LLaVA-665k but are mutually exclusive with the MIA-DPO training data.
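The three multi-image formats quoted above (Sequence, Grid Collage, Pic-in-Pic) are all assembled from single-image samples. As a rough illustration of the grid-collage case only, the cell layout might be computed as follows; the function name, canvas size, and tiling rule are hypothetical and not taken from the paper:

```python
import math

def grid_collage_boxes(num_images, canvas_w=672, canvas_h=672):
    """Compute (left, top, right, bottom) boxes tiling a canvas into a
    near-square grid, one cell per input image. Hypothetical helper
    illustrating grid-collage construction; sizes and layout rules are
    assumptions, not values from the paper."""
    cols = math.ceil(math.sqrt(num_images))
    rows = math.ceil(num_images / cols)
    cell_w, cell_h = canvas_w // cols, canvas_h // rows
    boxes = []
    for i in range(num_images):
        r, c = divmod(i, cols)
        boxes.append((c * cell_w, r * cell_h,
                      (c + 1) * cell_w, (r + 1) * cell_h))
    return boxes
```

Each single image would then be resized and pasted into its box, with the paired question referring to images by grid position.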
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments. It mentions 'All single-image experimental results presented in Tab 2 are obtained using the VLMEvalKit (Duan et al., 2024)' but does not detail the hardware setup for these evaluations or for the main training.
Software Dependencies | No | The paper mentions using 'VLMEvalKit' for evaluations and refers to DPO algorithms and models like LLaVA-v1.5 and InternLM-XC2.5, but it does not specify any software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA versions).
Experiment Setup Yes The models are trained on 3 epochs, with a learning rate of 5e 5, temperature parameter (in Eq. 3) β = 0.1, and NLL loss coefficient (in Eq. 5) γ = 0.1. For more experimental details, please refer to appendix Sec. A.
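The reported hyperparameters plug into a standard DPO objective with an auxiliary NLL term. A minimal plain-Python sketch on per-response log-probabilities is given below; this is the generic DPO formulation with the paper's β = 0.1 and γ = 0.1, and the exact way the NLL term is combined is an assumption, not the authors' implementation:

```python
import math

def mia_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                 beta=0.1, gamma=0.1):
    """Sketch of a DPO loss plus an NLL regularizer on the chosen response.
    logp_w / logp_l: summed log-probs of the chosen (w) and rejected (l)
    responses under the policy; ref_logp_*: same under the frozen
    reference model. beta and gamma follow the values reported in the
    paper; the loss composition itself is an assumption."""
    # Implicit reward margin between chosen and rejected responses.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # DPO term: -log sigmoid(margin).
    dpo = -math.log(1.0 / (1.0 + math.exp(-margin)))
    # NLL term: negative log-likelihood of the chosen response.
    nll = -logp_w
    return dpo + gamma * nll
```

With equal policy and reference log-probs the margin is zero and the DPO term reduces to -log(0.5) ≈ 0.693; a larger margin in favor of the chosen response drives the loss down, while the γ-weighted NLL term keeps the policy anchored to the chosen completions.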