MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

Authors: Yunfei Xie, Ce Zhou, Lang Gao, Juncheng Wu, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Y Zou, Cihang Xie, Yuyin Zhou

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose LLaVA-Tri by pretraining LLaVA on MedTrinity-25M, achieving state-of-the-art performance on VQA-RAD, SLAKE, and PathVQA, surpassing representative SOTA multimodal large language models. Furthermore, MedTrinity-25M can also be utilized to support large-scale pre-training of multimodal medical AI models, contributing to the development of future foundation models in the medical domain.
Researcher Affiliation | Academia | Huazhong University of Science and Technology; UC Santa Cruz; Harvard University; Stanford University
Pseudocode | No | The paper describes the data construction pipeline in Section 3.2 and illustrates it in Figure 2, but it does not include a dedicated section or figure explicitly labeled 'Pseudocode' or 'Algorithm'. Figure 14 shows a 'Prompt used to generate multigranular annotations', which is a template for MLLM input, not traditional pseudocode for an algorithm.
Open Source Code | No | The abstract states: "The dataset is publicly available at https://yunfeixie233.github.io/MedTrinity-25M/". While the dataset is open, the paper does not explicitly provide a link or statement for the open-sourcing of the LLaVA-Tri model's implementation code or the data construction pipeline's code.
Open Datasets | Yes | The dataset is publicly available at https://yunfeixie233.github.io/MedTrinity-25M/.
Dataset Splits | Yes | We conduct extensive evaluations across three external medical visual QA datasets representing different sub-pathologies. LLaVA-Tri achieved state-of-the-art results in all three VQA benchmarks, with 81.6% accuracy on VQA-RAD (Lau et al., 2018b), 87.8% on SLAKE (Liu et al., 2021), and 82.8% on PathVQA (He et al., 2020a). The model is fine-tuned for three epochs on each of the three VQA datasets and evaluated accordingly.
Hardware Specification | No | We thank the Microsoft Accelerate Foundation Models Research Program, the OpenAI Researcher Access Program, TPU Research Cloud (TRC) program, Google Cloud Research Credits program, AWS Cloud Credit for Research program, and Lambda Cloud for supporting our computing needs.
Software Dependencies | No | The paper mentions several models and libraries such as LLaVA (Liu et al., 2024), LLaMA3 (Team, 2024), Med-CPT (Jin et al., 2023), and Faiss (Johnson et al., 2019c), but it does not specify explicit version numbers for these software components.
Experiment Setup | Yes | The model is fine-tuned for three epochs on each of the three VQA datasets and evaluated accordingly.