MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

Authors: Yunfei Xie, Ce Zhou, Lang Gao, Juncheng Wu, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Y Zou, Cihang Xie, Yuyin Zhou

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose LLaVA-Tri by pretraining LLaVA on MedTrinity-25M, achieving state-of-the-art performance on VQA-RAD, SLAKE, and PathVQA, surpassing representative SOTA multimodal large language models. Furthermore, MedTrinity-25M can also be utilized to support large-scale pre-training of multimodal medical AI models, contributing to the development of future foundation models in the medical domain.
Researcher Affiliation | Academia | Huazhong University of Science and Technology; UC Santa Cruz; Harvard University; Stanford University
Pseudocode | No | The paper describes the data construction pipeline in Section 3.2 and illustrates it in Figure 2, but it does not include a dedicated section or figure explicitly labeled 'Pseudocode' or 'Algorithm'. Figure 14 shows a 'Prompt used to generate multigranular annotations', which is a template for MLLM input, not traditional pseudocode for an algorithm.
Open Source Code | No | The abstract states: "The dataset is publicly available at https://yunfeixie233.github.io/MedTrinity-25M/". While the dataset is open, the paper does not explicitly provide a link or statement for the open-sourcing of the LLaVA-Tri model's implementation code or the data construction pipeline's code.
Open Datasets | Yes | The dataset is publicly available at https://yunfeixie233.github.io/MedTrinity-25M/.
Dataset Splits | Yes | We conduct extensive evaluations across three external medical visual QA datasets representing different sub-pathologies. LLaVA-Tri achieved state-of-the-art results in all three VQA benchmarks, with 81.6% accuracy on VQA-RAD (Lau et al., 2018b), 87.8% on SLAKE (Liu et al., 2021), and 82.8% on PathVQA (He et al., 2020a). The model is fine-tuned for three epochs on each of the three VQA datasets and evaluated accordingly.
Hardware Specification | No | We thank the Microsoft Accelerate Foundation Models Research Program, the OpenAI Researcher Access Program, TPU Research Cloud (TRC) program, Google Cloud Research Credits program, AWS Cloud Credit for Research program, and Lambda Cloud for supporting our computing needs.
Software Dependencies | No | The paper mentions several models and libraries such as LLaVA (Liu et al., 2024), LLaMA3 (Team, 2024), Med-CPT (Jin et al., 2023), and Faiss (Johnson et al., 2019c), but it does not specify explicit version numbers for these software components.
Experiment Setup | Yes | The model is fine-tuned for three epochs on each of the three VQA datasets and evaluated accordingly.