Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks

Authors: Lehan Wang, Haonan Wang, Honglong Yang, Jiaji Mao, Zehong Yang, Jun Shen, Xiaomeng Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that our model can not only accomplish powerful performance across various medical vision-language tasks in bilingual settings, but also recognize and detect structures in multimodal medical scans, boosting the interpretability and user interactivity of medical MLLMs.
Researcher Affiliation | Academia | 1The Hong Kong University of Science and Technology. 2Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University. Corresponding to Xiaomeng Li (EMAIL).
Pseudocode | Yes | Algorithm 1 Region-Aligned Evaluation
Open Source Code | No | Our project page is https://medrega.github.io. The paper provides a project page URL, which typically serves as a demonstration or overview, rather than explicitly stating it hosts the source code for the described methodology.
Open Datasets | Yes | We first formulate Region-Centric tasks and construct a large-scale dataset, MedRegInstruct...Combining our collected dataset with other medical multimodal corpora for training...MIMIC-CXR dataset (Johnson et al., 2019), and our in-house clinical data...The Region-Text dataset is sourced from SA-Med2D-20M (Ye et al., 2023)...
Dataset Splits | Yes | For the MIMIC-CXR dataset, we follow previous works (Wu et al., 2023) to utilize both frontal and lateral images...For our in-house dataset, we extract central slices from each 3D scan to formulate the 2D inputs...Following the official split, we use 45,000 samples for training. For single-label classification, MedRegA outperforms existing models by a large margin from 15.32% to 30.98%.
Hardware Specification | Yes | The model is trained on 16 NVIDIA H800 GPUs for 1 epoch in the alignment stage and 2 epochs in the instruction tuning stage.
Software Dependencies | Yes | We employ InternVL 1.2 (Chen et al., 2024b) as our general-domain foundation to begin training, which is composed of InternViT-6B as the vision encoder, and Nous-Hermes-2-Yi-34B as the language model...We follow the official instruction for finetuning InternVL, and leverage LoRA with DeepSpeed ZeRO Stage 3 to optimize model parameters.
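The LoRA dependency mentioned above can be illustrated with a minimal, self-contained sketch: the base weight is frozen and only a low-rank update is trained. This is a generic pure-PyTorch illustration of the technique, not the paper's actual InternVL finetuning code; the class name `LoRALinear` and the rank/alpha values are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B (A x).  Only A and B receive gradients."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        # Low-rank factors: A is (r, in), B is (out, r); B starts at zero
        # so the adapted layer initially matches the base layer.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(64, 64), r=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, total)  # 512 4672: only the rank-4 factors are trainable
```

In practice this wiring is handled by an adapter library rather than written by hand, and DeepSpeed ZeRO Stage 3 additionally shards the (frozen and trainable) parameters across the 16 GPUs; the sketch only shows why LoRA keeps the trainable parameter count small.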
Experiment Setup | Yes | Our training process is divided into two steps: alignment training and instruction tuning. During the alignment training phase, we freeze the vision encoder and language model, only fine-tuning the alignment module with medical image captioning datasets...In the instruction tuning stage, we apply both public datasets and our Region-Centric datasets, MedRegInstruct, to optimize the language model, while keeping the other components unchanged. The language model loss is applied as the loss function...The model is trained on 16 NVIDIA H800 GPUs for 1 epoch in the alignment stage and 2 epochs in the instruction tuning stage.
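The two-stage schedule described above (train only the alignment module, then only the language model) comes down to toggling `requires_grad` per component. A minimal sketch, assuming toy stand-in modules with illustrative sizes; the component names and the `set_stage` helper are assumptions for illustration, not the paper's code:

```python
import torch.nn as nn

# Toy stand-ins for the three components of the MLLM (sizes are illustrative).
model = nn.ModuleDict({
    "vision_encoder": nn.Linear(32, 16),
    "alignment_module": nn.Linear(16, 16),
    "language_model": nn.Linear(16, 8),
})

def set_stage(model: nn.ModuleDict, stage: str) -> None:
    """Stage 'align': only the alignment module trains (vision encoder and
    language model frozen).  Stage 'instruct': only the language model trains."""
    trainable = {"align": "alignment_module", "instruct": "language_model"}[stage]
    for name, module in model.items():
        for p in module.parameters():
            p.requires_grad = (name == trainable)

set_stage(model, "align")
assert not any(p.requires_grad for p in model["vision_encoder"].parameters())
assert all(p.requires_grad for p in model["alignment_module"].parameters())

set_stage(model, "instruct")
assert all(p.requires_grad for p in model["language_model"].parameters())
assert not any(p.requires_grad for p in model["alignment_module"].parameters())
```

An optimizer built for each stage would then only be given `p for p in model.parameters() if p.requires_grad`, so the frozen components receive neither gradients nor updates.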