CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models

Authors: Wei Dai, Peilin Chen, Malinda Lu, Daniel A Li, Haowen Wei, Hejie Cui, Paul Pu Liang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive empirical evaluation, we demonstrate that multitask pretraining significantly improves performance on understudied domains, achieving up to 29% improvement in ultrasound and 23% in ECG analysis over single-task learning.
Researcher Affiliation | Academia | ¹Massachusetts Institute of Technology, ²Athinoula A. Martinos Center for Biomedical Imaging, ³Harvard Medical School, ⁴Stanford University. Correspondence to: Wei Dai <EMAIL>.
Pseudocode | No | The paper describes the CLIMB framework and experimental procedures but does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is released at this link.
Open Datasets | Yes | CLIMB unifies diverse public clinical datasets into a single benchmark designed specifically for developing and evaluating multimodal medical AI systems.
Dataset Splits | Yes | Split: For multitask training, we use the BenchMD split, which remaps labels to 7 diagnostic categories. This split consists of 17,476 records in the training set and 4,361 records in the test set, totaling 21,837 records.
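As a quick sanity check, the split sizes quoted above are internally consistent; the short snippet below only uses the numbers reported in the paper:

```python
# Reported BenchMD split sizes (from the quote above)
train_records = 17_476
test_records = 4_361

total = train_records + test_records
assert total == 21_837  # matches the reported grand total

# The split is roughly 80/20 train/test
train_fraction = train_records / total
print(f"{train_fraction:.1%}")
```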
Hardware Specification | Yes | All experiments are run on a GPU server with 8x H200 141GB GPUs.
Software Dependencies | No | All experiments were conducted using the PyTorch framework.
Experiment Setup | Yes | Depending on the model size, we use a parameter search to identify the optimal learning rate from 1e-5 to 1e-3 for all experiments. The weight decay was set to 1e-3.
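The paper does not specify how the learning-rate search is implemented; a minimal sketch of one plausible setup is shown below, assuming a log-spaced grid over the reported [1e-5, 1e-3] range and the reported weight decay of 1e-3. The grid values, the choice of AdamW, and the `model_fn`, `train_one_epoch`, and `validate` callables are all illustrative placeholders, not the authors' actual code.

```python
import torch

def search_learning_rate(model_fn, train_one_epoch, validate):
    """Grid-search the learning rate in the range described above.

    model_fn:        builds a fresh model (hypothetical placeholder)
    train_one_epoch: trains the model with the given optimizer (placeholder)
    validate:        returns a validation score, higher is better (placeholder)
    """
    best_lr, best_score = None, float("-inf")
    # Log-spaced candidates spanning 1e-5 to 1e-3 (assumed grid)
    for lr in [1e-5, 3e-5, 1e-4, 3e-4, 1e-3]:
        model = model_fn()
        # Weight decay fixed at 1e-3, as reported in the paper
        optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-3)
        train_one_epoch(model, optimizer)
        score = validate(model)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr
```

The fresh model per candidate avoids leaking state between runs; in practice one would also fix random seeds so candidates are compared on equal footing.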