CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models

Authors: Wei Dai, Peilin Chen, Malinda Lu, Daniel A Li, Haowen Wei, Hejie Cui, Paul Pu Liang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive empirical evaluation, we demonstrate that multitask pretraining significantly improves performance on understudied domains, achieving up to 29% improvement in ultrasound and 23% in ECG analysis over single-task learning.
Researcher Affiliation | Academia | ¹Massachusetts Institute of Technology, ²Athinoula A. Martinos Center for Biomedical Imaging, ³Harvard Medical School, ⁴Stanford University. Correspondence to: Wei Dai <EMAIL>.
Pseudocode | No | The paper describes the CLIMB framework and experimental procedures but does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is released at this link.
Open Datasets | Yes | CLIMB unifies diverse public clinical datasets into a single benchmark designed specifically for developing and evaluating multimodal medical AI systems.
Dataset Splits | Yes | Split: For multitask training, we use the BenchMD split, which remaps labels to 7 diagnostic categories. This split consists of 17,476 records in the training set and 4,361 records in the test set, totaling 21,837 records.
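As a quick sanity check, the split sizes quoted above are internally consistent; the short snippet below only uses the numbers reported in the paper:

```python
# Reported BenchMD split sizes (from the quote above)
train_records = 17_476
test_records = 4_361

total = train_records + test_records
assert total == 21_837  # matches the reported grand total

# The split is roughly 80/20 train/test
train_fraction = train_records / total
print(f"{train_fraction:.1%}")
```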
Hardware Specification | Yes | All experiments are run on a GPU server with 8x H200 141GB GPUs.
Software Dependencies | No | All experiments were conducted using the PyTorch framework.
Experiment Setup | Yes | Depending on the model size, we use a parameter search to identify the optimal learning rate from 1e-5 to 1e-3 for all experiments. The weight decay was set to 1e-3.
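The paper does not specify how the learning-rate search is implemented; a minimal sketch of one plausible setup is shown below, assuming a log-spaced grid over the reported [1e-5, 1e-3] range and the reported weight decay of 1e-3. The grid values, the choice of AdamW, and the `model_fn`, `train_one_epoch`, and `validate` callables are all illustrative placeholders, not the authors' actual code.

```python
import torch

def search_learning_rate(model_fn, train_one_epoch, validate):
    """Grid-search the learning rate in the range described above.

    model_fn:        builds a fresh model (hypothetical placeholder)
    train_one_epoch: trains the model with the given optimizer (placeholder)
    validate:        returns a validation score, higher is better (placeholder)
    """
    best_lr, best_score = None, float("-inf")
    # Log-spaced candidates spanning 1e-5 to 1e-3 (assumed grid)
    for lr in [1e-5, 3e-5, 1e-4, 3e-4, 1e-3]:
        model = model_fn()
        # Weight decay fixed at 1e-3, as reported in the paper
        optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-3)
        train_one_epoch(model, optimizer)
        score = validate(model)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr
```

The fresh model per candidate avoids leaking state between runs; in practice one would also fix random seeds so candidates are compared on equal footing.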