reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MedRAX: Medical Reasoning Agent for Chest X-ray

Authors: Adibvafa Fallahpour, Jun Ma, Alif Munim, Hongwei Lyu, Bo Wang

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments demonstrate that MedRAX achieves state-of-the-art performance compared to both open-source and proprietary models, representing a significant step toward the practical deployment of automated CXR interpretation systems. ... 5. Experiments
Researcher Affiliation	Collaboration	(1) Department of Computer Science, University of Toronto, Toronto, Canada; (2) Vector Institute, Toronto, Canada; (3) University Health Network, Toronto, Canada; (4) Cohere, Toronto, Canada; (5) Cohere Labs, Toronto, Canada; (6) Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada.
Pseudocode	Yes	Algorithm 1: MedRAX ReAct Framework
Open Source Code	Yes	Data and code have been publicly available at https://github.com/bowang-lab/MedRAX.
Open Datasets	Yes	To rigorously evaluate its capabilities, we introduce ChestAgentBench, a comprehensive benchmark containing 2,500 complex medical queries across 7 diverse categories. ... We utilize Eurorad, the largest peer-reviewed radiological case report database maintained by the European Society of Radiology (ESR). ... MIMIC-CXR Radiology Report Generation... SLAKE VQA, which evaluates medical visual question answering...
Dataset Splits	No	The paper primarily describes evaluation on established test sets for benchmarks like MIMIC-CXR (test set) and SLAKE VQA (test samples), and introduces a new evaluation benchmark (ChestAgentBench) without providing explicit training, validation, and test splits for a model trained by the authors. MedRAX is an agent framework that integrates pre-trained models.
Hardware Specification	Yes	MedRAX uses GPT-4o as its backbone LLM, and we deploy it on a single NVIDIA RTX 6000 GPU using the same configuration as described in Section 3.
Software Dependencies	No	The paper states: "MedRAX is built on the LangChain and LangGraph frameworks." and "MedRAX uses GPT-4o as its backbone LLM". However, it does not provide specific version numbers for these frameworks or any other software libraries or programming languages used.
Experiment Setup	Yes	The algorithm implements a ReAct (Reasoning and Acting) loop... Input: ...tmax: Maximum allowed time. ...MedRAX employs the following system prompt to guide the reasoning engine: You are an expert medical AI assistant who can answer any medical questions and analyze medical images similar to a doctor. Solve using your own vision and reasoning and use tools to complement your reasoning...
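
The Experiment Setup entry mentions a ReAct (Reasoning and Acting) loop bounded by a maximum allowed time, tmax. The Python sketch below illustrates that general pattern only: a reasoning step proposes either a tool call or a final answer, tool observations are appended to memory, and the loop stops once the time budget is spent. It is not the paper's Algorithm 1. The names `react_loop`, `reason`, `tools`, `t_max`, and the stub reasoner and classifier are hypothetical, and no LLM or LangChain/LangGraph API is called.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical step format: ("act", tool_name, tool_input) or ("answer", text, "").
Step = Tuple[str, str, str]


def react_loop(
    query: str,
    reason: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    t_max: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Generic ReAct-style loop with a wall-clock budget of t_max seconds."""
    memory = [f"query: {query}"]  # running record of the query and observations
    start = clock()
    while clock() - start < t_max:
        kind, name, payload = reason(memory)  # reasoning step
        if kind == "answer":
            return name  # for "answer" steps the second field holds the text
        tool = tools.get(name)  # acting step: look up the requested tool
        observation = tool(payload) if tool else f"unknown tool: {name}"
        memory.append(f"{name}({payload}) -> {observation}")
    return "no answer within time budget"


def stub_reason(memory: List[str]) -> Step:
    """Stand-in for an LLM: call one tool, then answer from its observation."""
    if len(memory) == 1:
        return ("act", "classifier", "image.png")
    return ("answer", "final: " + memory[-1], "")


# Hypothetical tool registry with a single canned classifier.
stub_tools = {"classifier": lambda image_path: "no finding"}

result = react_loop("Any abnormality?", stub_reason, stub_tools, t_max=5.0)
timed_out = react_loop("Any abnormality?", stub_reason, stub_tools, t_max=0.0)
```

Passing the clock as a parameter keeps the time budget testable without real delays; an actual agent would replace `stub_reason` with a call to its backbone LLM and `stub_tools` with its registered tools.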