Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning

Authors: Rina Bao, Shilong Dong, Zhenfang Chen, Sheng He, Ellen Grant, Yangming Ou

ICML 2025

Reproducibility assessment (variable, result, and LLM response for each item):
Research Type: Experimental
LLM response: "Our evaluation of current large vision-language models (LVLMs) shows limited performance on this benchmark, highlighting both the challenges of the task and the importance of this benchmark for advancing medical AI. Furthermore, we propose a novel Clinical Graph of Thoughts model, which integrates domain-specific medical knowledge and clinical reasoning processes with the interpretive abilities of LVLMs. The model demonstrates promising results, achieving around a 15% absolute gain on the most important neurocognitive outcome task."
Researcher Affiliation: Collaboration
LLM response: "1 Boston Children's Hospital and Harvard Medical School, Boston, USA; 2 New York University; 3 MIT-IBM Watson AI Lab. Correspondence to: Zhenfang Chen <EMAIL>, Yangming Ou <EMAIL>."
Pseudocode: No
LLM response: The paper describes the 'Clinical Graph of Thought Model' and its reasoning flow, but it does not present this as a structured pseudocode block or algorithm.
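Since the paper provides no pseudocode, the following is a minimal, hypothetical sketch of what a graph-of-thought reasoning flow generally looks like: clinical sub-questions form a DAG, and each node's answer is conditioned on the answers of its prerequisite nodes. The node names, edge structure, and the `ask_lvlm` stub are all assumptions for illustration, not the paper's actual CGoT implementation.

```python
# Hypothetical graph-of-thought sketch: sub-tasks form a DAG and each
# node's prompt is conditioned on its predecessors' answers. This is NOT
# the paper's CGoT model; all names here are illustrative placeholders.

def ask_lvlm(question, context):
    # Placeholder for a real LVLM call (e.g., an API request with the
    # MRI, the question, and the accumulated context).
    return f"answer({question})"

def run_graph(nodes, edges, ask=ask_lvlm):
    """nodes: {name: question}; edges: {name: [prerequisite node names]}."""
    answers = {}

    def solve(name):
        if name in answers:          # memoize: each node answered once
            return answers[name]
        context = [solve(dep) for dep in edges.get(name, [])]
        answers[name] = ask(nodes[name], context)
        return answers[name]

    for name in nodes:
        solve(name)
    return answers

# Illustrative clinical sub-questions (assumed, not from the paper).
nodes = {
    "lesion": "Where are the MRI lesions?",
    "severity": "How severe is the injury pattern?",
    "outcome": "Predict the neurocognitive outcome.",
}
edges = {"severity": ["lesion"], "outcome": ["lesion", "severity"]}
answers = run_graph(nodes, edges)
```

The memoized depth-first traversal guarantees each sub-question is asked exactly once and always after its prerequisites, which is the core property any graph-of-thought pipeline needs.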
Open Source Code: Yes
LLM response: Project page: https://github.com/i3-research/HIE-Reasoning
Open Datasets: Yes
LLM response: HIE-Reasoning is the first publicly available HIE dataset that integrates MRIs, clinical information, and neurocognitive outcomes, and includes question-answer (QA) pairs along with comprehensive MRI interpretation summaries.
Dataset Splits: No
LLM response: The paper states the total number of individuals and QA pairs in the dataset (133 individuals, 749 QA pairs) but does not provide the specific training, validation, or test splits used in the experiments.
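Because no splits are published, anyone reproducing the evaluation must define their own. A minimal sketch is shown below: splitting at the patient level (not the QA-pair level) so that no individual's data leaks across splits. The 70/15/15 ratio and the seed are assumptions, not values from the paper; only the count of 133 individuals comes from the report above.

```python
# Hypothetical patient-level split for a dataset like HIE-Reasoning.
# Splitting by patient ID (not by QA pair) prevents the same individual's
# MRI from appearing in both train and test. Ratios/seed are assumed.
import random

def split_by_patient(patient_ids, ratios=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) lists of patient IDs, disjoint by patient."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)          # deterministic shuffle
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])

# 133 individuals, per the paper's reported dataset size.
train, val, test = split_by_patient(range(133))
print(len(train), len(val), len(test))  # 93 19 21
```

All 749 QA pairs would then be assigned to the split containing their patient, which is the standard way to avoid subject-level leakage in medical benchmarks.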
Hardware Specification: No
LLM response: The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments.
Software Dependencies: No
LLM response: The paper mentions several baseline LVLMs and the DRAMMS tool but does not specify version numbers for any software dependencies or libraries crucial for replication.
Experiment Setup: No
LLM response: For the evaluated LVLMs, the paper states: "All settings and hyperparameters are configured according to the specifications of the released versions." However, no specific hyperparameters (e.g., learning rate, batch size, number of epochs) are provided for the proposed CGoT model or the general experimental setup.