Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning
Authors: Rina Bao, Shilong Dong, Zhenfang Chen, Sheng He, Ellen Grant, Yangming Ou
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation of current large vision-language models (LVLMs) shows limited performance on this benchmark, highlighting both the challenges of the task and the importance of this benchmark for advancing medical AI. Furthermore, we propose a novel Clinical Graph of Thoughts model, which integrates domain-specific medical knowledge and clinical reasoning processes with the interpretive abilities of LVLMs. The model demonstrates promising results, achieving around a 15% absolute gain on the most important neurocognitive outcome task. |
| Researcher Affiliation | Collaboration | 1Boston Children's Hospital and Harvard Medical School, Boston, USA 2New York University 3MIT-IBM Watson AI Lab. Correspondence to: Zhenfang Chen <EMAIL>, Yangming Ou <EMAIL>. |
| Pseudocode | No | The paper describes the 'Clinical Graph of Thought Model' and its reasoning flow, but it does not present this as a structured pseudocode block or algorithm. |
| Open Source Code | Yes | Project page: https://github.com/i3-research/HIE-Reasoning |
| Open Datasets | Yes | HIE-Reasoning is the first publicly available HIE dataset that integrates MRIs, clinical information, and neurocognitive outcomes, and includes question-answer (QA) pairs along with comprehensive MRI interpretation summaries. |
| Dataset Splits | No | The paper states the total number of individuals and QA pairs in the dataset (133 individuals, 749 QA pairs) but does not provide specific training, validation, or test splits for this dataset used in experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions several baseline LVLMs and the DRAMMS tool but does not specify version numbers for any software dependencies or libraries crucial for replication. |
| Experiment Setup | No | For the evaluated LVLMs, the paper states: 'All settings and hyperparameters are configured according to the specifications of the released versions.' However, no specific hyperparameters (e.g., learning rate, batch size, number of epochs) are provided for the proposed CGoT model or the general experimental setup. |