Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval

Authors: Pengcheng Jiang, Cao (Danica) Xiao, Minhao Jiang, Parminder Bhatia, Taha Kass-Hout, Jimeng Sun, Jiawei Han

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions." |
| Researcher Affiliation | Collaboration | UIUC; GE HealthCare |
| Pseudocode | Yes | "Algorithm 1: Dynamic Graph Retrieval and Augmentation" |
| Open Source Code | Yes | "Our code is available at: https://github.com/pat-jj/KARE" |
| Open Datasets | Yes | "We utilize the publicly available MIMIC-III (Johnson et al., 2016) (v1.4) and MIMIC-IV (Johnson et al., 2020) (v2.0) EHR datasets" |
| Dataset Splits | Yes | "Both datasets are split into training, validation, and test sets in a 0.8/0.1/0.1 ratio by patient, ensuring that all samples from the same patient are confined to a single subset, preventing data leakage." |
| Hardware Specification | Yes | "The experiments were conducted on a system with an AMD EPYC 7513 32-Core Processor and 1.0 TB of RAM. The setup includes eight NVIDIA A100 80GB PCIe GPUs, each with 81920 MiB of memory, providing a total of 640 GB GPU memory. The system's root partition has 32 GB of storage." |
| Software Dependencies | Yes | "Our fine-tuning framework is implemented using the TRL (von Werra et al., 2020), Transformers (Wolf et al., 2020), and Flash Attention-2 (Dao, 2024) Python libraries. We use Mistral-7B-Instruct-v0.3 (Jiang et al., 2023) as our local LLM... For dense retrieval from PubMed abstracts, we utilize the local embedding model Nomic (dimension = 768) (Nussbaum et al., 2024). We use Amazon Bedrock to access the Claude model. The optimal cosine distance thresholds θe and θr are both found to be 0.14, resulting in 513,867 triples in total after clustering. We employ Graspy (Chung et al., 2019) to implement the hierarchical Leiden algorithm, setting the maximum size for each top-level community (max cluster size) to 5. Using Claude 3.5 Sonnet as the LLM, we generate 147,264 community summaries (including both general and theme-specific summaries) with the prompts shown in Figures 12 and 13." |
| Experiment Setup | Yes | model_name_or_path: mistralai/Mistral-7B-Instruct-v0.3; torch_dtype: bfloat16; use_flash_attention_2: true; preprocessing_num_workers: 12; bf16: true; gradient_accumulation_steps: 4; gradient_checkpointing: true; learning_rate: 5.0e-06; max_seq_length: 6000; num_train_epochs: 3; per_device_train_batch_size: 1; lr_scheduler_type: cosine; warmup_ratio: 0.1 |
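The Experiment Setup parameters read like a TRL/Transformers training recipe. As a config fragment (a hedged reconstruction of how such a recipe is commonly written for TRL-based fine-tuning, not the authors' actual file), they would map to something like:

```yaml
# Hypothetical recipe reconstructed from the reported hyperparameters
model_name_or_path: mistralai/Mistral-7B-Instruct-v0.3
torch_dtype: bfloat16
use_flash_attention_2: true
preprocessing_num_workers: 12
bf16: true
gradient_accumulation_steps: 4
gradient_checkpointing: true
learning_rate: 5.0e-06
max_seq_length: 6000
num_train_epochs: 3
per_device_train_batch_size: 1
lr_scheduler_type: cosine
warmup_ratio: 0.1
```

Note the effective batch size: with a per-device batch of 1, gradient accumulation of 4, and eight A100s, each optimizer step sees 32 sequences of up to 6,000 tokens.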
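The patient-level 0.8/0.1/0.1 split quoted above can be sketched as follows. This is a minimal illustration, not KARE's actual code; the `patient_id` record layout and the helper name are assumptions.

```python
import random

def split_by_patient(samples, seed=42):
    """Split visit-level samples 0.8/0.1/0.1 by patient so that all
    samples from one patient land in exactly one subset (no leakage).

    `samples` is a list of dicts, each carrying a "patient_id" key
    (a hypothetical schema for illustration only).
    """
    patient_ids = sorted({s["patient_id"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)

    n = len(patient_ids)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    train_ids = set(patient_ids[:n_train])
    val_ids = set(patient_ids[n_train:n_train + n_val])
    # The remaining ~10% of patients form the test set.

    train = [s for s in samples if s["patient_id"] in train_ids]
    val = [s for s in samples if s["patient_id"] in val_ids]
    test = [s for s in samples
            if s["patient_id"] not in train_ids
            and s["patient_id"] not in val_ids]
    return train, val, test
```

Splitting by patient rather than by sample is the key design choice: a patient with multiple admissions must never appear in both train and test, or readmission labels would leak.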
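The cosine-distance thresholds θe and θr (both 0.14) quoted under Software Dependencies decide when two entity or relation embeddings are treated as duplicates during triple clustering. A greedy single-pass sketch of such thresholded merging, assuming plain Python vectors rather than the paper's Nomic embeddings and not claiming to reproduce its exact clustering algorithm:

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def merge_by_threshold(items, theta=0.14):
    """Each item joins the first existing cluster whose representative
    embedding lies within cosine distance `theta`; otherwise it starts
    a new cluster. `items` is a list of (name, embedding) pairs.
    """
    clusters = []  # list of (representative_embedding, member_names)
    for name, emb in items:
        for rep, members in clusters:
            if cosine_distance(emb, rep) <= theta:
                members.append(name)
                break
        else:
            clusters.append((emb, [name]))
    return [members for _, members in clusters]
```

Under a threshold like 0.14, near-duplicate surface forms whose embeddings are nearly parallel collapse into one cluster; this kind of deduplication is how the pipeline ends up with a reduced set of 513,867 triples.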