Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval
Authors: Pengcheng Jiang, Cao (Danica) Xiao, Minhao Jiang, Parminder Bhatia, Taha Kass-Hout, Jimeng Sun, Jiawei Han
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions. |
| Researcher Affiliation | Collaboration | UIUC, GE HealthCare |
| Pseudocode | Yes | Algorithm 1 Dynamic Graph Retrieval and Augmentation |
| Open Source Code | Yes | Our code is available at: https://github.com/pat-jj/KARE |
| Open Datasets | Yes | We utilize the publicly available MIMIC-III (Johnson et al., 2016) (v1.4) and MIMIC-IV (Johnson et al., 2020) (v2.0) EHR datasets |
| Dataset Splits | Yes | Both datasets are split into training, validation, and test sets in a 0.8/0.1/0.1 ratio by patient, ensuring that all samples from the same patient are confined to a single subset, preventing data leakage. |
| Hardware Specification | Yes | The experiments were conducted on a system with an AMD EPYC 7513 32-Core Processor and 1.0 TB of RAM. The setup includes eight NVIDIA A100 80GB PCIe GPUs, each with 81920 MiB of memory, providing a total of 640 GB GPU memory. The system's root partition has 32 GB of storage. |
| Software Dependencies | Yes | Our fine-tuning framework is implemented using the TRL (von Werra et al., 2020), Transformers (Wolf et al., 2020), and Flash Attention-2 (Dao, 2024) Python libraries. We use Mistral-7B-Instruct-v0.3 (Jiang et al., 2023) as our local LLM... For dense retrieval from PubMed abstracts, we utilize the local embedding model Nomic (dimension = 768) (Nussbaum et al., 2024). We use Amazon Bedrock to access the Claude model. The optimal cosine distance thresholds θe and θr are both found to be 0.14, resulting in 513,867 triples in total after clustering. We employ Graspy (Chung et al., 2019) to implement the hierarchical Leiden algorithm, setting the maximum size for each top-level community (max cluster size) to 5. Using Claude 3.5 Sonnet as the LLM, we generate 147,264 community summaries (including both general and theme-specific summaries) with the prompts shown in Figure 12 and 13. |
| Experiment Setup | Yes | model_name_or_path = mistralai/Mistral-7B-Instruct-v0.3; torch_dtype = bfloat16; use_flash_attention_2 = true; preprocessing_num_workers = 12; bf16 = true; gradient_accumulation_steps = 4; gradient_checkpointing = true; learning_rate = 5.0e-06; max_seq_length = 6000; num_train_epochs = 3; per_device_train_batch_size = 1; lr_scheduler_type = cosine; warmup_ratio = 0.1 |
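The dataset-split row describes a 0.8/0.1/0.1 split performed by patient, so that all samples from one patient fall in a single subset. A minimal sketch of such a leakage-free split, assuming a hypothetical sample schema with a `patient_id` key (the paper's actual preprocessing code is in its repository):

```python
import random

def split_by_patient(samples, seed=42):
    """Split samples 0.8/0.1/0.1 at the patient level, so every sample
    from a given patient lands in exactly one subset (no leakage)."""
    patient_ids = sorted({s["patient_id"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n = len(patient_ids)
    train_ids = set(patient_ids[: int(0.8 * n)])
    val_ids = set(patient_ids[int(0.8 * n) : int(0.9 * n)])
    # Everything not in train/val becomes test.
    splits = {"train": [], "val": [], "test": []}
    for s in samples:
        if s["patient_id"] in train_ids:
            splits["train"].append(s)
        elif s["patient_id"] in val_ids:
            splits["val"].append(s)
        else:
            splits["test"].append(s)
    return splits
```

Shuffling patient IDs (rather than samples) is what guarantees the no-leakage property the authors highlight.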
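The software-dependencies row mentions merging knowledge-graph triples whose embedding cosine distance falls below a threshold (θe = θr = 0.14). A greedy one-pass sketch of that idea, assuming unit-normalizable embedding vectors; the function name and clustering strategy are illustrative, not the paper's exact procedure:

```python
import numpy as np

def cluster_by_cosine_distance(vectors, theta=0.14):
    """Assign each vector to the first existing cluster representative whose
    cosine distance (1 - cosine similarity) is below theta; otherwise start
    a new cluster. Returns one cluster label per input vector."""
    reps, labels = [], []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        v = v / np.linalg.norm(v)  # normalize so dot product = cosine similarity
        assigned = False
        for i, r in enumerate(reps):
            if 1.0 - float(np.dot(v, r)) < theta:
                labels.append(i)
                assigned = True
                break
        if not assigned:
            labels.append(len(reps))
            reps.append(v)
    return labels
```

With θ = 0.14, only near-duplicate embeddings merge, which is consistent with the paper's report of 513,867 triples remaining after clustering.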
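The experiment-setup row lists the fine-tuning hyperparameters in flattened form. Collected as a plain dictionary (a reconstruction of the table's values, not the authors' actual config file), with the implied effective batch size computed under the assumption that all eight A100s from the hardware row are used:

```python
# Hyperparameters as reported in the reproducibility table.
training_config = {
    "model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
    "torch_dtype": "bfloat16",
    "use_flash_attention_2": True,
    "preprocessing_num_workers": 12,
    "bf16": True,
    "gradient_accumulation_steps": 4,
    "gradient_checkpointing": True,
    "learning_rate": 5.0e-06,
    "max_seq_length": 6000,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 1,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
}

# Effective batch size = per-device batch x accumulation steps x GPU count
# (8 GPUs assumed from the hardware specification row).
effective_batch = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
    * 8
)
```

A per-device batch of 1 with gradient checkpointing and accumulation is a common way to fit 6000-token sequences of a 7B model in 80 GB of GPU memory.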