HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation

Authors: Tengfei Liu, Jiapu Wang, Yongli Hu, Mingjie Li, Junfei Yi, Xiaojun Chang, Junbin Gao, Baocai Yin

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the Longitudinal-MIMIC dataset demonstrate that our method achieves state-of-the-art performance on most NLG metrics, validating its effectiveness. Additionally, our method achieves superior results compared to other approaches without using historical data during testing and can be adapted to various multimodal large model frameworks, demonstrating strong applicability.
Researcher Affiliation | Academia | Tengfei Liu¹, Jiapu Wang¹, Yongli Hu¹*, Mingjie Li², Junfei Yi³, Xiaojun Chang⁴, Junbin Gao⁵, Baocai Yin¹. ¹School of Information Science and Technology, Beijing University of Technology, Beijing, China; ²Stanford University, Palo Alto, CA 94305, USA; ³School of Electrical and Information Engineering, Hunan University, Hunan, China; ⁴School of Information Science and Technology, University of Science and Technology of China, Hefei, China; ⁵University of Sydney Business School, The University of Sydney, Camperdown, NSW 2006, Australia
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulas, but it does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Further implementation details can be found at https://github.com/TengfeiLiu966/HC-LLM.
Open Datasets | Yes | Dataset: Building on the dataset presented in (Zhu et al. 2023b), we utilized the Longitudinal-MIMIC dataset, which is derived from MIMIC-CXR, for our evaluation.
Dataset Splits | Yes | The dataset was divided into training (26,156 patients and 92,374 samples), validation (203 patients and 737 samples), and test (266 patients and 2,058 samples) sets.
Hardware Specification | Yes | The training process was executed on a single NVIDIA A800 80GB GPU using mixed precision for 5 epochs on the Longitudinal-MIMIC dataset, with a minibatch size of 4 and a learning rate of 1e-4.
Software Dependencies | No | The paper mentions specific models like Swin Transformer, LLAMA2-7B, and BioMedGPT-LM-7B, along with links to their Hugging Face repositories. However, it does not provide specific version numbers for the underlying software libraries or programming environments (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | The coefficients were set to β1 = 1.0, β2 = 0.8, and β3 = 1.0, respectively. The training process was executed on a single NVIDIA A800 80GB GPU using mixed precision for 5 epochs on the Longitudinal-MIMIC dataset, with a minibatch size of 4 and a learning rate of 1e-4. For the testing phase, we employed a beam search strategy with a beam size of 3.
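The reported setup can be collected into a minimal sketch. Only the hyperparameter values (β coefficients, epochs, batch size, learning rate, beam size) come from the paper; the function name `combined_loss` and the assignment of each β to a particular loss term are assumptions made here for illustration.

```python
# Hedged sketch of the reported training/decoding configuration for HC-LLM.
# Values are taken from the paper's experiment setup; names are hypothetical.

BETA1, BETA2, BETA3 = 1.0, 0.8, 1.0  # loss-term coefficients from the paper

config = {
    "epochs": 5,             # trained for 5 epochs on Longitudinal-MIMIC
    "batch_size": 4,         # minibatch size
    "learning_rate": 1e-4,
    "mixed_precision": True, # fp16/bf16 autocast during training
    "beam_size": 3,          # beam search width at test time
}

def combined_loss(l_gen: float, l_a: float, l_b: float) -> float:
    """Weighted sum of the generation loss and two constraint losses.

    Which beta pairs with which constraint term is an assumption; the paper
    only states the coefficient values beta1=1.0, beta2=0.8, beta3=1.0.
    """
    return BETA1 * l_gen + BETA2 * l_a + BETA3 * l_b
```

This is a configuration sketch under stated assumptions, not the authors' implementation; their code at the linked repository is authoritative.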