LLM-RG4: Flexible and Factual Radiology Report Generation Across Diverse Input Contexts

Authors: Zhuhao Wang, Yihua Sun, Zihan Li, Xuan Yang, Fang Chen, Hongen Liao

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate that LLM-RG4 achieves state-of-the-art performance in both clinical efficiency and natural language generation on the MIMIC-RG4 and MIMIC-CXR datasets. We quantitatively demonstrate that our model has minimal input-agnostic hallucinations, whereas current open-source models commonly suffer from this problem.
Researcher Affiliation Academia Zhuhao Wang1, Yihua Sun1, Zihan Li1, Xuan Yang1, Fang Chen2, Hongen Liao1,2* 1School of Biomedical Engineering, Tsinghua University, Beijing, China 2School of Biomedical Engineering, and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China EMAIL, EMAIL
Pseudocode Yes Algorithm 1: Detailed Procedure of Token Weight Calculation
Input: Report T = [t1, t2, t3, ..., tL], CheXbert classifier fc
Output: C = [c1, c2, c3, ..., cL]
1: Initialize ci = 1
2: Get Y = [y1, y2, y3, ..., y13] = fc(T)
3: for yj in Y do
4:   if yj = 1 or -1 then
5:     c'i = IGi(x)
6:     ci = max(ci, c'i)
7:   end if
8: end for
9: Split C into M sentences Cs = [c1, c2, c3, ..., cM], where cn is the nth sentence's weights with length Ln, cn = [cn1, cn2, cn3, ..., cnLn]
10: if cni > threshold then
11:   cn = λ, with λ > 1
12: else
13:   cn = 1
14: end if
15: return C
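The pseudocode above can be sketched in plain Python. This is a minimal, hypothetical reading of Algorithm 1: `token_saliency` stands in for the per-finding integrated-gradients term IGi(x), finding labels follow CheXbert's convention (1 = positive, -1 = uncertain, 0 = absent), and a period is used as a naive sentence boundary; none of these names come from the released code.

```python
def token_weights(tokens, finding_labels, token_saliency,
                  threshold=0.4, lam=1.75):
    """Sketch of Algorithm 1: per-token loss weights for a report.

    tokens          list of report tokens
    finding_labels  13 CheXbert finding labels (1, -1, or 0)
    token_saliency  dict: finding index -> per-token saliency scores
                    (stand-in for the paper's IG term; assumed in [0, 1])
    """
    L = len(tokens)
    sal = [0.0] * L
    # Steps 3-8: keep the max saliency over positive/uncertain findings.
    for j, y in enumerate(finding_labels):
        if y in (1, -1):
            sal = [max(a, b) for a, b in zip(sal, token_saliency[j])]
    # Steps 9-14: if any token in a sentence exceeds the threshold,
    # the whole sentence's weight is boosted to lam; otherwise it stays 1.
    weights, start = [], 0
    for i, tok in enumerate(tokens):
        if tok == "." or i == L - 1:  # naive sentence boundary
            boost = lam if max(sal[start:i + 1]) > threshold else 1.0
            weights.extend([boost] * (i + 1 - start))
            start = i + 1
    return weights
```

With the paper's settings (threshold 0.4, λ = 1.75), a sentence containing a salient finding token is uniformly up-weighted to 1.75 while neutral sentences keep weight 1.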
Open Source Code Yes Code https://github.com/zh-Wang-Med/LLM-RG4
Open Datasets Yes We utilize the MIMIC-CXR dataset (Johnson et al. 2019), which is the only publicly available dataset that encompasses both multi-view and longitudinal information, to generate the MIMIC-RG4 dataset.
Dataset Splits Yes Table 1: Percentage (%) of reports, in the single-image no-longitudinal setting, that encompass various categories of information. PC: Prior Comparison; PP: Prior Procedure; Comm: Communication; Tr: train; Ts: test.
Tr / 172.6K: 0.30 0.30 0.12 0.00
Val / 1.4K: 0.07 0.07 0.14 0.00
Ts / 2.4K: 0.42 0.42 0.04 0.04
Hardware Specification No The paper does not provide specific hardware details such as GPU/CPU models, memory, or processing speeds used for running the experiments.
Software Dependencies Yes We adopt RAD-DINO (Pérez-García et al. 2024) as the image encoder and BiomedVLP-CXR-BERT (Boecking et al. 2022) as the text encoder, with Vicuna 7B v1.5 (Chiang et al. 2023) as the text decoder.
Experiment Setup Yes The number of learnable variable tokens in the perceiver is set to 128, the threshold is set to 0.4, and λ is set to 1.75. Following LLaVA (Liu et al. 2024b), we employ a two-stage training strategy. Initially, we only train the ATF with sn data to achieve modality alignment. Subsequently, we conduct instruction tuning on the MIMIC-RG4 dataset, training the ATF, and applying LoRA (Hu et al. 2021) for fine-tuning Vicuna.
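The two-stage schedule described above can be summarized as a small configuration sketch. The module names (`atf`, `vicuna_lora`) and data tags are assumptions for illustration, not identifiers from the paper's released code; only the stage structure (alignment on single-view no-longitudinal data first, then instruction tuning with LoRA on MIMIC-RG4) follows the excerpt.

```python
# Hypothetical summary of the two-stage training strategy.
STAGES = [
    {   # Stage 1: modality alignment -- only the ATF is trained,
        # on single-image, no-longitudinal ("sn") data.
        "name": "alignment",
        "trainable": {"atf"},
        "data": "sn",
    },
    {   # Stage 2: instruction tuning on MIMIC-RG4 -- the ATF is trained
        # and Vicuna is fine-tuned through LoRA adapters; the image and
        # text encoders stay frozen throughout.
        "name": "instruction_tuning",
        "trainable": {"atf", "vicuna_lora"},
        "data": "mimic_rg4",
    },
]

def is_trainable(module_name, stage):
    """Return True if the named module is updated in the given stage."""
    return module_name in stage["trainable"]
```

Freezing the encoders and routing all decoder updates through LoRA keeps the trainable parameter count small relative to the 7B-parameter Vicuna backbone.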