HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation

Authors: Tengfei Liu, Jiapu Wang, Yongli Hu, Mingjie Li, Junfei Yi, Xiaojun Chang, Junbin Gao, Baocai Yin

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the Longitudinal-MIMIC dataset demonstrate that our method achieves state-of-the-art performance on most NLG metrics, validating its effectiveness. Additionally, our method achieves superior results compared to other approaches without using historical data during testing and can be adapted to various multimodal large model frameworks, demonstrating strong applicability.
Researcher Affiliation | Academia | Tengfei Liu¹, Jiapu Wang¹, Yongli Hu¹*, Mingjie Li², Junfei Yi³, Xiaojun Chang⁴, Junbin Gao⁵, Baocai Yin¹. ¹School of Information Science and Technology, Beijing University of Technology, Beijing, China; ²Stanford University, Palo Alto, CA 94305, USA; ³School of Electrical and Information Engineering, Hunan University, Hunan, China; ⁴School of Information Science and Technology, University of Science and Technology of China, Hefei, China; ⁵University of Sydney Business School, The University of Sydney, Camperdown, NSW 2006, Australia
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulas, but it does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Further implementation details can be found at https://github.com/TengfeiLiu966/HC-LLM.
Open Datasets | Yes | Dataset: Building on the dataset presented in (Zhu et al. 2023b), we utilized the Longitudinal-MIMIC dataset, which is derived from MIMIC-CXR, for our evaluation.
Dataset Splits | Yes | The dataset was divided into training (26,156 patients and 92,374 samples), validation (203 patients and 737 samples), and test (266 patients and 2,058 samples) sets.
Hardware Specification | Yes | The training process was executed on a single NVIDIA A800 80GB GPU using mixed precision for 5 epochs on the Longitudinal-MIMIC dataset, with a minibatch size of 4 and a learning rate of 1e-4.
Software Dependencies | No | The paper mentions specific models like Swin Transformer, LLAMA2-7B, and BioMedGPT-LM-7B, along with links to their Hugging Face repositories. However, it does not provide specific version numbers for the underlying software libraries or programming environments (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | The coefficients were set to β1 = 1.0, β2 = 0.8, and β3 = 1.0, respectively. The training process was executed on a single NVIDIA A800 80GB GPU using mixed precision for 5 epochs on the Longitudinal-MIMIC dataset, with a minibatch size of 4 and a learning rate of 1e-4. For the testing phase, we employed a beam search strategy with a beam size of 3.
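The reported setup can be collected into a minimal sketch. Only the hyperparameter values (β coefficients, epochs, batch size, learning rate, beam size) come from the paper; the function name `combined_loss` and the assignment of each β to a particular loss term are assumptions made here for illustration.

```python
# Hedged sketch of the reported training/decoding configuration for HC-LLM.
# Values are taken from the paper's experiment setup; names are hypothetical.

BETA1, BETA2, BETA3 = 1.0, 0.8, 1.0  # loss-term coefficients from the paper

config = {
    "epochs": 5,             # trained for 5 epochs on Longitudinal-MIMIC
    "batch_size": 4,         # minibatch size
    "learning_rate": 1e-4,
    "mixed_precision": True, # fp16/bf16 autocast during training
    "beam_size": 3,          # beam search width at test time
}

def combined_loss(l_gen: float, l_a: float, l_b: float) -> float:
    """Weighted sum of the generation loss and two constraint losses.

    Which beta pairs with which constraint term is an assumption; the paper
    only states the coefficient values beta1=1.0, beta2=0.8, beta3=1.0.
    """
    return BETA1 * l_gen + BETA2 * l_a + BETA3 * l_b
```

This is a configuration sketch under stated assumptions, not the authors' implementation; their code at the linked repository is authoritative.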