Radiology Report Generation via Multi-objective Preference Optimization

Authors: Ting Xiao, Lei Shi, Peng Liu, Zhe Wang, Chenjia Bai

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on two public datasets show the proposed method can generate reports that cater to different preferences in a single model and achieve state-of-the-art performance.
Researcher Affiliation | Collaboration | 1. East China University of Science and Technology; 2. Harbin Institute of Technology; 3. Institute of Artificial Intelligence (TeleAI), China Telecom
Pseudocode | No | The paper describes the methodology using prose, mathematical formulas, and block diagrams (Figure 1), but does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information for source code, such as a repository link, an explicit code release statement, or mention of code in supplementary materials.
Open Datasets | Yes | We evaluate our method on two public datasets, IU-Xray (Demner-Fushman et al. 2016) and MIMIC-CXR (Johnson et al. 2019). IU-Xray consists of 7,470 chest X-ray images, accompanied by 3,955 reports. MIMIC-CXR, the largest publicly available dataset for RRG, contains 337,110 chest X-ray images and 227,835 corresponding reports.
Dataset Splits | Yes | Datasets are randomly split into 7:1:2 for train, val, and test. MIMIC-CXR, the largest publicly available dataset for RRG, contains 337,110 chest X-ray images and 227,835 corresponding reports. We adhere to the official dataset splits to ensure a fair comparison.
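The 7:1:2 split quoted above could be realized with a sketch like the following. The seed, shuffling approach, and function name are assumptions for illustration; the paper does not publish its splitting code.

```python
import random

def split_7_1_2(items, seed=42):
    """Randomly split samples into 7:1:2 train/val/test subsets.

    The 7:1:2 ratio follows the protocol quoted in the report;
    the seed and shuffle are hypothetical, not the authors' code."""
    items = list(items)
    rng = random.Random(seed)
    rng.shuffle(items)
    n = len(items)
    n_train = int(0.7 * n)
    n_val = int(0.1 * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

# Example with the 3,955 IU-Xray reports: 2768 / 395 / 792
train, val, test = split_7_1_2(range(3955))
print(len(train), len(val), len(test))
```

Note that MIMIC-CXR would instead use the official split files, per the quoted text.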
Hardware Specification | Yes | Our method was implemented in PyTorch and trained on an NVIDIA 4090 GPU with 24GB of memory.
Software Dependencies | No | The paper mentions PyTorch but does not specify a version number for this or any other software dependency, which is required for reproducible software details.
Experiment Setup | Yes | The initial learning rates for ResNet101 and the remaining networks are 1×10⁻⁶ and 1×10⁻⁵, respectively. We use the Adam optimizer for training and include a beam search of width 3. Maximum report lengths are set to 60 words for IU-Xray and 100 words for MIMIC-CXR. Our model undergoes Maximum Likelihood Estimation (MLE) training for 50 epochs on IU-Xray and 30 epochs on MIMIC-CXR to regularize the action space, followed by an RL training phase using the same optimizer. During training, the sampling interval of the preference vector on both datasets is 0.1. For the IU-Xray dataset, we use NLG metrics as the multi-objective rewards, the batch size is 8, and α = 3. For the MIMIC-CXR dataset, we combine NLG and CE metrics as the multi-objective rewards, the batch size is 6, and α = 0.5.
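The quoted setup says the preference vector is sampled at an interval of 0.1. For the two-objective case, that interval admits 11 vectors on the simplex; the sketch below enumerates them. The function name and the simplex enumeration are assumptions about how the sampling is realized, not the authors' implementation.

```python
def sample_preference_vectors(step=0.1, dim=2):
    """Enumerate preference vectors (w1, w2) with w1 + w2 = 1
    at a fixed interval, matching the quoted 0.1 sampling interval.
    This is a hypothetical sketch covering the two-objective case."""
    n = round(1 / step)
    if dim == 2:
        # round() cleans up binary floating-point residue (e.g. 0.30000000000000004)
        return [(round(i * step, 10), round(1 - i * step, 10))
                for i in range(n + 1)]
    raise NotImplementedError("sketch covers the two-objective case only")

vectors = sample_preference_vectors(0.1)
print(len(vectors))  # 11 vectors, from (0.0, 1.0) to (1.0, 0.0)
```

During RL training, each sampled vector would weight the per-objective rewards (NLG metrics for IU-Xray; NLG plus CE metrics for MIMIC-CXR) into a single scalar reward.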