Radiology Report Generation via Multi-objective Preference Optimization

Authors: Ting Xiao, Lei Shi, Peng Liu, Zhe Wang, Chenjia Bai

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on two public datasets show the proposed method can generate reports that cater to different preferences in a single model and achieve state-of-the-art performance.
Researcher Affiliation | Collaboration | 1. East China University of Science and Technology; 2. Harbin Institute of Technology; 3. Institute of Artificial Intelligence (TeleAI), China Telecom
Pseudocode | No | The paper describes the methodology using prose, mathematical formulas, and block diagrams (Figure 1), but does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information for source code, such as a repository link, an explicit code release statement, or mention of code in supplementary materials.
Open Datasets | Yes | We evaluate our method on two public datasets, IU-Xray (Demner-Fushman et al. 2016) and MIMIC-CXR (Johnson et al. 2019). IU-Xray consists of 7,470 chest X-ray images, accompanied by 3,955 reports. MIMIC-CXR, the largest publicly available dataset for RRG, contains 337,110 chest X-ray images and 227,835 corresponding reports.
Dataset Splits | Yes | Datasets are randomly split into 7:1:2 for train, val, and test. MIMIC-CXR, the largest publicly available dataset for RRG, contains 337,110 chest X-ray images and 227,835 corresponding reports. We adhere to the official dataset splits to ensure a fair comparison.
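The 7:1:2 split quoted above could be realized with a sketch like the following. The seed, shuffling approach, and function name are assumptions for illustration; the paper does not publish its splitting code.

```python
import random

def split_7_1_2(items, seed=42):
    """Randomly split samples into 7:1:2 train/val/test subsets.

    The 7:1:2 ratio follows the protocol quoted in the report;
    the seed and shuffle are hypothetical, not the authors' code."""
    items = list(items)
    rng = random.Random(seed)
    rng.shuffle(items)
    n = len(items)
    n_train = int(0.7 * n)
    n_val = int(0.1 * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

# Example with the 3,955 IU-Xray reports: 2768 / 395 / 792
train, val, test = split_7_1_2(range(3955))
print(len(train), len(val), len(test))
```

Note that MIMIC-CXR would instead use the official split files, per the quoted text.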
Hardware Specification | Yes | Our method was implemented in PyTorch and trained on an NVIDIA 4090 GPU with 24GB of memory.
Software Dependencies | No | The paper mentions PyTorch but does not specify a version number for this or any other software dependency, which is required for reproducible software details.
Experiment Setup | Yes | The initial learning rates for ResNet101 and the remaining networks are 1×10⁻⁶ and 1×10⁻⁵, respectively. We use the Adam optimizer for training and include a beam search of width 3. Maximum report lengths are set to 60 words for IU-Xray and 100 words for MIMIC-CXR. Our model undergoes Maximum Likelihood Estimation (MLE) training for 50 epochs on IU-Xray and 30 epochs on MIMIC-CXR to regularize the action space, followed by an RL training phase using the same optimizer. During training, the sampling interval of the preference vector on both datasets is 0.1. For the IU-Xray dataset, we use NLG metrics as the multi-objective rewards, the batch size is 8, and α = 3. For the MIMIC-CXR dataset, we combine NLG and CE metrics as the multi-objective rewards, the batch size is 6, and α = 0.5.
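The quoted setup says the preference vector is sampled at an interval of 0.1. For the two-objective case, that interval admits 11 vectors on the simplex; the sketch below enumerates them. The function name and the simplex enumeration are assumptions about how the sampling is realized, not the authors' implementation.

```python
def sample_preference_vectors(step=0.1, dim=2):
    """Enumerate preference vectors (w1, w2) with w1 + w2 = 1
    at a fixed interval, matching the quoted 0.1 sampling interval.
    This is a hypothetical sketch covering the two-objective case."""
    n = round(1 / step)
    if dim == 2:
        # round() cleans up binary floating-point residue (e.g. 0.30000000000000004)
        return [(round(i * step, 10), round(1 - i * step, 10))
                for i in range(n + 1)]
    raise NotImplementedError("sketch covers the two-objective case only")

vectors = sample_preference_vectors(0.1)
print(len(vectors))  # 11 vectors, from (0.0, 1.0) to (1.0, 0.0)
```

During RL training, each sampled vector would weight the per-objective rewards (NLG metrics for IU-Xray; NLG plus CE metrics for MIMIC-CXR) into a single scalar reward.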