RRG-Mamba: Efficient Radiology Report Generation with State Space Model

Authors: Xiaodi Hou, Xiaobo Li, Mingyu Lu, Simiao Wang, Yijia Zhang

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on publicly available datasets, including IU X-Ray and MIMIC-CXR, demonstrate that RRG-Mamba achieves a 3.7% improvement in BLEU-4 score over existing models, along with significant gains in computational and memory efficiency. Our code is available at https://github.com/Eleanorhxd/RRG-Mamba. Additionally, Sections 5, 5.2, and 5.3 discuss 'Experiments and Analysis', 'Main Experiment', and 'Ablation Study' respectively, presenting performance tables and figures.
Researcher Affiliation | Academia | Xiaodi Hou (1), Xiaobo Li (2), Mingyu Lu (1), Simiao Wang (1) and Yijia Zhang (2); (1) School of Artificial Intelligence, Dalian Maritime University; (2) School of Information Science and Technology, Dalian Maritime University. EMAIL, EMAIL
Pseudocode | No | The paper describes the model architecture and its components using mathematical equations and block diagrams (Figure 2), but it does not include any explicit pseudocode blocks or algorithm sections detailing the step-by-step procedure in a structured, code-like format.
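Although the paper gives no pseudocode, the core computation of any Mamba-style model is a discretized state space recurrence. The sketch below shows that recurrence in its simplest scalar form; the parameter names (`A`, `B`, `C`) follow standard SSM notation, not anything specified in the paper, and real Mamba blocks use matrix-valued, input-dependent (selective) parameters with a hardware-aware parallel scan.

```python
def ssm_scan(xs, A=0.9, B=1.0, C=1.0, h0=0.0):
    """Minimal discretized state space recurrence:
        h_t = A * h_{t-1} + B * x_t
        y_t = C * h_t
    Scalar state for clarity only; Mamba uses matrix-valued,
    input-dependent (selective) A, B, C and a parallel scan."""
    h, ys = h0, []
    for x in xs:
        h = A * h + B * x   # update the hidden state
        ys.append(C * h)    # read out the output at step t
    return ys
```

With `A < 1` the state decays geometrically, so an impulse input produces an exponentially fading output, which is the linear-recurrence behavior that makes SSMs run in linear time over sequence length.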
Open Source Code | Yes | Our code is available at https://github.com/Eleanorhxd/RRG-Mamba.
Open Datasets | Yes | We evaluate our model on two publicly available RRG datasets: IU X-Ray [Shin et al., 2016] and MIMIC-CXR [Johnson et al., 2019].
Dataset Splits | No | The paper uses the publicly available RRG datasets IU X-Ray [Shin et al., 2016] and MIMIC-CXR [Johnson et al., 2019], but it does not state the percentages or sample counts of the training, validation, and test splits used in the experiments. It neither cites a standard split nor specifies a splitting methodology such as cross-validation or stratified splitting.
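For context on what is missing, a reproducer would have to assume a split. Much prior RRG work uses a 7:1:2 train/val/test split for IU X-Ray (MIMIC-CXR typically uses its official split files); the sketch below implements that conventional split as an assumption, since the paper itself does not state one.

```python
import random

def split_dataset(ids, ratios=(0.7, 0.1, 0.2), seed=42):
    """Hypothetical 7:1:2 train/val/test split, a convention from
    prior RRG work on IU X-Ray; NOT taken from this paper, which
    does not report its split."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])
```

Seeding a dedicated `random.Random` instance keeps the split deterministic without touching the global random state.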
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for conducting the experiments. While it discusses computational efficiency, it lacks concrete information about the physical computing resources.
Software Dependencies | No | The paper mentions using specific pre-trained models like "DenseNet-121 [Huang et al., 2017]" and "ResNet-101 [He et al., 2016]" as visual encoders, and refers to "CheXpert [Irvin et al., 2019]" for pseudo-label generation. However, it does not specify any general software dependencies or programming languages with their version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experimental environment.
Experiment Setup | Yes | We configure the word identification ratio as k=0.5, thereby controlling the proportion of significant words. Following [Gu et al., 2022], we design three versions of GDLM with different structures (tiny, small and base) to explore the impact of model capacity on RRG-Mamba.
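One plausible reading of the k=0.5 word identification ratio is keeping the top half of a report's words ranked by an importance score. The sketch below illustrates that interpretation; the function name, the use of a generic score list, and TF-IDF as the scoring example are all assumptions, since the paper does not define the exact selection criterion.

```python
def significant_words(words, scores, k=0.5):
    """Hypothetical reading of the k=0.5 word identification ratio:
    keep the top-k fraction of words ranked by an importance score
    (e.g. TF-IDF). The paper's exact criterion may differ."""
    n_keep = max(1, int(len(words) * k))  # at least one word survives
    ranked = sorted(zip(words, scores), key=lambda p: p[1], reverse=True)
    return [w for w, _ in ranked[:n_keep]]
```

With k=0.5 this halves the candidate vocabulary per report, which matches the paper's stated goal of controlling the proportion of significant words.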