Generative Medical Segmentation

Authors: Jiayu Huo, Xi Ouyang, Sébastien Ourselin, Rachel Sparks

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental analysis across five open-source datasets in different medical imaging domains demonstrates GMS outperforms existing discriminative and generative segmentation models."
Researcher Affiliation | Collaboration | "1 School of Biomedical Engineering and Imaging Sciences (BMEIS), King's College London, London, UK; 2 Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China"
Pseudocode | No | The paper describes the model architecture and mathematical formulations for its components and loss functions, but it does not provide a clearly labeled pseudocode block or algorithm steps in a structured format.
Open Source Code | Yes | Code: https://github.com/King-HAW/GMS
Open Datasets | Yes | "We evaluated the performance of GMS on five public datasets: BUS, BUSI, GlaS, HAM10000, and Kvasir-Instrument. BUS (Yap et al. 2017) and BUSI (Al-Dhabyani et al. 2020) are breast lesion ultrasound datasets that contain 163 and 647 images, respectively. GlaS (Sirinukunwattana et al. 2017) is a colon histology segmentation challenge dataset divided into 85 images for training and 80 images for testing. HAM10000 (Tschandl, Rosendahl, and Kittler 2018) is a large dermatoscopic dataset that consists of 10015 images with skin lesion segmentation masks. The Kvasir-Instrument dataset (Jha et al. 2021) contains 590 endoscopic images with tool segmentation masks."
Dataset Splits | Yes | "For all datasets except GlaS, we randomly select 80% of the images for training and the remaining 20% for testing. We keep the official training and testing set split of GlaS."
Hardware Specification | Yes | "Our framework is implemented using PyTorch v1.13, and all model training was performed on an NVIDIA A100 40G GPU."
Software Dependencies | Yes | "Our framework is implemented using PyTorch v1.13, and all model training was performed on an NVIDIA A100 40G GPU."
Experiment Setup | Yes | "We use AdamW (Loshchilov and Hutter 2019) as the training optimizer. We utilize the cosine annealing learning rate scheduler to adjust the learning rate in each epoch with the initial learning rate set to 2e-3. For all experiments, the batch size was set to 8 and the total training epochs were 1000. The input image is resized to 224×224, and on-the-fly data augmentations were performed during training including random flip, random rotation, and color jittering in the HSV domain. We set a threshold of 0.5 to binarize the predicted values."
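The random 80/20 split reported for all datasets except GlaS can be sketched in plain Python. The function name and fixed seed below are illustrative assumptions, not taken from the paper's released code:

```python
import random

def split_dataset(image_ids, train_frac=0.8, seed=0):
    """Randomly hold out 20% of images for testing, as the paper does for
    BUS, BUSI, HAM10000, and Kvasir-Instrument (GlaS keeps its official
    split). The seed value is an assumption added for reproducibility."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_frac)
    return ids[:n_train], ids[n_train:]

# Example with BUSI's 647 images: 517 for training, 130 for testing.
train_ids, test_ids = split_dataset(range(647))
```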
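The reported schedule (cosine annealing from an initial learning rate of 2e-3 over 1000 epochs) follows the standard closed form, sketched below. The minimum learning rate of 0 is an assumption, since the paper does not state one:

```python
import math

def cosine_annealing_lr(epoch, total_epochs=1000, lr_init=2e-3, lr_min=0.0):
    """Learning rate at a given epoch under cosine annealing, using the
    paper's initial LR (2e-3) and epoch count (1000); lr_min=0 is an
    assumption."""
    cos_term = 1 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_init - lr_min) * cos_term
```

In PyTorch v1.13 this corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR` wrapped around an `AdamW` optimizer, stepping the scheduler once per epoch.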