S2S2: Semantic Stacking for Robust Semantic Segmentation in Medical Imaging

Authors: Yimu Pan, Sitao Zhang, Alison D. Gernand, Jeffery A. Goldstein, James Z. Wang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive experiments, we validate the superiority of our approach in improving segmentation performance under diverse conditions.
Researcher Affiliation Academia 1. The Pennsylvania State University, University Park; 2. Northwestern University, Chicago
Pseudocode No The paper includes 'Figure 2: Illustration of the proposed S2S2 framework' which is a diagram illustrating the method, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps.
Open Source Code Yes Code https://github.com/ymp5078/Semantic-Stacking
Open Datasets Yes For RGB images, we utilized two polyp segmentation datasets: CVC-ClinicDB (Bernal et al. 2015) and Kvasir-SEG (Jha et al. 2020). CVC-ClinicDB comprises 612 labeled images, while Kvasir-SEG includes 1,000 labeled images. These datasets, originating from distinct sites and captured using different devices, provide variability in the data. The processing of RGB datasets adhered to the methods described in previous studies (Sanderson and Matuszewski 2022). For CT images, we evaluated using the Synapse multi-organ segmentation dataset [1], which includes 30 abdominal CT scans with comprehensive annotations for multi-organ segmentation tasks. In the MRI category, our evaluation encompassed several datasets focused on abdominal and cardiac segmentation. The Combined Healthy Abdominal Organ Segmentation (CHAOS) (Kavur et al. 2021) dataset consists of 20 T2-SPIR MRI images focused on abdominal organ segmentation. For cardiac segmentation, we included a dataset (Zhuang et al. 2022) comprising 45 late gadolinium enhanced (LGE) MRI images and 45 balanced steady-state free precession (bSSFP) MRI images, alongside the Automatic Cardiac Diagnosis Challenge (ACDC) (Bernard et al. 2018) dataset, which features 100 cases of Cine MRI images. [1] https://www.synapse.org/#!Synapse:syn3193805/wiki/217789
Dataset Splits No The paper mentions the total number of images in the datasets (e.g., 'CVC-ClinicDB comprises 612 labeled images', 'Kvasir-SEG includes 1,000 labeled images', '30 abdominal CT scans', '20 T2-SPIR MRI images', '100 cases of Cine MRI images'). It also states 'The processing of RGB datasets adhered to the methods described in previous studies (Sanderson and Matuszewski 2022).' However, it does not explicitly provide specific percentages, sample counts, or a detailed methodology for how these datasets were split into training, validation, or test sets within the main text.
Hardware Specification No The acknowledgements section mentions 'cluster computers at the National Center for Supercomputing Applications through an allocation from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program' and 'Extreme Science and Engineering Discovery Environment (XSEDE)'. This indicates the use of high-performance computing resources but does not provide specific hardware details like GPU or CPU models.
Software Dependencies No The paper mentions using 'Stable Diffusion model (Rombach et al. 2022)' and 'ControlNet (Zhang, Rao, and Agrawala 2023)' for image generation. However, it does not provide specific version numbers for these or other key software components (e.g., programming languages, deep learning frameworks like PyTorch or TensorFlow, or CUDA versions) used for running the experiments.
Experiment Setup Yes Synthetic images were generated using Stable Diffusion 2.5 fine-tuned on the training images with a segmentation-map-controlled ControlNet for 100 epochs. Further details are provided in the Appendix. The final loss function (Eq. 6) is formulated as L = L_seg + α_enc · L_sc^enc + α_dec · L_sc^dec, where L_seg is the segmentation loss derived from any chosen method, and α_enc and α_dec are the weights for the consistency losses. For simplicity, the distance function is defined as D(t_i, t_j) = 1 − CosSim(t_i, t_j), where CosSim is cosine similarity. From the analysis presented in Fig. 5, α_enc exerts a relatively consistent influence on in-domain performance, with the most notable improvement in out-of-domain performance observed at α_enc = 0.4.
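The quoted loss can be sketched in a few lines. The following is a minimal, framework-free illustration (pure Python, not the authors' implementation): `cosine_distance` implements D(t_i, t_j) = 1 − CosSim(t_i, t_j), and `total_loss` combines the segmentation loss with the two weighted consistency terms. The feature-pair arguments and the α_dec default are illustrative assumptions; the paper only reports the α_enc = 0.4 sweep.

```python
import math

def cosine_distance(t_i, t_j):
    """D(t_i, t_j) = 1 - CosSim(t_i, t_j): the distance used in the consistency losses."""
    dot = sum(a * b for a, b in zip(t_i, t_j))
    norm_i = math.sqrt(sum(a * a for a in t_i))
    norm_j = math.sqrt(sum(b * b for b in t_j))
    return 1.0 - dot / (norm_i * norm_j)

def total_loss(l_seg, enc_pair, dec_pair, alpha_enc=0.4, alpha_dec=0.4):
    """L = L_seg + alpha_enc * L_sc^enc + alpha_dec * L_sc^dec (Eq. 6).

    enc_pair / dec_pair are (t_i, t_j) feature vectors from the encoder and
    decoder for two semantically stacked views; names here are illustrative.
    """
    l_sc_enc = cosine_distance(*enc_pair)
    l_sc_dec = cosine_distance(*dec_pair)
    return l_seg + alpha_enc * l_sc_enc + alpha_dec * l_sc_dec
```

Note that when the paired features are perfectly aligned (cosine similarity 1), both consistency terms vanish and L reduces to L_seg, which matches the intent of the consistency regularization.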