S2S2: Semantic Stacking for Robust Semantic Segmentation in Medical Imaging

Authors: Yimu Pan, Sitao Zhang, Alison D. Gernand, Jeffery A. Goldstein, James Z. Wang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive experiments, we validate the superiority of our approach in improving segmentation performance under diverse conditions.
Researcher Affiliation Academia 1. The Pennsylvania State University, University Park; 2. Northwestern University, Chicago
Pseudocode No The paper includes 'Figure 2: Illustration of the proposed S2S2 framework' which is a diagram illustrating the method, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps.
Open Source Code Yes Code https://github.com/ymp5078/Semantic-Stacking
Open Datasets Yes For RGB images, we utilized two polyp segmentation datasets: CVC-ClinicDB (Bernal et al. 2015) and Kvasir-SEG (Jha et al. 2020). CVC-ClinicDB comprises 612 labeled images, while Kvasir-SEG includes 1,000 labeled images. These datasets, originating from distinct sites and captured using different devices, provide variability in the data. The processing of RGB datasets adhered to the methods described in previous studies (Sanderson and Matuszewski 2022). For CT images, we evaluated using the Synapse multi-organ segmentation dataset [1], which includes 30 abdominal CT scans with comprehensive annotations for multi-organ segmentation tasks. In the MRI category, our evaluation encompassed several datasets focused on abdominal and cardiac segmentation. The Combined Healthy Abdominal Organ Segmentation (CHAOS) (Kavur et al. 2021) dataset consists of 20 T2-SPIR MRI images focused on abdominal organ segmentation. For cardiac segmentation, we included a dataset (Zhuang et al. 2022) comprising 45 late gadolinium enhanced (LGE) MRI images and 45 balanced steady-state free precession (bSSFP) MRI images, alongside the Automatic Cardiac Diagnosis Challenge (ACDC) (Bernard et al. 2018) dataset, which features 100 cases of Cine MRI images. [1] https://www.synapse.org/#!Synapse:syn3193805/wiki/217789
Dataset Splits No The paper mentions the total number of images in the datasets (e.g., 'CVC-ClinicDB comprises 612 labeled images', 'Kvasir-SEG includes 1,000 labeled images', '30 abdominal CT scans', '20 T2-SPIR MRI images', '100 cases of Cine MRI images'). It also states 'The processing of RGB datasets adhered to the methods described in previous studies (Sanderson and Matuszewski 2022).' However, it does not explicitly provide specific percentages, sample counts, or a detailed methodology for how these datasets were split into training, validation, or test sets within the main text.
Hardware Specification No The acknowledgements section mentions 'cluster computers at the National Center for Supercomputing Applications through an allocation from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program' and 'Extreme Science and Engineering Discovery Environment (XSEDE)'. This indicates the use of high-performance computing resources but does not provide specific hardware details like GPU or CPU models.
Software Dependencies No The paper mentions using 'Stable Diffusion model (Rombach et al. 2022)' and 'ControlNet (Zhang, Rao, and Agrawala 2023)' for image generation. However, it does not provide specific version numbers for these or other key software components (e.g., programming languages, deep learning frameworks like PyTorch or TensorFlow, or CUDA versions) used for running the experiments.
Experiment Setup Yes Synthetic images were generated using Stable Diffusion 2.5 fine-tuned on the training images with a segmentation-map-controlled ControlNet for 100 epochs. Further details are provided in the Appendix. The final loss function (Eq. 6) is formulated as L = L_seg + α_enc · L_sc^enc + α_dec · L_sc^dec, where L_seg is the segmentation loss derived from any chosen method, and α_enc and α_dec are the weights for the consistency losses. For simplicity, the distance function is defined as D(t_i, t_j) = 1 − CosSim(t_i, t_j), where CosSim is cosine similarity. From the analysis presented in Fig. 5, α_enc exerts a relatively consistent influence on in-domain performance, with the most notable improvement in out-of-domain performance observed at α_enc = 0.4.
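The quoted loss can be sketched in a few lines. The following is a minimal, framework-free illustration (pure Python, not the authors' implementation): `cosine_distance` implements D(t_i, t_j) = 1 − CosSim(t_i, t_j), and `total_loss` combines the segmentation loss with the two weighted consistency terms. The feature-pair arguments and the α_dec default are illustrative assumptions; the paper only reports the α_enc = 0.4 sweep.

```python
import math

def cosine_distance(t_i, t_j):
    """D(t_i, t_j) = 1 - CosSim(t_i, t_j): the distance used in the consistency losses."""
    dot = sum(a * b for a, b in zip(t_i, t_j))
    norm_i = math.sqrt(sum(a * a for a in t_i))
    norm_j = math.sqrt(sum(b * b for b in t_j))
    return 1.0 - dot / (norm_i * norm_j)

def total_loss(l_seg, enc_pair, dec_pair, alpha_enc=0.4, alpha_dec=0.4):
    """L = L_seg + alpha_enc * L_sc^enc + alpha_dec * L_sc^dec (Eq. 6).

    enc_pair / dec_pair are (t_i, t_j) feature vectors from the encoder and
    decoder for two semantically stacked views; names here are illustrative.
    """
    l_sc_enc = cosine_distance(*enc_pair)
    l_sc_dec = cosine_distance(*dec_pair)
    return l_seg + alpha_enc * l_sc_enc + alpha_dec * l_sc_dec
```

Note that when the paired features are perfectly aligned (cosine similarity 1), both consistency terms vanish and L reduces to L_seg, which matches the intent of the consistency regularization.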