Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SeeDiff: Off-the-Shelf Seeded Mask Generation from Diffusion Models
Authors: Joon Hyun Park, Kumju Jo, Sungyong Baik
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate outstanding performance of segmentation networks trained with our generated image-mask pairs. The results underline the effectiveness of our proposed approach in generating high-quality fine-grained pixel-level annotations, without the need for a pre-trained segmentation network, text prompt tuning, training a new module, or learning procedures. Experimental Settings. Datasets: Following the settings of the previous work DiffuMask (Wu et al. 2023b), we evaluated our model on the following two datasets: Pascal-VOC2012 (Everingham et al. 2010) and Cityscapes (Cordts et al. 2016). Table 1: Semantic segmentation results on VOC 2012 val. Ablation Study: In this section, extensive ablation studies are conducted to assess the effectiveness of each proposed module. |
| Researcher Affiliation | Academia | Joon Hyun Park¹, Kumju Jo¹, Sungyong Baik¹·² — ¹Dept. of Artificial Intelligence, Hanyang University, South Korea; ²Dept. of Data Science, Hanyang University, South Korea |
| Pseudocode | No | The paper describes the method using textual explanations, mathematical equations (e.g., Eq. 1-12), and figures (e.g., Figure 2: Overall framework), but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/BAIKLAB/SeeDiff.git |
| Open Datasets | Yes | Datasets: Following the settings of the previous work DiffuMask (Wu et al. 2023b), we evaluated our model on the following two datasets: Pascal-VOC2012 (Everingham et al. 2010) and Cityscapes (Cordts et al. 2016). |
| Dataset Splits | Yes | In the Pascal VOC-2012 (Everingham et al. 2010) setting, we generate 2k and 3k images per class, using a total number of images (40.0k and 60.0k) identical to those used in previous studies such as Dataset Diffusion (Nguyen et al. 2023) and DiffuMask (Wu et al. 2023b). Table 1 shows the results of semantic segmentation on the VOC 2012 dataset. Table 2: Module ablations, performed on VOC 2012 val using Mask2Former with Swin-B. Table 3: Results of semantic segmentation on Cityscapes val. |
| Hardware Specification | Yes | All experiments, including image generation and evaluation, were conducted on an NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions using "Stable Diffusion 2-base version" and evaluating with "Mask2Former (Cheng et al. 2022)". However, it does not provide specific version numbers for underlying programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or other software tools that would be needed for full reproducibility. |
| Experiment Setup | Yes | We utilize the Stable Diffusion 2-base version to generate images with T = 50 denoising timesteps. We use α = 0.5 as the threshold parameter to extract the seeds and β = 0.3 as the threshold parameter to discretize a soft mask into the final mask. The settings required for training and evaluating Mask2Former, including initialization, data augmentation, batch size, weight decay, and learning rate, follow the original paper. |
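To make the role of the two thresholds in the Experiment Setup row concrete, the sketch below illustrates how α = 0.5 and β = 0.3 could be applied: α to pick high-attention seed pixels from a normalized attention map, and β to binarize a soft mask into a final 0/1 mask. This is only an illustration of the thresholding step under assumed inputs; the function names (`extract_seeds`, `discretize_mask`) and the use of a raw NumPy array as the attention map are hypothetical and not taken from the SeeDiff implementation, whose full pipeline is more involved.

```python
import numpy as np

def extract_seeds(attn_map, alpha=0.5):
    # Hypothetical helper: min-max normalize the attention map,
    # then keep pixel coordinates whose value exceeds alpha.
    attn = (attn_map - attn_map.min()) / (attn_map.max() - attn_map.min() + 1e-8)
    return np.argwhere(attn > alpha)  # array of (row, col) seed coordinates

def discretize_mask(soft_mask, beta=0.3):
    # Hypothetical helper: binarize a soft mask into the final 0/1 mask.
    return (soft_mask > beta).astype(np.uint8)

# Toy 4x4 "attention map" standing in for a real cross-attention output.
attn = np.array([[0.9, 0.8, 0.1, 0.0],
                 [0.7, 0.6, 0.2, 0.1],
                 [0.2, 0.1, 0.0, 0.0],
                 [0.1, 0.0, 0.0, 0.0]])
seeds = extract_seeds(attn, alpha=0.5)      # top-left high-attention pixels
mask = discretize_mask(attn, beta=0.3)      # binary mask over the same grid
```

Because α is applied after normalization while β cuts the soft mask directly, the two thresholds play different roles even when, as in this toy case, they select the same region.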