Stable Segment Anything Model

Authors: Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Di Zhang, Pengfei Wan, Yu-Wing Tai, Chi-Keung Tang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate the effectiveness and advantages of our approach, underscoring Stable-SAM as a more robust solution for segmenting anything. Codes are at https://github.com/fanq15/Stable-SAM. (...) We evaluate the segmentation accuracy and stability of the ViT-Large based SAM with different prompt types and qualities, including box prompts with added noise (...) The evaluation utilizes four segmentation datasets as in HQ-SAM: DIS (...) (validation set), ThinObject-5K (...) (test set), COIFT (...), and HR-SOD (...). Table 1 tabulates that SAM's segmentation accuracy and stability significantly decrease with low-quality prompts (...). We perform detailed analysis on Stable-SAM on its network modules, model scalability, low-shot generalization, point prompt quality, backbone variants, relation to other methods, and stability visualization.
Researcher Affiliation | Collaboration | 1 Nanjing University, 2 Kuaishou Technology, 3 Carnegie Mellon University, 4 EPFL, 5 Dartmouth College, 6 The Hong Kong University of Science and Technology
Pseudocode | No | The paper describes methods verbally and uses mathematical formulations (e.g., Equations 1-6) but does not present any pseudocode or algorithm blocks.
Open Source Code | Yes | Codes are at https://github.com/fanq15/Stable-SAM.
Open Datasets | Yes | The evaluation utilizes four segmentation datasets as in HQ-SAM: DIS (Qin et al., 2022) (validation set), ThinObject-5K (Liew et al., 2021) (test set), COIFT (Mansilla & Miranda, 2019), and HR-SOD (Zeng et al., 2019). Furthermore, we validate the model's zero-shot generalization ability on three challenging segmentation benchmarks, including COCO (Lin et al., 2014), SGinW (Zou et al., 2023) and MESS (Blumenstiel et al., 2023).
Dataset Splits | Yes | The evaluation utilizes four segmentation datasets as in HQ-SAM: DIS (Qin et al., 2022) (validation set), ThinObject-5K (Liew et al., 2021) (test set), COIFT (Mansilla & Miranda, 2019), and HR-SOD (Zeng et al., 2019). (...) For every input image and prompt type, we randomly select 20 prompts to compute their segmentation stability (...). (...) we train all models on the HQSeg-44K dataset, and evaluate their performance on four fine-grained segmentation datasets (...). (...) All models are trained with RTS by 1 training epoch, with 220/440 train images.
Hardware Specification | No | The paper does not mention any specific hardware used for training or inference, such as GPU models, CPU models, or memory specifications, other than the general memory usage of the models (e.g., 7.6 GB).
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | All our Stable-SAM models are trained by just one epoch for fast adaptation unless otherwise stated. All other models are trained 12 epochs. (...) To address inaccurate prompts, our RTS incorporates prompts of varying qualities during training. These prompts include ground-truth boxes, box prompts with added noise (noise scale 0.4), and point prompts with varying numbers of points (1, 3, 10 positive points randomly chosen from the ground-truth mask).
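The quoted protocol describes three mechanisms: noisy box prompts (noise scale 0.4), positive point prompts sampled from the ground-truth mask (1, 3, or 10 points), and a stability score computed over 20 random prompts per image. The paper does not spell out these implementations, so the sketch below is one plausible reading, not the authors' code: the noise model (uniform perturbation proportional to box side lengths) and the stability measure (mean pairwise IoU of the masks predicted from different prompts) are assumptions, and all function names are hypothetical.

```python
import numpy as np

def add_box_noise(box, noise_scale=0.4, rng=None):
    """Perturb a ground-truth box (x1, y1, x2, y2) with uniform noise
    proportional to its side lengths; 0.4 is the paper's stated scale."""
    rng = np.random.default_rng() if rng is None else rng
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    noise = rng.uniform(-noise_scale, noise_scale, size=4) * np.array([w, h, w, h])
    return np.asarray(box, dtype=float) + noise

def sample_point_prompts(gt_mask, n_points, rng=None):
    """Randomly pick n_points positive points (x, y) from the ground-truth
    mask; the paper uses 1, 3, or 10 points."""
    rng = np.random.default_rng() if rng is None else rng
    ys, xs = np.nonzero(gt_mask)
    idx = rng.choice(len(xs), size=n_points, replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1)

def stability_miou(masks):
    """Mean pairwise IoU across masks predicted from different random
    prompts of the same object (here, over the paper's 20 prompts)."""
    ious = []
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            inter = np.logical_and(masks[i], masks[j]).sum()
            union = np.logical_or(masks[i], masks[j]).sum()
            ious.append(inter / union if union else 1.0)
    return float(np.mean(ious))
```

Under this reading, training would mix ground-truth boxes, `add_box_noise` outputs, and `sample_point_prompts` outputs for each sample, while evaluation would feed 20 random prompts per image to the model and report `stability_miou` over the resulting predictions.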