SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Authors: Pengfei Chen, Lingxi Xie, Xinyue Huo, Xuehui Yu, Xiaopeng Zhang, Yingfei Sun, Zhenjun Han, Qi Tian
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that SAM-CP achieves semantic, instance, and panoptic segmentation in both open and closed domains. In particular, it achieves the state-of-the-art performance in open-vocabulary segmentation. Our research offers a novel and generalized methodology for equipping vision foundation models like SAM with multi-grained semantic perception abilities. Codes are released on github.com/ucas-vg/SAM-CP. (...) We train SAM-CP on COCO Lin & Maire (2014) and ADE20K Zhou et al. (2017) and evaluate it on these two datasets as well as Cityscapes Cordts et al. (2016) for both open-vocabulary and closed-domain segmentation. (...) Extensive experiments demonstrate SAM-CP's ability to cover semantic, instance, and panoptic segmentation with a single model. |
| Researcher Affiliation | Collaboration | Pengfei Chen1,2 Lingxi Xie2 Xinyue Huo2 Xuehui Yu1 Xiaopeng Zhang2 Yingfei Sun1 Zhenjun Han1 Qi Tian2 1 University of Chinese Academy of Sciences 2 Huawei Inc. |
| Pseudocode | Yes | Algorithm 1 Affinity Similarity Calculation. Input: Query vectors Q, Patch features K, Head number η, Stage number ω. Output: Affinity similarity Â. Note: Q ∈ R^{M×D}, K ∈ R^{N×D}, where M and N are the numbers of Q and K, and D is the feature dimension, which is a multiple of η. s ∈ R^{1}, b0 ∈ R^{D}, and b1 ∈ R^{D} are the learnable scaling factor and bias parameters used to initialize the score to 0.01 for the focal loss. |
| Open Source Code | Yes | Codes are released on github.com/ucas-vg/SAM-CP. |
| Open Datasets | Yes | The datasets are public datasets; their links are provided here: COCO: https://cocodataset.org/ ADE20K: http://groups.csail.mit.edu/vision/datasets/ADE20K/ Cityscapes: https://www.cityscapes-dataset.com/ |
| Dataset Splits | Yes | We train SAM-CP on the COCO-Panoptic Lin & Maire (2014) and ADE20K Zhou et al. (2017) datasets (...) COCO-Panoptic (the 2017 version) has 118K training and 5K validation images with 80 thing and 53 stuff categories. |
| Hardware Specification | Yes | We use 8 Tesla-V100 GPUs (4/2 images per GPU) for open-vocabulary/closed-domain experiments. (...) All experiments were carried out using an RTX 4090. |
| Software Dependencies | Yes | We use the implementation of the MMDetection Chen et al. (2019) (v3.0) library. |
| Experiment Setup | Yes | We use loss weights of 2.0, 1.0, and 1.0 for Lcls, Lmfl, and Ldice, and metric weights of 2.0, 1.0, 1.0, 1.0, and 1.0 for the cls, mfl, dice, bbox, and giou terms in the Hungarian matching algorithm. The batch size is 32, where we use 8 Tesla-V100 GPUs (4 images per GPU) for the R50 experiments and 32 GPUs (1 image per GPU) for the Swin-L experiments. The learning rate is set to be 10⁻⁵ at the beginning and is multiplied by 0.1 after the 8/16/40-th epoch when the total number of epochs is 12/24/50. The threshold τ in Section 3.2.3 is set to be 0.8. The number of patch encoders and unified affinity decoders are both set to be 6. |
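The affinity-similarity pseudocode quoted in the table can be sketched in plain NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the per-head dot product, the averaging over heads, and folding the learnable parameters into a single scale `s` and a bias chosen so the initial sigmoid score is ≈ 0.01 (as the note requires for focal loss) are all assumptions; the paper's `b0`, `b1` are per-dimension biases that this sketch collapses to a scalar.

```python
import numpy as np

def affinity_similarity(Q, K, num_heads, s=1.0):
    """Sketch of multi-head affinity similarity between queries and patches.

    Q: (M, D) query vectors; K: (N, D) patch features.
    D must be a multiple of num_heads (η in the pseudocode).
    Returns Â of shape (M, N) with entries in (0, 1).
    """
    M, D = Q.shape
    N, _ = K.shape
    assert D % num_heads == 0, "feature dim must be a multiple of the head number"
    d = D // num_heads

    # Split features into heads: (M, η, d) and (N, η, d).
    Qh = Q.reshape(M, num_heads, d)
    Kh = K.reshape(N, num_heads, d)

    # Per-head scaled dot products, then average over heads -> (M, N).
    sim = np.einsum("mhd,nhd->hmn", Qh, Kh) / np.sqrt(d)
    A = sim.mean(axis=0)

    # Scale and bias the logits so sigmoid(score) starts near 0.01,
    # the initialization the pseudocode's note attributes to s, b0, b1.
    b = np.log(0.01 / 0.99)
    return 1.0 / (1.0 + np.exp(-(s * A + b)))
```

With `s = 0` the output is exactly the 0.01 initialization, which is one way to check that the bias term behaves as the note describes.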
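The experiment-setup row packs loss weights, Hungarian matching costs, and a step learning-rate schedule into one sentence; a small config sketch makes the schedule concrete. The dict names and the strict-`>` reading of "after the m-th epoch" are assumptions, not the authors' code.

```python
# Hypothetical config mirroring the quoted setup (key names are assumptions).
LOSS_WEIGHTS = {"cls": 2.0, "mfl": 1.0, "dice": 1.0}
MATCH_COSTS = {"cls": 2.0, "mfl": 1.0, "dice": 1.0, "bbox": 1.0, "giou": 1.0}

def lr_at_epoch(epoch: int, total_epochs: int, base_lr: float = 1e-5) -> float:
    """Step schedule from the setup row: the LR starts at 1e-5 and is
    multiplied by 0.1 after the 8/16/40-th epoch for 12/24/50-epoch runs.

    Epochs are taken as 1-indexed; treating "after the m-th epoch" as
    epoch > m is an assumption about the boundary.
    """
    milestones = {12: 8, 24: 16, 50: 40}
    m = milestones[total_epochs]
    return base_lr * 0.1 if epoch > m else base_lr
```

For example, a 12-epoch run would train at 1e-5 through epoch 8 and at 1e-6 from epoch 9 onward.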