Boosting Segment Anything Model Towards Open-Vocabulary Learning

Authors: Xumeng Han, Longhui Wei, Xuehui Yu, Zhiyang Dou, Xin He, Kuiran Wang, Yingfei Sun, Zhenjun Han, Qi Tian

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We follow the GLIP (Li et al. 2022a) protocol and conduct experiments to comprehensively evaluate the effectiveness of Sambor in open-vocabulary object detection. Benefiting from the effective designs, Sambor demonstrates superior open-vocabulary detection performance on COCO (Lin et al. 2014) and LVIS (Gupta, Dollar, and Girshick 2019) benchmarks.
Researcher Affiliation | Collaboration | ¹University of Chinese Academy of Sciences, ²Huawei Inc. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology in text and illustrates it with architectural diagrams (Figure 1 and Figure 2), but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states that the "MMDetection (Chen et al. 2019) code-base is used," but does not provide any explicit statement about releasing the code for the methodology described in this paper, nor a link to a code repository.
Open Datasets | Yes | For object detection, we use the Objects365 (Shao et al. 2019) dataset (referred to as O365), comprising 365 categories. For phrase grounding, we use the GoldG (Kamath et al. 2021) dataset... COCO Benchmark (Lin et al. 2014)... LVIS Benchmark (Gupta, Dollar, and Girshick 2019)
Dataset Splits | Yes | COCO Benchmark (Lin et al. 2014), comprising 80 common object categories... LVIS Benchmark (Gupta, Dollar, and Girshick 2019) contains 1,203 categories... We report the Fixed AP (Dave et al. 2021) on both the Mini Val (Kamath et al. 2021) subset, comprising 5,000 images, and the complete validation set v1.0... fine-tune for 1 epoch on approximately one-fifth of the O365 dataset.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for conducting the experiments.
Software Dependencies | No | The paper mentions using the "MMDetection (Chen et al. 2019) code-base," several models such as "SAM with ViT-B (Dosovitskiy et al. 2020)" and "CLIP with RN50x64 (He et al. 2016)," and the "AdamW (Loshchilov and Hutter 2019)" optimizer. However, it does not specify version numbers for MMDetection or any other software libraries or frameworks used.
Experiment Setup | Yes | We pre-train our models using SAM with ViT-B (Dosovitskiy et al. 2020) as the backbone and CLIP with RN50x64 (He et al. 2016), using a batch size of 64. We select the AdamW (Loshchilov and Hutter 2019) optimizer with a 0.05 weight decay, an initial learning rate of 4×10⁻⁴, and cosine annealing learning rate decay. The default training schedule is 12 epochs. The input image size is 1,024×1,024 with standard scale jittering (Ghiasi et al. 2021). ... we use a 32×32 grid of points to fine-tune for 1 epoch on approximately one-fifth of the O365 dataset. Maintaining all other hyper-parameters constant, employing a reduced learning rate of 4×10⁻⁵ contributes to the efficacy of fine-tuning.
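Since the paper releases no code, the quoted schedule can be illustrated with a minimal, self-contained sketch of the cosine annealing decay it describes. The function name, the per-epoch granularity, and the assumption that the rate decays to zero over the 12-epoch run are ours, not from the paper; the hyperparameters (base rate 4×10⁻⁴, 12 epochs) are from the quoted setup.

```python
import math

def cosine_annealed_lr(epoch, total_epochs=12, base_lr=4e-4, min_lr=0.0):
    """Cosine annealing (Loshchilov & Hutter): decay base_lr to min_lr
    over total_epochs following half a cosine period.

    NOTE: illustrative sketch only; the paper's actual implementation
    (MMDetection) may step the schedule per iteration, not per epoch.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Learning rate at the start of each of the 12 epochs, plus the final value.
schedule = [cosine_annealed_lr(e) for e in range(13)]
```

The schedule starts at exactly 4×10⁻⁴, passes through half the base rate at the midpoint (epoch 6), and reaches the minimum at epoch 12; the fine-tuning stage described above would use the same shape with `base_lr=4e-5`.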