reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Revisit the Open Nature of Open Vocabulary Semantic Segmentation

Authors: Qiming Huang, Han Hu, Jianbo Jiao

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experimental evaluations show that, from an open-world perspective, the proposed mask-wise protocol provides a more effective and reliable evaluation framework for OVS models than the previous pixel-wise approach. Moreover, analysis of mismatched mask pairs reveals that a large number of ambiguous categories exist in commonly used OVS datasets. Interestingly, we find that reducing these ambiguities during both training and inference enhances the capabilities of OVS models. These findings and the new evaluation protocol encourage further exploration of the open nature of OVS, as well as broader open-world challenges. Project page: https://qiming-huang.github.io/Revisit OVS/.
Researcher Affiliation	Academia	Qiming Huang, Han Hu, Jianbo Jiao; The MIx Group, School of Computer Science, University of Birmingham
Pseudocode	Yes	A.1 Pseudocode of the Mask-Wise Evaluation Protocol; Algorithm 1: Mask-Wise Evaluation Protocol
Open Source Code	No	Project page: https://qiming-huang.github.io/Revisit OVS/. The paper provides a project page URL, but it does not explicitly state that the code for the methodology is released or provide a direct link to a code repository within the text.
Open Datasets	Yes	Following previous OVS works (Cho et al., 2023; Xie et al., 2023; Xu et al., 2023), we train the models on the COCO-Stuff171 (Caesar et al., 2018) dataset with 171 categories and perform zero-shot evaluation on ADE20K (Zhou et al., 2019) and PASCAL-Context (Mottaghi et al., 2014) datasets.
Dataset Splits	Yes	Following previous OVS works (Cho et al., 2023; Xie et al., 2023; Xu et al., 2023), we train the models on the COCO-Stuff171 (Caesar et al., 2018) dataset with 171 categories and perform zero-shot evaluation on ADE20K (Zhou et al., 2019) and PASCAL-Context (Mottaghi et al., 2014) datasets.
Hardware Specification	Yes	The experiments were conducted on an NVIDIA A100 GPU.
Software Dependencies	No	The paper mentions various models and frameworks (e.g., CLIP, FCNs, Transformer-based models, LSeg, OpenSeg, CAT-Seg, MaskCLIP, SED, MAFT+) but does not provide specific version numbers for any programming languages, libraries, or software environments used in the experiments.
Experiment Setup	Yes	The threshold τ̂ is set to 0.8.
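The table contrasts the paper's mask-wise protocol with pixel-wise evaluation and reports a threshold τ̂ = 0.8, but it does not spell out Algorithm 1 or say what τ̂ thresholds. The Python sketch below is only an illustration under stated assumptions: it treats τ̂ as an IoU cutoff for pairing each ground-truth mask with its best-overlapping predicted mask, and the helper names (`binary_masks`, `mask_iou`, `mask_wise_pairs`) are hypothetical, not the authors' code. It shows how a mask-wise view can separate a well-segmented region that carries a different label (a "mismatched pair") from a region that was not recovered at all, two cases that pixel-wise scoring treats alike.

```python
import numpy as np


def binary_masks(label_map: np.ndarray, ignore_index: int = 255) -> dict[int, np.ndarray]:
    """Split an integer label map into one boolean mask per category present."""
    return {
        int(c): label_map == c
        for c in np.unique(label_map)
        if c != ignore_index
    }


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks (0.0 if both are empty)."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def mask_wise_pairs(pred: np.ndarray, gt: np.ndarray, tau: float = 0.8,
                    ignore_index: int = 255):
    """Hypothetical mask-wise pairing (NOT the paper's Algorithm 1).

    Assumption: tau is an IoU cutoff. Each ground-truth mask is paired with
    the predicted mask that overlaps it most; pairs below tau are dropped.
    Returns (matched, mismatched), lists of (gt_class, pred_class, iou):
    matched pairs agree on the label, mismatched pairs overlap well but
    carry different labels, where ambiguous category names could appear.
    """
    # Drop predictions on pixels that have no ground-truth label.
    pred = np.where(gt != ignore_index, pred, ignore_index)
    pred_masks = binary_masks(pred, ignore_index)
    matched, mismatched = [], []
    for g_cls, g_mask in binary_masks(gt, ignore_index).items():
        best_cls, best_iou = None, 0.0
        for p_cls, p_mask in pred_masks.items():
            iou = mask_iou(g_mask, p_mask)
            if iou > best_iou:
                best_cls, best_iou = p_cls, iou
        if best_cls is None or best_iou < tau:
            continue  # ground-truth mask not recovered at this overlap level
        pair = (g_cls, best_cls, best_iou)
        (matched if best_cls == g_cls else mismatched).append(pair)
    return matched, mismatched


if __name__ == "__main__":
    # Toy 4x4 scene: class 1 (say "road") fills the right half of the image.
    gt = np.zeros((4, 4), dtype=int)
    gt[:, 2:] = 1
    # The prediction segments the same region but names it class 2 (say "street").
    pred = gt.copy()
    pred[:, 2:] = 2

    matched, mismatched = mask_wise_pairs(pred, gt, tau=0.8)
    print(matched, mismatched)        # [(0, 0, 1.0)] [(1, 2, 1.0)]
    print(float((pred == gt).mean()))  # 0.5: pixel-wise, half the image is "wrong"
```

In this toy case pixel-wise accuracy drops to 0.5, while the mask-wise view reports one matched pair and one mismatched pair whose masks coincide exactly. Whether such a mismatched pair reflects a genuinely ambiguous category name is a separate judgment that this sketch does not make.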