Revisit the Open Nature of Open Vocabulary Semantic Segmentation

Authors: Qiming Huang, Han Hu, Jianbo Jiao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental evaluations show that the proposed mask-wise protocol provides a more effective and reliable evaluation framework for OVS models compared to the previous pixel-wise approach on the perspective of open-world. Moreover, analysis of mismatched mask pairs reveals that a large amount of ambiguous categories exist in commonly used OVS datasets. Interestingly, we find that reducing these ambiguities during both training and inference enhances capabilities of OVS models. These findings and the new evaluation protocol encourage further exploration of the open nature of OVS, as well as broader open-world challenges. Project page: https://qiming-huang.github.io/Revisit OVS/.
Researcher Affiliation | Academia | Qiming Huang, Han Hu, Jianbo Jiao The MIx Group, School of Computer Science University of Birmingham {qxh366, hxh864}.EMAIL, EMAIL
Pseudocode | Yes | A.1 PSEUDOCODE OF THE MASK-WISE EVALUATION PROTOCOL Algorithm 1 Mask-Wise Evaluation Protocol
Open Source Code | No | Project page: https://qiming-huang.github.io/Revisit OVS/. The paper provides a project page URL, but it does not explicitly state that the code for the methodology is released or provide a direct link to a code repository within the text.
Open Datasets | Yes | Following previous OVS works (Cho et al., 2023; Xie et al., 2023; Xu et al., 2023), we train the models on the COCO-Stuff171 (Caesar et al., 2018) dataset with 171 categories and perform zero-shot evaluation on ADE20K (Zhou et al., 2019) and PASCAL-Context (Mottaghi et al., 2014) datasets.
Dataset Splits | Yes | Following previous OVS works (Cho et al., 2023; Xie et al., 2023; Xu et al., 2023), we train the models on the COCO-Stuff171 (Caesar et al., 2018) dataset with 171 categories and perform zero-shot evaluation on ADE20K (Zhou et al., 2019) and PASCAL-Context (Mottaghi et al., 2014) datasets.
Hardware Specification | Yes | The experiments were conducted on a NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions various models and frameworks (e.g., CLIP, FCNs, Transformer-based, LSeg, OpenSeg, CAT-Seg, MaskCLIP, SED, MAFT+) but does not provide specific version numbers for any programming languages, libraries, or software environments used in the experiments.
Experiment Setup | Yes | The threshold τ̂ is set to 0.8.
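
The rows above mention a mask-wise evaluation protocol (Algorithm 1 in the paper) and a threshold of 0.8, but this summary does not reproduce the algorithm or say what the threshold is applied to. The following is only a minimal sketch of one plausible way to pair predicted and ground-truth masks without comparing category names. It assumes, purely for illustration, that the threshold is a minimum IoU for accepting a pair and that pairing is greedy and one-to-one. The names `mask_iou`, `match_masks`, and `rect` are hypothetical and do not come from the paper.

```python
from typing import FrozenSet, List, Tuple

Pixel = Tuple[int, int]
Mask = FrozenSet[Pixel]  # a mask is the set of (row, col) pixels it covers


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two masks; 0.0 if both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_masks(pred: List[Mask], gt: List[Mask],
                tau: float = 0.8) -> List[Tuple[int, int, float]]:
    """Greedy one-to-one pairing of predicted and ground-truth masks.

    ASSUMPTION: tau is treated as a minimum IoU for a valid pair. The
    summary only says "the threshold is set to 0.8" and does not define
    its role, so this usage is illustrative.
    """
    # Score every (prediction, annotation) pair, best overlap first.
    candidates = sorted(
        ((mask_iou(p, g), i, j)
         for i, p in enumerate(pred)
         for j, g in enumerate(gt)),
        reverse=True,
    )
    used_pred, used_gt = set(), set()
    pairs = []
    for iou, i, j in candidates:
        if iou < tau:
            break  # remaining candidates overlap even less
        if i in used_pred or j in used_gt:
            continue  # each mask may be paired at most once
        used_pred.add(i)
        used_gt.add(j)
        pairs.append((i, j, iou))
    return pairs


def rect(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Hypothetical helper: rectangular mask over rows [r0, r1), cols [c0, c1)."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


# Toy example: two annotated regions and two predicted regions.
gt_masks = [rect(0, 0, 4, 4), rect(4, 4, 8, 8)]
pred_masks = [rect(0, 0, 4, 5), rect(5, 5, 8, 8)]

strict_pairs = match_masks(pred_masks, gt_masks, tau=0.8)
loose_pairs = match_masks(pred_masks, gt_masks, tau=0.5)
```

In this toy example the first prediction overlaps its annotation with IoU 16/20 and the second with IoU 9/16, so only the first pair survives at 0.8 while both survive at 0.5. Once pairs are fixed by overlap alone, the category names of matched and mismatched pairs can be compared afterwards, which is the kind of analysis the summary attributes to the paper.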