Excluding the Impossible for Open Vocabulary Semantic Segmentation

Authors: Shiyuan Zhao, Baodi Liu, Yu Bai, Weifeng Liu, Shuai Shao

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present the experiments to address the four questions: Q1. How does ELSE-Net perform in comparison to current SOTAs? (A1. See Tab. 1) Q2. Is the EXP-Block genuinely model-agnostic, and what impact does it have? (A2. See Tab. 2) Q3. How do different components influence the outcomes? (A3. See Tab. 3 and Fig. 4) Q4. What is the extra consumption of the EXP-Block for the existing method? (A4. See Tab. 4)
Researcher Affiliation | Academia | 1 China University of Petroleum (East China), 2 Zhejiang Lab. Author emails are redacted in the source (EMAIL placeholders).
Pseudocode | No | The paper describes its methods using mathematical formulations and block diagrams (e.g., Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/shishiyuanzhao/ELSE-Net
Open Datasets | Yes | We adhere to standard settings (Xu et al. 2023), training our model on the COCO-Stuff dataset (Caesar, Uijlings, and Ferrari 2018) and evaluating it across several datasets: Pascal VOC (VOC) (Everingham et al. 2015), ADE20K-847 (ADE-847) (Zhou et al. 2017), ADE20K-150 (ADE-150) (Zhou et al. 2017), and Pascal Context-59 (PC-59) (Mottaghi et al. 2014).
Dataset Splits | No | The paper cites "standard settings" for its training and evaluation datasets, but does not explicitly state split percentages or sample counts (e.g., an 80/10/10 split) in the text.
Hardware Specification | Yes | All experiments were executed on a high-performance system featuring an NVIDIA RTX 4090 GPU with 24 GB of memory, paired with a 16-vCPU Intel(R) Xeon(R) Gold 6430 processor.
Software Dependencies | No | The paper mentions the AdamW optimizer, the CLIP and CLIPN models, and a ViT-B/16 backbone architecture, but does not provide version numbers for software libraries or dependencies (e.g., PyTorch or Python versions).
Experiment Setup | Yes | All models were trained on the COCO-Stuff dataset using the AdamW optimizer, configured with an initial learning rate of 1e-4 and a weight decay of 1e-4. For both the CLIP and CLIPN models, the backbone architecture was ViT-B/16, leveraging its strong visual feature representations. Additionally, the R2-Adapter was designed with 2 layers and a feature dimension of 2048, ensuring adequate flexibility and capacity for feature adaptation. The adapter weight parameter α was fixed at 0.2 to balance the adapter's contribution to the overall learning process. In the probability correction procedure, a threshold of 0.2 was used, enabling effective filtering and adjustment of predictions. The training batch size was set to 4 to accommodate the computational requirements while ensuring stable optimization.
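The reported hyperparameters can be collected into a single configuration for anyone attempting to reproduce the setup. The sketch below is illustrative only: the names (`TRAIN_CONFIG`, `correct_probabilities`) are not from the paper, and the correction function is a guess at how a CLIPN-style "no" score with the stated 0.2 threshold might exclude impossible classes; the paper's actual probability correction may differ.

```python
# Hyperparameters as reported in the Experiment Setup row above.
# All names in this snippet are illustrative, not the authors' code.
TRAIN_CONFIG = {
    "dataset": "COCO-Stuff",
    "optimizer": "AdamW",
    "lr": 1e-4,
    "weight_decay": 1e-4,
    "backbone": "ViT-B/16",       # shared by CLIP and CLIPN
    "adapter_layers": 2,          # R2-Adapter depth
    "adapter_dim": 2048,          # R2-Adapter feature dimension
    "adapter_alpha": 0.2,         # adapter weight parameter
    "correction_threshold": 0.2,  # probability correction threshold
    "batch_size": 4,
}


def correct_probabilities(probs, no_probs, threshold=0.2):
    """Hypothetical probability correction: zero out any class whose
    CLIPN-style "no" score exceeds the threshold, i.e. exclude classes
    judged impossible before the final prediction."""
    return [0.0 if n > threshold else p for p, n in zip(probs, no_probs)]


# Example: the second class is suppressed because its "no" score (0.9)
# exceeds the 0.2 threshold.
corrected = correct_probabilities([0.5, 0.3, 0.2], [0.1, 0.9, 0.05])
```

Under these assumptions, `corrected` evaluates to `[0.5, 0.0, 0.2]`; only the class flagged as impossible is filtered out, leaving the remaining scores untouched.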