Excluding the Impossible for Open Vocabulary Semantic Segmentation
Authors: Shiyuan Zhao, Baodi Liu, Yu Bai, Weifeng Liu, Shuai Shao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments: In this section, we present the experiments to address the four questions: Q1. How does ELSE-Net perform in comparison to current SOTAs? (A1: see Tab. 1) Q2. Is the EXP-Block genuinely model-agnostic, and what impact does it have? (A2: see Tab. 2) Q3. How do different components influence the outcomes? (A3: see Tab. 3 and Fig. 4) Q4. What is the extra consumption of the EXP-Block for the existing method? (A4: see Tab. 4) |
| Researcher Affiliation | Academia | 1China University of Petroleum (East China), 2Zhejiang Lab; EMAIL, EMAIL, baiyu EMAIL |
| Pseudocode | No | The paper describes methods using mathematical formulations and block diagrams (e.g., Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/shishiyuanzhao/ELSE-Net |
| Open Datasets | Yes | We adhere to standard settings (Xu et al. 2023), training our model on the COCO-Stuff dataset (Caesar, Uijlings, and Ferrari 2018) and evaluating it across several datasets: Pascal VOC (VOC) (Everingham et al. 2015), ADE20K-847 (ADE-847) (Zhou et al. 2017), ADE20K-150 (ADE-150) (Zhou et al. 2017), and Pascal Context-59 (PC-59) (Mottaghi et al. 2014). |
| Dataset Splits | No | The paper mentions using "standard settings" for training and evaluation datasets and cites other papers, but does not explicitly provide the specific percentages or sample counts for dataset splits (e.g., 80/10/10 split) within the text. |
| Hardware Specification | Yes | All experiments were executed on a high-performance system featuring an NVIDIA RTX 4090 GPU with 24GB of memory, paired with a 16-vCPU Intel(R) Xeon(R) Gold 6430 processor. |
| Software Dependencies | No | The paper mentions the use of the "AdamW optimizer", the "CLIP and CLIPN models", and "ViT-B/16" as the backbone architecture, but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch version, Python version). |
| Experiment Setup | Yes | All models were trained on the COCO-Stuff dataset using the AdamW optimizer, which was configured with an initial learning rate of 1e-4 and a weight decay of 1e-4. For both the CLIP and CLIPN models, the backbone architecture employed was ViT-B/16, leveraging its strong capabilities in visual feature representation. Additionally, the R2-Adapter was designed with 2 layers and a feature dimension of 2048, ensuring adequate flexibility and capacity for feature adaptation. The adapter weight parameter α was fixed at 0.2 to balance the contribution of the adapter to the overall learning process. In the probability correction procedure, a threshold value of 0.2 was used, enabling effective filtering and adjustment of predictions. The batch size for training was set to 4 to accommodate the computational requirements while ensuring stability during optimization. |
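The reported hyperparameters can be collected into a single configuration object, which makes a reproduction attempt easier to audit. The sketch below is a minimal illustration, not the authors' code: the class and function names are hypothetical, and the exact probability-correction rule is an assumption (the paper excerpt only states that a 0.2 threshold is used to filter predictions; here we assume classes whose "no" probability from CLIPN exceeds the threshold are excluded and the remainder renormalized).

```python
# Hedged sketch of the reported training configuration and an assumed
# threshold-based probability correction. Names are hypothetical.
from dataclasses import dataclass


@dataclass
class ELSENetConfig:
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4
    weight_decay: float = 1e-4
    backbone: str = "ViT-B/16"        # shared by CLIP and CLIPN
    adapter_layers: int = 2           # R2-Adapter depth
    adapter_dim: int = 2048           # R2-Adapter feature dimension
    adapter_alpha: float = 0.2        # adapter weight parameter
    correction_threshold: float = 0.2 # probability-correction threshold
    batch_size: int = 4


def correct_probs(class_probs, no_probs, threshold):
    """Assumed correction rule: exclude any class whose CLIPN 'no'
    probability exceeds the threshold, then renormalize the rest."""
    kept = [p if n <= threshold else 0.0
            for p, n in zip(class_probs, no_probs)]
    total = sum(kept)
    # If everything is excluded, fall back to the uncorrected scores.
    return [p / total for p in kept] if total > 0 else list(class_probs)
```

A usage example: with per-class scores `[0.5, 0.3, 0.2]` and CLIPN "no" probabilities `[0.1, 0.9, 0.05]`, the second class is excluded and the remaining mass is renormalized over the surviving classes.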