Excluding the Impossible for Open Vocabulary Semantic Segmentation
Authors: Shiyuan Zhao, Baodi Liu, Yu Bai, Weifeng Liu, Shuai Shao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments: In this section, we present the experiments to address the four questions: Q1. How does ELSE-Net perform in comparison to current SOTAs? (A1: see Tab. 1) Q2. Is the EXP-Block genuinely model-agnostic, and what impact does it have? (A2: see Tab. 2) Q3. How do different components influence the outcomes? (A3: see Tab. 3 and Fig. 4) Q4. What is the extra consumption of the EXP-Block for the existing method? (A4: see Tab. 4) |
| Researcher Affiliation | Academia | 1China University of Petroleum (East China), 2Zhejiang Lab; EMAIL, EMAIL, baiyu EMAIL |
| Pseudocode | No | The paper describes methods using mathematical formulations and block diagrams (e.g., Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/shishiyuanzhao/ELSE-Net |
| Open Datasets | Yes | We adhere to standard settings (Xu et al. 2023), training our model on the COCO-Stuff dataset (Caesar, Uijlings, and Ferrari 2018) and evaluating it across several datasets: Pascal VOC (VOC) (Everingham et al. 2015), ADE20K-847 (ADE-847) (Zhou et al. 2017), ADE20K-150 (ADE-150) (Zhou et al. 2017), and Pascal Context-59 (PC-59) (Mottaghi et al. 2014). |
| Dataset Splits | No | The paper mentions using "standard settings" for training and evaluation datasets and cites other papers, but does not explicitly provide the specific percentages or sample counts for dataset splits (e.g., 80/10/10 split) within the text. |
| Hardware Specification | Yes | All experiments were executed on a high-performance system featuring an NVIDIA RTX 4090 GPU with 24GB of memory, paired with a 16-vCPU Intel(R) Xeon(R) Gold 6430 processor. |
| Software Dependencies | No | The paper mentions the use of the "AdamW optimizer", the "CLIP and CLIPN models", and "ViT-B/16" as the backbone architecture, but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch version, Python version). |
| Experiment Setup | Yes | All models were trained on the COCO-Stuff dataset using the AdamW optimizer, which was configured with an initial learning rate of 1e-4 and a weight decay of 1e-4. For both the CLIP and CLIPN models, the backbone architecture employed was ViT-B/16, leveraging its strong capabilities in visual feature representation. Additionally, the R2-Adapter was designed with 2 layers and a feature dimension of 2048, ensuring adequate flexibility and capacity for feature adaptation. The adapter weight parameter α was fixed at 0.2 to balance the contribution of the adapter to the overall learning process. In the probability correction procedure, a threshold value of 0.2 was used, enabling effective filtering and adjustment of predictions. The batch size for training was set to 4 to accommodate the computational requirements while ensuring stability during optimization. |
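The reported hyperparameters can be collected into a single configuration object, which makes a reproduction attempt easier to audit. The sketch below is a minimal illustration, not the authors' code: the class and function names are hypothetical, and the exact probability-correction rule is an assumption (the paper excerpt only states that a 0.2 threshold is used to filter predictions; here we assume classes whose "no" probability from CLIPN exceeds the threshold are excluded and the remainder renormalized).

```python
# Hedged sketch of the reported training configuration and an assumed
# threshold-based probability correction. Names are hypothetical.
from dataclasses import dataclass


@dataclass
class ELSENetConfig:
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4
    weight_decay: float = 1e-4
    backbone: str = "ViT-B/16"        # shared by CLIP and CLIPN
    adapter_layers: int = 2           # R2-Adapter depth
    adapter_dim: int = 2048           # R2-Adapter feature dimension
    adapter_alpha: float = 0.2        # adapter weight parameter
    correction_threshold: float = 0.2 # probability-correction threshold
    batch_size: int = 4


def correct_probs(class_probs, no_probs, threshold):
    """Assumed correction rule: exclude any class whose CLIPN 'no'
    probability exceeds the threshold, then renormalize the rest."""
    kept = [p if n <= threshold else 0.0
            for p, n in zip(class_probs, no_probs)]
    total = sum(kept)
    # If everything is excluded, fall back to the uncorrected scores.
    return [p / total for p in kept] if total > 0 else list(class_probs)
```

A usage example: with per-class scores `[0.5, 0.3, 0.2]` and CLIPN "no" probabilities `[0.1, 0.9, 0.05]`, the second class is excluded and the remaining mass is renormalized over the surviving classes.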