Unveiling the Knowledge of CLIP for Training-Free Open-Vocabulary Semantic Segmentation

Authors: Yajie Liu, Guodong Wang, Jinjin Zhang, Qingjie Liu, Di Huang

AAAI 2025

Reproducibility assessment (variable, result, LLM response):
Research Type: Experimental. Experiments conducted on 9 segmentation benchmarks with various CLIP models demonstrate that CLIPSeg consistently outperforms all training-free methods by substantial margins, e.g., a 7.8% improvement in average mIoU for CLIP with a ViT-L backbone, and competes with learning-based counterparts in generalizing to novel concepts in an efficient way. Ablation studies are also conducted to evaluate the effects of the core components of the proposed method.
Researcher Affiliation: Academia. 1) State Key Laboratory of Complex and Critical Software Environment, Beihang University, Beijing 100191, China; 2) School of Computer Science and Engineering, Beihang University, Beijing 100191, China.
Pseudocode: No. The paper describes the Coherence-enhanced Residual Attention (CRA) and Deep Semantic Integration (DSI) modules using mathematical formulas and descriptive text (e.g., Eq. 3, Eq. 4) but does not include any formal pseudocode blocks or algorithms.
Open Source Code: No. The paper does not contain an explicit statement about releasing its own source code, nor does it provide a direct link to a code repository. It mentions generating results for other methods using 'their official released code', but not for its own proposed method.
Open Datasets: Yes. We employ the COCO-Stuff dataset to evaluate the intra-image feature coherence. We follow the widely-used evaluation protocol, as introduced in TCL (Cha, Mun, and Roh 2023), to evaluate our method across 9 segmentation benchmarks in a zero-shot manner. These benchmarks are categorized into two groups: (i) without a background class, including Pascal VOC20 (Everingham et al. 2010) with 20 classes (denoted as V20), Pascal Context (Mottaghi et al. 2014) with 459 classes in the full version (C459) and the most frequent 59 classes in the C59 version, COCO-Stuff (Caesar, Uijlings, and Ferrari 2018) with 171 classes (STUFF), ADE20k (Zhou et al. 2019) with 847 classes in the full version (A847) and the A150 version with the most frequent 150 classes, and Cityscapes (CITY) (Cordts et al. 2016) with 19 classes; (ii) with a background class, including Pascal Context 60 (C60) and COCO object with 80 classes (COCO).
Dataset Splits: Yes. We follow the widely-used evaluation protocol, as introduced in TCL (Cha, Mun, and Roh 2023), to evaluate our method across 9 segmentation benchmarks in a zero-shot manner.
Hardware Specification: No. The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions using various CLIP models with ViT-B and ViT-L backbones, which are model architectures, not hardware.
Software Dependencies: No. The paper mentions applying the framework to '8 widely-used CLIP models, namely CLIP, OpenCLIP, MetaCLIP, with both ViT-B (denoted as -B) and ViT-L (-L) backbones' but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers.
Experiment Setup: Yes. The cosine distance threshold ε is empirically set to 0.75. The temperature in Eq. 3 is set to 6 across all models and datasets. If not specified, we ensemble the dense outputs of the last three layers to construct V.
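The reported hyperparameters (ε = 0.75, temperature = 6, ensembling the last three layers) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names are hypothetical, and the assumption that the cosine-distance threshold gates attention weights is our reading of the described setup, not a reproduction of Eq. 3.

```python
import numpy as np

def coherence_masked_attention(q, k, v, temperature=6.0, eps=0.75):
    """Hedged sketch: temperature-scaled attention (tau = 6, as reported)
    where token pairs whose cosine distance exceeds eps = 0.75 are masked.
    How the threshold is actually applied in the paper is an assumption here."""
    # Cosine similarity between query and key tokens.
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    cos_sim = qn @ kn.T
    # Keep only pairs whose cosine distance (1 - similarity) is within eps.
    keep = (1.0 - cos_sim) <= eps
    logits = np.where(keep, temperature * cos_sim, -np.inf)
    # Numerically stable softmax over keys; masked entries contribute zero.
    logits -= logits.max(axis=-1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def ensemble_last_layers(layer_outputs, n=3):
    """Average the dense outputs of the last n layers to construct V,
    following the 'last three layers' setting reported in the paper."""
    return np.mean(np.stack(layer_outputs[-n:]), axis=0)
```

When used self-attention-style (q = k), the diagonal always survives the mask (a token's cosine distance to itself is 0), so every row of the softmax stays well defined.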