Test-time Contrastive Concepts for Open-world Semantic Segmentation with Vision-Language Models
Authors: Monika Wysoczańska, Antonin Vobecky, Amaia Cardiel, Tomasz Trzcinski, Renaud Marlet, Andrei Bursuc, Oriane Siméoni
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct our experiments on six datasets widely used for the task of zero-shot semantic segmentation Cha et al. (2023): fully-annotated COCO-Stuff Caesar et al. (2018), Cityscapes Cordts et al. (2016) and ADE20K Zhou et al. (2019), and object-centric VOC Everingham et al. (2012), COCO-Object Caesar et al. (2018) and Context Mottaghi et al. (2014)... We compare results when using different CCs proposed in this work. We also include results when having access to privileged information (CCPI)... Table 1: Benefits of CC measured in IoU-single. Table 2: mIoU results. Table 3: Ablation studies. |
| Researcher Affiliation | Collaboration | 1Warsaw University of Technology 2valeo.ai 3CIIRC CTU Prague 4FEE CTU Prague 5Tooploox 6LIGM, École des Ponts et Chaussées, IP Paris, CNRS, France 7Université Grenoble Alpes |
| Pseudocode | Yes | We present a pseudo-code of our metric in Algorithm 1. |
| Open Source Code | No | The paper mentions using "MMSegmentation implementation Contributors (2020)", "Detectron Wu et al. (2019)", and "Mixtral-8x7B-Instruct model Jiang et al. (2024)" via the "Hugging Face transformers library". These are third-party tools. The paper does not contain an explicit statement or link to the authors' own source code for the methodology described. |
| Open Datasets | Yes | We conduct our experiments on six datasets widely used for the task of zero-shot semantic segmentation Cha et al. (2023): fully-annotated COCO-Stuff Caesar et al. (2018), Cityscapes Cordts et al. (2016) and ADE20K Zhou et al. (2019), and object-centric VOC Everingham et al. (2012), COCO-Object Caesar et al. (2018) and Context Mottaghi et al. (2014)... For CCD generation, we use the statistics gathered by Udandarao et al. (2024) for four thousand common concepts in the LAION-400M dataset, which is a subset of LAION-2B Schuhmann et al. (2022) and which is used to train CLIP Radford et al. (2021). |
| Dataset Splits | Yes | We treat the input images following the protocol of Cha et al. (2023), which we detail in Appendix A... We conduct our experiments on six datasets widely used for the task of zero-shot semantic segmentation. |
| Hardware Specification | Yes | This work was supported by the National Centre of Science (Poland) Grant No. 2022/45/B/ST6/02817 and by the grant from NVIDIA providing one RTX A5000 24GB used for this project... computed on a machine equipped with Intel(R) i7 CPU and a Nvidia RTX A5000 GPU |
| Software Dependencies | Yes | We use the recent Mixtral-8x7B-Instruct model Jiang et al. (2024)... More precisely, we rely on the v0.1 version of its open weights available via the Hugging Face transformers library. We run the LLM in 4-bit precision with flash attention to speed up inference. |
| Experiment Setup | Yes | We use MMSegmentation implementation Contributors (2020) with a sliding window strategy and resize input images to have a shorter side of 448. In the case of CAT-Seg, we retain the original model framework and integrate IoU-single into Detectron Wu et al. (2019)... For CCD generation... We filter contrastive concepts using a low co-occurrence threshold γ = 0.01 and a high CLIP similarity threshold δ = 0.8. In the classic mIoU scenario, we use a threshold β = 0.9... We discuss the selection of these values in Appendix C.1. |
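The quoted setup describes filtering candidate contrastive concepts with a co-occurrence threshold γ = 0.01 and a CLIP text-similarity threshold δ = 0.8. A minimal sketch of that filtering step is below; the function name, data shapes, and the direction of each comparison (keep candidates co-occurring at least γ with the query, drop candidates more similar than δ to it) are our assumptions for illustration, not the authors' released code.

```python
def filter_contrastive_concepts(query, candidates, cooccurrence, similarity,
                                gamma=0.01, delta=0.8):
    """Hypothetical sketch of contrastive-concept (CC) filtering.

    candidates: list of concept names.
    cooccurrence[c]: co-occurrence rate of c with `query` in captions
        (e.g. from LAION-400M statistics).
    similarity[c]: CLIP text-embedding cosine similarity between c and `query`.
    """
    kept = []
    for c in candidates:
        if cooccurrence.get(c, 0.0) < gamma:
            continue  # too rarely seen with the query to describe its context
        if similarity.get(c, 0.0) > delta:
            continue  # near-synonym of the query; would suppress the class itself
        kept.append(c)
    return kept

# Toy example with made-up statistics for the query "dog"
cooc = {"grass": 0.12, "leash": 0.05, "puppy": 0.20, "submarine": 0.001}
sim = {"grass": 0.31, "leash": 0.45, "puppy": 0.92, "submarine": 0.22}
cc = filter_contrastive_concepts("dog", list(cooc), cooc, sim)
# "puppy" is dropped (similarity 0.92 > 0.8) and "submarine" is dropped
# (co-occurrence 0.001 < 0.01); "grass" and "leash" remain.
```

The two thresholds play complementary roles: γ discards concepts with too little evidence of appearing around the query, while δ discards concepts so close in CLIP text space that using them as contrast would erode the query class itself.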