A Simple Framework for Open-Vocabulary Zero-Shot Segmentation

Authors: Thomas Stegmüller, Tim Lebailly, Nikola Đukić, Behzad Bozorgtabar, Tinne Tuytelaars, Jean-Philippe Thiran

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 EXPERIMENTS In this section, we investigate the properties of SimZSS through various experiments. Additional experiments can be found in Appendix A. 4.1 EXPERIMENTAL SETUP 4.2 ZERO-SHOT SEGMENTATION OF FOREGROUND 4.3 ZERO-SHOT SEGMENTATION 4.4 ZERO-SHOT CLASSIFICATION
Researcher Affiliation | Academia | Thomas Stegmüller1 Tim Lebailly2 Nikola Đukić2 Behzad Bozorgtabar1,3 Tinne Tuytelaars2 Jean-Philippe Thiran1,3 1EPFL 2KU Leuven 3CHUV 1{firstname}.{lastname}@epfl.ch 2{firstname}.{lastname}@esat.kuleuven.be
Pseudocode | No | The paper describes its methodology through textual explanations and mathematical equations (1) to (10), but it contains no clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured, code-like steps for its procedures.
Open Source Code | Yes | Our code and pretrained models are publicly available at https://github.com/tileb1/simzss.
Open Datasets | Yes | 4.1.1 PRETRAINING DATASETS We train our models on two distinct datasets: COCO Captions (Lin et al., 2014; Chen et al., 2015) and LAION-400M (Schuhmann et al., 2021). ... We report the mIoU scores across five standard datasets, namely Pascal VOC (Everingham et al., 2012), Pascal Context (Mottaghi et al., 2014), COCO-Stuff (Caesar et al., 2018), Cityscapes (Cordts et al., 2016) and ADE20K (Zhou et al., 2017). ... All datasets used in our work are publicly available.
Dataset Splits | Yes | We follow the MMSegmentation (Contributors, 2020) implementation of Cha et al. (2023). ... Pascal VOC 2012 The Pascal VOC dataset (Everingham et al., 2010) contains 20 classes with semantic segmentation annotations. The training set consists of 1,464 images, while the validation set includes 1,449 images. ... Pascal Context ... The training set contains approximately 4,998 images, while the validation set includes around 5,105 images. ... COCO-Stuff ... It includes over 164K images for training and 20K images for validation... ADE20K ... The training set includes 20,210 images, and the validation set consists of 2,000 images. Cityscapes ... The training set includes 2,975 images, and the validation set includes 500 images...
Hardware Specification | Yes | Experiments are conducted on a single node with 4x AMD MI250x GPUs (2 compute dies per GPU, i.e., worldsize = 8) with a memory usage of 38GB per compute die.
Software Dependencies | No | We use the en_core_web_trf model from SpaCy (Honnibal et al., 2020) as part-of-speech tagger to identify noun phrases. ... We follow the MMSegmentation (Contributors, 2020) implementation of Cha et al. (2023). The paper names specific software components, including SpaCy (with the specific 'en_core_web_trf' model) and MMSegmentation, but provides no explicit version numbers for these dependencies as required for reproducibility.
Experiment Setup | Yes | For COCO Captions, we conduct training over 4M processed samples (≈6.6 epochs) using a global batchsize of 16,384. We incorporate a warm-up strategy spanning 10% of the training steps, linearly ramping up the learning rate until it reaches its peak value, chosen from the set {8e-5, 3e-5, 8e-6, 3e-6}. Subsequently, we employ a cosine decay schedule for the remaining steps. Similarly, for LAION-400M, we train for 1 epoch with a global batchsize of 32,768, and we set the learning rate from the options {3e-5, 8e-6, 3e-6, 8e-7, 3e-7}. ... The overall objective of SimZSS, denoted as Ltot, is a weighted sum of the global and local consistency objectives: Ltot = Lg + λLl (10) where λ is a weighting parameter whose impact is ablated in Table 9. ... where τ is a temperature parameter that regulates the sharpness of the similarity distribution (see Tab. 9). ... After a grid search on λ, τ, and the learning rate, we find that the best-performing setting for training on COCO Captions is (λ = 0.05, τ = 0.1).
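The reported setup combines a linear warm-up with cosine decay for the learning rate, and a weighted sum of global and local losses (Eq. 10). The following is a minimal plain-Python sketch of both pieces; the function names and the toy step counts are illustrative assumptions, not taken from the authors' released code.

```python
import math

def lr_at_step(step, total_steps, peak_lr, warmup_frac=0.1):
    """Learning rate at a given step: linear warm-up to peak_lr over the
    first warmup_frac of training, then cosine decay toward zero."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # linear ramp-up to the peak value
        return peak_lr * (step + 1) / warmup_steps
    # cosine decay over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

def total_loss(l_global, l_local, lam=0.05):
    """L_tot = L_g + lambda * L_l (Eq. 10); lambda = 0.05 is the
    best-performing COCO Captions setting reported in the paper."""
    return l_global + lam * l_local
```

With a peak learning rate of 8e-5 (one of the grid-searched options), the schedule reaches the peak at the end of the 10% warm-up window and sits at half the peak at the midpoint of the decay phase.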