Class Distribution-induced Attention Map for Open-vocabulary Semantic Segmentations

Authors: Dong Un Kang, Hayeon Kim, Se Young Chun

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our CDAM method on three widely used benchmark datasets that include a background class, separate from the foreground classes: PASCAL VOC (Everingham et al., 2010), PASCAL Context (Mottaghi et al., 2014), and COCO-Object (Lin et al., 2014), with 20, 59, and 80 foreground classes and validation sets of 1449, 5105, and 5000 images, respectively. We also use three additional benchmark datasets that do not include a background class: Cityscapes (Cordts et al., 2016), ADE20K (Zhou et al., 2017), and COCO-Stuff (Lin et al., 2014), with 19, 150, and 171 classes, respectively.
Researcher Affiliation | Academia | Dong Un Kang1, Hayeon Kim1, Se Young Chun1,2; 1Department of ECE, 2INMC & IPAI, Seoul National University
Pseudocode | No | The paper describes the methodology using textual explanations, mathematical formulations, and diagrams (e.g., Figure 1 for the overall pipeline). However, it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured, code-like steps.
Open Source Code | Yes | Code is available at https://janeyeon.github.io/cdamclip.
Open Datasets | Yes | We evaluate our CDAM method on three widely used benchmark datasets that include a background class: PASCAL VOC (Everingham et al., 2010), PASCAL Context (Mottaghi et al., 2014), and COCO-Object (Lin et al., 2014). We also use three additional benchmark datasets that do not include a background class: Cityscapes (Cordts et al., 2016), ADE20K (Zhou et al., 2017), and COCO-Stuff (Lin et al., 2014).
Dataset Splits | Yes | The validation sets contain 1449, 5105, and 5000 images, respectively. We follow the unified evaluation protocol of TCL (Cha et al., 2023) for open-vocabulary semantic segmentation, which ensures no access to target data before evaluation.
Hardware Specification | Yes | All measurements were performed on an NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions using the "CLIP ViT-B/16 model from OpenCLIP (Radford et al., 2021)" but does not specify software dependencies such as programming language versions (e.g., Python 3.x) or library versions (e.g., PyTorch 1.x).
Experiment Setup | Yes | The input image is resized to 224 x 224 pixels, and the patch size is set to 16 x 16 pixels. Following the experimental settings of GroupViT (Xu et al., 2022a), we resize input images to have the shorter side of 448 pixels and employ the mean Intersection-over-Union (mIoU) metric, which is generally used for evaluating semantic segmentation performance. The temperature τ and the modulation of entropy α are set to 0.1 and 2.5, respectively. The set of scaling factors M is {0.25, 0.37, 0.5, 0.63, 0.75, 0.87, 1.0}.
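The hyperparameters quoted in the setup row can be sanity-checked with a minimal sketch. This is not the authors' code: it only assumes the common pattern of a temperature-scaled softmax over patch-text similarities to form per-patch class distributions, with an entropy-based weighting controlled by α; the function names `class_distribution` and `entropy_weight` are illustrative.

```python
import numpy as np

# Hyperparameters quoted in the paper's setup.
TAU = 0.1        # softmax temperature
ALPHA = 2.5      # entropy-modulation exponent
SCALES = [0.25, 0.37, 0.5, 0.63, 0.75, 0.87, 1.0]  # multi-scale factors M
INPUT_SIZE = 224
PATCH = 16
NUM_PATCHES = (INPUT_SIZE // PATCH) ** 2  # 14 x 14 = 196 patch tokens

def class_distribution(similarity, tau=TAU):
    """Temperature-scaled softmax over class logits for each patch.

    similarity: (num_patches, num_classes) patch-text similarity scores.
    A low tau (0.1 here) sharpens each patch's class distribution.
    """
    z = similarity / tau
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_weight(dist, alpha=ALPHA):
    """Illustrative confidence weight per patch (an assumption, not the
    paper's exact formula): confident low-entropy patches get weights
    near 1; alpha controls how aggressively high entropy is penalized."""
    eps = 1e-12
    h = -(dist * np.log(dist + eps)).sum(axis=-1)
    h_norm = h / np.log(dist.shape[-1])  # normalize entropy to [0, 1]
    return (1.0 - h_norm) ** alpha

# Usage with random similarities for 21 classes (VOC's 20 + background).
rng = np.random.default_rng(0)
sim = rng.normal(size=(NUM_PATCHES, 21))
dist = class_distribution(sim)   # (196, 21), rows sum to 1
w = entropy_weight(dist)         # (196,), each weight in [0, 1]
```

At inference, such per-patch maps would typically be computed at each scale in `SCALES` and averaged, which is consistent with the multi-scale factor set the setup row reports.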