InvSeg: Test-Time Prompt Inversion for Semantic Segmentation

Authors: Jiayi Lin, Jiabo Huang, Jian Hu, Shaogang Gong

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that InvSeg achieves state-of-the-art performance on the PASCAL VOC, PASCAL Context and COCO Object datasets." ... "We benchmark InvSeg on three segmentation datasets, PASCAL VOC 2012 (Everingham et al. 2010), PASCAL Context (Mottaghi et al. 2014) and COCO Object (Lin et al. 2014), for the OVSS task."
Researcher Affiliation | Collaboration | 1 Queen Mary University of London, 2 Sony AI
Pseudocode | No | The paper describes the methodology in narrative text and figures, but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures.
Open Source Code | Yes | Project page: https://jylin8100.github.io/InvSegProject
Open Datasets | Yes | "We benchmark InvSeg on three segmentation datasets, PASCAL VOC 2012 (Everingham et al. 2010), PASCAL Context (Mottaghi et al. 2014) and COCO Object (Lin et al. 2014), containing 20, 59, and 80 foreground classes, respectively."
Dataset Splits | Yes | "The experiments are performed on the validation sets, including 1449, 5105, and 5000 images."
Hardware Specification | Yes | "We optimize each image for 15 steps on a single H100 GPU, with an inference time of around 7.9 seconds per image."
Software Dependencies | Yes | "We validate our approach using Stable Diffusion v2.1 (Rombach et al. 2022) with frozen pre-trained parameters." ... "To obtain category names, we follow DiffSegmenter (Wang et al. 2023), which uses BLIP (Li et al. 2022) and CLIP (Radford et al. 2021) to generate the category names out of all the candidate categories in the dataset."
Experiment Setup | Yes | "We employ the Adam optimizer with a learning rate of 0.01. The weight for L_etrp is α = 1. We optimize each image for 15 steps ... the time step for each iteration is sampled from a range [5, 300], where the model can learn a more robust prompt from different time steps. The time step for inference is 50, falling within this range used during adaptation."
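The quoted setup (Adam, learning rate 0.01, 15 optimization steps per image, timesteps sampled uniformly from [5, 300]) can be sketched as a generic test-time prompt-inversion loop. This is a minimal NumPy sketch, not the InvSeg implementation: a toy quadratic surrogate stands in for the diffusion denoising loss of Stable Diffusion v2.1, and all function and variable names (`surrogate_loss_grad`, `invert_prompt`, `emb`) are illustrative assumptions, not from the paper's codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

def surrogate_loss_grad(prompt_emb, t):
    # Toy stand-in: the real objective is the diffusion denoising error of
    # Stable Diffusion v2.1 conditioned on the learnable prompt embedding.
    # Here the optimum depends weakly on the sampled timestep t.
    target = np.full_like(prompt_emb, 0.1 * (t / 300.0))
    diff = prompt_emb - target
    return float(np.sum(diff ** 2)), 2.0 * diff

def invert_prompt(dim=8, steps=15, lr=0.01, t_range=(5, 300),
                  betas=(0.9, 0.999), eps=1e-8):
    """Test-time prompt inversion sketch: 15 Adam steps at lr 0.01,
    with the timestep drawn uniformly from [5, 300] each iteration,
    mirroring the hyperparameters quoted above."""
    emb = np.zeros(dim)          # learnable prompt embedding
    m = np.zeros(dim)            # Adam first moment
    v = np.zeros(dim)            # Adam second moment
    loss = np.inf
    for step in range(1, steps + 1):
        t = rng.integers(t_range[0], t_range[1] + 1)  # sample timestep
        loss, g = surrogate_loss_grad(emb, t)
        # Standard bias-corrected Adam update (the paper's optimizer choice).
        m = betas[0] * m + (1 - betas[0]) * g
        v = betas[1] * v + (1 - betas[1]) * g ** 2
        m_hat = m / (1 - betas[0] ** step)
        v_hat = v / (1 - betas[1] ** step)
        emb -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return emb, loss

emb, final_loss = invert_prompt()
```

In the actual method the optimized embedding would then condition a single denoising pass at inference timestep 50, which lies inside the [5, 300] adaptation range.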