InvSeg: Test-Time Prompt Inversion for Semantic Segmentation

Authors: Jiayi Lin, Jiabo Huang, Jian Hu, Shaogang Gong

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that InvSeg achieves state-of-the-art performance on the PASCAL VOC, PASCAL Context and COCO Object datasets." ... "We benchmark InvSeg on three segmentation datasets, PASCAL VOC 2012 (Everingham et al. 2010), PASCAL Context (Mottaghi et al. 2014) and COCO Object (Lin et al. 2014), for the OVSS task."
Researcher Affiliation | Collaboration | 1 Queen Mary University of London, 2 Sony AI
Pseudocode | No | The paper describes the methodology in narrative text and figures, but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures.
Open Source Code | Yes | Project page: https://jylin8100.github.io/InvSegProject
Open Datasets | Yes | "We benchmark InvSeg on three segmentation datasets, PASCAL VOC 2012 (Everingham et al. 2010), PASCAL Context (Mottaghi et al. 2014) and COCO Object (Lin et al. 2014), containing 20, 59, and 80 foreground classes, respectively."
Dataset Splits | Yes | "The experiments are performed on the validation sets, including 1449, 5105, and 5000 images."
Hardware Specification | Yes | "We optimize each image for 15 steps on a single H100 GPU, with an inference time of around 7.9 seconds per image."
Software Dependencies | Yes | "We validate our approach using Stable Diffusion v2.1 (Rombach et al. 2022) with frozen pre-trained parameters." ... "To obtain category names, we follow DiffSegmenter (Wang et al. 2023), which uses BLIP (Li et al. 2022) and CLIP (Radford et al. 2021) to generate the category names out of all the candidate categories in the dataset."
Experiment Setup | Yes | "We employ the Adam optimizer with a learning rate of 0.01. The weight for L_etrp is α = 1. We optimize each image for 15 steps ... the time step for each iteration is sampled from a range [5, 300], where the model can learn a more robust prompt from different time steps. The time step for inference is 50, falling within this range used during adaptation."
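The quoted setup (Adam, learning rate 0.01, 15 optimization steps per image, timesteps sampled uniformly from [5, 300]) can be sketched as a generic test-time prompt-inversion loop. This is a minimal NumPy sketch, not the InvSeg implementation: a toy quadratic surrogate stands in for the diffusion denoising loss of Stable Diffusion v2.1, and all function and variable names (`surrogate_loss_grad`, `invert_prompt`, `emb`) are illustrative assumptions, not from the paper's codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

def surrogate_loss_grad(prompt_emb, t):
    # Toy stand-in: the real objective is the diffusion denoising error of
    # Stable Diffusion v2.1 conditioned on the learnable prompt embedding.
    # Here the optimum depends weakly on the sampled timestep t.
    target = np.full_like(prompt_emb, 0.1 * (t / 300.0))
    diff = prompt_emb - target
    return float(np.sum(diff ** 2)), 2.0 * diff

def invert_prompt(dim=8, steps=15, lr=0.01, t_range=(5, 300),
                  betas=(0.9, 0.999), eps=1e-8):
    """Test-time prompt inversion sketch: 15 Adam steps at lr 0.01,
    with the timestep drawn uniformly from [5, 300] each iteration,
    mirroring the hyperparameters quoted above."""
    emb = np.zeros(dim)          # learnable prompt embedding
    m = np.zeros(dim)            # Adam first moment
    v = np.zeros(dim)            # Adam second moment
    loss = np.inf
    for step in range(1, steps + 1):
        t = rng.integers(t_range[0], t_range[1] + 1)  # sample timestep
        loss, g = surrogate_loss_grad(emb, t)
        # Standard bias-corrected Adam update (the paper's optimizer choice).
        m = betas[0] * m + (1 - betas[0]) * g
        v = betas[1] * v + (1 - betas[1]) * g ** 2
        m_hat = m / (1 - betas[0] ** step)
        v_hat = v / (1 - betas[1] ** step)
        emb -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return emb, loss

emb, final_loss = invert_prompt()
```

In the actual method the optimized embedding would then condition a single denoising pass at inference timestep 50, which lies inside the [5, 300] adaptation range.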