Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation
Authors: Zhihua Liu, Amrutha Saseendran, Lei Tong, Xilin He, Fariba Yousefi, Nikolay Burlutskiy, Dino Oglic, Tom Diethe, Philip Alexander Teare, Huiyu Zhou, Chen Jin
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed approach is effective, generalizes across different open-set segmentation tasks, and achieves state-of-the-art results of 52.5 (+6.8 relative) mIoU on Pascal Context 59, 67.73 (+25.73 relative) cIoU on gRefCOCO, and 67.4 (+1.1 relative to fine-tuned methods) mIoU on GranDf, which is the most complex open-set grounded segmentation task in the field. |
| Researcher Affiliation | Collaboration | 1School of Computing and Mathematical Sciences, University of Leicester, UK 2Centre for AI, Data Science & Artificial Intelligence, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK 3Shenzhen University. Correspondence to: Huiyu Zhou <EMAIL>, Chen Jin <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Segment Anyword (pseudo code) |
| Open Source Code | Yes | Project page, code, and data are available at https://zhihualiued.github.io/segment_anyword |
| Open Datasets | Yes | We perform extensive experiments on six multi-modal image segmentation datasets, including the open-set language-grounded segmentation dataset GranDf (Rasheed et al., 2024), the multi-object referring image segmentation dataset gRefCOCO (Liu et al., 2023), the single-object referring image segmentation datasets RefCOCO, RefCOCO+ and RefCOCOg (Kazemzadeh et al., 2014), and open-vocabulary semantic segmentation on Pascal Context (Mottaghi et al., 2014). |
| Dataset Splits | Yes | GranDf ... comprises 214K image-grounded text pairs, along with 2.5K validation samples and 5K test samples... RefCOCO ... divided into 120,624 training, 10,834 validation, 5,657 test A, and 5,095 test B samples. RefCOCO+ ... with 120,624 training, 10,758 validation, 5,726 test A, and 4,889 test B samples. RefCOCOg ... comprises 104,560 referring expressions for 54,822 objects across 26,711 images... gRefCOCO ... The validation set contains 1,485 images with 5,324 sentences, while test A includes 750 images with 8,825 sentences, and test B consists of 749 images with 5,744 sentences. PASCAL Context ... with 5,100 images in the validation set. |
| Hardware Specification | Yes | Our experiments were executed on a single 40G A100 GPU with a batch size of 8. ... All experiments were conducted on a single NVIDIA A100 40GB GPU. |
| Software Dependencies | Yes | We choose the fine-tuned version of Vicuna-7B-v1.5 (Zheng et al., 2023) as our large language model (LLM) to parse the text prompt and generate the noun phrases... For the post-processing module, we utilize a frozen SAM with ViT-H as the promptable mask generator. |
| Experiment Setup | Yes | The base learning rate for textual embedding was set to 0.005. The hyper-parameters of textual embedding updating remain the same in LDM and MCPL, with the temperature and scaling term (τ, γ) of (0.3, 0.00075). We use BERT (Devlin, 2018) to generate token embeddings. For words included in BERT's pre-trained vocabulary, we directly use their pre-trained embeddings. ... With the LoRA fine-tuned BERT text encoder, Segment Anyword-f achieves fast inference-time text-domain adaptation, decreasing textual embedding update steps from 1100 to 50... |
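The experiment-setup values quoted above can be collected into a single configuration sketch. This is only an illustrative summary of the reported hyperparameters, not the authors' released configuration; all key names here are hypothetical.

```python
# Hyperparameters quoted from the paper's experiment setup.
# Key names are illustrative, not from the released code.
config = {
    "base_lr_textual_embedding": 0.005,  # base learning rate for textual embedding
    "temperature_tau": 0.3,              # τ in the embedding-update objective
    "scaling_gamma": 0.00075,            # γ scaling term
    "embedding_update_steps": 1100,      # default textual embedding update steps
    "embedding_update_steps_lora": 50,   # with the LoRA fine-tuned BERT encoder
    "batch_size": 8,
    "text_encoder": "BERT",
    "llm_parser": "Vicuna-7B-v1.5",
    "mask_generator": "frozen SAM, ViT-H",
}

# The reported step reduction (1100 -> 50) is a 22x cut in update steps.
speedup = config["embedding_update_steps"] / config["embedding_update_steps_lora"]
print(f"LoRA adaptation reduces update steps by {speedup:.0f}x")
```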