INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation

Authors: Jian Hu, Zixu Cheng, Shaogang Gong

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The effectiveness of our INT is validated on six datasets, including camouflaged objects and medical images, showing its robustness and applicability." "To assess performance in the first three tasks, we employ the following metrics: Mean Absolute Error (M), adaptive F-measure (Fβ), mean E-measure (Eϕ), and structure measure (Sα)."
Researcher Affiliation | Academia | Jian Hu, Zixu Cheng, Shaogang Gong; Queen Mary University of London.
Pseudocode | No | The paper describes the methodology using textual descriptions and mathematical equations, but it does not include any clearly labeled pseudocode blocks or algorithm sections.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | "We tested INT on three benchmark datasets: CHAMELEON [Skurowski et al., 2018], CAMO [Le et al., 2019], and COD10K [Fan et al., 2021a]. We utilized datasets such as CVC-ColonDB [Tajbakhsh et al., 2015] and Kvasir [Jha et al., 2020] for polyp image segmentation, and ISIC [Codella et al., 2019] for skin lesion segmentation."
Dataset Splits | Yes | "The CAMO dataset contains 1,250 images, divided into 1,000 training images and 250 testing images. The COD10K dataset comprises 3,040 training samples and 2,026 testing samples in total."
Hardware Specification | Yes | "Our experiments are conducted on a single NVIDIA A100 GPU, with further details provided in the appendix."
Software Dependencies | Yes | "For the VLM models, we employ LLaVA-1.5-13B for evaluation. For image processing, we use the CS-ViT-B/16 model pre-trained with CLIP, and for the image inpainting module, we deploy stable-diffusion-2-inpainting."
Experiment Setup | Yes | "All tasks undergo training-free test-time adaptation, iterating for four epochs, except for the polyp image segmentation task, which extends to six epochs. We utilize the ViT-H/16 model for promptable segmentation methods. The task-generic prompt for the COD task is specified as 'camouflaged animal'. The MIS task includes two sub-tasks: polyp image segmentation and skin lesion segmentation, prompted by 'polyp' and 'skin lesion' respectively." The hyperparameter w is set to 0.3.
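
The experiment-setup row above can be condensed into a single configuration sketch. This is purely illustrative: the key names are hypothetical and not taken from any released INT code; only the values (epoch counts, backbone, prompts, w) come from the paper's quoted text.

```python
# Hypothetical configuration mirroring the reported INT experiment setup.
# Key names are our own; values are taken from the quoted setup description.
INT_CONFIG = {
    "adaptation": {
        "training_free": True,          # test-time adaptation, no training
        "epochs_default": 4,            # all tasks iterate for four epochs...
        "epochs_polyp": 6,              # ...except polyp segmentation (six)
    },
    "segmenter_backbone": "ViT-H/16",   # promptable segmentation model
    "task_prompts": {
        "COD": "camouflaged animal",    # camouflaged object detection
        "polyp": "polyp",               # MIS sub-task 1
        "skin_lesion": "skin lesion",   # MIS sub-task 2
    },
    "w": 0.3,                           # hyperparameter from the paper
}
```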
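
The metrics named in the Research Type row (M, Fβ, Eϕ, Sα) are standard in camouflaged-object and medical-image segmentation benchmarks. As an illustration only, here is a minimal NumPy sketch of M and an adaptive Fβ; the function names, the 2×-mean adaptive threshold, and β² = 0.3 are common conventions in this literature, not details confirmed by the paper (Eϕ and Sα have more involved definitions and are omitted).

```python
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Absolute Error (M): mean pixel-wise |prediction - ground truth|."""
    return float(np.mean(np.abs(pred - gt)))

def adaptive_f_measure(pred: np.ndarray, gt: np.ndarray, beta2: float = 0.3) -> float:
    """Adaptive F-measure (Fβ): binarize the prediction at twice its mean
    value (capped at 1.0), then compute F_beta with beta^2 = 0.3."""
    thr = min(2.0 * float(pred.mean()), 1.0)
    binary = pred >= thr
    mask = gt > 0.5
    tp = np.logical_and(binary, mask).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / (mask.sum() + 1e-8)
    return float((1 + beta2) * precision * recall
                 / (beta2 * precision + recall + 1e-8))
```

A perfect binary prediction yields M = 0 and Fβ ≈ 1, which is a quick sanity check when reimplementing these metrics.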