Explore In-Context Segmentation via Latent Diffusion Models
Authors: Chaoyang Wang, Xiangtai Li, Henghui Ding, Lu Qi, Jiangning Zhang, Yunhai Tong, Chen Change Loy, Shuicheng Yan
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments validate the effectiveness of our approach, demonstrating comparable or even stronger results than previous specialist or visual foundation models. We conduct extensive ablation studies and compare our method with previous works to demonstrate its effectiveness. |
| Researcher Affiliation | Collaboration | Chaoyang Wang1, Xiangtai Li2,3*, Henghui Ding4, Lu Qi5, Jiangning Zhang6, Yunhai Tong1, Chen Change Loy3, Shuicheng Yan2,3. 1 School of Intelligence Science and Technology, Peking University, China; 2 Skywork AI, Singapore; 3 Nanyang Technological University, Singapore; 4 Institute of Big Data, Fudan University, China; 5 Wuhan University, China; 6 Zhejiang University, China |
| Pseudocode | No | The paper describes methods with equations and text, but no structured pseudocode or algorithm blocks are provided. |
| Open Source Code | No | A project page (https://wang-chaoyang.github.io/project/refldmseg) is provided, but it is a general project page: it neither links to a code repository nor states unambiguously that the source code for the described methodology is released. |
| Open Datasets | Yes | We adopt several popular datasets as part of our benchmark (Tab. 2), including PASCAL (Everingham et al. 2010), COCO (Lin et al. 2014), DAVIS-16 (Perazzi et al. 2016) and VSPW (Miao et al. 2021). |
| Dataset Splits | Yes | Table 2: Details of the combined datasets. We choose two image semantic segmentation (ISS) datasets, one video object segmentation (VOS) dataset, and one video semantic segmentation (VSS) dataset. (N) means the equivalent number of images. The table provides specific numbers for 'Train' and 'Val' splits for each dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions the 'SD 1.5 model as the initialization' and that 'Alpha-CLIP ViT-L is adopted as the prompt encoder', which are specific models. However, it does not provide ancillary software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) needed to replicate the experiment. |
| Experiment Setup | Yes | Our model is jointly trained on the combined dataset for 160K iterations with a batch size of 64. We employ an AdamW optimizer... We set the CFG coefficient for query and instructions as 1.5 and 7, respectively. |
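The reported setup uses two classifier-free guidance (CFG) coefficients, one for the query (1.5) and one for the instructions (7). The paper does not give the combination formula; a common way to apply two guidance scales is the InstructPix2Pix-style decomposition over three denoiser outputs. The sketch below is a hypothetical illustration of that pattern, not the paper's confirmed implementation; the function name `dual_cfg` and the decomposition order are assumptions.

```python
import numpy as np

def dual_cfg(eps_uncond, eps_query, eps_full, s_query=1.5, s_instr=7.0):
    """Combine three denoiser predictions with two guidance scales.

    eps_uncond : prediction with no conditioning
    eps_query  : prediction conditioned on the query image only
    eps_full   : prediction conditioned on query + instructions

    Assumed InstructPix2Pix-style decomposition; the paper's exact
    formulation may differ.
    """
    return (eps_uncond
            + s_query * (eps_query - eps_uncond)
            + s_instr * (eps_full - eps_query))

# Toy scalar "predictions" to show the arithmetic:
# 0.0 + 1.5*(0.2-0.0) + 7.0*(0.25-0.2) = 0.65
guided = dual_cfg(np.array(0.0), np.array(0.2), np.array(0.25))
```

With the paper's coefficients, the instruction term (scale 7) dominates the query term (scale 1.5), which matches the intuition that the in-context instructions carry most of the segmentation signal.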