High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity
Authors: Qian Yu, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Bo Li, Lihe Zhang, Huchuan Lu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process. The source code will be publicly available at DiffDIS. In this section, we analyze the effects of each component and evaluate the impact of various pretrained parameters and denoising paradigms on experimental accuracy. All results are tested on the DIS-VD dataset. Table 2: Quantitative comparison of DIS5K with 11 representative methods. Table 3: Ablation experiments of components. Table 4: Ablation experiments of the pre-trained parameters and denoising steps. Figure 6: Visual comparison of different DIS methods. |
| Researcher Affiliation | Collaboration | Qian Yu¹,², Peng-Tao Jiang², Hao Zhang², Jinwei Chen², Bo Li², Lihe Zhang¹, Huchuan Lu¹. ¹Dalian University of Technology; ²vivo Mobile Communication Co., Ltd. {ms.yuqian}@mail.dlut.edu.cn, EMAIL, EMAIL |
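| Pseudocode | Yes | Algorithm 1: Training Process. Input: cond: conditional image latent, m: mask latent, e: edge latent, d_lab: discriminative labels. while not converged do: $t = T$; $\epsilon \sim \mathcal{N}(0, I)$; $d_{emb} = \mathrm{BDE}(d_{lab})$; $m_t = \sqrt{\alpha_t}\, m + \sqrt{1 - \alpha_t}\, \epsilon$; $e_t = \sqrt{\alpha_t}\, e + \sqrt{1 - \alpha_t}\, \epsilon$; $\epsilon_{pred_m}, \epsilon_{pred_e} = \epsilon_\theta(m_t, e_t, cond, t, d_{emb})$; $l_{pred_m} = (m_t - \sqrt{1 - \alpha_t}\, \epsilon_{pred_m}) / \sqrt{\alpha_t}$; $l_{pred_e} = (e_t - \sqrt{1 - \alpha_t}\, \epsilon_{pred_e}) / \sqrt{\alpha_t}$; perform gradient descent steps on $\nabla_\theta L_{total}(\theta)$; end while. return $\theta$ |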
| Open Source Code | No | The source code will be publicly available at DiffDIS. |
| Open Datasets | Yes | Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS... Similar to previous works (Yu et al., 2024; Kim et al., 2022), we conducted training on the DIS5K training dataset, which consists of 3,000 images spanning 225 categories. Validation and testing were performed on the DIS5K validation and test datasets, referred to as DIS-VD and DIS-TE, respectively. ... Xuebin Qin, Hang Dai, Xiaobin Hu, Deng-Ping Fan, Ling Shao, and Luc Van Gool. Highly accurate dichotomous image segmentation. In European Conference on Computer Vision, pp. 38–56. Springer, 2022. |
| Dataset Splits | Yes | Validation and testing were performed on the DIS5K validation and test datasets, referred to as DIS-VD and DIS-TE, respectively. The DIS-TE dataset is further divided into four subsets (DIS-TE1, DIS-TE2, DIS-TE3, DIS-TE4), each containing 500 images with progressively more complex morphological structures. |
| Hardware Specification | Yes | Experiments were implemented in PyTorch and conducted on a single NVIDIA H800 GPU. |
| Software Dependencies | No | Experiments were implemented in PyTorch and conducted on a single NVIDIA H800 GPU. |
| Experiment Setup | Yes | During training, the original images were resized to 1024 × 1024. We use SD V2.1 (Rombach et al., 2022) as our backbone, and initialize the model with the parameters from SD-Turbo (Sauer et al., 2023). For optimization, we use the Adam optimizer, setting the initial learning rate to 3 × 10^-5. The batch size is configured as 4. The maximum number of training epochs was set to 90. |
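
The noising and latent-recovery steps listed in the Pseudocode row can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the helper names `add_noise` and `recover_latent`, the toy 8-element latents, and the value of `alpha_T` are all assumptions made for illustration, and the denoiser ϵθ is replaced by a stand-in that returns the true noise. Under those assumptions, it shows that the recovery formula inverts the forward noising step when the noise prediction is exact.

```python
import math
import random


def add_noise(latent, eps, alpha_t):
    """Forward step: x_t = sqrt(alpha_t) * x + sqrt(1 - alpha_t) * eps."""
    a, b = math.sqrt(alpha_t), math.sqrt(1.0 - alpha_t)
    return [a * x + b * n for x, n in zip(latent, eps)]


def recover_latent(noisy, eps_pred, alpha_t):
    """Inverse step: x = (x_t - sqrt(1 - alpha_t) * eps_pred) / sqrt(alpha_t)."""
    a, b = math.sqrt(alpha_t), math.sqrt(1.0 - alpha_t)
    return [(x_t - b * n) / a for x_t, n in zip(noisy, eps_pred)]


rng = random.Random(0)
alpha_T = 0.05  # hypothetical noise-schedule value at the fixed timestep t = T

# Toy stand-ins for the mask latent m and the edge latent e.
mask_latent = [rng.uniform(-1.0, 1.0) for _ in range(8)]
edge_latent = [rng.uniform(-1.0, 1.0) for _ in range(8)]

# One noise sample, used for both latents as the pseudocode is written.
eps = [rng.gauss(0.0, 1.0) for _ in range(8)]

m_t = add_noise(mask_latent, eps, alpha_T)
e_t = add_noise(edge_latent, eps, alpha_T)

# Stand-in for the denoiser: a perfect prediction equals the true noise.
m_rec = recover_latent(m_t, eps, alpha_T)
e_rec = recover_latent(e_t, eps, alpha_T)

max_err_m = max(abs(r - x) for r, x in zip(m_rec, mask_latent))
max_err_e = max(abs(r - x) for r, x in zip(e_rec, edge_latent))
print(max_err_m < 1e-9, max_err_e < 1e-9)
```

In actual training the noise prediction is imperfect, so the recovered latents differ from `m` and `e`; that gap is what a loss such as the table's L_total would penalize.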