Multi-Task Dense Predictions via Unleashing the Power of Diffusion

Authors: Yuqi Yang, Peng-Tao Jiang, Qibin Hou, Hao Zhang, Jinwei Chen, Bo Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type Experimental We conduct comprehensive experiments on PASCAL-Context and NYUD-v2. Experiments show that our method outperforms the previous state-of-the-art methods on all the tasks. ... We present the quantitative comparisons between the proposed method and previous state-of-the-art methods. Our method performs clearly better than most of the previous methods for all tasks on both PASCAL-Context and NYUDv2. ... We ablate on the effectiveness of different components of our joint denoising diffusion process. The results are shown in Tab. 3.
Researcher Affiliation Collaboration Yuqi Yang1,2, Peng-Tao Jiang2, Qibin Hou1, Hao Zhang2, Jinwei Chen2, Bo Li2 1VCIP, School of Computer Science, Nankai University 2vivo Mobile Communication Co., Ltd
Pseudocode Yes Algorithm 1 (TaskDiffusion training): def train(images, masks_gts): ... Algorithm 2 (TaskDiffusion inference): def infer(images, steps): ...
Open Source Code Yes Our code is available at https://github.com/YuqiYang213/TaskDiffusion.
Open Datasets Yes We conduct experiments on two public multi-task datasets, including PASCAL-Context (Chen et al., 2014) and NYUD-v2 (Silberman et al., 2012).
Dataset Splits Yes The PASCAL-Context dataset contains 4,998 training images, 5,105 test images and annotations of five dense prediction tasks... The NYUD-v2 dataset contains 795 training images, 654 test images, and annotations of four dense prediction tasks...
Hardware Specification Yes All the experiments are trained with 2 NVIDIA V100 GPUs for 40000 iterations.
Software Dependencies No The paper mentions using ViT-large and ViT-base as backbones and various loss functions (l1 loss, cross-entropy loss), but it does not specify version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages used for implementation.
Experiment Setup Yes The batch size is set to 4 and all experiments are trained for 40,000 iterations. The initial learning rate is set to 2e-5 for PASCAL-Context and 1e-5 for NYUD-v2. The weight decay is set to 1e-6 for both datasets. We use a polynomial learning rate scheduler following the previous method (Ye & Xu, 2022b). For the tasks that have continuous labels (e.g., depth estimation and surface normal prediction), we use the l1 loss. We use the cross-entropy loss for the other tasks with discrete labels (e.g., semantic segmentation, human parsing, salient object detection, and boundary detection). To balance the training losses across tasks, we follow the previous work (Ye & Xu, 2022b) to set the loss weights.
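The pseudocode row above only names the two routines (def train(images, masks_gts) and def infer(images, steps)) and elides their bodies. As a hedged illustration of what a joint task-denoising loop of this general kind might look like, here is a minimal PyTorch sketch; the model interface, noise schedule, channel-concatenation of task maps, and the simplified x0-prediction update are all assumptions for illustration, not the authors' released code:

```python
import torch

def q_sample(x0, t, noise, alphas_cumprod):
    """Standard DDPM forward process: noise the clean task maps at step t."""
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1 - a).sqrt() * noise

def train_step(model, images, task_gts, alphas_cumprod, T=1000):
    """One training step: denoise all task label maps jointly."""
    x0 = torch.cat(task_gts, dim=1)          # concat per-task maps on channels
    t = torch.randint(0, T, (x0.size(0),))
    noise = torch.randn_like(x0)
    xt = q_sample(x0, t, noise, alphas_cumprod)
    pred = model(images, xt, t)              # denoiser conditioned on the image
    return torch.nn.functional.mse_loss(pred, noise)

@torch.no_grad()
def infer(model, images, task_channels, steps, alphas_cumprod):
    """Iteratively denoise from pure noise to the joint task maps."""
    b, _, h, w = images.shape
    x = torch.randn(b, sum(task_channels), h, w)
    for t in reversed(range(steps)):
        tt = torch.full((b,), t, dtype=torch.long)
        eps = model(images, x, tt)
        a = alphas_cumprod[t]
        x = (x - (1 - a).sqrt() * eps) / a.sqrt()  # predict x0 (simplified)
        if t > 0:
            noise = torch.randn_like(x)
            x = q_sample(x, tt - 1, noise, alphas_cumprod)  # re-noise to t-1
    return torch.split(x, task_channels, dim=1)    # one map per task
```

The key property this sketch shares with the paper's setting is that all task maps are denoised in one joint process, so the model can exploit cross-task correlations at every step.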
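The experiment-setup row quotes a polynomial learning-rate schedule, l1 loss for continuous tasks, and cross-entropy for discrete tasks. A minimal PyTorch sketch of that optimization setup follows; the optimizer choice (AdamW), the decay power, and the per-task loss weights are placeholders, since the report does not quote the paper's exact values (the weights follow Ye & Xu, 2022b):

```python
import torch

def build_optimizer_and_scheduler(params, base_lr=2e-5, weight_decay=1e-6,
                                  max_iters=40000, power=0.9):
    """Optimizer plus polynomial decay: lr(i) = base_lr * (1 - i/max_iters)**power."""
    opt = torch.optim.AdamW(params, lr=base_lr, weight_decay=weight_decay)
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda i: (1 - i / max_iters) ** power)
    return opt, sched

def multitask_loss(preds, gts, weights):
    """l1 for continuous targets (float maps), cross-entropy for discrete (long maps)."""
    total = 0.0
    for name, pred in preds.items():
        gt = gts[name]
        if gt.dtype == torch.long:                   # discrete labels (e.g. segmentation)
            loss = torch.nn.functional.cross_entropy(pred, gt)
        else:                                        # continuous labels (e.g. depth)
            loss = torch.nn.functional.l1_loss(pred, gt)
        total = total + weights[name] * loss
    return total
```

With base_lr=2e-5 and max_iters=40000 this matches the quoted PASCAL-Context configuration; swapping base_lr to 1e-5 gives the NYUD-v2 one.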