What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?

Authors: Guangkai Xu, Yongtao Ge, Mingyu Liu, Chengxiang Fan, Kangyang Xie, Zhiyue Zhao, Hao Chen, Chunhua Shen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on a diverse set of dense visual perceptual tasks, including monocular depth estimation, surface normal estimation, image segmentation, and matting, are performed to demonstrate the remarkable adaptability and effectiveness of our proposed method.
Researcher Affiliation | Collaboration | 1. Zhejiang University, China; 2. Ant Group
Pseudocode | No | The paper describes its methods and processes in paragraph text and figures (Figures 1 and 2), but does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks, nor structured code-like procedures.
Open Source Code | Yes | Code: https://github.com/aim-uofa/GenPercept
Open Datasets | Yes | The evaluation is performed on five zero-shot datasets including KITTI (Geiger et al., 2013), NYU (Silberman et al., 2012), ScanNet (Dai et al., 2017), DIODE (Vasiljevic et al., 2019), and ETH3D (Schops et al., 2017). We choose DIS5K (Qin et al., 2022) as the training and testing dataset. For training, we utilized the indoor synthetic dataset Hypersim (Roberts et al., 2021), which comprises 40 semantic segmentation class labels. We test the model's performance on Hypersim (Roberts et al., 2021) and zero-shot ability on a subset of the ADE20K (Zhou et al., 2017) validation set, which contains overlapping classes.
Dataset Splits | Yes | We utilize DIS-TR for training and evaluate our model on DIS-VD and DIS-TE subsets. ... We test the model's performance on Hypersim (Roberts et al., 2021) and zero-shot ability on a subset of the ADE20K (Zhou et al., 2017) validation set, which contains overlapping classes.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU models, CPU models, or memory specifications, used for the experiments.
Software Dependencies | No | The paper mentions fine-tuning Stable Diffusion v2.1 and using a U-Net, but it does not specify programming languages, libraries, or other software components with their version numbers.
Experiment Setup | Yes | Unless specified otherwise, we freeze the VAE autoencoder and fine-tune the U-Net of Stable Diffusion v2.1 to estimate the ground-truth label latent for 30,000 iterations, with a resolution of (768, 768), a batch size of 32, and a learning rate of 3e-5.