Human Body Restoration with One-Step Diffusion Model and A New Benchmark

Authors: Jue Gong, Jingkai Wang, Zheng Chen, Xing Liu, Hong Gu, Yulun Zhang, Xiaokang Yang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that OSDHuman outperforms existing methods in both visual quality and quantitative metrics. The dataset and code are available at: https://github.com/gobunu/OSDHuman.
Researcher Affiliation | Collaboration | (1) Shanghai Jiao Tong University, China; (2) vivo Mobile Communication Co., Ltd, China. Correspondence to: Yulun Zhang <EMAIL>.
Pseudocode | No | The paper describes the methodology and architecture of OSDHuman, the HQ-ACF pipeline, HFIE, and the VSD regularizer in detail using explanatory text and diagrams (Figures 3, 4, and 5). However, it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The dataset and code are available at: https://github.com/gobunu/OSDHuman.
Open Datasets | Yes | Using this pipeline, we constructed a person-based restoration with sophisticated objects and natural activities (PERSONA) dataset, which includes training, validation, and test sets. The dataset and code are available at: https://github.com/gobunu/OSDHuman. Automated Cropping and Filtering Pipeline: As illustrated in Fig. 3, we first collect a series of commonly used and publicly available large-scale object detection datasets, including COCO (Lin et al., 2014), OID (Kuznetsova et al., 2020; Krasin et al., 2017), Objects365 (Shao et al., 2019), and CrowdHuman (Shao et al., 2018), comprising approximately 4 million images.
Dataset Splits | Yes | Using this pipeline, we constructed a person-based restoration with sophisticated objects and natural activities (PERSONA) dataset, which includes training, validation, and test sets. ... which comprises 109,053 HQ 512×512 human images for training. This pipeline also provides images for validation and testing. ... The test data includes PERSONA-Val and PERSONA-Test, both generated by our HQ-ACF pipeline. The HQ images in the validation set are specially selected from those that comply with the pipeline, ensuring that no images in the validation set share sources with the training set. A total of 4,216 images are used, and the degraded LQ images are generated using the same degradation pipeline as during training. The test set is derived from the VOC dataset (Everingham et al., 2010) by performing a partial crop using the HQ-ACF pipeline, followed by sampling under predefined IQA thresholds, yielding 3,000 images with real-world LQ.
Hardware Specification | Yes | Training is conducted for 35K iterations on 4 NVIDIA A800 GPUs.
Software Dependencies | No | The paper mentions using the YOLO11 model (Jocher & Qiu, 2024) and the Stable Diffusion v2-1 model (Stability AI, 2022) as components, but it does not specify version numbers for general programming languages or libraries such as Python, PyTorch, or CUDA, which are essential for full reproducibility.
Experiment Setup | Yes | The OSDHuman model is trained with the AdamW optimizer (Loshchilov & Hutter, 2019), a batch size of 16, and a learning rate of 5e-5. The Stable Diffusion v2-1 model (Stability AI, 2022) serves as the pretrained OSD model with the timestep frozen to 999, and the prompt embedding is provided by HFIE. The LoRA (Hu et al., 2022) rank for the VAE encoder, the U-Net of the generator, and the regularizer is set to 4 in all cases. The weighting scalars λ1 and λ2 in Eq. 4 are set to 2 and 1, respectively. Training is conducted for 35K iterations on 4 NVIDIA A800 GPUs.
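The hyperparameters reported in the Experiment Setup row can be gathered into a minimal configuration sketch. This is illustrative only: `TrainConfig` and `weighted_loss` are hypothetical names (not from the released code), and the mapping of λ1/λ2 onto concrete loss terms is an assumption, since Eq. 4 itself is not quoted in this report.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hyperparameters as reported in the Experiment Setup row.

    Class and field names are illustrative, not taken from the
    OSDHuman repository.
    """
    optimizer: str = "AdamW"      # Loshchilov & Hutter, 2019
    batch_size: int = 16
    learning_rate: float = 5e-5
    iterations: int = 35_000      # 35K iterations on 4x NVIDIA A800 GPUs
    frozen_timestep: int = 999    # SD v2-1 timestep frozen for one-step diffusion
    lora_rank: int = 4            # VAE encoder, generator U-Net, and regularizer
    lambda1: float = 2.0          # weighting scalar lambda_1 in Eq. 4
    lambda2: float = 1.0          # weighting scalar lambda_2 in Eq. 4


def weighted_loss(cfg: TrainConfig, term1: float, term2: float) -> float:
    """Combine two loss terms with the reported Eq. 4 weights.

    Which concrete losses term1 and term2 denote is not specified
    in the excerpt above; consult the paper's Eq. 4.
    """
    return cfg.lambda1 * term1 + cfg.lambda2 * term2
```

For the authoritative training code and the exact loss definition, see the released repository at https://github.com/gobunu/OSDHuman.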