Latent Diffusion-Enhanced Virtual Try-On via Optimized Pseudo-Label Generation

Authors: Chenghu Du, Junyin Wang, Feng Yu, Shengwu Xiong

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extended experiments demonstrate that our proposed method is superior to state-of-the-art methods. Our experiments use VITON-HD (Choi et al. 2021) and VITON (Han et al. 2018), which are two challenging datasets in virtual try-on.
Researcher Affiliation | Academia | 1 School of Computer Science and Artificial Intelligence, Wuhan University of Technology; 2 Shanghai Artificial Intelligence Laboratory; 3 Interdisciplinary Artificial Intelligence Research Institute, Wuhan College; 4 School of Computer Science and Artificial Intelligence, Wuhan Textile University
Pseudocode | No | The paper describes the proposed method in detail using mathematical formulations and textual descriptions, but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | Our experiments use VITON-HD (Choi et al. 2021) and VITON (Han et al. 2018), which are two challenging datasets in virtual try-on.
Dataset Splits | Yes | The dataset is divided into a training set with 14,221 groups and a testing set with 2,032 groups. ... It includes 13,679 image groups and is split into a training set with 11,647 groups and a testing set with 2,032 groups.
Hardware Specification | Yes | During the training process, we utilize two NVIDIA RTX 4090 GPUs for a duration of 2 days.
Software Dependencies | Yes | We employ Stable Diffusion v1.4 (Rombach et al. 2022) as the backbone for our architecture and initialize its denoising U-Net with the weights from the U-Net in PbE (Yang et al. 2023).
Experiment Setup | Yes | The AdamW optimizer (Loshchilov and Hutter 2017) is employed with a learning rate of 1e-4, and the batch size is set to 2 for training over 40 epochs. For inference, we adopt the pseudo linear multi-step (PLMS) sampling method (Liu et al. 2021b), setting the number of sampling steps to 50. The hyperparameter in the loss function is set as follows: λ = 1e-4.
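The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is a minimal, hedged illustration of the reported settings only; the function names (`make_train_config`, `make_sampler_config`) are illustrative and do not appear in the paper, and no actual model or dataset code is implied.

```python
def make_train_config():
    """Training hyperparameters as reported in the paper."""
    return {
        "optimizer": "AdamW",     # Loshchilov and Hutter 2017
        "learning_rate": 1e-4,
        "batch_size": 2,
        "epochs": 40,
        "loss_lambda": 1e-4,      # λ weighting term in the loss function
        "backbone": "Stable Diffusion v1.4",  # Rombach et al. 2022
        "unet_init": "PbE U-Net weights",     # Yang et al. 2023
    }


def make_sampler_config():
    """Inference-time sampling settings as reported in the paper."""
    return {
        "sampler": "PLMS",  # pseudo linear multi-step, Liu et al. 2021b
        "num_steps": 50,
    }


if __name__ == "__main__":
    print(make_train_config())
    print(make_sampler_config())
```

Keeping the training and sampling settings in separate dictionaries mirrors the paper's split between the training procedure and the inference-time sampler choice.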