MV-VTON: Multi-View Virtual Try-On with Diffusion Models

Authors: Haoyu Wang, Zhilu Zhang, Donglin Di, Shiliang Zhang, Wangmeng Zuo

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that the proposed method not only achieves state-of-the-art results on the MV-VTON task using our MVG dataset, but also has superiority on the frontal-view virtual try-on task using the VITON-HD and Dress Code datasets." Quantitative evaluation: Table 1 reports the quantitative results in the paired setting, and Table 2 shows the unpaired setting's results.
Researcher Affiliation | Collaboration | Haoyu Wang (1,2,*), Zhilu Zhang (2), Donglin Di (3), Shiliang Zhang (1), Wangmeng Zuo (2). (1) State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; (2) Harbin Institute of Technology; (3) Space AI, Li Auto. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the proposed method with descriptive text and diagrams (e.g., Figure 3, "Overview of MV-VTON"), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/hywang2002/MV-VTON
Open Datasets | Yes | "We conduct extensive experiments not only on MV-VTON task using the MVG dataset, but also on the frontal-view VTON task using VITON-HD (Lee et al. 2022) and Dress Code (Morelli et al. 2022) datasets."
Dataset Splits | No | The paper mentions a "paired test setting" and an "unpaired test setting", and states "We follow previous work for the use of them" regarding the VITON-HD and Dress Code datasets, but it does not explicitly provide percentages or counts for the training, validation, and test splits of any dataset.
Hardware Specification | Yes | "We train our model on 2 NVIDIA Tesla A100 GPUs for 40 epochs with a batch size of 4 and a learning rate of 1e-5."
Software Dependencies | No | The paper mentions using the AdamW optimizer and Paint-by-Example as a backbone, but does not provide version numbers for any software libraries, programming languages, or frameworks (e.g., PyTorch, Python, CUDA) used in the implementation.
Experiment Setup | Yes | "The hyper-parameter λ1 is set to 1e-1, and λperc is set to 1e-4. We train our model on 2 NVIDIA Tesla A100 GPUs for 40 epochs with a batch size of 4 and a learning rate of 1e-5. We use AdamW (Loshchilov and Hutter 2017) optimizer with β1 = 0.9, β2 = 0.999."
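The reported training setup can be sketched in plain Python. This is a minimal illustration, not the authors' code: the `CONFIG` keys and the scalar `adamw_step` helper are my own names; only the numeric values (epochs, batch size, learning rate, betas, λ weights) come from the paper, and the update rule follows the AdamW formulation of Loshchilov and Hutter (2017).

```python
import math

# Hyper-parameters as reported in the Experiment Setup row above.
# The key names are illustrative; the values are from the paper.
CONFIG = {
    "epochs": 40,
    "batch_size": 4,
    "lr": 1e-5,
    "betas": (0.9, 0.999),
    "lambda_1": 1e-1,     # λ1: weight of the first loss term
    "lambda_perc": 1e-4,  # λperc: weight of the perceptual loss term
}

def adamw_step(param, grad, state, lr=1e-5, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update on a single scalar parameter.

    `state` carries the running first/second moments (m, v) and the
    step counter t across calls. `weight_decay` and `eps` are default
    assumptions; the paper does not report them.
    """
    b1, b2 = betas
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** state["t"])  # bias-corrected moments
    v_hat = state["v"] / (1 - b2 ** state["t"])
    # Decoupled weight decay: applied to the parameter directly,
    # not folded into the gradient (this is what distinguishes AdamW).
    param -= lr * weight_decay * param
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param
```

In a real reproduction this would be replaced by `torch.optim.AdamW(model.parameters(), lr=CONFIG["lr"], betas=CONFIG["betas"])`, with the total loss assembled as a weighted sum using `lambda_1` and `lambda_perc`.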