OOTDiffusion: Outfitting Fusion Based Latent Diffusion for Controllable Virtual Try-On
Authors: Yuhao Xu, Tao Gu, Weifeng Chen, Arlene Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results for arbitrary human and garment images, which outperforms other VTON methods in both realism and controllability, indicating a breakthrough in virtual try-on. ... We train our OOTDiffusion on two broadly-used high-resolution benchmark datasets, i.e., VITON-HD (Choi et al. 2021) and Dress Code (Morelli et al. 2022), respectively. Extensive qualitative and quantitative evaluations demonstrate our superiority over the state-of-the-art VTON methods in both realism and controllability for various target human and garment images (see Figure 1), implying an impressive breakthrough in image-based virtual try-on. |
| Researcher Affiliation | Industry | Yuhao Xu, Tao Gu, Weifeng Chen, Arlene Chen Xiao-i Research EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using text and diagrams (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/levihsu/OOTDiffusion |
| Open Datasets | Yes | Our experiments are performed on two high-resolution (1024 x 768) virtual try-on datasets, i.e., VITON-HD (Choi et al. 2021) and Dress Code (Morelli et al. 2022). |
| Dataset Splits | Yes | The VITON-HD dataset consists of 13,679 image pairs of frontal half-body models and corresponding upper-body garments, where 2,032 pairs are used as the test set. The Dress Code dataset consists of 15,363/8,951/29,478 image pairs of full-body models and corresponding upper-body garments/lower-body garments/dresses, where 1,800 pairs for each garment category are used as the test set. |
| Hardware Specification | Yes | All the models are trained for 36,000 iterations on a single NVIDIA A100 GPU, with a batch size of 64 for the 512 x 384 resolution and 16 for the 1024 x 768 resolution. At inference time, we run our OOTDiffusion on a single NVIDIA RTX 4090 GPU for 20 sampling steps using the UniPC sampler (Zhao et al. 2024). |
| Software Dependencies | No | The paper mentions "Stable Diffusion v1.5 (Rombach et al. 2022)" as inherited pretrained weights, and specific optimizers and samplers (AdamW optimizer, UniPC sampler), but does not provide specific version numbers for underlying software libraries like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | In our experiments, we initialize the OOTDiffusion models by inheriting the pretrained weights of Stable Diffusion v1.5 (Rombach et al. 2022). Then we finetune the outfitting and denoising UNets using an AdamW optimizer (Loshchilov and Hutter 2018) with a fixed learning rate of 5e-5. ... All the models are trained for 36,000 iterations on a single NVIDIA A100 GPU, with a batch size of 64 for the 512 x 384 resolution and 16 for the 1024 x 768 resolution. ... And the optimal value of the guidance scale sg is usually around 1.5 - 2.0 according to our ablation study. ... we empirically set sg = 1.5 for the VITON-HD dataset (Choi et al. 2021) and sg = 2.0 for the Dress Code dataset (Morelli et al. 2022) in the following experiments. |
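The guidance scale sg quoted in the Experiment Setup row controls how strongly the garment conditioning steers the denoising process. The table does not quote the paper's exact equation, so the sketch below shows only the standard classifier-free-guidance combination of an unconditional and a conditioned noise prediction; the function name and array shapes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def guided_noise(eps_uncond: np.ndarray, eps_cond: np.ndarray, s_g: float) -> np.ndarray:
    """Standard classifier-free-guidance mix of noise predictions.

    s_g = 1.0 reproduces the conditioned prediction; s_g > 1.0
    extrapolates past it, strengthening the conditioning signal
    (the paper reports sg around 1.5 - 2.0 as the useful range).
    """
    return eps_uncond + s_g * (eps_cond - eps_uncond)

# Toy tensors standing in for UNet outputs (illustrative only).
eps_u = np.zeros((2, 2))
eps_c = np.ones((2, 2))
out = guided_noise(eps_u, eps_c, 1.5)  # each entry is 1.5: pushed past eps_c
```

Under this reading, sg = 1.5 (VITON-HD) and sg = 2.0 (Dress Code) trade off realism against how faithfully the generated image follows the garment condition.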