Cascaded Diffusion Models for Virtual Try-On: Improving Control and Resolution
Authors: Guangyuan Li, Yongkang Wang, Junsheng Luan, Lei Zhao, Wei Xing, Huaizhong Lin, Binkai Ou
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that our method outperforms previous approaches in preserving garment details and generating authentic virtual try-on images, both qualitatively and quantitatively. |
| Researcher Affiliation | Collaboration | (1) College of Computer Science and Technology, Zhejiang University; (2) Innovation Research & Development, Board Ware Information System Limited |
| Pseudocode | No | The paper describes the model architecture and methodology in detail using text and diagrams (e.g., Fig. 2), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide any links to code repositories. |
| Open Datasets | Yes | We employ two publicly available datasets, Dress Code (Morelli et al. 2022) and VITON-HD (Choi et al. 2021), to evaluate the virtual try-on task. Both datasets consist of paired images of garments and their corresponding human models wearing the garments. |
| Dataset Splits | No | The paper mentions testing experiments are conducted under 'paired' and 'unpaired' settings but does not provide specific numerical splits (e.g., percentages or counts) for training, validation, or testing datasets. |
| Hardware Specification | Yes | Specifically, MC-DM is conducted using two NVIDIA A6000 (48GB) GPUs... SR-DM is trained on two NVIDIA A100 (80GB) GPUs |
| Software Dependencies | No | The paper mentions using the AdamW optimizer but does not specify version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | Specifically, MC-DM is conducted using two NVIDIA A6000 (48GB) GPUs with image resolutions of 512 × 384. We use the AdamW optimizer with a learning rate set to 2e-5. SR-DM is trained on two NVIDIA A100 (80GB) GPUs, employing the AdamW optimizer with a learning rate of 5e-5. In MC-DM, we use Paint-by-Example (Yang et al. 2023) as the frozen pre-trained diffusion model. In SR-DM, we use IRControlNet (Lin et al. 2024) as the frozen pre-trained diffusion model. |
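For readers checking the stated optimizer settings, the following is a minimal, generic sketch of a single decoupled-weight-decay (AdamW) update on a scalar parameter. The paper only specifies the learning rates (2e-5 for MC-DM, 5e-5 for SR-DM); the betas, epsilon, and weight-decay values below are common defaults and are assumptions, not reported hyperparameters.

```python
import math

def adamw_step(theta, grad, state, lr=2e-5, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update for a single scalar parameter.

    `state` carries the running moments and step count; pass {} on the
    first call. Only `lr` comes from the paper; the other defaults are
    illustrative assumptions.
    """
    t = state.get("t", 0) + 1
    m = betas[0] * state.get("m", 0.0) + (1 - betas[0]) * grad
    v = betas[1] * state.get("v", 0.0) + (1 - betas[1]) * grad ** 2
    state.update(t=t, m=m, v=v)
    # Bias-corrected moment estimates.
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    # Decoupled weight decay: applied to the parameter, not folded into the gradient.
    return theta - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * theta)

# One step with the MC-DM learning rate reported in the paper (2e-5).
state = {}
theta = adamw_step(1.0, 0.5, state, lr=2e-5)
```

With a positive gradient the parameter moves slightly downward, on the order of the learning rate, which is why the reported rates (2e-5 and 5e-5) imply long fine-tuning schedules rather than rapid parameter changes.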