IMAGDressing-v1: Customizable Virtual Dressing

Authors: Fei Shen, Xin Jiang, Xin He, Hu Ye, Cong Wang, Xiaoyu Du, Zechao Li, Jinhui Tang

AAAI 2025

Reproducibility assessment (variable, result, and supporting excerpt):
Research Type: Experimental. "Extensive experiments demonstrate that IMAGDressing-v1 achieves state-of-the-art performance in controlled human image synthesis. ... We compare our IMAGDressing-v1 with four state-of-the-art (SOTA) methods: Blip-Diffusion (Li, Li, and Hoi 2023), Versatile Diffusion (Xu et al. 2023), and Magic Clothing (Chen et al. 2024). Quantitative Results. As shown in Table 2... Ablation Studies. Effectiveness of each component. Table 3 presents an ablation study..."
Researcher Affiliation: Collaboration. Fei Shen (1), Xin Jiang (1), Xin He (2), Hu Ye (3), Cong Wang (4), Xiaoyu Du (1), Zechao Li (1), Jinhui Tang (1)*. (1) Nanjing University of Science and Technology; (2) Wuhan University of Technology; (3) Tencent AI Lab; (4) Nanjing University.
Pseudocode: No. The paper describes the methodology using textual explanations and mathematical equations, but it does not include any clearly labeled pseudocode blocks or algorithms.
Open Source Code: Yes. Code: https://github.com/muzishen/IMAGDressing
Open Datasets: Yes. "To alleviate data constraints, we introduce the Interactive Garment Pairing (IGPair) dataset, comprising over 300,000 garment image pairs and a standardized data assembly pipeline. ... We collect and release a large-scale interactive garment pairing (IGPair) dataset, containing over 300,000 pairs, available for the community to explore and research."
Dataset Splits: No. The paper states, "Our model is trained on the paired images from the IGPair dataset at the resolution of 512 × 640." However, it does not specify any training, validation, or test splits for this dataset, nor does it refer to predefined splits from a cited source.
Hardware Specification: Yes. "The model is trained for 200,000 steps on 10 NVIDIA RTX 3090 GPUs with a batch size of 5."
Software Dependencies: No. The paper mentions using "Stable Diffusion V1.5" as a base model and components like "VAE Encoder", "CLIP image encoder", and "Q-Former", but it does not specify version numbers for any key software libraries (e.g., PyTorch, TensorFlow, CUDA) or programming languages used for implementation.
Experiment Setup: Yes. "We adopt the AdamW optimizer with a fixed learning rate of 5e-5. The model is trained for 200,000 steps on 10 NVIDIA RTX 3090 GPUs with a batch size of 5. At the inference stage, the images are generated with the UniPC sampler for 50 sampling steps, with the guidance scale w set to 7.0. ... we empirically set λ to 1.0 in our experiments."
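For convenience, the training and inference settings reported above can be collected into a single configuration sketch. This is a minimal summary in plain Python; the key names are our own and do not come from the paper or the released code:

```python
# Hyperparameters as reported in the paper's experiment setup.
# Key names are hypothetical; only the values are taken from the text.
train_config = {
    "base_model": "Stable Diffusion V1.5",
    "optimizer": "AdamW",
    "learning_rate": 5e-5,      # fixed learning rate
    "train_steps": 200_000,
    "batch_size": 5,            # on 10 NVIDIA RTX 3090 GPUs
    "resolution": (512, 640),   # paired IGPair images
}

inference_config = {
    "sampler": "UniPC",
    "num_inference_steps": 50,
    "guidance_scale": 7.0,      # the paper's w
    "lambda_weight": 1.0,       # the paper's λ, set empirically
}
```

Such a config makes it easy to check a reimplementation against the reported setup at a glance.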