GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers
Authors: Xinyu Li, Qi Yao, Yuanda Wang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve new state-of-the-art results on DressCodeData, as well as on the largest sewing pattern dataset, namely GarmentCodeData. The project website is available at https://shenfu-research.github.io/Garment-Diffusion/. 4 Experiments 4.1 Datasets 4.2 Multimodal Data Synthesis 4.3 Evaluation Metrics 4.4 Implementation Details 4.5 Comparison with State-of-the-Art Methods 4.6 Ablation Study |
| Researcher Affiliation | Collaboration | Xinyu Li (1,2), Qi Yao (2), Yuanda Wang (2); 1: Zhejiang University, 2: Shenfu Research; EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology in prose and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks. For example, Section 3 'Method' details the sewing pattern representation and generation process, but without a structured pseudocode format. |
| Open Source Code | No | The project website is available at https://shenfu-research.github.io/Garment-Diffusion/. This is a link to a project website/overview page, not an explicit statement of code release or a direct link to a code repository. |
| Open Datasets | Yes | We use SewFactory [Liu et al., 2023], DressCodeData [Korosteleva and Lee, 2021; He et al., 2024] and GarmentCodeData (V2) [Korosteleva et al., 2024] for training and evaluation. For SewFactory, we employ off-the-shelf rendered garments superimposed on diverse human poses as image prompts (without text prompts). For DressCodeData and GarmentCodeData, we designed multimodal data annotation pipelines (depicted in Figure 4) to generate both text and image prompts for sewing patterns. |
| Dataset Splits | Yes | For SewFactory, we use our own split in which 90% of randomly selected data points are used for training, with the remaining 10% evenly divided between validation and testing. For DressCodeData and GarmentCodeData (V2), we adhere strictly to the official splits provided by the authors for training, validation, and testing. |
| Hardware Specification | Yes | Our model is distributedly trained across 8 A10 GPUs (24GB) with the Hugging Face Accelerate library [Gugger et al., 2022]. |
| Software Dependencies | No | The paper mentions several tools and models, such as the 'AdamW optimizer', the 'Hugging Face Accelerate library', 'OpenAI ViT-H/14', 'CLIP', 'Llama-3.1-8B-Instruct', 'MistoLine', and 'Anything-XL fine-tuned from SD-XL'. However, specific version numbers for these software components or libraries are not provided. |
| Experiment Setup | Yes | We adopt a DDPM noise scheduler for diffusion training, with a maximum of 1,000 denoising steps and a linear beta schedule (beta_start = 1e-4, beta_end = 2e-2). We use the AdamW optimizer [Loshchilov and Hutter, 2019] with betas = (0.95, 0.999), a constant learning rate of 1e-4 and a weight decay of 1e-2. The training epoch count is set to 1,000 with an early-stop criterion. We evaluate the model at denoising steps of 50, 200, 500, and 1000 every 10 epochs. Based on the results shown in Figure 5, we select 50 denoising steps for inference. The multimodal training is performed in a round-robin fashion, following the order of image prompts, text prompts, and image-and-text prompts. |
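The 90/5/5 SewFactory split quoted in the Dataset Splits row can be sketched in a few lines of Python. The function name and fixed seed below are illustrative assumptions, not details from the paper:

```python
import random

def split_dataset(items, seed=42):
    """Randomly split data points into train/val/test, mirroring the
    described SewFactory split: 90% for training, with the remaining
    10% evenly divided between validation and testing."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.9)
    n_val = (n - n_train) // 2
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 900 50 50
```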
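The diffusion settings reported in the Experiment Setup row (1,000 training timesteps, linear beta schedule from 1e-4 to 2e-2) correspond to the following minimal plain-Python sketch. It is a stand-in for what a library scheduler such as Hugging Face diffusers' `DDPMScheduler(beta_schedule="linear")` would compute; the endpoint-inclusive interpolation convention is an assumption, and the optimizer dictionary simply restates the quoted AdamW hyperparameters:

```python
def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Linearly interpolate the noise variance (beta) from beta_start
    to beta_end over num_steps diffusion timesteps, matching the
    reported DDPM training configuration."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

betas = linear_beta_schedule()
print(len(betas), betas[0])

# AdamW settings quoted in the table (constant LR, no schedule):
adamw_config = {"lr": 1e-4, "betas": (0.95, 0.999), "weight_decay": 1e-2}
```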