Training Diffusion-based Generative Models with Limited Data
Authors: Zhaoyu Zhang, Yang Hua, Guanxiong Sun, Hui Wang, Seán McLoone
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several datasets demonstrate that LD-Diffusion can achieve better performance compared to other diffusion models. Codes are available at https://github.com/zzhang05/LD-Diffusion. |
| Researcher Affiliation | Collaboration | Zhaoyu Zhang¹, Yang Hua¹, Guanxiong Sun², Hui Wang¹, Seán McLoone¹. ¹Queen's University Belfast, ²Huawei UKRD. Correspondence to: Yang Hua <EMAIL>. |
| Pseudocode | No | The paper includes mathematical equations and theoretical analysis but does not feature any explicitly labeled pseudocode or algorithm blocks. Procedures are described in paragraph text. |
| Open Source Code | Yes | Codes are available at https://github.com/zzhang05/LD-Diffusion. |
| Open Datasets | Yes | We select FFHQ (Karras et al., 2020b) and low-shot (Zhao et al., 2020) datasets for the experiments. |
| Dataset Splits | Yes | Case 1: A small subset of a large dataset is used to train the model, while the full dataset serves as the reference distribution for calculating the FID. This case is applied to the experiments with the FFHQ dataset; Case 2: The original small dataset is used both for training and as the reference distribution for FID calculations. This case is typically utilized in experiments with low-shot datasets. In this study, we consider both cases when designing LD-Diffusion. |
| Hardware Specification | Yes | We conduct all the experiments on a single workstation with two NVIDIA A5000 (24 GB) GPUs; a total of 10 identical workstations are used for all experiments. Results are averaged over ten runs on the two NVIDIA A5000 GPUs. |
| Software Dependencies | No | We follow the EDM codebase (https://github.com/NVlabs/edm) to build our software environment. The paper references a GitHub repository for the software environment but does not list specific version numbers for Python, PyTorch, CUDA, or other key libraries. |
| Experiment Setup | Yes | For the compressing model in LD-Diffusion, the pre-trained encoder is selected from SD-MSE (Rombach et al., 2022d) and the pre-trained decoder from SD-EMA (Rombach et al., 2022c), with a downsample factor of 8 (refer to B.2). For the MAFP in LD-Diffusion, p1 and p2 are set to 0.1 for all experiments (refer to 5.4.2). Patch training (Wang et al., 2023c) is applied to LD-Diffusion in experiments on the FFHQ dataset, whereas the proposed OOD regularization is applied in experiments on the low-shot datasets. Based on EDM (Karras et al., 2022), the denoising sampling Number of Function Evaluations (NFE) is set to 79 and 511 for all diffusion-based methods on the FFHQ and low-shot datasets, respectively. The overall training duration for all diffusion-based generative models is set to 20000 kimgs on the FFHQ dataset and 40000 kimgs on the low-shot datasets. |
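
The experiment-setup row above mixes shared and per-dataset settings. The sketch below is a hypothetical summary of those reported hyperparameters; the key names and the `settings_for` helper are illustrative and are not taken from the released LD-Diffusion code.

```python
# Hypothetical summary of the LD-Diffusion experiment settings reported in the
# paper; key names are illustrative, not taken from the released code.

EXPERIMENT_SETUP = {
    "compressing_model": {
        "encoder": "SD-MSE",        # pre-trained encoder (Rombach et al., 2022d)
        "decoder": "SD-EMA",        # pre-trained decoder (Rombach et al., 2022c)
        "downsample_factor": 8,
    },
    "mafp": {"p1": 0.1, "p2": 0.1},  # MAFP probabilities, same for all experiments
    "per_dataset": {
        "FFHQ": {
            "extra_technique": "patch training",      # Wang et al., 2023c
            "sampling_nfe": 79,                       # EDM-based denoising NFE
            "training_duration_kimg": 20000,
        },
        "low-shot": {
            "extra_technique": "OOD regularization",  # proposed in the paper
            "sampling_nfe": 511,
            "training_duration_kimg": 40000,
        },
    },
}

def settings_for(dataset: str) -> dict:
    """Merge the shared settings with the dataset-specific ones."""
    shared = {k: v for k, v in EXPERIMENT_SETUP.items() if k != "per_dataset"}
    return {**shared, **EXPERIMENT_SETUP["per_dataset"][dataset]}
```

Keeping the shared block separate from the per-dataset block makes the two evaluation regimes (full FFHQ reference vs. small low-shot reference) easy to compare at a glance.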