Training Diffusion-based Generative Models with Limited Data

Authors: Zhaoyu Zhang, Yang Hua, Guanxiong Sun, Hui Wang, Seán Mcloone

ICML 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. "Extensive experiments on several datasets demonstrate that LD-Diffusion can achieve better performance compared to other diffusion models. Codes are available at https://github.com/zzhang05/LD-Diffusion."
Researcher Affiliation: Collaboration. "Zhaoyu Zhang 1, Yang Hua 1, Guanxiong Sun 2, Hui Wang 1, Seán McLoone 1. 1 Queen's University Belfast, 2 Huawei UKRD. Correspondence to: Yang Hua <EMAIL>."
Pseudocode: No. The paper includes mathematical equations and theoretical analysis but does not feature any explicitly labeled pseudocode or algorithm blocks; procedures are described in paragraph text.
Open Source Code: Yes. "Codes are available at https://github.com/zzhang05/LD-Diffusion."
Open Datasets: Yes. "We select FFHQ (Karras et al., 2020b) and low-shot (Zhao et al., 2020) datasets for the experiments."
Dataset Splits: Yes. "Case 1: A small subset of a large dataset is used to train the model, while the full dataset serves as the reference distribution for calculating the FID. This case is applied to the experiments with the FFHQ dataset. Case 2: The original small dataset is used both for training and as the reference distribution for FID calculations. This case is typically utilized in experiments with low-shot datasets. In this study, we consider both cases when designing LD-Diffusion."
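The two FID evaluation cases quoted above can be summarized in a minimal sketch. The helper `pick_fid_reference` is purely illustrative (it is not part of the LD-Diffusion codebase) and only encodes which dataset plays which role in each case:

```python
def pick_fid_reference(case, train_set, full_set=None):
    """Return (training data, FID reference data) for the two evaluation cases.

    Illustrative helper, not from the LD-Diffusion code:
      case 1 (FFHQ):    train on a small subset, score FID against the full dataset
      case 2 (low-shot): the same small dataset is used for both roles
    """
    if case == 1:
        if full_set is None:
            raise ValueError("case 1 needs the full dataset as the FID reference")
        return train_set, full_set
    if case == 2:
        return train_set, train_set
    raise ValueError(f"unknown case: {case}")
```

In case 2 the reference distribution is the training set itself, so FID there measures fidelity to the small dataset rather than generalization to a larger underlying distribution.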
Hardware Specification: Yes. "We conduct all the experiments on a single workstation with two NVIDIA A5000 (24 GB) GPUs, using a total of 10 identical workstations for all experiments. The results are calculated by averaging over ten runs on the two NVIDIA A5000 GPUs."
Software Dependencies: No. "We follow the EDM codebase (https://github.com/NVlabs/edm) to build up our software environment." The paper references a GitHub repository for the software environment but does not list specific version numbers for Python, PyTorch, CUDA, or other key libraries.
Experiment Setup: Yes. "For the compressing model in LD-Diffusion, the pre-trained encoder is selected from SD-MSE (Rombach et al., 2022d) and the pre-trained decoder is selected from SD-EMA (Rombach et al., 2022c) with a downsample factor of 8 (refer to B.2). For the MAFP in LD-Diffusion, we set p1 and p2 as 0.1 for all experiments (refer to 5.4.2). Furthermore, patch training (Wang et al., 2023c) is applied to LD-Diffusion in experiments with the FFHQ dataset. In contrast, the proposed OOD regularization is applied to LD-Diffusion in experiments with low-shot datasets. Based on EDM (Karras et al., 2022), the denoise sampling Number of Function Evaluations (NFE) is set as 79 and 511 for all diffusion-based methods on the FFHQ and low-shot datasets, respectively. We set the overall training duration for all diffusion-based generative models on the FFHQ dataset and low-shot datasets as 20000 kimg and 40000 kimg, respectively."
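The reported settings can be collected into a single configuration sketch. The key names below are illustrative (they do not come from the authors' code); only the values are taken from the quoted setup:

```python
# Hedged summary of the reported LD-Diffusion training settings.
# Key names are illustrative; values follow the paper's experiment setup.
LD_DIFFUSION_CONFIG = {
    "compressing_model": {
        "encoder": "SD-MSE",       # pre-trained encoder (Rombach et al., 2022d)
        "decoder": "SD-EMA",       # pre-trained decoder (Rombach et al., 2022c)
        "downsample_factor": 8,
    },
    "mafp": {"p1": 0.1, "p2": 0.1},  # MAFP probabilities, all experiments
    "ffhq": {
        "extra_technique": "patch_training",  # Wang et al., 2023c
        "nfe": 79,                            # denoise sampling NFE (EDM-based)
        "training_duration_kimg": 20000,
    },
    "low_shot": {
        "extra_technique": "ood_regularization",
        "nfe": 511,
        "training_duration_kimg": 40000,
    },
}
```

A structured record like this makes the asymmetry between the two regimes explicit: the FFHQ runs pair a low NFE with patch training, while the low-shot runs pair a much higher NFE with the proposed OOD regularization and a doubled training budget.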