Stochastic Forward–Backward Deconvolution: Training Diffusion Models with Finite Noisy Datasets

Authors: Haoye Lu, Qifan Wu, Yaoliang Yu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To overcome this limitation, we propose to pretrain the model with a small fraction of clean data to guide the deconvolution process. Combined with our Stochastic Forward–Backward Deconvolution (SFBD) method, we attain FID 6.31 on CIFAR-10 with just 4% clean images (and 3.58 with 10%). We also provide theoretical guarantees that SFBD learns the true data distribution. These results underscore the value of limited clean pretraining, or pretraining on similar datasets. Empirical studies further validate and enrich our findings.
Researcher Affiliation | Academia | ¹Cheriton School of Computer Science, University of Waterloo, Canada; ²Vector Institute, Canada. Correspondence to: Haoye Lu <EMAIL>.
Pseudocode | Yes | Algorithm 1: Stochastic Forward–Backward Deconvolution. (Given a sample set D, p_D denotes the corresponding empirical distribution.)
Input: clean data D_clean = {x^(i)}_{i=1}^M; noisy data D_noisy = {y_τ^(i)}_{i=1}^N; number of iterations K.
// Initialize denoiser
1: φ_0 ← pretrain D_φ using Eq. (4) with p_0 = p_{D_clean}
2: for k = 1 to K do
   // Backward sampling
3:   E_k ← {y_0^(i) : y_τ^(i) ∈ D_noisy, solve backward SDE Eq. (3) from τ to 0 starting from y_τ^(i), where the score function is estimated from D_{φ_{k−1}}(x_t, t) and x_t}
   // Denoiser update
4:   φ_k ← train D_φ by minimizing Eq. (4) with p_0 = p_{E_k}
Output: final denoiser D_{φ_K}
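The alternating structure of Algorithm 1 can be illustrated with a deliberately minimal NumPy sketch. Here a scalar linear shrinkage fit by least squares stands in for the trained denoiser, and a single denoising step stands in for solving the backward SDE; `train_denoiser`, `backward_sample`, and `sfbd` are illustrative names, not functions from the SFBD codebase.

```python
import numpy as np

def train_denoiser(data, sigma, rng, n_pairs=2000):
    # Stand-in for minimizing the denoising loss Eq. (4): fit a scalar
    # linear denoiser x_hat = w * y by least squares on synthetic
    # (clean, noisy) pairs drawn from the empirical data distribution.
    x = rng.choice(data, size=n_pairs)
    y = x + sigma * rng.standard_normal(n_pairs)
    return float(np.dot(y, x) / np.dot(y, y))

def backward_sample(noisy, w):
    # Stand-in for solving the backward SDE Eq. (3) from tau to 0:
    # here, a single denoising step with the current denoiser.
    return w * noisy

def sfbd(clean, noisy, sigma, K, seed=0):
    rng = np.random.default_rng(seed)
    w = train_denoiser(clean, sigma, rng)    # line 1: pretrain on clean data
    for _ in range(K):                       # line 2
        e_k = backward_sample(noisy, w)      # line 3: backward sampling -> E_k
        w = train_denoiser(e_k, sigma, rng)  # line 4: denoiser update on E_k
    return w
```

In the paper the denoiser is an EDM U-Net trained on images and the backward pass is a full SDE solve; this sketch only mirrors the alternation between backward sampling and denoiser refits.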
Open Source Code | Yes | Code for the empirical study is available at github.com/watml/SFBD.
Open Datasets | Yes | The experiments are conducted on the CIFAR-10 (Krizhevsky & Hinton, 2009) and CelebA (Liu et al., 2022) datasets, with resolutions of 32×32 and 64×64, respectively. CIFAR-10 consists of 50,000 training images and 10,000 test images across 10 classes. CelebA, a dataset of human face images, includes a predefined split of 162,770 training images, 19,867 validation images, and 19,962 test images.
Dataset Splits | Yes | The experiments are conducted on the CIFAR-10 (Krizhevsky & Hinton, 2009) and CelebA (Liu et al., 2022) datasets, with resolutions of 32×32 and 64×64, respectively. CIFAR-10 consists of 50,000 training images and 10,000 test images across 10 classes. CelebA, a dataset of human face images, includes a predefined split of 162,770 training images, 19,867 validation images, and 19,962 test images.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, or memory amounts) used for running its experiments. It only mentions general resources: "Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute."
Software Dependencies | No | The paper mentions software components such as "EDMLoss (Karras et al., 2022)", "Adam (Kingma & Ba, 2015)", and "DINOv2 (Oquab et al., 2024)" along with citations, but does not provide specific version numbers for these components or for the underlying programming frameworks (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | We implemented SFBD algorithms using the architectures proposed in EDM (Karras et al., 2022) as well as the optimizers and hyperparameter configurations therein. All models are implemented in an unconditional setting, and we also enabled the non-leaky augmentation technique (Karras et al., 2022) to alleviate the overfitting problem. For the backward sampling step in SFBD, we adopt the 2nd-order Heun method (Karras et al., 2022). More information is provided in Appx E.

Table 3: Experimental Configuration for CIFAR-10 and CelebA

Parameter               CIFAR-10                        CelebA
-- General --
Batch Size              512                             256
Loss Function           EDMLoss (Karras et al., 2022)   EDMLoss (Karras et al., 2022)
Sampling Method         2nd-order Heun (EDM)            2nd-order Heun (EDM)
Sampling Steps          18                              40
-- Network Configuration --
Dropout                 0.13                            0.05
Channel Multipliers     {2, 2, 2}                       {1, 2, 2, 2}
Model Channels          128                             128
Resample Filter         {1, 1}                          {1, 3, 3, 1}
Channel Mult Noise      1                               2
-- Optimizer Configuration --
Optimizer Class         Adam (Kingma & Ba, 2015)        Adam (Kingma & Ba, 2015)
Learning Rate           0.001                           0.0002
Epsilon                 1×10⁻⁸                          1×10⁻⁸
Betas                   (0.9, 0.999)                    (0.9, 0.999)
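For readers reimplementing the setup, the Table 3 hyperparameters can be collected into a plain configuration mapping along these lines; the key names below are chosen for illustration and do not come from the authors' codebase.

```python
# Hyperparameters transcribed from Table 3 of the paper; key names are
# illustrative, not taken from the SFBD code.
CONFIGS = {
    "cifar10": {
        "batch_size": 512,
        "loss": "EDMLoss",
        "sampling_method": "2nd-order Heun (EDM)",
        "sampling_steps": 18,
        "dropout": 0.13,
        "channel_multipliers": (2, 2, 2),
        "model_channels": 128,
        "resample_filter": (1, 1),
        "channel_mult_noise": 1,
        "optimizer": "Adam",
        "learning_rate": 1e-3,
        "epsilon": 1e-8,
        "betas": (0.9, 0.999),
    },
    "celeba": {
        "batch_size": 256,
        "loss": "EDMLoss",
        "sampling_method": "2nd-order Heun (EDM)",
        "sampling_steps": 40,
        "dropout": 0.05,
        "channel_multipliers": (1, 2, 2, 2),
        "model_channels": 128,
        "resample_filter": (1, 3, 3, 1),
        "channel_mult_noise": 2,
        "optimizer": "Adam",
        "learning_rate": 2e-4,
        "epsilon": 1e-8,
        "betas": (0.9, 0.999),
    },
}
```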