Warm Diffusion: Recipe for Blur-Noise Mixture Diffusion Models

Authors: Hao-Chien Hsueh, Wen-Hsiao Peng, Ching-Chun Huang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5 EXPERIMENTS. Datasets. We validate our proposed diffusion process on three widely used benchmarks: CIFAR-10 (Krizhevsky, 2009) 32×32, FFHQ (Karras et al., 2019) 64×64, and LSUN-church (Yu et al., 2016) 128×128. These datasets were chosen to demonstrate the effectiveness of our method across various scenarios. The CIFAR-10 dataset contains 32×32 color images across 10 classes, allowing us to test both unconditional and class-conditional image generation. For the FFHQ and LSUN-church datasets, we evaluate the model in unconditional settings. ... Performance Comparison. To evaluate image generation quality, we use two commonly adopted benchmarks: Fréchet Inception Distance (FID) (Heusel et al., 2018) and Inception Score (IS) (Salimans et al., 2016). The FID score measures the distance between the generated and reference datasets; a lower FID score indicates greater similarity between the two, reflecting better recovery of the data distribution by the generative model. ... As shown in Tab. 1, we assess sample quality using FID and IS alongside the number of function evaluations (NFE) during sampling, a metric closely related to the sampling speed of diffusion-based methods. Our approach significantly enhances the performance of the baseline model, EDM, across both CIFAR-10 and FFHQ datasets, regardless of their differing characteristics."
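The FID metric quoted in this row reduces to the Fréchet distance between two Gaussians (μ₁, Σ₁) and (μ₂, Σ₂) fitted to Inception features of the reference and generated sets: FID = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). A minimal NumPy sketch of that formula (the eigenvalue-based matrix square root and the toy statistics below are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).
    Tr((S1 S2)^{1/2}) is obtained from the eigenvalues of S1 @ S2,
    which are real and non-negative when both covariances are PSD.
    """
    diff = mu1 - mu2
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    covmean_trace = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * covmean_trace)

# Identical statistics give distance 0; a pure mean shift gives ||shift||^2.
mu, sigma = np.zeros(4), np.eye(4)
print(frechet_distance(mu, sigma, mu, sigma))        # ~0.0
print(frechet_distance(mu, sigma, mu + 1.0, sigma))  # ~4.0
```

In practice the statistics come from Inception-v3 activations over tens of thousands of samples, which is why the quoted protocol draws 50,000 images per evaluation round.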
Researcher Affiliation | Academia | Hao-Chien Hsueh, Wen-Hsiao Peng, Ching-Chun Huang; National Yang Ming Chiao Tung University, Taiwan
Pseudocode | Yes | "Algorithm 1 Training Phase. 1: Require: Hyperparameters {P_mean, P_std, BNR} 2: Initialize neural network F_θ 3: repeat ... Algorithm 2 Generation Phase: Deterministic sampling with Heun's 2nd-order method. 1: Require: Neural network F_θ, sampling schedule {(α_0, β_0), (α_1, β_1), ..., (α_N, β_N)} 2: Sample x_N ~ N(0, β_N² I) 3: for i ∈ {N, N-1, ..., 1} do ..."
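Algorithm 2 quoted above is a deterministic Heun (2nd-order) integration along the (α_i, β_i) blur/noise schedule. Stripped of the blur-specific details, the core update is the standard predictor-corrector Heun step; the sketch below shows that generic step on a scalar test ODE (the function names and the test problem are assumptions for illustration, not the authors' F_θ or schedule):

```python
import math

def heun_sampler(f, x, ts):
    """Deterministic Heun (2nd-order) integration of dx/dt = f(x, t)
    along a schedule ts = [t_N, ..., t_0]. Following the EDM-style
    convention, the correction step is skipped when the next t is 0."""
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        h = t_next - t_cur
        d_cur = f(x, t_cur)                      # slope at current point
        x_euler = x + h * d_cur                  # Euler predictor
        if t_next == 0:
            x = x_euler                          # final step: plain Euler
        else:
            d_next = f(x_euler, t_next)          # slope at predicted point
            x = x + h * 0.5 * (d_cur + d_next)   # trapezoidal corrector
    return x

# Sanity check on dx/dt = -x: x(t) = C e^{-t}, so integrating from
# t = 1 (x = 1) down to t = 0 should give x(0) = e.
N = 100
ts = [1.0 - i / N for i in range(N + 1)]
x0 = heun_sampler(lambda x, t: -x, 1.0, ts)
print(abs(x0 - math.e) < 1e-3)  # True
```

In the paper's sampler, f would be the ODE drift built from the network's deblurring and denoising predictors, and ts would be replaced by the joint (α_i, β_i) schedule; each step then costs two network evaluations except the last, which is what makes NFE roughly 2N − 1.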
Open Source Code | No | The paper does not explicitly state that source code is provided or offer a link to a code repository.
Open Datasets | Yes | "Datasets. We validate our proposed diffusion process on three widely used benchmarks: CIFAR-10 (Krizhevsky, 2009) 32×32, FFHQ (Karras et al., 2019) 64×64, and LSUN-church (Yu et al., 2016) 128×128. These datasets were chosen to demonstrate the effectiveness of our method across various scenarios."
Dataset Splits | Yes | "Datasets. We validate our proposed diffusion process on three widely used benchmarks: CIFAR-10 (Krizhevsky, 2009) 32×32, FFHQ (Karras et al., 2019) 64×64, and LSUN-church (Yu et al., 2016) 128×128. ... Following established procedures, we sample 50,000 images over three rounds and report the minimum scores to mitigate random variation effects. As shown in Tab. 1, we assess sample quality using FID and IS alongside the number of function evaluations (NFE) during sampling, a metric closely related to the sampling speed of diffusion-based methods."
Hardware Specification | No | "Table 6: Hyperparameters and model sizes used in the experiments. ... Number of GPUs: 8 (same across all six configurations)"
Software Dependencies | No | "For training, we adopt the improved DDPM++/NCSN++ (Song et al., 2021) network architectures, training strategies, and hyperparameters from the state-of-the-art diffusion model, EDM (Karras et al., 2022). ... For sampling, we adapt Heun's 2nd-order solver, following EDM (Karras et al., 2022)."
Experiment Setup | Yes | "Implementation Details. For training, we adopt the improved DDPM++/NCSN++ (Song et al., 2021) network architectures, training strategies, and hyperparameters from the state-of-the-art diffusion model, EDM (Karras et al., 2022). Modifications are made to enable the network to accept two conditioning signals (the blur and noise levels), and we double the output channels to produce predictors for deblurring and denoising, respectively. ... In Tab. 6, we list the hyperparameters used in our experiments. For CIFAR-10 and FFHQ, we adopt the same settings as EDM (Karras et al., 2022), without tuning for optimal hyperparameters. For LSUN-church, we follow the network architecture and settings from Blurring Diffusion to ensure a fair comparison. ... Table 6: Hyperparameters and model sizes used in the experiments. Minibatch size: 512 ... Learning rate: 10^-4 ... Dropout probability: 0.13 ... Channel multiplier: 128 ... Attention resolutions: {16}"
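The Table 6 values quoted in this row can be collected into a small config for reference. The numbers below are transcribed from the quotes above (learning rate 10^-4, GPU count from the Hardware row); the dictionary layout and key names are illustrative assumptions, not the paper's code:

```python
# CIFAR-10 / FFHQ training settings quoted from Table 6 of the paper
# (EDM defaults, per the row above); key names are illustrative.
edm_config = {
    "minibatch_size": 512,
    "learning_rate": 1e-4,           # "Learning rate: 10^-4"
    "dropout_probability": 0.13,
    "channel_multiplier": 128,
    "attention_resolutions": {16},   # self-attention at 16x16 feature maps
    "num_gpus": 8,                   # "Number of GPUs: 8" (all runs)
}

for key, value in edm_config.items():
    print(f"{key}: {value}")
```

Per the quote, LSUN-church instead follows the Blurring Diffusion architecture and settings, so it would use a separate config.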