Diffusion Models with Deterministic Normalizing Flow Priors

Authors: Mohsen Zand, Ali Etemad, Michael Greenspan

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on standard image generation datasets demonstrate the advantage of the proposed method over existing approaches. On the unconditional CIFAR-10 dataset, for example, we achieve an FID of 2.01 and an Inception score of 9.96. Our method also demonstrates competitive performance on the CelebA-HQ-256 dataset, where it obtains an FID score of 7.11.
Researcher Affiliation | Academia | Mohsen Zand (1,2), Ali Etemad (2), Michael Greenspan (2); (1) Research Computing Center, University of Chicago; (2) Department of Electrical and Computer Engineering, and Ingenuity Labs Research Institute, Queen's University
Pseudocode | No | The paper describes mathematical formulations (Eqs. 1-6) and conceptual steps of the proposed method, but it does not contain a structured pseudocode or algorithm block.
Open Source Code | Yes | Code is available at https://github.com/MohsenZand/DiNof.
Open Datasets | Yes | We show quantitative comparisons for unconditional image generation on CIFAR-10 (Krizhevsky et al., 2009) and CelebA-HQ-256 (Karras et al., 2017).
Dataset Splits | No | We show quantitative comparisons for unconditional image generation on CIFAR-10 (Krizhevsky et al., 2009) and CelebA-HQ-256 (Karras et al., 2017). We perform experiments on these two challenging datasets following the conventional experimental setup in the field (such as Kim et al. (2022); Salimans & Ho (2022); Song et al. (2020a)).
Hardware Specification | Yes | We evaluate DiNof in terms of sampling time on the CIFAR-10 dataset. We specifically measure the improved runtime in comparison to the original SDEs (VESDE, VPSDE, and sub-VPSDE) with PC samplers (Song et al., 2020b) on an NVIDIA A100 GPU.
Software Dependencies | No | The paper describes various models, architectures, and algorithms used (e.g., NCSN++, DDPM++, Glow, PC samplers, Langevin dynamics), but does not provide specific version numbers for software libraries or dependencies such as PyTorch, TensorFlow, or Python.
Experiment Setup | Yes | We set T = 1000 and T = 1 for the discrete and continuous diffusion processes, respectively. The number of noise scales N is however set to 1000 for both cases. Additionally, the number of conditional Langevin steps is set to 1. The Langevin signal-to-noise ratios for CIFAR-10 and CelebA-HQ-256 are fixed at 0.16 and 0.17, respectively. As our normalizing flow model, we use the multiscale architecture Glow (Kingma & Dhariwal, 2018) with the number of levels L = 3 and the number of steps of each level K = 16. We also set the number of hidden channels to 256. Further details: models trained for 500K training iterations with a batch size of 32; training for 1M iterations with the batch size fixed to 128; training for 0.5M iterations with the most recent checkpoint used to derive the results; and a batch size of 8 for training and 64 for sampling.
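For reference, the hyperparameters reported in the Experiment Setup row can be collected into a single configuration sketch. This is a minimal illustration only: the dictionary keys and the helper function are our own naming choices, not taken from the paper or the DiNof repository.

```python
# Hypothetical configuration gathering the hyperparameters quoted above.
# Key names are illustrative; they do not mirror the DiNof codebase.
dinof_config = {
    "diffusion": {
        "T_discrete": 1000,        # T for the discrete diffusion process
        "T_continuous": 1.0,       # T for the continuous diffusion process
        "num_noise_scales": 1000,  # N, identical in both cases
        "langevin_steps": 1,       # number of conditional Langevin steps
        # Langevin signal-to-noise ratio per dataset
        "snr": {"cifar10": 0.16, "celeba_hq_256": 0.17},
    },
    "flow_prior": {                # multiscale Glow prior (Kingma & Dhariwal, 2018)
        "levels": 3,               # L
        "steps_per_level": 16,     # K
        "hidden_channels": 256,
    },
}

def total_flow_steps(cfg):
    """Total number of flow steps across all Glow levels (L * K)."""
    fp = cfg["flow_prior"]
    return fp["levels"] * fp["steps_per_level"]

print(total_flow_steps(dinof_config))  # 48
```

A structured record like this makes it easy to check internal consistency (e.g., that the same N is used for both the discrete and continuous settings) when attempting a reproduction.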