Improving Probabilistic Diffusion Models With Optimal Diagonal Covariance Matching
Authors: Zijing Ou, Mingtian Zhang, Andi Zhang, Tim Xiao, Yingzhen Li, David Barber
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To support our theoretical discussion, we first evaluate the performance of optimal covariance matching by training diffusion probabilistic models on 2D toy examples. We then demonstrate its effectiveness in enhancing image modelling in both pixel and latent spaces, focusing on the comparison between optimal covariance matching and other covariance estimation methods, and showing that the proposed approach has the potential to scale to large image generation tasks. |
| Researcher Affiliation | Academia | 1Imperial College London, 2University College London, 3University of Manchester, 4Max Planck Institute for Intelligent Systems, Tübingen, 5University of Tübingen, 6IMPRS-IS. |
| Pseudocode | Yes | Algorithm 1: Sampling procedure from x_t to x_{t-1} in OCM-DDPM. Algorithm 2: Sampling procedure from x_t to x_{t-1} in OCM-DDIM. |
| Open Source Code | Yes | Code is available at: https://github.com/J-zin/OCM_DPM. |
| Open Datasets | Yes | In this experiment, we mainly focus on four datasets: CIFAR10 (Krizhevsky et al., 2009) with the linear schedule (LS) of βt (Ho et al., 2020) and the cosine schedule (CS) of βt (Nichol & Dhariwal, 2021); CelebA (Liu et al., 2015); LSUN Bedroom (Yu et al., 2015). |
| Dataset Splits | Yes | The FID score is computed on 50K generated samples. Following Nichol & Dhariwal (2021); Bao et al. (2022a); Peebles & Xie (2023), the reference distribution statistics for FID are calculated using the full training set for CIFAR10 and ImageNet, and 50K training samples for CelebA and LSUN Bedroom. |
| Hardware Specification | Yes | We train our models using one A100-80G GPU for CIFAR10, CelebA 64x64, and ImageNet 64x64; four A100-80G GPUs for LSUN Bedroom; and eight A100-80G GPUs for ImageNet 256x256. |
| Software Dependencies | No | The paper mentions optimizers like AdamW (Loshchilov, 2017) and Adam (Kingma & Ba, 2014), but does not specify software dependencies with version numbers (e.g., PyTorch version, Python version, CUDA version). |
| Experiment Setup | Yes | We use the AdamW optimizer (Loshchilov, 2017) with a learning rate of 0.0001 and train for 500K iterations across all datasets. The batch sizes are set to 64 for LSUN Bedroom, 128 for CIFAR10, CelebA 64x64, and ImageNet 64x64, and 256 for ImageNet 256x256. |
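The reported experiment setup can be collected into a single configuration table. The sketch below is a minimal, framework-free transcription of the hyperparameters quoted in the row above (AdamW, learning rate 1e-4, 500K iterations, per-dataset batch sizes); the dictionary keys and the `batch_size` helper are illustrative names, not identifiers from the paper or its codebase.

```python
# Hedged transcription of the training configuration reported in the
# "Experiment Setup" row; dataset keys and helper name are hypothetical.
TRAIN_CONFIG = {
    "optimizer": "AdamW",       # Loshchilov (2017)
    "learning_rate": 1e-4,
    "iterations": 500_000,      # same budget across all datasets
    "batch_size": {
        "lsun_bedroom": 64,
        "cifar10": 128,
        "celeba_64x64": 128,
        "imagenet_64x64": 128,
        "imagenet_256x256": 256,
    },
}


def batch_size(dataset: str) -> int:
    """Return the reported per-dataset batch size."""
    return TRAIN_CONFIG["batch_size"][dataset]
```

Keeping the values in one dictionary makes it easy to cross-check a reimplementation against the paper's reported settings.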