Generalization in diffusion models arises from geometry-adaptive harmonic representations
Authors: Zahra Kadkhodaie, Florentin Guth, Eero P Simoncelli, Stéphane Mallat
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We trained denoisers on subsets of the (downsampled) CelebA dataset (Liu et al., 2015) of size N = 10^0, 10^1, 10^2, 10^3, 10^4, 10^5. We used a UNet architecture (Ronneberger et al., 2015)... Results are shown in Figure 1. When N = 1, the denoiser essentially memorizes the single training image, leading to a high test error. Increasing N substantially increases the performance on the test set while worsening performance on the training set, as the network transitions from memorization to generalization. At N = 10^5, empirical test and train error are matched for all noise levels. |
| Researcher Affiliation | Academia | Zahra Kadkhodaie, Ctr. for Data Science, New York University, EMAIL; Florentin Guth, Ctr. for Data Science, New York University, and Flatiron Institute, Simons Foundation, EMAIL; Eero P. Simoncelli, New York University, and Flatiron Institute, Simons Foundation, EMAIL; Stéphane Mallat, Collège de France, and Flatiron Institute, Simons Foundation, EMAIL |
| Pseudocode | Yes | Algorithm 1 Sampling via ascent of the log-likelihood gradient from a denoiser residual |
| Open Source Code | Yes | Source code: https://github.com/LabForComputationalVision/memorization_generalization_in_diffusion_models |
| Open Datasets | Yes | We trained denoisers on subsets of the (downsampled) CelebA dataset (Liu et al., 2015) of size N = 10^0, 10^1, 10^2, 10^3, 10^4, 10^5. ... For experiments shown in Figures 9 and 10, we use images drawn from the LSUN bedroom dataset (Yu et al., 2015) downsampled to 80 × 80 resolution. ... For experiments shown in Figure 11 we use the CelebA-HQ dataset (Karras et al., 2018) downsampled to 40 × 40 resolution. |
| Dataset Splits | No | The paper discusses training and test data performance (e.g., "At N = 10^5, empirical test and train error are matched"), but it does not explicitly mention a separate 'validation' dataset or its split details. |
| Hardware Specification | No | The paper acknowledges computing resources from the Flatiron Institute and NYU but does not specify any particular hardware components like GPU models, CPU types, or memory used for experiments. |
| Software Dependencies | No | The paper mentions using specific network architectures like UNet and BF-CNN, but it does not list any specific software libraries, frameworks, or operating systems with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Training is carried out on batches of size 512, for 1000 epochs. ... We chose h = 0.01, β = 0.1, σ0 = 1, and σ = 0.05. |
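The pseudocode row refers to the paper's Algorithm 1 (sampling via ascent of the log-likelihood gradient from a denoiser residual), and the setup row reports the constants h = 0.01, β = 0.1, σ0 = 1, σ = 0.05. A minimal sketch of that style of sampler is below, assuming the standard formulation in which the denoiser residual f(y) − y is proportional to the score σ²∇ log p(y). The `toy_denoiser` is a hypothetical stand-in, not the paper's trained UNet, and the function name and step-size/noise-injection details are illustrative rather than a verbatim reproduction of Algorithm 1.

```python
import numpy as np

def sample_from_denoiser(denoiser, shape, h=0.01, beta=0.1,
                         sigma_0=1.0, sigma_end=0.05, seed=0):
    """Coarse-to-fine sampling by ascending the log-likelihood gradient,
    approximated (up to sigma^2) by the denoiser residual f(y) - y."""
    rng = np.random.default_rng(seed)
    y = rng.normal(0.0, sigma_0, size=shape)       # initialize with pure noise
    sigma_t = sigma_0
    while sigma_t > sigma_end:
        d = denoiser(y) - y                        # residual ~ sigma^2 * grad log p(y)
        sigma_t = float(np.sqrt(np.mean(d ** 2)))  # effective remaining-noise estimate
        # injected-noise amplitude; beta in [0, 1] controls stochasticity
        gamma = np.sqrt(max((1 - beta * h) ** 2 - (1 - h) ** 2, 0.0)) * sigma_t
        y = y + h * d + gamma * rng.normal(size=shape)
    return y

# Hypothetical toy denoiser: shrinks its input toward zero, mimicking a
# denoiser for a prior concentrated near the origin.
toy_denoiser = lambda y: 0.5 * y
x = sample_from_denoiser(toy_denoiser, shape=(8, 8))
```

With the reported constants, each step takes a small gradient-ascent move of size h along the residual and re-injects a little noise; the loop terminates once the estimated noise level σ_t falls below the target σ = 0.05.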