An analytic theory of creativity in convolutional diffusion models
Authors: Mason Kamb, Surya Ganguli
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We next test our theory on two CNN-based architectures, a standard UNet (Ronneberger et al., 2015) and a ResNet (He et al., 2016) trained on 4 datasets, MNIST, Fashion MNIST, CIFAR10, and CelebA (see App. C.1 for details of architectures and training). We restrict our attention to these simple datasets because our theory is for CNN-based diffusion models only, and more complex diffusion models with attention and latent spaces are required to model more complex datasets. [...] For ResNets, we find median r² values between theory and experiment of 0.94 on MNIST, 0.90 on Fashion MNIST, 0.90 on CIFAR10, and 0.96 on CelebA 32x32. |
| Researcher Affiliation | Academia | Department of Applied Physics, Stanford University, California, United States. Correspondence to: Mason Kamb <EMAIL>, Surya Ganguli <EMAIL>. |
| Pseudocode | No | The paper includes mathematical derivations and descriptions of methods, but it does not present any explicitly labeled pseudocode or algorithm blocks with structured steps formatted like code. |
| Open Source Code | Yes | Code for the following experiments hosted at https://github.com/Kambm/convolutional-diffusion |
| Open Datasets | Yes | trained on 4 datasets, MNIST, Fashion MNIST, CIFAR10, and CelebA (see App. C.1 for details of architectures and training). |
| Dataset Splits | No | The paper mentions training on the MNIST, Fashion MNIST, CIFAR10, and CelebA datasets, which typically come with standard splits. However, it does not provide split percentages or sample counts for train/test/validation, nor does it state that the standard splits were used for its own experiments. It mentions evaluating on '100 distinct random noise inputs'. |
| Hardware Specification | No | The paper describes the CNN-based architectures (UNet, ResNet) and general training parameters, but it does not specify any particular hardware used for running the experiments, such as GPU models, CPU types, or cloud computing environments with specifications. |
| Software Dependencies | No | The paper mentions the Adam optimizer and a cosine noise schedule, and the GitHub link suggests a deep-learning framework such as PyTorch, but it does not explicitly list any software dependencies with specific version numbers (e.g., Python, PyTorch, or CUDA versions). |
| Experiment Setup | Yes | For all experiments, we train each model for 300 epochs with Adam, using an initial learning rate of 1e-4, a batch size of 128, and an exponential learning rate schedule that applies a multiplicative factor of 0.999965 to the learning rate with each step (this approximately halves the learning rate over the course of 50 epochs with our batch size of 128). |
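The quoted schedule can be sanity-checked with a short calculation: at batch size 128 on a 60,000-image training set such as MNIST (roughly 469 steps per epoch, an assumption not stated in the quote), applying the 0.999965 factor at every step brings the learning rate to a little under half its initial value after 50 epochs. A minimal Python sketch of that arithmetic:

```python
import math

# Exponential LR schedule quoted in the setup: multiply by gamma each step.
initial_lr = 1e-4
gamma = 0.999965

# Assumption: ~60,000 training images (MNIST) at batch size 128.
steps_per_epoch = math.ceil(60_000 / 128)   # ~469 steps
steps = 50 * steps_per_epoch

decayed_lr = initial_lr * gamma ** steps
print(f"LR after 50 epochs: {decayed_lr:.2e}")  # ~4.4e-05, roughly half of 1e-4
```

The decayed value lands near 0.44x the initial rate, consistent with the paper's "approximately halves" description.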
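The median r² figures quoted in the Research Type cell aggregate a per-input coefficient of determination over the '100 distinct random noise inputs' mentioned under Dataset Splits. A minimal sketch of that aggregation, using plain Python with synthetic placeholder data (the data and function names here are illustrative, not taken from the paper's code):

```python
import random
import statistics

def r_squared(y_true, y_pred):
    """Coefficient of determination between two flattened output vectors."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

random.seed(0)
scores = []
for _ in range(100):  # 100 distinct random noise inputs
    # Placeholder stand-ins for theoretical and experimental model outputs.
    theory = [random.gauss(0, 1) for _ in range(256)]
    experiment = [t + random.gauss(0, 0.1) for t in theory]
    scores.append(r_squared(theory, experiment))

median_r2 = statistics.median(scores)
print(f"median r^2 over 100 inputs: {median_r2:.3f}")
```

Taking the median rather than the mean makes the summary robust to the occasional noise seed where theory and experiment diverge.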