Closed-Form Diffusion Models
Authors: Christopher Scarvelis, Haitz Sáez de Ocáriz Borde, Justin Solomon
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using this estimator, our method achieves competitive sampling times while running on consumer-grade CPUs... In Section 6, we scale our method to high-dimensional tasks such as image generation... We benchmark our model's sample quality, training time, and sampling time against a denoising diffusion probabilistic model (DDPM)... We report sample quality and generation time metrics, including inception scores (Salimans et al., 2016) and kernel inception distances (KID) (Bińkowski et al., 2018) between generated samples and samples from the CelebA test partition in Table 2. |
| Researcher Affiliation | Academia | Christopher Scarvelis EMAIL MIT CSAIL Haitz Sáez de Ocáriz Borde EMAIL University of Oxford Justin Solomon EMAIL MIT CSAIL |
| Pseudocode | Yes | Algorithm 1 (Sampling) — Input: training set {x_i}_{i=1}^N, noise {u_{i,m}}, step size h = 1/S, initial sample z_0 ~ N(0, I). For n = 0, ..., S−1 do: t_n = n/S; z_{n+1} ← z_n + h·v_{σ,t_n}(z_n). End for. Return z_T. |
| Open Source Code | No | The paper does not explicitly state that the authors are releasing the source code for the methodology described in this paper, nor does it provide a direct link to a code repository. It mentions using Faiss for approximate nearest neighbor search algorithms, which is a third-party tool. |
| Open Datasets | Yes | We display images from a held-out test set along with DDPM samples and our model's samples in Figure 6. Both our model and the DDPM generate images that qualitatively resemble the test images, but as our model can only output barycenters of training samples (see Theorem 5.1), our samples exhibit softer details than the test and DDPM samples. Table 1 records sample quality metrics and training and generation times for our method and the DDPM baseline. 1Dataset available on Hugging Face: huggan/smithsonian_butterflies_subset ...for the CelebA dataset (Liu et al., 2015) |
| Dataset Splits | Yes | Our training data is drawn from the dataset huggan/smithsonian_butterflies_subset, which is publicly available on Hugging Face and contains 1000 images of butterflies. We extract RGB images from the image column of their dataset and reshape them to 128×128 before using them in our experiments. We construct an 80/20 train-test split and use the train partition to train the DDPM and to construct our σ-CFDM, and use the test partition to compute metrics. |
| Hardware Specification | Yes | Using this estimator, our method achieves competitive sampling times while running on consumer-grade CPUs... while running on a MacBook M1 Pro CPU with 16 GB of RAM... comparable sample quality to a DDPM that has been trained for 5.34 hours on a single V100 GPU |
| Software Dependencies | No | The paper mentions several software components, such as lucidrains' denoising-diffusion-pytorch library, Faiss, the torch-dct package, the AdamW optimizer, and torchmetrics, but does not provide specific version numbers for any of these. |
| Experiment Setup | Yes | We use 1000 time steps during training, and use DDIM sampling with 100 time steps during sample generation. We train the diffusion model with a batch size of 8 at a learning rate of 5×10⁻⁵ for 20,000 iterations... We set M = 2 and σ = 0.1 for this experiment, and compute the smoothed score exactly... We start sampling at T = 0.98 and use step size 0.01... We set the regularization parameter to 4... We train for 100 epochs at a learning rate of 10⁻⁴ using the AdamW optimizer (Loshchilov & Hutter, 2017)... We set the regularization strength at 10⁻³ and train for 100 epochs at a learning rate of 10⁻⁴ using the AdamW optimizer. |
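The sampling procedure quoted in the Pseudocode row is a forward Euler integration of a time-dependent velocity field. The sketch below illustrates that loop structure only; the `velocity` callable is a hypothetical stand-in for the paper's smoothed closed-form field v_{σ,t}, and the toy field used in the usage example is not from the paper.

```python
import numpy as np

def euler_sample(velocity, z0, S=100):
    """Euler loop mirroring Algorithm 1: h = 1/S, t_n = n/S,
    z_{n+1} = z_n + h * velocity(t_n, z_n)."""
    h = 1.0 / S
    z = np.array(z0, dtype=float)
    for n in range(S):
        t_n = n / S
        z = z + h * velocity(t_n, z)
    return z

# Toy velocity field (illustrative only): drift toward the origin,
# i.e. dz/dt = -z, whose exact time-1 solution is z0 * exp(-1).
rng = np.random.default_rng(0)
z0 = rng.standard_normal(2)
z1 = euler_sample(lambda t, z: -z, z0, S=1000)
```

With S = 1000 steps, `z1` closely approximates the analytic solution `z0 * exp(-1)`, which is a quick way to sanity-check the integrator before plugging in a learned or closed-form velocity field.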
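The Dataset Splits row describes an 80/20 train-test split of the 1000-image butterflies dataset. A minimal index-level sketch of such a split (the seed and helper name are assumptions, not from the paper):

```python
import numpy as np

def train_test_split_indices(n, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and carve off a test fraction,
    e.g. 800 train / 200 test for n = 1000 and test_frac = 0.2."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(n * test_frac)
    return perm[n_test:], perm[:n_test]

train_idx, test_idx = train_test_split_indices(1000)
```

The train indices would select images for training the DDPM and constructing the σ-CFDM; the test indices select the held-out images used for metrics.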