Closed-Form Diffusion Models
Authors: Christopher Scarvelis, Haitz Sáez de Ocáriz Borde, Justin Solomon
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using this estimator, our method achieves competitive sampling times while running on consumer-grade CPUs... In Section 6, we scale our method to high-dimensional tasks such as image generation... We benchmark our model's sample quality, training time, and sampling time against a denoising diffusion probabilistic model (DDPM)... We report sample quality and generation time metrics, including inception scores (Salimans et al., 2016) and kernel inception distances (KID) (Bińkowski et al., 2018) between generated samples and samples from the CelebA test partition in Table 2. |
| Researcher Affiliation | Academia | Christopher Scarvelis EMAIL MIT CSAIL Haitz Sáez de Ocáriz Borde EMAIL University of Oxford Justin Solomon EMAIL MIT CSAIL |
| Pseudocode | Yes | Algorithm 1 (Sampling) — Input: training set {x_i}_{i=1}^N, noise {u_{i,m}}, step size h = 1/S, initial sample z_0 ~ N(0, I). For n = 0, ..., S−1 do: t_n = n/S; z_{n+1} ← z_n + h·v_{σ,t_n}(z_n). End for. Return z_T. |
| Open Source Code | No | The paper does not explicitly state that the authors are releasing the source code for the methodology described in this paper, nor does it provide a direct link to a code repository. It mentions using Faiss for approximate nearest neighbor search algorithms, which is a third-party tool. |
| Open Datasets | Yes | We display images from a held-out test set along with DDPM samples and our model's samples in Figure 6. Both our model and the DDPM generate images that qualitatively resemble the test images, but as our model can only output barycenters of training samples (see Theorem 5.1), our samples exhibit softer details than the test and DDPM samples. Table 1 records sample quality metrics and training and generation times for our method and the DDPM baseline. 1Dataset available on Hugging Face: huggan/smithsonian_butterflies_subset ...for the CelebA dataset (Liu et al., 2015) |
| Dataset Splits | Yes | Our training data is drawn from the dataset huggan/smithsonian_butterflies_subset, which is publicly available on Hugging Face and contains 1000 images of butterflies. We extract RGB images from the image column of their dataset and reshape them to 128×128 before using them in our experiments. We construct an 80/20 train-test split and use the train partition to train the DDPM and to construct our σ-CFDM, and use the test partition to compute metrics. |
| Hardware Specification | Yes | Using this estimator, our method achieves competitive sampling times while running on consumer-grade CPUs... while running on a MacBook M1 Pro CPU with 16 GB of RAM... comparable sample quality to a DDPM that has been trained for 5.34 hours on a single V100 GPU |
| Software Dependencies | No | The paper mentions several software components, such as lucidrains' denoising-diffusion-pytorch library, Faiss, the torch-dct package, the AdamW optimizer, and torchmetrics, but does not provide specific version numbers for any of these. |
| Experiment Setup | Yes | We use 1000 time steps during training, and use DDIM sampling with 100 time steps during sample generation. We train the diffusion model with a batch size of 8 at a learning rate of 5×10⁻⁵ for 20,000 iterations... We set M = 2 and σ = 0.1 for this experiment, and compute the smoothed score exactly... We start sampling at T = 0.98 and use step size 0.01... We set the regularization parameter to 4... We train for 100 epochs at a learning rate of 10⁻⁴ using the AdamW optimizer (Loshchilov & Hutter, 2017)... We set the regularization strength at 10⁻³ and train for 100 epochs at a learning rate of 10⁻⁴ using the AdamW optimizer. |
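The sampling procedure quoted in the Pseudocode row is a forward Euler integration of a time-dependent velocity field. The sketch below illustrates that loop structure only; the `velocity` callable is a hypothetical stand-in for the paper's smoothed closed-form field v_{σ,t}, and the toy field used in the usage example is not from the paper.

```python
import numpy as np

def euler_sample(velocity, z0, S=100):
    """Euler loop mirroring Algorithm 1: h = 1/S, t_n = n/S,
    z_{n+1} = z_n + h * velocity(t_n, z_n)."""
    h = 1.0 / S
    z = np.array(z0, dtype=float)
    for n in range(S):
        t_n = n / S
        z = z + h * velocity(t_n, z)
    return z

# Toy velocity field (illustrative only): drift toward the origin,
# i.e. dz/dt = -z, whose exact time-1 solution is z0 * exp(-1).
rng = np.random.default_rng(0)
z0 = rng.standard_normal(2)
z1 = euler_sample(lambda t, z: -z, z0, S=1000)
```

With S = 1000 steps, `z1` closely approximates the analytic solution `z0 * exp(-1)`, which is a quick way to sanity-check the integrator before plugging in a learned or closed-form velocity field.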
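The Dataset Splits row describes an 80/20 train-test split of the 1000-image butterflies dataset. A minimal index-level sketch of such a split (the seed and helper name are assumptions, not from the paper):

```python
import numpy as np

def train_test_split_indices(n, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and carve off a test fraction,
    e.g. 800 train / 200 test for n = 1000 and test_frac = 0.2."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(n * test_frac)
    return perm[n_test:], perm[:n_test]

train_idx, test_idx = train_test_split_indices(1000)
```

The train indices would select images for training the DDPM and constructing the σ-CFDM; the test indices select the held-out images used for metrics.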