Soft Diffusion: Score Matching with General Corruptions

Authors: Giannis Daras, Mauricio Delbracio, Hossein Talebi, Alex Dimakis, Peyman Milanfar

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show experimentally that our framework works for general linear corruption processes, such as Gaussian blur and masking. Our method outperforms all linear diffusion models on CelebA-64, achieving FID score 1.85. We also show computational benefits compared to vanilla denoising diffusion.
Researcher Affiliation | Collaboration | Giannis Daras (EMAIL, UT Austin); Mauricio Delbracio (EMAIL, Google Research); Hossein Talebi (EMAIL, Google Research); Alexandros G. Dimakis (EMAIL, UT Austin); Peyman Milanfar (EMAIL, Google Research)
Pseudocode | Yes | Algorithm 1 (Naive Sampler); Algorithm 2 (Momentum Sampler)
Open Source Code | No | The paper does not explicitly state a code release, link to a code repository for the described methodology, or indicate that code is included in supplementary materials. It does provide anonymous URLs for schedules of blur, masking, and noise parameters, but these are data files, not the full source code for the model and sampling algorithms.
Open Datasets | Yes | We evaluate our method in CelebA-64 and CIFAR-10.
Dataset Splits | No | The paper states: "We train our networks on CelebA-64 and CIFAR-10... We use 50000 samples to evaluate the FID, as it is typically done in prior work." While it mentions training and evaluation, it does not specify exact split percentages or sample counts for training, validation, and test sets for either dataset, nor does it cite a standard split being used.
Hardware Specification | Yes | We train our models on 16 v2-TPUs.
Software Dependencies | No | The paper mentions using the "Adam optimizer" and states its learning rate, beta values, and epsilon. However, it does not name the version of the deep learning framework used (e.g., TensorFlow, PyTorch), nor does it give versions for any other libraries.
Experiment Setup | Yes | Hyperparameters. For our trainings, we use the Adam optimizer with learning rate 2e-4, β1 = 0.9, β2 = 0.999, ϵ = 1e-8. We additionally use gradient clipping for gradient norms bigger than 1. For the learning rate scheduling, we use 5000 steps of linear warmup. We use batch size 128 and we train for 1-2M iterations (based on observed FID performance).
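The general linear corruptions named in the Research Type row (Gaussian blur, masking) can each be written as x_t = C_t x_0 + s_t ε for a linear operator C_t. A minimal NumPy sketch of a masking corruption, where C_t is a diagonal operator that zeroes out a fraction t of the pixels; the schedule and noise level here are illustrative assumptions, not the paper's released schedules:

```python
import numpy as np

def mask_corrupt(x0, t, rng, sigma=0.1):
    """Masking corruption x_t = C_t x_0 + sigma * noise.

    C_t acts as a diagonal linear operator that zeroes a fraction t
    of the pixels (illustrative schedule, not the paper's).
    """
    mask = (rng.random(x0.shape) > t).astype(x0.dtype)  # diagonal of C_t
    return mask * x0 + sigma * rng.standard_normal(x0.shape)

rng = np.random.default_rng(0)
x0 = rng.random((64, 64))          # toy "image"
xt = mask_corrupt(x0, t=0.5, rng=rng)
```

A Gaussian blur corruption would swap the diagonal mask for a convolution, which is equally linear in x_0.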
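The paper's Algorithm 1 (Naive Sampler) is not reproduced in this summary; the sketch below shows only the general shape such a sampler could take for a linear corruption x_t = C_t x_0 + s_t ε with a learned restoration model r(x_t, t) estimating x_0. The loop structure, schedules, and placeholder restoration model are all assumptions, not the paper's exact algorithm:

```python
import numpy as np

def naive_sample(r, C, s, x_T, T, rng):
    """Sketch of a naive reverse sampler for a linear corruption process.

    Each step estimates x_0 with the restoration model, then re-corrupts
    the estimate to the previous corruption level t-1 (assumed structure).
    """
    x = x_T
    for t in range(T, 0, -1):
        x0_hat = r(x, t)                          # learned restoration step
        noise = rng.standard_normal(x.shape)
        x = C(x0_hat, t - 1) + s(t - 1) * noise   # re-corrupt to level t-1
    return x

# Toy instantiation with placeholder operators (not the paper's schedules):
rng = np.random.default_rng(0)
C = lambda x, t: x * (1 - t / 10)   # toy linear corruption operator
s = lambda t: 0.1 * t / 10          # toy noise schedule, s(0) = 0
r = lambda x, t: x                  # placeholder restoration model
out = naive_sample(r, C, s, rng.standard_normal((8, 8)), T=10, rng=rng)
```

The Momentum Sampler of Algorithm 2 modifies this update; see the paper for its exact form.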
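The warmup in the Experiment Setup row can be sketched as a step-dependent learning rate. A plain-Python sketch using the stated base rate (2e-4) and warmup length (5000 steps); holding the rate constant after warmup is an assumption, since the paper's summary does not state a decay schedule:

```python
BASE_LR = 2e-4       # learning rate from the paper
WARMUP_STEPS = 5000  # linear warmup steps from the paper

def learning_rate(step):
    """Linear warmup to BASE_LR over WARMUP_STEPS, then constant.

    The post-warmup behaviour (constant rate) is an assumption.
    """
    return BASE_LR * min(1.0, step / WARMUP_STEPS)

learning_rate(2500)   # halfway through warmup: 1e-4
learning_rate(10000)  # after warmup: 2e-4
```

Combined with Adam (β1 = 0.9, β2 = 0.999, ϵ = 1e-8) and gradient-norm clipping at 1, this covers the stated optimizer configuration.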