Denoising Lévy Probabilistic Models

Authors: Dario Shariatian, Umut Simsekli, Alain Oliviero Durmus

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that DLPM provides (i) better coverage of the tails of the data distribution, (ii) improved generation of unbalanced datasets, and (iii) faster computation times, requiring fewer backward steps.
Researcher Affiliation | Academia | ¹INRIA, Department of Computer Science, PSL Research University, Paris, France; ²École Polytechnique, CMAP, IP Paris, Palaiseau, France
Pseudocode | Yes | Appendix A ("Algorithms for DLPM and DLIM") explicitly provides the algorithms needed to train and sample from the DLPM and DLIM generative methods: Algorithm 2 (DLPM training, simplified loss), Algorithm 3 (stochastic sampling, DLPM), and Algorithm 4 (deterministic sampling, DLIM).
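The DLPM algorithms hinge on replacing Gaussian noise with heavy-tailed α-stable noise. A minimal sketch of drawing symmetric α-stable samples via the standard Chambers–Mallows–Stuck method (an illustrative helper under our own naming, not the authors' implementation; the choice α = 1.8 is likewise illustrative):

```python
import math
import random

def sample_symmetric_alpha_stable(alpha, rng=random):
    """Draw one symmetric alpha-stable sample (skew beta = 0, unit scale)
    via the Chambers-Mallows-Stuck method."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                    # unit-mean exponential
    if abs(alpha - 1.0) < 1e-12:
        return math.tan(u)                      # alpha = 1 is the Cauchy case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

# Example: heavy-tailed noise for a forward-process step (alpha = 1.8)
noise = [sample_symmetric_alpha_stable(1.8) for _ in range(10000)]
```

For α < 2 the samples have infinite variance, which is what gives the model its improved tail coverage; α = 2 recovers Gaussian noise up to scale.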
Open Source Code | No | The paper mentions using a third-party implementation: "We use a U-Net following the implementation of Nichol & Dhariwal (2021) available in https://github.com/openai/improved-diffusion." However, there is no explicit statement or link for the authors' own source code for the methodology described in this paper.
Open Datasets | Yes | In our experiments on images, we make use of the CIFAR10_LT (long-tail) dataset, introduced in Yoon et al. (2023) as an unbalanced modification of the CIFAR10 dataset. We work on the MNIST and CIFAR10_LT datasets.
Dataset Splits | Yes | For our 2D datasets, we use 32000 datapoints for training, a batch size of 1024, and 25000 points for evaluation. CIFAR10_LT consists of CIFAR10 images where artificial class imbalance has been introduced. The specific class counts we use are [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50].
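The listed class counts match the standard long-tailed construction, where class k keeps n_max · ρ^(k/(K−1)) samples with imbalance ratio ρ = 50/5000 = 0.01 (an observation about the numbers themselves, not a claim about the authors' exact script):

```python
# Long-tailed class counts: exponential decay from 5000 down to 50
# across 10 classes, imbalance ratio 50/5000 = 0.01.
n_max, n_min, num_classes = 5000, 50, 10
ratio = n_min / n_max
counts = [int(n_max * ratio ** (k / (num_classes - 1)))
          for k in range(num_classes)]
print(counts)  # -> [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]
```

Reproducing the counts this way makes it easy to rebuild the unbalanced subset from a full CIFAR10 copy by truncating each class to `counts[k]` images.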
Hardware Specification | Yes | All training and experiments are conducted on four NVIDIA RTX 8000 GPUs and four NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions PyTorch for experiments and Adam as the optimizer, but no specific version numbers for any software dependencies are provided.
Experiment Setup | Yes | For our 2D datasets, we use 32000 datapoints for training, a batch size of 1024, and 25000 points for evaluation. We train each model for 10000 steps. ... The optimizer is Adam (Kingma & Ba, 2017) with learning rate 5e-3. We train MNIST for 120000 steps with batch size 256 and a time horizon T = 1000, and CIFAR10_LT for 400000 steps with batch size 100 and a time horizon T = 4000. The optimizer is Adam with learning rate 1e-3 for MNIST and 2e-4 for CIFAR10_LT. We use the StepLR scheduler, which scales the learning rate by γ = 0.99 every N = 1000 steps for CIFAR10_LT and N = 400 for MNIST. We use an exponential moving average with a rate of 0.99 for MNIST and 0.9999 for CIFAR10_LT.
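The StepLR schedule and weight EMA quoted above reduce to two simple update rules; a pure-Python sketch of the CIFAR10_LT settings (illustrative, not the authors' training code):

```python
# StepLR: multiply the learning rate by gamma once every step_size steps.
def lr_at_step(step, base_lr=2e-4, gamma=0.99, step_size=1000):
    return base_lr * gamma ** (step // step_size)

# EMA of model weights, rate 0.9999 for CIFAR10_LT (weights shown as floats).
def ema_update(ema_params, params, decay=0.9999):
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

print(lr_at_step(0))       # -> 0.0002
print(lr_at_step(400000))  # 2e-4 * 0.99**400, roughly 3.6e-6
```

Over the full 400000 CIFAR10_LT steps the schedule applies 400 decay factors, so the final learning rate is about 1.8% of the initial one.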