A Note on the Convergence of Denoising Diffusion Probabilistic Models

Authors: Sokhna Diarra Mbacke, Omar Rivasplata

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The goal of these experiments is to assess the numerical value of the bound of Theorem 3.1 on a synthetic dataset. The data-generating distribution is chosen to be the uniform distribution on the square of side 2, centered at the origin. Figure 2 shows samples from this target distribution. The backward process uses a shared network with fully connected layers and 128 hidden units each. The model is trained on 50,000 samples from the original distribution, and the bound is computed with n = 5,000 independent samples. Samples from the trained model are shown in Figure 3. We computed the bound for different values of λ.
Researcher Affiliation | Academia | Sokhna Diarra Mbacke (Université Laval); Omar Rivasplata (University College London)
Pseudocode | No | The paper presents mathematical derivations and theoretical results; it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code, nor a link to a code repository for the methodology described.
Open Datasets | No | The data-generating distribution is chosen to be the uniform distribution on the square of side 2, centered at the origin; Figure 2 shows samples from this target distribution. This synthetic dataset is generated for the purposes of the paper, but no access information (link, citation, or generation script) is provided to reproduce it exactly.
Dataset Splits | No | The model is trained on 50,000 samples from the original distribution, and the bound is computed with n = 5,000 independent samples. While sample sizes are given for training and for computing the bound, there is no explicit description of traditional training/validation/test splits for the model's evaluation.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU or CPU models) used to run the experiments.
Software Dependencies | No | The paper does not name specific software packages or their version numbers.
Experiment Setup | Yes | The backward process uses a shared network with fully connected layers and 128 hidden units each. The model is trained on 50,000 samples from the original distribution, and the bound is computed with n = 5,000 independent samples. The Lipschitz norms K_t^θ are estimated using the estimate given in Remark 3.2, and the expected norms in the last two terms of Theorem 3.1 are estimated using 10^6 independent samples from each distribution.
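The synthetic setup described in the rows above can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the sample sizes (50,000 training points, n = 5,000 for the bound, 10^6 for the Monte Carlo estimates) come from the paper, while the function name `sample_target` and the choice of NumPy are assumptions. The final lines show how the expected-norm terms of Theorem 3.1 could be estimated by Monte Carlo; estimating the Lipschitz norms and the full bound would require the trained model and is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_target(n):
    """Draw n points from the uniform distribution on the square of
    side 2 centered at the origin, i.e. uniform on [-1, 1]^2."""
    return rng.uniform(-1.0, 1.0, size=(n, 2))

# Sample sizes reported in the paper.
train_data = sample_target(50_000)  # training set
bound_data = sample_target(5_000)   # n = 5,000 independent samples for the bound

# Monte Carlo estimate of an expected norm E[||X||] with 10^6 independent
# samples, as done for the last two terms of Theorem 3.1 (one such estimate
# per distribution appearing in those terms).
mc_samples = sample_target(1_000_000)
expected_norm = np.linalg.norm(mc_samples, axis=1).mean()
```

With 10^6 samples the Monte Carlo estimate of E[||X||] for this target is accurate to roughly two decimal places (the exact value for the unit-half-width square is about 0.765).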