Boosting Latent Diffusion with Perceptual Objectives

Authors: Tariq Berrada, Pietro Astolfi, Melissa Hall, Marton Havasi, Yohann Benchetrit, Adriana Romero-Soriano, Karteek Alahari, Michal Drozdzal, Jakob Verbeek

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments with models trained on three datasets at 256 and 512 resolution show improved quantitative results (with boosts between 6% and 20% in FID) as well as improved qualitative results when using our perceptual loss.
Researcher Affiliation | Collaboration | 1 FAIR at Meta; 2 Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, France; 3 McGill University; 4 Mila, Quebec AI Institute; 5 Canada CIFAR AI Chair
Pseudocode | No | The paper describes methods and equations but does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about providing source code or a link to a code repository.
Open Datasets | Yes | We conduct an extensive evaluation on three datasets of different scales and distributions: ImageNet-1k (Deng et al., 2009), CC12M (Changpinyo et al., 2021), and S320M: a large internal dataset of 320M stock images.
Dataset Splits | Yes | We evaluate metrics with respect to ImageNet-1k and, for models trained on CC12M and S320M, the validation set of CC12M.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions various models and algorithms used (e.g., DDPM-ϵ, DDIM, Florence-2) but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | Unless specified otherwise, we follow the DDPM-ϵ training paradigm (Ho et al., 2020), using the DDIM (Song et al., 2021) algorithm with 50 steps for sampling and a classifier-free guidance scale of λ = 2.0 (Ho & Salimans, 2021). Following Podell et al. (2024), we use a quadratic scheduler with βstart = 0.00085 and βend = 0.012. ... we pre-train all models at 256 resolution on the dataset of interest for 600k iterations. We then enter a second phase of training, in which we optionally apply our perceptual loss, which lasts for 200k iterations for 256 resolution models and for 120k iterations for models at 512 resolution. ... we use a guidance scale of 1.5 for resolutions of 256 and 2.0 for resolutions of 512, which we also found to be optimal for our baseline models trained without LPL.