One Step Diffusion via Shortcut Models
Authors: Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations show that shortcut models satisfy a number of useful desiderata. On the commonly used CelebA-HQ and ImageNet-256 benchmarks, a single shortcut model can handle many-step, few-step, and one-step generation. Accuracy is not sacrificed; in fact, many-step generation quality matches that of baseline diffusion models. At the same time, shortcut models can consistently match or outperform two-stage distillation methods in the few- and one-step settings. |
| Researcher Affiliation | Academia | Kevin Frans (UC Berkeley), Danijar Hafner (UC Berkeley), Sergey Levine (UC Berkeley), Pieter Abbeel (UC Berkeley) |
| Pseudocode | Yes | Algorithm 1: Shortcut Model Training; Algorithm 2: Sampling |
| Open Source Code | Yes | We release model checkpoints and the full training code for replicating our experimental results: https://github.com/kvfrans/shortcut-models |
| Open Datasets | Yes | On the commonly used CelebA-HQ and ImageNet-256 benchmarks, a single shortcut model can handle many-step, few-step, and one-step generation. |
| Dataset Splits | Yes | We report the FID-50k metric, as is standard in prior work. Following standard practice, FID is calculated with respect to statistics over the entire dataset; no compression is applied to the generated images, and images are resized to 299x299 with bilinear upscaling and clipped to (-1, 1). |
| Hardware Specification | Yes | All experiments are run on TPUv3 nodes, and methods are implemented in JAX. |
| Software Dependencies | No | The paper mentions that methods are implemented in JAX, but does not provide a specific version number for JAX or any other software libraries used. |
| Experiment Setup | Yes | Table 3: Hyperparameters used during training. Model architecture follows that described in Peebles & Xie (2023), specifically DiT-B unless mentioned otherwise. Batch Size: 64 (CelebA-HQ), 256 (ImageNet); Training Steps: 400,000 (CelebA-HQ), 800,000 (ImageNet); Latent Encoder: sd-vae-ft-mse; Latent Downsampling: 8 (256x256x3 to 32x32x4); Ratio of Empirical to Bootstrap Targets: 0.75; Number of Total Denoising Steps (M): 128; Classifier-Free Guidance: 0 (CelebA-HQ), 1.5 (ImageNet); EMA Parameters Used for Bootstrap Targets: Yes; EMA Parameters Used for Evaluation: Yes; EMA Ratio: 0.999; Optimizer: AdamW; Learning Rate: 0.0001; Weight Decay: 0.1; Hidden Size: 768; Patch Size: 2; Number of Layers: 12; Attention Heads: 12; MLP Hidden Size Ratio: 4 |
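The hyperparameters quoted in the Experiment Setup row can be collected into a single training config, which makes the reported setup easier to verify at a glance. The sketch below is illustrative, not the authors' released code (their implementation is in JAX; see the linked repository); all key names are assumptions. It also derives one property implied by the table: with M = 128 total denoising steps, a shortcut model's admissible step budgets are the power-of-two divisors of M, spanning 128-step down to one-step generation.

```python
# Hypothetical config sketch assembling the hyperparameters from Table 3.
# Key names are illustrative assumptions, not the repository's actual schema.
TOTAL_DENOISING_STEPS = 128  # M in the table

config = {
    "batch_size": {"celeba_hq": 64, "imagenet": 256},
    "training_steps": {"celeba_hq": 400_000, "imagenet": 800_000},
    "latent_encoder": "sd-vae-ft-mse",
    "latent_downsampling": 8,          # 256x256x3 images -> 32x32x4 latents
    "empirical_target_ratio": 0.75,    # remainder of batch uses bootstrap targets
    "cfg_scale": {"celeba_hq": 0.0, "imagenet": 1.5},
    "ema_ratio": 0.999,                # EMA params used for targets and eval
    "optimizer": {"name": "adamw", "lr": 1e-4, "weight_decay": 0.1},
    # DiT-B architecture (Peebles & Xie, 2023)
    "hidden_size": 768,
    "patch_size": 2,
    "num_layers": 12,
    "num_heads": 12,
    "mlp_ratio": 4,
}

# Sampling budgets a single trained model supports: power-of-two
# divisors of M, from full 128-step generation down to one step.
step_budgets = [TOTAL_DENOISING_STEPS // (2 ** k)
                for k in range(TOTAL_DENOISING_STEPS.bit_length())]
print(step_budgets)  # [128, 64, 32, 16, 8, 4, 2, 1]
```

Keeping the per-dataset values (batch size, training steps, CFG scale) in small sub-dicts mirrors how the paper reports them side by side and avoids duplicating the shared architecture settings.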