S4S: Solving for a Fast Diffusion Model Solver

Authors: Eric Frankel, Sitan Chen, Jerry Li, Pang Wei Koh, Lillian J. Ratliff, Sewoong Oh

ICML 2025

Reproducibility Variable — Result — LLM Response

- Research Type — Experimental: "We evaluate S4S on six different pre-trained DMs, including pixel-space and latent-space DMs for both conditional and unconditional sampling. In all settings, S4S uniformly improves the sample quality relative to traditional ODE solvers. Moreover, our method is lightweight, data-free, and can be plugged in black-box on top of any discretization schedule or architecture to improve performance. Building on top of this, we also propose S4S-Alt, which optimizes both the solver and the discretization schedule. ... In our experiments, we demonstrate that in every setting we tried, our method universally improves the FID achieved compared to previous state-of-the-art solvers."

- Researcher Affiliation — Collaboration: "¹University of Washington ²Harvard University ³Allen Institute for AI. Correspondence to: Eric Frankel <EMAIL>."

- Pseudocode — Yes: "Algorithm 1 S4S ... Algorithm 2 S4S-Alt ... Algorithm 3 Joint Optimization Algorithm"

- Open Source Code — No: The paper neither states that its source code will be released nor links to a code repository for the described methodology.

- Open Datasets — Yes: "We use pixel-space diffusion models for CIFAR-10 (32x32), FFHQ (64x64), and AFHQv2 (64x64), each having an EDM-style backbone (Karras et al., 2022). We also use latent diffusion models, including LSUN-Bedroom (256x256) and class-conditional ImageNet (256x256) with a guidance scale of 2.0."

- Dataset Splits — No: The paper mentions using 30k samples generated from MS-COCO captions to evaluate Stable Diffusion and 50k samples for all other datasets, but it does not specify the explicit training/validation/test splits needed for reproducibility. Although it uses well-known datasets, it does not state how they were partitioned.

- Hardware Specification — Yes: "Our method is lightweight, with minimal computational expense which is comparable to (and often less than) alternative methods for optimizing aspects of the solver, often taking less than an hour on a single A100. ... Table 12: S4S-Alt 7 2.52 A100 < 1 hour"

- Software Dependencies — No: The paper mentions optimizers such as SGD and Adam and the LPIPS distance metric, but it does not give specific software names with version numbers for its implementation (e.g., Python, PyTorch, or CUDA versions).

- Experiment Setup — Yes: "For ease of notation, we first ground our explanation in the version of S4S that only learns coefficients before discussing details specific to our S4S-Alt. We direct explicit queries about hyperparameters, etc. to Appendix G.2. ... Most crucial is our choice of r when optimizing our relaxed objective in both S4S and S4S-Alt. Let m denote the total number of parameters learned in the student solver. Then in both S4S and S4S-Alt, we set r ~ 1/m^(5/2). ... In practice, for CIFAR-10, FFHQ, and AFHQv2, we use 700 samples for learning coefficients in S4S with a batch size of 20; when learning coefficients and time steps in S4S-Alt, we generally use 1400 samples as training data with a batch size of 40. ... In both settings, we run S4S for 10 epochs, and S4S-Alt for K=8 alternating steps."
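The quoted setup details can be collected into a short sketch. This is not the authors' code: the function name `relaxation_radius` is invented for illustration, the proportionality constant in r ~ 1/m^(5/2) is assumed to be 1.0 (the paper states only the scaling), and the config dictionaries merely restate the sample counts, batch sizes, epoch count, and K reported in the quote above.

```python
def relaxation_radius(m: int) -> float:
    """Relaxation hyperparameter r for the S4S / S4S-Alt objective.

    The paper sets r ~ 1/m^(5/2), where m is the total number of
    parameters learned in the student solver. The proportionality
    constant is an assumption here (taken as 1.0).
    """
    return m ** -2.5

# Reported hyperparameters for CIFAR-10 / FFHQ / AFHQv2 (restated from the paper's text):
S4S_CONFIG = {
    "num_samples": 700,      # samples used to learn solver coefficients
    "batch_size": 20,
    "epochs": 10,
}
S4S_ALT_CONFIG = {
    "num_samples": 1400,     # learning coefficients and time steps jointly
    "batch_size": 40,
    "alternating_steps": 8,  # K = 8 alternating optimization steps
}
```

For example, a student solver with m = 100 learned parameters would get r = 100^(-5/2) = 10^(-5) under the assumed unit constant.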