Training-Free Diffusion Model Alignment with Sampling Demons
Authors: Po-Hung Yeh, Kuang-Huei Lee, Jun-Cheng Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide comprehensive theoretical and empirical evidence to support and validate our approach, including experiments that use non-differentiable sources of rewards such as Visual-Language Model (VLM) APIs and human judgements. ... In this section, we present both quantitative and qualitative evaluations of our methods. |
| Researcher Affiliation | Collaboration | Po-Hung Yeh1, Kuang-Huei Lee2, Jun-Cheng Chen1 1Academia Sinica, 2Google DeepMind |
| Pseudocode | Yes | C PSEUDOCODES: As an aid, we provide pseudocode for the design of Demon (Algorithm 2, Algorithm 3). Algorithm 1: A Numerical Step with Demon |
| Open Source Code | Yes | Implementation is available at https://github.com/aiiu-lab/DemonSampling. |
| Open Datasets | Yes | We use the LAION (2023) aesthetics scores (Aes) as the evaluation metric, and the scores are evaluated on a set of various prompts for generating animal images, which were from the full set of 45 common animals in ImageNet-1K (Deng et al., 2009), created by Black et al. (2023). ... For further comparison on PickScore (Kirstain et al., 2023), please refer to Appendix E.1. |
| Dataset Splits | No | The paper mentions evaluating on "a set of various prompts for generating animal images, which were from the full set of 45 common animals in ImageNet-1K (Deng et al., 2009), created by Black et al. (2023)." This describes the evaluation prompts and their source, but does not specify training/validation/test splits for the proposed method's experiments. |
| Hardware Specification | Yes | the Demon algorithm achieves an aesthetics score of 6.72 ± 0.26 on SD v1.4, requiring 5 minutes (i.e., K = 16, T = 16) on an NVIDIA RTX 3090 GPU. ... Due to memory limitations, DOODL was run on an NVIDIA RTX A6000, which is slightly slower (0.92x) than the RTX 3090 used for the other experiments. |
| Software Dependencies | No | The paper mentions using "Stable Diffusion v1.4/v1.5/XL v1.0" and refers to "fp16 SD v1.4/SDXL v1.0 for generation", as well as "Heun's method" and the "SDE formulation proposed in EDM Karras et al. (2022)". However, it does not provide specific version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | The classifier-free guidance parameter is set to 2 throughout this paper. Across all temporal steps t of image generation, we keep K and β constant. ... The hyperparameters for generation are set to β = 0.5, K = 16, η = 2, with τ adaptive for Tanh and τ = 10^-5 for Boltzmann. ... the batch size for solving the ODE/SDE is 8 for the Stable Diffusion v1.4, v1.5, and SDXL models. However, due to memory limitations on the RTX 3090, the batch size for evaluating the VAE in SDXL is restricted to 1. |
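The reward-weighting setup quoted above (K = 16 candidates per step, a Boltzmann weighting with τ = 10^-5, or a Tanh weighting with adaptive τ) can be sketched as follows. This is a minimal illustration based only on the hyperparameters reported in the table, not the authors' implementation; the function name, the exact Tanh normalization, and the use of NumPy are assumptions.

```python
import numpy as np

def demon_weights(rewards, weighting="boltzmann", tau=1e-5):
    """Turn K candidate rewards into normalized sampling weights.

    Hypothetical sketch: with the tiny Boltzmann temperature tau = 1e-5
    reported in the table, the softmax collapses toward the single
    best-rewarded candidate; the Tanh variant spreads weight more evenly.
    """
    r = np.asarray(rewards, dtype=np.float64)
    if weighting == "boltzmann":
        z = (r - r.max()) / tau        # subtract max for numerical stability
        w = np.exp(z)
    else:  # "tanh" variant with an (assumed) centering around the mean
        w = np.tanh((r - r.mean()) / tau) + 1.0
    return w / w.sum()                 # weights sum to 1
```

With K = 16 candidate rewards and τ = 10^-5, the Boltzmann weights are effectively one-hot on the highest-reward candidate, which matches the intuition of steering each denoising step toward the best-scoring noise direction.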