Training-Free Diffusion Model Alignment with Sampling Demons
Authors: Po-Hung Yeh, Kuang-Huei Lee, Jun-Cheng Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide comprehensive theoretical and empirical evidence to support and validate our approach, including experiments that use non-differentiable sources of rewards such as Visual-Language Model (VLM) APIs and human judgements. ... In this section, we present both quantitative and qualitative evaluations of our methods. |
| Researcher Affiliation | Collaboration | Po-Hung Yeh1, Kuang-Huei Lee2, Jun-Cheng Chen1 1Academia Sinica, 2Google DeepMind |
| Pseudocode | Yes | C PSEUDOCODES: As an aid, we provide pseudocode for the design of Demon (Algorithm 2, Algorithm 3). Algorithm 1: A Numerical Step with Demon |
| Open Source Code | Yes | Implementation is available at https://github.com/aiiu-lab/DemonSampling. |
| Open Datasets | Yes | We use the LAION (2023) aesthetics scores (Aes) as the evaluation metric, and the scores are evaluated on a set of various prompts for generating animal images, which were from the full set of 45 common animals in ImageNet-1K (Deng et al., 2009), created by Black et al. (2023). ... For further comparison on PickScore (Kirstain et al., 2023), please refer to Appendix E.1. |
| Dataset Splits | No | The paper mentions evaluating on "a set of various prompts for generating animal images, which were from the full set of 45 common animals in ImageNet-1K (Deng et al., 2009), created by Black et al. (2023)." This describes the evaluation prompts and their source, but does not specify training/validation/test splits for the proposed method's experiments. |
| Hardware Specification | Yes | the Demon algorithm achieves an aesthetics score of 6.72 ± 0.26 on SD v1.4, requiring 5 minutes (i.e., K = 16, T = 16) on an NVIDIA RTX 3090 GPU. ... Due to memory limitations, DOODL was run on an NVIDIA RTX A6000, which is slightly slower (0.92x) than the RTX 3090 used for the other experiments. |
| Software Dependencies | No | The paper mentions using "Stable Diffusion v1.4/v1.5/XL v1.0" and refers to "fp16 SD v1.4/SDXL v1.0 for generation", as well as "Heun's method" and the "SDE formulation proposed in EDM Karras et al. (2022)". However, it does not provide specific version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | The classifier-free guidance parameter is set to 2 throughout this paper. Across all temporal steps t of image generation, we keep K and β constant. ... The hyperparameters for generation are set to β = 0.5, K = 16, η = 2, with τ adaptive for Tanh and τ = 10^-5 for Boltzmann. ... the batch size for solving the ODE/SDE is 8 for the Stable Diffusion v1.4, v1.5, and SDXL models. However, due to memory limitations on the RTX 3090, the batch size for evaluating the VAE in SDXL is restricted to 1. |
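The reward-weighting setup quoted above (K = 16 candidates per step, a Boltzmann weighting with τ = 10^-5, or a Tanh weighting with adaptive τ) can be sketched as follows. This is a minimal illustration based only on the hyperparameters reported in the table, not the authors' implementation; the function name, the exact Tanh normalization, and the use of NumPy are assumptions.

```python
import numpy as np

def demon_weights(rewards, weighting="boltzmann", tau=1e-5):
    """Turn K candidate rewards into normalized sampling weights.

    Hypothetical sketch: with the tiny Boltzmann temperature tau = 1e-5
    reported in the table, the softmax collapses toward the single
    best-rewarded candidate; the Tanh variant spreads weight more evenly.
    """
    r = np.asarray(rewards, dtype=np.float64)
    if weighting == "boltzmann":
        z = (r - r.max()) / tau        # subtract max for numerical stability
        w = np.exp(z)
    else:  # "tanh" variant with an (assumed) centering around the mean
        w = np.tanh((r - r.mean()) / tau) + 1.0
    return w / w.sum()                 # weights sum to 1
```

With K = 16 candidate rewards and τ = 10^-5, the Boltzmann weights are effectively one-hot on the highest-reward candidate, which matches the intuition of steering each denoising step toward the best-scoring noise direction.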