DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models

Authors: Daewon Chae, June Suk Choi, Jinkyu Kim, Kimin Lee

AAAI 2025

Reproducibility assessment (variable: result — supporting LLM response):
Research Type: Experimental — "We conduct experiments to verify that DiffExp improves both sample efficiency and generated image quality, and demonstrate this across various reward fine-tuning methods such as DDPO or AlignProp. Furthermore, we conduct analysis using more advanced prompt sets such as DrawBench, and apply our method to more advanced diffusion models such as SDXL, both of which result in significant performance improvements."
Researcher Affiliation: Academia — "1Korea University, South Korea; 2KAIST, South Korea; EMAIL, EMAIL, EMAIL, EMAIL"
Pseudocode: No — The paper describes its methods using mathematical equations and descriptive text but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: No — The paper does not contain any explicit statements or links indicating that the source code for DiffExp will be made publicly available.
Open Datasets: Yes — "First, we utilize an Aesthetic Score (Schuhmann et al. 2022), which is trained to predict the aesthetic quality of images. Following the baseline (Black et al. 2024; Prabhudesai et al. 2023), we use 45 animal names as training prompts for the aesthetic quality task. Second, in order to improve image-text alignment, we employ PickScore (Kirstain et al. 2024), an open-source reward model trained on a large-scale human feedback dataset. Based on the baseline (Black et al. 2024), we use a total of 135 prompts for the image-text alignment task, combining 45 different animal names with 3 different activities (e.g., 'a monkey washing the dishes'). We provide the entire set of prompts used for training in the supplementary materials."
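The 135 alignment prompts are described as the cross product of 45 animal names and 3 activities. A minimal sketch of that construction, using placeholder lists (the animal and activity names below are assumptions; the full lists are only in the paper's supplementary materials):

```python
# Placeholder subset of the 45 animal names used by the paper (assumed values).
animals = ["monkey", "dog", "cat"]
# Placeholder activities; "washing the dishes" is the paper's own example.
activities = ["washing the dishes", "riding a bike", "playing chess"]

# Cross product: one prompt per (animal, activity) pair.
prompts = [f"a {animal} {activity}" for animal in animals for activity in activities]

# With the full lists this yields 45 * 3 == 135 prompts; here, 3 * 3 == 9.
print(len(prompts))  # 9
```

With the paper's actual 45-name and 3-activity lists, the same comprehension reproduces the stated 135-prompt training set.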
Dataset Splits: No — The paper mentions "45 animal names as training prompts", "a total of 135 prompts for the image-text alignment task", a "novel test set of animal names", and "58 challenging prompts from DrawBench". However, it does not describe how these prompts were divided into training/validation/test sets, nor does it give split percentages or sample counts beyond the prompt counts themselves.
Hardware Specification: No — The paper does not contain specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies: No — The paper mentions "Stable Diffusion v1.5" and "Low-Rank Adaptation (LoRA)" but does not provide specific version numbers for these software components or for any other libraries/frameworks.
Experiment Setup: Yes — "As for scheduling exploration, we apply our exploration method only up to three-fourths of the entire fine-tuning. We set w_l to an extremely low value and w_h to an ordinary CFG value (i.e., 5.0 or 7.5). This dynamic scheduling adaptively balances between image quality and diversity, allowing for generating diverse image samples without sacrificing overall sample quality. ... We find that sampling w_prompt randomly from U(1, 1.2) every time is generally successful. ... Further, we experiment with different values of the hyper-parameter t_thres, which determines how long the CFG scale should be maintained at a low value. In Figure 9 (b), we provide reward curves for variants of our models with t_thres = {900, 800, 700}."
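The setup above fixes a low CFG scale early in denoising (while the timestep is above t_thres), only during the first three-fourths of fine-tuning, and samples a prompt weight from U(1, 1.2). A hedged sketch of that schedule, assuming a countdown timestep convention and a placeholder value for the "extremely low" scale (the exact low value and schedule shape are not given in the quote):

```python
import random

W_LOW = 0.5          # assumed "extremely low" CFG scale (paper does not state the value)
W_HIGH = 7.5         # ordinary CFG value from the paper (5.0 or 7.5)
T_THRES = 800        # denoising-step threshold; paper ablates {900, 800, 700}
EXPLORE_FRAC = 0.75  # exploration applied up to three-fourths of fine-tuning

def cfg_scale(denoise_t: int, train_step: int, total_steps: int) -> float:
    """Return the CFG scale for one denoising step.

    denoise_t counts down from ~1000 to 0 during sampling. The scale is kept
    low early in denoising (denoise_t > T_THRES) to encourage exploration,
    but only during the first EXPLORE_FRAC of fine-tuning steps.
    """
    exploring = train_step < EXPLORE_FRAC * total_steps
    if exploring and denoise_t > T_THRES:
        return W_LOW
    return W_HIGH

def sample_w_prompt() -> float:
    """Prompt weight sampled uniformly from U(1, 1.2), as described above."""
    return random.uniform(1.0, 1.2)
```

For example, at fine-tuning step 10 of 100, `cfg_scale(900, 10, 100)` returns the low scale, while `cfg_scale(700, 10, 100)` has already switched back to the ordinary one; past step 75 the ordinary scale is used at every timestep.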