Diffusion Model Patching via Mixture-of-Prompts

Authors: Seokil Ham, Sangmin Woo, Jin-Young Kim, Hyojun Go, Byeongjun Park, Changick Kim

AAAI 2025

Reproducibility assessment. Each entry gives the variable, the result, and the supporting LLM response:
Research Type: Experimental. "We evaluate the effectiveness of DMP on various image generation tasks using already converged pre-trained diffusion models. Unlike conventional fine-tuning or prompt tuning, the original dataset from the pre-training phase is used for further training. We evaluated image quality using FID (Heusel et al. 2017) score, which measures the distance between feature representations of generated and real images using an Inception-v3 model (Szegedy et al. 2016)."
Researcher Affiliation: Collaboration. "1KAIST 2Twelve Labs 1EMAIL, 2EMAIL"
Pseudocode: No. "The paper describes the Diffusion Model Patching (DMP) method using textual descriptions and mathematical equations (Eq. 1, 2, 3), along with architectural diagrams (Fig. 2, 3). However, it does not contain any clearly labeled pseudocode or algorithm blocks."
Open Source Code: No. "Project Page: https://sangminwoo.github.io/DMP/"
Open Datasets: Yes. "We used three datasets for our experiments: (1) FFHQ (Karras, Laine, and Aila 2019) (for unconditional image generation) contains 70,000 training images of human faces. (2) MS-COCO (Lin et al. 2014) (for text-to-image generation) includes 82,783 training images and 40,504 validation images, each annotated with 5 descriptive captions. (3) Laion5B (Schuhmann et al. 2022) (for Stable Diffusion) consists of 5.85B image-text pairs, which is known to have been used to train Stable Diffusion (Rombach et al. 2022)."
Dataset Splits: Yes. "(1) FFHQ (Karras, Laine, and Aila 2019) (for unconditional image generation) contains 70,000 training images of human faces. (2) MS-COCO (Lin et al. 2014) (for text-to-image generation) includes 82,783 training images and 40,504 validation images, each annotated with 5 descriptive captions."
Hardware Specification: No. The paper does not contain any specific hardware details, such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies: No. The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup: Yes. "Notably, DMP significantly enhances the FID of converged DiT-L/2 by 10.38% on FFHQ, achieved with only a 1.43% parameter increase and 50K additional training iterations. ... To ensure stable further training of a pre-trained diffusion model, we start by zero-initializing the prompts. ... Prompt balancing loss. We adopt two soft constraints from Shazeer et al. (2017) to balance the activation of the mixture-of-prompts. (1) Load balancing: ... (2) Importance balancing: ..."
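The FID score cited in the Research Type entry (Heusel et al. 2017) can be illustrated with a short sketch; this is not the authors' evaluation code, and it assumes Inception-v3 features for real and generated images have already been extracted as (N, D) NumPy arrays:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_gen):
    """Frechet Inception Distance between two sets of feature vectors.

    Fits a Gaussian to each feature set and returns
    ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2}).
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):
        # Numerical error can introduce tiny imaginary components.
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(c1 + c2 - 2.0 * covmean))
```

Identical feature distributions yield a score near zero; larger scores indicate generated images whose feature statistics diverge from the real data.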
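The importance-balancing constraint mentioned in the Experiment Setup entry follows Shazeer et al. (2017), who penalize the squared coefficient of variation of the total gate weight each expert (here, each prompt) receives over a batch. A minimal NumPy sketch; the function name and the (batch, num_prompts) gate layout are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def importance_balance_loss(gate_probs, eps=1e-8):
    """Squared coefficient of variation (CV^2) of per-prompt importance.

    gate_probs: (batch, num_prompts) routing weights, each row summing to 1.
    Returns 0 when every prompt receives equal total weight, and grows as
    the router collapses onto a few prompts.
    """
    importance = gate_probs.sum(axis=0)  # total gate weight per prompt
    cv_sq = importance.var() / (importance.mean() ** 2 + eps)
    return float(cv_sq)
```

Minimizing this term alongside the diffusion objective discourages the router from activating only a subset of the prompt experts.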