Diffusion Model Patching via Mixture-of-Prompts
Authors: Seokil Ham, Sangmin Woo, Jin-Young Kim, Hyojun Go, Byeongjun Park, Changick Kim
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of DMP on various image generation tasks using already converged pre-trained diffusion models. Unlike conventional fine-tuning or prompt tuning, the original dataset from the pre-training phase is used for further training. We evaluated image quality using FID (Heusel et al. 2017) score, which measures the distance between feature representations of generated and real images using an Inception-v3 model (Szegedy et al. 2016). |
| Researcher Affiliation | Collaboration | 1KAIST, 2Twelve Labs |
| Pseudocode | No | The paper describes the Diffusion Model Patching (DMP) method using textual descriptions and mathematical equations (Eq. 1, 2, 3), along with architectural diagrams (Fig. 2, 3). However, it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Project Page https://sangminwoo.github.io/DMP/ |
| Open Datasets | Yes | We used three datasets for our experiments: (1) FFHQ (Karras, Laine, and Aila 2019) (for unconditional image generation) contains 70,000 training images of human faces. (2) MS-COCO (Lin et al. 2014) (for text-to-image generation) includes 82,783 training images and 40,504 validation images, each annotated with 5 descriptive captions. (3) Laion5B (Schuhmann et al. 2022) (for Stable Diffusion) consists of 5.85B image-text pairs, which is known to be used to train Stable Diffusion (Rombach et al. 2022). |
| Dataset Splits | Yes | (1) FFHQ (Karras, Laine, and Aila 2019) (for unconditional image generation) contains 70,000 training images of human faces. (2) MS-COCO (Lin et al. 2014) (for text-to-image generation) includes 82,783 training images and 40,504 validation images, each annotated with 5 descriptive captions. |
| Hardware Specification | No | The paper does not contain any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | Notably, DMP significantly enhances the FID of converged DiT-L/2 by 10.38% on FFHQ, achieved with only a 1.43% parameter increase and 50K additional training iterations. ... To ensure stable further training of a pre-trained diffusion model, we start by zero-initializing the prompts. ... Prompt balancing loss. We adopt two soft constraints from Shazeer et al. (Shazeer et al. 2017) to balance the activation of mixtures-of-prompts. (1) Load balancing: ... (2) Importance balancing: ... |
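The evaluation metric quoted above (FID) reduces to the Fréchet distance between two Gaussians fitted to Inception-v3 feature statistics. A minimal sketch of that distance, assuming the mean/covariance pairs have already been extracted (the function name `frechet_distance` and the use of `scipy.linalg.sqrtm` are illustrative choices, not the paper's code):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID between two Gaussians fitted to Inception-v3 features:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrt(S1 @ S2))."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        # Matrix square root can pick up tiny imaginary parts numerically.
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical statistics give a distance of zero; lower values mean the generated-feature distribution is closer to the real one.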
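The setup cell mentions two soft constraints adopted from Shazeer et al. (2017) to balance prompt activation. A hedged sketch of what such load/importance balancing losses typically look like for soft gate weights (the function names and the hard-count load estimate are assumptions for illustration; Shazeer et al. use a smooth, differentiable load estimator, and the paper's exact formulation may differ):

```python
import torch

def cv_squared(x, eps=1e-10):
    # Squared coefficient of variation: small when entries are balanced.
    x = x.float()
    return x.var() / (x.mean() ** 2 + eps)

def prompt_balancing_losses(gate_probs):
    """gate_probs: (batch, num_prompts) softmax gate weights per sample."""
    # (2) Importance balancing: total gate weight each prompt receives.
    importance = gate_probs.sum(dim=0)
    importance_loss = cv_squared(importance)
    # (1) Load balancing: how many samples each prompt serves. A hard
    # argmax count is used here for clarity; it is not differentiable.
    counts = torch.bincount(gate_probs.argmax(dim=1),
                            minlength=gate_probs.shape[1])
    load_loss = cv_squared(counts)
    return load_loss, importance_loss
```

Both terms penalize imbalance: perfectly uniform gate weights drive the importance loss to zero, while a gate that routes every sample to one prompt inflates the load loss.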