Diffusion Model Patching via Mixture-of-Prompts

Authors: Seokil Ham, Sangmin Woo, Jin-Young Kim, Hyojun Go, Byeongjun Park, Changick Kim

AAAI 2025

Reproducibility assessment. Each entry gives the variable, the result, and the supporting LLM response:
Research Type: Experimental. "We evaluate the effectiveness of DMP on various image generation tasks using already converged pre-trained diffusion models. Unlike conventional fine-tuning or prompt tuning, the original dataset from the pre-training phase is used for further training. We evaluated image quality using FID (Heusel et al. 2017) score, which measures the distance between feature representations of generated and real images using an Inception-v3 model (Szegedy et al. 2016)."
Researcher Affiliation: Collaboration. "1KAIST 2Twelve Labs 1EMAIL, 2EMAIL"
Pseudocode: No. "The paper describes the Diffusion Model Patching (DMP) method using textual descriptions and mathematical equations (Eq. 1, 2, 3), along with architectural diagrams (Fig. 2, 3). However, it does not contain any clearly labeled pseudocode or algorithm blocks."
Open Source Code: No. "Project Page: https://sangminwoo.github.io/DMP/"
Open Datasets: Yes. "We used three datasets for our experiments: (1) FFHQ (Karras, Laine, and Aila 2019) (for unconditional image generation) contains 70,000 training images of human faces. (2) MS-COCO (Lin et al. 2014) (for text-to-image generation) includes 82,783 training images and 40,504 validation images, each annotated with 5 descriptive captions. (3) Laion5B (Schuhmann et al. 2022) (for Stable Diffusion) consists of 5.85B image-text pairs, which is known to have been used to train Stable Diffusion (Rombach et al. 2022)."
Dataset Splits: Yes. "(1) FFHQ (Karras, Laine, and Aila 2019) (for unconditional image generation) contains 70,000 training images of human faces. (2) MS-COCO (Lin et al. 2014) (for text-to-image generation) includes 82,783 training images and 40,504 validation images, each annotated with 5 descriptive captions."
Hardware Specification: No. The paper does not contain any specific hardware details, such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies: No. The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup: Yes. "Notably, DMP significantly enhances the FID of converged DiT-L/2 by 10.38% on FFHQ, achieved with only a 1.43% parameter increase and 50K additional training iterations. ... To ensure stable further training of a pre-trained diffusion model, we start by zero-initializing the prompts. ... Prompt balancing loss. We adopt two soft constraints from Shazeer et al. (2017) to balance the activation of the mixture-of-prompts. (1) Load balancing: ... (2) Importance balancing: ..."
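The FID score cited in the Research Type entry (Heusel et al. 2017) can be illustrated with a short sketch; this is not the authors' evaluation code, and it assumes Inception-v3 features for real and generated images have already been extracted as (N, D) NumPy arrays:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_gen):
    """Frechet Inception Distance between two sets of feature vectors.

    Fits a Gaussian to each feature set and returns
    ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2}).
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):
        # Numerical error can introduce tiny imaginary components.
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(c1 + c2 - 2.0 * covmean))
```

Identical feature distributions yield a score near zero; larger scores indicate generated images whose feature statistics diverge from the real data.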
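The importance-balancing constraint mentioned in the Experiment Setup entry follows Shazeer et al. (2017), who penalize the squared coefficient of variation of the total gate weight each expert (here, each prompt) receives over a batch. A minimal NumPy sketch; the function name and the (batch, num_prompts) gate layout are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def importance_balance_loss(gate_probs, eps=1e-8):
    """Squared coefficient of variation (CV^2) of per-prompt importance.

    gate_probs: (batch, num_prompts) routing weights, each row summing to 1.
    Returns 0 when every prompt receives equal total weight, and grows as
    the router collapses onto a few prompts.
    """
    importance = gate_probs.sum(axis=0)  # total gate weight per prompt
    cv_sq = importance.var() / (importance.mean() ** 2 + eps)
    return float(cv_sq)
```

Minimizing this term alongside the diffusion objective discourages the router from activating only a subset of the prompt experts.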