Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
Authors: Zichen Miao, Zhengyuan Yang, Kevin Lin, Ze Wang, Zicheng Liu, Lijuan Wang, Qiang Qiu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PSO in both preference optimization and other fine-tuning tasks, including style transfer and concept customization. We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data. PSO also demonstrates effectiveness in style transfer and concept customization by directly tuning timestep-distilled diffusion models. Extensive experiments demonstrate the effectiveness of PSO in human preference tuning, as well as other domain transfer tasks such as style transfer and concept customization. |
| Researcher Affiliation | Collaboration | Zichen Miao1, Zhengyuan Yang2, Kevin Lin2, Ze Wang3 Zicheng Liu3, Lijuan Wang2, Qiang Qiu1 1Purdue University, 2Microsoft, 3AMD |
| Pseudocode | Yes | We provide Alg. 1 for better illustration. We provide Alg. 2 for better illustration. We provide Alg. 3 for better illustration. |
| Open Source Code | Yes | The code is provided at: https://github.com/ZichenMiao/Pairwise_Sample_Optimization. |
| Open Datasets | Yes | For human preference tuning, we benchmark PSO on the standard Pick-a-Pic (Kirstain et al., 2023) and Parti Prompts (Yu et al., 2022) datasets. We also evaluate distilled model fine-tuning by experimenting on the image style transfer task with the Pokemon dataset (Pinkney, 2022) and the concept customization task with the examples in Dreambooth (Ruiz et al., 2023). We adopt the Pick-a-Pic v2 (Kirstain et al., 2023) dataset for the offline human preference tuning task following Wallace et al. (2024). |
| Dataset Splits | Yes | After removing 12% pairs with tied preference, we obtain 850K win-lose pairs, which are used as data-reference pairs within our Offline-PSO objective as in Eq. 6. For the online human preference setting, we use a subset of 4K prompts from Pick-a-Pic training prompts as the training prompts... We use the Pick-a-Pic test set and Parti Prompts (Yu et al., 2022) as the evaluation benchmarks... They contain 500 and 1632 prompts respectively. |
| Hardware Specification | Yes | cf., 3840 A100 GPU hours for SDXL-DMD2 (Yin et al., 2024a) and 0.25 A100 GPU hours for Dreambooth concept customization (Ruiz et al., 2023)) |
| Software Dependencies | No | The paper mentions software components like LoRA (Hu et al., 2021) as a parameter-efficient fine-tuning method but does not specify any version numbers for libraries, frameworks, or operating systems used in the implementation. |
| Experiment Setup | Yes | We use LoRA (Hu et al., 2021) to fine-tune all the distilled diffusion models efficiently; we set LoRA rank r = 16 for SDXL-DMD2 and SDXL-Turbo, and r = 32 for SDXL-LCM in both online and offline human preference tuning experiments. As for other training hyperparameters, we set the number of training distilled steps N = 4, which is the same as the number of sampling steps of these distilled models, and we set the regularization weight β = 50 for offline preference tuning and β = 5 for online preference tuning. We set the batch size to 64 and the learning rate to 1e-5 for offline experiments, and train for 5k steps. For online preference tuning, we first sample 128 pairs of images with 128 training prompts, which are further labeled as reference and target with PickScore (Kirstain et al., 2023) with a batch size of 64, and then we conduct online PSO on the sampled pairs for 1 epoch with a batch size of 32 and learning rate 1e-5 and train for 20k steps. |
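The offline objective quoted above (Eq. 6 with β = 50, following Wallace et al., 2024) is a DPO-style pairwise loss over win-lose pairs. The paper's exact Eq. 6 is not reproduced in this report, so the sketch below assumes the standard DPO form on per-sample log-probabilities; the function name and arguments are illustrative, not from the paper.

```python
import math

def offline_pso_pairwise_loss(logp_win: float, logp_lose: float,
                              ref_logp_win: float, ref_logp_lose: float,
                              beta: float = 50.0) -> float:
    """DPO-style pairwise loss sketch: -log sigmoid(beta * margin), where the
    margin compares the tuned model's win/lose log-prob gap against the
    reference model's gap. beta = 50 matches the offline setting quoted above."""
    margin = (logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)
    # Numerically stable -log(sigmoid(x)) = softplus(-x).
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# With a zero margin the loss is log(2); a positive margin drives it toward 0.
print(offline_pso_pairwise_loss(0.0, 0.0, 0.0, 0.0))  # ≈ 0.6931
```

The large β = 50 makes the offline loss saturate quickly once the tuned model's preference margin exceeds the reference model's, which is consistent with its role as a regularization weight keeping the model near the reference.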