Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
Authors: Zichen Miao, Zhengyuan Yang, Kevin Lin, Ze Wang, Zicheng Liu, Lijuan Wang, Qiang Qiu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PSO in both preference optimization and other fine-tuning tasks, including style transfer and concept customization. We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data. PSO also demonstrates effectiveness in style transfer and concept customization by directly tuning timestep-distilled diffusion models. Extensive experiments demonstrate the effectiveness of PSO in human preference tuning, as well as other domain transfer tasks such as style transfer and concept customization. |
| Researcher Affiliation | Collaboration | Zichen Miao1, Zhengyuan Yang2, Kevin Lin2, Ze Wang3 Zicheng Liu3, Lijuan Wang2, Qiang Qiu1 1Purdue University, 2Microsoft, 3AMD |
| Pseudocode | Yes | We provide Alg. 1 for better illustration. We provide Alg. 2 for better illustration. We provide Alg. 3 for better illustration. |
| Open Source Code | Yes | The code is provided at: https://github.com/ZichenMiao/Pairwise_Sample_Optimization. |
| Open Datasets | Yes | For human preference tuning, we benchmark PSO on the standard Pick-a-Pic (Kirstain et al., 2023) and Parti Prompts (Yu et al., 2022) datasets. We also evaluate distilled model fine-tuning by experimenting on the image style transfer task with the Pokemon dataset (Pinkney, 2022) and the concept customization task with the examples in Dreambooth (Ruiz et al., 2023). We adopt the Pick-a-Pic v2 (Kirstain et al., 2023) dataset for the offline human preference tuning task following Wallace et al. (2024). |
| Dataset Splits | Yes | After removing 12% pairs with tied preference, we obtain 850K win-lose pairs, which are used as data-reference pairs within our Offline-PSO objective as in Eq. 6. For the online human preference setting, we use a subset of 4K prompts from Pick-a-Pic training prompts as the training prompts... We use the Pick-a-Pic test set and Parti Prompts (Yu et al., 2022) as the evaluation benchmarks... They contain 500 and 1632 prompts respectively. |
| Hardware Specification | Yes | cf., 3840 A100 GPU hours for SDXL-DMD2 (Yin et al., 2024a) and 0.25 A100 GPU hours for Dreambooth concept customization (Ruiz et al., 2023)) |
| Software Dependencies | No | The paper mentions software components like LoRA (Hu et al., 2021) as a parameter-efficient fine-tuning method but does not specify any version numbers for libraries, frameworks, or operating systems used in the implementation. |
| Experiment Setup | Yes | We use LoRA (Hu et al., 2021) to fine-tune all the distilled diffusion models efficiently; we set LoRA rank r = 16 for SDXL-DMD2 and SDXL-Turbo, and r = 32 for SDXL-LCM in both online and offline human preference tuning experiments. As for other training hyperparameters, we set the number of training distilled steps N = 4, which is the same as the number of sampling steps of these distilled models, and we set the regularization weight β = 50 for offline preference tuning and β = 5 for online preference tuning. We set the batch size to 64 and the learning rate to 1e-5 for offline experiments, and train for 5k steps. For online preference tuning, we first sample 128 pairs of images with 128 training prompts, which are further labeled as reference and target with PickScore (Kirstain et al., 2023) with a batch size of 64, and then we conduct online PSO on the sampled pairs for 1 epoch with a batch size of 32 and learning rate 1e-5 and train for 20k steps. |
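The offline objective quoted above (Eq. 6 with β = 50, following Wallace et al., 2024) is a DPO-style pairwise loss over win-lose pairs. The paper's exact Eq. 6 is not reproduced in this report, so the sketch below assumes the standard DPO form on per-sample log-probabilities; the function name and arguments are illustrative, not from the paper.

```python
import math

def offline_pso_pairwise_loss(logp_win: float, logp_lose: float,
                              ref_logp_win: float, ref_logp_lose: float,
                              beta: float = 50.0) -> float:
    """DPO-style pairwise loss sketch: -log sigmoid(beta * margin), where the
    margin compares the tuned model's win/lose log-prob gap against the
    reference model's gap. beta = 50 matches the offline setting quoted above."""
    margin = (logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)
    # Numerically stable -log(sigmoid(x)) = softplus(-x).
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# With a zero margin the loss is log(2); a positive margin drives it toward 0.
print(offline_pso_pairwise_loss(0.0, 0.0, 0.0, 0.0))  # ≈ 0.6931
```

The large β = 50 makes the offline loss saturate quickly once the tuned model's preference margin exceeds the reference model's, which is consistent with its role as a regularization weight keeping the model near the reference.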