Reward Guided Latent Consistency Distillation

Authors: Jiachen Li, Weixi Feng, Wenhu Chen, William Yang Wang

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform thorough experiments to demonstrate the effectiveness of our RG-LCD. Sec. 5.1 conducts a human evaluation comparing our method against baselines. Sec. 5.2 further scales up the experiments, covering a wider array of RMs with automatic metrics. By connecting both evaluation results, we identify problems with the current RMs. Finally, Sec. 5.3 conducts ablation studies on critical design choices.
Researcher Affiliation | Academia | Jiachen Li (EMAIL), University of California, Santa Barbara; Weixi Feng (EMAIL), University of California, Santa Barbara; Wenhu Chen (EMAIL), University of Waterloo; William Yang Wang (EMAIL), University of California, Santa Barbara
Pseudocode | Yes | Appendix B includes pseudocode for our RG-LCD training; Appendix B.3 includes pseudocode for training our RG-LCM with an LRM in Algorithm 4. Algorithm 1: Multistep Consistency Sampling; Algorithm 2: Multistep Latent Consistency Sampling; Algorithm 3: Reward Guided Latent Consistency Distillation; Algorithm 4: Reward Guided Latent Consistency Distillation with a Latent Proxy RM.
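The multistep consistency sampling procedure referenced above (Algorithm 1) admits a compact sketch. The following is a minimal, hedged illustration, not the paper's implementation: `f(x, t)` stands in for the learned consistency function (here a caller-supplied stub), `sigma_min` is an assumed minimum noise level, and plain Python lists stand in for latent tensors.

```python
import math
import random

def multistep_consistency_sampling(f, dim, timesteps, sigma_min=0.002, rng=None):
    """Sketch of multistep consistency sampling.

    f(x, t) maps a noisy sample at noise level t to a clean estimate;
    timesteps is a decreasing list of noise levels, highest first.
    """
    rng = rng or random.Random(0)
    t_max = timesteps[0]
    # Draw the initial sample as pure Gaussian noise at the highest noise level.
    x = [rng.gauss(0.0, 1.0) * t_max for _ in range(dim)]
    x = f(x, t_max)  # one-step prediction of the clean sample
    for t in timesteps[1:]:  # each extra step trades speed for sample quality
        # Re-noise the current estimate up to level t, then denoise again.
        scale = math.sqrt(t * t - sigma_min * sigma_min)
        x = f([xi + scale * rng.gauss(0.0, 1.0) for xi in x], t)
    return x
```

A toy consistency function such as `f = lambda x, t: [xi / (1.0 + t) for xi in x]` can be used to exercise the loop; in the actual method `f` is the distilled LCM operating in latent space.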
Open Source Code | Yes | Project Page: https://rg-lcd.github.io/
Open Datasets | Yes | Our training is conducted on the CC12M dataset (Changpinyo et al., 2021)... contributing to both an improved Fréchet Inception Distance (FID) on MS-COCO (Lin et al., 2014) and a higher HPSv2.1 score on HPSv2's test set (Wu et al., 2023a)...
Dataset Splits | No | Our training is conducted on the CC12M dataset (Changpinyo et al., 2021)... We follow a similar evaluation protocol as in (Wallace et al., 2023a), generating images conditioned on prompts from PartiPrompt (Yu et al., 2022) (1632 prompts) and from HPSv2's test set (Wu et al., 2023a) (3200 prompts).
Hardware Specification | Yes | We distill our LCM from Stable Diffusion-v2.1 (Rombach et al., 2022) by training for 10K iterations on 8 NVIDIA A100 GPUs without gradient accumulation.
Software Dependencies | No | We follow the hyperparameter settings listed in the diffusers (von Platen et al., 2022) library, setting the learning rate to 1e-6, the EMA rate to µ = 0.95, and the guidance scale range to [ωmin, ωmax] = [5, 15].
Experiment Setup | Yes | We distill our LCM from Stable Diffusion-v2.1 (Rombach et al., 2022) by training for 10K iterations on 8 NVIDIA A100 GPUs without gradient accumulation, and set the batch size to reach the maximum capacity of our GPUs. We follow the hyperparameter settings listed in the diffusers (von Platen et al., 2022) library, setting the learning rate to 1e-6, the EMA rate to µ = 0.95, and the guidance scale range to [ωmin, ωmax] = [5, 15]. As mentioned in Sec. 3.3, we use DDIM (Song et al., 2020a) as our ODE solver Ψ with a skipping step k = 20. Table 2: β for different RG-LCMs when training with different RMs.
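The hyperparameters quoted in the setup row can be collected into a small sketch showing how they interact. This is an illustrative assumption-laden snippet, not the authors' code: `NUM_TRAIN_TIMESTEPS = 1000` is the usual Stable Diffusion default (not stated in the quote), the guidance scale is assumed to be drawn uniformly from the stated range, and the EMA update shown is the standard target-network update form.

```python
import random

# Hyperparameters quoted in the setup row above.
LEARNING_RATE = 1e-6
EMA_RATE_MU = 0.95
OMEGA_MIN, OMEGA_MAX = 5.0, 15.0   # CFG guidance scale range [omega_min, omega_max]
SKIP_STEP_K = 20                   # DDIM skipping step for the ODE solver Psi
NUM_TRAIN_TIMESTEPS = 1000         # assumed SD default, not stated in the quote

def ema_update(target, online, mu=EMA_RATE_MU):
    """Standard EMA update of target-network parameters during distillation."""
    return [mu * t + (1.0 - mu) * o for t, o in zip(target, online)]

def sample_guidance_scale(rng=random):
    """Draw a CFG scale uniformly from the stated range (sampling law assumed)."""
    return rng.uniform(OMEGA_MIN, OMEGA_MAX)

# With k = 20, each DDIM solver call jumps t_{n+k} -> t_n, so the 1000
# training timesteps collapse to 1000 / 20 = 50 solver segments.
num_solver_segments = NUM_TRAIN_TIMESTEPS // SKIP_STEP_K
```

The skipping step is what makes distillation tractable: the teacher's ODE trajectory is evaluated on a coarse grid of 50 segments rather than the full 1000-step schedule.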