Improving the Training of Rectified Flows

Authors: Sangyun Lee, Zinan Lin, Giulia Fanti

NeurIPS 2024

Reproducibility assessment. Each entry below gives the variable, the result, and the supporting LLM response.

Research Type: Experimental
Our evaluation shows that on several datasets (CIFAR-10 [Krizhevsky et al., 2009], ImageNet 64×64 [Deng et al., 2009]), our improved rectified flow outperforms state-of-the-art distillation methods such as consistency distillation (CD) [Song et al., 2023] and progressive distillation (PD) [Salimans and Ho, 2022] in both one-step and two-step settings, and it rivals the performance of improved consistency training (iCT) [Song et al., 2023] in terms of the Fréchet inception distance (FID) [Heusel et al., 2017]. Our training techniques reduce the FID of the previous 2-rectified flow [Liu et al., 2022] by about 75% (from 12.21 to 3.07) on CIFAR-10. Ablations on three datasets show that the proposed techniques give a consistent and sizeable gain.

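As a reminder of the metric (the standard definition from Heusel et al. [2017], not a quote from this paper), FID measures the distance between Gaussian fits $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ to Inception features of real and generated images:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)$$

Lower is better; the reported drop from 12.21 to 3.07 is measured in this metric.
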
Researcher Affiliation: Collaboration
Sangyun Lee (Carnegie Mellon University), Zinan Lin (Microsoft Research), Giulia Fanti (Carnegie Mellon University)

Pseudocode: Yes
Pseudocode for Reflow is provided in Algorithm 1. Algorithm 2 shows the pseudocode for generating samples using the new update rule.

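For context on the Reflow procedure referenced above, here is a minimal sketch, not the authors' Algorithm 1. It assumes a velocity model v(x, t) with noise at t = 1 and data at t = 0 (consistent with the sampling times quoted below), and all function and variable names are illustrative:

```python
import torch

@torch.no_grad()
def generate_pairs(v_pretrained, n, dim, steps=100, device="cpu"):
    # Simulate the pretrained flow ODE dx/dt = v(x, t) from t=1 (noise)
    # down to t=0 (data) with Euler steps to build coupled (z, x) pairs.
    z = torch.randn(n, dim, device=device)
    x = z.clone()
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((n,), 1.0 - i * dt, device=device)
        x = x - v_pretrained(x, t) * dt  # Euler step toward t=0
    return z, x

def reflow_loss(v_theta, z, x):
    # Reflow regresses the new model onto the straight path between the
    # coupled noise z and sample x: x_t = (1 - t) x + t z, target z - x.
    t = torch.rand(x.shape[0], device=x.device)
    x_t = (1.0 - t[:, None]) * x + t[:, None] * z
    return ((v_theta(x_t, t) - (z - x)) ** 2).mean()
```

The key point is that training pairs come from the pretrained flow itself, so the new model learns a straighter noise-to-data coupling.
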
Open Source Code: Yes
Code is available at https://github.com/sangyun884/rfpp.

Open Datasets: Yes
Our evaluation shows that on several datasets (CIFAR-10 [Krizhevsky et al., 2009], ImageNet 64×64 [Deng et al., 2009]), our improved rectified flow outperforms state-of-the-art distillation methods such as consistency distillation (CD) [Song et al., 2023] and progressive distillation (PD) [Salimans and Ho, 2022] in both one-step and two-step settings, and it rivals the performance of improved consistency training (iCT) [Song et al., 2023] in terms of the Fréchet inception distance (FID) [Heusel et al., 2017]. Licenses for each dataset used:
- CIFAR-10: Unknown
- FFHQ: CC BY-NC-SA 4.0
- AFHQ: CC BY-NC 4.0
- ImageNet: Custom (research, non-commercial)

Dataset Splits: No
The paper describes model evaluation and training details but does not provide explicit training/validation/test dataset splits (e.g., percentages or counts for each split). It refers to the "entire training set" for FID calculation but does not specify a validation split. The statement "For a two-step generation, we evaluate vθ at t = 0.99999 and t = 0.8" refers to time steps, not data splits.

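To make the quoted time steps concrete, here is a minimal two-step sampler using plain Euler updates. The paper's Algorithm 2 defines its own update rule, so this is an illustration under assumptions, not the authors' method:

```python
import torch

@torch.no_grad()
def two_step_sample(v_theta, shape, device="cpu"):
    # Two Euler steps evaluating v_theta at t = 0.99999 and t = 0.8,
    # integrating dx/dt = v(x, t) from noise (t near 1) to data (t = 0).
    x = torch.randn(*shape, device=device)
    for t_cur, t_next in [(0.99999, 0.8), (0.8, 0.0)]:
        t = torch.full((shape[0],), t_cur, device=device)
        x = x + (t_next - t_cur) * v_theta(x, t)
    return x
```
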
Hardware Specification: Yes
This work used Bridges-2 GPU resources at the Pittsburgh Supercomputing Center through allocation CIS240037 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants 2138259, 2138286, 2138307, 2137603, and 2138296 [Boerner et al., 2023]. On ImageNet, training takes roughly 9 days with 64 NVIDIA V100 GPUs. On CIFAR-10 and FFHQ/AFHQ, it takes roughly 4 days with 16 and 8 V100 GPUs, respectively. In all cases, we use the NVIDIA DGX-2 cluster.

Software Dependencies: No
The paper mentions "mixed-precision training [Micikevicius et al., 2017]" and implies the use of the Adam optimizer, but it does not specify version numbers for key software components or libraries such as Python, PyTorch, or CUDA.

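For illustration only, since the paper does not state its framework, a common PyTorch mixed-precision training step looks like the following; every name here is an assumption, not the authors' code:

```python
import torch

def train_step_amp(model, optimizer, scaler, x_t, t, target):
    # Typical torch.cuda.amp pattern: forward pass under float16 autocast,
    # backward pass through a gradient scaler to avoid fp16 underflow.
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = ((model(x_t, t) - target) ** 2).mean()
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

# Usage: create the scaler once, e.g. scaler = torch.cuda.amp.GradScaler()
```
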
Experiment Setup: Yes
Table 7: Training configurations for each dataset. We linearly ramp up learning rates for all datasets.

Dataset      | Batch size | Dropout | Learning rate | Warm-up iter.
CIFAR-10     | 512        | 0.13    | 2e-4          | 5000
FFHQ / AFHQ  | 256        | 0.25    | 2e-4          | 5000
ImageNet     | 2048       | 0.10    | 1e-4          | 2500

We use the Adam optimizer and an exponential moving average (EMA) of model weights with a 0.9999 decay rate for all datasets.
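A minimal sketch of this setup (Adam, linear learning-rate ramp-up, and a 0.9999 EMA), assuming PyTorch and the CIFAR-10 values from Table 7; function names are illustrative:

```python
import copy
import torch

def make_optimizer(model, base_lr=2e-4, warmup_iters=5000):
    # Adam with a linear learning-rate ramp-up (CIFAR-10 values from Table 7).
    opt = torch.optim.Adam(model.parameters(), lr=base_lr)
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda it: min(1.0, (it + 1) / warmup_iters))
    return opt, sched

@torch.no_grad()
def ema_update(ema_model, model, decay=0.9999):
    # p_ema <- decay * p_ema + (1 - decay) * p, applied after each step.
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.lerp_(p, 1.0 - decay)

# Usage: ema_model = copy.deepcopy(model); call ema_update(ema_model, model)
# after each optimizer step, and sched.step() once per iteration.
```
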