Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow

Authors: Fu-Yun Wang, Ling Yang, Zhaoyang Huang, Mengdi Wang, Hongsheng Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To empirically validate our claim, we conduct experiments using Stable Diffusion, comparing our approach with InstaFlow (Liu et al., 2023), a key baseline based on rectified flow for text-to-image generation. We adhere to the training setting of InstaFlow. Our results demonstrate apparently better performance and faster training, likely due to our minimal differences in diffusion configurations. Our one-step model achieves significantly superior performance with only 8% of the training images of InstaFlow, as shown in Fig. 2.
Researcher Affiliation | Academia | 1 MMLab, CUHK, Hong Kong SAR; 2 Peking University, Beijing, China; 3 Princeton University, New Jersey, USA; 4 CPII under InnoHK, Hong Kong SAR. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Flow Matching v-Prediction; Algorithm 2: Diffusion Training ϵ-Prediction; Algorithm 3: Rectified Flow v-Prediction; Algorithm 4: Rectified Diffusion ϵ-Prediction
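The two training objectives named in Algorithms 1 and 2 can be sketched as follows. This is a minimal illustration of the standard flow-matching v-prediction loss (straight-line interpolation to Gaussian noise, regressing the constant velocity) and the standard DDPM-style ϵ-prediction loss, not the authors' implementation; `model`, the tensor shapes, and the `alphas_bar` schedule are placeholders.

```python
import torch

def flow_matching_v_loss(model, x0, t):
    """Flow-matching / rectified-flow v-prediction loss (sketch).

    Interpolates x_t = (1 - t) * x0 + t * x1 with noise endpoint
    x1 ~ N(0, I); the regression target is the velocity x1 - x0.
    `model` is a hypothetical callable model(x_t, t) -> velocity.
    """
    x1 = torch.randn_like(x0)           # noise endpoint of the straight path
    t_ = t.view(-1, 1)                  # broadcast time over feature dims
    xt = (1 - t_) * x0 + t_ * x1        # linear interpolant
    v_target = x1 - x0                  # constant ground-truth velocity
    return torch.mean((model(xt, t) - v_target) ** 2)

def diffusion_eps_loss(model, x0, t, alphas_bar):
    """Standard DDPM-style epsilon-prediction loss (sketch).

    `t` holds integer timesteps indexing the cumulative-alpha schedule.
    """
    eps = torch.randn_like(x0)
    a = alphas_bar[t].view(-1, 1)                 # cumulative alpha at step t
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps     # forward diffusion sample
    return torch.mean((model(xt, t) - eps) ** 2)
```

The paper's point is that these objectives differ only in parameterization and interpolation schedule, which is why a pretrained ϵ-prediction diffusion model can be rectified directly.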
Open Source Code | Yes | Our code is available at https://github.com/G-U-N/Rectified-Diffusion. REPRODUCIBILITY STATEMENT: We have undertaken substantial efforts to ensure that the results in this paper are reproducible. The training and evaluation code, along with detailed guidance, is made available at https://github.com/G-U-N/Rectified-Diffusion. Additionally, we have released the pretrained weights at https://huggingface.co/wangfuyun/Rectified-Diffusion.
Open Datasets | Yes | We validate our method on Stable Diffusion v1-5 and Stable Diffusion XL. We calculate the FID (Heusel et al., 2017) and CLIP scores (Radford et al., 2021) for different models on the COCO-2017 validation set (Lin et al., 2014) and the 30k subset of the COCO-2014 validation set (Lin et al., 2014), respectively. We additionally evaluate model performance on the LAION (Schuhmann, 2022) and CC3M (Changpinyo et al., 2021) subsets.
Dataset Splits | Yes | We calculate the FID (Heusel et al., 2017) and CLIP scores (Radford et al., 2021) for different models on the COCO-2017 validation set (Lin et al., 2014) and the 30k subset of the COCO-2014 validation set (Lin et al., 2014), respectively. We follow the testing setup of Diffusion-DPO (Wallace et al., 2024), generating images with 500 unique prompts from the Pick-a-Pic (Kirstain et al., 2023) validation set for comparison. Following the test setting of COCO-2017, we adopt the 5k subset for evaluation. Results of Stable Diffusion XL-based models are tested with COCO-2014 10k following the evaluation setting of DMDv2 (Yin et al., 2024b). Other results are tested with COCO-2014 30k following the Karpathy split.
Hardware Specification | Yes | InstaFlow's total training time was 75.2 A100 GPU days, whereas our method required approximately 20 A800 GPU days. Typically, the training efficiency of an A800 is about 80% of that of an A100.
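The quoted A800 cost can be put on a common footing with InstaFlow's A100 figure. A small sketch using the paper's stated ~80% relative efficiency (an approximation, not a benchmark):

```python
# Convert the reported A800 training cost into A100-equivalent GPU days,
# assuming one A800 GPU day does ~0.8x the work of an A100 GPU day.
A100_DAYS_INSTAFLOW = 75.2   # InstaFlow total, as quoted
A800_DAYS_OURS = 20.0        # Rectified Diffusion total, as quoted
A800_TO_A100 = 0.8           # stated relative efficiency

a100_equivalent_days = A800_DAYS_OURS * A800_TO_A100  # ~16 A100 GPU days
compute_ratio = A100_DAYS_INSTAFLOW / a100_equivalent_days  # ~4.7x less compute
```

Under this assumption the method uses roughly 16 A100-equivalent GPU days, about a 4.7x reduction versus InstaFlow.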
Software Dependencies | No | The paper mentions using Stable Diffusion v1-5 and Stable Diffusion XL, and references DPM-Solver (Lu et al., 2022), but does not explicitly list specific software libraries or frameworks (e.g., PyTorch, TensorFlow, CUDA) with version numbers within the main text.
Experiment Setup | Yes | During the training of Rectified Diffusion, we used a batch size of 128 for a total of 200,000 iterations, resulting in a total of 128 × 200,000 = 25,600,000 samples processed. In comparison, InstaFlow processed 64 × 70,000 + 1024 × 25,000 = 30,080,000 samples. For the second-stage distillation, we employ consistency distillation training with a batch size of 512 for 10,000 iterations, consuming a total of 4.6 A800 GPU days. The CFG values are 1, 1.2, 1.5, and 2.0, respectively. By default, we adopt a CFG value of 1.5 for both Rectified Diffusion and rectified flow.
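The sample counts quoted above follow directly from batch size × iterations; a quick sanity check of the arithmetic:

```python
# Samples processed = batch size * training iterations, per the quoted setup.
rectified_diffusion_samples = 128 * 200_000                 # 25,600,000
instaflow_samples = 64 * 70_000 + 1024 * 25_000             # 30,080,000 (two stages)
second_stage_samples = 512 * 10_000                         # consistency distillation
```

Note that the 25.6M vs. 30.1M totals compare first-stage training; the 8% figure quoted under Research Type refers specifically to the one-step result in Fig. 2.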