FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner

Authors: Wenliang Zhao, Minglei Shi, Xumin Yu, Jie Zhou, Jiwen Lu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments to evaluate our method. By applying FlowTurbo to different flow-based models, we obtain an acceleration ratio of 53.1%-58.3% on class-conditional generation and 29.8%-38.5% on text-to-image generation.
Researcher Affiliation | Academia | Wenliang Zhao, Minglei Shi, Xumin Yu, Jie Zhou, Jiwen Lu: Department of Automation, Tsinghua University.
Pseudocode | Yes | Algorithm 1 (Heun's Method Sampler) and Algorithm 2 (Pseudo Corrector Sampler) in Appendix B. (A minimal Heun-step sketch is given after this table.)
Open Source Code | Yes | Code is available at https://github.com/shiml20/FlowTurbo.
Open Datasets | Yes | For class-conditional image generation, we adopt a transformer-style flow-based model SiT-XL [24] pre-trained on ImageNet 256×256. We use ImageNet-1K [6] to train our velocity model. We use a subset of LAION [34] containing only 50K images to train our velocity model.
Dataset Splits | No | The paper mentions using the 'MS COCO 2017 [16] validation set' for FID calculation but does not explicitly state the train/validation splits used for training its own models or components.
Hardware Specification | Yes | In both tasks, we use a single NVIDIA A800 GPU to train the velocity refiner and find it converges within 6 hours. We use a batch size of 8 on a single A800 GPU to measure the latency of each method. (A rough latency-timing sketch follows this table.)
Software Dependencies | No | Our code is implemented in PyTorch [footnote 6]. (The '6' is a footnote marker, not a version number; no specific PyTorch version or other library versions are mentioned.)
Experiment Setup | Yes | Following common practice [24, 30], we adopt a classifier-free guidance scale (CFG) of 1.5. During training, we randomly sample t ∈ (0, 0.12] and compute the training objectives in (13). We use the AdamW [21] optimizer for all models. We use a constant learning rate of 5×10^-5 and a batch size of 18 on a single A800 GPU. We use the AdamW [21] optimizer with a learning rate of 2e-5 and weight decay of 0.0. We adopt a batch size of 16 and set the warm-up steps to 100. We also use gradient clipping of 0.01 to stabilize training. (See the training-loop sketch after this table.)
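
For reference, below is a minimal sketch of a Heun's-method (second-order) sampler for a flow-based model, corresponding in spirit to Algorithm 1 cited in the Pseudocode row. The `velocity_model(x, t)` interface, step count, and uniform time grid are assumptions; the authors' actual sampler (and the pseudo-corrector variant of Algorithm 2) may differ, for example in how classifier-free guidance and the velocity refiner are applied.

```python
import torch

@torch.no_grad()
def heun_sample(velocity_model, x, num_steps=8, t_start=0.0, t_end=1.0):
    # Second-order Heun (predictor-corrector) integration of dx/dt = v(x, t).
    ts = torch.linspace(t_start, t_end, num_steps + 1, device=x.device)
    for i in range(num_steps):
        t, t_next = ts[i], ts[i + 1]
        dt = t_next - t
        v = velocity_model(x, t)              # velocity at the current state
        x_euler = x + dt * v                  # Euler predictor step
        v_next = velocity_model(x_euler, t_next)
        x = x + 0.5 * dt * (v + v_next)       # Heun corrector: trapezoidal average of velocities
    return x
```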
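
The latency figures in the paper are reported with a batch size of 8 on a single A800 GPU, but the exact timing protocol is not spelled out. A rough, hypothetical way to measure per-batch sampling latency in PyTorch (with warm-up and CUDA synchronization) could look like this:

```python
import time
import torch

def measure_latency(sampler_fn, batch_size=8, shape=(4, 32, 32), warmup=3, runs=10, device="cuda"):
    # Average per-batch wall-clock time of `sampler_fn` on random latent inputs.
    x = torch.randn(batch_size, *shape, device=device)
    for _ in range(warmup):                   # warm-up iterations to exclude one-off setup costs
        sampler_fn(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        sampler_fn(x)
    torch.cuda.synchronize()                  # wait for all GPU work before stopping the clock
    return (time.perf_counter() - start) / runs
```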
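
The Experiment Setup row lists the optimizer hyper-parameters for training the velocity refiner. A hypothetical training-loop skeleton reflecting the reported text-to-image settings (AdamW, learning rate 2e-5, weight decay 0.0, 100 warm-up steps, gradient clipping of 0.01) is sketched below; the model, data loader, and loss function (the objective in Eq. (13)) are placeholders, not the authors' code.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def train_refiner(refiner, loader, flow_loss, total_steps=10_000, device="cuda"):
    # Hypothetical skeleton; `refiner`, `loader`, and `flow_loss` are placeholders.
    refiner.to(device).train()
    opt = AdamW(refiner.parameters(), lr=2e-5, weight_decay=0.0)
    warmup = 100
    sched = LambdaLR(opt, lambda s: min(1.0, (s + 1) / warmup))  # linear warm-up, then constant lr
    step = 0
    while step < total_steps:
        for batch in loader:                  # loader assumed to yield batches (e.g. size 16)
            loss = flow_loss(refiner, batch)  # training objective, Eq. (13) in the paper
            opt.zero_grad(set_to_none=True)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(refiner.parameters(), 0.01)  # gradient clipping of 0.01
            opt.step()
            sched.step()
            step += 1
            if step >= total_steps:
                break
    return refiner
```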