FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner
Authors: Wenliang Zhao, Minglei Shi, Xumin Yu, Jie Zhou, Jiwen Lu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments to evaluate our method. By applying FlowTurbo to different flow-based models, we obtain an acceleration ratio of 53.1%~58.3% on class-conditional generation and 29.8%~38.5% on text-to-image generation. |
| Researcher Affiliation | Academia | Wenliang Zhao Department of Automation Tsinghua University EMAIL Minglei Shi Department of Automation Tsinghua University EMAIL Xumin Yu Department of Automation Tsinghua University EMAIL Jie Zhou Department of Automation Tsinghua University EMAIL Jiwen Lu Department of Automation Tsinghua University EMAIL |
| Pseudocode | Yes | Algorithm 1 (Heun's Method Sampler) and Algorithm 2 (Pseudo Corrector Sampler) in Appendix B. |
| Open Source Code | Yes | Code is available at https://github.com/shiml20/FlowTurbo. |
| Open Datasets | Yes | For class-conditional image generation, we adopt a transformer-style flow-based model SiT-XL [24] pre-trained on ImageNet 256×256. We use ImageNet-1K [6] to train our velocity model. We use a subset of LAION [34] containing only 50K images to train our velocity model. |
| Dataset Splits | No | The paper mentions using 'MS COCO 2017 [16] validation set' for FID calculation but does not explicitly state the train/validation splits used for training its own models or components. |
| Hardware Specification | Yes | In both tasks, we use a single NVIDIA A800 GPU to train the velocity refiner and find it converges within 6 hours. We use a batch size of 8 on a single A800 GPU to measure the latency of each method. |
| Software Dependencies | No | Our code is implemented in PyTorch. (The trailing '6' in the paper is a footnote marker, not a version number; no specific PyTorch or other library versions are mentioned.) |
| Experiment Setup | Yes | Following common practice [24, 30], we adopt a classifier-free guidance scale (CFG) of 1.5. During training, we randomly sample t ∈ (0, 0.12] and compute the training objectives in (13). We use AdamW [21] optimizer for all models. We use a constant learning rate of 5×10⁻⁵ and a batch size of 18 on a single A800 GPU. We use AdamW [21] optimizer with a learning rate of 2e-5 and weight decay of 0.0. We adopt a batch size of 16 and set the warm-up steps to 100. We also use gradient clipping of 0.01 to stabilize training. |
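The pseudocode row above refers to the paper's Heun's Method Sampler (Algorithm 1). As a point of reference for what such a sampler does, here is a minimal, hypothetical sketch of a generic Heun (second-order) ODE sampler for a flow-based model, integrating dx/dt = v(x, t) from t = 0 to t = 1; the `velocity` callable and step count are illustrative assumptions, not the paper's actual implementation or refiner.

```python
import numpy as np

def heun_sample(velocity, x0, num_steps=8):
    """Generic Heun's-method sampler sketch (NOT the paper's Algorithm 1).

    Integrates the flow ODE dx/dt = velocity(x, t) from t=0 to t=1
    using a predictor-corrector step that averages two slope estimates.
    """
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    x = x0
    for i in range(num_steps):
        t, t_next = ts[i], ts[i + 1]
        dt = t_next - t
        v1 = velocity(x, t)             # slope at the current point (Euler predictor)
        x_euler = x + dt * v1           # Euler prediction of the next state
        v2 = velocity(x_euler, t_next)  # slope at the predicted point
        x = x + dt * 0.5 * (v1 + v2)    # Heun corrector: average the two slopes
    return x
```

For a constant velocity field the sampler is exact; for smooth fields the error shrinks quadratically with the step size, which is why Heun-style steps are a common baseline for few-step flow sampling.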