Text-to-Image Rectified Flow as Plug-and-Play Priors
Authors: Xiaofeng Yang, Cheng Chen, Xulei Yang, Fayao Liu, Guosheng Lin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct extensive experiments to compare our proposed methods with diffusion-based approaches. We perform experiments on two types of rectified flow-based methods: Stable Diffusion v3 (SD3), trained with flow matching but without Reflow finetuning, and InstaFlow, finetuned with Reflow. |
| Researcher Affiliation | Academia | 1College of Computing and Data Science, Nanyang Technological University, Singapore 2Institute for Infocomm Research, A*STAR, Singapore |
| Pseudocode | Yes | Algorithm 1: The RFDS-Rev Algorithm. Algorithm 2: The RFDS Algorithm. Algorithm 3: The iRFDS Algorithm. |
| Open Source Code | Yes | Code is available at: https://github.com/yangxiaofeng/rectified_flow_prior. |
| Open Datasets | Yes | We also conduct quantitative experiments on the text-to-3D benchmark T3Bench (He et al., 2023). The dataset contains 300 text prompts for text-to-3D generation, making it the largest text-to-3D benchmark available. |
| Dataset Splits | No | The paper mentions using the T3Bench dataset and 15 real images, but does not provide specific training/test/validation splits, percentages, or predefined split references for reproducibility. |
| Hardware Specification | Yes | The experiments are carried out on NVIDIA A6000 GPUs. |
| Software Dependencies | No | The paper mentions using "Stable Diffusion v3", "InstaFlow", and the "Threestudio codebase" but does not specify version numbers for these or other key software components such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Each 3D model is optimized for 15000 steps. We use a CFG of 50 for all 3D experiments and 2D toy experiments. The model is optimized with a resolution of 256 for the first 5000 steps and then 500 for the final 10000 steps. The inversion starts from a randomly sampled Gaussian noise. We optimize the noise for 1000 steps using iRFDS and CFG 1 with a learning rate of 3×10⁻³. |
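The iRFDS inversion setup quoted above (start from random Gaussian noise, optimize it for 1000 steps with a learning rate of 3×10⁻³) can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: `velocity_field` is a placeholder for the pretrained rectified-flow model (e.g. SD3 or InstaFlow), and the simplified straight-path objective is an assumption made for readability.

```python
import numpy as np

def velocity_field(z_t, t):
    """Toy stand-in for the pretrained flow model's velocity v(z_t, t, prompt).
    Illustration only; the real method queries a text-to-image model."""
    return -z_t

def irfds_invert(image, steps=1000, lr=3e-3, seed=0):
    """Sketch of iRFDS-style inversion (assumed, simplified objective):
    optimize the initial noise so the model's predicted velocity matches
    the straight-line path from noise to the target image."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(image.shape)   # inversion starts from Gaussian noise
    for _ in range(steps):
        t = rng.uniform()                      # random timestep in [0, 1]
        z_t = (1 - t) * noise + t * image      # linear interpolation along the flow path
        residual = velocity_field(z_t, t) - (image - noise)
        # Gradient of 0.5*||residual||^2 w.r.t. noise for the toy field v(z) = -z:
        # dv/dnoise = -(1 - t), d(target)/dnoise = -1, so the factor is t.
        grad = residual * t
        noise -= lr * grad                     # matches the quoted lr of 3e-3
    return noise
```

With the real model in place of `velocity_field`, each step would also apply classifier-free guidance (the paper uses CFG 1 for inversion) before computing the residual.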