Taming Rectified Flow for Inversion and Editing

Authors: Jiangshan Wang, Junfu Pu, Zhongang Qi, Jiayi Guo, Yue Ma, Nisha Huang, Yuxin Chen, Xiu Li, Ying Shan

ICML 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments across generation, inversion, and editing tasks in both image and video modalities demonstrate the superiority and versatility of our method. |
| Researcher Affiliation | Collaboration | 1 Tsinghua University, 2 ARC Lab, Tencent PCG, 3 HKUST. |
| Pseudocode | Yes | A. Pseudo Code of RF-Solver Algorithm. Algorithm 1: Sampling process of RF-Solver. |
| Open Source Code | Yes | Code is available at this URL. |
| Open Datasets | Yes | For image inversion, similar to Text-to-Image generation, we use images and captions from the MS-COCO validation set. |
| Dataset Splits | No | For image inversion, similar to Text-to-Image generation, we use images and captions from the MS-COCO validation set. For video inversion, we select about 40 videos from social media platforms such as TikTok and other publicly available sources. This specifies which datasets were used, but not how they were split for training and testing within the experimental framework, which matters little here because the method builds on pre-trained models. |
| Hardware Specification | No | No specific hardware details (GPU models, CPU models, etc.) are mentioned in the paper. |
| Software Dependencies | No | We implement our method respectively on FLUX (Black-Forest-Labs, 2024) and Open-Sora (Zheng et al., 2024). This lists frameworks but gives no version numbers for software dependencies. |
| Experiment Setup | Yes | The total NFE (number of function evaluations) for generating one image is set to 10 for both our method and baselines. |