FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing

Authors: Yingying Deng, Xiangyu He, Changwang Mei, Peisong Wang, Fan Tang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental 5. Experiment 5.1. Implementation Details Baselines: This section compares FireFlow with DM inversion-based editing methods such as Prompt-to-Prompt (Hertz et al., 2022), MasaCtrl (Cao et al., 2023), Pix2Pix-Zero (Parmar et al., 2023), Plug-and-Play (Tumanyan et al., 2023), DiffEdit (Couairon et al., 2023), and Direct Inversion (Ju et al., 2024). We also consider recent RF inversion methods, such as RF-Inversion (Rout et al., 2025) and RF-Solver (Wang et al., 2024). Metrics: We evaluate different methods across three aspects: generation quality, text-guided quality, and preservation quality. The Fréchet Inception Distance (FID) (Heusel et al., 2017) is used to measure image generation quality by comparing the generated images to real ones. A CLIP model (Radford et al., 2021) is used to calculate the similarity between the generated image and the guiding text. To assess the preservation quality of non-edited areas, we use metrics including Learned Perceptual Image Patch Similarity (LPIPS) (Zhang et al., 2018), Structural Similarity Index Measure (SSIM) (Wang et al., 2004), Peak Signal-to-Noise Ratio (PSNR), and structural distance (Ju et al., 2024). [...] 5.4. Inversion-based Semantic Image Editing Quantitative Comparison: We evaluate prompt-guided editing using the recent PIE-Bench dataset (Ju et al., 2024), which comprises 700 images across 10 types of edits. As shown in Table 4, we compare the editing performance in terms of preservation ability and CLIP similarity. Our method not only competes with but often outperforms other approaches, particularly in CLIP similarity. [...] E. Ablation Study Editing Steps: An ablation study was conducted to evaluate the influence of the number of editing steps on editing performance. The number of steps was varied from 2 to 12, and the corresponding results are presented in Figure 8.
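As a concrete reference for the metrics quoted above, here is a minimal NumPy sketch of PSNR and of the cosine similarity underlying CLIP scores. The function names and the [0, 1] pixel range are assumptions for illustration; the paper's actual evaluation would use embeddings from a real CLIP encoder and the standard LPIPS/SSIM/FID implementations.

```python
import numpy as np

def psnr(x, y, max_val=1.0):
    """Peak Signal-to-Noise Ratio (dB) between two images with pixels in [0, max_val]."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def clip_similarity(image_emb, text_emb):
    """Cosine similarity between an image embedding and a text embedding
    (placeholders for the outputs of a CLIP-style encoder)."""
    a = np.asarray(image_emb, float)
    b = np.asarray(text_emb, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For example, a uniform pixel error of 0.1 on a [0, 1] image gives an MSE of 0.01 and hence a PSNR of 20 dB.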
Researcher Affiliation Academia 1 University of Science and Technology Beijing, Beijing, China; 2 Institute of Automation, Chinese Academy of Sciences, Beijing, China; 3 Nanjing University of Science and Technology, Nanjing, China; 4 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. Correspondence to: Xiangyu He <EMAIL>.
Pseudocode Yes A. The Pseudo-code for Inversion and Editing Algorithm 1 Solving ReFlow Inversion ODE [...] Algorithm 2 Solving ReFlow Denoising ODE (Editing) [...] D. Python-style Pseudo-Code
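The two algorithms named above integrate the same rectified-flow ODE in opposite directions: inversion carries an image to noise, and denoising carries (possibly prompt-edited) noise back to an image. As a first-order illustration only, a plain Euler discretization might look like the sketch below; `velocity` stands in for the learned velocity field, and the paper's actual solver adds a higher-order correction on top of this scheme.

```python
import numpy as np

def euler_invert(x, velocity, num_steps=8):
    """Integrate dx/dt = v(x, t) forward from a clean image (t = 0) to noise (t = 1)."""
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity(x, i * dt)
    return x

def euler_denoise(z, velocity, num_steps=8):
    """Integrate the same ODE back from noise (t = 1) to an image (t = 0);
    with a prompt-conditioned velocity this pass performs the edit."""
    dt = 1.0 / num_steps
    for i in range(num_steps, 0, -1):
        z = z - dt * velocity(z, i * dt)
    return z
```

With a constant velocity the round trip is exact; with a real learned field the first-order round trip only approximates the input, which is why higher-order solvers such as RF-Solver and FireFlow improve reconstruction at small step counts.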
Open Source Code No The code is available at this URL.
Open Datasets Yes Unconditional Image Generation: We adhere to the protocol established in the original ReFlow paper (Liu et al., 2023) for unconditional image generation on CIFAR-10, utilizing the open-source 1-Rectified-Flow-distill weights. [...] Conditional Image Generation: We compare the performance of our method against the vanilla rectified flow and the second-order RF-Solver on the fundamental T2I task. Following the setup in RF-Solver, we evaluate a randomly selected subset of 10K images from the MSCOCO Caption 2014 validation set (Chen et al., 2015), using the ground-truth captions as reference prompts. [...] Quantitative Comparison: We report the inversion and reconstruction results on the first 1K images from the Densely Captioned Images (DCI) dataset (Urbanek et al., 2024), using the official descriptions as source prompts. [...] We evaluate prompt-guided editing using the recent PIE-Bench dataset (Ju et al., 2024), which comprises 700 images across 10 types of edits.
Dataset Splits Yes Unconditional Image Generation: We adhere to the protocol established in the original ReFlow paper (Liu et al., 2023) for unconditional image generation on CIFAR-10, utilizing the open-source 1-Rectified-Flow-distill weights. [...] Conditional Image Generation: We compare the performance of our method against the vanilla rectified flow and the second-order RF-Solver on the fundamental T2I task. Following the setup in RF-Solver, we evaluate a randomly selected subset of 10K images from the MSCOCO Caption 2014 validation set (Chen et al., 2015), using the ground-truth captions as reference prompts. [...] Quantitative Comparison: We report the inversion and reconstruction results on the first 1K images from the Densely Captioned Images (DCI) dataset (Urbanek et al., 2024), using the official descriptions as source prompts. [...] We evaluate prompt-guided editing using the recent PIE-Bench dataset (Ju et al., 2024), which comprises 700 images across 10 types of edits.
Hardware Specification Yes Table 6. Per-image inference time for ReFlow inversion-based editing measured on an RTX 3090.
Software Dependencies No The paper does not provide specific software dependencies with version numbers. It mentions 'Python-style pseudo-code' but no concrete software environment details.
Experiment Setup Yes Steps: Since the number of inference steps can significantly impact performance, we follow the best settings reported for RF-Solver to ensure a fair comparison: 10 steps for text-to-image generation (T2I) and 30 steps for reconstruction. For editing, RF-Solver varies the number of steps by task, using up to 25 steps. In contrast, we find that our approach achieves comparable or better results using 8 steps. The ablation study is shown in Section E. [...] Conditional Image Generation: [...] The FID and CLIP scores for results generated with a fixed random seed of 1024 are presented in Table 3. [...] Table 7. Comparison of different editing methods. Results on PIE-Bench are reported. Guidance terms indicate the guidance-ratio settings used in the FLUX model during the denoising process.
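To illustrate why the step count matters so much for inversion-based editing, here is a toy round-trip experiment: invert a small "image" to noise and back with first-order Euler steps under a made-up nonlinear velocity field (an assumption standing in for the learned model), and measure the reconstruction error. The error shrinks as the step count grows, which is the trade-off the 2-to-12-step ablation probes.

```python
import numpy as np

def roundtrip_error(num_steps):
    """Reconstruction error of an Euler invert-then-denoise round trip
    under a toy nonlinear velocity field (not the real FLUX model)."""
    velocity = lambda x, t: np.tanh(x) + t   # smooth stand-in for the learned v(x, t)
    x0 = np.linspace(-1.0, 1.0, 8)           # toy "image"
    dt = 1.0 / num_steps
    z = x0.copy()
    for i in range(num_steps):               # forward: data (t = 0) -> noise (t = 1)
        z = z + dt * velocity(z, i * dt)
    x = z.copy()
    for i in range(num_steps, 0, -1):        # backward: noise (t = 1) -> data (t = 0)
        x = x - dt * velocity(x, i * dt)
    return float(np.max(np.abs(x - x0)))
```

With a first-order solver the round-trip error decays roughly linearly in the step size, so very few steps give visibly imperfect reconstructions; higher-order correction is what lets the paper's method keep the step budget at 8.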