FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing

Authors: Yingying Deng, Xiangyu He, Changwang Mei, Peisong Wang, Fan Tang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental 5. Experiment 5.1. Implementation Details Baselines: This section compares FireFlow with DM inversion-based editing methods such as Prompt-to-Prompt (Hertz et al., 2022), MasaCtrl (Cao et al., 2023), Pix2Pix-Zero (Parmar et al., 2023), Plug-and-Play (Tumanyan et al., 2023), DiffEdit (Couairon et al., 2023), and Direct Inversion (Ju et al., 2024). We also consider recent RF inversion methods, such as RF-Inversion (Rout et al., 2025) and RF-Solver (Wang et al., 2024). Metrics: We evaluate different methods across three aspects: generation quality, text-guided quality, and preservation quality. The Fréchet Inception Distance (FID) (Heusel et al., 2017) is used to measure image generation quality by comparing the generated images to real ones. A CLIP model (Radford et al., 2021) is used to calculate the similarity between the generated image and the guiding text. To assess the preservation quality of non-edited areas, we use metrics including Learned Perceptual Image Patch Similarity (LPIPS) (Zhang et al., 2018), Structural Similarity Index Measure (SSIM) (Wang et al., 2004), Peak Signal-to-Noise Ratio (PSNR), and structural distance (Ju et al., 2024). [...] 5.4. Inversion-based Semantic Image Editing Quantitative Comparison: We evaluate prompt-guided editing using the recent PIE-Bench dataset (Ju et al., 2024), which comprises 700 images across 10 types of edits. As shown in Table 4, we compare the editing performance in terms of preservation ability and CLIP similarity. Our method not only competes with but often outperforms other approaches, particularly in CLIP similarity. [...] E. Ablation Study Editing Steps: An ablation study was conducted to evaluate the influence of the number of editing steps on editing performance. The number of steps was varied from 2 to 12, and the corresponding results are presented in Figure 8.
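As a concrete reference for the metrics quoted above, here is a minimal NumPy sketch of PSNR and of the cosine similarity underlying CLIP scores. The function names and the [0, 1] pixel range are assumptions for illustration; the paper's actual evaluation would use embeddings from a real CLIP encoder and the standard LPIPS/SSIM/FID implementations.

```python
import numpy as np

def psnr(x, y, max_val=1.0):
    """Peak Signal-to-Noise Ratio (dB) between two images with pixels in [0, max_val]."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def clip_similarity(image_emb, text_emb):
    """Cosine similarity between an image embedding and a text embedding
    (placeholders for the outputs of a CLIP-style encoder)."""
    a = np.asarray(image_emb, float)
    b = np.asarray(text_emb, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For example, a uniform pixel error of 0.1 on a [0, 1] image gives an MSE of 0.01 and hence a PSNR of 20 dB.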
Researcher Affiliation Academia 1 University of Science and Technology Beijing, Beijing, China; 2 Institute of Automation, Chinese Academy of Sciences, Beijing, China; 3 Nanjing University of Science and Technology, Nanjing, China; 4 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. Correspondence to: Xiangyu He <EMAIL>.
Pseudocode Yes A. The Pseudo-code for Inversion and Editing Algorithm 1 Solving ReFlow Inversion ODE [...] Algorithm 2 Solving ReFlow Denoising ODE (Editing) [...] D. Python-style Pseudo-Code
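The two algorithms named above integrate the same rectified-flow ODE in opposite directions: inversion carries an image to noise, and denoising carries (possibly prompt-edited) noise back to an image. As a first-order illustration only, a plain Euler discretization might look like the sketch below; `velocity` stands in for the learned velocity field, and the paper's actual solver adds a higher-order correction on top of this scheme.

```python
import numpy as np

def euler_invert(x, velocity, num_steps=8):
    """Integrate dx/dt = v(x, t) forward from a clean image (t = 0) to noise (t = 1)."""
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity(x, i * dt)
    return x

def euler_denoise(z, velocity, num_steps=8):
    """Integrate the same ODE back from noise (t = 1) to an image (t = 0);
    with a prompt-conditioned velocity this pass performs the edit."""
    dt = 1.0 / num_steps
    for i in range(num_steps, 0, -1):
        z = z - dt * velocity(z, i * dt)
    return z
```

With a constant velocity the round trip is exact; with a real learned field the first-order round trip only approximates the input, which is why higher-order solvers such as RF-Solver and FireFlow improve reconstruction at small step counts.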
Open Source Code No The code is available at this URL.
Open Datasets Yes Unconditional Image Generation: We adhere to the protocol established in the original ReFlow paper (Liu et al., 2023) for unconditional image generation on CIFAR-10, utilizing the open-source 1-Rectified-Flow-distill weights. [...] Conditional Image Generation: We compare the performance of our method against the vanilla rectified flow and the second-order RF-Solver on the fundamental T2I task. Following the setup in RF-Solver, we evaluate a randomly selected subset of 10K images from the MSCOCO Caption 2014 validation set (Chen et al., 2015), using the ground-truth captions as reference prompts. [...] Quantitative Comparison: We report the inversion and reconstruction results on the first 1K images from the Densely Captioned Images (DCI) dataset (Urbanek et al., 2024), using the official descriptions as source prompts. [...] We evaluate prompt-guided editing using the recent PIE-Bench dataset (Ju et al., 2024), which comprises 700 images across 10 types of edits.
Dataset Splits Yes Unconditional Image Generation: We adhere to the protocol established in the original ReFlow paper (Liu et al., 2023) for unconditional image generation on CIFAR-10, utilizing the open-source 1-Rectified-Flow-distill weights. [...] Conditional Image Generation: We compare the performance of our method against the vanilla rectified flow and the second-order RF-Solver on the fundamental T2I task. Following the setup in RF-Solver, we evaluate a randomly selected subset of 10K images from the MSCOCO Caption 2014 validation set (Chen et al., 2015), using the ground-truth captions as reference prompts. [...] Quantitative Comparison: We report the inversion and reconstruction results on the first 1K images from the Densely Captioned Images (DCI) dataset (Urbanek et al., 2024), using the official descriptions as source prompts. [...] We evaluate prompt-guided editing using the recent PIE-Bench dataset (Ju et al., 2024), which comprises 700 images across 10 types of edits.
Hardware Specification Yes Table 6. Per-image inference time for ReFlow inversion-based editing measured on an RTX 3090.
Software Dependencies No The paper does not provide specific software dependencies with version numbers. It mentions 'Python-style pseudo-code' but no concrete software environment details.
Experiment Setup Yes Steps: Since the number of inference steps can significantly impact performance, we follow the best settings reported for RF-Solver to ensure a fair comparison: 10 steps for text-to-image generation (T2I) and 30 steps for reconstruction. For editing, RF-Solver varies the number of steps by task, using up to 25 steps. In contrast, we find that our approach achieves comparable or better results using 8 steps. The ablation study is shown in Section E. [...] Conditional Image Generation: [...] The FID and CLIP scores for results generated with a fixed random seed of 1024 are presented in Table 3. [...] Table 7. Comparison of different editing methods. Results on PIE-Bench are reported. Guidance terms indicate the guidance-ratio settings used in the FLUX model during the denoising process.
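To illustrate why the step count matters so much for inversion-based editing, here is a toy round-trip experiment: invert a small "image" to noise and back with first-order Euler steps under a made-up nonlinear velocity field (an assumption standing in for the learned model), and measure the reconstruction error. The error shrinks as the step count grows, which is the trade-off the 2-to-12-step ablation probes.

```python
import numpy as np

def roundtrip_error(num_steps):
    """Reconstruction error of an Euler invert-then-denoise round trip
    under a toy nonlinear velocity field (not the real FLUX model)."""
    velocity = lambda x, t: np.tanh(x) + t   # smooth stand-in for the learned v(x, t)
    x0 = np.linspace(-1.0, 1.0, 8)           # toy "image"
    dt = 1.0 / num_steps
    z = x0.copy()
    for i in range(num_steps):               # forward: data (t = 0) -> noise (t = 1)
        z = z + dt * velocity(z, i * dt)
    x = z.copy()
    for i in range(num_steps, 0, -1):        # backward: noise (t = 1) -> data (t = 0)
        x = x - dt * velocity(x, i * dt)
    return float(np.max(np.abs(x - x0)))
```

With a first-order solver the round-trip error decays roughly linearly in the step size, so very few steps give visibly imperfect reconstructions; higher-order correction is what lets the paper's method keep the step budget at 8.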