Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
Authors: Litu Rout, Yujia Chen, Nataniel Ruiz, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate RF inversion on stroke-to-image generation and image editing tasks, with additional qualitative results on cartoonization, object insertion, image generation, and content-style composition. Our method significantly improves photorealism in stroke-to-image generation, surpassing a state-of-the-art (SoTA) method (Mokady et al., 2023) by 89%, while maintaining faithfulness to the input stroke. In addition, we show that RF inversion outperforms DM inversion (Meng et al., 2022) in faithfulness by 4.7% and in realism by 13.8% on the LSUN-bedroom dataset (Wang et al., 2017). Figure 1 shows a graphical illustration of our method RF-Inversion. |
| Researcher Affiliation | Collaboration | 1 Google 2 UT Austin |
| Pseudocode | Yes | Algorithm 1: Controlled Forward ODE (8). Input: discretization steps N, reference image y0, prompt embedding network Φ, Flux model u(·, ·, ·; ϕ), Flux noise scheduler σ : [0, 1] → R. Tunable parameter: controller guidance γ. Output: structured noise Y1 |
| Open Source Code | Yes | See our project page https://rf-inversion.github.io/ for code and demo. ... Refer to our project page: https://rf-inversion.github.io/ for source code and demo. |
| Open Datasets | Yes | We evaluate RF inversion on stroke-to-image generation and image editing tasks, with additional qualitative results on cartoonization, object insertion, image generation, and content-style composition. ... We show that RF inversion outperforms DM inversion across three benchmarks: LSUN-church, LSUN-bedroom (Wang et al., 2017), and SFHQ (Beniaguev, 2022) on two tasks: Stroke2Image generation and image editing. |
| Dataset Splits | Yes | On the test split of the LSUN-bedroom dataset, our approach is 4.7% more faithful and 13.79% more realistic than the best optimization-free method SDEdit-SD1.5. ... We conduct a user study on the test splits of both the LSUN-bedroom and LSUN-church datasets using Amazon Mechanical Turk |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) were explicitly mentioned for running the experiments. |
| Software Dependencies | No | The paper mentions software like "NTI codebase", "Diffusers library", and "Flux" without specifying exact version numbers for these or other underlying libraries/frameworks (e.g., PyTorch, Python). |
| Experiment Setup | Yes | In Table 4, we provide the hyper-parameters for the empirical results reported in Section 5. We use a fixed γ = 0.5 in our controlled forward ODE (8) and a time-varying guidance parameter ηt in our controlled reverse ODE (15), as motivated in Remark 3.3 and Remark 3.6. Thus, our algorithm introduces one additional hyper-parameter ηt into the Flux pipeline. For each experiment, we use a fixed time-varying schedule of ηt described by starting time s, stopping time τ, and strength η. We use the default config for the Flux model: 3.5 for classifier-free guidance and 28 for the total number of inference steps. |
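The pseudocode row above describes Algorithm 1, a controlled forward ODE that drives a reference image y0 toward structured noise y1 by blending the model's velocity field with a controller term, weighted by guidance γ. The following is a minimal Euler-discretization sketch of that idea; the `velocity` callable is a stand-in for the Flux model u(·, ·, ·; ϕ), and the linear controller `(y1 - y) / (1 - t)` is the standard rectified-flow conditional drift toward a fixed endpoint, assumed here rather than taken verbatim from the paper.

```python
import numpy as np

def controlled_forward_ode(y0, y1, velocity, gamma=0.5, num_steps=28):
    """Sketch of Algorithm 1 (controlled forward ODE) via Euler steps.

    y0       -- reference image, flattened to a numpy array
    y1       -- target structured noise, same shape as y0
    velocity -- stand-in for the Flux velocity field u(y, t)
    gamma    -- controller guidance strength (paper uses gamma = 0.5)
    """
    y = np.asarray(y0, dtype=float).copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i / num_steps
        u = velocity(y, t)                        # unconditional drift
        u_cond = (y1 - y) / (1.0 - t)             # controller toward y1
        y = y + dt * (u + gamma * (u_cond - u))   # gamma-blended Euler step
    return y
```

A useful sanity check on this discretization: with γ = 1 the controller dominates completely and the final state lands exactly on y1, while with γ = 0 the trajectory ignores y1 and follows only the velocity field.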
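The experiment-setup row states that the reverse ODE uses a time-varying guidance ηt specified by a starting time s, a stopping time τ, and a strength η. A plausible reading is a piecewise-constant schedule that is active only on [s, τ); the sketch below encodes that assumption, with illustrative default values rather than the paper's Table 4 settings.

```python
def eta_schedule(t, start=0.0, stop=0.7, strength=0.9):
    """Piecewise-constant guidance schedule eta_t for the reverse ODE.

    Assumed form: eta_t = strength while start <= t < stop, else 0.
    The default start/stop/strength values are illustrative only.
    """
    return strength if start <= t < stop else 0.0
```

In use, the reverse-ODE step would query `eta_schedule(t)` at each timestep, so editing strength is applied early in the trajectory and switched off after the stopping time τ.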