FunEditor: Achieving Complex Image Edits via Function Aggregation with Diffusion Models
Authors: Mohammadreza Samadi, Fred X. Han, Mohammad Salameh, Hao Wu, Fengyu Sun, Chunhua Zhou, Di Niu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that FunEditor significantly outperforms recent inference-time optimization methods and fine-tuned models, whether quantitatively across various metrics, through visual comparisons, or both, on complex tasks such as object movement and object pasting. Meanwhile, with only 4 steps of inference, FunEditor achieves 5-24 times inference speedups over existing popular methods. |
| Researcher Affiliation | Collaboration | ¹Huawei Technologies Canada, ²Dept. ECE, University of Alberta, ³Huawei Kirin Solution, China |
| Pseudocode | No | The paper describes its methodology using equations and textual explanations, along with figures illustrating the pipeline, but does not include any distinct pseudocode or algorithm blocks. |
| Open Source Code | No | Project page: https://mhmdsmdi.github.io/funeditor. The provided URL is a project demonstration page (github.io) rather than a direct link to a code repository. |
| Open Datasets | Yes | The COCOEE dataset, compiled by Yang et al. (2022), features 3,500 images manually selected from the MSCOCO (Microsoft Common Objects in Context) (Lin et al. 2014) validation set. A human operator used the Segment Anything model (Kirillov et al. 2023) to extract segments and assign diff vectors, resulting in a benchmark of 100 images with corresponding masks and diff vectors for object movement tasks. Wang et al. (2024) also released the ReS dataset, comprising 100 pairs of real-world images that present challenging cases of object movement. |
| Dataset Splits | No | The paper mentions using the COCOEE and ReS datasets for evaluation, describing them as a 'benchmark of 100 images' and '100 pairs of real-world images' respectively. However, it does not provide specific training/validation/test splits (e.g., percentages or exact counts) for any of its experiments, nor does it refer to predefined standard splits for its specific use case. |
| Hardware Specification | Yes | We fine-tuned our model using 4 Nvidia V100 (32GB) GPUs |
| Software Dependencies | Yes | During inference, we distilled the trained model into 4 steps using SD1.5 LCM-LoRA from Hugging Face. For the baselines, we used SDv1.5 (Rombach et al. 2022) from Hugging Face |
| Experiment Setup | Yes | We fine-tuned our model using 4 Nvidia V100 (32GB) GPUs, with the IP2P UNet (Brooks, Holynski, and Efros 2023) serving as our diffusion backbone. The AdamW optimizer (Loshchilov and Hutter 2017) was employed with a learning rate of 5×10⁻⁵ and a batch size of 4. During inference, we distilled the trained model into 4 steps using SD1.5 LCM-LoRA from Hugging Face. |
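The reported setup can be collected into a small configuration sketch. This is only an illustration of the hyperparameters quoted above; the Hugging Face repo IDs and the `guidance_scale` value are assumptions typical for LCM-LoRA use, not details confirmed by the paper.

```python
# Sketch: the training and inference settings reported in the review above,
# gathered into plain config dicts. Repo IDs and guidance_scale are assumed.

TRAIN_CONFIG = {
    "backbone": "IP2P UNet",            # Brooks, Holynski, and Efros 2023
    "optimizer": "AdamW",               # Loshchilov and Hutter 2017
    "learning_rate": 5e-5,
    "batch_size": 4,
    "hardware": "4x Nvidia V100 (32GB)",
}

INFERENCE_CONFIG = {
    "base_model": "runwayml/stable-diffusion-v1-5",      # assumed repo ID
    "lcm_lora": "latent-consistency/lcm-lora-sdv1-5",    # assumed repo ID
    "num_inference_steps": 4,                            # as reported
}


def pipeline_call_kwargs(cfg: dict) -> dict:
    """Map the inference config onto keyword arguments a diffusers-style
    pipeline call would accept. guidance_scale=1.0 is the usual setting
    when sampling with LCM-LoRA (an assumption, not from the paper)."""
    return {
        "num_inference_steps": cfg["num_inference_steps"],
        "guidance_scale": 1.0,
    }


print(pipeline_call_kwargs(INFERENCE_CONFIG))
```

Keeping the settings in one place like this makes the 4-step distilled inference easy to compare against baselines that typically run many more denoising steps.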