FunEditor: Achieving Complex Image Edits via Function Aggregation with Diffusion Models

Authors: Mohammadreza Samadi, Fred X. Han, Mohammad Salameh, Hao Wu, Fengyu Sun, Chunhua Zhou, Di Niu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that FunEditor significantly outperforms recent inference-time optimization methods and fine-tuned models, either quantitatively across various metrics, through visual comparisons, or both, on complex tasks like object movement and object pasting. Meanwhile, with only 4 steps of inference, FunEditor achieves 5–24× inference speedups over existing popular methods.
Researcher Affiliation | Collaboration | 1) Huawei Technologies Canada, 2) Dept. of ECE, University of Alberta, 3) Huawei Kirin Solution, China.
Pseudocode | No | The paper describes its methodology using equations and textual explanations, along with figures illustrating the pipeline, but does not include any distinct pseudocode or algorithm blocks.
Open Source Code | No | Project page: https://mhmdsmdi.github.io/funeditor. The provided URL is a project demonstration page (github.io) rather than a direct link to a code repository.
Open Datasets | Yes | The COCOEE dataset, compiled by Yang et al. (2022), features 3,500 images manually selected from the MSCOCO (Microsoft Common Objects in Context) (Lin et al. 2014) validation set. A human operator used the Segment Anything model (Kirillov et al. 2023) to extract segments and assign diff vectors, resulting in a benchmark of 100 images with corresponding masks and diff vectors for object movement tasks. Wang et al. (2024) also released the ReS dataset, comprising 100 pairs of real-world images that present challenging cases of object movement.
Dataset Splits | No | The paper mentions using the COCOEE and ReS datasets for evaluation, describing them as a 'benchmark of 100 images' and '100 pairs of real-world images' respectively. However, it does not provide specific training/validation/test splits (e.g., percentages or exact counts) for any of its experiments, nor does it refer to predefined standard splits for its specific use case.
Hardware Specification | Yes | We fine-tuned our model using 4 Nvidia V100 (32GB) GPUs.
Software Dependencies | Yes | During inference, we distilled the trained model into 4 steps using the SD1.5 LCM-LoRA from Hugging Face. For the baselines, we used SDv1.5 (Rombach et al. 2022) from Hugging Face.
Experiment Setup | Yes | We fine-tuned our model using 4 Nvidia V100 (32GB) GPUs, with the IP2P UNet (Brooks, Holynski, and Efros 2023) serving as our diffusion backbone. The AdamW optimizer (Loshchilov and Hutter 2017) was employed with a learning rate of 5 × 10⁻⁵ and a batch size of 4. During inference, we distilled the trained model into 4 steps using the SD1.5 LCM-LoRA from Hugging Face.
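The optimizer named in the setup row can be made concrete with a minimal, dependency-free sketch of a single AdamW update using the paper's reported learning rate (5 × 10⁻⁵); the scalar "weight", the dummy gradient, and the remaining hyperparameters (betas, epsilon, weight decay at PyTorch defaults) are illustrative stand-ins, not values reported by the paper.

```python
# One AdamW step with the paper's lr = 5e-5; all other values are
# assumed defaults, and the scalar parameter is a stand-in for the
# IP2P UNet weights, which this report does not reproduce.
import math

lr, beta1, beta2, eps, weight_decay = 5e-5, 0.9, 0.999, 1e-8, 1e-2

w = 0.5      # single stand-in parameter
m = v = 0.0  # first and second moment estimates
grad = 0.2   # dummy gradient at step t = 1
t = 1

# AdamW: decoupled weight decay applied directly to the weight ...
w -= lr * weight_decay * w
# ... followed by the standard Adam moment update.
m = beta1 * m + (1 - beta1) * grad
v = beta2 * v + (1 - beta2) * grad ** 2
m_hat = m / (1 - beta1 ** t)   # bias correction
v_hat = v / (1 - beta2 ** t)
w -= lr * m_hat / (math.sqrt(v_hat) + eps)
```

The decoupled weight-decay term (applied to the weight itself rather than folded into the gradient) is what distinguishes AdamW from plain Adam with L2 regularization.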