OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Authors: Cong Wei, Zheyang Xiong, Weiming Ren, Xeron Du, Ge Zhang, Wenhu Chen

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Both automatic evaluation and human evaluations demonstrate that OMNI-EDIT can significantly outperform all the existing models. Our code, dataset and model will be available at https://tiger-ai-lab.github.io/OmniEdit/ ... We perform comprehensive automatic and human evaluation to show the significant boost of OMNI-EDIT over the existing baseline models... In Section 5.2, we study the advantages of importance sampling for synthetic data. In Section 5.2, we perform an analysis to study the design of OMNI-EDIT." |
| Researcher Affiliation | Collaboration | 1University of Waterloo, 2University of Wisconsin-Madison, 3Vector Institute, 4M-A-P. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | "We formally provide the algorithm in 1." |
| Open Source Code | Yes | "Our code, dataset and model will be available at https://tiger-ai-lab.github.io/OmniEdit/" |
| Open Datasets | Yes | "We constructed the training dataset D by sampling high-resolution images with a minimum resolution of 1 megapixel from the LAION-5B (Schuhmann et al., 2022) and Open Image V6 (Kuznetsova et al., 2020) databases. Our code, dataset and model will be available at https://tiger-ai-lab.github.io/OmniEdit/" |
| Dataset Splits | No | The paper describes a 1.2M-entry training dataset and a 434-edit test benchmark (OMNI-EDIT-BENCH), but it reports no counts or percentages for train/validation/test splits of the main training data, and it does not reference predefined splits that would allow the data partitioning to be reproduced for internal validation. |
| Hardware Specification | Yes | "We train OMNI-EDIT on the 1.2M OMNI-EDIT training dataset for 2 epochs on a single node with 8 H100 GPUs." |
| Software Dependencies | No | The paper mentions building upon "Stable Diffusion 3 Medium" but specifies no version numbers for software libraries, frameworks (e.g., PyTorch or TensorFlow), programming languages, or other components used in the implementation. |
| Experiment Setup | Yes | "The OMNI-EDIT model is built upon Stable Diffusion 3 Medium (Esser et al., 2024) with the EditNet architecture. Stable Diffusion 3 has 24 DiT layers. Each layer has a corresponding EditNet layer. We train OMNI-EDIT on the 1.2M OMNI-EDIT training dataset for 2 epochs on a single node with 8 H100 GPUs." |
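The hardware and experiment-setup rows pin down the dataset size, epoch count, and GPU count, but not the batch size, so a reproducer must still estimate the optimizer-step budget. A minimal back-of-envelope sketch, assuming a purely hypothetical per-GPU batch size of 32 (the paper excerpt does not state one):

```python
# Training-schedule estimate for the stated setup:
# 1.2M image-edit pairs, 2 epochs, one node with 8 H100 GPUs.
DATASET_SIZE = 1_200_000
EPOCHS = 2
NUM_GPUS = 8
PER_GPU_BATCH = 32  # hypothetical placeholder; not given in the paper excerpt

# Effective global batch under simple data parallelism.
global_batch = NUM_GPUS * PER_GPU_BATCH

# Optimizer steps per epoch (drop the final partial batch) and in total.
steps_per_epoch = DATASET_SIZE // global_batch
total_steps = steps_per_epoch * EPOCHS

print(global_batch, steps_per_epoch, total_steps)  # 256 4687 9374
```

The point of the sketch is that every quantity except the batch size is fixed by the quoted text; changing the placeholder rescales `total_steps` inversely, which is exactly the degree of freedom the "Software Dependencies: No" row leaves open.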