DiT4Edit: Diffusion Transformer for Image Editing

Authors: Kunyu Feng, Yue Ma, Bingyuan Wang, Chenyang Qi, Haozhe Chen, Qifeng Chen, Zeyu Wang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the strong performance of DiT4Edit in various editing scenarios, highlighting the potential of diffusion transformers for image editing. Experiments demonstrate that our framework achieves superior editing results with fewer inference steps. Extensive qualitative and quantitative results demonstrate the superior performance of DiT4Edit in object editing, style editing, and shape-aware editing for various image sizes, including 512×512, 1024×1024, and 1024×2048. For quantitative evaluation, we used three indicators: Fréchet Inception Distance (FID) (Heusel et al. 2017), Peak Signal-to-Noise Ratio (PSNR), and CLIP score, to evaluate the performance differences between our model and SOTA in image generation quality, background preservation, and text alignment. We perform a series of ablation studies to demonstrate the effectiveness of DPM-Solver inversion and patch merging.
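Of the three reported metrics, PSNR (background preservation) is the simplest to reproduce. A minimal NumPy sketch, purely illustrative and not the authors' evaluation code:

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between a reference and an edited image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy check: one perturbed pixel in an 8x8 image gives MSE = 256/64 = 4.0,
# so PSNR = 10 * log10(255^2 / 4) ≈ 42.11 dB.
a = np.zeros((8, 8), dtype=np.uint8)
b = a.copy()
b[0, 0] = 16
print(round(psnr(a, b), 2))  # → 42.11
```

FID and CLIP score require pretrained Inception and CLIP encoders respectively, so they are not sketched here.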
Researcher Affiliation | Academia | (1) Peking University, China; (2) The Hong Kong University of Science and Technology, China; (3) The Hong Kong University of Science and Technology (Guangzhou), China. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes methods and components but does not include any clearly labeled pseudocode, algorithm blocks, or structured, code-like steps for any procedure.
Open Source Code | Yes | Code: https://github.com/fkyyyy/DiT4Edit
Open Datasets | No | The paper mentions using pre-trained models and editing both real and generated images, but it provides no access information (links, DOIs, repositories, or formal citations) for any publicly available dataset used in its experiments or evaluation.
Dataset Splits | No | The paper does not mention any dataset splits (e.g., training/validation/test percentages or counts) or refer to standard splits of a named dataset, likely because no specific dataset is identified for its experimental evaluation.
Hardware Specification | Yes | All experiments were run on an NVIDIA Tesla A100 GPU.
Software Dependencies | No | The paper mentions various models and methods such as DPM-Solver and PIXART-α, but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | We configured the DPM-Solver with 30 steps, a classifier-free guidance scale of 4.5, and a patch merging ratio of 0.3–0.7.
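The reported setup can be summarized in a small sketch. The config keys and the helper below are illustrative placeholders, not the authors' actual code; only the three hyperparameter values come from the paper, and the guidance helper is just the standard classifier-free guidance combination:

```python
# Hedged sketch of the reported hyperparameters (illustrative names).
edit_config = {
    "sampler": "DPM-Solver",
    "num_inference_steps": 30,        # reported sampling steps
    "guidance_scale": 4.5,            # reported classifier-free guidance scale
    "patch_merge_ratio": (0.3, 0.7),  # reported range of patch merging ratios
}

def cfg_combine(eps_uncond: float, eps_cond: float, scale: float) -> float:
    """Standard classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the conditional one by the guidance scale."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# With scale 4.5, the conditional signal is amplified 4.5x relative
# to the unconditional baseline.
print(cfg_combine(0.0, 1.0, edit_config["guidance_scale"]))  # → 4.5
```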