DiT4Edit: Diffusion Transformer for Image Editing
Authors: Kunyu Feng, Yue Ma, Bingyuan Wang, Chenyang Qi, Haozhe Chen, Qifeng Chen, Zeyu Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the strong performance of DiT4Edit in various editing scenarios, highlighting the potential of diffusion transformers for image editing. Experiments demonstrate that our framework achieves superior editing results with fewer inference steps. Extensive qualitative and quantitative results demonstrate the superior performance of DiT4Edit in object editing, style editing, and shape-aware editing for various image sizes, including 512×512, 1024×1024, and 1024×2048. For quantitative evaluation, we used three indicators: Fréchet Inception Distance (FID) (Heusel et al. 2017), Peak Signal-to-Noise Ratio (PSNR), and CLIP score, to evaluate the performance differences between our model and SOTA in image generation quality, background preservation, and text alignment. We perform a series of ablation studies to demonstrate the effectiveness of DPM-Solver inversion and patch merging. |
| Researcher Affiliation | Academia | 1. Peking University, China; 2. The Hong Kong University of Science and Technology, China; 3. The Hong Kong University of Science and Technology (Guangzhou), China. |
| Pseudocode | No | The paper describes methods and components but does not include any clearly labeled pseudocode, algorithm blocks, or structured, code-like steps for any procedure. |
| Open Source Code | Yes | Code: https://github.com/fkyyyy/DiT4Edit |
| Open Datasets | No | The paper mentions using pre-trained models and conducting editing on real and generated images, but it does not provide specific access information (links, DOIs, repositories, or formal citations) for any publicly available or open dataset used in its experiments or for evaluation. |
| Dataset Splits | No | The paper does not mention any specific dataset splits (e.g., training, validation, test percentages or counts) or refer to standard splits of a named dataset, likely because no specific dataset is mentioned for their own experimental evaluation. |
| Hardware Specification | Yes | All experiments were run on an NVIDIA Tesla A100 GPU. |
| Software Dependencies | No | The paper mentions various models and methods like DPM-Solver and PIXART-α, but it does not specify any ancillary software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We configured the DPM-Solver with 30 steps, a classifier-free guidance scale of 4.5, and a patch merging ratio of 0.3–0.7. |
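The reported classifier-free guidance scale of 4.5 combines the conditional and unconditional noise predictions at every denoising step. A minimal NumPy sketch of that standard combination (function and variable names are illustrative, not taken from the paper's code):

```python
import numpy as np

def classifier_free_guidance(eps_uncond: np.ndarray,
                             eps_cond: np.ndarray,
                             guidance_scale: float = 4.5) -> np.ndarray:
    """Blend unconditional and text-conditional noise predictions.

    eps = eps_uncond + s * (eps_cond - eps_uncond); s = 1 recovers the
    purely conditional prediction, s > 1 amplifies the prompt's influence.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy example with two 2x2 "noise maps".
uncond = np.zeros((2, 2))
cond = np.ones((2, 2))
guided = classifier_free_guidance(uncond, cond, guidance_scale=4.5)
```

With the reported scale of 4.5, the guided prediction overshoots the conditional one, which is the usual trade-off between text alignment and sample diversity.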
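Of the three quantitative metrics quoted above, PSNR (used for background preservation) has a simple closed form worth restating. A self-contained NumPy sketch for 8-bit images in [0, 255] (the paper's exact evaluation protocol may differ):

```python
import numpy as np

def psnr(reference: np.ndarray, edited: np.ndarray,
         max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB; higher values mean the edited
    image preserves the reference (e.g. the background) more faithfully."""
    diff = reference.astype(np.float64) - edited.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.full((4, 4), 100.0)
b = np.full((4, 4), 110.0)  # uniform error of 10, so MSE = 100
# psnr(a, b) = 10 * log10(255**2 / 100) ≈ 28.13 dB
```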
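The ablated "patch merging" with a ratio of 0.3–0.7 reduces the number of tokens the transformer attends over. The paper does not publish pseudocode for it, but a ToMe-style sketch conveys the idea: pair up patch tokens, score each pair by cosine similarity, and average the most similar fraction. All names here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def merge_patches(tokens: np.ndarray, ratio: float = 0.3) -> np.ndarray:
    """Merge the most similar even/odd token pairs by averaging.

    tokens: (N, d) array of patch embeddings, N assumed even.
    ratio: fraction of pairs to merge (0.3-0.7 in the reported setup).
    Returns an array with round(ratio * N / 2) fewer tokens.
    """
    a, b = tokens[0::2], tokens[1::2]
    # Cosine similarity between each even token and its odd neighbour.
    sim = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8)
    n_merge = int(round(ratio * len(a)))
    merge_idx = np.argsort(sim)[::-1][:n_merge]        # most similar pairs
    keep_idx = np.setdiff1d(np.arange(len(a)), merge_idx)
    merged = (a[merge_idx] + b[merge_idx]) / 2.0       # average each pair
    kept = np.concatenate([a[keep_idx], b[keep_idx]])  # unmerged tokens
    return np.concatenate([merged, kept])

x = np.random.default_rng(0).normal(size=(16, 8))
y = merge_patches(x, ratio=0.5)
# 16 tokens -> 8 pairs -> 4 merged + 8 unmerged = 12 tokens
```

Higher ratios merge more tokens, trading editing fidelity for speed, which is consistent with the paper ablating this component alongside DPM-Solver inversion.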