GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

Authors: Zewei Zhang, Huan Liu, Jun Chen, Xiangyu Xu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the proposed GoodDrag compares favorably against the state-of-the-art approaches both qualitatively and quantitatively. The source code and data are available at https://gooddrag.github.io. In addition, we contribute to the benchmarking of drag editing by introducing a new dataset, Drag100, and developing dedicated quality assessment metrics, Dragging Accuracy Index and Gemini Score, utilizing Large Multimodal Models. Section 5 details the experiments conducted.
Researcher Affiliation | Academia | McMaster University; Xi'an Jiaotong University
Pseudocode | Yes | Finally, the whole pipeline of GoodDrag is summarized in Algorithm 1 in the Appendix.
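The pipeline referenced above alternates drag operations with denoising steps (the paper's AlDD framework): B drag operations per denoising step until K operations are done, each drag operation comprising J motion supervision steps. A minimal structural sketch of that loop, using hypothetical placeholder functions (`drag_operation`, `denoise_step`) rather than the paper's actual implementation, and the parameter values reported in the Experiment Setup row:

```python
# Structural sketch of the alternating drag-and-denoising loop (AlDD).
# drag_operation and denoise_step are hypothetical placeholders; the real
# method operates on diffusion latents (see Algorithm 1 in the paper).

def drag_operation(latent, motion_supervision_steps):
    # Placeholder: one drag operation = J motion supervision steps
    # followed by point tracking.
    return latent

def denoise_step(latent):
    # Placeholder: one diffusion denoising step.
    return latent

def aldd_pipeline(latent, K=70, B=10, J=3):
    """Run K drag operations, interleaving one denoising step every B drags."""
    drag_count = 0
    denoise_count = 0
    while drag_count < K:
        for _ in range(B):
            latent = drag_operation(latent, motion_supervision_steps=J)
            drag_count += 1
        latent = denoise_step(latent)
        denoise_count += 1
    return latent, drag_count, denoise_count
```

With the reported K = 70 and B = 10, this loop performs 70 drag operations across 7 denoising steps, matching the K/B = 7 alternating-phase steps stated in the setup.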
Open Source Code | Yes | The source code and data are available at https://gooddrag.github.io.
Open Datasets | Yes | In addition, we contribute to the benchmarking of drag editing by introducing a new dataset, Drag100, and developing dedicated quality assessment metrics, Dragging Accuracy Index and Gemini Score, utilizing Large Multimodal Models. The source code and data are available at https://gooddrag.github.io.
Dataset Splits | No | The paper introduces Drag100 as a new evaluation dataset consisting of 100 images. It describes the composition of the dataset (e.g., 85 real images, 15 AI-generated images, categories like animal images, artistic paintings, etc.) and its use for evaluation. However, it does not specify any training/test/validation splits for this dataset or any other dataset used for training their model. The dataset is explicitly for 'evaluation'.
Hardware Specification | Yes | We evaluate the runtime and GPU memory usage of GoodDrag with an A100 GPU.
Software Dependencies | No | The paper mentions using "Stable Diffusion 1.5" as the base model and employing the "Adam optimizer." It does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch, CUDA versions) that would be needed for replication.
Experiment Setup | Yes | In our experiments, we use Stable Diffusion 1.5 (Rombach et al., 2022) as the base model and finetune its U-Net with LoRA (rank = 16) to enhance image fidelity. We employ the Adam optimizer (Kingma & Ba, 2014) with a 0.02 learning rate. For the diffusion process, we set Tmax = 50 denoising steps, an inversion strength of κ = 0.75 (resulting in T = Tmax × κ = 38), and no text prompt. Features for Eq. 6 are extracted from the last U-Net layer. In the AlDD framework, we set the motion supervision and point tracking radii to r1 = 4 and r2 = 12, respectively, with a drag size β = 4 and a mask loss weight λ = 0.2. We perform a total of K = 70 drag operations, with B = 10 operations per denoising step, resulting in K/B = 7 denoising steps during the alternating phase. Each drag operation includes J = 3 motion supervision steps in Eq. 7. Similar to Shi et al. (2023), we incorporate the MasaCtrl mechanism (Cao et al., 2023) starting from the 10th U-Net layer to enhance editing performance.
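For reference, the hyperparameters quoted in this row can be gathered into a single configuration. The sketch below uses illustrative key names (not identifiers from the released code) and verifies the two derived quantities the paper reports:

```python
import math

# Hyperparameters as quoted in the paper; key names are illustrative.
config = {
    "base_model": "Stable Diffusion 1.5",
    "lora_rank": 16,
    "optimizer": "Adam",
    "learning_rate": 0.02,
    "T_max": 50,                 # total denoising steps
    "inversion_strength": 0.75,  # kappa
    "r1": 4,                     # motion supervision radius
    "r2": 12,                    # point tracking radius
    "drag_size_beta": 4,
    "mask_loss_weight": 0.2,     # lambda
    "K": 70,                     # total drag operations
    "B": 10,                     # drag operations per denoising step
    "J": 3,                      # motion supervision steps per drag operation
}

# Derived quantities stated in the paper: Tmax * kappa = 37.5, which the
# paper reports as T = 38, so we round up here.
T = math.ceil(config["T_max"] * config["inversion_strength"])
alternating_denoise_steps = config["K"] // config["B"]
```

This confirms the internal consistency of the reported setup: T = 38 inversion steps and K/B = 7 denoising steps in the alternating phase.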