GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

Authors: Zewei Zhang, Huan Liu, Jun Chen, Xiangyu Xu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the proposed GoodDrag compares favorably against the state-of-the-art approaches both qualitatively and quantitatively. The source code and data are available at https://gooddrag.github.io. In addition, we contribute to the benchmarking of drag editing by introducing a new dataset, Drag100, and developing dedicated quality assessment metrics, Dragging Accuracy Index and Gemini Score, utilizing Large Multimodal Models. Section 5 details the experiments conducted.
Researcher Affiliation | Academia | McMaster University; Xi'an Jiaotong University
Pseudocode | Yes | Finally, the whole pipeline of GoodDrag is summarized in Algorithm 1 in the Appendix.
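The pipeline referenced above alternates drag operations with denoising steps (the paper's AlDD framework): B drag operations per denoising step until K operations are done, each drag operation comprising J motion supervision steps. A minimal structural sketch of that loop, using hypothetical placeholder functions (`drag_operation`, `denoise_step`) rather than the paper's actual implementation, and the parameter values reported in the Experiment Setup row:

```python
# Structural sketch of the alternating drag-and-denoising loop (AlDD).
# drag_operation and denoise_step are hypothetical placeholders; the real
# method operates on diffusion latents (see Algorithm 1 in the paper).

def drag_operation(latent, motion_supervision_steps):
    # Placeholder: one drag operation = J motion supervision steps
    # followed by point tracking.
    return latent

def denoise_step(latent):
    # Placeholder: one diffusion denoising step.
    return latent

def aldd_pipeline(latent, K=70, B=10, J=3):
    """Run K drag operations, interleaving one denoising step every B drags."""
    drag_count = 0
    denoise_count = 0
    while drag_count < K:
        for _ in range(B):
            latent = drag_operation(latent, motion_supervision_steps=J)
            drag_count += 1
        latent = denoise_step(latent)
        denoise_count += 1
    return latent, drag_count, denoise_count
```

With the reported K = 70 and B = 10, this loop performs 70 drag operations across 7 denoising steps, matching the K/B = 7 alternating-phase steps stated in the setup.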
Open Source Code | Yes | The source code and data are available at https://gooddrag.github.io.
Open Datasets | Yes | In addition, we contribute to the benchmarking of drag editing by introducing a new dataset, Drag100, and developing dedicated quality assessment metrics, Dragging Accuracy Index and Gemini Score, utilizing Large Multimodal Models. The source code and data are available at https://gooddrag.github.io.
Dataset Splits | No | The paper introduces Drag100 as a new evaluation dataset consisting of 100 images. It describes the composition of the dataset (e.g., 85 real images, 15 AI-generated images, categories like animal images, artistic paintings, etc.) and its use for evaluation. However, it does not specify any training/test/validation splits for this dataset or any other dataset used for training their model. The dataset is explicitly for 'evaluation'.
Hardware Specification | Yes | We evaluate the runtime and GPU memory usage of GoodDrag with an A100 GPU.
Software Dependencies | No | The paper mentions using "Stable Diffusion 1.5" as the base model and employing the "Adam optimizer." It does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch, CUDA versions) that would be needed for replication.
Experiment Setup | Yes | In our experiments, we use Stable Diffusion 1.5 (Rombach et al., 2022) as the base model and finetune its U-Net with LoRA (rank = 16) to enhance image fidelity. We employ the Adam optimizer (Kingma & Ba, 2014) with a 0.02 learning rate. For the diffusion process, we set Tmax = 50 denoising steps, an inversion strength of κ = 0.75 (resulting in T = Tmax × κ = 38), and no text prompt. Features for Eq. 6 are extracted from the last U-Net layer. In the AlDD framework, we set the motion supervision and point tracking radii to r1 = 4 and r2 = 12, respectively, with a drag size β = 4 and a mask loss weight λ = 0.2. We perform a total of K = 70 drag operations, with B = 10 operations per denoising step, resulting in K/B = 7 denoising steps during the alternating phase. Each drag operation includes J = 3 motion supervision steps in Eq. 7. Similar to Shi et al. (2023), we incorporate the MasaCtrl mechanism (Cao et al., 2023) starting from the 10th U-Net layer to enhance editing performance.
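For reference, the hyperparameters quoted in this row can be gathered into a single configuration. The sketch below uses illustrative key names (not identifiers from the released code) and verifies the two derived quantities the paper reports:

```python
import math

# Hyperparameters as quoted in the paper; key names are illustrative.
config = {
    "base_model": "Stable Diffusion 1.5",
    "lora_rank": 16,
    "optimizer": "Adam",
    "learning_rate": 0.02,
    "T_max": 50,                 # total denoising steps
    "inversion_strength": 0.75,  # kappa
    "r1": 4,                     # motion supervision radius
    "r2": 12,                    # point tracking radius
    "drag_size_beta": 4,
    "mask_loss_weight": 0.2,     # lambda
    "K": 70,                     # total drag operations
    "B": 10,                     # drag operations per denoising step
    "J": 3,                      # motion supervision steps per drag operation
}

# Derived quantities stated in the paper: Tmax * kappa = 37.5, which the
# paper reports as T = 38, so we round up here.
T = math.ceil(config["T_max"] * config["inversion_strength"])
alternating_denoise_steps = config["K"] // config["B"]
```

This confirms the internal consistency of the reported setup: T = 38 inversion steps and K/B = 7 denoising steps in the alternating phase.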