GDrag: Towards General-Purpose Interactive Editing with Anti-ambiguity Point Diffusion
Authors: Xiaojian Lin, Hanhui Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments on the challenging DragBench dataset demonstrate that GDrag outperforms state-of-the-art methods significantly. The code of GDrag is available at https://github.com/DaDaY-coder/GDrag." Section 4 (Experiments): "In this section, we provide quantitative and qualitative analysis of GDrag." |
| Researcher Affiliation | Collaboration | (1) Shenzhen Campus of Sun Yat-Sen University; (2) Guangdong Key Laboratory of Big Data Analysis and Processing; (3) Lenovo Research; (4) Peng Cheng Laboratory |
| Pseudocode | No | The paper describes methods using text and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code of GDrag is available at https://github.com/DaDaY-coder/GDrag. |
| Open Datasets | Yes | Extensive experiments on the challenging DragBench dataset demonstrate that GDrag outperforms state-of-the-art methods significantly. ... We compare GDrag with state-of-the-art methods on the DragBench dataset (Shi et al., 2024b) to validate its effectiveness. |
| Dataset Splits | No | The paper mentions using the DragBench dataset for evaluation but does not specify any training/test/validation splits, percentages, or sample counts for their experiments. |
| Hardware Specification | Yes | All our experiments are conducted with an NVIDIA GeForce RTX 4090 graphics card (24 GB). |
| Software Dependencies | No | The paper mentions using "Stable Diffusion 1.5 (Rombach et al., 2022) as the base diffusion model" and "SMS uses the Adam optimizer", but it does not provide specific version numbers for software libraries or programming languages like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | SMS uses the Adam optimizer with a learning rate of 0.01 for b and 0.02 for s, and the parameter λ is set to 0.2. In addition, the values of parameters L, T, and N are 50, 35, and 5, respectively. We set r = 5 and k = \|M\|/10 to calculate the boundary band in the non-rigid transformation task. In the relocation and in-plane rotation tasks, the value of ρ is set to 1.0, while in the non-rigid transformation and out-of-plane rotation tasks, the value of ρ is 0.2. We calculate the value of β adaptively based on the specific task and the details can be found in Appendix A.10. ... we perform LoRA fine-tuning on the input image, with 80 fine-tuning steps and the adaptor rank is 16. |
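The reported hyperparameters can be collected into a single configuration object for reproduction attempts. The sketch below is a minimal, hypothetical layout (the field names and task labels are our assumptions, not the authors' code); only the numeric values are quoted from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class GDragConfig:
    """Hypothetical config mirroring the hyperparameters reported in the paper."""

    # SMS optimization (Adam; per-variable learning rates for b and s)
    lr_b: float = 0.01
    lr_s: float = 0.02
    lam: float = 0.2          # λ weighting parameter
    L: int = 50               # optimization steps
    T: int = 35               # diffusion timestep
    N: int = 5                # number of points

    # Boundary band in the non-rigid transformation task
    r: int = 5

    # Task-dependent ρ: 1.0 for relocation / in-plane rotation,
    # 0.2 for non-rigid transformation / out-of-plane rotation
    rho: dict = field(default_factory=lambda: {
        "relocation": 1.0,
        "in_plane_rotation": 1.0,
        "non_rigid": 0.2,
        "out_of_plane_rotation": 0.2,
    })

    # LoRA fine-tuning on the input image
    lora_steps: int = 80
    lora_rank: int = 16

    def k(self, mask_pixels: int) -> int:
        """k = |M| / 10, where |M| is the number of pixels in the mask M."""
        return mask_pixels // 10


cfg = GDragConfig()
```

β is omitted here because the paper computes it adaptively per task (Appendix A.10) rather than fixing a single value.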