Origin Identification for Text-Guided Image-to-Image Diffusion Models
Authors: Wenhao Wang, Yifan Sun, Zongxin Yang, Zhentao Tan, Zhengdong Hu, Yi Yang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed method achieves satisfying generalization performance, significantly surpassing similarity-based methods (+31.6% mAP), even those with generalization designs. The project is available at https://id2icml.github.io. Extensive experimental results show (1) the challenge of the proposed ID2 task: all pre-trained deep embedding models, fine-tuned similarity-based methods, and specialized domain generalization methods fail to achieve satisfying performance; and (2) the effectiveness of our proposed method: it achieves 88.8%, 81.5%, 87.3%, 89.3%, 85.7%, 85.7%, and 90.3% mAP, respectively, for Stable Diffusion 2, Stable Diffusion XL, OpenDalle, Colorful XL, Kandinsky-3, Stable Diffusion 3, and Kolors. |
| Researcher Affiliation | Academia | 1University of Technology Sydney 2Zhejiang University 3Harvard University 4Peking University. Correspondence to: Yi Yang <EMAIL>. |
| Pseudocode | No | The paper describes methods and proofs in prose and mathematical notation but does not contain any explicitly labeled pseudocode or algorithm blocks. For example, Section 4.3 'Implementation' describes the learning process but not in a structured pseudocode format. |
| Open Source Code | Yes | The project is available at https://id2icml.github.io. |
| Open Datasets | Yes | To advance research in ID2, this section introduces OriPID, the first dataset specifically designed for the proposed task. The source images in OriPID are derived from the DISC21 dataset (Papakipos et al., 2022), which is a subset of the real-world multimedia dataset YFCC100M (Thomee et al., 2016). |
| Dataset Splits | Yes | The training set comprises (1) 100,000 origins randomly selected from the 1,000,000 original images in DISC21, (2) 2,000,000 guided prompts (20 for each origin) generated by GPT-4o (for details on how these prompts were generated, see Appendix (Section B)), and (3) 2,000,000 images generated by inputting the origins and prompts into Stable Diffusion 2 (Rombach et al., 2022). For testing, we randomly select 5,000 images as origins from a reference set containing 1,000,000 images, and ask GPT-4o to generate a guided prompt for each origin. Subsequently, we generate 5,000 queries using the origins, corresponding prompts, and each of the following models: Stable Diffusion 2, Stable Diffusion XL, OpenDalle, Colorful XL, Kandinsky-3, Stable Diffusion 3, and Kolors. |
| Hardware Specification | Yes | We distribute the optimization of the theoretically expected matrix W across 8 NVIDIA A100 GPUs using PyTorch. |
| Software Dependencies | No | The paper mentions 'PyTorch' as a software used but does not provide a specific version number. Other software like diffusion models (Stable Diffusion 2, etc.) and GPT-4o are mentioned as tools or models rather than core implementation dependencies with versions. |
| Experiment Setup | Yes | During testing, the editing strengths for Stable Diffusion 2, Stable Diffusion XL, OpenDalle, Colorful XL, Kandinsky-3, Stable Diffusion 3, and Kolors, are 0.9, 0.8, 0.7, 0.7, 0.6, 0.8, and 0.7, respectively... During training, the editing strength for Stable Diffusion 2 is 0.9. The classifier-free guidance (CFG) scale is set to 7.5 for all diffusion models... The images are resized to a resolution of 256×256 before being embedded by the VAE encoder. The peak learning rate is set to 3.5×10⁻⁴, and the Adam optimizer is used. |
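The mAP figures quoted above come from a retrieval setup in which each generated query image has exactly one ground-truth origin in the reference set; in that single-relevant-item case, average precision reduces to the reciprocal rank of the origin. The sketch below illustrates that metric, assuming embeddings are compared by cosine similarity; the function name and toy data are illustrative, not from the paper.

```python
import numpy as np

def mean_average_precision(query_emb, ref_emb, gt_index):
    """mAP for origin identification, where each query image has exactly
    one ground-truth origin in the reference set. With a single relevant
    item, AP reduces to 1 / rank of that origin in the similarity ranking."""
    # L2-normalize so the dot product is cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sims = q @ r.T  # shape: (n_queries, n_refs)

    aps = []
    for i, gt in enumerate(gt_index):
        # Rank of the ground-truth origin (1 = best match).
        rank = 1 + np.sum(sims[i] > sims[i, gt])
        aps.append(1.0 / rank)
    return float(np.mean(aps))

# Toy example: 3 queries, 4 reference origins in an 8-dim embedding space.
rng = np.random.default_rng(0)
refs = rng.normal(size=(4, 8))
# Each query is a lightly perturbed copy of origins 0, 1, 2.
queries = refs[:3] + 0.01 * rng.normal(size=(3, 8))
print(mean_average_precision(queries, refs, gt_index=[0, 1, 2]))  # → 1.0
```

Because every query here ranks its own origin first, the toy mAP is 1.0; in the reported experiments the queries are diffusion-edited images, which is what drives the scores below 100%.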