Origin Identification for Text-Guided Image-to-Image Diffusion Models

Authors: Wenhao Wang, Yifan Sun, Zongxin Yang, Zhentao Tan, Zhengdong Hu, Yi Yang

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the proposed method achieves satisfying generalization performance, significantly surpassing similarity-based methods (+31.6% mAP), even those with generalization designs. The project is available at https://id2icml.github.io. Extensive experimental results show (1) the challenge of the proposed ID2 task: all pre-trained deep embedding models, fine-tuned similarity-based methods, and specialized domain generalization methods fail to achieve satisfying performance; and (2) the effectiveness of our proposed method: it achieves 88.8%, 81.5%, 87.3%, 89.3%, 85.7%, 85.7%, and 90.3% mAP, respectively, for Stable Diffusion 2, Stable Diffusion XL, OpenDalle, Colorful XL, Kandinsky-3, Stable Diffusion 3, and Kolors.
Researcher Affiliation | Academia | ¹University of Technology Sydney, ²Zhejiang University, ³Harvard University, ⁴Peking University. Correspondence to: Yi Yang <EMAIL>.
Pseudocode | No | The paper describes methods and proofs in prose and mathematical notation but does not contain any explicitly labeled pseudocode or algorithm blocks. For example, Section 4.3 'Implementation' describes the learning process but not in a structured pseudocode format.
Open Source Code | Yes | The project is available at https://id2icml.github.io.
Open Datasets | Yes | To advance research in ID2, this section introduces OriPID, the first dataset specifically designed for the proposed task. The source images in OriPID are derived from the DISC21 dataset (Papakipos et al., 2022), which is a subset of the real-world multimedia dataset YFCC100M (Thomee et al., 2016).
Dataset Splits | Yes | The training set comprises (1) 100,000 origins randomly selected from the 1,000,000 original images in DISC21, (2) 2,000,000 guided prompts (20 for each origin) generated by GPT-4o (for details on how these prompts were generated, see Appendix (Section B)), and (3) 2,000,000 images generated by inputting the origins and prompts into Stable Diffusion 2 (Rombach et al., 2022). For testing, we randomly select 5,000 images as origins from a reference set containing 1,000,000 images, and ask GPT-4o to generate a guided prompt for each origin. Subsequently, we generate 5,000 queries using the origins, corresponding prompts, and each of the following models: Stable Diffusion 2, Stable Diffusion XL, OpenDalle, Colorful XL, Kandinsky-3, Stable Diffusion 3, and Kolors.
Hardware Specification | Yes | We distribute the optimization of the theoretically expected matrix W across 8 NVIDIA A100 GPUs using PyTorch.
Software Dependencies | No | The paper mentions 'PyTorch' as software used but does not provide a specific version number. Other software, such as the diffusion models (Stable Diffusion 2, etc.) and GPT-4o, is mentioned as tools or models rather than as core implementation dependencies with versions.
Experiment Setup | Yes | During testing, the editing strengths for Stable Diffusion 2, Stable Diffusion XL, OpenDalle, Colorful XL, Kandinsky-3, Stable Diffusion 3, and Kolors are 0.9, 0.8, 0.7, 0.7, 0.6, 0.8, and 0.7, respectively... During training, the editing strength for Stable Diffusion 2 is 0.9. The classifier-free guidance (CFG) scale is set to 7.5 for all diffusion models... The images are resized to a resolution of 256×256 before being embedded by the VAE encoder. The peak learning rate is set to 3.5×10⁻⁴, and the Adam optimizer is used.
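As a quick sanity check on the Research Type row, the per-model mAP figures quoted there can be tabulated. The dictionary layout and the macro-average below are my own additions; the average is not a number reported in the paper.

```python
# Per-model mAP (%) quoted in the paper's abstract (dictionary layout is mine).
MAP_PERCENT = {
    "Stable Diffusion 2": 88.8,
    "Stable Diffusion XL": 81.5,
    "OpenDalle": 87.3,
    "Colorful XL": 89.3,
    "Kandinsky-3": 85.7,
    "Stable Diffusion 3": 85.7,
    "Kolors": 90.3,
}

# Macro-average across the seven generators (my computation, not from the paper).
mean_map = sum(MAP_PERCENT.values()) / len(MAP_PERCENT)
```

Even the weakest case (Stable Diffusion XL at 81.5%) sits well above the +31.6% mAP margin claimed over similarity-based baselines.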
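The counts in the Dataset Splits row multiply out consistently; the sketch below restates that bookkeeping. All numbers come from the quoted passage, while the variable names and the derived test-query total are mine.

```python
# Training side of OriPID, per the Dataset Splits row.
TRAIN_ORIGINS = 100_000       # sampled from DISC21's 1,000,000 original images
PROMPTS_PER_ORIGIN = 20       # guided prompts generated by GPT-4o
train_generated = TRAIN_ORIGINS * PROMPTS_PER_ORIGIN  # Stable Diffusion 2 outputs

# Test side: 5,000 origins, one prompt each, regenerated with each of 7 models.
TEST_ORIGINS = 5_000
TEST_MODELS = [
    "Stable Diffusion 2", "Stable Diffusion XL", "OpenDalle", "Colorful XL",
    "Kandinsky-3", "Stable Diffusion 3", "Kolors",
]
total_test_queries = TEST_ORIGINS * len(TEST_MODELS)  # derived total, not stated
```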
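The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is a minimal plain-Python restatement under my own naming; the authors' released code may organize these values differently.

```python
# Testing-time img2img editing strength per generator (values from the paper).
EDIT_STRENGTH_TEST = {
    "Stable Diffusion 2": 0.9,
    "Stable Diffusion XL": 0.8,
    "OpenDalle": 0.7,
    "Colorful XL": 0.7,
    "Kandinsky-3": 0.6,
    "Stable Diffusion 3": 0.8,
    "Kolors": 0.7,
}
EDIT_STRENGTH_TRAIN = 0.9   # Stable Diffusion 2 only, used for training
CFG_SCALE = 7.5             # classifier-free guidance, shared by all models
IMAGE_SIZE = (256, 256)     # resize applied before the VAE encoder
PEAK_LR = 3.5e-4            # peak learning rate for the Adam optimizer
```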