Origin Identification for Text-Guided Image-to-Image Diffusion Models
Authors: Wenhao Wang, Yifan Sun, Zongxin Yang, Zhentao Tan, Zhengdong Hu, Yi Yang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed method achieves satisfying generalization performance, significantly surpassing similarity-based methods (+31.6% mAP), even those with generalization designs. The project is available at https://id2icml.github.io. Extensive experimental results show (1) the challenge of the proposed ID2 task: all pre-trained deep embedding models, fine-tuned similarity-based methods, and specialized domain generalization methods fail to achieve satisfying performance; and (2) the effectiveness of our proposed method: it achieves 88.8%, 81.5%, 87.3%, 89.3%, 85.7%, 85.7%, and 90.3% mAP, respectively, for Stable Diffusion 2, Stable Diffusion XL, OpenDalle, Colorful XL, Kandinsky-3, Stable Diffusion 3, and Kolors. |
| Researcher Affiliation | Academia | 1University of Technology Sydney 2Zhejiang University 3Harvard University 4Peking University. Correspondence to: Yi Yang <EMAIL>. |
| Pseudocode | No | The paper describes methods and proofs in prose and mathematical notation but does not contain any explicitly labeled pseudocode or algorithm blocks. For example, Section 4.3 'Implementation' describes the learning process but not in a structured pseudocode format. |
| Open Source Code | Yes | The project is available at https://id2icml.github.io. |
| Open Datasets | Yes | To advance research in ID2, this section introduces OriPID, the first dataset specifically designed for the proposed task. The source images in OriPID are derived from the DISC21 dataset (Papakipos et al., 2022), which is a subset of the real-world multimedia dataset YFCC100M (Thomee et al., 2016). |
| Dataset Splits | Yes | The training set comprises (1) 100,000 origins randomly selected from the 1,000,000 original images in DISC21, (2) 2,000,000 guided prompts (20 for each origin) generated by GPT-4o (for details on how these prompts were generated, see Appendix (Section B)), and (3) 2,000,000 images generated by inputting the origins and prompts into Stable Diffusion 2 (Rombach et al., 2022). For testing, we randomly select 5,000 images as origins from a reference set containing 1,000,000 images, and ask GPT-4o to generate a guided prompt for each origin. Subsequently, we generate 5,000 queries using the origins, corresponding prompts, and each of the following models: Stable Diffusion 2, Stable Diffusion XL, OpenDalle, Colorful XL, Kandinsky-3, Stable Diffusion 3, and Kolors. |
| Hardware Specification | Yes | We distribute the optimization of the theoretically expected matrix W across 8 NVIDIA A100 GPUs using PyTorch. |
| Software Dependencies | No | The paper mentions 'PyTorch' as a software used but does not provide a specific version number. Other software like diffusion models (Stable Diffusion 2, etc.) and GPT-4o are mentioned as tools or models rather than core implementation dependencies with versions. |
| Experiment Setup | Yes | During testing, the editing strengths for Stable Diffusion 2, Stable Diffusion XL, OpenDalle, Colorful XL, Kandinsky-3, Stable Diffusion 3, and Kolors, are 0.9, 0.8, 0.7, 0.7, 0.6, 0.8, and 0.7, respectively... During training, the editing strength for Stable Diffusion 2 is 0.9. The classifier-free guidance (CFG) scale is set to 7.5 for all diffusion models... The images are resized to a resolution of 256×256 before being embedded by the VAE encoder. The peak learning rate is set to 3.5×10⁻⁴, and the Adam optimizer is used. |
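The mAP figures quoted above come from a retrieval setup in which each generated query image has exactly one ground-truth origin in the reference set; in that single-relevant-item case, average precision reduces to the reciprocal rank of the origin. The sketch below illustrates that metric, assuming embeddings are compared by cosine similarity; the function name and toy data are illustrative, not from the paper.

```python
import numpy as np

def mean_average_precision(query_emb, ref_emb, gt_index):
    """mAP for origin identification, where each query image has exactly
    one ground-truth origin in the reference set. With a single relevant
    item, AP reduces to 1 / rank of that origin in the similarity ranking."""
    # L2-normalize so the dot product is cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sims = q @ r.T  # shape: (n_queries, n_refs)

    aps = []
    for i, gt in enumerate(gt_index):
        # Rank of the ground-truth origin (1 = best match).
        rank = 1 + np.sum(sims[i] > sims[i, gt])
        aps.append(1.0 / rank)
    return float(np.mean(aps))

# Toy example: 3 queries, 4 reference origins in an 8-dim embedding space.
rng = np.random.default_rng(0)
refs = rng.normal(size=(4, 8))
# Each query is a lightly perturbed copy of origins 0, 1, 2.
queries = refs[:3] + 0.01 * rng.normal(size=(3, 8))
print(mean_average_precision(queries, refs, gt_index=[0, 1, 2]))  # → 1.0
```

Because every query here ranks its own origin first, the toy mAP is 1.0; in the reported experiments the queries are diffusion-edited images, which is what drives the scores below 100%.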