Spin: Diffusion-based Semantic Image Painting Through Independent Information Injection

Authors: Dantong Wu, Zhiqiang Chen, Tianjiao Du, Peipei Ran, Mengchao Bai, Kai Zhang

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results demonstrate that Spin achieves superior semantic similarity and image coherence across various styles, including realistic, pencil drawings, cartoon, and oil painting. Additionally, Spin offers diversity and editability, and can be integrated into other models that meet our prerequisites." |
| Researcher Affiliation | Collaboration | 1) Shenzhen International Graduate School, Tsinghua University, China; 2) Institute of Automation, Chinese Academy of Sciences, China; 3) Media Technology Lab, Huawei, China |
| Pseudocode | No | The paper describes the methodology in narrative text and mathematical formulations but does not include any clearly labeled "Pseudocode" or "Algorithm" blocks or figures. |
| Open Source Code | No | The paper states: "Also, our approach is diverse, editable, model-independent so can be used in other frameworks." and "Since our design doesn't depend on a specific model, it can be applied to any framework that satisfies our setting." However, there is no explicit statement of code release or a link to a code repository. |
| Open Datasets | Yes | "We mainly use the dataset proposed by TF (Lu, Liu, and Kong 2023), which contains 332 sets of data from four domains: real domain, pencil drawings domain, cartoon domain and oil painting domain." |
| Dataset Splits | No | "In the quantitative experiments, we construct the test-set by all data from real domain where metrics are more effective. In the qualitative experiments, we enrich it by collecting images from the Internet since it lacks non-real domain images. ... Due to the diversity of the diffusion model, we generate 32 images for each image in Blended, and 4 images for BLD, SD-inpaint, PPt, Uni and our work." |
| Hardware Specification | Yes | "Our experiments are conducted on a single V100 GPU." |
| Software Dependencies | Yes | "Most of the baselines are based on version 1.4 of Stable Diffusion (Rombach et al. 2022), so we use the same model for fairness." |
| Experiment Setup | Yes | "We adopt DDIM (Song, Meng, and Ermon 2020) with 50 steps for sampling and set the CFG scale (Ho and Salimans 2022) to 7.5. For each reference image, we perform 200 iterations using the AdamW (Kingma and Ba 2014) optimizer with a learning rate of 0.0005." |
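For context on the sampling choices named in the experiment setup, the two standard components involved (a deterministic DDIM update and classifier-free guidance with a scale such as 7.5) can be sketched in NumPy. This is a minimal illustration of those published formulas, not code from the paper; the function names and toy arrays are our own.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, scale=7.5):
    # Classifier-free guidance (Ho & Salimans 2022): push the conditional
    # noise prediction away from the unconditional one by the guidance scale.
    return eps_uncond + scale * (eps_cond - eps_uncond)

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    # Deterministic DDIM update (Song, Meng, and Ermon 2020, eta = 0):
    # estimate the clean image, then re-noise it to the previous noise level.
    pred_x0 = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * pred_x0 + np.sqrt(1.0 - alpha_bar_prev) * eps
```

With a guidance scale of 1.0, `cfg_noise` reduces to the conditional prediction; with the last step taken to `alpha_bar_prev = 1.0`, `ddim_step` returns the predicted clean image exactly.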