Spin: Diffusion-based Semantic Image Painting Through Independent Information Injection
Authors: Dantong Wu, Zhiqiang Chen, Tianjiao Du, Peipei Ran, Mengchao Bai, Kai Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that Spin achieves superior semantic similarity and image coherence across various styles, including realistic, pencil drawings, cartoon, and oil painting. Additionally, Spin offers diversity and editability, and can be integrated into other models that meet our prerequisites. |
| Researcher Affiliation | Collaboration | 1Shenzhen International Graduate School, Tsinghua University, China 2Institute of Automation, Chinese Academy of Sciences, China 3Media Technology Lab, Huawei, China |
| Pseudocode | No | The paper describes the methodology in narrative text and mathematical formulations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | No | The paper states: "Also, our approach is diverse, editable, model-independent so can be used in other frameworks." and "Since our design doesn't depend on a specific model, it can be applied to any framework that satisfies our setting." However, there is no explicit statement of code release or a link to a code repository. |
| Open Datasets | Yes | We mainly use the dataset proposed by TF (Lu, Liu, and Kong 2023), which contains 332 sets of data from four domains: real domain, pencil drawings domain, cartoon domain and oil painting domain. |
| Dataset Splits | No | In the quantitative experiments, we construct the test set from all data in the real domain, where metrics are more effective. In the qualitative experiments, we enrich it by collecting images from the Internet since it lacks non-real domain images. ... Due to the diversity of the diffusion model, we generate 32 images for each image in Blended, and 4 images for BLD, SD-inpaint, PPt, Uni and our work. |
| Hardware Specification | Yes | Our experiments are conducted on a single V100 GPU. |
| Software Dependencies | Yes | Most of the baselines are based on version 1.4 of Stable Diffusion (Rombach et al. 2022), so we use the same model for fairness. |
| Experiment Setup | Yes | We adopt DDIM (Song, Meng, and Ermon 2020) with 50 steps for sampling and set the CFG scale (Ho and Salimans 2022) to 7.5. For each reference image, we perform 200 iterations using the AdamW (Kingma and Ba 2014) optimizer with a learning rate of 0.0005. |
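The reported sampling configuration (DDIM with 50 steps over Stable Diffusion's 1000 training timesteps, CFG scale 7.5) can be sketched as below. This is a minimal illustration, not the authors' code: the function names `ddim_timesteps` and `cfg_combine` are hypothetical, and the even-stride timestep selection is an assumption based on standard DDIM practice.

```python
# Illustrative sketch of the paper's reported sampling settings.
# Assumes Stable Diffusion v1.4's default 1000 training timesteps.
NUM_TRAIN_TIMESTEPS = 1000
NUM_INFERENCE_STEPS = 50   # "DDIM ... with 50 steps"
CFG_SCALE = 7.5            # "set the CFG scale ... to 7.5"

def ddim_timesteps(train_steps=NUM_TRAIN_TIMESTEPS,
                   inference_steps=NUM_INFERENCE_STEPS):
    """Evenly spaced DDIM timestep subsequence, descending for sampling."""
    stride = train_steps // inference_steps
    return list(range(0, train_steps, stride))[::-1]

def cfg_combine(eps_uncond, eps_cond, scale=CFG_SCALE):
    """Classifier-free guidance: extrapolate past the conditional prediction."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Per-image tuning reported in the paper would sit alongside this:
# 200 AdamW iterations at learning rate 0.0005 for each reference image.
steps = ddim_timesteps()
print(len(steps), steps[0], steps[-1])  # -> 50 980 0
print(cfg_combine(0.0, 1.0))            # -> 7.5
```

The guidance formula follows the standard classifier-free guidance definition, where the scale of 7.5 weights the conditional prediction against the unconditional one.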