Training-and-Prompt-Free General Painterly Harmonization via Zero-Shot Disentanglement on Style and Content References
Authors: Teng-Fang Hsiao, Bo-Kai Ruan, Hong-Han Shuai
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the efficacy of our method in all benchmarks. |
| Researcher Affiliation | Academia | National Yang Ming Chiao Tung University, Taiwan |
| Pseudocode | No | The overall algorithm and visualization can be found in Appendix. |
| Open Source Code | Yes | Code: https://github.com/BlueDyee/TF-GPH |
| Open Datasets | Yes | We generalize the computational metrics and benchmarks from various image editing methods including Painterly Image Harmonization, Prompt-based Image Composition, and Style Transfer... and aims to mitigate the shortcomings of existing benchmarks such as WikiArt combined with COCO (Tan et al. 2019; Lin et al. 2014) and the TF-ICON Benchmark (Lu, Liu, and Kong 2023). |
| Dataset Splits | No | Details and experiment of these datasets can be found in the Appendix. |
| Hardware Specification | No | No specific hardware details such as GPU/CPU models or processor types are mentioned in the paper. |
| Software Dependencies | No | We employ the Stable Diffusion model (Rombach et al. 2022) as the pretrained backbone and utilize DPM-Solver++ as the scheduler with a total of 25 steps for both inversion and reconstruction. |
| Experiment Setup | Yes | Experiments Setup. We employ the Stable Diffusion model (Rombach et al. 2022) as the pretrained backbone and utilize DPM-Solver++ as the scheduler with a total of 25 steps for both inversion and reconstruction. Specifically, we first resize the input images $I_f$, $I_b$, and $I_c$ to 512 × 512, and encode them into corresponding $z^f_0$, $z^b_0$, and $z^c_0$. Afterward, we take these latents with prompt embedding $\rho_{exceptional}$ as the input during both inversion and reconstruction stage... Without loss of generality, we place a higher tendency on style reference $z^b$ by setting $\beta$ to 1.1, and a minor preference on content preservation related to $z^f$ by setting $\alpha$ to 0.9. |
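
The settings quoted in the Experiment Setup row can be collected into a small configuration object, as a minimal sketch. The class name, field names, and the `reweighted_attention` helper are hypothetical and do not come from the paper or its repository. The helper also assumes that α and β multiply pre-softmax similarity scores for the content and style references respectively; the excerpt above states only the values 0.9 and 1.1, not where they are applied.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HarmonizationConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    resolution: int = 512            # inputs are resized to 512 x 512
    num_steps: int = 25              # steps for both inversion and reconstruction
    scheduler: str = "DPM-Solver++"  # scheduler named in the setup
    alpha: float = 0.9               # weight tied to the content reference z^f
    beta: float = 1.1                # weight tied to the style reference z^b


def reweighted_attention(fg_scores, bg_scores, alpha, beta):
    """Softmax over content and style scores after scaling each group.

    Assumption (not confirmed by the excerpt): alpha and beta scale the
    pre-softmax similarity scores of the two references.
    """
    scaled = [alpha * s for s in fg_scores] + [beta * s for s in bg_scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


cfg = HarmonizationConfig()
weights = reweighted_attention([2.0, 1.0], [2.0, 1.0], cfg.alpha, cfg.beta)
```

With identical positive scores for both references, the entries scaled by β receive more attention mass than those scaled by α, which matches the stated preference for the style reference. For negative scores the effect reverses, so this toy illustrates the direction of the weighting only under the stated assumption.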