SigStyle: Signature Style Transfer via Personalized Text-to-Image Models
Authors: Ye Wang, Tongyuan Bai, Xuping Xie, Zili Yi, Yilin Wang, Rui Ma
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Quantitative and qualitative evaluations demonstrate our approach outperforms existing style transfer methods for recognizing and transferring the signature styles. Extensive experiments and evaluations further demonstrate the versatility and effectiveness of our method. We randomly selected 10 content images and 15 style images, generating a total of 150 stylized images for each method by applying the Cartesian product to the content and style images. To evaluate content fidelity, we followed (Chung, Hyun, and Heo 2024) and employed the LPIPS metric (Zhang et al. 2018), which measures the similarity between the stylized image and the corresponding content image. For style similarity, we adopted the Style Loss (Gatys, Ecker, and Bethge 2016), measuring the alignment between the stylized image and the corresponding style image. As shown in Figure 7, our hypernetwork effectively facilitates the precise learning and inversion of the style. |
| Researcher Affiliation | Collaboration | Ye Wang¹, Tongyuan Bai¹, Xuping Xie², Zili Yi³, Yilin Wang⁴*, Rui Ma¹,⁵*. ¹School of Artificial Intelligence, Jilin University; ²College of Computer Science and Technology, Jilin University; ³School of Intelligence Science and Technology, Nanjing University; ⁴Adobe; ⁵Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulas (e.g., equations 1, 3, 4, 5) and figures (Figure 3, 4, 5) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository. It mentions: 'Currently, SigStyle still needs to fine-tune the diffusion model for each given style image during inference, which limits its deployment on resource-constrained devices. How to further reduce the computation cost of the style learning and transferring process will be worthy to investigate.' |
| Open Datasets | No | The paper mentions: 'We randomly selected 10 content images and 15 style images, generating a total of 150 stylized images for each method'. However, it does not provide concrete access information (specific link, DOI, repository name, formal citation) for these content and style images, which appear to be internally selected for evaluation rather than a publicly available dataset. It also mentions 'Stable Diffusion 1.4 (Rombach et al. 2022)' and 'BLIP-2 (Li et al. 2023)' as base models/tools, but these are not the datasets used for their experiments. |
| Dataset Splits | No | The paper states: 'We randomly selected 10 content images and 15 style images, generating a total of 150 stylized images for each method by applying the Cartesian product to the content and style images.' This describes the composition of images for evaluation but does not specify training, validation, or test splits for model training, as the method fine-tunes on single style images. |
| Hardware Specification | Yes | Our model is trained on a single NVIDIA A6000 GPU with a batch size of 1 and a learning rate of 1e-6. |
| Software Dependencies | No | The paper mentions 'We employ Stable Diffusion 1.4 (Rombach et al. 2022) as our base model' and 'We use BLIP-2 (Li et al. 2023) to generate the text prompt for the content image.' These are specific models/frameworks, but the paper does not provide specific version numbers for underlying software dependencies like Python, PyTorch, or CUDA libraries. |
| Experiment Setup | Yes | Our model is trained on a single NVIDIA A6000 GPU with a batch size of 1 and a learning rate of 1e-6. The number of fine-tuning steps and time may vary slightly for each reference image, but on average, approximately 1500 steps are sufficient. We set k = 25 for time-aware attention swapping. |
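The style-similarity metric cited in the evaluation row above is the Style Loss of Gatys, Ecker, and Bethge (2016), which compares Gram matrices of feature maps extracted from a pretrained network (typically VGG); the content-fidelity metric, LPIPS, is available via the `lpips` PyTorch package. A minimal NumPy sketch of the Gram-matrix style loss, assuming per-layer feature maps have already been extracted (the feature-extraction step and the specific layers used are not specified here and would follow the original papers):

```python
import numpy as np

def gram_matrix(feats):
    """Gram matrix of a (C, H, W) feature map, normalized by C*H*W."""
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(feats_a, feats_b):
    """Mean squared Gram-matrix difference between two images,
    averaged over the layers at which features were extracted."""
    per_layer = [np.mean((gram_matrix(a) - gram_matrix(b)) ** 2)
                 for a, b in zip(feats_a, feats_b)]
    return float(np.mean(per_layer))
```

Under the paper's protocol, this loss would be computed for each of the 150 stylized images (the Cartesian product of 10 content and 15 style images) against its corresponding style image, then averaged per method.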