Efficient Hi-Fi Style Transfer via Statistical Attention and Modulation
Authors: Zhirui Fang, Yi Li, Xin Xie, Chengyan Li, Yanqing Guo
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our method significantly improves the inference speed and the quality of style transfer while preserving content details, outperforming existing approaches based on both convolution and diffusion. Extensive quantitative and qualitative experiments, conducted on a comprehensive dataset of 800 stylized images, demonstrate that our proposed SRCA-SM framework significantly outperforms state-of-the-art convolutional and diffusion-based methods in terms of ArtFID, LPIPS, CSFD, and computational efficiency. |
| Researcher Affiliation | Academia | Zhirui Fang1, Yi Li1, Xin Xie1, Chengyan Li1, Yanqing Guo1 (1Dalian University of Technology), EMAIL, EMAIL |
| Pseudocode | No | The paper describes the proposed method (SRCA-SM) using mathematical formulations and textual descriptions, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements regarding the release of source code or provide any links to code repositories. |
| Open Datasets | Yes | We utilize the MS-COCO dataset [Phillips and Mackintosh, 2011] for content images and the Wiki Art dataset [Phillips and Mackintosh, 2011] for style images. |
| Dataset Splits | No | During the training phase, all images are randomly cropped to a fixed resolution of 256x256 pixels, whereas during testing, images of arbitrary resolution are supported. The evaluation is conducted on a comprehensive dataset consisting of 20 content images and 40 style images, resulting in a total of 800 stylized images. This describes data preprocessing and test set composition, but not explicit train/validation/test splits for the main model training. |
| Hardware Specification | No | The paper discusses inference speed and computational efficiency, but does not provide any specific hardware details such as GPU or CPU models used for experiments. |
| Software Dependencies | No | The model is optimized using the Adam optimizer [Kingma, 2014], but no other software dependencies or their specific version numbers are mentioned. |
| Experiment Setup | Yes | The model is optimized using the Adam optimizer [Kingma, 2014], with an initial learning rate of 0.0001 and a warm-up strategy for adjustment. The batch size is set to 8, and the network undergoes a total of 320,000 iterations during training. The loss function incorporates multiple terms, where the weights λs, λc, λidentity1, λidentity2, and λcontra are set to 10, 8, 70, 1, and 0.1, respectively, ensuring a balanced yet flexible contribution from style loss, content loss, and contrastive loss. |
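
The hyperparameters quoted above can be sketched as a training configuration. This is a minimal illustration, not the authors' code: the function names, the linear warm-up shape, and the warm-up length are assumptions (the paper states a warm-up strategy but not its duration); only the base learning rate, iteration count, and the five loss weights come from the paper.

```python
# Hedged sketch of the reported training setup for SRCA-SM.
# Values below the comment lines are taken from the paper; everything
# marked "assumption" is illustrative and not stated in the source.

# Loss weights reported in the paper.
LAMBDA_S = 10.0        # style loss weight (lambda_s)
LAMBDA_C = 8.0         # content loss weight (lambda_c)
LAMBDA_ID1 = 70.0      # identity loss 1 weight (lambda_identity1)
LAMBDA_ID2 = 1.0       # identity loss 2 weight (lambda_identity2)
LAMBDA_CONTRA = 0.1    # contrastive loss weight (lambda_contra)

BASE_LR = 1e-4         # initial learning rate reported in the paper
TOTAL_ITERS = 320_000  # total training iterations reported in the paper
WARMUP_ITERS = 10_000  # assumption: warm-up length is not given in the paper


def learning_rate(step: int) -> float:
    """Linear warm-up to BASE_LR, then constant (schedule shape is an assumption)."""
    if step < WARMUP_ITERS:
        return BASE_LR * (step + 1) / WARMUP_ITERS
    return BASE_LR


def total_loss(l_style: float, l_content: float,
               l_id1: float, l_id2: float, l_contra: float) -> float:
    """Weighted sum of the five loss terms using the weights reported in the paper."""
    return (LAMBDA_S * l_style
            + LAMBDA_C * l_content
            + LAMBDA_ID1 * l_id1
            + LAMBDA_ID2 * l_id2
            + LAMBDA_CONTRA * l_contra)
```

With unit losses, the weighted sum is 10 + 8 + 70 + 1 + 0.1 = 89.1, which makes the dominance of the first identity term easy to see at a glance.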