Efficient Hi-Fi Style Transfer via Statistical Attention and Modulation

Authors: Zhirui Fang, Yi Li, Xin Xie, Chengyan Li, Yanqing Guo

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate that our method significantly improves the inference speed and the quality of style transfer while preserving content details, outperforming existing approaches based on both convolution and diffusion. Extensive quantitative and qualitative experiments, conducted on a comprehensive dataset of 800 stylized images, demonstrate that our proposed SRCA-SM framework significantly outperforms state-of-the-art convolutional and diffusion-based methods in terms of ArtFID, LPIPS, CSFD, and computational efficiency.
Researcher Affiliation Academia Zhirui Fang1, Yi Li1, Xin Xie1, Chengyan Li1, Yanqing Guo1 1Dalian University of Technology EMAIL, EMAIL
Pseudocode No The paper describes the proposed method (SRCA-SM) using mathematical formulations and textual descriptions, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code No The paper does not contain any explicit statements regarding the release of source code or provide any links to code repositories.
Open Datasets Yes We utilize the MS-COCO dataset [Phillips and Mackintosh, 2011] for content images and the Wiki Art dataset [Phillips and Mackintosh, 2011] for style images.
Dataset Splits No During the training phase, all images are randomly cropped to a fixed resolution of 256x256 pixels, whereas during testing, images of arbitrary resolution are supported. The evaluation is conducted on a comprehensive dataset consisting of 20 content images and 40 style images, resulting in a total of 800 stylized images. This describes data preprocessing and test set composition, but not explicit train/validation/test splits for the main model training.
Hardware Specification No The paper discusses inference speed and computational efficiency, but does not provide any specific hardware details such as GPU or CPU models used for experiments.
Software Dependencies No The model is optimized using the Adam optimizer [Kingma, 2014], but no other software dependencies or their specific version numbers are mentioned.
Experiment Setup Yes The model is optimized using the Adam optimizer [Kingma, 2014], with an initial learning rate of 0.0001 and a warm-up strategy for adjustment. The batch size is set to 8, and the network undergoes a total of 320,000 iterations during training. The loss function incorporates multiple terms, where the weights λs, λc, λidentity1, λidentity2, and λcontra are set to 10, 8, 70, 1, and 0.1, respectively, ensuring a balanced yet flexible contribution from style loss, content loss, and contrastive loss.
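The reported setup can be sketched as a PyTorch training skeleton. This is only an illustration of the stated hyperparameters (Adam, initial learning rate 1e-4 with warm-up, batch size 8, 320,000 iterations, loss weights 10 / 8 / 70 / 1 / 0.1): the network, the individual loss terms, and the warm-up length (`WARMUP_ITERS`) are placeholders, since the paper does not release code or specify them.

```python
import torch

# Hyperparameters as reported in the paper's experiment setup.
LAMBDA_S, LAMBDA_C = 10.0, 8.0
LAMBDA_ID1, LAMBDA_ID2, LAMBDA_CONTRA = 70.0, 1.0, 0.1
BASE_LR, TOTAL_ITERS, BATCH = 1e-4, 320_000, 8
WARMUP_ITERS = 10_000  # assumption: warm-up length is not given in the paper

def warmup_lr(step: int) -> float:
    """Linear warm-up to BASE_LR; the exact schedule is not specified."""
    return BASE_LR * min(1.0, (step + 1) / WARMUP_ITERS)

# Placeholder for the SRCA-SM style-transfer network (not public).
model = torch.nn.Linear(8, 8)
opt = torch.optim.Adam(model.parameters(), lr=BASE_LR)

for step in range(3):  # would run to TOTAL_ITERS in practice
    for group in opt.param_groups:
        group["lr"] = warmup_lr(step)
    x = torch.randn(BATCH, 8)
    out = model(x)
    # Stand-in scalar losses; the real terms are style, content,
    # two identity losses, and a contrastive loss.
    l_s = l_c = l_id1 = l_id2 = l_contra = out.pow(2).mean()
    loss = (LAMBDA_S * l_s + LAMBDA_C * l_c
            + LAMBDA_ID1 * l_id1 + LAMBDA_ID2 * l_id2
            + LAMBDA_CONTRA * l_contra)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The weighted-sum structure is the standard way such multi-term objectives are combined; only the weight values here come from the paper.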