Completeness and Coherence Learning for Fast Arbitrary Style Transfer

Authors: Zhijie Wu, Chunjin Song, Guanxiong Chen, Sheng Guo, Weilin Huang

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through an empirical evaluation, we demonstrate that, compared with existing methods, our method strikes a better trade-off between computation cost, generalization ability, and stylization quality. First, we compare our approach with several state-of-the-art methods qualitatively and quantitatively. Note that all of our results on the baselines are obtained from publicly available, pre-trained models under their default settings. Then we show results from our ablation study, in which we investigate the impact of several design decisions.
Researcher Affiliation | Collaboration | Zhijie Wu, Department of Computer Science, University of British Columbia; Chunjin Song, Department of Computer Science, University of British Columbia; Guanxiong Chen, Department of Computer Science, University of British Columbia; Sheng Guo, MYbank, Ant Group; Weilin Huang, Alibaba Group
Pseudocode | No | The paper includes figures (e.g., Fig. 3, 4, 5, 16, 17) and mathematical equations to describe the methodology, but there are no explicitly labeled pseudocode blocks or algorithm sections with structured steps.
Open Source Code | No | "We will release our source code upon publication."
Open Datasets | Yes | We use 80,000 images from MS-COCO (Lin et al., 2014) and 80,000 images from WikiArt (Nichol, 2016) as the content and style dataset, respectively, for training. Note that the MS-COCO (Lin et al., 2014) and WikiArt (Nichol, 2016) datasets in our experiments are public and under a Creative Commons Attribution 4.0 License, which permits us to distribute, remix, tweak, and build upon them.
Dataset Splits | No | The paper states: "During training, first we resize the smaller dimension of each image to 512 but keep the initial ratio. Then we randomly crop a region of size 256 × 256." It also mentions using a "test set" for evaluation ("We randomly pick 20 content images and 30 style images from our test set"), but does not specify the train/test/validation splits for the entire MS-COCO and WikiArt datasets.
Hardware Specification | Yes | One limitation of this work is that our CCNet is patch-based, and is thus relatively weaker than statistics-based alternatives in capturing and augmenting information at different scales. Another limitation is that the complexity of the computed affinity matrices is O(n²), limiting CCNet to a pair of 2048 × 1024 images on a Titan X GPU with 12 GB of memory.
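The O(n²) figure can be checked with a back-of-envelope estimate. This sketch is illustrative only: the paper does not state the feature resolution at which affinities are computed, so a VGG-style 1/8 spatial downsampling and float32 storage are assumed here.

```python
def affinity_matrix_gib(height, width, stride=8, bytes_per_elem=4):
    """Memory for one dense n x n affinity matrix over spatial positions.

    Assumes features at 1/`stride` of the input resolution (an assumption,
    not stated in the paper) and float32 entries.
    """
    n = (height // stride) * (width // stride)  # number of spatial positions
    return n * n * bytes_per_elem / 2**30       # size in GiB

# A 2048 x 1024 input gives n = 256 * 128 = 32768 positions,
# so one dense affinity matrix alone occupies 4.0 GiB.
print(affinity_matrix_gib(2048, 1024))
```

With activations, gradients, and a second image in the pair on top of this, a single dense affinity matrix of this size plausibly saturates a 12 GB card, consistent with the stated limit.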
Software Dependencies | No | The paper mentions using the "Adam optimizer (Kingma & Ba, 2015)" and a "pre-trained VGG network (Simonyan & Zisserman, 2015)", but does not specify version numbers for these or any other software components (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | We use 80,000 images from MS-COCO (Lin et al., 2014) and 80,000 images from WikiArt (Nichol, 2016) as the content and style dataset, respectively, for training. We initialize the encoder with a pre-trained VGG network (Simonyan & Zisserman, 2015) and freeze it during training. For the decoder, we take the same setting from Huang & Belongie (2017). We apply the Adam optimizer (Kingma & Ba, 2015) with a batch size of four image pairs and a learning rate of 1e-4 for 200K iterations. During training, we first resize the smaller dimension of each image to 512 while keeping the initial aspect ratio, then randomly crop a region of size 256 × 256; at test time an input image can be of any size. Throughout our experiments, we set λ_id^1, λ_id^2, λ_cc^com, and λ_cc^coh to 50, 1, 300, and 5, respectively.
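The reported setup can be summarized as a minimal configuration sketch. The hyperparameter values below are the ones stated in the paper; the constant and function names are illustrative, and the resize/crop helpers only compute geometry rather than touch pixels.

```python
import random

# Training hyperparameters as reported in the paper (names are mine).
LEARNING_RATE = 1e-4
BATCH_SIZE = 4            # image pairs per step
ITERATIONS = 200_000
LAMBDA_ID_1 = 50          # identity loss weights
LAMBDA_ID_2 = 1
LAMBDA_CC_COM = 300       # completeness / coherence loss weights
LAMBDA_CC_COH = 5

def resize_dims(w, h, target=512):
    """Scale so the smaller side equals `target`, preserving aspect ratio."""
    if w <= h:
        return target, round(h * target / w)
    return round(w * target / h), target

def random_crop_box(w, h, size=256, rng=random):
    """Top-left corner and extent of a random size x size training crop."""
    x = rng.randrange(w - size + 1)
    y = rng.randrange(h - size + 1)
    return x, y, size, size

# Example: a 640 x 480 image is resized to 683 x 512, then cropped to 256 x 256.
print(resize_dims(640, 480))
```

At test time no such preprocessing is reported; the quoted text says an input image "can be of any size".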