Image style transfer has become a key technique in modern photo-editing applications. Although significant progress has been made to blend content from one image with style from another image, the synthesized image may have a hallucinatory effect when the texture from the style image is rich when processing high-resolution image style transfer tasks. In this paper, we propose a novel attention mechanism, named compositional attention, to design a compositional transformer-based autoencoder (CTA) to solve this above-mentioned issue. With the support from this module, our model is capable of generating high-quality images when transferring from texture-riched style images to content images with semantics. Additionally, we embed region-based consistency terms in our loss function for ensuring internal structure semantic preservation in our synthesized image. Moreover, information theory-based CTA is discussed and Kullback–Leibler divergence loss is introduced to preserve more brightness information for photo-realistic style transfer. Extensive experimental results based on three benchmark datasets, namely Churches, Flickr Landscapes, and Flickr Faces HQ, confirmed excellent performance when compared to several state-of-the-art methods. Based on a user study assessment, the majority number of users, ranging from 61% to 66%, gave high scores on the transfer effects of our method compared to 9% users who supported the second best method. Further, for the questions of realism and style transfer quality, we achieved the best score, i.e., an average of 4.5 out of 5 compared to other style transfer methods.
Funding
The theory and key technologies of integrated network traffic between heaven and earth
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).