What is neural style transfer?

Overview

Neural style transfer (NST) is a deep learning application that uses convolutional neural networks (CNNs) to transform a content image so that it appears as if viewed through the lens of a style image. The model is supplied with two input images: a content image and a style image.

Note: The content image defines the content that the output image should preserve. The style image captures the patterns according to which the content image should be changed to produce the output image.

An example of neural style transfer

How does NST work?

Convolutional networks

Convolutional networks consist of layers of filters, where each filter has weights/parameters. In NST, these weights remain constant: the network is pretrained and frozen. Each filter performs a convolution with a section of the input image representation, and so each layer produces a filtered output representation that serves as input to the next layer. Non-linear activation functions (such as ReLU) applied after the convolutions introduce non-linearity into the model.

An example of a layered convolutional net architecture
  • Maxpool layers: In the above-illustrated architecture, we use these layers to cut down the spatial dimensions of the feature maps, which helps reduce overfitting. They also provide the added benefit of reducing computational cost.
A demonstration of how maxpool layers work to reduce dimensions
  • Lower-level layers: We use the lower filter layers to capture finer input image features, such as lines, edges, and other small details.
  • Higher-level layers: We use the higher filter layers to build on the feature maps produced by the lower layers and extract more complex image details, such as shapes and objects (see the sketch after this list).
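To make this concrete, here is a minimal sketch of extracting intermediate feature maps from a pretrained, frozen CNN. It assumes PyTorch and torchvision are available and uses VGG19, a network commonly chosen for NST; the specific layer indices are illustrative choices, not fixed by NST itself.

```python
import torch
import torchvision.models as models

# Load a pretrained VGG19 and freeze its weights: in NST the network
# is never trained, only used as a fixed feature extractor.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def extract_features(image, layer_indices):
    """Run `image` through VGG19 and collect the outputs of the
    requested layers (indices into vgg are illustrative)."""
    features = {}
    x = image
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layer_indices:
            features[i] = x
    return features

# A random tensor stands in for a real image here.
img = torch.rand(1, 3, 224, 224)
# A lower layer (e.g., index 1) captures edges and lines; a higher
# layer (e.g., index 30) captures more complex structures.
feats = extract_features(img, layer_indices={1, 30})
for i, f in feats.items():
    print(i, f.shape)
```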

Content loss

We don't alter the parameters or weights of the filters in any layer. Instead, we update the generated image itself, using content loss as a distance metric between the feature representations of the content image and the generated image.

Content loss is defined by the equation below, where $O$ represents the output (generated) image, $I$ represents the input (content) image, and $F_{ij}$ is the activation of the $i^{th}$ filter at position $j$ in a chosen layer:

$$L_{content}(I, O) = \frac{1}{2} \sum_{i,j} \left( F_{ij}(O) - F_{ij}(I) \right)^2$$
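A minimal sketch of this content loss follows, assuming `F_I` and `F_O` are feature maps of the content and generated images taken from the same layer (for example, with the `extract_features` helper sketched above):

```python
import torch

def content_loss(F_I, F_O):
    """Content loss: half the sum of squared differences between the
    layer activations of the input (content) and output images."""
    return 0.5 * torch.sum((F_O - F_I) ** 2)

# Toy activations standing in for real VGG feature maps.
F_I = torch.rand(1, 64, 112, 112)
F_O = torch.rand(1, 64, 112, 112, requires_grad=True)
loss = content_loss(F_I, F_O)
loss.backward()  # gradients flow back toward the generated image
print(loss.item())
```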

Style loss

The style of an image lies in its finer details (such as textures and brush strokes) rather than in the higher-level content it depicts. To capture style, we therefore use the outputs of lower-level filters, which better capture the minute, fine-grained details required to mimic the style.

To better comprehend this: the output of each filter indicates the presence of a particular detail. When these outputs are correlated (taking the scalar product of two vectors, each representing the output of a filter channel), the result is the corresponding Gram matrix.

The Gram matrix for the input style image $S$ can be written as follows, where $i$ and $j$ respectively index the height and width of a channel, $l$ is one of the $L$ total layers, and $c$ and $c'$ are two different filter channels:

$$G^{l}_{cc'}(S) = \sum_{i,j} F^{l}_{ij,c}(S) \, F^{l}_{ij,c'}(S)$$

Similarly, the Gram matrix for the generated image $G$ can be calculated as follows:

$$G^{l}_{cc'}(G) = \sum_{i,j} F^{l}_{ij,c}(G) \, F^{l}_{ij,c'}(G)$$

Consequently, the total style loss is the weighted accumulation of the mean squared errors between the two Gram matrices, where $N_l$ is the number of feature maps at layer $l$, $M_l$ is the length of the flattened feature vector at layer $l$, and $w_l$ is the weight applied to layer $l$:

$$L_{style} = \sum_{l=0}^{L} \frac{w_l}{4 N_l^2 M_l^2} \sum_{c,c'} \left( G^{l}_{cc'}(S) - G^{l}_{cc'}(G) \right)^2$$
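The following sketch implements the Gram matrix and the layer-weighted style loss above; the layer keys and weights `w_l` are illustrative assumptions:

```python
import torch

def gram_matrix(F):
    """Gram matrix of a feature map F with shape (1, C, H, W):
    scalar products between every pair of filter channels."""
    _, C, H, W = F.shape
    flat = F.view(C, H * W)   # each row is one flattened channel
    return flat @ flat.t()    # (C, C) matrix of channel correlations

def style_loss(style_feats, gen_feats, layer_weights):
    """Sum over layers of w_l * squared Gram differences, scaled by
    1 / (4 * N_l^2 * M_l^2) as in the equation above."""
    loss = 0.0
    for l, w in layer_weights.items():
        S, G = style_feats[l], gen_feats[l]
        _, N, H, W = S.shape  # N_l feature maps, M_l = H * W
        M = H * W
        diff = gram_matrix(S) - gram_matrix(G)
        loss = loss + w * torch.sum(diff ** 2) / (4 * N**2 * M**2)
    return loss

# Toy feature maps for two layers (keys are hypothetical layer ids).
style_feats = {1: torch.rand(1, 64, 112, 112), 6: torch.rand(1, 128, 56, 56)}
gen_feats = {1: torch.rand(1, 64, 112, 112), 6: torch.rand(1, 128, 56, 56)}
print(style_loss(style_feats, gen_feats, layer_weights={1: 0.5, 6: 0.5}).item())
```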

Total loss

The style and content losses are combined and weighted to compute the total loss as follows, where $\alpha$ and $\beta$ are the weights applied to each of the losses:

$$L_{total} = \alpha \, L_{content} + \beta \, L_{style}$$

We can tune these weights to shift the relative importance of each component in the output image. For instance, larger values of $\alpha$ give higher importance to content preservation, whereas larger values of $\beta$ place greater emphasis on style in the final output.

The total loss is computed and back-propagated to the pixels of the generated image, which are updated to lower the loss. The process is repeated until convergence, eventually producing an artistic, aptly transformed image.
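Putting the pieces together, here is a hedged sketch of that optimization loop: gradients flow to the generated image's pixels, not to the network weights, and `alpha`/`beta` are the weights from the total-loss equation. It reuses the illustrative `extract_features`, `content_loss`, and `style_loss` helpers sketched above; the layer choices and weight values are assumptions, not a canonical implementation.

```python
import torch

content_img = torch.rand(1, 3, 224, 224)  # stand-ins for real images
style_img = torch.rand(1, 3, 224, 224)

content_layer = {30}             # illustrative layer choices
style_layers = {1: 0.5, 6: 0.5}  # illustrative layer weights w_l

# Precompute the fixed targets once; only the generated image changes.
target_content = extract_features(content_img, content_layer)[30]
target_style = extract_features(style_img, set(style_layers))

# Optimize the pixels of the generated image, initialized from the content.
generated = content_img.clone().requires_grad_(True)
optimizer = torch.optim.Adam([generated], lr=0.01)

alpha, beta = 1.0, 1e4  # content/style trade-off weights
for step in range(200):
    optimizer.zero_grad()
    feats = extract_features(generated, content_layer | set(style_layers))
    loss = (alpha * content_loss(target_content, feats[30])
            + beta * style_loss(target_style, feats, style_layers))
    loss.backward()   # gradients w.r.t. the image pixels
    optimizer.step()  # update the image, not the network
```

A larger `beta` relative to `alpha` pushes the result toward the style image's textures, at the cost of content fidelity; in practice these weights usually differ by several orders of magnitude because the two losses have very different scales.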
