What is Neural style transfer?
Overview
Neural style transfer (NST) is a deep learning application that uses convolutional neural networks (CNNs) to transform a content image so that it appears rendered in the style of a second image. The model is supplied with two input images: a content image and a style image.
Note: The content image defines the subject matter that the output image should preserve. The style image captures the patterns (colors, textures, brushstrokes) along which the content image should be changed to produce the output image.
How does NST work?
Convolutional networks
Convolutional networks consist of layers of filters, where each filter has weights/parameters. In NST, we use a pretrained network, so these weights remain fixed throughout the process. Each filter is convolved with a region of the input representation, and each filter therefore produces a filtered output representation that serves as input to the next layer. Non-linear activation functions (such as ReLU) applied after each convolution introduce non-linearity into the model.
- Maxpool layers: In the above-illustrated architecture, these layers reduce the spatial dimensions of the feature maps, which helps prevent overfitting and also reduces computational cost.
- Lower-level layers: The lower filter layers capture fine-grained features of the input image, such as edges, lines, and other small details.
- Higher-level layers: The higher filter layers build on the feature maps produced by the lower layers and extract more complex, abstract image details.
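To make the convolution step concrete, here is a minimal sketch of a single filter sliding over a tiny image, in pure Python. The image, kernel, and sizes are illustrative; real CNN layers apply many learned filters at once.

```python
# Toy 2D convolution: slide a 3x3 edge-detecting filter over a 5x5 image.
def conv2d(image, kernel):
    """Valid (no-padding) 2D convolution of a 2D list by a 2D kernel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# A vertical-edge filter (Sobel-like); the image is bright on its right half.
image = [[0, 0, 0, 1, 1]] * 5
kernel = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]
feature_map = conv2d(image, kernel)  # 3x3 filtered representation
```

The filter responds strongly (value 4.0) where the dark-to-bright edge lies, illustrating how a lower-level layer detects a simple feature like a vertical line.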
Content loss
We don't alter the parameters or weights of the filters in any layer. Instead, we iteratively update the generated image itself, using content loss as a distance metric between the content image and the generated image.
Content loss is defined by the equation below:

$$L_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

where $\vec{p}$ is the content image, $\vec{x}$ is the generated image, $F^l_{ij}$ is the activation of the $i$-th filter at position $j$ in layer $l$ for the generated image, and $P^l_{ij}$ is the corresponding activation for the content image.
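A minimal sketch of this distance in pure Python, using small illustrative feature maps in place of real CNN activations:

```python
# Content loss: half the sum of squared differences between two
# equal-shape feature maps (values below are illustrative).
def content_loss(gen_features, content_features):
    """0.5 * sum of squared element-wise differences."""
    return 0.5 * sum(
        (g - p) ** 2
        for g_row, p_row in zip(gen_features, content_features)
        for g, p in zip(g_row, p_row)
    )

P = [[1.0, 2.0], [3.0, 4.0]]   # content-image features at some layer
F = [[1.0, 2.0], [3.0, 6.0]]   # generated-image features
loss = content_loss(F, P)      # 0.5 * (6 - 4)^2 = 2.0
```

The loss is zero only when the generated image reproduces the content image's activations exactly, which is why minimizing it preserves content.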
Style loss
The style of an image consists of its finer details, extracted without much regard to the higher-level content the image depicts. We therefore utilize the outputs of lower-level filters, which better capture the minute, fine-grained details required to mimic the style.
To better comprehend this, the output of each filter indicates the presence of a particular detail, and when these outputs are correlated with one another across filters, the resulting correlations characterize the texture of the image. These correlations are captured in a Gram matrix.
The Gram matrix for the input style image is:

$$G^l_{ij} = \sum_k S^l_{ik} S^l_{jk}$$

where $S^l_{ik}$ is the activation of the $i$-th filter at position $k$ in layer $l$ for the style image.
Similarly, the Gram matrix for the generated image is:

$$\hat{G}^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$$

where $F^l_{ik}$ is the activation of the $i$-th filter at position $k$ in layer $l$ for the generated image.
Consequently, the total style loss can be found by the weighted accumulation of mean squared errors across layers:

$$L_{style} = \sum_l w_l \, \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( \hat{G}^l_{ij} - G^l_{ij} \right)^2$$

where $N_l$ is the number of filters in layer $l$, $M_l$ is the size (height times width) of each feature map in that layer, and $w_l$ is the weight assigned to layer $l$.
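The Gram matrix and a single-layer style loss can be sketched in pure Python as follows; the two-filter activations below are illustrative stand-ins for real CNN outputs:

```python
# Gram matrix: N x N dot products between N flattened filter responses.
def gram(features):
    """features: list of N filter responses, each flattened to length M."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]

def style_loss(gen_features, style_features):
    """Squared Gram-matrix difference, scaled by 1 / (4 N^2 M^2)."""
    n, m = len(gen_features), len(gen_features[0])
    g_gen, g_style = gram(gen_features), gram(style_features)
    sq = sum((g_gen[i][j] - g_style[i][j]) ** 2
             for i in range(n) for j in range(n))
    return sq / (4 * n ** 2 * m ** 2)

# Two filters (N=2), each with three spatial positions (M=3).
S = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # style-image activations
F = [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]  # generated-image activations
layer_loss = style_loss(F, S)
```

Because the Gram matrix sums over spatial positions, it discards where a detail occurs and keeps only how strongly filter responses co-occur, which is what makes it a representation of texture rather than content.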
Total Loss
The style and content losses are combined and weighted to compute the total loss as follows:

$$L_{total} = \alpha L_{content} + \beta L_{style}$$

where $\alpha$ and $\beta$ are the weights assigned to the content and style losses, respectively.
We can change these weights to shift the emphasis in the output image. For instance, greater values of $\alpha$ preserve more of the content, while greater values of $\beta$ make the stylization more pronounced.
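The weighted combination is a one-liner; the default weights below are illustrative (style loss is typically weighted far more heavily than content loss because its raw magnitude is much smaller):

```python
# Total loss: alpha and beta trade off content fidelity vs. stylization.
def total_loss(content_l, style_l, alpha=1.0, beta=1000.0):
    # alpha/beta values here are illustrative placeholders, not a
    # recommendation; in practice the ratio is tuned per image pair.
    return alpha * content_l + beta * style_l

loss = total_loss(2.0, 0.02)  # 1.0 * 2.0 + 1000.0 * 0.02 = 22.0
```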
The total loss is computed and backpropagated, but instead of updating the network's weights, the gradients update the pixels of the generated image to lower the loss. The process is repeated until convergence, eventually producing an artistic, aptly transformed image.
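This iterate-until-convergence loop can be sketched with a toy quadratic loss standing in for the real content-plus-style loss (which requires a CNN); the target values and learning rate are arbitrary illustrations:

```python
# Toy pixel-optimization loop: gradient descent on the "image" itself,
# while the (stand-in) loss function stays fixed.
def toy_loss(pixels, target):
    return sum((p - t) ** 2 for p, t in zip(pixels, target))

def grad(pixels, target):
    """Analytic gradient of toy_loss with respect to the pixels."""
    return [2 * (p - t) for p, t in zip(pixels, target)]

target = [0.2, 0.8, 0.5]   # stand-in for the loss minimizer
x = [0.0, 0.0, 0.0]        # "generated image", updated each step
lr = 0.1
for _ in range(100):
    g = grad(x, target)
    x = [xi - lr * gi for xi, gi in zip(x, g)]
# x has now converged close to target, just as the generated image
# converges toward a low-loss blend of content and style.
```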