Neural style transfer (NST) is a deep learning application that uses convolutional neural networks (CNNs) to metamorphose a content image to make it seem viewed under the lens of a style image. The model is supplied with two input images- style and content images.
Note: The content image defines the content of the output image that should be maintained. The style image captures the pattern along which the content image should be changed to produce the output image.
Convolutional networks consist of layers of filters where each filter has weights/parameters that remain constant during the training process. The filter weights perform convolution with a section of the input image representation. Consequently, each filter produces a filtered output representation as input to the next layer. These filters introduce non-linearity into the model.
We don't alter the parameters or weights of the filters present in each layer. Instead, we update the image using content loss as a distance metric between the content image and the generated image.
Content loss is defined by the equation below, where
Style of an image is to extract the finer details of the input style image without paying much regard to the higher-level content that is depicted in the image. We utilize the output of lower-level filters which better capture such minute and fine-grained details required to mimic the style.
To better comprehend it, the output of each of the filters indicates the presence of detail, and when these outputs are
The Gram matrix for input style image
Similarly, the Gram matrix for the generated image
Consequently, the total style loss can be found by the accumulation of the mean squared errors as follows where
The style and content losses are combined and weighted to compute the total loss as follows, where
We can change these weights to pivot the amount of importance to the output image. For instance, greater values of
The total loss is computed and is back-propagated to produce another image with a seemingly lower loss. The process is repeated until convergence, and this eventually produces artistic and aptly transformed images.