Neural style transfer with TensorFlow

Computer vision is a field of artificial intelligence that enables machines to process and extract information from visual data such as images and videos. It makes many interesting image processing tasks possible, and neural style transfer is one of them.

Neural style transfer

Neural style transfer is a technique that combines the content of one image with the artistic style of another. The result is a new image that looks as if it were painted in the second image's style while preserving the content of the first image.

In simpler terms, we blend the style of one image into the content of another. Neural style transfer works using:

The content of the first image
The artistic style of the second image

Mechanism of neural style transfer

We can understand how neural style transfer works through the concepts below.

Input

We give two images to our model. The first is the content image, whose content we want to preserve, and the second is the style image, whose style we want to apply.

Note: By style, we refer to the visual patterns or texture present in an image.

Content loss

The first thing we need to ensure is that our final image retains the crucial content of our content image. The content loss is calculated as the difference between the feature maps (the filter responses extracted from an image) of the original content image and those of the stylized image.
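
As a rough illustration of this idea, the sketch below computes a content loss as the mean squared difference between two sets of feature maps. NumPy arrays stand in for the feature maps; the function name contentLoss, the array shapes, and the choice of mean squared error are assumptions made for this example and are not part of the model used later in this article.

```python
import numpy as np

def contentLoss(contentFeatures, stylizedFeatures):
    """Mean squared difference between two feature-map arrays of equal shape."""
    contentFeatures = np.asarray(contentFeatures, dtype=np.float64)
    stylizedFeatures = np.asarray(stylizedFeatures, dtype=np.float64)
    return float(np.mean((contentFeatures - stylizedFeatures) ** 2))

# Hypothetical feature maps: height 4, width 4, 8 channels
rng = np.random.default_rng(0)
contentFeatures = rng.random((4, 4, 8))
stylizedFeatures = contentFeatures + 0.1  # every activation is off by 0.1

print(contentLoss(contentFeatures, contentFeatures))   # identical features give 0.0
print(contentLoss(contentFeatures, stylizedFeatures))  # approximately 0.01
```

The closer the stylized image's feature maps are to those of the content image, the smaller this value becomes.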

Style loss

The style loss is generally calculated by comparing the Gram matrices of the feature maps of the original style image and the stylized image. A Gram matrix contains the correlations between feature maps.
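
To make the Gram matrix more concrete, here is a minimal NumPy sketch. The names gramMatrix and styleLoss, the normalization by the number of positions, and the use of mean squared error are assumptions for illustration; actual implementations differ in these details.

```python
import numpy as np

def gramMatrix(features):
    """Channel-by-channel correlations of a (height, width, channels) array."""
    height, width, channels = features.shape
    flat = features.reshape(height * width, channels)
    # Entry (i, j) averages the product of channels i and j over all positions
    return flat.T @ flat / (height * width)

def styleLoss(styleFeatures, stylizedFeatures):
    """Mean squared difference between the two Gram matrices."""
    diff = gramMatrix(styleFeatures) - gramMatrix(stylizedFeatures)
    return float(np.mean(diff ** 2))

# Hypothetical feature maps: height 4, width 4, 3 channels
rng = np.random.default_rng(0)
styleFeatures = rng.random((4, 4, 3))
# Shuffling the positions changes the layout but not the channel statistics
shuffled = rng.permutation(styleFeatures.reshape(-1, 3)).reshape(4, 4, 3)

print(gramMatrix(styleFeatures).shape)      # one row and column per channel
print(styleLoss(styleFeatures, shuffled))   # close to zero
```

Because the Gram matrix sums over all positions, it ignores where a pattern appears in the image, which is why it is suited to describing style rather than content.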

Optimization

Our goal here is to minimize the total loss, i.e., the combination of the content loss and the style loss, typically a weighted sum. This is done by iteratively adjusting the pixels of the stylized image with an optimization technique until the loss is minimized.

As a result, the algorithm gradually converges to the final stylized image, which is a blend of the two images.
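
The loop below is a toy sketch of this optimization. It makes several simplifying assumptions: the raw pixel values act as the "feature maps" (a real implementation extracts them with a convolutional network), the gradient is written by hand instead of using automatic differentiation, and the loss weights, learning rate, and step count are arbitrary illustrative values.

```python
import numpy as np

def gram(features):
    """Channel correlations of a (positions, channels) array."""
    return features.T @ features / features.shape[0]

def totalLoss(image, content, styleGram, contentWeight, styleWeight):
    contentTerm = np.sum((image - content) ** 2)
    styleTerm = np.sum((gram(image) - styleGram) ** 2)
    return contentWeight * contentTerm + styleWeight * styleTerm

def totalLossGradient(image, content, styleGram, contentWeight, styleWeight):
    contentGrad = 2.0 * (image - content)
    # Gradient of sum((F^T F / N - A)^2) with respect to F, using the
    # symmetry of both Gram matrices
    styleGrad = 4.0 / image.shape[0] * image @ (gram(image) - styleGram)
    return contentWeight * contentGrad + styleWeight * styleGrad

rng = np.random.default_rng(0)
content = rng.random((16, 3))          # toy content image: 16 pixels, 3 channels
styleGram = gram(rng.random((16, 3)))  # Gram matrix of a toy style image

contentWeight, styleWeight = 1.0, 10.0  # assumed weights
learningRate = 0.05                     # assumed step size
image = content.copy()                  # start from the content image
losses = []
for step in range(200):
    losses.append(totalLoss(image, content, styleGram, contentWeight, styleWeight))
    # Move the pixels a small step against the gradient of the total loss
    image -= learningRate * totalLossGradient(
        image, content, styleGram, contentWeight, styleWeight)

print(losses[0], losses[-1])  # the loss shrinks as the pixels are adjusted
```

Raising styleWeight relative to contentWeight pushes the result toward the style statistics, while lowering it keeps the result closer to the content image.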

The `loadImage` method

Our first method, loadImage, reads the image and, after processing, returns it as a tensor. Let's dive deeper into how it does this.

maxDimension = 650: Sets the maximum dimension of the output to 650 pixels.
img = tf.io.read_file(imagePath): Reads the image's data from the path imagePath.
img = tf.image.decode_image(img, channels = 3): Decodes the binary image data into a tensor with RGB colors i.e. 3 channels.
img = tf.image.convert_image_dtype(img, tf.float32): Converts the pixel values to the tf.float32 data type, which also scales them to the [0, 1] range.
shape = tf.cast(tf.shape(img)[:-1], tf.float32): Takes the image's height and width (dropping the channel dimension) and converts them to a floating-point tensor.
longDimension = max(shape): Calculates the length of the image's longest side.
scale = maxDimension / longDimension: Computes the scaling factor for resizing.
newShape = tf.cast(shape * scale, tf.int32): Computes new image dimensions after resizing.
img = tf.image.resize(img, newShape): Resizes the image tensor using newShape.
img = img[tf.newaxis, :]: Adds a batch dimension to the image tensor.
Returns our processed image tensor.
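
The resizing arithmetic in the steps above can be checked with plain Python. The helper scaledShape is hypothetical and only mirrors the calculation; it assumes the cast to an integer truncates the fractional part, and TensorFlow's 32-bit floating-point arithmetic could differ from Python's in rare edge cases.

```python
def scaledShape(height, width, maxDimension=650):
    """Height and width after scaling the longest side to maxDimension."""
    scale = maxDimension / max(height, width)
    # int() drops the fractional part, mirroring the cast to an integer shape
    return int(height * scale), int(width * scale)

print(scaledShape(975, 1300))  # (487, 650)
print(scaledShape(1300, 975))  # (650, 487)
print(scaledShape(325, 325))   # (650, 650)
```

Note that the aspect ratio is preserved, and that images smaller than 650 pixels on their longest side are scaled up rather than left unchanged.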

The `stylizeImages` method

Moving on to our next method, we define stylizeImages for producing our final stylized image. Let's see how we accomplish this.

contentImage = loadImage(contentImagePath): Loads and processes our content image using the loadImage method we just defined.
styleImage = loadImage(styleImagePath): Does the same for the style image.
hubModel = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2'): Loads a pretrained neural style transfer model from TensorFlow Hub. Because the model is already trained, it produces the stylized image in a single forward pass instead of running the iterative optimization described earlier.
stylizedImage = hubModel(tf.constant(contentImage), tf.constant(styleImage))[0]: Applies the style transfer model to our processed content and style images. hubModel takes two tensors representing these images and returns a list of outputs, the first of which holds the stylized image. We select it using the [0] index.
Returns the stylized image tensor.

The `tfToPILImage` method

We then define a method to convert our TensorFlow tensor to a PIL-compatible image.

tensor = tensor * 255: Scales the pixel values of the input tensor by 255 to convert the normalized pixel values back to the 0-255 range.
tensor = np.array(tensor, dtype=np.uint8): Converts the input tensor to a NumPy array of 8-bit unsigned integers, since that is what PIL requires.
if np.ndim(tensor) > 3:: Checks if the tensor has more than three dimensions to handle cases where the tensor has an extra batch dimension.
assert tensor.shape[0] == 1: Raises an error if the first dimension isn't equal to 1, to ensure that the tensor represents a single image.
tensor = tensor[0]: Keeps only the first image in the batch.
return PIL.Image.fromarray(tensor): Converts the NumPy array tensor to an image in PIL.
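
The conversion steps above can be tried out with NumPy alone. The sketch below is a variant rather than the method itself: the helper toUint8Image is hypothetical, and it additionally clips the values to [0, 1] and rounds them, on the assumption that a model output may fall slightly outside that range.

```python
import numpy as np

def toUint8Image(array):
    """Convert a float array in [0, 1] to a uint8 array without a batch axis."""
    array = np.asarray(array, dtype=np.float32)
    # Clip first so out-of-range values cannot overflow the uint8 range
    array = np.clip(array, 0.0, 1.0) * 255
    array = np.rint(array).astype(np.uint8)
    if array.ndim > 3:
        # Only a batch holding a single image can be converted
        if array.shape[0] != 1:
            raise ValueError("expected a batch containing exactly one image")
        array = array[0]
    return array

# Hypothetical batch with one 1x2 RGB image, including out-of-range values
batch = np.array([[[[0.0, 0.5, 1.0], [1.2, -0.1, 0.25]]]])
image = toUint8Image(batch)
print(image.shape)     # (1, 2, 3)
print(image.tolist())
```

The resulting array has the height, width, and channel layout that PIL.Image.fromarray accepts for RGB images.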

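The `plotStylizedImages` method

Finally, we define a method that places the content image, the style image, and the final stylized image side by side in a single Plotly figure.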

def plotStylizedImages(contentImage, styleImage, finalImage):
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=[0], y=[0], mode='markers', marker_opacity=0))  # Dummy trace for layout
    fig.add_layout_image(
        source=contentImage,
        xref="x",
        yref="y",
        x=-0.1,
        y=0,
        sizex=0.35,
        sizey=0.35,
        xanchor="left",
        yanchor="top"
    )
    fig.add_layout_image(
        source=styleImage,
        xref="x",
        yref="y",
        x=0.25,
        y=0,
        sizex=0.4,
        sizey=0.4,
        xanchor="left",
        yanchor="top"
    )
    fig.add_layout_image(
        source=finalImage,
        xref="x",
        yref="y",
        x=0.7,
        y=0,
        sizex=0.4,
        sizey=0.4,
        xanchor="left",
        yanchor="top"
    )
    fig.update_layout(
        xaxis=dict(showgrid=False, zeroline=False, range=[-0.1, 1]),
        yaxis=dict(showgrid=False, zeroline=False, range=[-0.5, 0.1]),  
        width=1720,  
        height=1000
    )
    return fig

import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import PIL.Image
import plotly.graph_objects as go
import plotly.io as pio

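def loadImage(imagePath):
    # Longest side of the resized image, in pixels
    maxDimension = 650

    # Read the file and decode it into a float32 RGB tensor in [0, 1]
    img = tf.io.read_file(imagePath)
    img = tf.image.decode_image(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)

    # Scale height and width so that the longest side equals maxDimension
    shape = tf.cast(tf.shape(img)[:-1], tf.float32)
    longDimension = max(shape)
    scale = maxDimension / longDimension

    newShape = tf.cast(shape * scale, tf.int32)

    # Resize and add a leading batch dimension
    img = tf.image.resize(img, newShape)
    img = img[tf.newaxis, :]
    return img

def tfToPILImage(tensor):
    # Map pixel values from [0, 1] back to the 0-255 range
    tensor = tensor * 255

    tensor = np.array(tensor, dtype=np.uint8)
    # Drop the batch dimension if present (only a single image is supported)
    if np.ndim(tensor) > 3:
        assert tensor.shape[0] == 1
        tensor = tensor[0]

    return PIL.Image.fromarray(tensor)

def stylizeImages(contentImagePath, styleImagePath):
    contentImage = loadImage(contentImagePath)
    styleImage = loadImage(styleImagePath)

    # Pretrained arbitrary image stylization model from TensorFlow Hub
    hubModel = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')

    # The first output holds the stylized image
    stylizedImage = hubModel(tf.constant(contentImage), tf.constant(styleImage))[0]

    return stylizedImage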


def plotStylizedImages(contentImage, styleImage, finalImage):
    fig = go.Figure()

    fig.add_trace(go.Scatter(x=[0], y=[0], mode='markers', marker_opacity=0)) 

    fig.add_layout_image(
        source=contentImage,
        xref="x",
        yref="y",
        x=-0.1,
        y=0,
        sizex=0.35,
        sizey=0.35,
        xanchor="left",
        yanchor="top"
    )
    fig.add_layout_image(
        source=styleImage,
        xref="x",
        yref="y",
        x=0.25,
        y=0,
        sizex=0.4,
        sizey=0.4,
        xanchor="left",
        yanchor="top"
    )
    fig.add_layout_image(
        source=finalImage,
        xref="x",
        yref="y",
        x=0.7,
        y=0,
        sizex=0.4,
        sizey=0.4,
        xanchor="left",
        yanchor="top"
    )

    fig.update_layout(
        xaxis=dict(showgrid=False, zeroline=False, range=[-0.1, 1]),  
        yaxis=dict(showgrid=False, zeroline=False, range=[-0.5, 0.1]),
        width=1720,  
        height=1000
    )
    return fig


def main():
    contentPath = 'izza.jpg'
    stylePath = 'starry_night.png'

    contentImage = PIL.Image.open(contentPath)
    styleImage = PIL.Image.open(stylePath)

    stylizedImage = stylizeImages(contentPath, stylePath)
    finalImage = tfToPILImage(stylizedImage)

    pio.write_html(plotStylizedImages(contentImage, styleImage, finalImage), 'output.html')
    finalImage.save("final-output.jpg")

if __name__ == "__main__":
    main()
