Computer vision is a field of artificial intelligence that enables machines to process and extract information from visual data such as images and videos. It supports many interesting processing tasks, and neural style transfer is one of them.
Neural style transfer is a technique that combines the content of one image with the artistic style of another. The result is a new image that looks as if it were painted in the second image's style while preserving the content of the first.
In simpler terms, we blend the style of one image into the content of another. Neural style transfer works using:
Content of the first image
The artistic style of the second image
We can understand how neural style transfer works through the concepts below.
We give our model two images. The first is the content image, whose content we want to preserve, and the second is the style image, whose style we want to apply.
Note: By style, we refer to the visual patterns or texture present in an image.
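The first thing we need to ensure is that our final image retains the crucial content of our content image. The content loss is calculated as the difference between the feature representations of the content image and the generated image.
The style loss is generally calculated by comparing the feature correlations (Gram matrices) of the style image and the generated image.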
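As a rough illustration of the two losses, here is a minimal NumPy sketch. It assumes the common formulation in which the content loss is a mean squared difference between feature maps and the style loss compares Gram matrices (channel correlations). The random arrays are hypothetical stand-ins for CNN activations, and the function names are ours, not part of any library.

```python
import numpy as np

def contentLoss(contentFeatures, generatedFeatures):
    """Mean squared difference between two feature maps of equal shape."""
    return float(np.mean((generatedFeatures - contentFeatures) ** 2))

def gramMatrix(features):
    """Channel-by-channel correlations of a (height, width, channels) feature map."""
    height, width, channels = features.shape
    flat = features.reshape(height * width, channels)
    return flat.T @ flat / (height * width)

def styleLoss(styleFeatures, generatedFeatures):
    """Mean squared difference between the Gram matrices of two feature maps."""
    difference = gramMatrix(generatedFeatures) - gramMatrix(styleFeatures)
    return float(np.mean(difference ** 2))

# Hypothetical stand-ins for the activations of one CNN layer
rng = np.random.default_rng(0)
contentFeatures = rng.random((4, 4, 3))
styleFeatures = rng.random((4, 4, 3))
generatedFeatures = contentFeatures.copy()

print(contentLoss(contentFeatures, generatedFeatures))  # 0.0, the maps are identical
print(styleLoss(styleFeatures, generatedFeatures))      # positive, the correlations differ
```

Because the Gram matrix discards spatial positions, the style loss measures which textures co-occur rather than where objects are located.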
Our goal here is to minimize the total loss, i.e., the weighted sum of the content loss and the style loss. This can be done by adjusting the pixels of the stylized image with an iterative optimization technique, such as gradient descent, until the loss stops decreasing.
The algorithm thus gradually converges to the final stylized image, which is a high-quality blend of the two images.
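The iterative pixel adjustment can be sketched with a deliberately simplified toy. The assumption here is that both loss terms are plain squared pixel differences, which is not how real style transfer measures style (that is done on CNN features), but it keeps the gradient easy to write by hand. All names and values below are illustrative.

```python
import numpy as np

def totalLoss(image, content, style, contentWeight=1.0, styleWeight=1.0):
    """Weighted sum of two squared-error terms (pixel-space stand-ins for the real losses)."""
    contentTerm = np.sum((image - content) ** 2)
    styleTerm = np.sum((image - style) ** 2)
    return float(contentWeight * contentTerm + styleWeight * styleTerm)

def totalLossGradient(image, content, style, contentWeight=1.0, styleWeight=1.0):
    """Gradient of totalLoss with respect to every pixel of image."""
    return 2.0 * (contentWeight * (image - content) + styleWeight * (image - style))

def optimizePixels(content, style, steps=50, learningRate=0.1):
    """Start from the content image and repeatedly step against the gradient."""
    image = content.copy()
    history = [totalLoss(image, content, style)]
    for _ in range(steps):
        image = image - learningRate * totalLossGradient(image, content, style)
        history.append(totalLoss(image, content, style))
    return image, history

# Hypothetical 8x8 RGB images with values in [0, 1]
rng = np.random.default_rng(0)
content = rng.random((8, 8, 3))
style = rng.random((8, 8, 3))

stylized, history = optimizePixels(content, style)
print(history[0], history[-1])  # the loss shrinks as the pixels are adjusted
```

With equal weights, this toy loss is minimized by the pixel-wise average of the two images, so the loop converges there. Changing the weights shifts the result toward one image or the other, which mirrors the content/style trade-off in the real method.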
TensorFlow is a widely used deep learning framework for building and training machine learning models.
We will be using TensorFlow Hub's pre-trained model "Arbitrary Image Stylization" for our code in this Answer.
The model variant used here is arbitrary-image-stylization-v1-256/2. This model is designed for stylizing images based on artistic styles.
In this Answer, we will demonstrate a Python implementation of neural style transfer using TensorFlow and then use Plotly to visualize the results.
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import PIL.Image
import plotly.graph_objects as go
import plotly.io as pio
First and foremost, we import the required modules for our code snippet. We use tensorflow and tensorflow_hub for the pre-trained model, numpy for numerical calculations, PIL for image processing, and plotly for visualizing results.
loadImage method
def loadImage(imagePath):
    maxDimension = 650
    img = tf.io.read_file(imagePath)
    img = tf.image.decode_image(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    shape = tf.cast(tf.shape(img)[:-1], tf.float32)
    longDimension = max(shape)
    scale = maxDimension / longDimension
    newShape = tf.cast(shape * scale, tf.int32)
    img = tf.image.resize(img, newShape)
    img = img[tf.newaxis, :]
    return img
Our first method reads the image and, after processing, returns it as a tensor. Let's dive deeper into how it does this.
maxDimension = 650: Sets the maximum dimension of the output to 650 pixels.
img = tf.io.read_file(imagePath): Reads the image's raw data from the path imagePath.
img = tf.image.decode_image(img, channels=3): Decodes the binary image data into a tensor with RGB colors, i.e., 3 channels.
img = tf.image.convert_image_dtype(img, tf.float32): Converts pixel values to the tf.float32 data type and scales them to the [0, 1] range, which is the input format the model works with.
shape = tf.cast(tf.shape(img)[:-1], tf.float32): Takes the image's height and width (dropping the channel dimension) and converts them to a floating-point tensor.
longDimension = max(shape): Finds the length of the image's longest side.
scale = maxDimension / longDimension: Computes the scaling factor for resizing.
newShape = tf.cast(shape * scale, tf.int32): Computes the new image dimensions after resizing.
img = tf.image.resize(img, newShape): Resizes the image tensor to newShape.
img = img[tf.newaxis, :]: Adds a batch dimension to the image tensor.
return img: Returns our processed image tensor.
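To make the resizing arithmetic concrete, here is a small pure-Python sketch of the same calculation. The helper name scaledShape and the sample dimensions are hypothetical, and int() is used on the assumption that it truncates positive values the same way the cast to tf.int32 does.

```python
def scaledShape(height, width, maxDimension=650):
    """Return (newHeight, newWidth) after scaling the longer side to maxDimension."""
    scale = maxDimension / max(height, width)
    # int() truncates toward zero, like casting positive floats to an integer type
    return int(height * scale), int(width * scale)

print(scaledShape(975, 1300))  # (487, 650): a landscape photo is scaled down by 0.5
print(scaledShape(325, 200))   # (650, 400): a small image is scaled up by 2.0
```

Note that the aspect ratio is preserved and that images smaller than 650 pixels are enlarged rather than left untouched.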
stylizeImages method
def stylizeImages(contentImagePath, styleImagePath):
    contentImage = loadImage(contentImagePath)
    styleImage = loadImage(styleImagePath)
    hubModel = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')
    stylizedImage = hubModel(tf.constant(contentImage), tf.constant(styleImage))[0]
    return stylizedImage
Moving on to our next method, we define stylizeImages for producing our final stylized image. Let's see how we accomplish this.
contentImage = loadImage(contentImagePath): Loads and processes our content image using the loadImage method we just defined.
styleImage = loadImage(styleImagePath): Does the same for the style image.
hubModel = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2'): Loads a neural style transfer model from TensorFlow Hub.
stylizedImage = hubModel(tf.constant(contentImage), tf.constant(styleImage))[0]: Applies the style transfer model to our processed content and style images. hubModel takes two tensors representing these images and returns a list of outputs. The [0] index selects the first output, which holds the stylized image (still with its batch dimension).
return stylizedImage: Returns the stylized image tensor.
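Since stylizeImages calls hub.load on every invocation, stylizing several image pairs in one session loads the model again each time. One possible way around this is to cache the loaded model, as sketched below. The helper getModel and its loader parameter are our own additions for illustration (the loader makes the function testable without network access). Only hub.load itself comes from the code above.

```python
from functools import lru_cache

MODEL_HANDLE = 'https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2'

def defaultLoader(handle):
    """Load the model with TensorFlow Hub (imported lazily so it is only needed here)."""
    import tensorflow_hub as hub
    return hub.load(handle)

@lru_cache(maxsize=None)
def getModel(handle=MODEL_HANDLE, loader=defaultLoader):
    """Load the model once per (handle, loader) pair and reuse it afterwards."""
    return loader(handle)

# Demonstration with a hypothetical stand-in loader instead of a real download
loadCalls = []

def fakeLoader(handle):
    loadCalls.append(handle)
    return object()

firstModel = getModel('demo-handle', fakeLoader)
secondModel = getModel('demo-handle', fakeLoader)
print(firstModel is secondModel, len(loadCalls))  # True 1
```

Inside stylizeImages, the hub.load(...) line could then be replaced with getModel(), assuming the model object can be safely reused across calls.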
tfToPILImage method
def tfToPILImage(tensor):
    tensor = tensor * 255
    tensor = np.array(tensor, dtype=np.uint8)
    if np.ndim(tensor) > 3:
        assert tensor.shape[0] == 1
        tensor = tensor[0]
    return PIL.Image.fromarray(tensor)
We then define a method to convert our tf tensor to a PIL-compatible image.
tensor = tensor * 255: Scales the pixel values of the input tensor by 255 so that the normalized values return to the 0-255 range.
tensor = np.array(tensor, dtype=np.uint8): Converts the input tensor to a NumPy array of 8-bit unsigned integers, since PIL builds images from NumPy arrays.
if np.ndim(tensor) > 3: Checks whether the tensor has more than three dimensions, to handle cases where it has an extra batch dimension.
assert tensor.shape[0] == 1: Raises an error if the first dimension isn't equal to 1, to ensure that the tensor represents a single image.
tensor = tensor[0]: Keeps only the first image in the batch.
return PIL.Image.fromarray(tensor): Converts the NumPy array tensor to a PIL image.
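The numeric part of this conversion can be tried in isolation with NumPy. The sketch below is a slightly more defensive variant, not the method above: it clips values to [0, 1] before scaling, on the assumption that out-of-range values could otherwise wrap around when cast to uint8, and it raises a ValueError instead of using assert. The name toUint8Image and the sample values are hypothetical.

```python
import numpy as np

def toUint8Image(tensor):
    """Convert a float image in [0, 1] to uint8, dropping a batch dimension of size 1."""
    array = np.asarray(tensor, dtype=np.float32)
    # Clip first so that values outside [0, 1] cannot wrap around after the cast
    array = np.clip(array, 0.0, 1.0) * 255
    array = array.astype(np.uint8)  # truncates the fractional part
    if array.ndim > 3:
        if array.shape[0] != 1:
            raise ValueError('expected a batch containing exactly one image')
        array = array[0]
    return array

# A hypothetical batch holding one 1x2 RGB image, with two out-of-range values
batch = np.array([[[[0.0, 0.5, 1.0], [-0.2, 1.3, 0.25]]]])
pixels = toUint8Image(batch)
print(pixels.shape)     # (1, 2, 3)
print(pixels.tolist())  # [[[0, 127, 255], [0, 255, 63]]]
```

The resulting array can be passed to PIL.Image.fromarray exactly as in the method above.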
plotStylizedImages method
def plotStylizedImages(contentImage, styleImage, finalImage):
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=[0], y=[0], mode='markers', marker_opacity=0))  # Dummy trace for layout
    fig.add_layout_image(
        source=contentImage,
        xref="x", yref="y",
        x=-0.1, y=0,
        sizex=0.35, sizey=0.35,
        xanchor="left", yanchor="top"
    )
    fig.add_layout_image(
        source=styleImage,
        xref="x", yref="y",
        x=0.25, y=0,
        sizex=0.4, sizey=0.4,
        xanchor="left", yanchor="top"
    )
    fig.add_layout_image(
        source=finalImage,
        xref="x", yref="y",
        x=0.7, y=0,
        sizex=0.4, sizey=0.4,
        xanchor="left", yanchor="top"
    )
    fig.update_layout(
        xaxis=dict(showgrid=False, zeroline=False, range=[-0.1, 1]),
        yaxis=dict(showgrid=False, zeroline=False, range=[-0.5, 0.1]),
        width=1720,
        height=1000
    )
    return fig
We define a utility method called plotStylizedImages to aid the data visualization aspect of our code. This function simply depicts the three images on a Plotly graph by customizing the display properties.
main method
def main():
    contentPath = 'izza.jpg'
    stylePath = 'starry_night.png'
    contentImage = PIL.Image.open(contentPath)
    styleImage = PIL.Image.open(stylePath)
    stylizedImage = stylizeImages(contentPath, stylePath)
    finalImage = tfToPILImage(stylizedImage)
    pio.write_html(plotStylizedImages(contentImage, styleImage, finalImage), 'output.html')
    finalImage.save("final-output.jpg")

if __name__ == "__main__":
    main()
The main method is where the neural style transfer is actually triggered. It loads the content and style images using PIL, applies neural style transfer to get the final stylized image, and saves both the stylized image and the Plotly HTML output.
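The file paths in main are hardcoded. If you want to try other images without editing the code, one option is a small command-line wrapper. The sketch below uses Python's standard argparse module. The flag names --content, --style, and --output are hypothetical choices of ours, with the paths from main as defaults.

```python
import argparse

def parseArgs(argv=None):
    """Parse image paths from the command line, falling back to the paths used in main."""
    parser = argparse.ArgumentParser(description='Neural style transfer with a TensorFlow Hub model')
    parser.add_argument('--content', default='izza.jpg', help='path to the content image')
    parser.add_argument('--style', default='starry_night.png', help='path to the style image')
    parser.add_argument('--output', default='final-output.jpg', help='where to save the stylized image')
    return parser.parse_args(argv)

# Passing a list simulates command-line arguments; in a script, call parseArgs() with no arguments
args = parseArgs(['--content', 'photo.jpg', '--style', 'painting.png'])
print(args.content, args.style, args.output)  # photo.jpg painting.png final-output.jpg
```

In main, args.content and args.style would then replace contentPath and stylePath, and args.output would replace the hardcoded output file name.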
Voila! We've reached the end of the code explanation!
Here's the complete code. Feel free to experiment with it. Running it saves the stylized image as final-output.jpg and writes a Plotly page, output.html, that displays the content, style, and final stylized images.
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import PIL.Image
import plotly.graph_objects as go
import plotly.io as pio

def loadImage(imagePath):
    maxDimension = 650
    img = tf.io.read_file(imagePath)
    img = tf.image.decode_image(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    shape = tf.cast(tf.shape(img)[:-1], tf.float32)
    longDimension = max(shape)
    scale = maxDimension / longDimension
    newShape = tf.cast(shape * scale, tf.int32)
    img = tf.image.resize(img, newShape)
    img = img[tf.newaxis, :]
    return img

def tfToPILImage(tensor):
    tensor = tensor * 255
    tensor = np.array(tensor, dtype=np.uint8)
    if np.ndim(tensor) > 3:
        assert tensor.shape[0] == 1
        tensor = tensor[0]
    return PIL.Image.fromarray(tensor)

def stylizeImages(contentImagePath, styleImagePath):
    contentImage = loadImage(contentImagePath)
    styleImage = loadImage(styleImagePath)
    hubModel = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')
    stylizedImage = hubModel(tf.constant(contentImage), tf.constant(styleImage))[0]
    return stylizedImage

def plotStylizedImages(contentImage, styleImage, finalImage):
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=[0], y=[0], mode='markers', marker_opacity=0))
    fig.add_layout_image(
        source=contentImage,
        xref="x", yref="y",
        x=-0.1, y=0,
        sizex=0.35, sizey=0.35,
        xanchor="left", yanchor="top"
    )
    fig.add_layout_image(
        source=styleImage,
        xref="x", yref="y",
        x=0.25, y=0,
        sizex=0.4, sizey=0.4,
        xanchor="left", yanchor="top"
    )
    fig.add_layout_image(
        source=finalImage,
        xref="x", yref="y",
        x=0.7, y=0,
        sizex=0.4, sizey=0.4,
        xanchor="left", yanchor="top"
    )
    fig.update_layout(
        xaxis=dict(showgrid=False, zeroline=False, range=[-0.1, 1]),
        yaxis=dict(showgrid=False, zeroline=False, range=[-0.5, 0.1]),
        width=1720,
        height=1000
    )
    return fig

def main():
    contentPath = 'izza.jpg'
    stylePath = 'starry_night.png'
    contentImage = PIL.Image.open(contentPath)
    styleImage = PIL.Image.open(stylePath)
    stylizedImage = stylizeImages(contentPath, stylePath)
    finalImage = tfToPILImage(stylizedImage)
    pio.write_html(plotStylizedImages(contentImage, styleImage, finalImage), 'output.html')
    finalImage.save("final-output.jpg")

if __name__ == "__main__":
    main()
Let's analyze the output of our code.
We first choose a content image. Since this is an image of a person, we would naturally want to preserve the person object in our image.
Next, we choose a style image. The artistic styles of this image will be applied to our content image.
The pre-trained model, which was trained to keep the content and style losses low, produces the following image.
That's it! We've learned how neural style transfer works and how to apply it with a pre-trained TensorFlow Hub model.
In conclusion, neural style transfer is a fascinating technique that combines the content of one image with the artistic style of another, resulting in a brand-new artistic composition. The technique still has limitations, but further advances should yield more refined results and let developers and artists explore more creative possibilities.