Defining a tf.data.Dataset
Explore how to build TensorFlow data pipelines for image captioning by defining helper functions to load and preprocess images, generate tokenizers for captions, and prepare batched datasets suitable for transformer training. Understand the flow from raw data to inputs and targets for model training.
Helper functions
Now, let’s look at how we can create a tf.data.Dataset using the data. We’ll first write a few helper functions. Namely, we’ll define:
- parse_image() to load and process an image from a filepath
- generate_tokenizer() to generate a tokenizer trained on the data passed to the function (a rough sketch follows this list)
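Before turning to parse_image(), here is a minimal sketch of what generate_tokenizer() might look like, using a Keras Tokenizer; the vocabulary size, out-of-vocabulary token, and parameter names here are illustrative assumptions, not the exact settings used later:

```python
import tensorflow as tf

def generate_tokenizer(captions, n_vocab=5000):
    """ Hypothetical sketch: fit a word-level tokenizer on caption strings """
    tokenizer = tf.keras.preprocessing.text.Tokenizer(
        num_words=n_vocab,   # assumed cap on the vocabulary size
        oov_token='<unk>'    # assumed out-of-vocabulary token
    )
    # Learn the vocabulary from the captions passed to the function
    tokenizer.fit_on_texts(captions)
    return tokenizer
```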
The parse_image() function
First, let’s discuss the parse_image() function. It takes three arguments:
- filepath: Location of the image
- resize_height: Height to resize the image to
- resize_width: Width to resize the image to
The function is defined as follows:
```python
import tensorflow as tf

def parse_image(filepath, resize_height, resize_width):
    """ Reading an image from a given filepath """

    # Reading the image
    image = tf.io.read_file(filepath)
    # Decode the JPEG and make sure there are three channels in the output
    image = tf.io.decode_jpeg(image, channels=3)
    # Convert to float32, which scales pixel values to [0, 1]
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Resize the image to the given height and width (e.g., 224x224)
    image = tf.image.resize(image, [resize_height, resize_width])
    # Bring pixel values to [-1, 1]
    image = image * 2.0 - 1.0

    return image
```
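Because parse_image() is built entirely from TensorFlow ops, it can be mapped over a tf.data.Dataset of filepaths. As a quick illustration, continuing from the listing above (the filepaths below are placeholders, and 224x224 is a common input size for ImageNet-pretrained vision models):

```python
# Hypothetical filepaths, for illustration only
filepaths = ['images/cat.jpg', 'images/dog.jpg']

ds = tf.data.Dataset.from_tensor_slices(filepaths)
# parse_image() uses only TensorFlow ops, so it can run inside Dataset.map()
ds = ds.map(lambda fp: parse_image(fp, 224, 224))

for image in ds.take(1):
    print(image.shape)  # (224, 224, 3)
```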
We rely mostly on tf.image functions to load and process the image. Specifically, this function: