
Defining a tf.data.Dataset

Explore how to build TensorFlow data pipelines for image captioning by defining helper functions to load and preprocess images, generate tokenizers for captions, and prepare batched datasets suitable for transformer training. Understand the flow from raw data to inputs and targets for model training.

Helper functions

Now, let’s look at how we can create a tf.data.Dataset using the data. We’ll first write a few helper functions. Namely, we’ll define:

  • parse_image() to load and process an image from a filepath.

  • generate_tokenizer() to create a tokenizer fitted on the text data passed to the function (a minimal sketch follows this list).
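
As a rough preview of the second helper, here is a minimal sketch of what generate_tokenizer() might look like using the Keras Tokenizer API; the n_vocab cap and the 'unk' out-of-vocabulary token are illustrative assumptions, not necessarily the exact choices used here:

import tensorflow as tf

def generate_tokenizer(captions, n_vocab):
    """ A sketch: fit a tokenizer on the given caption strings """
    # num_words caps the vocabulary size; oov_token handles unseen
    # words at encoding time (both values are assumptions)
    tokenizer = tf.keras.preprocessing.text.Tokenizer(
        num_words=n_vocab, oov_token='unk'
    )
    tokenizer.fit_on_texts(captions)
    return tokenizer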

The parse_image() function

First, let’s discuss the parse_image() function. It takes three arguments:

  • filepath: Location of the image

  • resize_height: Height to resize the image to

  • resize_width: Width to resize the image to

The function is defined as follows:

def parse_image(filepath, resize_height, resize_width):
    """ Reading an image from a given filepath """
    # Read the raw bytes of the image file
    image = tf.io.read_file(filepath)
    # Decode the JPEG and make sure there are three channels in the output
    image = tf.io.decode_jpeg(image, channels=3)
    # Convert pixel values to floats in the [0, 1] range
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Resize the image to the requested height and width
    image = tf.image.resize(image, [resize_height, resize_width])
    # Bring pixel values to [-1, 1]
    image = image * 2.0 - 1.0
    return image
Read and preprocess an image from the given filepath
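
With parse_image() in place, it can be mapped over a tf.data.Dataset of filepaths to produce a stream of preprocessed images. The snippet below is a minimal sketch of that wiring; the filepaths list and the batch size of 8 are placeholder values:

import tensorflow as tf

# Placeholder filepaths; in practice these come from the captioning dataset
filepaths = ["images/0001.jpg", "images/0002.jpg"]

image_ds = tf.data.Dataset.from_tensor_slices(filepaths)
# Load and preprocess each image with the helper defined above
image_ds = image_ds.map(lambda f: parse_image(f, 224, 224))
# Group images into batches for training
image_ds = image_ds.batch(8)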

We are mostly relying on tf.io and tf.image functions to load and process the image. Specifically, this function:

...