Defining a tf.data.Dataset
Learn to create the TensorFlow data pipeline.
Helper functions
Now, let’s look at how we can create a tf.data.Dataset using the data. We’ll first write a few helper functions. Namely, we’ll define:
- parse_image() to load and process an image from a filepath.
- generate_tokenizer() to generate a tokenizer trained on the data passed to the function.
The parse_image() function
First, let’s discuss the parse_image() function. It takes three arguments:
- filepath: Location of the image
- resize_height: Height to resize the image to
- resize_width: Width to resize the image to
The function is defined as follows:
import tensorflow as tf

def parse_image(filepath, resize_height, resize_width):
    """ Reading an image from a given filepath """
    # Read the raw bytes of the image file
    image = tf.io.read_file(filepath)
    # Decode the JPEG and make sure there are three channels in the output
    image = tf.io.decode_jpeg(image, channels=3)
    # Convert uint8 pixels to float32 values in [0, 1]
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Resize the image to resize_height x resize_width (e.g., 224x224)
    image = tf.image.resize(image, [resize_height, resize_width])
    # Bring pixel values to [-1, 1]
    image = image*2.0 - 1.0
    return image
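To see how parse_image() might be used, here is a minimal sketch that calls it on a single file and then maps it over a tf.data.Dataset of file paths. The function body is repeated only so the snippet runs on its own. The temporary directory, the sample_*.jpg file names, the random pixel data, and the 224x224 target size are assumptions made for illustration; they are not part of the lesson's dataset.

```python
import os
import tempfile

import tensorflow as tf


def parse_image(filepath, resize_height, resize_width):
    """Same steps as parse_image() above, repeated so this sketch is self-contained."""
    image = tf.io.read_file(filepath)
    image = tf.io.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [resize_height, resize_width])
    return image * 2.0 - 1.0


# Hypothetical stand-in data: write two small random JPEGs to a temp directory.
tmp_dir = tempfile.mkdtemp()
filepaths = []
for i in range(2):
    pixels = tf.random.uniform([32, 48, 3], maxval=256, dtype=tf.int32)
    jpeg_bytes = tf.io.encode_jpeg(tf.cast(pixels, tf.uint8))
    path = os.path.join(tmp_dir, f"sample_{i}.jpg")
    tf.io.write_file(path, jpeg_bytes)
    filepaths.append(path)

# Call the function directly on one file.
image = parse_image(filepaths[0], 224, 224)
print(image.shape, image.dtype)

# Map the function over a dataset of file paths, then batch the results.
image_ds = tf.data.Dataset.from_tensor_slices(filepaths).map(
    lambda fp: parse_image(fp, 224, 224)
)
batch = next(iter(image_ds.batch(2)))
print(batch.shape)
```

Because every image is resized to the same height and width, the mapped dataset can be batched directly, even though the source files may differ in size.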
We rely mostly on tf.io and tf.image functions to load and process the image. Specifically, this function:
- Reads the image from the filepath
- Decodes the bytes in the JPEG image to a uint8 tensor and converts to a ...