Protocol Buffer

Chapter Goals:

Learn how protocol buffers are used in TensorFlow
Implement a function to convert a Python dictionary to a tf.train.Example object

A. TensorFlow protocol buffer

Since protocol buffers use a structured format when storing data, they can be represented with Python classes. In TensorFlow, the tf.train.Example class represents the protocol buffer used to store data for the input pipeline.

Each individual tf.train.Example object describes data for a single dataset observation (e.g. a single row in a data table). We convert raw data to a protocol buffer by initializing a tf.train.Example object with the data’s values.

When we initialize a tf.train.Example object, we need to set that object’s features argument to a tf.train.Features object. The tf.train.Features class is initialized by setting the feature field to a dictionary that maps feature names to feature values.
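As a minimal sketch of this nesting, the snippet below builds a tf.train.Example for one hypothetical observation. The feature names 'age' and 'name' and their values are made up for illustration; the feature types used here are covered in the next section.

```python
import tensorflow as tf

# Map each feature name to a tf.train.Feature (hypothetical data).
feature_dict = {
    'age': tf.train.Feature(int64_list=tf.train.Int64List(value=[30])),
    'name': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'Alice'])),
}

# The dictionary is passed to tf.train.Features through its `feature` field...
features = tf.train.Features(feature=feature_dict)

# ...and the Features object is passed to tf.train.Example as `features`.
example = tf.train.Example(features=features)

print(example.features.feature['age'].int64_list.value[0])  # 30
```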

B. Features

Each feature value is represented by a tf.train.Feature object, which is initialized with exactly one of the following fields:

int64_list, for integer data values. Set with a tf.train.Int64List object.
float_list, for floating point data values. Set with a tf.train.FloatList object.
bytes_list, for byte string data values. Set with a tf.train.BytesList object.
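The three field types can be sketched as follows. The values are arbitrary examples; the oneof name 'kind' passed to WhichOneof comes from the Feature message definition in TensorFlow's feature.proto. Note that str.encode (UTF-8 by default) turns a Python string into bytes so it can be stored in a tf.train.BytesList.

```python
import tensorflow as tf

# Integer data goes in an Int64List.
int_feature = tf.train.Feature(int64_list=tf.train.Int64List(value=[1, 2, 3]))

# Floating point data goes in a FloatList (values are stored as 32-bit floats).
float_feature = tf.train.Feature(float_list=tf.train.FloatList(value=[1.5, 2.25]))

# Strings must be encoded to bytes before they go in a BytesList.
text = 'hello'
bytes_feature = tf.train.Feature(
    bytes_list=tf.train.BytesList(value=[text.encode()]))

# Each Feature has exactly one of the three fields set.
for f in (int_feature, float_feature, bytes_feature):
    print(f.WhichOneof('kind'))
# int64_list
# float_list
# bytes_list
```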

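The code below converts a Python dictionary into a tf.train.Example object. For each entry in data_dict, it looks up the feature's shape and type in config, wraps scalar values (shape () or []) in a list, and builds the matching tf.train.Feature through a helper function. String values must be converted to bytes, for example with Python's str.encode method, so that their type is compatible with tf.train.BytesList.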

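import tensorflow as tf

# make_int_feature, make_float_feature and make_bytes_feature are helper
# functions assumed to be defined elsewhere (they are not shown here); each
# returns a tf.train.Feature built from a list of values.


def dict_to_example(data_dict, config):
    """Convert a dictionary of feature values to a tf.train.Example."""
    feature_dict = {}
    for feature_name, value in data_dict.items():
        feature_config = config[feature_name]
        # Every tf.train.*List expects a sequence, so wrap scalar values.
        shape = feature_config['shape']
        if shape == () or shape == []:
            value = [value]
        # Build the Feature type that matches the configured data type.
        value_type = feature_config['type']
        if value_type == 'int':
            feature_dict[feature_name] = make_int_feature(value)
        elif value_type == 'float':
            feature_dict[feature_name] = make_float_feature(value)
        elif value_type == 'string' or value_type == 'bytes':
            # value_type is passed along, presumably so the helper can tell
            # str values (which need encoding) from values already in bytes.
            feature_dict[feature_name] = make_bytes_feature(value, value_type)
    # Wrap the feature dictionary and build the Example protocol buffer.
    features = tf.train.Features(feature=feature_dict)
    return tf.train.Example(features=features)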
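The helper functions called by dict_to_example are not shown in this section. A minimal sketch of what they might look like follows; these are hypothetical versions written for illustration, assuming that 'string' values arrive as Python str and 'bytes' values arrive already encoded.

```python
import tensorflow as tf

# Hypothetical helpers: illustrative sketches, not the chapter's own versions.

def make_int_feature(value):
    # value is a list of integers.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


def make_float_feature(value):
    # value is a list of floats.
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))


def make_bytes_feature(value, value_type):
    # Assumption: 'string' values are str and must be encoded to bytes;
    # 'bytes' values are used as-is.
    if value_type == 'string':
        value = [v.encode() for v in value]
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))


# Scalar values are expected to be wrapped in a list already, as
# dict_to_example does for shape () or [].
f = make_bytes_feature(['cat'], 'string')
print(list(f.bytes_list.value))  # [b'cat']
```

With these sketches in scope, dict_to_example({'age': 30}, {'age': {'shape': (), 'type': 'int'}}) would wrap 30 in a list and return an Example whose 'age' feature holds the single integer 30.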
