Feature Columns
Explore how to define and implement TensorFlow feature columns, including numeric and categorical types, to efficiently convert raw data into model inputs. Understand vocabulary handling, indicator columns, and how to build feature columns for production-ready machine learning pipelines.
Chapter Goals:
- Learn about feature columns and how they’re used
- Implement a function that creates a list of feature columns
A. Overview
Before we get into using a dataset of parsed protocol buffers, we need to first discuss feature columns. In TensorFlow, a feature column is how we specify what kind of data a feature contains. In this chapter, we’ll focus on the two most common types of feature data: numeric and categorical data.
Feature columns convert raw data into an input layer for a machine learning model. Once we have a list of feature columns, we can use them to combine tf.Tensor and tf.SparseTensor feature data into a single input layer. We’ll discuss this further in the next chapter.
B. Numeric features
For numeric features, we create a feature column using tf.feature_column.numeric_column. The function takes in the feature name as its only required argument.
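A minimal sketch of such a column, assuming a feature named 'GPA' that holds five float values per example and a TensorFlow version that still ships the tf.feature_column module (it is deprecated in recent releases). The variable name nc is only illustrative.

```python
import tensorflow as tf

# Numeric feature column for 'GPA': 1-D with 5 elements, stored as float32.
nc = tf.feature_column.numeric_column(
    'GPA', shape=5, dtype=tf.float32)

# Inspect the column's configuration.
print(nc.key, nc.shape, nc.dtype)
```

Passing an integer for shape is shorthand for a 1-D shape of that length.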
In the example above, nc represents a numeric feature column for the feature called 'GPA'. We used the shape keyword argument to specify that the feature must be 1-D and contain 5 elements. We also set the feature’s datatype to tf.float32.
Other less commonly used keyword arguments for the function are default_value and normalizer_fn.
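The sketch below shows both keyword arguments together. It assumes the behavior described in the TensorFlow documentation: default_value is used during tf.Example parsing when the feature is missing, and normalizer_fn is applied to the feature tensor after parsing. The scale_gpa function and the 0 to 4 GPA range are hypothetical, chosen only for illustration.

```python
import tensorflow as tf

# Hypothetical normalizer: rescale a 0-4 GPA to the range [0, 1].
def scale_gpa(x):
    return x / 4.0

nc_default = tf.feature_column.numeric_column(
    'GPA', shape=5, dtype=tf.float32,
    default_value=0.0,        # a single value is applied to every element
    normalizer_fn=scale_gpa)  # applied to the tensor after parsing

# The parsing spec shows how the column would be read from a tf.Example.
spec = tf.feature_column.make_parse_example_spec([nc_default])
print(spec['GPA'])
```

The resulting spec maps 'GPA' to a fixed-length feature that carries the column's shape, dtype, and default value, so it can be passed to a parsing function such as tf.io.parse_example.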
The default_value keyword argument sets the default value for the feature column if the ...