Exercise: Implementing OHE for a Categorical Feature

Learn to implement one-hot encoding for categorical features.

Using pandas to create a one-hot encoding

In this exercise, we will “reverse engineer” the EDUCATION feature in the dataset to obtain the text labels that represent the different education levels, then show how to use pandas to create a one-hot encoding (OHE). As a preliminary step, please set up the environment and load the progress from previous exercises:

import pandas as pd
import matplotlib as mpl  # additional plotting functionality
mpl.rcParams['figure.dpi'] = 400  # high resolution figures
df_clean_2 = pd.read_csv('df_clean_2_01.csv')
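To make the goal concrete, here is a minimal sketch of the overall idea on a small made-up DataFrame rather than on `df_clean_2`. The toy values and the names `toy_df`, `cat_mapping`, and `edu_ohe` are assumptions for illustration only; the label strings assume that the codes 1 to 4 stand for graduate school, university, high school, and others. The sketch uses `Series.map` and `pd.get_dummies`, which is one common way to do this in pandas and not necessarily the exact sequence of steps the exercise follows.

```python
import pandas as pd

# Toy stand-in for the EDUCATION column (values are made up for illustration)
toy_df = pd.DataFrame({'EDUCATION': [2, 1, 3, 4, 2]})

# Hypothetical mapping from the ordinal codes to text labels
cat_mapping = {
    1: 'graduate school',
    2: 'university',
    3: 'high school',
    4: 'others',
}

# Recreate the text labels from the numeric codes
toy_df['EDUCATION_CAT'] = toy_df['EDUCATION'].map(cat_mapping)

# One indicator column per distinct label
edu_ohe = pd.get_dummies(toy_df['EDUCATION_CAT'])
print(edu_ohe)
```

Each row of `edu_ohe` has exactly one "hot" entry, in the column that matches that row's education label.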

First, let’s consider our EDUCATION feature before it was encoded as an ordinal feature. From the data dictionary, we know that 1 = graduate school, 2 = university, 3 = high school, and 4 = others. We would like to recreate a column that contains these strings instead of numbers. Perform the following steps in the Jupyter notebook at the end of the lesson to complete the exercise.

  1. Create a placeholder column for the categorical labels called EDUCATION_CAT. With the following command, every row will initially contain the string 'none':

    df_clean_2['EDUCATION_CAT'] = 'none'

  2. Examine the first few rows of the DataFrame for the EDUCATION and EDUCATION_CAT columns:

    df_clean_2[['EDUCATION', 'EDUCATION_CAT']].head(10)

    The output should appear as follows:
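The exact rows depend on the contents of the CSV file. On a small made-up DataFrame (hypothetical values, not the case-study data), steps 1 and 2 behave as in the following sketch; the names `toy_df` and `preview` are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for df_clean_2 (values are made up for illustration)
toy_df = pd.DataFrame({'EDUCATION': [2, 2, 1, 3, 4]})

# Step 1: assigning a scalar string fills every row with that value
toy_df['EDUCATION_CAT'] = 'none'

# Step 2: view the two columns side by side for the first few rows
preview = toy_df[['EDUCATION', 'EDUCATION_CAT']].head(3)
print(preview)
```

At this point the numeric codes still vary from row to row, while EDUCATION_CAT holds the same placeholder string everywhere.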
