Categorical Data

This lesson discusses what categorical data is and how Pandas supports working with it.

Introduction to categorical data

Categorical data are variables that take a limited, and usually fixed, number of possible values, for example, male and female. Most machine learning algorithms need numeric input, so these values have to be converted to numbers first. We will discuss two ways of doing this:

  • Label encoding
  • One-hot encoding (a.k.a. dummy variables)

Dealing with categorical data

Label encoding

Label encoding converts each unique value into a number. For example, if we have two categories, male and female, we can encode them as:

  • male as 0
  • female as 1

Pandas provides an easy way to do this through its category data type.
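As a minimal sketch of label encoding with the category type, the snippet below builds a small DataFrame and reads the integer labels from `.cat.codes`. The sample values and the column names `gender` and `gender_code` are assumptions made for illustration, and the category order is set explicitly so that the codes match the male/female mapping given in this lesson.

```python
import pandas as pd

# Hypothetical sample data; the column name "gender" is an assumption.
df = pd.DataFrame({"gender": ["male", "female", "female", "male"]})

# Use an explicit category order so that male -> 0 and female -> 1.
gender_type = pd.CategoricalDtype(categories=["male", "female"])
df["gender"] = df["gender"].astype(gender_type)

# .cat.codes exposes the integer label stored for each row.
df["gender_code"] = df["gender"].cat.codes

print(df)
```

If you call `astype("category")` without specifying the categories, Pandas infers them in sorted order, so female would be encoded as 0 and male as 1.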
