# Encode Categorical Data

Discover the basics of categorical data and how to encode it in pandas.

## Overview of categorical data

Beyond numerical data, categorical data is another data type commonly seen in real-world datasets. **Categorical data** refers to data that can be divided into distinct groups based on specific characteristics and that takes on a limited number of unique values. It falls into two general types:

- **Nominal categorical data** is where there is no order or ranking between categories. Examples of nominal data include gender, blood type, and hair color.
- **Ordinal categorical data** is where the categories can be ordered or ranked. Examples of ordinal data include educational degrees (e.g., high school, bachelor's, master's), income groups (e.g., low, medium, high), and star ratings (e.g., one star to five stars).
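The difference between the two types can be sketched in `pandas` with the `ordered` flag of `pd.Categorical`. The values below are made up for illustration and do not come from the lesson's dataset.

```python
import pandas as pd

# Nominal: categories have no inherent order (illustrative values).
blood_type = pd.Categorical(["A", "O", "B", "O", "AB"])

# Ordinal: list the categories from lowest to highest and set ordered=True.
income = pd.Categorical(
    ["low", "high", "medium", "low"],
    categories=["low", "medium", "high"],
    ordered=True,
)

print(blood_type.ordered)  # False
print(income.ordered)      # True

# Ordered categoricals support min/max and comparisons against a category.
lowest = income.min()
below_high = income < "high"
print(lowest)
print(below_high)
```

Without `ordered=True`, comparisons such as `<` raise a `TypeError`, which is a useful guard against ranking categories that have no meaningful order.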

## Categorical data in `pandas`

In `pandas`, categorical data is represented by a `Categorical` object with `dtype='category'`. A distinctive property of the `Categorical` data type is that, although it looks like an array of string values, it is stored internally as an array of integers that point to the categories. When a column has few unique values relative to its length, this representation reduces memory usage and speeds up computations involving categorical data.
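As a minimal sketch of this internal representation, the example below converts a small, made-up series of strings to the `category` dtype and inspects its categories, integer codes, and memory footprint. The series `colors` is hypothetical and is not part of the lesson's dataset.

```python
import pandas as pd

# Hypothetical column with many repeated string values.
colors = pd.Series(["red", "green", "red", "blue"] * 1000)
colors_cat = colors.astype("category")

print(colors_cat.dtype)           # category
print(colors_cat.cat.categories)  # the unique categories, sorted

# Each value is stored as an integer code pointing into the categories.
first_codes = colors_cat.cat.codes.head(4).tolist()
print(first_codes)  # [2, 1, 2, 0]

# Compare memory usage in bytes (deep=True also counts the string contents).
string_bytes = colors.memory_usage(deep=True)
category_bytes = colors_cat.memory_usage(deep=True)
print(string_bytes, category_bytes)
```

The saving depends on how often values repeat: a column in which nearly every value is unique gains little from the conversion.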

Suppose we have the following truncated credit card dataset that represents the demographics of a group of credit card holders:
