Categorical encoding

In this lesson, we will explore an important feature engineering concept called categorical encoding: what it is, why it matters, the different encoding techniques, and how to install the library that provides them.

Definition

Most machine learning algorithms and deep neural networks work only with numerical inputs. In other words, before we can build a working model or network, any categorical data in the dataset must first be encoded as numbers.

The goal of categorical encoding is to create predictive and informative numerical variables from the categorical variables in our dataset to build, train and evaluate the machine learning model and improve its performance.
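
For example, a column such as color with values like "red" and "blue" cannot be fed directly to most estimators. The snippet below is a minimal sketch of the idea using a hypothetical pandas DataFrame; the column name and values are illustrative only:

# a toy categorical column
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})

# a naive label-style mapping to integers, just to show what "encoding" means;
# the techniques listed below offer more principled alternatives
df["color_encoded"] = df["color"].astype("category").cat.codes
print(df)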

Several techniques exist for this kind of data transformation; to name a few:

Traditional Techniques

  • One-hot encoding
  • Count or frequency encoding
  • Ordinal or label encoding

Monotonic Relationship

  • Ordered label encoding
  • Mean encoding
  • Probability ratio encoding
  • Weight of evidence

Alternative Techniques

  • Rare labels encoding
  • Binary encoding

To encode categorical variables, we need a library called category_encoders, which provides many basic and advanced encoding methods. You can install it with either of the following commands:

# using pip
pip install category_encoders
# using conda
conda install -c conda-forge category_encoders
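
Once installed, a quick import confirms that the library is available; the alias ce is a common convention assumed here, not something the library requires:

# verify the installation from a Python session
import category_encoders as ce
print(ce.__version__)  # prints the installed version string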

We will use a small data sample for our encodings.
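
A hypothetical sketch of such a sample is shown below; the column names and values are illustrative assumptions, not the course's actual dataset:

import pandas as pd
import category_encoders as ce

# hypothetical sample with two categorical columns and a binary target
data = pd.DataFrame({
    "color":  ["red", "blue", "green", "blue", "red"],
    "size":   ["S", "M", "L", "M", "S"],
    "target": [1, 0, 1, 0, 1],
})
print(data)

# quick illustration: one-hot encode the "color" column with category_encoders
encoder = ce.OneHotEncoder(cols=["color"], use_cat_names=True)
print(encoder.fit_transform(data))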
