
Encoding

Explore how encoding transforms categorical features into numerical data suitable for machine learning algorithms. Understand when and how to use LabelEncoder, OneHotEncoder, and OrdinalEncoder within scikit-learn to prepare your dataset effectively for classification and other ML tasks.

Encoding is the process of converting categorical features into numerical features so that ML algorithms can use them. A categorical feature takes one of a limited set of values (categories). These values may have no natural order (nominal features, such as a profession) or a meaningful order (ordinal features, such as an education level). Most ML algorithms operate only on numbers, so categorical features must be converted into numerical representations before training.
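To make the idea concrete before turning to scikit-learn, here is a hand-rolled sketch in plain Python that encodes four profession names in two ways: as integer codes and as one-hot vectors. It only illustrates the concept; it is not how scikit-learn implements its encoders, and the variable names are hypothetical.

```python
# Hand-rolled illustration of two common encodings (standard library only).
professions = ["Engineer", "Teacher", "Doctor", "Accountant"]

# Integer encoding: map each distinct category to an index.
# Sorting makes the mapping deterministic; the order itself carries no meaning here.
categories = sorted(set(professions))  # ['Accountant', 'Doctor', 'Engineer', 'Teacher']
to_index = {cat: i for i, cat in enumerate(categories)}
integer_codes = [to_index[p] for p in professions]

# One-hot encoding: one binary column per category, with exactly one 1 per row.
one_hot = [[1 if cat == p else 0 for cat in categories] for p in professions]

print(integer_codes)  # [2, 3, 1, 0]
for row in one_hot:
    print(row)
```

Integer codes are compact but imply an order (and distances) between categories; one-hot vectors avoid that at the cost of one column per category.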

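It’s common in ML to have categorical features, such as “Sex,” “Zip code,” and “Profession,” that must be transformed before an ML algorithm can ingest them. A zip code is categorical even though it looks numeric: its digits identify a place, so arithmetic on them is meaningless. The table below shows this type of categorical data: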

| Name | Sex | Zip code | Profession |
|---|---|---|---|
| John Smith | Male | 12345 | Engineer |
| Amy Johnson | Female | 67890 | Teacher |
| Michael Davis | Male | 54321 | Doctor |
| Sarah Miller | Female | 98765 | Accountant |

The scikit-learn library provides several tools for encoding features, including LabelEncoder, OneHotEncoder, and OrdinalEncoder.

The LabelEncoder class

The LabelEncoder class assigns an integer to each category, starting from 0. Categories are numbered in sorted (alphabetical) order, so it would convert “Female” to 0 and “Male” to 1 ...
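A minimal sketch of this behavior on the Sex column, assuming scikit-learn is installed. The `classes_` attribute stores the sorted unique labels, and `inverse_transform` maps codes back to labels.

```python
from sklearn.preprocessing import LabelEncoder

sex = ["Male", "Female", "Male", "Female"]

encoder = LabelEncoder()
codes = encoder.fit_transform(sex)

# classes_ holds the sorted unique labels; each label's position is its code.
print(encoder.classes_)                   # ['Female' 'Male']
print(codes)                              # [1 0 1 0]

# inverse_transform maps integer codes back to the original labels.
print(encoder.inverse_transform([0, 1]))  # ['Female' 'Male']

# Labels not seen during fit are rejected with a ValueError.
try:
    encoder.transform(["Other"])
except ValueError as err:
    print("unseen label:", err)
```

Because the codes follow alphabetical order rather than any order in the data, they should not be read as ranking the categories.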