Ordinal encoding in Python

Ordinal encoding is a technique used to convert categorical data (data that takes a fixed set of distinct values, organizing information into specific groups) into numerical features. In ordinal encoding, each unique category is assigned a numerical value based on its position in an ordered sequence. The assigned values retain the ordinal relationship between the categories, allowing the model to leverage that order during training.

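Consider a categorical variable that holds colors such as red, green, and blue. Ordinal encoding can map these categories to numerical values such as 1, 2, and 3. Colors have no natural order, so this particular mapping is only illustrative. Note that scikit-learn's OrdinalEncoder, used in the steps that follow, assigns 0-based codes in sorted category order by default (Blue → 0, Green → 1, Red → 2).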

Colors    Encoded colors
Red       1
Green     2
Blue      3
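The mapping in the table can be reproduced by hand before turning to scikit-learn. The following is a minimal sketch that assumes pandas is installed; the color_codes dictionary and the sample values are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hand-written mapping that mirrors the table: Red -> 1, Green -> 2, Blue -> 3
color_codes = {"Red": 1, "Green": 2, "Blue": 3}

colors = pd.Series(["Red", "Green", "Blue", "Green"], name="Colors")

# Series.map looks up each value in the dictionary;
# a value missing from the dictionary would become NaN
encoded = colors.map(color_codes)

print(encoded.tolist())  # [1, 2, 3, 2]
```

A manual dictionary makes the chosen order explicit, which is useful when the order carries meaning and should not be left to a library default.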

Steps

To perform ordinal encoding in Python with scikit-learn, we typically follow the steps below.

1. Installation

The first step is to install the scikit-learn library, which provides the OrdinalEncoder class:

pip install -U scikit-learn

The -U flag is used to upgrade a package to the latest version available.
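To confirm that the installation succeeded, one option is to print the installed version from Python. This is only a quick sanity check and is not required for the remaining steps.

```python
import sklearn

# Print the installed scikit-learn version to confirm the package is importable
print(sklearn.__version__)
```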

2. Importing the libraries

The next step is to import the required libraries.

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

3. Creating a simple DataFrame

In this step, we create a simple DataFrame, as shown below. Alternatively, we can load an existing dataset into a DataFrame.

colors = {'Colors': ['Red', 'Green', 'Blue']}
df = pd.DataFrame(colors)

4. Initializing the OrdinalEncoder class

We then initialize an instance of the OrdinalEncoder class and store it in the encoder variable as follows:

encoder = OrdinalEncoder()
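With no arguments, the encoder infers the categories from the data and orders them by sorting. When a specific order matters, it can be supplied through the categories parameter, as in the sketch below. The order Red, Green, Blue is an assumption made purely for illustration, and ordered_encoder is a hypothetical variable name.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"Colors": ["Red", "Green", "Blue"]})

# One list of categories per input column, in the desired order.
# The order Red < Green < Blue is an illustrative assumption.
ordered_encoder = OrdinalEncoder(categories=[["Red", "Green", "Blue"]])

codes = ordered_encoder.fit_transform(df[["Colors"]])

# Codes are 0-based positions in the supplied list
print(codes.ravel().tolist())  # [0.0, 1.0, 2.0]
```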

5. Transforming the categorical data

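In this step, we pass the Colors column to the fit_transform method to perform ordinal encoding, as shown below. The double brackets select the column as a one-column DataFrame because the encoder expects two-dimensional input.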

df['Colors_Encoded'] = encoder.fit_transform(df[['Colors']])
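After fitting, it can help to check which code was assigned to which category. The sketch below uses the encoder's categories_ attribute and inverse_transform method; the variable names encoded and restored are chosen for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"Colors": ["Red", "Green", "Blue"]})
encoder = OrdinalEncoder()
encoded = encoder.fit_transform(df[["Colors"]])

# Learned categories: one array per column, in sorted order by default
print(encoder.categories_[0].tolist())  # ['Blue', 'Green', 'Red']

# Each code is the index of the category in that array
print(encoded.ravel().tolist())  # [2.0, 1.0, 0.0]

# inverse_transform maps the codes back to the original labels
restored = encoder.inverse_transform(encoded)
print(restored.ravel().tolist())  # ['Red', 'Green', 'Blue']
```

Because the default order is alphabetical, Blue receives 0 and Red receives 2, which differs from a hand-picked mapping such as Red = 1, Green = 2, Blue = 3.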

Note: The OrdinalEncoder class can encode multiple columns simultaneously.
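As a rough illustration of encoding several columns at once, the sketch below adds a hypothetical Sizes column (not part of the original example) and encodes both columns in a single call. Each column gets its own independent set of codes.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# 'Sizes' is a hypothetical second column added for illustration
df = pd.DataFrame({
    "Colors": ["Red", "Green", "Blue"],
    "Sizes": ["Small", "Large", "Medium"],
})

encoder = OrdinalEncoder()

# The result has one output column per input column
encoded = encoder.fit_transform(df[["Colors", "Sizes"]])

df["Colors_Encoded"] = encoded[:, 0]
df["Sizes_Encoded"] = encoded[:, 1]

print(df)
```

With the default settings, both columns are ordered alphabetically, so Large is encoded before Medium and Small. For a column such as Sizes, where the real order matters, an explicit order would have to be passed through the categories parameter.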

Example

The following code shows how we can use the OrdinalEncoder class in Python:

# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Create a sample DataFrame
colors = {'Colors': ['Red', 'Green', 'Blue']}
df = pd.DataFrame(colors)

# Print the original DataFrame
print("Original DataFrame Before Ordinal Encoding:")
print(df)

# Initialize the OrdinalEncoder
encoder = OrdinalEncoder()

# Fit and transform the 'Colors' column using ordinal encoding
df['Colors_Encoded'] = encoder.fit_transform(df[['Colors']])

# Display the DataFrame with the encoded column
print("\nDataFrame after Ordinal Encoding:")
print(df)

Explanation

  • Lines 2–3: We import the required libraries: pandas for data manipulation and the OrdinalEncoder class from the scikit-learn library for ordinal encoding.

  • Lines 6–7: We create a sample DataFrame (df) with a categorical column named Colors.

  • Line 14: We initialize the OrdinalEncoder class.

  • Line 17: We fit and transform the Colors column using the ordinal encoding. The transformed values are stored in a new column named Colors_Encoded.

  • Lines 20–21: We display the DataFrame after applying ordinal encoding to observe the changes.
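One practical concern the example above does not cover is a category that appears at prediction time but was absent during fitting. By default the encoder raises an error in that case. The sketch below shows one possible way to handle it, assuming a reasonably recent scikit-learn release that supports the handle_unknown="use_encoded_value" option; the value -1 and the color Yellow are arbitrary choices for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

train = pd.DataFrame({"Colors": ["Red", "Green", "Blue"]})

# Unseen categories are encoded as -1 instead of raising an error.
# The sentinel -1 is an arbitrary choice for this sketch.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoder.fit(train[["Colors"]])

# 'Yellow' was not present when the encoder was fitted
new_data = pd.DataFrame({"Colors": ["Green", "Yellow"]})
codes = encoder.transform(new_data[["Colors"]])

print(codes.ravel().tolist())  # [1.0, -1.0]
```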

Copyright ©2024 Educative, Inc. All rights reserved