Ordinal encoding in Python

Ordinal encoding is a technique used to convert categorical data into numerical features. Categorical data consists of fixed, distinct values that organize information into specific groups, without an inherent numerical value. In ordinal encoding, each unique category is assigned an integer based on its position in an ordered sequence. The assigned values retain the ordinal relationship between the categories, allowing a model to leverage that order during training.
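
The idea of assigning each category its position in an ordered sequence can be sketched in plain Python. This is only an illustration: the size_order list and the sample observations are hypothetical and were chosen because sizes have an obvious order.

```python
# Hypothetical ordered categories, listed from lowest to highest (illustrative only).
size_order = ["Small", "Medium", "Large"]

# Map each category to its position in the ordered sequence.
size_to_code = {category: position for position, category in enumerate(size_order)}

# Encode a few sample observations with the mapping.
observations = ["Medium", "Small", "Large", "Medium"]
encoded = [size_to_code[value] for value in observations]

print(size_to_code)  # {'Small': 0, 'Medium': 1, 'Large': 2}
print(encoded)       # [1, 0, 2, 1]
```

Because the codes follow the order of the list, comparisons such as Small < Large carry over to the encoded values.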

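Consider a scenario where the categorical variable represents colors such as red, green, and blue. Ordinal encoding maps these categories to integers. Colors have no natural order, so this example only illustrates the mechanics: by default, scikit-learn's OrdinalEncoder sorts the categories and numbers them from 0, which gives Blue → 0, Green → 1, and Red → 2 (returned as floats).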

# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Create a sample DataFrame
colors = {'Colors': ['Red', 'Green', 'Blue']}
df = pd.DataFrame(colors)

# Print the original DataFrame
print("Original DataFrame Before Ordinal Encoding:")
print(df)

# Initialize the OrdinalEncoder
encoder = OrdinalEncoder()

# Fit and transform the 'Colors' column using ordinal encoding
df['Colors_Encoded'] = encoder.fit_transform(df[['Colors']])

# Display the DataFrame with the encoded column
print("\nDataFrame after Ordinal Encoding:")
print(df)

Explanation

Lines 2–3: We import the required libraries: pandas for data manipulation and the OrdinalEncoder class from scikit-learn for ordinal encoding.
Lines 6–7: We define the sample data as a dictionary and create a DataFrame (df) from it, with a categorical column named Colors.
Line 14: We initialize the OrdinalEncoder class.
Line 17: We fit and transform the Colors column using the ordinal encoding. The transformed values are stored in a new column named Colors_Encoded.
Lines 20–21: We display the DataFrame after applying ordinal encoding to observe the changes.
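
When the categories do have a meaningful order, it can be passed to the encoder explicitly instead of relying on the default sorted order. The following is a minimal sketch using the categories parameter, the categories_ attribute, and the inverse_transform method of OrdinalEncoder. The order Red < Green < Blue is an assumption made purely for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({'Colors': ['Red', 'Green', 'Blue']})

# Default behavior: categories are sorted, so the codes follow alphabetical order.
default_encoder = OrdinalEncoder()
default_codes = default_encoder.fit_transform(df[['Colors']]).ravel()
print(list(default_encoder.categories_[0]))  # ['Blue', 'Green', 'Red']
print(default_codes.tolist())                # [2.0, 1.0, 0.0]

# Explicit order (assumed for illustration only): Red < Green < Blue.
ordered_encoder = OrdinalEncoder(categories=[['Red', 'Green', 'Blue']])
ordered_codes = ordered_encoder.fit_transform(df[['Colors']]).ravel()
print(ordered_codes.tolist())                # [0.0, 1.0, 2.0]

# Map the codes back to the original labels.
restored = ordered_encoder.inverse_transform(ordered_codes.reshape(-1, 1)).ravel()
print(restored.tolist())                     # ['Red', 'Green', 'Blue']
```

The categories parameter takes one list per encoded column, which is why the order is wrapped in an outer list.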