Mean encoding in Python

Mean encoding is a technique for transforming categorical variables into numerical values based on the mean of the target variable. It’s particularly useful in classification problems, where the goal is to predict a target variable based on input features.

How does it work?

Consider a dataset like the one below with a categorical feature “City” and a binary target variable “Churn” indicating whether a customer churned.

Explanation

Line 1: Import the pandas library for data manipulation.
Lines 4–6: Create a sample data frame (df). The data frame has two columns: City and Churn.
Lines 8–10: This code computes the mean of the target_feature grouped by unique values in the cat_feature column and creates a new column by mapping these mean values to corresponding categories in the data frame df.
Line 12: Here we call the mean_encode function with the data frame df, specifying City as the categorical feature and Churn as the target variable.

Conclusion

Mean encoding in Python is a technique used to represent categorical variables numerically by replacing their values with the mean of the target variable. This helps capture relationships between categories and the target variable, making it useful for machine learning models.

CustomerID	City	Churn
1	New York	0
2	Paris	1
3	New York	1
4	Tokyo	1
5	Paris	1

CustomerID	City (Mean Encoded)	Churn
1	0.5	0
2	1	1
3	0.5	1
4	1	1
5	1	1

Mean encoding in Python

How does it work?

How do we implement it?

Import the libraries

Create a simple DataFrame

Define mean encoding function

Apply mean encoding

Example

Explanation

Conclusion