Feature Engineering: One-Hot Encoding
Explore how to apply one-hot encoding for feature engineering in customer churn prediction. Understand why categorical variables must be encoded correctly, how to handle redundant features, and how to prepare data for supervised machine learning. This lesson guides you through separating categorical and numerical data and implementing one-hot encoding with pandas to get the data ready for modeling.
In this lesson, we’ll start applying feature engineering techniques to our telecom customer dataset. We’ll create, format, and encode features for the supervised machine learning algorithm that will predict customer churn.
Encoding
We have 15 categorical features to encode, because the machine learning algorithm we'll use expects all of its inputs to be numeric. We will use one-hot encoding for this. The features will be encoded as follows.
Feature Details
| Feature | Encoding |
| --- | --- |
| Gender | 0: Female, 1: Male |
| SeniorCitizen | 0: age <= 65, 1: age > 65 |
| Partner | 0: No, 1: Yes |
| Dependents | 0: No, 1: Yes |
| PhoneService | 0: No, 1: Yes |
| MultipleLines | 0: No, 1: Yes, 2: No phone service |
| InternetService | 0: No, 1: DSL, 2: Fiber optic |
| OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies | 0: No, 1: No internet service, 2: Yes |
| PaperlessBilling | 0: No, 1: Yes |
| PaymentMethod | 0: Bank transfer (automatic), 1: Credit card (automatic), 2: Electronic check, 3: Mailed check |
One-hot encoding
One-hot encoding creates a binary column for each category: in every row, the column of the active category is set to 1 and all the other columns are set to 0. ...
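As a minimal sketch, pandas' `get_dummies` can produce these binary columns; the values below mirror the InternetService row of the table above, and the exact code in the lesson may differ.

```python
import pandas as pd

# A single categorical column with three possible values.
df = pd.DataFrame({
    "InternetService": ["DSL", "Fiber optic", "No", "DSL"],
})

# One-hot encode: one binary column per category, 1 marks the active category.
encoded = pd.get_dummies(df, columns=["InternetService"], dtype=int)
print(encoded)
#    InternetService_DSL  InternetService_Fiber optic  InternetService_No
# 0                    1                            0                   0
# 1                    0                            1                   0
# 2                    0                            0                   1
# 3                    1                            0                   0
```

One of these columns is redundant, since its value is implied by the others; passing `drop_first=True` to `get_dummies` is a common way to drop it when handling redundant features.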