
Redundant Variables and Machine Learning

Explore how to manage redundant variables in categorical features by dropping unnecessary columns before modeling. Understand the effects of lasso and ridge regression on feature selection and learn best practices to improve model performance and interpret coefficients accurately.

In the previous lesson, we observed that, among the columns encoding day, if three of the day indicators are 0, the fourth must be 1. The fourth column adds no information that the other three don’t already carry, so it’s redundant. It’s therefore recommended to drop redundant variables before modeling. Otherwise, lasso reduces their coefficients to zero, even with a mild regularization strength.
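The redundancy described above can be checked directly. The following is a minimal sketch, assuming pandas is available and using a small made-up `day` column (the values and the variable names are illustrative, not the lesson's actual dataset). It shows that one indicator column can be rebuilt from the others, and that `drop_first=True` in `pd.get_dummies` is one way to leave a category out.

```python
import pandas as pd

# Hypothetical toy data standing in for the lesson's day column.
df = pd.DataFrame({"day": ["Thur", "Fri", "Sat", "Sun", "Sat", "Thur"]})

# One indicator column per category: every row contains exactly one 1.
full = pd.get_dummies(df["day"], dtype=int)

# Any single column equals 1 minus the sum of the others,
# so it carries no information the others do not already have.
others = full.drop(columns="Sun")
reconstructed = 1 - others.sum(axis=1)
sun_is_redundant = bool((reconstructed == full["Sun"]).all())

# drop_first=True omits the first category in sorted order ("Fri" here).
reduced = pd.get_dummies(df["day"], drop_first=True, dtype=int)

print(sun_is_redundant)
print(list(reduced.columns))
```

Which category gets dropped is a modeling choice: the omitted category becomes the baseline against which the remaining coefficients are interpreted.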

Drop redundant variables

Let’s drop all the ...