Scaling Data, Pipelines, and Interaction Features in Scikit-Learn
Explore how to scale data using MinMaxScaler and incorporate scaling into logistic regression models with scikit-learn pipelines. Understand the importance of pipelines in proper cross-validation and learn to engineer interaction features to enhance model performance while managing overfitting risks.
Scaling data
Compared to the synthetic data we were just working with, the case study data is relatively large. According to the scikit-learn documentation, the saga solver is a good choice if we want to use L1 regularization on a dataset of this size. However, saga is not robust to unscaled data: its fast convergence is only guaranteed when the features are on approximately the same scale. We therefore need to scale the data before fitting. Scaling is also a good idea whenever we regularize, because the penalty acts on the size of the coefficients, and the size of a coefficient depends on the scale of its feature. With all features on the same scale, they are penalized equally by the regularization process.
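As a rough illustration of the point above, the following sketch scales features to the range [0, 1] with MinMaxScaler and then fits an L1-regularized logistic regression with the saga solver inside a Pipeline. The data here is a synthetic stand-in generated with make_classification, not the case study data, and the specific values (random_state=24, C=1.0, max_iter=1000, the step names 'scaler' and 'model') are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the case study data (assumption for illustration).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=24
)

# Stretch the columns so the features sit on very different scales.
X = X * np.logspace(0, 4, num=X.shape[1])

# MinMaxScaler maps each feature so its minimum is 0 and its maximum is 1.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.min(axis=0), X_scaled.max(axis=0))

# Putting the scaler in a pipeline means it is fit only on the data
# passed to fit(), which matters later for cross-validation.
pipeline = Pipeline(steps=[
    ('scaler', MinMaxScaler()),
    ('model', LogisticRegression(
        penalty='l1', solver='saga', C=1.0, max_iter=1000)),
])
pipeline.fit(X, y)
print(pipeline.score(X, y))
```

Because the scaler is a step in the pipeline, calling pipeline.fit on a training fold learns the minimum and maximum from that fold alone, so no information from held-out data leaks into the scaling.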
A simple way ...