(Challenge) Cross-Validation and Feature Engineering

In this challenge, we’ll apply the knowledge of cross-validation and regularization that we’ve learned in this chapter to the case study data. We’ll perform basic feature engineering by creating interaction features. To estimate parameters for the regularized logistic regression model on the case study data, which is larger than the synthetic data we’ve worked with so far, we’ll use the saga solver. To use this solver, and for the purpose of regularization, we’ll need to scale our data as part of the modeling process, which leads us to the Pipeline class in scikit-learn. Once you have completed the activity, you should obtain improved cross-validation test performance with the use of interaction features, as shown in the following diagram:

Expected output

Note: We have already set up the environment, loaded the cleaned dataset, and included the required Python packages for you in the Notebook file.
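
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the challenge solution: it assumes a synthetic dataset from make_classification as a stand-in for the case study data, and the variable names, the choice of MinMaxScaler, the value of C, the number of folds, and the ROC AUC scoring metric are all assumptions made for the example. It uses the default L2 penalty, so the regularization type may differ from the one the challenge asks for.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Stand-in data (assumption): the case study data is not reproduced here.
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=4, random_state=1
)

# Feature engineering: add pairwise interaction terms (no squared terms,
# no constant column) to the original features.
make_interactions = PolynomialFeatures(
    degree=2, interaction_only=True, include_bias=False
)
X_interact = make_interactions.fit_transform(X)

# Scaling happens inside the pipeline, so the scaler is refit on the
# training portion of each fold and never sees that fold's test data.
scale_lr_pipeline = Pipeline(steps=[
    ("scaler", MinMaxScaler()),
    ("model", LogisticRegression(solver="saga", C=1.0, max_iter=1000)),
])

# Stratified folds keep the class balance similar across splits.
k_folds = StratifiedKFold(n_splits=4, shuffle=True, random_state=1)

scores_base = cross_val_score(
    scale_lr_pipeline, X, y, cv=k_folds, scoring="roc_auc"
)
scores_interact = cross_val_score(
    scale_lr_pipeline, X_interact, y, cv=k_folds, scoring="roc_auc"
)

print("Mean ROC AUC, original features:   ", scores_base.mean())
print("Mean ROC AUC, interaction features:", scores_interact.mean())
```

Whether the interaction features help depends on the data, so the two printed scores from this synthetic example say nothing about the case study results. The point of the sketch is the structure: engineered features feed a pipeline that scales and then fits the model within each cross-validation fold.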