Exercise: Finding Appropriate Features for Logistic Regression
Explore how to assess the suitability of the PAY_1 feature for logistic regression by examining the log odds of default probabilities. Understand the assumptions of logistic regression and discover how to interpret the relationship between feature values and response variables for better model accuracy.
In the Visualizing Features and Response Variable Relationship exercise, we plotted a groupby/mean of what might be the most important feature of the model, according to our exploration so far: the PAY_1 feature. By grouping samples by the values of PAY_1, and then looking at the mean of the response variable, we are effectively looking at the probability, p, of default within each of these groups.
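As a minimal sketch of why a groupby/mean yields a probability, consider the toy example below. The data values are invented for illustration and are not the case-study dataset, and the response column name `default` is a hypothetical stand-in. Because the response is coded 0/1, its mean within each PAY_1 group is simply the fraction of defaulting samples in that group.

```python
import pandas as pd

# Toy data, invented for illustration only (not the case-study dataset).
# "default" is a hypothetical name for the binary response column.
toy_df = pd.DataFrame({
    "PAY_1":   [-1, -1, 0, 0, 0, 0, 1, 1, 2, 2],
    "default": [0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
})

# The mean of a 0/1 response within a group is the fraction of ones,
# i.e. an estimate of the default probability p for that group.
group_p = toy_df.groupby("PAY_1")["default"].mean()
print(group_p)
```

With real data, groups containing few samples give noisy estimates of p, so the group sizes are worth checking alongside the means.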
Examining the log odds of default within groups
In this exercise, we will evaluate the appropriateness of PAY_1 for logistic regression. We will do this by examining the log odds of default within these groups to see whether the log odds of the response variable are linear in the feature, as logistic regression formally assumes. Perform the following steps to complete the exercise:
- In the following code, reviewing the ...
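Separately from the exercise's own code, the idea of checking linearity in the log odds can be sketched roughly as follows. This is an illustration under stated assumptions, not the course's solution: the per-group probabilities are made-up numbers, and fitting a straight line with `np.polyfit` is just one simple way to eyeball linearity. The log odds of a probability p are log(p / (1 - p)).

```python
import numpy as np
import pandas as pd

# Hypothetical per-group default rates indexed by PAY_1 value (invented numbers).
group_p = pd.Series([0.1, 0.2, 0.5, 0.8], index=[-1, 0, 1, 2], name="p")

# Log odds: log(p / (1 - p)). A group with p of exactly 0 or 1 would give
# -inf or +inf, so such groups need special handling in real data.
log_odds = np.log(group_p / (1 - group_p))

# Crude linearity check: fit a straight line to log odds vs. feature value
# and inspect the residuals. Small residuals suggest a near-linear relationship.
x = log_odds.index.to_numpy(dtype=float)
slope, intercept = np.polyfit(x, log_odds.to_numpy(), deg=1)
residuals = log_odds - (slope * x + intercept)

print(log_odds)
print(residuals)
```

If the log odds fall close to a straight line across the feature's values, the feature is a reasonable fit for logistic regression's linearity assumption. Strong curvature suggests the feature may need transforming or a different treatment.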