Assumptions of Logistic Regression
Learn about the assumptions logistic regression makes about the data.
Because it is a classical statistical model, like the F-test and Pearson correlation we examined earlier, logistic regression makes certain assumptions about the data. While it isn't necessary to satisfy every assumption in the strictest sense, it's good to be aware of them: if a logistic regression model is not performing well, knowing the ideal conditions the model is designed for can help you investigate why. You may find slightly different lists of assumptions in different resources, but those listed here are widely accepted.
The four assumptions of logistic regression
Here are the four most widely accepted assumptions of logistic regression.
Features are linear in the log odds
Logistic regression is a linear model, so it will only work well as long as the features are effective at describing a linear trend in the log odds. In particular, on its own, logistic regression won't capture interactions between features, polynomial terms, or the effects of discretizing features. You can, however, supply all of these to the model as new features engineered from the existing ones, as in the sketch below.
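As a minimal sketch of this idea (using synthetic data, since the exact modeling code here is an assumption), scikit-learn's `PolynomialFeatures` can generate squared and interaction terms as explicit columns that an otherwise linear logistic regression can then use:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data for illustration: the true log odds depend on an
# interaction term, which logistic regression on X alone can't capture
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
log_odds = 1.5 * X[:, 0] * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# PolynomialFeatures adds squares and the pairwise interaction as
# explicit "new features" for the linear model to use
model = Pipeline([
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('logreg', LogisticRegression()),
])
model.fit(X, y)
print(model.named_steps['poly'].get_feature_names_out())
# ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
```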
Remember from the previous section that the most important feature from univariate feature exploration, PAY_1, was not found to be linear in the log odds.
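One way to check this assumption for a discrete feature is to compute the empirical log odds at each value of the feature and see whether they fall roughly on a straight line. Here is a minimal sketch; it assumes the case study data are in a DataFrame `df`, and the response column name used here is a placeholder you would adjust to your data:

```python
import numpy as np
import pandas as pd

# Assumes `df` holds the case study data; the response column name
# 'default' is a placeholder for your actual binary target column
default_rate = df.groupby('PAY_1')['default'].mean()

# Clipping avoids infinite log odds for groups where the rate is 0 or 1
p = default_rate.clip(1e-6, 1 - 1e-6)
empirical_log_odds = np.log(p / (1 - p))

# If these values don't trend linearly with PAY_1, the assumption
# is violated for this feature
print(empirical_log_odds)
```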
No multicollinearity of features
Multicollinearity means that features are correlated with each other. The worst violation of this assumption occurs when features are perfectly correlated, such as one feature being identical to another, or one feature being a constant multiple of another. We can investigate the correlation of features using the correlation plot we're already familiar with from univariate feature selection. Recall the correlation plot from the previous section, which can be reproduced with a sketch like the one below.
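This sketch assumes a DataFrame `df` holding the case study data and a list `features` of the feature column names from the previous section:

```python
import matplotlib.pyplot as plt

# Assumes `df` is the case study DataFrame and `features` is the list
# of feature column names from the previous section
corr = df[features].corr()  # pairwise Pearson correlations

fig, ax = plt.subplots(figsize=(8, 8))
im = ax.imshow(corr, vmin=-1, vmax=1, cmap='coolwarm')
ax.set_xticks(range(len(features)))
ax.set_xticklabels(features, rotation=90)
ax.set_yticks(range(len(features)))
ax.set_yticklabels(features)
fig.colorbar(im, label='Pearson correlation')
plt.show()
```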