What is logistic regression?

Why logistic regression?

Logistic regression outputs a probability and applies a decision threshold to turn that probability into a class. The threshold matters because it directly determines which class each example is assigned to. Because the model's output always lies strictly between 0 and 1, a threshold in that range is meaningful.
Logistic regression is used when the dependent variable (the target) is categorical.
Take spam detection as an example: a linear model is a poor fit because its output is unbounded, so it cannot be read as a probability and no fixed classification threshold applies to it cleanly. Therefore, we use logistic regression.
Linear regression assumes that the target is a linear function of the inputs, while logistic regression passes a linear combination of the inputs through the sigmoid function, which maps it to a value between 0 and 1.
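
The contrast between an unbounded linear output and a bounded sigmoid output can be sketched in a few lines of Python. This is a minimal illustration, not a trained model: the weight, bias, and function names below are made up for the example, and the 0.5 threshold is just a common default.

```python
import math

def linear_score(x, weight=1.5, bias=-2.0):
    """Unbounded linear output w*x + b (weight and bias are illustrative)."""
    return weight * x + bias

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_class(x, threshold=0.5):
    """Turn the probability into a 0/1 label using a decision threshold."""
    return 1 if sigmoid(linear_score(x)) >= threshold else 0

for x in (-10, 0, 2, 10):
    z = linear_score(x)
    print(f"x={x:>3}  linear={z:>6.1f}  sigmoid={sigmoid(z):.4f}  class={predict_class(x)}")
```

The linear score grows without limit as x grows, while the sigmoid output stays between 0 and 1, which is what makes a fixed threshold usable.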

The choice of threshold is guided by two metrics, precision and recall. Precision is the fraction of predicted positives that are actually positive; recall is the fraction of actual positives that the model finds. Ideally, both would equal 1. In practice this is rarely achievable, so we settle on a trade-off.

Let’s discuss how we can decide their values:

Low Precision/High Recall Example: We never want a patient who has a disease to be classified as "not affected," because a missed diagnosis is costly. When we want to reduce the number of false negatives, even at the cost of more false positives, we choose a threshold that gives high recall and accept lower precision.
High Precision/Low Recall Example: A company has released a new advertisement and is classifying whether or not people will react positively. In this case, the company needs to be confident that the people it predicts will react positively actually do. We want to reduce the number of false positives, even at the cost of more false negatives, so we choose a threshold that gives high precision and accept lower recall.
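
To make the trade-off concrete, the sketch below computes precision and recall at three thresholds. The labels and predicted probabilities are invented for illustration, and `precision_recall` is a hypothetical helper written for this example rather than a library function.

```python
def precision_recall(y_true, probs, threshold):
    """Return (precision, recall) for predictions made at the given threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, yh in zip(y_true, preds) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, preds) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, preds) if y == 1 and yh == 0)
    # Guard against division by zero when nothing is predicted or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up data: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.95, 0.80, 0.55, 0.30, 0.60, 0.40, 0.20, 0.10]

for threshold in (0.25, 0.50, 0.75):
    p, r = precision_recall(y_true, probs, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, lowering the threshold to 0.25 catches every positive (recall 1.00) but lets in false positives (precision 0.67), while raising it to 0.75 removes the false positives (precision 1.00) but misses half of the positives (recall 0.50).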

Categories of logistic regression

Logistic regression can be classified into 3 categories:

Binomial: The target variable has only 2 possible classes, 0 or 1. These classes may represent "win" vs. "loss," "pass" vs. "fail," etc.

Multinomial: The target variable has 3 or more possible classes that are not ordered, because the classes have no quantitative significance (e.g., "disease A" vs. "disease B" vs. "disease C").

Ordinal: The target variable has 3 or more classes with a natural order. For example, a test score can be categorized as "very poor," "poor," "good," or "very good." Each category can then be assigned a score that reflects its rank.
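
The three kinds of target can be illustrated by how their labels might be encoded as numbers. This is only a sketch: the integer codes and the `encode` helper are assumptions chosen for the example, not a required scheme.

```python
# Binomial: exactly two classes.
binary_labels = {"fail": 0, "pass": 1}

# Multinomial: the codes are arbitrary identifiers; their order carries no meaning.
disease_labels = {"disease A": 0, "disease B": 1, "disease C": 2}

# Ordinal: the codes preserve the ranking of the categories.
score_labels = {"very poor": 0, "poor": 1, "good": 2, "very good": 3}

def encode(values, mapping):
    """Replace each category name with its numeric code."""
    return [mapping[v] for v in values]

encoded = encode(["good", "very poor", "very good"], score_labels)
print(encoded)  # [2, 0, 3]
```

Only in the ordinal case does comparing two codes (for example, "poor" < "good") say something meaningful about the categories.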

CONTRIBUTOR

Sarvech Qadir
