Trusted answers to developer questions
Trusted Answers to Developer Questions

Related Tags

# What is the difference between linear and logistic regression? Abdul Muizz

Grokking Modern System Design Interview for Engineers & Managers

Ace your System Design Interview and take your career to the next level. Learn to handle the design of applications like Netflix, Quora, Facebook, Uber, and many more in a 45-min interview. Learn the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process.

### Overview

Linear and logistic regression models are widely used in machine learning. Both these models fall into the category of supervised learning.

The linear regression model uses a linear combination of the inputs to predict the output. After successful training, the outcome is always a continuous value in the range of $(-\infty,\infty)$.

On the other hand, the logistic regression model is a probabilistic model that uses log oddsOdds provide the measure of the likelihood of a particular outcome. The mathematical logs of this are defined as log odds. of independent variables to make data predictions. This model's output is usually a discrete categorical label.

### Linear regression

We mainly use linear regression for regression problems. For example, we can use this model to estimate the housing price in a particular locality. In this case, the data contains independent and dependent variables that are linearly related.

The following is the mathematical formulation of the model:

Here, $\alpha$ is the weight obtained after training, and $b$ represents the bias term. These together constitute the parameters of the model. $\hat{y}$ is the prediction of the model, and $x$ is the input vector in a high dimensional subspace.

The graphical representation of the output after training is the best fit line predicted using the training data's trend. The visualization below depicts the model after successful training:

The best fit line predicted using the linear regression model

We optimize the model's weights using the mean squared error loss function. Therefore, the weights that minimize the loss function are always optimal. Moreover, to apply the linear regression model, the independent variables must satisfy the property of collinearityCollinearity refers to the correlation between independent variables..

### Logistic regression

Logistic regression, on the other hand, is a model used for classification problems. For example, the prediction of rain ("yes" or "no") is a binary classification problem we can solve using a logistic regression model.

The training data in the case of logistic regression can support variable relationships between the independent and dependent variables. Also, this model usually outputs a value in the range of $(0,1)$ with the help of the sigmoid (σ) function. The following is the formulation of the model:

Here, $\hat{y}$ is the output vector with the probabilities of all the classes. $x$ is the input vector that is mapped to the output after the linear combination of inputs, that is, $\theta^Tx$ . Here, $\theta^{T}$ refers to the transpose of weights that are tuned during training.

The sigmoid function is as follows:

The logistic regression model

If the output of the sigmoid is greater than the threshold of 0.5, we classify the data point as Class 1. Otherwise, it belongs to class 2.

Note: The value of the threshold can vary but, in the general case, we keep it as 0.5.

As we can see above, an S-shaped curve (the sigmoid) fits the data and separates it into different linearly separable classes. The model uses the concept of maximum likelihood estimation to find optimal weights and accuracy. Also, the data to which we apply the model must not be collinear in the case of independent variables.

The following table encapsulates the key differences between both techniques:

A summary of differences
 Linear regression Logistic regression Always used for regression problems Mostly used for classification problems The outcome is a continuous value The outcome is a discrete value The 'best fit line' is fitted on the training data and is used to predict a value on unseen data The 'S-shaped curve' is fitted on the training data and used to predict the labels on unseen data The mean squared error is used to calculate accuracy Maximum likelihood estimation is used estimate accuracy Supports a linear relation between the independent and dependent variables Supports a variable relation between the independent and dependent variables Collinearity is must between the independent variables in the training data Collinearity must not exist between the independent variables in the training data

### Applications of linear regression

There are several real-world applications of linear regression. We can use it to:

1. Forecast stocks: Linear regression models can predict trends in stocks using the data of stock prices.
2. Analyze market effectiveness: After training on the past data of large-scale businesses, a linear regression model can predict their market effectiveness. We can use it to get a general idea of their position in the competitive market.
3. Study operational efficiency of machines: We can use this model to study the efficiency of machines.

### Applications of logistic regression

The real-world applications of logistic regression are quite different from those of linear regression. We can use logistic regression to:

• Detect pollution levels using images: We can use this model to classify or detect pollution levels after training on a dataset containing relevant images.
• Analyze text: We can use this model in various natural language processing tasks to claim the tone of text after training it on appropriate sentiments.
• Calculate credit scores: We can use this model to reduce the number of features that exhibit high correlation, and calculate an individual's credit score.

RELATED TAGS

CONTRIBUTOR Abdul Muizz 