What is a generalized linear model (GLM)?

A generalized linear model is an advanced statistical model that extends the general linear model. The general linear model is applied to continuous response variables with continuous and/or categorical predictors. It includes the conventional regression models used for response variables with a normal distribution (a continuous probability distribution for a real-valued random variable, characterized by its mean and standard deviation), such as linear regression, which linearly models the relation between a response variable and one or more explanatory variables.

On the other hand, a generalized linear model (GLM) allows the response variable to follow a non-normal distribution, such as a binomial distribution (a discrete probability distribution over independent trials, each with two outcomes, success or failure, and a constant probability of success).

[Image: a response outcome with an exponential distribution]

A GLM uses several components to express the relationship between the response variable and the predictors as a linear, additive relation. One of the main components is the link function. The model also rests on certain assumptions.

Assumptions of GLM

The generalized linear model is used for non-linear, heteroscedastic data (data whose variance changes with the mean, for example, variance that increases as the mean increases) that does not follow a normal distribution. It rests on the following assumptions:

The data needs to be random and independent.
Random variables should follow the same probability distribution.
The response variable follows a distribution from the exponential family, such as a binomial or a Poisson distribution. This family should not be confused with the exponential distribution, which is the probability distribution of the time between events in a Poisson point process and is itself one member of the family.
The response and explanatory variables need not have a linear relationship. Instead, a linear relationship is established between the transformed mean of the response variable (after applying the link function) and the explanatory variables.
We can also use transformed explanatory variables to build the GLM model, such as taking the log or square of the original variable.
Error variance of the response variable can vary with the independent variables.
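
The last assumption, that the variance of the response can change with its mean, can be illustrated with a small sketch. The function name `variance` and the unit variance chosen for the normal case are assumptions made only for this illustration; the Poisson and single-trial binomial formulas are the standard ones.

```python
def variance(family, mu):
    """Variance of a single response as a function of its mean mu."""
    if family == "normal":
        # Constant variance; the value 1.0 is an arbitrary choice for illustration.
        return 1.0
    if family == "poisson":
        # For a Poisson response, the variance equals the mean.
        return mu
    if family == "binomial":
        # One trial with success probability mu.
        return mu * (1.0 - mu)
    raise ValueError(f"unknown family: {family}")


means = [0.1, 0.5, 0.9]
normal_var = [variance("normal", m) for m in means]
poisson_var = [variance("poisson", m) for m in means]
binomial_var = [variance("binomial", m) for m in means]
```

Only the normal case keeps the same variance for every mean. The other two are heteroscedastic by construction.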

How does GLM work?

To develop a linear relationship between the response variable and the predictors, GLM uses three components:

A linear predictor
A link function
A probability distribution

Particular choices of these components give well-known conventional statistical models.

Linear predictor

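A linear predictor, also known as a systematic component, is the linear combination of the explanatory variables ($x_1, x_2, x_3, \ldots, x_i$) and the regression coefficients ($\beta_0, \beta_1, \ldots, \beta_i$):

$$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_i x_i$$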
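
As a minimal sketch, the linear combination described above can be computed directly. The function name `linear_predictor`, the intercept term, and the numbers used are hypothetical and chosen only for illustration.

```python
def linear_predictor(intercept, coefficients, x):
    """Return intercept + sum of coefficient * explanatory variable."""
    if len(coefficients) != len(x):
        raise ValueError("need one coefficient per explanatory variable")
    return intercept + sum(b * xi for b, xi in zip(coefficients, x))


# Two explanatory variables with made-up coefficients:
# 1.0 + 0.5 * 4.0 + (-2.0) * 1.5
eta = linear_predictor(1.0, [0.5, -2.0], [4.0, 1.5])
```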

Probability distribution

Probability distribution, also known as the random component, refers to the distribution that the response variable $Y$ follows. Distributions that $Y$ can follow include the normal, binomial, multinomial (a discrete probability distribution over independent trials with a constant probability for each outcome, where each trial has more than two possible outcomes), and Poisson distributions.

Link function

The link function is usually written as $g$, with $g(\mu) = \eta$, where $\mu$ is the mean of the response variable and $\eta$ is the linear predictor. It specifies how the mean of the response variable is related to the linear combination of explanatory variables. The choice of link function depends on the probability distribution of the response variable.
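
To make this concrete, here are three commonly used link functions together with their inverses, written with $\mu$ for the mean of the response and $\eta$ for the linear predictor. The symbol $g^{-1}$ for the inverse link is notation introduced here for illustration.

```latex
\begin{aligned}
\text{General form:} \quad & g(\mu) = \eta, && \mu = g^{-1}(\eta) \\
\text{Identity:} \quad & g(\mu) = \mu, && \mu = \eta \\
\text{Logit:} \quad & g(\mu) = \log\frac{\mu}{1 - \mu}, && \mu = \frac{1}{1 + e^{-\eta}} \\
\text{Log:} \quad & g(\mu) = \log \mu, && \mu = e^{\eta}
\end{aligned}
```

For example, a linear predictor of $\eta = 0$ corresponds to a mean of $0$ under the identity link, $0.5$ under the logit link, and $1$ under the log link.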

Different models used in GLM

Various models are used in GLM according to the probability distribution of the response variable. Numerous probability distributions and link functions are available for this purpose. However, this answer discusses only three of them, which are enough to illustrate how the model works.

Linear regression

Probability distribution: The response variable $Y$ follows a normal distribution, which is continuous.
Linear predictor: The predictors form a linear combination with the parameters. Predictor variables can be continuous or categorical. In addition, transformed variables, such as $\log(x)$, can also be used in the linear combination.
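
The use of a transformed predictor can be sketched with a one-variable least-squares fit, which is the usual way to estimate a linear regression with normally distributed errors. The helper `fit_simple_ols`, the data, and the coefficients 1.0 and 2.0 are all made up for illustration. The data is generated without noise so that the fit recovers the coefficients.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


raw_x = [1.0, 2.0, 4.0, 8.0, 16.0]
# Hypothetical noise-free response: y = 1.0 + 2.0 * log(x)
y = [1.0 + 2.0 * math.log(v) for v in raw_x]

# The model is linear in log(x), not in x itself.
log_x = [math.log(v) for v in raw_x]
intercept, slope = fit_simple_ols(log_x, y)
```

The relationship between `raw_x` and `y` is curved, yet the model stays linear in its parameters because the transformation is applied to the predictor before fitting.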

Probability Distribution	Link Function
Normal	Identity
Binomial	Logit (inverse: sigmoid)
Poisson	Log
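
The pairings in this table can be sketched as a small lookup that turns a linear predictor into the mean of the response. The names `LINKS` and `mean_from_eta` are hypothetical, and the code only evaluates the link functions; it does not fit a model.

```python
import math

# family -> (link name, link g(mu), inverse link g^{-1}(eta))
LINKS = {
    "normal": ("identity", lambda mu: mu, lambda eta: eta),
    "binomial": (
        "logit",
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # sigmoid
    ),
    "poisson": ("log", math.log, math.exp),
}


def mean_from_eta(family, eta):
    """Map a linear predictor value to the mean of the response."""
    _, _, inverse_link = LINKS[family]
    return inverse_link(eta)


# The same linear predictor gives a different mean under each link.
means = {family: mean_from_eta(family, 0.0) for family in LINKS}
```

With a linear predictor of 0.0, the sketch gives a mean of 0.0 for the normal case, 0.5 for the binomial case, and 1.0 for the Poisson case.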