What is a generalized linear model (GLM)?
A generalized linear model (GLM) is a statistical model that extends the general linear model. The general linear model is applied to continuous response variables with continuous and/or categorical predictors. It includes the conventional regression models used for response variables with a normal distribution.
On the other hand, a generalized linear model (GLM) allows the response variable to follow a non-normal distribution, such as a binomial or Poisson distribution.
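GLM generalizes the relation between the response variable and the predictors to a linear additive relation of the following form, where g is the link function and μ is the expected value of the response:
$$g(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$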
GLM uses several components to generalize the relationship between the response variable and the predictors to a linear and additive relation. One of the main components is a link function. In addition to that, certain assumptions are made for this model.
Assumptions of GLM
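The generalized linear model is used when the response is not normally distributed or is not linearly related to the predictors. It makes the following assumptions: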
The data needs to be random and independent.
Random variables should follow the same probability distribution.
The response variable follows a distribution from the exponential family, such as a binomial or a Poisson distribution.
The response and explanatory variables do not need to have a linear relationship. However, a linear relationship is established between the transformed mean of the response variable (after the link function) and the explanatory variables.
We can also use transformed explanatory variables to build the GLM, such as the log or square of the original variable.
Error variance of the response variable can vary with the independent variables.
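The last assumption can be made concrete with the variance functions of the three distributions discussed in this answer. The sketch below is only illustrative: the function name `variance` and the `sigma2` parameter are hypothetical, and the binomial case assumes a single trial.

```python
def variance(family, mu, sigma2=1.0):
    """Variance of the response as a function of its mean mu.

    `family` and `sigma2` are illustrative names, not part of any library.
    """
    if family == "normal":
        return sigma2            # constant: does not depend on mu
    if family == "binomial":
        return mu * (1.0 - mu)   # single trial with success probability mu
    if family == "poisson":
        return mu                # variance equals the mean
    raise ValueError(f"unknown family: {family}")


# For a Poisson response, a larger mean implies a larger variance.
small = variance("poisson", 2.0)
large = variance("poisson", 8.0)
```

Because the mean depends on the predictors, the variance of a binomial or Poisson response changes with them as well, whereas the normal case keeps it fixed.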
How does GLM work?
To develop a linear relationship between the response variable and the predictors, GLM uses three components:
A linear predictor
A link function
A probability distribution
Each combination of these components corresponds to a well-known statistical model, such as linear, logistic, or Poisson regression.
Linear predictor
A linear predictor, also known as the systematic component, is the linear combination of the explanatory variables (x1, x2, x3, ..., xi) and the regression coefficients (β0, β1, β2, ..., βi):
$$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$
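As a minimal sketch, the linear predictor for one observation can be computed as below. The function name `linear_predictor` and the coefficient values are hypothetical and chosen only for illustration.

```python
def linear_predictor(x, beta):
    """Return beta[0] + beta[1]*x[0] + ... + beta[i]*x[i-1] for one observation."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs an intercept plus one coefficient per predictor")
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


# Hypothetical coefficients and a single observation with two predictors.
beta = [1.0, 2.0, -0.5]
x = [3.0, 4.0]
eta = linear_predictor(x, beta)  # 1.0 + 2.0*3.0 - 0.5*4.0
```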
Probability distribution
Probability distribution, also known as the random component, refers to the distribution that the response variable follows.
Link function
The link function is usually written as g(μ). It specifies how the mean of the response variable, μ, is related to the linear predictor through η = g(μ). It is chosen according to the probability distribution of the response variable.
| Probability Distribution | Link Function |
| --- | --- |
| Normal | Identity |
| Binomial | Logit (its inverse is the sigmoid) |
| Poisson | Log |
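The three link functions in the table, together with their inverses, can be sketched in plain Python. The function names and the `LINKS` dictionary are illustrative, not taken from any library.

```python
import math


def identity(mu):
    return mu


def logit(mu):
    # Log odds of a probability mu in (0, 1).
    return math.log(mu / (1.0 - mu))


def log_link(mu):
    # Defined for a positive mean mu.
    return math.log(mu)


def inv_identity(eta):
    return eta


def sigmoid(eta):
    # Inverse of the logit: maps any real eta to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))


def inv_log(eta):
    # Inverse of the log link: maps any real eta to a positive mean.
    return math.exp(eta)


# Each distribution paired with (link, inverse link).
LINKS = {
    "normal": (identity, inv_identity),
    "binomial": (logit, sigmoid),
    "poisson": (log_link, inv_log),
}
```

The link maps the mean of the response onto the scale of the linear predictor, and the inverse link maps a value of the linear predictor back to a valid mean.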
Different models used in GLM
Different models arise in GLM according to the probability distribution of the response variable. Numerous probability distributions and link functions are available for this purpose. This answer discusses only three of the most common ones.
Linear regression
Probability distribution: The response variable y follows a continuous normal distribution.
Linear predictor: Predictors form a linear combination with the parameters. Predictor variables can be continuous or categorical. In addition, transformed variables can also be used in the linear combination, such as the log or square of a predictor:
$$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$
Simple linear regression is used for one predictor. In the case of two or more predictors, multiple regression is used.
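Link function: The identity function is used as the link function, so the mean of the response variable equals the linear predictor directly:
$$g(\mu) = \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$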
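With a normal response and the identity link, the maximum likelihood estimates of the coefficients coincide with the ordinary least squares estimates. The sketch below fits a simple linear regression with one predictor using the closed-form least squares formulas. The function name and the noise-free data are hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Least squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical noise-free data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_linear(xs, ys)

# Identity link: the predicted mean is the linear predictor itself.
mu_at_4 = intercept + slope * 4.0
```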
Binary logistic regression
Probability distribution: The response variable y follows a binomial distribution.
Linear predictor: Predictors form a linear combination with the parameters. Predictor variables can be continuous or categorical. In addition, transformed variables can also be used in the linear combination, such as the log or square of a predictor:
$$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$
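Link function: The logit link function is used. It maps the success probability p, which lies between 0 and 1, onto the whole real line so that it can be equated with the linear predictor. Its inverse, the sigmoid function, maps the linear predictor back to a probability between 0 and 1. The logit is also known as the log odds:
$$g(p) = \ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$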
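To illustrate how such a model can be fitted, the sketch below estimates a binary logistic regression with one predictor by gradient ascent on the log-likelihood. This is one simple fitting approach chosen for illustration, not necessarily what statistical packages use. The function names, the learning rate, the number of steps, and the data are all hypothetical.

```python
import math


def sigmoid(eta):
    # Inverse of the logit link.
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Fit intercept b0 and slope b1 by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)  # observed minus predicted probability
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical binary outcomes that become more likely as x grows.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)

p_low = sigmoid(b0 + b1 * 0.0)   # predicted probability at x = 0
p_high = sigmoid(b0 + b1 * 5.0)  # predicted probability at x = 5
```

The linear predictor can take any real value, but the sigmoid keeps every predicted probability strictly between 0 and 1.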
Poisson regression
Probability distribution: The response variable y follows a Poisson distribution.
Linear predictor: Predictors form a linear combination with the parameters. Predictor variables can be continuous or categorical. In addition, transformed variables can also be used in the linear combination, such as the log or square of a predictor:
$$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$
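Link function: The log link function is used, so the mean of the response variable is the exponential of the linear predictor and is always positive:
$$g(\mu) = \ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$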
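A short sketch of what the log link implies for predictions, using hypothetical coefficients `b0` and `b1` and a single predictor: the predicted mean count is always positive, and a one-unit increase in the predictor multiplies the mean by exp(b1).

```python
import math


def poisson_mean(x, b0, b1):
    """Predicted mean count under a log link: mu = exp(b0 + b1 * x)."""
    return math.exp(b0 + b1 * x)


# Hypothetical coefficients, for illustration only.
b0, b1 = 0.5, 0.2

mu_2 = poisson_mean(2.0, b0, b1)
mu_3 = poisson_mean(3.0, b0, b1)
ratio = mu_3 / mu_2  # equals exp(b1), whatever the starting value of x
```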
Conclusion
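GLM combines its three components, a linear predictor, a link function, and a probability distribution for the response variable, to generalize the relationship between predictors and response variables. Linear, binary logistic, and Poisson regression are all special cases of it.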