Conditions for Inference for Regression-I

Learn about residuals, linearity, and independence in inference for regression.

We stated that we can only use the standard-error-based method for constructing confidence intervals if the bootstrap distribution is bell-shaped. Similarly, there are certain conditions that need to be met in order for the results of our hypothesis tests and confidence intervals to have valid meaning. These conditions must be met for the assumed underlying mathematical and probability theory to hold true.

For inference for regression, there are four conditions that need to be met. Note that the first letters of these conditions spell LINE, which can serve as a nice reminder of what to check for whenever we perform linear regression.

  • Linearity of the relationship between variables

  • Independence of the residuals

  • Normality of the residuals

  • Equality of variance of the residuals

Conditions L, N, and E can be verified through what’s known as a residual analysis. Condition I can only be verified through an understanding of how the data was collected.
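As a loose illustration of what a residual analysis involves, here is a minimal Python sketch using statsmodels and matplotlib. The simulated `x` and `y` below are placeholder data, not the course's actual dataset; the three plots shown are the standard diagnostics for conditions L, N, and E:

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Simulated data standing in for a real dataset (an assumption for illustration)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 200)

# Fit a simple linear regression of y on x
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()
residuals = model.resid
fitted = model.fittedvalues

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# L and E: residuals vs. fitted values should show no systematic pattern
# (linearity) and roughly constant vertical spread (equality of variance)
axes[0].scatter(fitted, residuals, alpha=0.5)
axes[0].axhline(0, color="red", linestyle="--")
axes[0].set(xlabel="Fitted values", ylabel="Residuals", title="Residuals vs. fitted")

# N: a histogram of the residuals should look roughly bell-shaped
axes[1].hist(residuals, bins=20)
axes[1].set(xlabel="Residual", title="Histogram of residuals")

# N (alternative view): points on a normal Q-Q plot should hug the line
sm.qqplot(residuals, line="45", fit=True, ax=axes[2])
axes[2].set(title="Normal Q-Q plot")

plt.tight_layout()
plt.show()
```

Note that the single residuals-vs-fitted plot does double duty: curvature in it signals a violation of L, while a funnel shape in its vertical spread signals a violation of E.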

We’ll go over a refresher on residuals, verify whether each of the four LINE conditions holds true, and then discuss the implications.

Residuals refresher

Recall the definition of a residual: the observed value minus the fitted value, denoted by $y - \hat{y}$. Residuals can be thought of as the error, or the lack of fit, between the observed value $y$ and the fitted value $\hat{y}$ on the regression line. In the figure below, we illustrate one particular residual out of the 463 using an arrow. We've also illustrated its corresponding observed and fitted values using a circle and a square, respectively:
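To make the definition concrete, here is a tiny Python sketch computing one residual from a hypothetical fitted line; the intercept, slope, and observation below are made-up numbers for illustration only:

```python
# Hypothetical fitted regression line: y_hat = 2.0 + 0.5 * x
# (made-up coefficients, not estimated from the course's data)
intercept, slope = 2.0, 0.5

x_obs, y_obs = 4.0, 4.5            # one hypothetical observation (x, y)
y_hat = intercept + slope * x_obs  # fitted value on the regression line
residual = y_obs - y_hat           # observed minus fitted: y - y_hat

print(f"fitted value y_hat = {y_hat}")     # 4.0
print(f"residual y - y_hat = {residual}")  # 0.5: the point lies above the line
```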
