# Mathematics of Linear Correlation

Learn about the mathematical equation for linear correlation and the F-test.

## Understanding linear correlation equation

What is linear correlation, mathematically speaking? If you’ve taken basic statistics, you are likely familiar with linear correlation already. Linear correlation works very similarly to linear regression. For two columns, $X$ and $Y$, the linear (Pearson) correlation $\rho$ (the lowercase Greek letter “rho”) is defined as the following:

$\rho = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$

This equation describes the **expected value** ($E$, which you can think of as the average) of the difference between the elements of $X$ and their average, $\mu_X$, multiplied by the difference between the corresponding elements of $Y$ and their average, $\mu_Y$. The average for $E$ is taken over pairs of $X$, $Y$ values. You can imagine that if, when $X$ is relatively large compared to its mean, $\mu_X$, $Y$ also tends to be similarly large, then the terms of the multiplication in the numerator will both tend to be positive, leading to a positive product and positive correlation after the expected value, $E$, is taken. Similarly, if $Y$ tends to be small when $X$ is small, both terms in the numerator will be negative and again lead to **positive correlation**. Conversely, if $Y$ tends to decrease as $X$ increases, they will have **negative correlation**.

The denominator (the product of the standard deviations of $X$ and $Y$) serves to normalize linear correlation to the scale of $[-1, 1]$. Because Pearson correlation is adjusted for the mean and standard deviation of the data, the actual values of the data are not as important as the relationship between $X$ and $Y$. Stronger linear correlations are closer to 1 or -1. If there is no linear relation between $X$ and $Y$, the correlation will be close to 0.
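As a quick sanity check, here is a minimal sketch (using synthetic data, not the case study data) that computes $\rho$ directly from the definition above and compares it to NumPy's built-in `np.corrcoef`. It also illustrates the normalization point: shifting and rescaling a variable leaves the correlation unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)  # linear relationship plus noise

# Pearson correlation from the definition:
# E[(X - mu_X)(Y - mu_Y)] / (sigma_X * sigma_Y)
rho_manual = np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())

# The same quantity from NumPy's built-in correlation matrix
rho_numpy = np.corrcoef(x, y)[0, 1]

# Affine transformations of either variable do not change the correlation,
# because the numerator and denominator rescale together
rho_scaled = np.corrcoef(10 * x + 5, y)[0, 1]

print(rho_manual, rho_numpy, rho_scaled)
```

The two computations agree to floating-point precision, and the rescaled version matches as well, which is why only the shape of the relationship between $X$ and $Y$ matters, not the units the data happen to be measured in.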

## A limitation of Pearson correlation

It’s worth noting that, while it is regularly used in this context by data science practitioners, Pearson correlation is not strictly appropriate for a binary response variable, as we have in the case study problem. Technically speaking, among other restrictions, Pearson correlation is only valid for *continuous data*. However, Pearson correlation can still accomplish the purpose of giving a quick idea of the potential usefulness of features. It is also conveniently available in software libraries such as pandas.
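To make this concrete, here is a small sketch (with hypothetical column names and simulated data, not the case study data) showing that pandas will happily compute a Pearson correlation between a continuous feature and a binary response, and that the result can still flag the feature as potentially useful:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# A hypothetical continuous feature and a binary response that depends on it
feature = rng.normal(size=500)
response = (feature + rng.normal(scale=0.5, size=500) > 0).astype(int)

df = pd.DataFrame({"feature": feature, "response": response})

# pandas computes Pearson correlation even though the response is binary,
# not continuous; the strictly appropriate assumptions are violated,
# but the number is still a useful rough screen
corr = df["feature"].corr(df["response"])
print(corr)
```

Here the correlation comes out clearly positive, correctly suggesting the feature carries predictive information about the response, even though the continuous-data assumption is violated.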

In data science in general, you will find that certain widely used techniques may be applied to data that violate their formal statistical assumptions. It is important to be aware of the formal assumptions underlying analytical methods. In fact, knowledge of these assumptions may be tested during interviews for data science jobs. However, in practice, as long as a technique can help us on our way to understanding the problem and finding an effective solution, it can still be a valuable tool.

That being said, linear correlation will not be an effective measure of the predictive power of all features. In particular, it only picks up on linear relationships. Shifting our focus momentarily to a hypothetical regression problem, have a look at the following examples and consider what you expect the linear correlations to be. Notice that the values of the data on the *x* and *y* axes are not labeled; this is because the location (mean) and standard deviation (scale) of the data do not affect the Pearson correlation, only the relationship between the variables, which can be discerned by plotting them together:
