Challenge: Covariance and Correlation

Assess your skills by completing a challenge.

Pearson’s correlation coefficient

As we saw earlier, Pearson’s correlation coefficient is a statistic that measures the linear relationship, or association, between two continuous variables. It’s one of the most widely used measures of association because it’s built on covariance: it is the covariance of the two variables divided by the product of their standard deviations. It gives information about the magnitude of the association, or correlation, as well as the direction of the relationship.

Formula

$$r = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

  • $r$: Correlation coefficient
  • $n$: Number of paired observations in the sample
  • $x_i$: Values of the x-variable in a sample
  • $\bar{x}$: Mean of the values of the x-variable
  • $y_i$: Values of the y-variable in a sample
  • $\bar{y}$: Mean of the values of the y-variable
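
To see how the terms of the formula fit together, here is a minimal Python sketch that evaluates it term by term on a small made-up sample. The function name `pearson_r` and the toy lists `x` and `y` are assumptions for illustration only; they are not part of the course material or the challenge dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (illustrative sketch)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Numerator: sample covariance of x and y
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

    # Denominator: product of the two sample standard deviations
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    return cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))


# Made-up toy data: y is an exact linear function of x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(pearson_r(x, y))  # close to 1 for a perfect positive linear relationship
```

Because the factor 1/(n-1) appears in both the numerator and the denominator, it cancels; it is kept in the sketch only so that each line mirrors one term of the formula.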

Problem statement

You’re given the formula for the Pearson correlation coefficient above. Your task is to use it to compute the Pearson correlation coefficient for the pwt7g dataset.

Coding exercise

This problem has been designed for you to practice freely, so try to solve it on your own first. Take some time and think about the different concepts that have been explored in the course so far.

If you feel stuck, you can always check out the solution review provided in the next lesson.

Good luck!
