
Hassaan Waqar

A **correlation matrix** is a table that shows the strength of the linear relationship between each pair of variables in a dataset. Each cell holds the correlation coefficient for one pair of variables.

The **correlation coefficient** shows how strongly or weakly any two variables are related. Scores range between 1 and -1. 1 indicates *a perfect positive correlation*, whereas -1 indicates *a perfect negative correlation*. Scores closer to 0 indicate a weak correlation.
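As a rough illustration, the Pearson correlation coefficient (the default method of the pandas `corr` function) can be computed by hand. The sketch below is a minimal version in plain Python; the helper name `pearson_r` and the sample values are chosen only for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Illustrative sample values
a = [45, 37, 42, 35, 39]
b = [38, 31, 26, 28, 33]

r_ab = pearson_r(a, b)
print(round(r_ab, 4))  # always lies between -1 and 1
```

The common scaling factor of the covariance and the variances cancels out, so the sums of deviations can be used directly.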

**Correlation** refers to the degree to which variables are related. A correlation can be *causal* or *non-causal*, and the coefficient alone does not tell us which. We say that there is a positive correlation when an increase in variable $x$ tends to be accompanied by an increase in variable $y$. We say that there is a negative correlation when an increase in variable $x$ tends to be accompanied by a decrease in variable $y$.

The illustration below shows positive and negative correlations:

The table below summarizes correlation coefficients:

Coefficient | Meaning |
---|---|
1 | Perfect positive correlation. As variable $x$ increases, variable $y$ increases along an exact straight line. |
-1 | Perfect negative correlation. As variable $x$ increases, variable $y$ decreases along an exact straight line. |
0 | No linear correlation. The variables have no linear relationship, although they may still be related in a nonlinear way. |
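A coefficient of 1 or -1 describes a perfect straight-line relationship whatever its slope, and a coefficient of 0 only rules out a linear relationship. The sketch below checks this with NumPy's `np.corrcoef` on made-up data; the variable names are illustrative.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

y_pos = 3.0 * x + 2.0     # rises by 3 for each unit of x
y_neg = -0.5 * x + 10.0   # falls by 0.5 for each unit of x
y_sq = x ** 2             # fully determined by x, but not linearly

# np.corrcoef returns a 2x2 matrix; entry [0, 1] is the x-y coefficient
r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]
r_sq = np.corrcoef(x, y_sq)[0, 1]

print(r_pos, r_neg, r_sq)
```

Here `y_sq` depends entirely on `x`, yet its coefficient is zero (up to floating-point rounding), because the relationship is symmetric rather than linear.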

A correlation matrix displays the correlation between all numerical variables present in the dataset. If a dataset has $n$ numerical features, the correlation matrix has $n \times n$ values and is symmetric about its main diagonal, because the correlation of $x$ with $y$ equals the correlation of $y$ with $x$. Therefore, it is sufficient to analyze only the upper or lower triangle of the matrix.

The illustration below shows a visual representation of a correlation matrix:

The diagonal always has a coefficient of 1.00, since it represents the correlation of each variable with itself.
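These properties can be checked directly. The sketch below assumes a small three-column dataframe with illustrative values and uses `np.triu_indices` to keep only the pairs above the diagonal; the names `pairs` and `is_symmetric` are chosen for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [45, 37, 42, 35, 39],
    'B': [38, 31, 26, 28, 33],
    'C': [10, 15, 17, 21, 12],
})
corr = df.corr()

n = df.shape[1]  # number of numerical features
# The matrix mirrors itself across the main diagonal
is_symmetric = np.allclose(corr.values, corr.values.T)
# Every variable correlates perfectly with itself
diagonal_is_one = np.allclose(np.diag(corr.values), 1.0)

# Indices of the cells strictly above the diagonal: n * (n - 1) / 2 pairs
rows, cols = np.triu_indices(n, k=1)
pairs = {
    (corr.index[i], corr.columns[j]): corr.iloc[i, j]
    for i, j in zip(rows, cols)
}

print(corr.shape, is_symmetric, diagonal_is_one)
for (first, second), value in pairs.items():
    print(first, second, round(value, 2))
```

With three features, only three of the nine cells carry distinct information.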

A gradient color scheme helps to improve understanding of the coefficient scores.

The code snippet below shows how we can create a correlation matrix in Python:

```python
import pandas as pd # for creating a dataframe
import seaborn as sn # for drawing the heatmap
import matplotlib.pyplot as plt # for creating visualizations

# Data for matrix
data = {'A': [45,37,42,35,39],
        'B': [38,31,26,28,33],
        'C': [10,15,17,21,12]
        }

df = pd.DataFrame(data,columns=['A','B','C'])
print("Original Matrix")
print(df) # original matrix
print("\n")

corrMatrix = df.corr() # finding correlations
print("Correlation Coefficients Matrix")
print(corrMatrix) # printing correlations
```

```python
# Visual representation of the correlation matrix
sn.heatmap(corrMatrix, annot=True, cmap='Blues')
plt.show() # display the heatmap
```

The call to `pd.DataFrame` creates a dataframe. A dataframe can be thought of as a matrix.

The `corr` function is then called on our dataframe to calculate the correlation coefficients matrix. By default, it uses the Pearson method.

The second code snippet is a continuation of the first code snippet.

It creates a visualization of the correlation matrix using Seaborn and Matplotlib. It takes in the correlation coefficients, annotates them, and colors them blue.
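As a possible variation, a diverging color map pinned to the range -1 to 1 makes positive and negative coefficients easy to tell apart, and a mask can hide the redundant upper triangle. The sketch below assumes the same three-column data; the `coolwarm` color map, the title, and the mask are choices made for this example only.

```python
import numpy as np
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'A': [45, 37, 42, 35, 39],
    'B': [38, 31, 26, 28, 33],
    'C': [10, 15, 17, 21, 12],
})
corrMatrix = df.corr()

# True marks the cells to hide: everything strictly above the diagonal
mask = np.triu(np.ones(corrMatrix.shape, dtype=bool), k=1)

fig, ax = plt.subplots()
sn.heatmap(
    corrMatrix,
    mask=mask,        # skip the mirrored upper triangle
    annot=True,       # write each coefficient in its cell
    fmt='.2f',        # two decimal places
    cmap='coolwarm',  # diverging colors: blue for negative, red for positive
    vmin=-1, vmax=1,  # pin the color scale to the full coefficient range
    ax=ax,
)
ax.set_title('Correlation matrix (lower triangle)')
plt.show()
```

Pinning `vmin` and `vmax` keeps the colors comparable between datasets, since the scale no longer depends on the values that happen to be present.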


Copyright ©2022 Educative, Inc. All rights reserved
