Trusted answers to developer questions

Related Tags

python3
pandas
correlation

How to compute correlation using pandas

Educative Answers Team

pandas is a popular Python-based data analysis toolkit that can be imported using:

import pandas as pd

It offers a wide range of utilities, from parsing many file formats to converting an entire data table into a NumPy array. This versatility makes pandas a trusted ally in data science and machine learning.
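As a quick sketch of the two utilities mentioned above, the snippet below parses a small CSV with `pd.read_csv` and converts the resulting table to a NumPy array with `DataFrame.to_numpy()`. The CSV is held in a string via `io.StringIO` so no file is needed; its contents are made up for illustration and reuse the dogs/cats values from the example later in this answer.

```python
from io import StringIO

import pandas as pd

# A small CSV held in a string (illustrative data, not from a real file)
csv_text = """dogs,cats
0.2,0.3
0.01,0.6
0.6,0.01
0.2,0.1
"""

# Parse the CSV text into a DataFrame
df = pd.read_csv(StringIO(csv_text))

# Convert the whole table into a 2-D NumPy array (rows x columns)
arr = df.to_numpy()
print(arr.shape)  # (4, 2)
```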

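pandas also provides tools for statistical analysis. One of them is correlation: DataFrame.corr() computes the pairwise correlation between the columns of a DataFrame and returns the result as a table (itself a DataFrame).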

The method's signature, with its default arguments, is:

DataFrame.corr(method='pearson', min_periods=1)
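A minimal check, assuming a recent pandas version, that calling `corr()` with no arguments gives the same result as passing the defaults explicitly. The result is a square DataFrame whose index and columns are both the original column names:

```python
import pandas as pd

df = pd.DataFrame([(.2, .3), (.01, .6), (.6, .01), (.2, .1)],
                  columns=['dogs', 'cats'])

# Calling corr() with no arguments...
default_corr = df.corr()
# ...is equivalent to spelling out the defaults
explicit_corr = df.corr(method='pearson', min_periods=1)

print(default_corr.equals(explicit_corr))  # True
# Both axes of the result are labelled by the original columns
print(list(default_corr.columns))  # ['dogs', 'cats']
```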

Parameters

  • method: {'pearson', 'kendall', 'spearman'} or callable - Method of correlation:
    - pearson: standard correlation coefficient
    - kendall: Kendall Tau correlation coefficient
    - spearman: Spearman rank correlation
    - callable: any callable that takes two 1-D ndarrays as input and returns a float. The returned matrix always has 1 along the diagonal and is symmetric, regardless of the callable's behavior.

  • min_periods: int, optional - The minimum number of observations required per pair of columns to have a valid result. According to the pandas documentation, this is currently available only for Pearson and Spearman correlation.
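The sketch below compares the Pearson and Spearman coefficients on the dogs/cats data and then passes a callable as `method`. The callable, `histogram_intersection`, is a hypothetical example chosen only to illustrate the interface (it is not a correlation coefficient; it sums the element-wise minima of the two columns). Note how pandas still puts 1.0 on the diagonal. Kendall's tau is left out because pandas computes it with SciPy, which may not be installed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(.2, .3), (.01, .6), (.6, .01), (.2, .1)],
                  columns=['dogs', 'cats'])

# Compare two built-in methods on the same pair of columns
results = {m: df.corr(method=m).loc['dogs', 'cats']
           for m in ('pearson', 'spearman')}
for method, r in results.items():
    print(f"{method:>8}: {r:.4f}")

# A callable receives two 1-D NumPy arrays and must return a float.
# Hypothetical example: histogram intersection (sum of element-wise minima).
def histogram_intersection(a, b):
    return float(np.minimum(a, b).sum())

custom = df.corr(method=histogram_intersection)
# Off-diagonal entries come from the callable; the diagonal is forced to 1.0
print(custom)
```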

Code

The following code computes a correlation table with pandas. You can change the parameters and see how the output varies.

It computes the correlation between the dogs and cats columns using the default settings (Pearson correlation).

# Import the library
import pandas as pd

# Build a DataFrame from a list of (dogs, cats) tuples
df = pd.DataFrame([(.2, .3), (.01, .6), (.6, .01), (.2, .1)],
                  columns=['dogs', 'cats'])

# Compute the pairwise Pearson correlation table (the default method)
corr = df.corr()
print(corr)
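To see `min_periods` in action, the sketch below uses an illustrative variation on the data above, with one `cats` value replaced by `NaN`, and computes the Pearson table twice. pandas excludes missing values pairwise, so only three complete (dogs, cats) pairs remain: `min_periods=3` yields a coefficient, while `min_periods=4` yields `NaN`.

```python
import numpy as np
import pandas as pd

# Same values as before, but with one missing 'cats' observation
df_missing = pd.DataFrame([(.2, .3), (.01, np.nan), (.6, .01), (.2, .1)],
                          columns=['dogs', 'cats'])

# 3 complete pairs >= 3 required, so dogs/cats gets a coefficient
with_enough = df_missing.corr(min_periods=3)
# 3 complete pairs < 4 required, so dogs/cats becomes NaN
too_few = df_missing.corr(min_periods=4)

print(with_enough)
print(too_few)
```

With `min_periods=4`, the cats/cats diagonal entry is also `NaN`, because that column on its own has only three observations.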

Copyright ©2022 Educative, Inc. All rights reserved