How to obtain the variance over a specified axis in pandas
Overview
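The var() function in pandas computes the variance of the values along a specified axis of a DataFrame. By default, it returns the unbiased (sample) variance, normalized by N - 1.
Mathematically, variance measures how far the values in a dataset are spread out from their mean. It is based on the squared deviation of each value from the mean.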
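The sample variance, which var() returns by default, is given by the formula below:
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Where:
- $S^2$ = sample variance
- $x_i$ = the i-th value of the dataset
- $\bar{x}$ = the mean of the dataset
- $n$ = the number of values in the dataset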
Variance is closely related to the standard deviation: the variance is the square of the standard deviation, and the standard deviation is the square root of the variance.
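The formula can be checked directly against pandas. The sketch below uses a hypothetical helper, sample_variance, and a small illustrative dataset (neither comes from the pandas API); it computes the sum of squared deviations divided by n - 1 by hand and compares the result with Series.var(), whose default ddof=1 uses the same divisor. It also confirms that the standard deviation is the square root of the variance.

```python
import math

import pandas as pd


def sample_variance(values):
    """Hypothetical helper: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]          # illustrative values, mean = 5

manual = sample_variance(data)            # 32 / 7, about 4.5714
via_pandas = pd.Series(data).var()        # pandas default ddof=1 -> divides by n - 1

print(manual)
print(via_pandas)

# Standard deviation is the square root of the variance
print(math.isclose(via_pandas ** 0.5, pd.Series(data).std()))  # True
```

Because both computations divide by n - 1, they agree; passing ddof=0 to var() would instead divide by n.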
Syntax
The var() function takes the following syntax:
```python
DataFrame.var(axis=NoDefault.no_default, skipna=True, ddof=1, numeric_only=None, **kwargs)
```
Syntax for the var() function in Pandas
Parameter values
The var() function takes the following optional parameter values:
- axis: The axis along which the variance is computed. Use 0 or 'index' to compute down the rows (one result per column), or 1 or 'columns' to compute across the columns (one result per row).
- skipna: A boolean indicating whether NA or null values are excluded from the calculation. The default is True.
- ddof: An int representing the delta degrees of freedom. The divisor used in the calculation is N - ddof, where N is the number of elements. The default is 1.
- numeric_only: A boolean indicating whether to include only float, int, or boolean columns.
- **kwargs: Additional keyword arguments to be passed to the function.
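The effect of ddof and skipna can be seen on a small DataFrame. The column names and values below are made up for illustration; column "B" contains one missing value so that skipna has something to act on.

```python
import math

import pandas as pd

# Illustrative data: column "B" has one missing value
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, None, 6.0, 10.0]})

# Defaults: ddof=1 (divide by N - 1) and skipna=True (NaN is ignored)
sample_var = df.var()              # A: 5/3 (about 1.667), B: 32/2 = 16.0

# ddof=0 divides by N instead, giving the population variance
population_var = df.var(ddof=0)    # A: 5/4 = 1.25, B: 32/3 (about 10.667)

# skipna=False: a column that contains NaN produces NaN
strict_var = df.var(skipna=False)  # A: about 1.667, B: NaN

print(sample_var)
print(population_var)
print(strict_var)
```

Note that with skipna=True, N counts only the non-missing values, so column "B" is treated as the three values 2, 6, and 10.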
Return value
The var() function returns a Series holding the variance of each column (the default, axis 0) or of each row (axis 1).
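To illustrate the return type, this sketch uses a hypothetical DataFrame with one text column and passes numeric_only=True so that only the numeric columns are included; this avoids depending on how a particular pandas version treats non-numeric columns by default.

```python
import pandas as pd

# Hypothetical data: "name" is text, "x" and "y" are numeric
df = pd.DataFrame({"name": ["a", "b", "c"], "x": [1, 2, 3], "y": [2, 4, 9]})

# axis 0 (default): a Series indexed by the numeric column labels
column_variances = df.var(numeric_only=True)
print(type(column_variances).__name__)   # Series
print(column_variances)                  # x: 1.0, y: 13.0

# axis 1: a Series indexed by the row labels
row_variances = df.var(axis=1, numeric_only=True)
print(row_variances)                     # 0: 0.5, 1: 2.0, 2: 18.0
```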
Example
```python
# A code to illustrate the var() function in pandas

# Importing the pandas library
import pandas as pd

# Creating a DataFrame
df = pd.DataFrame([[1, 2, 3, 4, 5],
                   [1, 7, 5, 9, 0.5],
                   [3, 11, 13, 14, 12]],
                  columns=list('ABCDE'))
# Printing the DataFrame
print(df)

# Obtaining the variance vertically (down the rows, axis 0): one value per column
print(df.var())

# Obtaining the variance horizontally (across the columns, axis 1): one value per row
print(df.var(axis="columns"))
```
Explanation
- Line 4: We import the pandas library.
- Lines 7–10: We create a DataFrame, df.
- Line 12: We print df.
- Line 15: Using the var() function with its default axis (axis 0, or 'index'), we obtain the variance of each column, computed down the rows. We print the result to the console.
- Line 18: Passing axis="columns" (axis 1) to the var() function, we obtain the variance of each row, computed across the columns. We print the result to the console.
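To connect the example's output with the sample variance formula (dividing by n - 1), the sketch below rebuilds the same DataFrame, checks that the integer and string spellings of each axis give identical results, and verifies two values by hand. The variable names are illustrative only.

```python
import math

import pandas as pd

# Same DataFrame as in the example
df = pd.DataFrame([[1, 2, 3, 4, 5], [1, 7, 5, 9, 0.5], [3, 11, 13, 14, 12]],
                  columns=list('ABCDE'))

col_var = df.var()                # axis 0: one variance per column
row_var = df.var(axis="columns")  # axis 1: one variance per row

# The integer and string spellings of each axis are interchangeable
print(df.var(axis=0).equals(col_var))   # True
print(df.var(axis=1).equals(row_var))   # True

# Manual check for column "A" = [1, 1, 3], dividing by n - 1 (ddof=1)
a = df["A"]
manual_a = ((a - a.mean()) ** 2).sum() / (len(a) - 1)
print(math.isclose(col_var["A"], manual_a))  # True (both about 1.3333)

# Row 0 = [1, 2, 3, 4, 5]: mean 3, squared deviations sum to 10, 10 / 4 = 2.5
print(row_var.loc[0])  # 2.5
```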