What is the DataFrame.var() function in Polars?
In Polars, the var() function in the polars.dataframe module is used to calculate the variance of the columns in a DataFrame. Its ddof parameter controls the divisor used in the calculation.
Syntax
Here’s the syntax of the dataframe.var() function:
DataFrame.var(ddof: int = 1)
Parameters
ddof (Delta Degrees of Freedom): It sets the divisor used in the calculation, which is N - ddof, where N is the number of elements. The default value for ddof is 1.
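To make the role of the divisor concrete, here is a minimal sketch in plain Python. The helper name variance and the sample values are assumptions chosen for illustration, not part of Polars; the standard library's statistics module is used only as a cross-check.

```python
import statistics


def variance(values, ddof=1):
    """Hypothetical helper: sum of squared deviations divided by N - ddof."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - ddof)


values = [2, 5, 8, 9, 10]  # illustrative data

population = variance(values, ddof=0)  # divisor is N
sample = variance(values, ddof=1)      # divisor is N - 1 (Bessel's correction)

# Cross-check against the standard library's population and sample variance.
print(population, statistics.pvariance(values))
print(sample, statistics.variance(values))
```

With ddof=0 the helper should agree with statistics.pvariance, and with ddof=1 it should agree with statistics.variance, up to floating-point rounding.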
Variance
Before using the function, let's review how variance is computed. The formula for the population variance is as follows:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

Where:
$N$ is the number of data points in the dataset.
$x_i$ represents each individual data point.
$\mu$ is the mean (average) of the dataset.
Dividing by $N - 1$ instead of $N$ gives the sample variance, which corresponds to ddof=1.
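Let's take the array named alpha with the values [2, 5, 8, 9, 10] and calculate its population variance (divisor $N$) using the formula:

$$\mu = \frac{2 + 5 + 8 + 9 + 10}{5} = 6.8$$

$$\sigma^2 = \frac{(2 - 6.8)^2 + (5 - 6.8)^2 + (8 - 6.8)^2 + (9 - 6.8)^2 + (10 - 6.8)^2}{5} = \frac{42.8}{5} = 8.56$$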
Code example
The next step is to calculate the variance of alpha, beta and gamma using the dataframe.var() function:
import polars as pl
df = pl.DataFrame(
    {
        "alpha": [2, 5, 8, 9, 10],
        "beta": [8, 7, 6, 5, 4],
        "gamma": ["a", "b", "c", "d", "e"],
    })

variance = df.var(ddof=0)
print(variance)
Code explanation
Now let’s break down the code shown above:
Lines 2-7: We create a DataFrame named df with three columns: alpha, beta, and gamma. The alpha and beta columns contain numerical data, while the gamma column contains categorical data (strings).
Line 9: We calculate the variance of each numerical column in the DataFrame (alpha and beta) using the var function. The ddof parameter is set to 0, which means the calculation is performed without using Bessel’s correction. Bessel's correction is used to adjust the sample variance to be an unbiased estimator of the population variance. Setting ddof=0 implies that the variance of the population is being calculated.
Line 10: We print the result.
Exercise: Try using default values and calculate variance.
Conclusion
The DataFrame.var function in Polars provides a convenient and efficient way to calculate the variance of numerical columns within a DataFrame. The function supports an optional parameter, ddof, allowing users to choose between calculating the population variance (ddof=0) or the sample variance (ddof=1).
By default, the function applies Bessel's correction (ddof=1), which gives an unbiased estimate of the population variance when working with samples. Variance is only meaningful for numerical data, so a non-numeric column such as gamma does not receive a variance value; in the example from the Polars documentation, a string column is reported as null. Overall, the DataFrame.var function is a compact way to summarize the spread of every numerical column in a DataFrame with a single call.