Polars is a fast and efficient data manipulation library written in Rust. It’s designed to provide high-performance operations on large datasets and handles them more quickly than pandas
. It’s particularly suitable when working with tabular data.
The dataframe.std()
method helps understand and quantify data variability. It supports various advanced statistical and machine-learning tasks, which help data scientists make informed decisions.
dataframe.std()
methodThe dataframe.std()
function aggregates the columns of a DataFrame to their
Here’s the syntax of the DataFrame.std()
function:
DataFrame.std(ddof: int = 1)
The ddof
parameter refers to the delta degrees of freedom. It denotes the divisor used in the calculation, represented by N - ddof
, where N
is the number of elements. The default value for ddof
is 1.
Note: Setting
ddof=0
would calculate the population standard deviation.
It returns a new DataFrame containing the standard deviation of the numerical columns.
Standard deviation is a statistical measure that describes the amount of variation or dispersion in a set of values. The formula to calculate standard deviation is as follows:
We will now see how to calculate the standard deviation.
Let’s illustrate the usage of the DataFrame.std()
function with an example.
import polars as pldf = pl.DataFrame({"alpha":[1, 2, 3, 4, 5],"beta": ["a", "b", "c", "d", "e"],})deviation = df.std(ddof = 0)default_deviation = df.std()print(deviation)print(default_deviation)
We will see the step-by-step explanation of the code above:
Lines 2–6: We create a DataFrame df
with two columns, alpha
and beta
. The alpha
contains numeric values [1, 2, 3, 4, 5]
, and beta
contains string values ["a", "b", "c", "d", "e"]
.
Line 8: We use the std(ddof=0)
function to compute the population standard deviation of the df
columns. It provides an exact measure of dispersion without adjusting for sampling bias.
Line 9: We use the std()
function to compute the sample standard deviation of the df
columns. It provides an exact measure of dispersion without adjusting for sampling bias. It accounts for the fact that the sample mean may not perfectly represent the population mean.
Lines 11–12: We print both the sample and population standard deviations.
Note: The
beta
column contains string values; standard deviation is not computable for it using traditional statistical methods. Therefore, the result of thebeta
column might not be meaningful.
The DataFrame.std()
is standard deviation function can be used in various fields to measure and understand data variability. For example:
In finance, analysts compute the standard deviation of stock returns to measure a stock’s volatility. A higher standard deviation indicates higher risk.
Environmental researchers might use the standard deviation of temperature readings to understand climate change in a specific region over time.
In education, teachers might analyze test scores to understand the distribution of student performance and identify any significant variability.
In conclusion, by calculating the standard deviation, one can assess data consistency, identify outliers, compare datasets, and understand the overall spread and risk associated with the data.
Free Resources