What is the DataFrame.mean() function in Polars?
Polars is a DataFrame library for data manipulation and analysis. It is designed to process and analyze large datasets quickly and efficiently.
The DataFrame.mean() function
The DataFrame.mean() function in Polars computes the arithmetic mean of every column in a DataFrame, and the same calculation can be applied to a single selected column. The arithmetic mean is the sum of the values divided by the number of values.
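As a minimal sketch of this definition, the plain-Python helper below divides the sum of the values by their count. The name arithmetic_mean is hypothetical and not part of Polars. Skipping None entries is an assumption made here so that the sketch lines up with how missing values in a column are treated in the example later in this article.

```python
from typing import Optional, Sequence


def arithmetic_mean(values: Sequence[Optional[float]]) -> Optional[float]:
    """Return sum / count over the non-missing values, or None if there are none."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


prices = [10, 20, 35, 15]
quantities = [200, None, 70, 30]

print(arithmetic_mean(prices))      # (10 + 20 + 35 + 15) / 4
print(arithmetic_mean(quantities))  # (200 + 70 + 30) / 3, the None is skipped
```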
Syntax
Here is the syntax of the DataFrame.mean() function:
DataFrame.mean(<axis>, <null_strategy>)
axis: This is an optional parameter that specifies the axis along which the mean should be computed. The default value 0 computes the mean of each column, and the value 1 computes the mean of each row.
null_strategy: This is also an optional parameter that is only used if axis = 1. It specifies how to handle null (missing) values during the computation. The default value ignore skips the null values and computes the mean of the remaining values in the row. The value propagate returns null as the mean of any row that contains a null value.
Note: These two parameters belong to older Polars releases. Newer releases compute the row-wise mean with DataFrame.mean_horizontal() instead, and DataFrame.mean() takes no arguments.
Note: We can also compute the mean of a specific column by selecting the column by name and calling mean() on it, as follows:
DataFrame[<column_name>].mean(). Remember that we can't use the axis and null_strategy parameters while computing the mean of a specific column.
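The two null strategies can be sketched in plain Python for a single row. The helper row_mean below is hypothetical and only imitates the described behavior, assuming that propagate yields a null mean for a row that contains a null value. It is not how Polars implements the computation.

```python
from typing import Optional, Sequence


def row_mean(row: Sequence[Optional[float]], null_strategy: str = "ignore") -> Optional[float]:
    """Mean of one row under the 'ignore' or 'propagate' null strategy."""
    if null_strategy not in ("ignore", "propagate"):
        raise ValueError(f"unknown null_strategy: {null_strategy!r}")
    if null_strategy == "propagate" and any(v is None for v in row):
        # A single null makes the mean of the whole row null.
        return None
    present = [v for v in row if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


row = [20, None]  # one value in the row is missing
print(row_mean(row, "ignore"))     # mean of the remaining value
print(row_mean(row, "propagate"))  # the null carries over to the result
```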
Return value
With the default axis, the function returns a DataFrame with a single row that contains the mean of each numeric column. With axis = 1, the result is a Series with one mean per row. If we select a column by name and compute its mean, we get a single mean value for that column.
Note: Non-numeric columns, such as strings, are not averaged. Their entry in the result is null.
Code
Here's a coding example that uses the DataFrame.mean() method to calculate the mean of numeric columns in Polars:
import polars as pl
df = pl.DataFrame(
    {
        "Product": ["Cookies", "Brownie", "Tortilla's wrap", "Pasta"],
        "Price": [10, 20, 35, 15],
        "Quantity": [200, None, 70, 30],
    }
)

# Computing the mean of the complete table
print(df.mean())

# Computing the mean of only one column, "Quantity"
print("Mean of the column with Quantities: ", df["Quantity"].mean())

# Computing the mean along the row (axis and null_strategy need an older Polars release)
# The null_strategy = ignore will compute the mean of rows excluding the null values
print("Mean along the row with the null_strategy = ignore: ", df.mean(axis=1, null_strategy='ignore'))

# Computing the mean along the row (axis and null_strategy need an older Polars release)
# The null_strategy = propagate will return null for every row that contains a null value
print("Mean along the row with the null_strategy = propagate: ", df.mean(axis=1, null_strategy='propagate'))
Explanation
Line 1: We import the polars library as pl.
Lines 2–9: We define our DataFrame as df for the cafe with the product's name, price, and quantity.
Line 11: We use the df.mean() function to print the mean of the complete table.
Line 14: We select the Quantity column with df["Quantity"] and call mean() on it to print only the mean of the Quantity column.
Line 18: We use the df.mean() function with axis = 1 to compute the mean along the rows and null_strategy = 'ignore' to exclude the null values from the computation.
Line 22: We use the df.mean() function with axis = 1 to compute the mean along the rows and null_strategy = 'propagate' so that any row containing a null value gets null as its mean.