Polars is a fast `DataFrame` library implemented in Rust with bindings for Python. It is a data manipulation library used for processing large datasets. It is similar to pandas but optimized for performance and parallel processing, which makes it well suited to big data tasks. Polars supports data from various sources, including CSV.

Note: We will use version 3.6 of Python.

We can import the `polars` library in our Python script or notebook, as shown below:

import polars as pl

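We’ll go through the `groupby()` method of the `polars` library. Newer Polars releases spell this method `group_by()`; the examples here use the original `groupby()` name.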

`groupby()` method

The `groupby()` method, available in data manipulation libraries like `pandas` and `polars`, allows us to group rows of a DataFrame based on the unique values in one or more columns. With the help of the `groupby()` method, we can group data according to categories and then apply functions to each category independently.
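To make the idea concrete, here is a minimal pure-Python sketch of the split-apply-combine pattern that `groupby()` follows. It illustrates the concept only and is not how Polars implements grouping internally. The `rows` data and the `group_rows` helper are hypothetical names made up for this illustration.

```python
from collections import defaultdict

# Hypothetical sample rows, invented for this sketch
rows = [
    {"grade": "A", "score": 90},
    {"grade": "B", "score": 75},
    {"grade": "B", "score": 82},
]


def group_rows(rows, key):
    """Split: bucket rows by the value found under `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


groups = group_rows(rows, "grade")

# Apply and combine: run one function per group, keep one result per group
max_score = {
    grade: max(r["score"] for r in members)
    for grade, members in groups.items()
}

print(max_score)  # {'A': 90, 'B': 82}
```

A DataFrame library performs the same three steps, but on whole columns at once rather than row by row.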

Here is the syntax for using the `groupby()` method:

# importing polars
import polars as pl

data = {"id": [1, 2, 3], "grade": ["A", "B", "B"]}
df = pl.DataFrame(data)

# grouping w.r.t column
for name, group in df.groupby("grade"):
    print(name)
    print(group)

In the above code, we iterate through the groups formed by the `groupby` operation based on the unique values in the `grade` column. In this case, groups are formed for the unique values `A` and `B`.

`groupby()` method

There’s a list of operations we can apply to the grouped data. Let’s look at a few examples.

We can find the maximum of the grouped data using the `groupby.max()` function of the `polars` library. This reduces each group to a single row holding the maximum value of each column.

# importing polars
import polars as pl

data = {
    "x": [10, 20, 30, 40, 50, 60],
    "y": [0.1, 0.2, 0.5, 1.0, 2.0, 3.0],
    "z": [False, True, False, False, True, True],
    "w": ["Red", "Blue", "Red", "Green", "Green", "Blue"],
}
df = pl.DataFrame(data)

# fetching the maximum value
result = df.groupby("w", maintain_order=True).max()
print(result)

We can find the minimum of the grouped data using the `groupby.min()` function. This reduces each group to a single row holding the minimum value of each column.

# importing polars
import polars as pl

data = {
    "x": [10, 20, 30, 40, 50, 60],
    "y": [0.1, 0.2, 0.5, 1.0, 2.0, 3.0],
    "z": [False, True, False, False, True, True],
    "w": ["Red", "Blue", "Red", "Green", "Green", "Blue"],
}
df = pl.DataFrame(data)

# fetching the minimum value
result = df.groupby("w", maintain_order=True).min()
print(result)

We can find the sum of the grouped data using the `groupby.sum()` function. This reduces each group to a single row holding the sum of each column.

# importing polars
import polars as pl

data = {
    "x": [10, 20, 30, 40, 50, 60],
    "y": [0.1, 0.2, 0.5, 1.0, 2.0, 3.0],
    "z": [False, True, False, False, True, True],
    "w": ["Red", "Blue", "Red", "Green", "Green", "Blue"],
}
df = pl.DataFrame(data)

# fetching the sum of the values
result = df.groupby("w", maintain_order=True).sum()
print(result)

We have explored a few examples, but there are many more methods, such as `agg` (aggregate), `mean`, `median`, `tail`, and `quantile`. The `DataFrame.groupby` method is a powerful function that lets us group data efficiently and provides various operations on the grouped data.

Copyright ©2024 Educative, Inc. All rights reserved
