How to count rows in the Polars library
The Polars library is a high-performance data manipulation and analysis tool for the Python programming language. Built on Rust and designed for parallel processing, Polars provides a memory-efficient DataFrame structure, offering a powerful and efficient alternative to traditional data manipulation libraries. One common data manipulation task is adding a column that numbers the rows. Polars provides the with_row_count() function to accomplish this.
The with_row_count() function
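The DataFrame.with_row_count function adds a row-number column to a DataFrame. It does not return a single total; instead, it returns a new DataFrame with the extra column inserted at index 0. Note that newer Polars releases deprecate with_row_count() in favor of with_row_index(), which behaves the same way but names the column index by default. Now, we will discuss the syntax and coding examples to understand how row numbering works in Polars.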
Syntax
Here is the syntax of the with_row_count function:
DataFrame.with_row_count(name: str = 'row_nr', offset: int = 0)
The with_row_count() function adds a new column to the DataFrame. This new column is a series of integers that starts at the offset specified in the function call and increases by 1 for each row, effectively numbering the rows.
name: The name parameter specifies the name of the new column. By default, the name of the new column is row_nr.
offset: The offset parameter specifies the starting point of the row count. The offset is set to 0 by default, meaning the row count starts from 0. We can change the offset to start the count from any other value.
Code
Let’s see how to count rows with the following example (using default values):
import polars as pl

df = pl.DataFrame(
    {
        "a": [1, 3, 5, 6, 8, 10, 34],
        "b": [2, 4, 6, 5, 7, 87, 98],
        "c": [2, 3, 4, 5, 6, 7, 8],
    }
)
row_count = df.with_row_count()
print(row_count)
Explanation
Lines 3–9: We create a DataFrame named df with three columns named a, b, and c.
Line 10: We call with_row_count() and assign the resulting DataFrame, which includes the new row-number column, to the row_count variable.
Line 11: We print the resulting DataFrame.
The output shows that a new column named row_nr is added as the first column. It holds the row number of each row in the DataFrame, starting from the default offset of 0.
Now let’s see an example that uses a different name and offset for the new column: we name the column My_name and set the offset so that the count starts from 4.
import polars as pl

df = pl.DataFrame(
    {
        "a": [1, 3, 5, 6, 8, 10],
        "b": [2, 4, 6, 5, 7, 87],
        "c": [2, 3, 4, 5, 6, 7],
    }
)
row_count = df.with_row_count("My_name", 4)
print(row_count)
The output shows that a new column My_name is added, indicating the row number for each row in the DataFrame. This can be helpful for various analytical and data manipulation tasks where keeping track of row numbers is essential.