What is polars.LazyFrame.filter() function in Polars Python?
Key takeaways:
The
LazyFrame.filter()function filters rows based on specified conditions without changing the order of the remaining rows.It accepts parameters such as
predicatesfor boolean evaluation andconstraintsto filter by specific column values.The function returns a filtered
DataFrame, which is useful for tasks like removing outliers or creating subsets of data.
Polars is a Rust-based DataFrame library that provides Python developers a fast and efficient alternative to pandas for large-scale data manipulation.
It improves performance for tasks like slicing, filtering, and transformations in Python and Rust environments. LazyFrame is a concept in Polars that enables
One of its key features is the filter() function. This function lets us apply conditions to the data and refine the dataset before execution. In this Answer, we will learn how to use the filter() function with LazyFrame.
Import the library
We use the following command to import the polars library:
import polars as pl
The LazyFrame.filter() method
The LazyFrame.filter() function filters the rows in the LazyFrame based on the provided constraints, but the default order of the rest of the rows remains intact. This means that the filtering operation does not alter the original order of the rows that meet the filtering criteria.
Syntax
Here’s the syntax of the LazyFrame.filter() function:
LazyFrame.filter(predicates, constraints)
Parameters
predicates: It evaluates the expression to a boolean series.
constraints: It use the name = value format to filter rows based on specific column values. Each constraint works like pl.col(name).eq(value) and is combined with other conditions using the & operator. This expression creates a boolean condition.
It returns True for rows where the specified column matches the given value, and False for all others. This is useful for filtering rows in a DataFrame based on the value in a specific column.
Return value
It returns a filtered DataFrame, which can be stored and utilized for further implementations. The use cases for these implementations can include removing outliers, subsetting data, and even training and testing.
Code example
Let’s look at a coding example of the LazyFrame.filter() function.
Note: Click the “Run” button to test the code.
import polars as pllf = pl.LazyFrame({"id": [1, 2, 3, 4, 5],"name": ["John", "David", "Zack", "Sara", "Cardy"],"age": [23, 39, 27, 43, 22],"country": ["Germany", "Austrailia", "Germany", "Pakistan", "London"],"designation": ["Software Engineer", "Accountant", "Doctor", "Pilot", "Accountant"],})print(lf.filter(pl.col("age") < 30).collect())print(lf.filter((pl.col("age") > 20) & (pl.col("country") == "Germany")).collect())print(lf.filter((pl.col("age") < 30 ) | (pl.col("designation") == "Accountant")).collect())
Code explanation
Let’s learn about the implementation line by line:
Line 1: We import the
polarslibrary and give it theplalias to make it easier to refer to in the code.Lines 3–11: We initialize the LazyFrame containing five columns,
id,"name",age,"country", and"designation"with a list of multiple values.Line 14: We apply a filter on the LazyFrame, fetching all users whose
ageis less than30.Line 16: We apply a filter on the LazyFrame, now using multiple constraints, fetching the data of those users only whose
ageis greater than20and they live inGermany.Line 18: We apply a filter on the LazyFrame, fetching all the users who are either under
30or they areAccountant.
Conclusion
Polar's Rust-based DataFrame library and LazyFrame functionality provide an efficient alternative to pandas for large-scale data manipulation. The LazyFrame.filter() function allows filtering based on constraints while preserving row order and optimizing performance through deferred computations.
This makes Polars ideal for managing and analyzing large datasets efficiently.
Free Resources