DataFrame.lazy() method
The DataFrame.lazy() method in Polars starts a lazy computation on a DataFrame. Operations applied afterward are not executed immediately; they are stored as a query plan and run only when execution is explicitly requested.
Let’s see the syntax of the lazy() method:
DataFrame.lazy()
The DataFrame.lazy() method returns a LazyFrame object. The LazyFrame object is similar to a DataFrame object, but it’s lazily evaluated.
We create a sample DataFrame with four columns, a, b, c, and d, and apply the lazy() method to it below:
import polars as pl

df = pl.DataFrame(
    {
        "a": [50, 100, 35, 87],
        "b": [9.2, 5.4, 2.5, 13.4],
        "c": [True, True, False, True],
        "d": [23, 65, 83, 91],
    }
)

lazy_frame = df.lazy()
print(lazy_frame)

# Another example of the lazy() method with filter
lazy_frame2 = df.lazy().filter(pl.col("a") == 100)
print(lazy_frame2)
Here’s a step-by-step explanation of the provided code:
Lines 3–10: We create a DataFrame named df using the pl.DataFrame() constructor. The DataFrame has four columns (a, b, c, and d) with some data.
Line 12: We apply the lazy() method to the DataFrame df, creating a LazyFrame named lazy_frame. This LazyFrame represents a query plan, that is, a graph of deferred operations.
Line 13: We print the representation of the LazyFrame.
Lines 16–17: We apply the filter() method to the LazyFrame returned by df.lazy() and print the resulting LazyFrame.
Note: Check out the Answer on the filter() function for more information.
Note that printing a LazyFrame directly does not display its data; it shows only a representation of the query plan. To view the actual content, we need to execute the query.
Once a LazyFrame is created, we can chain a range of operations on it. These operations are only recorded in the query plan and remain unexecuted until we explicitly trigger execution. Here are some of the methods that can be used:
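fetch(): This executes the lazy operations on a small number of rows, which is useful for quick debugging. Newer Polars releases deprecate it in favor of combining head() with collect().
collect(): This executes the lazy operations on all the data and returns a DataFrame.
describe_plan(): This produces a description of the unoptimized query plan. Newer Polars releases provide explain(optimized=False) for this purpose.
describe_optimized_plan(): This produces a description of the optimized query plan. Newer Polars releases provide explain() for this purpose.
show_graph(): This displays the query plan, optimized or unoptimized, as a Graphviz graph.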
Now, let’s take a look at the fetch() and collect() operations:
import polars as pl

df = pl.DataFrame(
    {
        "a": [50, 100, 35, 87],
        "b": [9.2, 5.4, 2.5, 13.4],
        "c": [True, True, False, True],
        "d": [23, 65, 83, 91],
    }
)

# Run the (empty) query on the first two rows only
lazy_frame = df.lazy()
print(lazy_frame.fetch(2))

# Run the filter query on all the data
lazy_frame2 = df.lazy().filter(pl.col("a") == 100)
print(lazy_frame2.collect())
The fetch() method triggers execution on a limited number of rows and returns a DataFrame; here, it contains the first two rows of the original DataFrame. The collect() method, on the other hand, executes the query on all the data and returns the result as a DataFrame object; here, that is the single row where a equals 100.
DataFrame.lazy() is a powerful feature in Polars that enables lazy evaluation of operations on DataFrames. Because execution is deferred, Polars can optimize the whole query and parallelize its execution, which can be crucial when dealing with large datasets. For complex queries, lazy operations can lead to more efficient and faster data processing.