How to use the DataFrame.merge_sorted() method in Polars

Polars is a powerful library for data manipulation and analysis. It’s designed to process large datasets quickly and efficiently. In this Answer, we’ll explore its DataFrame.merge_sorted() method with a code example.

The DataFrame.merge_sorted() function

The DataFrame.merge_sorted() function in Polars is used to merge two sorted DataFrames in such a way that the resultant DataFrame is also sorted.

Note: The merge_sorted() function doesn’t sort its inputs. It produces a correctly sorted result only if both DataFrames are already sorted by the key and have the same schema. Otherwise, the output won’t be meaningful.

In the following illustration, both DataFrames are sorted by the “Age” column (the key), and merge_sorted() combines them into a single sorted DataFrame.

The merging and sorting of two DataFrames

Syntax

The syntax of the merge_sorted() method is as follows:

DataFrame1.merge_sorted(<other>, <key>)
  • other: This is the other DataFrame that is to be merged with DataFrame1.

  • key: This is the name of the column by which both DataFrames are already sorted. The merge is performed on this column.

Return value

The function returns a new DataFrame that contains the rows of both DataFrames, sorted by the key.

Code

The following example uses the DataFrame.merge_sorted() method to merge two sorted DataFrames in Polars:

import polars as pl
df1 = pl.DataFrame(
    {
        "Name": ["John", "Joseph", "Albert"],
        "Age": [18, 15, 29]
    }
).sort("Age")  # merge_sorted() expects input sorted by the key

df2 = pl.DataFrame(
    {
        "Name": ["Ema", "Andrew", "Michel"],
        "Age": [22, 30, 16]
    }
).sort("Age")  # same schema as df1, sorted by the same key

df3 = df1.merge_sorted(df2, "Age")  # merge on the shared sort key
print(df3)

Explanation

Let’s discuss the above code in detail:

  • Line 1: We import the polars library as pl.

  • Lines 2–7: We define a DataFrame df1 that contains names and ages. We sort this DataFrame by the Age column using the sort() method.

  • Lines 9–14: We define another DataFrame df2 with the same schema as df1 and sort it by the Age column as well.

  • Line 16: We merge the two sorted DataFrames df1 and df2 on the Age key using the DataFrame.merge_sorted() function and store the result, which is also sorted by Age, in the df3 DataFrame.

  • Line 17: We print the df3 DataFrame.

Copyright ©2024 Educative, Inc. All rights reserved