Polars is a powerful library for data manipulation and analysis. It’s designed to process and analyze large datasets quickly and efficiently. In this Answer, we’ll explore its DataFrame.merge_sorted() function with a code example.
DataFrame.merge_sorted() function
The DataFrame.merge_sorted() function in Polars is used to merge two sorted DataFrames in such a way that the resultant DataFrame is also sorted.
Note: The merge_sorted() function doesn’t sort the DataFrames itself. Both DataFrames must already be sorted by the key, and both must have the same schema. If the inputs aren’t sorted, the result isn’t guaranteed to be sorted either.
In the following illustration, we have sorted our two DataFrames by the “Age” column (which we treat as the key) and depicted how merge_sorted() merges both DataFrames.
The syntax for the merge_sorted() function is given by:
DataFrame1.merge_sorted(<other>, <key>)
other: This is the other DataFrame that is to be merged with DataFrame1.
key: This is the name of the column by which both DataFrames are already sorted and by which the merged result is ordered.
The function returns a new DataFrame that contains the rows of both DataFrames, ordered by the key.
Here is a coding example that uses the DataFrame.merge_sorted() method to merge two sorted DataFrames in Polars:
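import polars as pl
df1 = pl.DataFrame(
    {
        "Name": ["John", "Joseph", "Albert"],
        "Age": [18, 15, 29]
    }
).sort("Age")  # sort by the key before merging

df2 = pl.DataFrame(
    {
        "Name": ["Ema", "Andrew", "Michel"],
        "Age": [22, 30, 16]
    }
).sort("Age")  # same schema as df1, sorted by the same key

df3 = df1.merge_sorted(df2, "Age")  # merge the two sorted DataFrames by "Age"
print(df3)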
Let’s discuss the above code in detail:
Line 1: We import the polars library as pl.
Lines 2–7: We define a DataFrame df1 that contains names and ages. We sort this DataFrame by the Age column using the sort method.
Lines 9–14: We define another DataFrame df2 with the same schema as df1 and sort it by the Age column as well.
Line 16: We merge the two sorted DataFrames df1 and df2 on the Age column using the DataFrame.merge_sorted() function and store the result in the df3 DataFrame.
Line 17: We print the df3 DataFrame, whose rows are ordered by Age.