How to use the DataFrame.merge_sorted() method in Polars
Polars is a powerful library for data manipulation and analysis. It’s designed to process and analyze large datasets quickly and efficiently. In this Answer, we’ll explore its DataFrame.merge_sorted() method with a code example.
The DataFrame.merge_sorted() function
The DataFrame.merge_sorted() function in Polars is used to merge two sorted DataFrames in such a way that the resultant DataFrame is also sorted.
Note: The merge_sorted() function doesn’t sort the data itself. It produces a correctly sorted result only if both DataFrames are already sorted by the key and have the same schema.
In the following illustration, the two DataFrames are sorted by the “Age” column (which we treat as the key), and merge_sorted() merges them into a single sorted DataFrame.
Syntax
The syntax of the merge_sorted() function is as follows:
DataFrame1.merge_sorted(<other>, <key>)
other: This is the other DataFrame that is to be merged with DataFrame1.
key: This is the name of the column by which both DataFrames are already sorted.
Return value
The function returns a new DataFrame that contains the rows of both DataFrames, sorted by the key.
Code
The following example uses the DataFrame.merge_sorted() method to merge two sorted DataFrames in Polars:
import polars as pl
df1 = pl.DataFrame(
    {
        "Name": ["John", "Joseph", "Albert"],
        "Age": [18, 15, 29]
    }
).sort("Age")

df2 = pl.DataFrame(
    {
        "Name": ["Ema", "Andrew", "Michel"],
        "Age": [22, 30, 16]
    }
).sort("Age")

df3 = df1.merge_sorted(df2, "Age")
print(df3)
Explanation
Let’s discuss the above code in detail:
Line 1: We import the polars library as pl.
Lines 2–7: We define a DataFrame df1 for the citizens with their names and ages. We sort this DataFrame by the Age column using the sort method.
Lines 9–14: We define another DataFrame df2 with the same schema as df1 and sort it by the Age column as well.
Line 16: We merge the two sorted DataFrames df1 and df2 on the Age key using the DataFrame.merge_sorted() function, which keeps the rows ordered by age, and store the result in the df3 DataFrame.
Line 17: We print the df3 DataFrame.