The merge()
method in Python is used to combine two or more DataFrames based on their common columns. The merge()
method works similar to the JOIN
operations in SQL. Merging multiple DataFrames is useful when we want to visualize different chunks of information from different DataFrames all in one place. It enables us to bring information together for further processing and analysis.
pd.merge(left, right, how='inner', on=None)
The merge()
method takes the following parameters:
left
: This is a DataFrame.
right
: This is another DataFrame. The left
DataFrame is merged with the right
DataFrame and a new DataFrame is returned.
Note: Nothing happens to the
left
andright
DataFrames, i.e., they do not change.
how
: This parameter specifies the type of merge to be performed. The following are the types of merges available within the merge()
method:
left
: This type of merge returns all the rows from the left
DataFrame. It also returns the rows from the right
DataFrame that are in common with those of the left
DataFrame. If there are no rows in the right
DataFrame that are in common (rows with similar values) with those of the left
DataFrame, NaN
is returned. The merge is performed on a common column or columns.Note: The number of rows and columns within the merged DataFrame may vary, i.e., it can increase, decrease, or stay the same. The type of merge being performed determines whether the number of rows and columns will change.
right
: This type of merge returns all the rows from the right
DataFrame. It also returns the rows from the left
DataFrame that are in common with those of the right
DataFrame. If there are no rows in the left
DataFrame that are in common with those of the right
DataFrame, NaN
is returned. The merge is performed on a common column or columns.outer
: This type of merge returns all the rows from the left
and right
DataFrames. This also includes the rows from the left
and right
DataFrames that are in common with each other. If no common rows are found, NaN
is returned. The merge is performed on a common column or columns.inner
: This is the default merge type. It returns only the rows that are common between the left
and right
DataFrames. If there are no common rows, an empty DataFrame is returned. The merge is performed on a common column or columns.cross
: This type of merge returns the Cartesian product of the rows from the left
and right
DataFrames, i.e., it combines and returns each row from the left
DataFrame with each row from the right
DataFrame. We don’t need to specify any common column or columns when using this merge type.on
: This parameter represents the column or columns to perform the merge on. These columns should be present in both the left
and right
DataFrames. Its default value is None
, i.e., no common column or columns are used to perform the merge()
operation.
Let’s look at an example where we will create two DataFrames and then perform the merge()
operation on them.
import pandas as pdleft_dataframe = pd.DataFrame({'id': [1, 2, 3, 4],'name': ['Leo', 'Jacob', 'James', 'Mason'],'age': [18, 20, 23, 19],'course': ['english', 'persian', 'arts', 'chemistry']})print('')right_dataframe = pd.DataFrame({'id': [1, 2, 3, 4],'name': ['William', 'Lucas', 'Henry', 'Elio'],'age': [18, 21, 26, 25],'course': ['english', 'persian', 'Literature', 'Physics']})print('')print('Left data frame')print(left_dataframe)print('')print('Right data frame')print(right_dataframe)print('')print('Left merge')print(pd.merge(left_dataframe, right_dataframe, how='left', on='course'))print('')print('Right merge')print(pd.merge(left_dataframe, right_dataframe, how='right', on='course'))print('')print('Outer merge')print(pd.merge(left_dataframe, right_dataframe, how='outer', on='course'))print('')print('Inner merge')print(pd.merge(left_dataframe, right_dataframe, how='inner', on=['age', 'course']))print('')print('Cross merge')print(pd.merge(left_dataframe, right_dataframe, how='cross'))print('')
Line 1: We import the required library, i.e., pandas
.
Lines 3–8: We create the left
DataFrame and name it left_dataframe
. This DataFrame has columns named id
, name
, age
, and course
.
Lines 11–16: We create the right
DataFrame and name it right_dataframe
. This DataFrame has columns named id
, name
, age
, and course
.
Line 20: We print the left_dataframe
object.
Line 24: We print the right_dataframe
object.
Line 28: We merge the left_dataframe
with the right_dataframe
using the left
merge method. We perform the merge()
operation on the course
column.
Line 32: We merge the left_dataframe
with the right_dataframe
using the right
merge method. We perform the merge()
operation on the course
column.
Line 36: We merge the left_dataframe
with the right_dataframe
using the outer
merge method. We perform the merge()
operation on the course
column.
Line 40: We merge the left_dataframe
with the right_dataframe
using the inner
merge method. We perform the merge()
operation on the age
and course
columns.
Line 44: We merge the left_dataframe
with the right_dataframe
using the cross
merge method. As discussed previously, we do not need to provide any common column or columns when using this merge type.
Let’s test out understanding of the merge()
method in Python by solving this multiple-choice question.
Imagine we work for a supermarket with two separate databases—one for customer information and another for orders. The customer database includes customer IDs, names, addresses, and other details, while the order database includes order IDs, customer IDs, product details, and order dates. Which method would we use to get a list of all the customers who have placed orders?
left
merge
right
merge
inner
merge
outer
merge
Free Resources