How to compare two DataFrames in pandas
Overview
The compare method in pandas shows the differences between two DataFrames. It compares them element by element and, by default, presents the differing values side by side in columns labeled self and other.
The compare method can only compare identically labeled DataFrames: both must have the same shape and the same row and column labels. Otherwise, it raises a ValueError.
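The labeling requirement above can be seen in a short sketch. The DataFrames left and right are illustrative names that are not part of the original example, and restricting both frames to their shared row labels is only one assumed way to work around the restriction, not a recommendation from the pandas documentation.

```python
import pandas as pd

left = pd.DataFrame({"Name": ["dom", "chibuge"], "Age": [10, 15]})
# Same columns, but one extra row, so the row labels no longer match.
right = pd.DataFrame(
    {"Name": ["dom", "chibuge", "celeste"], "Age": [10, 15, 14]}
)

try:
    left.compare(right)
    outcome = "compared"
except ValueError as err:
    # compare refuses DataFrames whose labels are not identical.
    outcome = "ValueError"
    print("compare failed:", err)

# Assumed workaround: keep only the row labels present in both frames.
common = left.index.intersection(right.index)
diff = left.loc[common].compare(right.loc[common])

# The shared rows hold equal values, so nothing is left to report.
print(diff.empty)
```

Whether dropping the unshared rows is acceptable depends on the data, since rows that exist in only one DataFrame are themselves a difference that compare will not report.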
Syntax
DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False)
Parameters
The compare method accepts the following parameters:
- other: This is the DataFrame to compare with.
- align_axis: This determines the axis along which the result is aligned. With 0, the self and other values are stacked in alternating rows. With 1, the default value, they are placed side by side in alternating columns.
- keep_shape: This is a boolean parameter. With the default value, False, compare drops the rows and columns in which all values are equal in the two DataFrames. Setting it to True keeps every row and column.
- keep_equal: This is another boolean parameter. With the default value, False, compare shows the positions that hold equal values in the two DataFrames as NaN. Setting it to True shows the equal values instead.
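The example in the next section leaves align_axis at its default, so here is a minimal sketch of the other setting. It reuses the same data as that example; the variable name stacked is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["dom", "chibuge", "celeste"], "Age": [10, 15, 14]}
)
df1 = pd.DataFrame(
    {"Name": ["dom", "abhi", "celeste"], "Age": [11, 17, 14]}
)

# align_axis=0 stacks the self and other values in alternating rows
# instead of placing them in side-by-side columns.
stacked = df.compare(df1, align_axis=0)
print(stacked)

# The row index now has two levels: the original row label and self/other.
print(stacked.loc[(1, "other"), "Name"])
```

With align_axis=0, the column labels stay as Name and Age, and each differing row appears twice: once for self and once for other.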
Example
import pandas as pd

data = [['dom', 10], ['chibuge', 15], ['celeste', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

data1 = [['dom', 11], ['abhi', 17], ['celeste', 14]]
df1 = pd.DataFrame(data1, columns=['Name', 'Age'])

print("Dataframe 1 -- \n")
print(df)
print("-"*5)

print("Dataframe 2 -- \n")
print(df1)
print("-"*5)

print("Dataframe difference -- \n")
print(df.compare(df1))
print("-"*5)

print("Dataframe difference keeping equal values -- \n")
print(df.compare(df1, keep_equal=True))
print("-"*5)

print("Dataframe difference keeping same shape -- \n")
print(df.compare(df1, keep_shape=True))
print("-"*5)

print("Dataframe difference keeping same shape and equal values -- \n")
print(df.compare(df1, keep_shape=True, keep_equal=True))
Explanation
- Line 1: We import the pandas module.
- Lines 3–4: We construct a pandas DataFrame called df from the list called data. df has two columns: Name and Age.
- Lines 6–7: We construct another pandas DataFrame called df1 from the list called data1. df1 has the same two columns: Name and Age.
- Lines 9–14: We print df and df1.
- Line 18: We use compare to obtain the differences between the two DataFrames, df and df1. The row that holds the same values in both DataFrames is omitted, and equal values in the remaining rows appear as NaN.
- Line 22: We use compare to obtain the differences between df and df1 while setting keep_equal to True. In the rows that are kept, equal values are shown instead of being replaced with NaN.
- Line 26: We use compare to obtain the differences between df and df1 while setting keep_shape to True. The row that holds the same values in both DataFrames is not omitted from the printed difference.
- Line 30: We use compare to obtain the differences between df and df1 while setting both keep_shape and keep_equal to True. Neither the row with the same values nor the equal values themselves are omitted from the printed difference.
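The printed results can also be inspected in code. The sketch below rebuilds the same df and df1 from dictionaries and reads the result of compare; the variable names diff, changed_rows, age_changes, and full are illustrative and do not appear in the original example.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["dom", "chibuge", "celeste"], "Age": [10, 15, 14]}
)
df1 = pd.DataFrame(
    {"Name": ["dom", "abhi", "celeste"], "Age": [11, 17, 14]}
)

diff = df.compare(df1)

# Row labels in which at least one value differs.
changed_rows = diff.index.tolist()
print(changed_rows)

# The result has two-level column labels, so selecting a column name
# returns that column's self and other values.
age_changes = diff["Age"]
print(age_changes)

# keep_shape=True keeps the row without differences, so all three rows
# and both self/other pairs remain.
full = df.compare(df1, keep_shape=True)
print(full.shape)
```

Reading the result this way is handy when the differences feed into later processing, such as logging the changed rows, rather than only being printed.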