Compare Total Reviews of 2016 and 2017

Learn to compare data in pandas and PySpark.

Comparison in Pandas

To compare the total reviews of 2016 and 2017, we first aggregate the data by review year and month, counting the number of asin values in each group. We then filter the aggregated DataFrame to create two separate DataFrames, one for 2016 and one for 2017. Finally, we join the two DataFrames on the month to produce a wide DataFrame in which the total reviews for each month of 2016 and 2017 sit side by side.
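
The steps above can be sketched in pandas as follows. This is a minimal illustration rather than the lesson's actual code: the toy data, the column names review_year and review_month, and the output name total_reviews are all assumptions (only asin comes from the text), and the outer join is one possible choice that keeps months present in only one of the two years.

```python
import pandas as pd

# Toy review data. Column names other than `asin` are assumed for illustration.
reviews = pd.DataFrame({
    "asin": ["A1", "A2", "A3", "A4", "A5", "A6", "A7"],
    "review_year": [2016, 2016, 2016, 2017, 2017, 2017, 2015],
    "review_month": [1, 1, 2, 1, 2, 2, 1],
})

# Aggregate by year and month, counting the asin values in each group.
monthly = (
    reviews.groupby(["review_year", "review_month"])["asin"]
    .count()
    .reset_index(name="total_reviews")
)

# Filter the aggregated DataFrame into one DataFrame per year.
reviews_2016 = monthly[monthly["review_year"] == 2016]
reviews_2017 = monthly[monthly["review_year"] == 2017]

# Join on month so the totals for the two years sit side by side.
# how="outer" (an assumption) keeps months that appear in only one year.
comparison = reviews_2016[["review_month", "total_reviews"]].merge(
    reviews_2017[["review_month", "total_reviews"]],
    on="review_month",
    how="outer",
    suffixes=("_2016", "_2017"),
)

print(comparison)
```

The result has one row per month, with the columns total_reviews_2016 and total_reviews_2017 holding the counts for each year. With an outer join, a month missing from one year would show NaN in that year's column.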
