Split-Apply-Combine Technique

Explore the split-apply-combine technique to manage and analyze grouped data efficiently. Learn how to split data into groups, apply operations like ranking or summing, and then combine results to extract meaningful insights. This lesson guides you through practical steps using air quality data to identify peak pollutant times, enhancing your data wrangling skills in Python.

Split-Apply-Combine method

In this technique, we first split the data into groups, as in the previous lesson. We then apply an operation to each group separately. Finally, we combine the per-group results to form the final dataset. Let’s review the following example.

The initial dataset is first split into three groups: A, B, and C. Then, the sum operation is applied to the elements within each group, producing one total per group. Finally, the per-group results are combined into a concise dataset that contains only the required information.
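The example above can be sketched in pandas as follows. This is a minimal illustration, not the lesson's own code: the column names `group` and `value` and the numbers are made up, and only the group labels A, B, and C come from the example. It first walks through the three steps by hand and then shows the equivalent `groupby` call.

```python
import pandas as pd

# Hypothetical toy data: the labels A, B, C follow the example above;
# the column names and values are assumptions made for illustration.
data = pd.DataFrame({
    "group": ["A", "B", "C", "A", "B", "C"],
    "value": [1, 2, 3, 4, 5, 6],
})

# Split: build one sub-DataFrame per group label.
pieces = {label: part for label, part in data.groupby("group")}

# Apply: sum the values within each group separately.
sums = {label: part["value"].sum() for label, part in pieces.items()}

# Combine: gather the per-group totals into a single Series.
manual = pd.Series(sums, name="value")
manual.index.name = "group"

# The same three steps expressed as one pandas call.
combined = data.groupby("group")["value"].sum()

print(combined)
```

Both `manual` and `combined` hold one total per group. In practice, the single `groupby(...).sum()` call is usually preferred because pandas handles the splitting and combining internally.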

Let’s perform this technique on air quality index data and see what type of useful information can be extracted.

Python 3.5
import pandas as pd
df = pd.read_csv('air.csv') # reading data from file
print(df)

As can be seen from the output, the file contains the Date, the Time, and the levels of different pollutants present in the air during that time frame. Pollutant data for every hour of each day is ...