
Grouping Our Dataset

Explore the process of grouping and indexing data in Python to analyze categorical variables like continents and countries. This lesson helps you understand how to segment datasets using pandas groupby and indexing to reveal data distribution patterns and prepare for focused analysis in visual storytelling.

Categorical variable visualization

Let's begin our EDA process with the categorical variables, continent and country. A bar plot of how often each continent appears in the dataset is a good first step toward answering the following question:

What countries and continents are represented in the dataset?

Python 3.10.4
import os
import plotly.data
import matplotlib.pyplot as plt
gapminder_data = plotly.data.gapminder()
# Count the number of instances (rows) for each continent
continent_counts = gapminder_data['continent'].value_counts()
# Print the number of instances
print(continent_counts)
# Generate a bar plot from the counts
fig = continent_counts.plot(kind='bar').figure
# Label the axes and add a title
plt.xlabel("Continents")
plt.xticks(rotation=30, horizontalalignment="center")
plt.title("Continents in the Gapminder dataset")
plt.ylabel("Number of instances")
# Make sure the output directory exists, then save and close the figure
os.makedirs('output', exist_ok=True)
fig.savefig('output/to.png')
plt.close(fig)

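The bar plot shows that Africa has more instances than any other continent in the dataset. Each instance is one country in one year, so the counts reflect how many country-year records each continent contributes.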
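
Row counts alone don't tell us how many distinct countries sit behind each bar. Here is a minimal sketch of how pandas groupby could separate the two. It uses a small hand-made table so it runs on its own; the `sample` DataFrame and its rows are illustrative stand-ins, not the Gapminder data.

```python
import pandas as pd

# Toy stand-in for the Gapminder table: one row per country per year.
# These rows are illustrative only, not the real dataset.
sample = pd.DataFrame({
    "continent": ["Africa", "Africa", "Africa", "Africa",
                  "Asia", "Asia", "Europe", "Europe"],
    "country": ["Kenya", "Kenya", "Ghana", "Ghana",
                "Japan", "Japan", "France", "France"],
    "year": [2002, 2007, 2002, 2007, 2002, 2007, 2002, 2007],
})

# Rows per continent: the same totals that value_counts() reports.
rows_per_continent = sample.groupby("continent").size()

# Distinct countries per continent: unaffected by how many years each country has.
countries_per_continent = sample.groupby("continent")["country"].nunique()

print(rows_per_continent)
print(countries_per_continent)
```

The same two groupby lines should carry over to gapminder_data, assuming it exposes the continent and country columns used in this lesson.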

Now that we've ...