Grouping Our Dataset

Build on the fundamentals of exploratory data analysis by learning grouping and indexing.

Categorical variable visualization

Let's begin our EDA process with the categorical variables, continent and country. We'll start with a bar plot showing how often each continent appears in the dataset, as a first step toward answering the following question:

What countries and continents are represented in the dataset?

import plotly
import matplotlib.pyplot as plt

# Load the Gapminder dataset bundled with plotly
gapminder_data = plotly.data.gapminder()

# Count how many rows belong to each continent and print the counts
continent_counts = gapminder_data['continent'].value_counts()
print(continent_counts)

# Draw the counts as a bar plot and keep a handle to the figure
fig = continent_counts.plot(kind='bar').figure

# Label the axes and add a title
plt.xlabel("Continents")
plt.ylabel("Number of instances")
plt.xticks(rotation=30, horizontalalignment="center")
plt.title("Continents in the Gapminder dataset")

# Save the figure to disk, then close it to free memory
fig.savefig('output/to.png')
plt.close(fig)

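The bar plot shows that Africa has more rows in the dataset than any other continent. Because each row is one country in one year, these counts reflect how many country-year observations each continent contributes, not how many distinct countries it contains.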
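The question above also asks about countries, which the continent bar plot does not answer directly. One possible way to get at that is to group by continent and count distinct countries. The following is a minimal sketch: `sample` is a hypothetical toy table, not the real Gapminder data, and it only assumes the same `country`, `continent`, and `year` column names.

```python
import pandas as pd

# Toy stand-in for the Gapminder table (assumption: same column names
# as the real dataset; the rows here are illustrative only).
sample = pd.DataFrame({
    "country":   ["Kenya", "Kenya", "Ghana", "Ghana",
                  "Japan", "Japan", "France", "France"],
    "continent": ["Africa", "Africa", "Africa", "Africa",
                  "Asia", "Asia", "Europe", "Europe"],
    "year":      [1952, 1957, 1952, 1957, 1952, 1957, 1952, 1957],
})

# Rows per continent: the quantity the bar plot counts.
rows_per_continent = sample["continent"].value_counts()

# Distinct countries per continent: group by continent,
# then count the unique country names in each group.
countries_per_continent = sample.groupby("continent")["country"].nunique()

print(rows_per_continent)
print(countries_per_continent)
```

Replacing `sample` with `gapminder_data` should give the same two views for the full dataset, assuming the column names match.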

Now that we've ...