Grouping Our Dataset
Build on the fundamentals of exploratory data analysis by learning grouping and indexing.
Categorical variable visualization
Let's begin our EDA process with our categorical variables, continent and country. We'll start with a bar plot showing how often each continent appears in the dataset, so we can answer the following question:
What countries and continents are represented in the dataset?
import plotly
import matplotlib.pyplot as plt

gapminder_data = plotly.data.gapminder()

# Generate a bar plot using the no. of instances of continents
fig = gapminder_data['continent'].value_counts().plot(kind='bar').figure

# Print the no. of instances
print(gapminder_data['continent'].value_counts())

# Label the axes and title
plt.xlabel("Continents")
plt.xticks(rotation=30, horizontalalignment="center")
plt.title("Continents in the Gapminder dataset")
plt.ylabel("Number of instances")

fig.savefig('output/to.png')
plt.close(fig)
With this bar plot, we can see that Africa has more instances in the dataset than any other continent.
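The bar plot counts rows, not countries, so it only partly answers the question about which countries are represented. One way to get at the country side is to group by continent and count the distinct country names in each group. The snippet below is a minimal sketch of that idea, assuming a tiny hand-made frame named `sample` (a hypothetical stand-in with only the two categorical columns, not the real dataset):

```python
import pandas as pd

# Hypothetical miniature stand-in for gapminder_data: only the two
# categorical columns, with a handful of hand-written rows.
sample = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Nigeria", "Japan", "Japan", "France"],
    "continent": ["Africa", "Africa", "Africa", "Asia", "Asia", "Europe"],
})

# Rows per continent: the quantity the bar plot above displays.
rows_per_continent = sample["continent"].value_counts()

# Distinct countries per continent: group by continent, then count
# the unique country names within each group.
countries_per_continent = sample.groupby("continent")["country"].nunique()

print(rows_per_continent)
print(countries_per_continent)
```

Replacing `sample` with `gapminder_data` should apply the same two calls to the full dataset, since it has the same `country` and `continent` columns.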
Now that we've ...