Visualization with Count Plots

Explore how to visualize categorical data with count plots using Seaborn. Learn to create basic and grouped count plots, customize colors and labels, adjust plot orientation, and refine plot styling. Understand how to organize category orders and manipulate plot saturation to enhance clarity and visual appeal.

Overview

A count plot shows how many observations fall into each category of a categorical variable. It draws one bar per unique category, and the length of each bar equals the number of instances of that category.

Plotting count plots

Let’s start by importing the required libraries and loading the tips dataset from the seaborn library using the sns.load_dataset() function. Next, we can view the first five data records to get an overview of the data using the pandas head() method.

Python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme()
tips_df = sns.load_dataset('tips')
print(tips_df.head())

Let’s compare the number of male and female customers in the tips dataset by passing x='sex' to the sns.countplot() function. The function returns the matplotlib Axes object it drew on. The bars are stored in the containers list of that Axes, where each bar container holds the “artists” describing a set of bars. Next, we call the bar_label() function, which takes a container as input and adds a count label to each of its bars.

Python
cplot = sns.countplot(data=tips_df, x='sex')  # returns the matplotlib Axes
plt.bar_label(cplot.containers[0])  # label the bars in the first container
plt.savefig('output/graph.png') # save figure
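
As a minimal sketch of the same idea in matplotlib's object-oriented style, the snippet below checks what sns.countplot() hands back and where the bars are stored. The toy_df DataFrame is hypothetical stand-in data, not the tips dataset, and the Agg backend is selected only so that the sketch runs without a display. Looping over every container, instead of indexing [0], labels all bars however many containers the installed seaborn version creates.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.axes import Axes
from matplotlib.container import BarContainer

# Hypothetical toy data standing in for the 'sex' column of the tips dataset
toy_df = pd.DataFrame({"sex": ["Male", "Female", "Male", "Male", "Female"]})

fig, ax = plt.subplots()
ax = sns.countplot(data=toy_df, x="sex", ax=ax)  # returns the Axes it drew on

is_axes = isinstance(ax, Axes)
all_bar_containers = all(isinstance(c, BarContainer) for c in ax.containers)

# Label every container and collect the bar heights (the counts)
heights = []
for container in ax.containers:
    ax.bar_label(container)
    heights.extend(int(bar.get_height()) for bar in container)

print(is_axes, all_bar_containers, sorted(heights))
```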

Depending on the seaborn version, the bars in the example above are drawn either in a different color per category or in a single default color. In both cases, we can set all bars to one color of our choice by specifying it in the color parameter of the sns.countplot() function. We can also customize the bar labels by setting padding=2 (the distance between the bars and labels), color='red', fontsize=15, and label_type='edge' (places the labels at the end of the bars, which is the top for vertical bars). Because a label above the tallest bar can run into the top edge of the plot, we also increase the range of the y-axis using the ylim() function.

Python
cplot = sns.countplot(data=tips_df, x='sex', color='green')
plt.bar_label(cplot.containers[0], padding=2, color='red',
              fontsize=15, label_type='edge') # customise bar labels
plt.ylim(top=180) # customise axis range
plt.savefig('output/graph.png')
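
The upper axis limit above is a hand-picked number. As a hedged alternative, the sketch below derives the limit from the tallest bar, so that the labels keep fitting if the data changes. The toy_df data is hypothetical, and the 15% headroom factor is an arbitrary choice for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: 7 male and 4 female customers
toy_df = pd.DataFrame({"sex": ["Male"] * 7 + ["Female"] * 4})

fig, ax = plt.subplots()
sns.countplot(data=toy_df, x="sex", color="green", ax=ax)
for container in ax.containers:
    ax.bar_label(container, padding=2, color="red",
                 fontsize=15, label_type="edge")

# Derive the upper limit from the tallest bar instead of a fixed number
max_count = toy_df["sex"].value_counts().max()
ax.set_ylim(top=max_count * 1.15)  # 15% headroom (arbitrary choice)
top_limit = ax.get_ylim()[1]
print(max_count, top_limit)
```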

Styling count plots

To color-encode the count plot, we pass the hue parameter to the sns.countplot() function. As shown in the plot, we split the male and female customers based on whether or not they smoke. We can see that the number of male smokers is higher than the number of female smokers. Moreover, to label the bars, we call the bar_label() function twice because it’s a grouped count plot with two bars in each group, and each hue level has its own container. We display red labels on top of the bars for the first hue level (smokers) and black labels in the center of the bars for the second hue level (non-smokers).

Python
c_plot = sns.countplot(data=tips_df, x='sex', hue='smoker')
plt.bar_label(c_plot.containers[0], padding=1, color='red',
              fontsize=15, label_type='edge') # first group
plt.bar_label(c_plot.containers[1], padding=1, color='black',
              fontsize=15, label_type='center') # second group
plt.ylim(top=120)
plt.savefig('output/graph.png')
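
When there are more hue levels, repeating the bar_label() call by hand gets tedious. The sketch below, which uses hypothetical toy_df data and a hypothetical label_styles list, pairs each container with its own label style in a loop. It assumes one container per hue level, in hue_order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: Male/Yes x3, Male/No x2, Female/Yes x1, Female/No x4
toy_df = pd.DataFrame({
    "sex": ["Male"] * 5 + ["Female"] * 5,
    "smoker": ["Yes"] * 3 + ["No"] * 2 + ["Yes"] * 1 + ["No"] * 4,
})

fig, ax = plt.subplots()
sns.countplot(data=toy_df, x="sex", hue="smoker",
              hue_order=["Yes", "No"], ax=ax)

# One style per hue level (hypothetical choices mirroring the example above)
label_styles = [
    {"color": "red", "label_type": "edge"},      # smokers
    {"color": "black", "label_type": "center"},  # non-smokers
]

labels_per_group = []
for container, style in zip(ax.containers, label_styles):
    texts = ax.bar_label(container, padding=1, fontsize=15, **style)
    labels_per_group.append([t.get_text() for t in texts])

print(labels_per_group)
```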

For further visualizations, we load the penguins dataset with the sns.load_dataset() function and store it in the penguins_df DataFrame. First, we reduce the font size with the sns.set() function so that the complete species names are visible on the plot. Next, to change the orientation of a count plot from vertical to horizontal, we pass the species column to the y parameter, instead of x, in the sns.countplot() function. The bar labels work the same way, but because the bars now grow along the x-axis, we increase the x-axis range with the xlim() function so that both the bars and their labels fit within the figure.

Python
penguins_df = sns.load_dataset('penguins')  # load the penguins dataset
sns.set(font_scale=0.9)  # slightly smaller fonts
c_plot = sns.countplot(y='species', data=penguins_df)
plt.bar_label(c_plot.containers[0], padding=1, color='black',
              fontsize=12, label_type='edge')
plt.xlim(right=170) # set xlim for horizontal plot
plt.savefig('output/graph.png')
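
In a horizontal count plot, the count of each category is the width of its bar rather than its height. The sketch below illustrates this with hypothetical toy_df data and sets the x-axis limit from the widest bar. The 10% headroom factor is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for the penguins 'species' column
toy_df = pd.DataFrame(
    {"species": ["Adelie"] * 4 + ["Gentoo"] * 3 + ["Chinstrap"] * 2}
)

fig, ax = plt.subplots()
sns.countplot(data=toy_df, y="species", ax=ax)  # y= gives horizontal bars

widths = []
for container in ax.containers:
    ax.bar_label(container, padding=1, label_type="edge")
    widths.extend(int(bar.get_width()) for bar in container)  # counts

ax.set_xlim(right=max(widths) * 1.1)  # leave room for the labels
print(sorted(widths), ax.get_xlim()[1])
```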

In a count plot, we count the number of instances of each category in a categorical column. We can verify this using pandas, as shown in the code below. We access the species column from the DataFrame and call the value_counts() method on it. Each unique category’s count matches the length of the corresponding bar in the count plot.

Python
print(penguins_df.species.value_counts())
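
The same check can be made programmatically instead of by eye. In the sketch below, which uses hypothetical toy_df data, we read the bar heights from left to right and compare them with value_counts(). It relies on seaborn placing string categories in their order of first appearance when no order is given.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data; categories first appear as Adelie, Gentoo, Chinstrap
toy_df = pd.DataFrame(
    {"species": ["Adelie", "Gentoo", "Adelie", "Chinstrap", "Gentoo", "Adelie"]}
)

fig, ax = plt.subplots()
sns.countplot(data=toy_df, x="species", ax=ax)

# Counts from pandas, arranged in order of first appearance
appearance_order = toy_df["species"].unique()
expected = toy_df["species"].value_counts().reindex(appearance_order).tolist()

# Counts from the plot: bar heights sorted by horizontal position
bars = sorted((b for c in ax.containers for b in c), key=lambda b: b.get_x())
heights = [int(b.get_height()) for b in bars]

print(expected, heights)
```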

We can change the order in which the categories are displayed in our count plot by passing the required order to the order parameter of the sns.countplot() function. For example, we pass order = ['Adelie','Gentoo','Chinstrap'] to the function. The first bar then represents the Adelie species, the second bar the Gentoo species, and the third bar the Chinstrap species. We can also call the bar_label() function to customize the bar labels.

Python
c_plot = sns.countplot(x='species', data=penguins_df,
                       order=['Adelie', 'Gentoo', 'Chinstrap'])
plt.bar_label(c_plot.containers[0], padding=1, color='red',
              fontsize=15, label_type='edge')
plt.ylim(top=170)
plt.savefig('output/graph.png')
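
The order list does not have to be typed by hand. As a sketch under the assumption that we want the most frequent category first, the index of value_counts(), which is sorted by descending count, can be passed straight to order. The toy_df data and the freq_order name are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: 1 Chinstrap, 3 Adelie, 5 Gentoo
toy_df = pd.DataFrame(
    {"species": ["Chinstrap"] + ["Adelie"] * 3 + ["Gentoo"] * 5}
)

# value_counts() sorts by count, so its index lists the most frequent first
freq_order = toy_df["species"].value_counts().index.tolist()

fig, ax = plt.subplots()
sns.countplot(data=toy_df, x="species", order=freq_order, ax=ax)

# Read the bar heights from left to right to confirm the ordering
bars = sorted((b for c in ax.containers for b in c), key=lambda b: b.get_x())
heights_left_to_right = [int(b.get_height()) for b in bars]

print(freq_order, heights_left_to_right)
```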

Similarly, we can control the order of the color-encoded categories using the hue_order parameter. We can list the hue categories in any order, or list only some of them to limit which categories are drawn. For example, we use only two of the three values of the island variable in the plot below:

Python
c_plot = sns.countplot(x='species', data=penguins_df,
                       hue='island', hue_order=['Torgersen', 'Dream'])
plt.bar_label(c_plot.containers[0], padding=1, color='black',
              fontsize=15, label_type='edge') # first group
plt.bar_label(c_plot.containers[1], padding=1, color='black',
              fontsize=15, label_type='edge') # second group
plt.ylim(top=100)
plt.savefig('output/graph.png')

We can verify this using pandas, as shown in the code below. First, we group our data by species, then we access the island column and call the value_counts() method to display the count for each combination of species and island. We can see that the Gentoo species is found only on Biscoe island. Because Biscoe is not included in hue_order, the count plot above shows no bar for the Gentoo species.

Python
print(penguins_df.groupby('species').island.value_counts())
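
A cross-tabulation gives the same counts as a table, which makes the missing bars easier to spot. The sketch below uses pd.crosstab() on hypothetical toy_df rows that mimic the pattern described above (Gentoo only on Biscoe), then keeps only the two islands listed in hue_order.

```python
import pandas as pd

# Hypothetical toy rows mimicking the species/island pattern in the text
toy_df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Chinstrap",
                "Chinstrap", "Gentoo", "Gentoo"],
    "island": ["Torgersen", "Dream", "Biscoe", "Dream",
               "Dream", "Biscoe", "Biscoe"],
})

# Rows are species, columns are islands, cells are counts
table = pd.crosstab(toy_df["species"], toy_df["island"])

# Keep only the islands that were passed to hue_order
plotted = table[["Torgersen", "Dream"]]
gentoo_plotted_total = int(plotted.loc["Gentoo"].sum())

print(table)
print(gentoo_plotted_total)
```

A species whose row sums to zero in the plotted table has no bars in the corresponding count plot.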

We can style our count plots using the saturation parameter in the sns.countplot() function. It sets the proportion of the original color saturation used to fill the bars: the lower the value, the duller (grayer) the colors, and a value of 1 keeps the palette colors at full strength. The default is 0.75. In the count plot below, we specify saturation = 0.4 in the sns.countplot() function, so the colors of the plot are fairly dull.

Python
c_plot = sns.countplot(data=tips_df, x='sex', hue='time', saturation=0.4)
plt.bar_label(c_plot.containers[0], padding=1, color='black',
              fontsize=15, label_type='edge') # first group
plt.bar_label(c_plot.containers[1], padding=1, color='red',
              fontsize=15, label_type='edge') # second group
plt.ylim(top=140) # set y-axis limit
plt.savefig('output/graph.png')
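
The effect of a saturation value can be looked at in isolation with the sns.desaturate() function, which scales the saturation channel of a single color by a proportion between 0 and 1. This is only a sketch of the idea on one hypothetical base color, not a trace of what sns.countplot() does internally.

```python
import seaborn as sns

base = (1.0, 0.0, 0.0)  # pure red as an RGB tuple (hypothetical example color)

dull = sns.desaturate(base, 0.4)  # keep 40% of the original saturation
grey = sns.desaturate(base, 0.0)  # remove all saturation

print(dull)  # a muted red, roughly (0.7, 0.3, 0.3)
print(grey)  # a mid grey, (0.5, 0.5, 0.5)
```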

Similarly, in the count plot below, we specify saturation=1 in the sns.countplot() function. As a result, the bars are drawn in the full-strength palette colors and look noticeably brighter than before. Because the parameter is a proportion of the original saturation, values above 1 are not expected to make the colors any brighter than this.

Python
c_plot = sns.countplot(data=tips_df, x='sex', hue='time', saturation=1)
plt.bar_label(c_plot.containers[0], padding=1, color='black',
              fontsize=15, label_type='edge') # first group
plt.bar_label(c_plot.containers[1], padding=1, color='red',
              fontsize=15, label_type='edge') # second group
plt.ylim(top=140)
plt.savefig('output/graph.png')