Visualization with Count Plots
Explore how to visualize categorical data with count plots using Seaborn. Learn to create basic and grouped count plots, customize colors and labels, adjust plot orientation, and refine plot styling. Understand how to organize category orders and manipulate plot saturation to enhance clarity and visual appeal.
Overview
A count plot shows how many observations fall into each category of a categorical variable. It draws one bar per unique category, and the length of each bar is the number of instances of that category.
Plotting count plots
Let’s start by importing the required libraries and loading the tips dataset from the seaborn library using the sns.load_dataset() function. Next, we can view the first five records to get an overview of the data using the pandas head() function.
Let’s check how many male and female customers are in the tips dataset by passing x='sex' to the sns.countplot() function. The sns.countplot() function returns the matplotlib Axes it draws on, and that Axes stores the bars in bar containers (available as ax.containers), each holding the “artists” that describe the bars in the count plot. Next, we call the bar_label() function to add labels to the bars. The function takes a bar container as input and labels each bar in it with its count.
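The steps above can be sketched as follows. This is a minimal sketch, not the lesson's original code: it uses a tiny hand-made stand-in for the tips data (an assumption, so the example runs without downloading anything), and the names tips_df and ax are illustrative. Replacing the stand-in with sns.load_dataset("tips") should work the same way.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-made stand-in for the tips data (NOT the real dataset); the lesson
# itself loads the real one with sns.load_dataset("tips").
tips_df = pd.DataFrame({"sex": ["Male", "Female", "Male", "Male", "Female"]})

fig, ax = plt.subplots()

# countplot() draws on the given Axes and returns that same Axes.
sns.countplot(data=tips_df, x="sex", ax=ax)

# ax.containers holds the bar containers; label every bar with its count.
for container in ax.containers:
    ax.bar_label(container)
```

In a script, finish with plt.show() to display the figure; in a notebook the figure usually renders on its own.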
In the example above, the two categories may appear in different colors (older seaborn releases color each bar separately, while newer ones default to a single color). Either way, we can set one color for all bars explicitly through the color parameter of the sns.countplot() function. We can also customize the bar labels by setting padding=2 (the distance between the bars and labels), color='red', fontsize=15, and label_type='edge' (displays the labels on top of the bars). Because labels placed on top of the tallest bars can run into the upper edge of the plot, we increase the range of the y-axis using the ylim() function.
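A sketch of this customization, again on a hand-made stand-in for the tips data. The color name "steelblue" and the upper y-limit of 5 are illustrative choices, not values taken from the lesson.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-made stand-in for the tips data (NOT the real dataset).
tips_df = pd.DataFrame({"sex": ["Male", "Female", "Male", "Male", "Female"]})

fig, ax = plt.subplots()

# A single color for every bar ("steelblue" is an arbitrary choice).
sns.countplot(data=tips_df, x="sex", color="steelblue", ax=ax)

# Red, larger labels placed just above the bar edges.
ax.bar_label(
    ax.containers[0],
    padding=2,          # gap between bar and label, in points
    color="red",
    fontsize=15,
    label_type="edge",  # label sits at the end of the bar
)

# Leave headroom above the tallest bar so its label is not cut off.
plt.ylim(0, 5)
```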
Styling count plots
To color encode the count plot, we pass the hue parameter to the sns.countplot() function. As shown in the plot, we split the male and female customers based on whether or not they smoke. We can see that the number of male smokers is higher than the number of female smokers. Moreover, to label the bars, we call the bar_label() function twice because this is a grouped count plot: each group contains two bars, and each hue level has its own bar container. We display red labels on top of the bars for the first hue level (smokers) and black labels in the center of the bars for the second hue level (non-smokers).
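The grouped plot described above can be sketched like this. The data is a hand-made stand-in, so the counts are illustrative only; the smoker column is built as a categorical with "Yes" first, which is an assumption made so that the first bar container corresponds to smokers.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-made stand-in for the tips data (NOT the real dataset).
tips_df = pd.DataFrame({
    "sex": ["Male"] * 5 + ["Female"] * 3,
    "smoker": pd.Categorical(
        ["Yes", "Yes", "Yes", "No", "No", "Yes", "No", "No"],
        categories=["Yes", "No"],  # "Yes" first, so container 0 = smokers
    ),
})

fig, ax = plt.subplots()
sns.countplot(data=tips_df, x="sex", hue="smoker", ax=ax)

# One bar container per hue level, in hue order.
smokers, non_smokers = ax.containers[0], ax.containers[1]

# Red labels above the smoker bars, black labels centered in the others.
ax.bar_label(smokers, color="red", label_type="edge")
ax.bar_label(non_smokers, color="black", label_type="center")
```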
For further visualizations, we can import the penguins dataset and store it in the penguins_df DataFrame. First, we reduce the font size with the sns.set() function so that the complete species names are visible on the plot. Next, to change the orientation of a count plot from vertical to horizontal, we pass the species column to the y parameter instead of the x parameter in the sns.countplot() function. We can label the bars here as well. Because the plot is horizontal, the labels sit to the right of the bars, so we increase the x-axis range using the xlim() function to fit both the bars and the labels within the figure.
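A sketch of the horizontal version. The DataFrame is a hand-made stand-in for the penguins data, and the font_scale value and x-limit are assumptions chosen for illustration; the lesson does not state which sns.set() argument it uses to shrink the font.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Shrink all fonts slightly (font_scale=0.8 is an assumed value).
sns.set(font_scale=0.8)

# Hand-made stand-in for the penguins data (NOT the real dataset).
penguins_df = pd.DataFrame(
    {"species": ["Adelie"] * 3 + ["Chinstrap"] * 1 + ["Gentoo"] * 2}
)

fig, ax = plt.subplots()

# Passing the column to y (instead of x) makes the bars horizontal.
sns.countplot(data=penguins_df, y="species", ax=ax)
ax.bar_label(ax.containers[0], padding=2)

# Widen the x-axis so the labels to the right of the bars fit.
plt.xlim(0, 4)
```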
In a count plot, we count the number of instances of each category in a categorical column. We can verify the counts with pandas, as shown in the code below. We access the species column from the DataFrame and call the value_counts() function on it. Each unique category’s count matches the length of the corresponding bar in the count plot.
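The check can be sketched as follows on a hand-made stand-in for the penguins data. The comparison against the bar heights at the end is an addition for illustration; the names counts and bar_heights are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-made stand-in for the penguins data (NOT the real dataset).
penguins_df = pd.DataFrame(
    {"species": ["Adelie"] * 3 + ["Chinstrap"] * 1 + ["Gentoo"] * 2}
)

# Count the rows for each unique species.
counts = penguins_df["species"].value_counts()
print(counts)

# Draw the count plot with an explicit category order so the bars can be
# compared with the pandas counts one by one.
species_order = ["Adelie", "Chinstrap", "Gentoo"]
fig, ax = plt.subplots()
sns.countplot(data=penguins_df, x="species", order=species_order, ax=ax)

bar_heights = [bar.get_height() for bar in ax.containers[0]]
matches = counts.reindex(species_order).tolist() == bar_heights
print("Bars match value_counts():", matches)
```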
We can change the order in which the categories appear in our count plot by passing the required order to the order parameter of the sns.countplot() function. For example, we pass order=['Adelie','Gentoo','Chinstrap'] to the function. The first bar then represents the Adelie species, the second bar the Gentoo species, and the third bar the Chinstrap species. We can also call the bar_label() function to customize the bar labels.
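A sketch of the order parameter, using a hand-made stand-in for the penguins data (the counts are illustrative only).

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-made stand-in for the penguins data (NOT the real dataset).
penguins_df = pd.DataFrame(
    {"species": ["Adelie"] * 3 + ["Chinstrap"] * 1 + ["Gentoo"] * 2}
)

# Bars are drawn left to right in exactly this order.
species_order = ["Adelie", "Gentoo", "Chinstrap"]

fig, ax = plt.subplots()
sns.countplot(data=penguins_df, x="species", order=species_order, ax=ax)
ax.bar_label(ax.containers[0])

# Heights follow species_order: Adelie, then Gentoo, then Chinstrap.
bar_heights = [bar.get_height() for bar in ax.containers[0]]
```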
Similarly, we can specify how our color encoded categories are shown in the plot using the hue_order parameter. We can specify any order of hue categories in the parameter or limit the number of categories we want color encoded. For example, we only use two values for the island variable instead of three in the plot below:
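A sketch of limiting the hue categories with hue_order. The data is a hand-made stand-in for the penguins data, and the choice of "Dream" and "Torgersen" as the two islands is an assumption; the lesson does not state which two it keeps.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-made stand-in for the penguins data (NOT the real dataset).
penguins_df = pd.DataFrame({
    "species": ["Adelie"] * 4 + ["Chinstrap"] * 2 + ["Gentoo"] * 2,
    "island": ["Biscoe", "Dream", "Torgersen", "Torgersen",
               "Dream", "Dream",
               "Biscoe", "Biscoe"],
})

fig, ax = plt.subplots()

# Only the islands listed in hue_order are color encoded and drawn;
# rows from "Biscoe" are left out of the plot.
sns.countplot(
    data=penguins_df,
    x="species",
    hue="island",
    hue_order=["Dream", "Torgersen"],  # assumed pair of islands
    ax=ax,
)

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```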
We can verify this with pandas, as shown in the code below. First, we group our data by species, then we access the island column and call the value_counts() function to display the count for each species and island combination. We can see that the Gentoo species is only found on Biscoe island. Because Biscoe is not one of the islands included in hue_order, the count plot shows no bar for the Gentoo species.
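The grouped count can be sketched as follows. The DataFrame is a hand-made stand-in that only mimics the pattern described above (Gentoo appearing on Biscoe alone); the name island_counts is hypothetical.

```python
import pandas as pd

# Hand-made stand-in for the penguins data (NOT the real dataset).
penguins_df = pd.DataFrame({
    "species": ["Adelie"] * 4 + ["Chinstrap"] * 2 + ["Gentoo"] * 2,
    "island": ["Biscoe", "Dream", "Torgersen", "Torgersen",
               "Dream", "Dream",
               "Biscoe", "Biscoe"],
})

# Group by species, then count how often each island occurs per species.
island_counts = penguins_df.groupby("species")["island"].value_counts()
print(island_counts)

# Islands on which Gentoo penguins were recorded in this stand-in data.
gentoo_islands = island_counts["Gentoo"].index.tolist()
```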
We can style our count plots using the saturation parameter of the sns.countplot() function, which sets the proportion of the original color saturation used to fill the bars (the default is 0.75). The lower the saturation level, the duller the plot is, and the higher the saturation level, the brighter the plot is. In the count plot below, we specify saturation=0.4 in the sns.countplot() function, so the colors of the plot are fairly dull.
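Similarly, in the count plot below, we specify saturation=3.0 in the sns.countplot() function. As a result, the colors of the plot are rather bright. Note that a value of 1 already draws the colors at their full, original saturation, so a value above 1 is unlikely to make the bars any brighter than saturation=1 does.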
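The effect of saturation can be sketched by drawing the same data twice and comparing the fill colors. This sketch uses a hand-made stand-in for the penguins data and an arbitrary base color ("tab:blue"). For the bright panel it uses saturation=1.0 rather than 3.0, on the assumption that values above 1 behave like 1; the helper hls_saturation is hypothetical and exists only to measure the result.

```python
import colorsys

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-made stand-in for the penguins data (NOT the real dataset).
penguins_df = pd.DataFrame(
    {"species": ["Adelie"] * 3 + ["Chinstrap"] * 1 + ["Gentoo"] * 2}
)

fig, (ax_dull, ax_bright) = plt.subplots(1, 2, figsize=(8, 3))

# Same data and base color; only the saturation differs.
sns.countplot(data=penguins_df, x="species", color="tab:blue",
              saturation=0.4, ax=ax_dull)
sns.countplot(data=penguins_df, x="species", color="tab:blue",
              saturation=1.0, ax=ax_bright)


def hls_saturation(ax):
    """Return the HLS saturation of the first bar's fill color."""
    r, g, b, _alpha = ax.containers[0][0].get_facecolor()
    return colorsys.rgb_to_hls(r, g, b)[2]


print("dull:", hls_saturation(ax_dull), "bright:", hls_saturation(ax_bright))
```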