Joint Plots
Learn how to create joint plots using Seaborn to visualize the relationship between two variables along with their distributions. Discover different plot types like scatter, KDE, regression, and histograms, and explore customization options for styling and enhancing your visualizations effectively.
Overview
A joint plot allows us to see the relationship between two variables and the distribution of the variables together. It combines bivariate and univariate graphs in a single plot.
Plotting joint plots
To get started, we import the required libraries and store the mpg dataset in the DataFrame mpg_df (after removing the null values). Let’s draw a joint plot for the horsepower and mpg variables using the sns.jointplot() function. A joint plot draws a relational plot as the main plot and a distribution plot of each variable along the corresponding axis.
We can observe from the plot above that mpg decreases as horsepower increases, which indicates a negative correlation between mpg and horsepower. Along the x-axis, the marginal distribution of horsepower is shown as a histogram: most cars have a horsepower between 50 and 100. Along the y-axis, the marginal distribution of mpg shows that most cars have an mpg below 40 and very few cars have an mpg above 40.
The scatter plot is the default kind of joint plot. We can create six different plots in the joint plot by specifying one of 'scatter', 'kde', 'hist', 'hex', 'reg', or 'resid' in the kind argument. For example, let’s pass kind='kde' in the sns.jointplot() function below. Moreover, we can apply the styling properties of a KDE plot to one drawn from a joint plot; for instance, we pass fill=True (shade=True in older Seaborn releases) to shade it. The KDE plot shows an estimate of the probability density function of the data. The darker areas of the KDE plot represent denser regions with more observations.
Let’s set kind='reg' in the sns.jointplot() function below to plot a regression plot. This fits a straight regression line through the data points in the main plot and draws a histogram with a KDE curve for each variable on the x and y marginal axes.
We can plot a bivariate histogram in a joint plot by setting kind='hist' in the sns.jointplot() function. The darker areas of the bivariate histogram represent high density (greater number of observations). For example, we can observe from the plot below that most cars have an mpg between 20–25 (these areas appear darker than the rest).
The sns.jointplot() function returns a JointGrid object. We can use it to add multiple plots on top of one another. We call the sns.jointplot() function and save the JointGrid object returned in the variable j_plt in the code below. Next, we use the j_plt variable to call the plot_joint() method and pass sns.scatterplot to draw a scatter plot in the main plot. We also pass styling parameters such as color='gray' to change the color of the points and alpha=0.5 to make the points semi-transparent. The default value of alpha is 1, which is fully opaque.
We can observe from the plot above that it has more scatter points clustered in the darker regions of the KDE plot, representing high density.
Styling
We can color encode the joint plot by passing hue='origin' in the sns.jointplot() function. It’s evident from the plot below that most of the American-origin cars have a higher horsepower.
There are several other parameters available to style joint plots. For example, we have height to set the size of the (square) figure, space to determine the space between the joint and marginal axes, and ratio to determine the ratio of the joint axes height to the marginal axes height. If we set marginal_ticks to True, the marginal axes show ticks for the count or density scale. Finally, we can use palette to change the colors used for the hue levels, as shown in the plot below:
We can customize the styling of the main plot by passing a dictionary of keyword arguments to the joint_kws parameter. The arguments we can pass to joint_kws depend on the type of plot used in the joint plot. By default, a joint plot draws a scatter plot, so all the styling we can apply to scatter plots can be passed to the joint_kws parameter. If we change the type of plot in a joint plot, the styling parameters change accordingly.
Similarly, we can customize the plots on the marginal axes by using marginal_kws with the styling parameters that apply to the kind of plot drawn there, as shown in the code below. The default plot on the marginal axes is a histogram, so we can use all the styling properties applicable to histograms here.
We can use different styles for each plot overlaid with the JointGrid object. For example, in the code below, we call the sns.jointplot() function and save the JointGrid object returned in the variable j_plt. Next, we use the j_plt variable to call the plot_joint() method and pass sns.kdeplot to draw a KDE plot on the main plot, along with styling options applicable to the KDE plot. Likewise, we use j_plt to call the plot_marginals() method and pass sns.rugplot, along with styling options applicable to rug plots, as shown in the code below.