What is pandas.plotting.scatter_matrix()?

Python offers several libraries for data manipulation and visualization, such as pandas, Matplotlib, and seaborn. Data visualization represents data through graphs, charts, and plots. This makes it easier to recognize patterns, trends, and relationships between variables in both simple and complex datasets, and it is especially helpful for visual learners. In this Answer, we'll plot a scatter matrix using pandas.plotting.scatter_matrix().

Scatter matrix

A scatter matrix plots every variable in a dataset against every other variable. If the dataset has $n$ numeric variables, the scatter matrix is an $n \times n$ grid of plots. The panel in row $i$ and column $j$ plots variable $j$ on the x-axis against variable $i$ on the y-axis. This lets us inspect pairwise relationships between variables, such as correlation, at a glance. The name echoes the statistical scatter matrix, which is used to estimate the covariance matrix and appears in dimensionality reduction techniques such as principal component analysis (PCA). The diagonal panels are different. Plotting a variable against itself would only produce a straight line, because a variable's correlation with itself is always one, so it carries no information. Instead, each diagonal panel shows that variable's distribution as a histogram or a kernel density estimate (KDE) plot, while the other panels are ordinary scatter plots.
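To check the grid shape in practice, we can inspect what scatter_matrix() returns: a NumPy array of Matplotlib Axes objects, one per panel. The following minimal sketch uses a small, made-up DataFrame (the column names and values are only illustrative) with three numeric columns and one text column. Because pandas keeps only the numeric columns, the result should be a 3 × 3 grid. The sketch also assumes Matplotlib's non-interactive Agg backend so that it runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data: three numeric columns and one text column
df = pd.DataFrame(
    {
        "height": [150, 160, 170, 180, 190],
        "weight": [50, 58, 66, 75, 85],
        "age": [20, 35, 28, 44, 31],
        "name": ["a", "b", "c", "d", "e"],
    }
)

axes = pd.plotting.scatter_matrix(df)

# One row and one column per numeric variable; the text column is ignored
print(axes.shape)  # (3, 3)

# Diagonal panels hold histogram bars; off-diagonal panels hold one scatter plot
print(len(axes[0, 0].patches) > 0)  # True
print(len(axes[0, 1].collections))  # 1

plt.close("all")
```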

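Syntax

pandas.plotting.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal='hist', marker='.', density_kwds=None, hist_kwds=None, range_padding=0.05, **kwargs)

The function takes the following parameters:

frame: The pandas DataFrame to plot. Only its numeric columns are used.
alpha: The transparency of the scatter-plot markers, given as a float between 0 (fully transparent) and 1 (fully opaque). The default is 0.5.
figsize: A (width, height) tuple, in inches, that sets the size of the whole figure.
ax: An optional Matplotlib Axes object to draw on. If it's None (the default), a new figure is created.
grid: If True, grid lines are drawn on each subplot. The default is False.
marker: The Matplotlib marker style for the points in the scatter plots, such as "." (the default), "*", or "D".
diagonal: What to draw on the diagonal: "hist" for a histogram (the default) or "kde" for a kernel density estimate plot, a smooth curve that estimates the variable's distribution. Drawing "kde" requires SciPy.
hist_kwds: A dictionary of keyword arguments passed to Matplotlib's hist() when diagonal="hist".
density_kwds: Similar to hist_kwds, a dictionary of keyword arguments passed to the line plot of the kernel density estimate when diagonal="kde".
range_padding: The extra space added around the data on each axis, as a fraction of the data range (maximum minus minimum), split evenly between both ends. The default is 0.05.
**kwargs: Any additional keyword arguments, which are passed to Matplotlib's scatter().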
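The following sketch, built on a hypothetical two-column DataFrame, checks a few of these parameters by inspecting the returned Axes. In it, hist_kwds sets the number of histogram bins, range_padding widens each axis beyond the data, an extra keyword (color) is forwarded to the scatter plots through **kwargs, and the column names become the axis labels. It assumes the Agg backend and is meant as an illustration, not a complete test of the API.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen

import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data: "x" spans 0-10 and "y" spans 0-20
df = pd.DataFrame({"x": [0, 2, 5, 7, 10], "y": [0, 4, 11, 15, 20]})

axes = pd.plotting.scatter_matrix(
    df,
    alpha=0.5,
    marker="o",
    diagonal="hist",
    hist_kwds={"bins": 4, "color": "pink"},  # forwarded to Axes.hist
    range_padding=0.1,  # pad each axis by 10% of the data range (5% per side)
    color="red",  # not a named parameter, so it goes to Axes.scatter via **kwargs
)

# hist_kwds={"bins": 4} gives four bars in each diagonal histogram
print(len(axes[0, 0].patches))  # 4

# Panel (0, 1) has "y" on its x-axis: the range 0-20 padded by 1 on each side,
# so the limits are -1.0 and 21.0
print(axes[0, 1].get_xlim())

# Axis labels come from the column names
print(axes[1, 0].get_xlabel(), axes[1, 0].get_ylabel())  # x y

plt.close("all")
```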

Code example

The following example plots two scatter matrices from the same DataFrame: one with histograms on the diagonal and one with kernel density estimate plots. The code is written for a Jupyter Notebook, so feel free to modify it and rerun it there.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

columns = ["column1", "column2", "column3", "column4"]
educatives_dataframe = pd.DataFrame(np.random.randint(0, 50, size=(50, 4)), columns=columns)

educatives_dataframe.tail()  # in a notebook, this displays the last five rows

educatives_scatter_plot = pd.plotting.scatter_matrix(educatives_dataframe, alpha=0.9, figsize=(15, 15), grid=True, marker="*", diagonal="hist", hist_kwds={"bins": 5, "color": "pink"}, range_padding=0.1, color="red")
plt.suptitle("Scatter matrix", fontsize=50)  # title for the whole grid
plt.show()

educatives_scatter_plot = pd.plotting.scatter_matrix(educatives_dataframe, alpha=0.9, figsize=(15, 15), grid=True, marker="D", diagonal="kde", density_kwds={"alpha": 0.3, "color": "red"}, range_padding=0.1, color="green")
plt.suptitle("Scatter matrix", fontsize=50)
plt.show()


Explanation

In the code above:

Lines 1–3: We import NumPy, pandas, and Matplotlib's pyplot module.
Lines 5–8: We create a DataFrame with 50 rows and four columns, column1 through column4, filled with random integers from 0 to 49 (np.random.randint() excludes the upper bound, 50). Finally, tail() displays the last five rows of the DataFrame.
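Lines 10–12: We create the first scatter matrix. The diagonal shows pink histograms with five bins each, and the off-diagonal panels use red star-shaped ("*") markers. Setting alpha to 0.9 makes the markers mostly opaque, grid=True adds grid lines, and range_padding=0.1 leaves some extra space around the data on each axis. The figsize of (15, 15) makes the figure 15 × 15 inches. Finally, plt.suptitle() gives a title to the entire figure rather than to a single subplot.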
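Lines 14–16: This code creates a second scatter matrix with the same settings, except that the diagonal shows kernel density estimate plots instead of histograms. The density_kwds dictionary draws the KDE curves in red with an alpha of 0.3, and the scatter plots use green diamond ("D") markers.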
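Outside a notebook, the same kind of plot can be produced from a script. The sketch below is one possible variant, not the original code. It seeds NumPy's random generator for reproducible data, switches to histograms if SciPy (which diagonal="kde" relies on) is not installed, titles the figure through the Axes array that scatter_matrix() returns, and saves the image to an in-memory PNG instead of calling plt.show(). The smaller figure size, the title font size, and the in-memory buffer are illustrative choices.

```python
import importlib.util
import io

import matplotlib

matplotlib.use("Agg")  # off-screen backend, so the script needs no display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# diagonal="kde" needs SciPy, so fall back to histograms when it is not installed
diagonal = "kde" if importlib.util.find_spec("scipy") is not None else "hist"

# A seeded generator makes the random data reproducible between runs
rng = np.random.default_rng(seed=0)
columns = ["column1", "column2", "column3", "column4"]
df = pd.DataFrame(rng.integers(0, 50, size=(50, 4)), columns=columns)

axes = pd.plotting.scatter_matrix(
    df,
    alpha=0.9,
    figsize=(8, 8),
    grid=True,
    marker="D",
    diagonal=diagonal,
    density_kwds={"alpha": 0.3, "color": "red"},  # used only when diagonal="kde"
    range_padding=0.1,
    color="green",
)

# All Axes in the returned array share one figure, so title that figure directly
fig = axes[0, 0].get_figure()
title = fig.suptitle("Scatter matrix", fontsize=24)

# Save to an in-memory PNG instead of calling plt.show()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Writing to a file path instead of the buffer, for example fig.savefig("scatter_matrix.png"), works the same way.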

In conclusion, pandas.plotting.scatter_matrix() is a convenient way to plot a scatter matrix and get an overview of a dataset. Because every pair of variables is plotted against each other, it helps reveal pairwise relationships, such as correlations, between variables.
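A scatter matrix shows relationships visually. To put numbers on the same pairs, DataFrame.corr() computes the Pearson correlation coefficient for every pair of columns. In this sketch, the columns are made up for illustration: double is exactly twice x, so their coefficient is 1, and noise is unrelated random data. The diagonal of the correlation matrix is all ones, which mirrors why a scatter matrix shows distributions rather than self-comparisons on its diagonal.

```python
import numpy as np
import pandas as pd

# Illustrative data: "double" is exactly 2 * x, "noise" is unrelated random values
rng = np.random.default_rng(seed=1)
x = np.arange(20)
df = pd.DataFrame({"x": x, "double": 2 * x, "noise": rng.normal(size=20)})

corr = df.corr()  # Pearson correlation coefficient for every pair of columns
print(corr.round(2))

# Each variable is perfectly correlated with itself
print(np.allclose(np.diag(corr.to_numpy()), 1.0))  # True

# A perfectly linear pair also has a coefficient of 1
print(f"{corr.loc['x', 'double']:.3f}")  # 1.000
```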

