What is pandas.plotting.scatter_matrix()?

Python offers several libraries for data manipulation and visualization, such as seaborn, Matplotlib, and pandas. Data visualization represents data through graphs, charts, and plots, which makes it easier to recognize patterns, trends, and relationships between variables in both simple and complex data. Visual summaries are often quicker to interpret than raw tables, especially for visual learners. In this Answer, we’ll plot a scatter matrix using pandas.plotting.scatter_matrix().

Scatter matrix

A scatter matrix plots every variable in the data against every other variable. If the dataset has n variables, the scatter matrix has n rows and n columns, with one cell per pair of variables. These plots let us inspect pairwise relationships, such as the correlation between variables, at a glance. (In statistics, the term “scatter matrix” also names a quantity used to estimate the covariance matrix, for example in dimensionality reduction; this Answer is about the plot.) In the resulting figure, each off-diagonal cell is a scatter plot, while each diagonal cell is a histogram. This happens because a variable plotted against itself has a correlation of one and would only produce a straight line. Thus, a histogram or kernel density estimate plot of that variable is displayed along the diagonal instead.
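
As a minimal sketch of the n-by-n layout described above, the snippet below builds a hypothetical DataFrame with three numeric columns (the names a, b, and c and the random data are assumptions made for illustration) and inspects the array of axes that the function returns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; drop this line in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical dataset with n = 3 numeric variables
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

# With default arguments, only the DataFrame is required
axes = pd.plotting.scatter_matrix(df)

n = df.shape[1]
print(axes.shape)  # (3, 3)

# Diagonal cells hold histogram bars; off-diagonal cells hold one set of scatter points each
diagonal_has_bars = all(len(axes[i, i].patches) > 0 for i in range(n))
off_diagonal_has_points = all(
    len(axes[i, j].collections) == 1
    for i in range(n)
    for j in range(n)
    if i != j
)
print(diagonal_has_bars, off_diagonal_has_points)
```

The returned value is a NumPy array of Matplotlib axes, so axes[i, j] is the cell in row i and column j.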

Installation and imports

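To get started, install Python and the packages numpy, pandas, and matplotlib. The kernel density estimate plot used later also relies on scipy, so we install it as well. Then, we’ll import the packages in our Python file. The following command shows how to install them (the leading ! runs the command from a Jupyter Notebook cell; omit it in a terminal):

!pip install numpy pandas matplotlib scipy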
Install dependencies

The following command shows how to import them:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Import dependencies

Syntax

Now, let’s go over the parameters that control how pandas.plotting.scatter_matrix() draws a scatter matrix.

Note: All the parameters specified below are optional except for frame.

pandas.plotting.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal="hist", marker=".", density_kwds=None, hist_kwds=None, range_padding=0.05, **kwargs)
Parameter list for pandas.plotting.scatter_matrix()
  • frame: This is the pandas DataFrame object to plot.

  • alpha: This floating-point value sets the amount of transparency applied to the markers.

  • figsize: This tuple contains the width and height, in inches, used to set the figure size.

  • ax: This is a Matplotlib axes object to draw on.

  • grid: Passing it a value of True displays grid lines on the plots.

  • marker: This defines the Matplotlib marker style, that is, the shape drawn for each data value on the scatter plots.

  • diagonal: The diagonal can display either a histogram ("hist") or a kernel density estimate plot ("kde"), which visualizes the distribution of the observations in a dataset as a smooth curve.

  • hist_kwds: This is a dictionary of keyword arguments passed to the histogram function.

  • density_kwds: Similar to hist_kwds, this is a dictionary of keyword arguments passed to the kernel density estimate plot.

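  • range_padding: It sets the range padding, which is the additional space added beyond the minimum and maximum data points on each axis, expressed as a fraction of the data range, to prevent clutter at the edges.

  • **kwargs: These are any additional keyword arguments, which are passed on to the scatter function.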
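
To illustrate how these parameters combine, here is a small sketch on made-up data. It assumes two hypothetical groups of 30 rows each and colors them differently through c, which is not a scatter_matrix parameter itself but a Matplotlib scatter argument forwarded via **kwargs.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; drop this line in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical data: two groups of 30 rows centered at different means
values = np.vstack([
    rng.normal(0, 1, size=(30, 3)),
    rng.normal(3, 1, size=(30, 3)),
])
df = pd.DataFrame(values, columns=["x", "y", "z"])
point_colors = ["tab:blue"] * 30 + ["tab:orange"] * 30

axes = pd.plotting.scatter_matrix(
    df,
    alpha=0.7,              # slightly transparent markers
    figsize=(8, 8),         # width and height in inches
    diagonal="hist",        # histograms along the diagonal
    hist_kwds={"bins": 8},  # passed to the histogram call
    range_padding=0.1,      # extra space around the data on each axis
    c=point_colors,         # forwarded through **kwargs to the scatter call
)

fig = axes[0, 0].figure
print(fig.get_size_inches())
```

In a notebook, the figure is rendered after the cell runs; in a script, we would call plt.show() or save the figure instead.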

Code example

The example below plots two scatter matrices with this function: one with histograms along the diagonal and one with kernel density estimate plots. We can run the code in a Jupyter Notebook and tweak the arguments as we like.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

columns = ["column1", "column2", "column3", "column4"]
educatives_dataframe = pd.DataFrame(np.random.randint(0, 50, size=(50, 4)), columns=columns)

educatives_dataframe.tail()

educatives_scatter_plot = pd.plotting.scatter_matrix(educatives_dataframe, alpha=0.9, figsize=(15, 15), grid=True, marker="*", diagonal="hist", hist_kwds={"bins": 5, "color": "pink"}, range_padding=0.1, color="red")
plt.suptitle("Scatter matrix",fontsize=50)
plt.show()

educatives_scatter_plot = pd.plotting.scatter_matrix(educatives_dataframe, alpha=0.9, figsize=(15, 15), grid=True, marker="D", diagonal="kde", density_kwds={"alpha": 0.3, "color": "red"}, range_padding=0.1, color="green")
plt.suptitle("Scatter matrix",fontsize=50)
plt.show()
Working proof of pandas.plotting.scatter_matrix()

Explanation

In the code above:

  • Lines 1–3: We make the necessary imports as described above.

  • Lines 5–8: These lines initialize a pandas DataFrame object. They create a DataFrame with 50 rows and four columns filled with random integers from 0 to 49 (the upper bound of 50 is exclusive). Finally, we display the last five rows of the generated DataFrame.

  • Lines 10–12: Here, we create a scatter matrix with a five-bin pink histogram along the diagonal. alpha is set to 0.9, so the red, star-shaped markers are mostly opaque. We also use the suptitle function to give a title to the entire figure, not just to a single plot.

  • Lines 14–16: This piece of code does the same, except that it displays kernel density estimate plots instead of histograms along the diagonal and uses green, diamond-shaped markers.
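
The walkthrough above draws fresh random data on every run and assumes an environment that can display figures. As a hedged variant for a plain script, the sketch below seeds the random generator so the data is reproducible and writes the figure to a file. The seed value and the output path are assumptions chosen for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend suited to saving files
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed: the same data on every run
columns = ["column1", "column2", "column3", "column4"]
# Integers from 0 to 49; the upper bound is exclusive
df = pd.DataFrame(rng.integers(0, 50, size=(50, 4)), columns=columns)

axes = pd.plotting.scatter_matrix(df, figsize=(10, 10), diagonal="hist")
fig = axes[0, 0].figure
fig.suptitle("Scatter matrix")

# Hypothetical output location in the system's temporary directory
out_path = os.path.join(tempfile.gettempdir(), "scatter_matrix.png")
fig.savefig(out_path, dpi=100)
plt.close(fig)  # release the figure's memory once it is saved
print(out_path)
```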

In conclusion, pandas.plotting.scatter_matrix() is a useful function for plotting scatter matrices to better visualize our data. Because every variable is plotted against every other variable, the plot reveals how pairs of variables are related, including how strongly they are correlated.
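
A scatter matrix shows relationships visually; to put numbers on them, we can pair it with DataFrame.corr(). The sketch below uses made-up data in which one column is deliberately built from another (the column names and noise levels are assumptions), so the strong and weak relationships are known in advance.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
x = rng.normal(size=200)

# Hypothetical data: "strong" is almost a linear function of x, "noise" is independent of it
df = pd.DataFrame({
    "x": x,
    "strong": 2 * x + rng.normal(scale=0.1, size=200),
    "noise": rng.normal(size=200),
})

# Pairwise Pearson correlation, one value per cell of the scatter matrix
corr = df.corr()
print(corr.round(2))
```

The diagonal of this table is all ones, which mirrors why the scatter matrix replaces its diagonal cells with histograms or kernel density estimate plots.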

Copyright ©2024 Educative, Inc. All rights reserved