Python offers several libraries for data manipulation and visualization, such as pandas, Matplotlib, and seaborn. **Data visualization** represents data graphically, through graphs, charts, and plots, which makes it easier to recognize patterns, trends, and relationships between variables in both simple and complex datasets. Visuals help us process and present information efficiently, which is especially useful for visual learners. In this Answer, we’ll plot a scatter matrix using `pandas.plotting.scatter_matrix()`.

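A **scatter matrix** plots every variable in a dataset against every other variable. Suppose the dataset has *n* variables: the scatter matrix is then an *n* × *n* grid of plots, where the panel in row *i* and column *j* is a scatter plot of variable *j* (x-axis) against variable *i* (y-axis). The diagonal panels, where a variable would be plotted against itself, show that variable’s distribution instead.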
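To see this *n* × *n* layout in code, here is a minimal sketch, assuming a small made-up three-column DataFrame (the column names `a`, `b`, and `c` are hypothetical). `scatter_matrix()` returns a NumPy array of Matplotlib axes, one per panel, so its shape is `(n, n)`, and each panel is labeled with the variables on its x- and y-axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical three-variable dataset
df = pd.DataFrame({
    "a": np.arange(10),
    "b": np.arange(10) ** 2,
    "c": np.arange(10)[::-1],
})

# Returns a 2D NumPy array holding one Axes object per (row, column) pair
axes = pd.plotting.scatter_matrix(df)
print(axes.shape)  # (3, 3)

# Panel (i, j) plots column j on the x-axis against column i on the y-axis
i, j = 2, 0
print(axes[i, j].get_xlabel(), axes[i, j].get_ylabel())
```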

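To get started, install Python along with the `numpy`, `pandas`, and `matplotlib` packages, and then import them in your Python file. The following command installs these packages. It also installs `scipy`, which pandas needs for the kernel density estimate option (`diagonal="kde"`) used later. In a Jupyter Notebook cell, prefix the command with `!`.

```shell
pip install numpy pandas matplotlib scipy
```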

Install dependencies
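Because `diagonal="kde"` relies on SciPy, it can help to check which packages are available before plotting. The sketch below is one way to do that with the standard library’s `importlib`; the helper names `missing_packages()` and `pick_diagonal()` are hypothetical, and falling back to `"hist"` is our own choice, not something pandas does automatically.

```python
import importlib.util

REQUIRED = ["numpy", "pandas", "matplotlib"]


def missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pick_diagonal(preferred="kde"):
    """Hypothetical helper: use 'kde' only if SciPy is installed, else fall back to 'hist'."""
    if preferred == "kde" and importlib.util.find_spec("scipy") is None:
        return "hist"
    return preferred


print("Missing packages:", missing_packages(REQUIRED) or "none")
print("Diagonal to use:", pick_diagonal())
```

The returned string can then be passed as the `diagonal` argument of `scatter_matrix()`.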

The following code imports them:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

Import dependencies
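`matplotlib.pyplot` displays figures with `plt.show()`, which opens a window or renders inline in a notebook. When running a plain script without a display (for example, on a server), one option is Matplotlib’s non-interactive `Agg` backend together with `savefig()`. This is a minimal sketch; the file name `scatter_matrix.png`, the column names, and the random data are placeholders.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # select a non-interactive backend before importing pyplot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: 30 rows, 3 columns of random integers from 0 to 49
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 50, size=(30, 3)), columns=["x", "y", "z"])

pd.plotting.scatter_matrix(df, figsize=(6, 6))
plt.suptitle("Scatter matrix")

# Save to a file instead of calling plt.show()
out_path = os.path.join(tempfile.gettempdir(), "scatter_matrix.png")
plt.savefig(out_path)
plt.close("all")
print("Saved to", out_path)
```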

Next, let’s go over the parameters of `pandas.plotting.scatter_matrix()`.

**Note:** All the parameters listed below are optional except `frame`.

```python
pandas.plotting.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal="hist", marker=".", density_kwds=None, hist_kwds=None, range_padding=0.05, **kwargs)
```

Parameter list for `pandas.plotting.scatter_matrix()`

- `frame`: The pandas DataFrame to plot.
- `alpha`: The transparency of the scatter points, as a floating-point value from 0 (fully transparent) to 1 (fully opaque).
- `figsize`: A `(width, height)` tuple, in inches, that sets the size of the figure.
- `ax`: An existing Matplotlib axes object to draw on. By default, pandas creates a new figure.
- `grid`: Setting it to `True` displays grid lines.
- `marker`: The Matplotlib marker style used for the data points on the scatter plots.
- `diagonal`: What to draw on the diagonal: `"hist"` for a histogram or `"kde"` for a kernel density estimate plot, a smoothed estimate of a variable’s distribution. The `"kde"` option requires SciPy.
- `hist_kwds`: A dictionary of keyword arguments passed to the histogram function, such as `bins` or `color`.
- `density_kwds`: Similar to `hist_kwds`, a dictionary of keyword arguments passed to the kernel density estimate plot.
- `range_padding`: The range padding, i.e., how far the axis ranges are extended beyond the minimum and maximum data values, relative to the data range, so that points at the extremes are not drawn on the edges of the plots.
- `**kwargs`: Any additional keyword arguments, which are passed on to Matplotlib’s scatter function (for example, `color`).
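To see how a few of these parameters work together, here is a minimal sketch, assuming a tiny made-up DataFrame. It passes `hist_kwds` to the diagonal histograms, forwards `color` through `**kwargs` to the scatter plots, and compares the x-axis limits produced by two `range_padding` values: a larger value leaves more empty space beyond the smallest and largest data points. The helper name `x_limits()` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: column "p" spans 0 to 10
df = pd.DataFrame({"p": [0, 2, 5, 7, 10], "q": [10, 4, 6, 1, 0]})


def x_limits(range_padding):
    """Draw a scatter matrix and return the x-limits of the bottom-left panel."""
    axes = pd.plotting.scatter_matrix(
        df,
        alpha=0.8,
        diagonal="hist",
        hist_kwds={"bins": 4, "color": "gray"},  # forwarded to the histogram call
        range_padding=range_padding,
        color="purple",  # extra keyword, forwarded to the scatter calls via **kwargs
    )
    limits = axes[1, 0].get_xlim()  # x-axis of this panel shows column "p"
    plt.close("all")
    return limits


narrow = x_limits(0.05)
wide = x_limits(0.5)
print("range_padding=0.05:", narrow)
print("range_padding=0.5: ", wide)
```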

The following example plots a scatter matrix twice: first with histograms and then with kernel density estimate plots along the diagonal. Feel free to tweak the parameters and rerun the code in a Jupyter Notebook.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Build a DataFrame of 50 rows x 4 columns with random integers from 0 to 49
columns = ["column1", "column2", "column3", "column4"]
educatives_dataframe = pd.DataFrame(np.random.randint(0, 50, size=(50, 4)),
                                    columns=columns)
print(educatives_dataframe.tail())  # show the last five rows
# Scatter matrix with star markers and five-bin pink histograms on the diagonal
educatives_scatter_plot = pd.plotting.scatter_matrix(educatives_dataframe, alpha=0.9, figsize=(15, 15), grid=True, marker="*", diagonal="hist", hist_kwds={"bins": 5, "color": "pink"}, range_padding=0.1, color="red")
plt.suptitle("Scatter matrix", fontsize=50)
plt.show()
# Same matrix with diamond markers and KDE curves on the diagonal (requires SciPy)
educatives_scatter_plot = pd.plotting.scatter_matrix(educatives_dataframe, alpha=0.9, figsize=(15, 15), grid=True, marker="D", diagonal="kde", density_kwds={"alpha": 0.3, "color": "red"}, range_padding=0.1, color="green")
plt.suptitle("Scatter matrix", fontsize=50)
plt.show()
```

Working example of `pandas.plotting.scatter_matrix()`

In the code above:

- **Lines 1–3:** We make the necessary imports, as described above.
- **Lines 5–8:** These lines create a `pandas` DataFrame with 50 rows and four columns, filled with random integers from 0 to 49 (the upper bound of `np.random.randint()` is exclusive). Finally, we display the last five rows of the DataFrame with `tail()`.
- **Lines 10–12:** Here, we create a scatter matrix with a five-bin pink histogram along the diagonal. `alpha` is set to `0.9`, so the star-shaped markers (`marker="*"`) are mostly opaque, and the extra `color="red"` argument, passed through `**kwargs` to the scatter function, colors them red. We also use `plt.suptitle()` to give a title to the entire matrix, not just to a single plot.
- **Lines 14–16:** This code does the same, except that it draws kernel density estimate plots instead of histograms along the diagonal and uses green diamond markers (`marker="D"`).
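Because the example draws random numbers, the matrix changes on every run. Below is a minimal sketch for reproducible output, assuming NumPy’s `default_rng` with an arbitrary seed of 42 in place of `np.random.randint()`. It also checks that each diagonal panel really contains the five histogram bars requested through `hist_kwds`.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the check runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

columns = ["column1", "column2", "column3", "column4"]

# Seeded generator: the same 50 x 4 integers (0 to 49) on every run
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.integers(0, 50, size=(50, 4)), columns=columns)

axes = pd.plotting.scatter_matrix(
    df, alpha=0.9, figsize=(15, 15), grid=True, marker="*",
    diagonal="hist", hist_kwds={"bins": 5, "color": "pink"},
    range_padding=0.1, color="red",
)
plt.suptitle("Scatter matrix", fontsize=50)

# Each diagonal panel holds one histogram, drawn as one bar per bin
bars_per_diagonal = [len(axes[k, k].patches) for k in range(len(columns))]
print(bars_per_diagonal)
plt.close("all")
```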

In conclusion, `pandas.plotting.scatter_matrix()` is a convenient way to plot a scatter matrix and explore a dataset visually. Because every variable is plotted against every other variable, the off-diagonal panels reveal relationships, such as correlations, between pairs of variables, while the diagonal panels show the distribution of each variable.
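A scatter matrix shows relationships visually. To put a number on the linear correlation that a panel suggests, pandas’ `DataFrame.corr()` computes pairwise Pearson correlation coefficients in the same row-by-column layout. The sketch below uses made-up columns built to be perfectly positively and perfectly negatively correlated with `x`.

```python
import numpy as np
import pandas as pd

x = np.arange(20, dtype=float)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + 1,  # rises with x: correlation +1
    "z": -x,         # falls as x rises: correlation -1
})

# Pairwise Pearson correlation matrix, laid out like the scatter matrix grid
corr = df.corr()
print(corr.round(2))
```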

Copyright ©2024 Educative, Inc. All rights reserved
