What is the RadViz plot in pandas?
RadViz (radial visualization) is a technique for displaying a dataset with many features in two dimensions. Each feature is placed at a point on the perimeter of a circle, and the data points are plotted inside it.
A RadViz plot shows which data points share similar feature values: points with similar characteristics land close together inside the circle.
The position of each data point is determined by the values of all its features. A data point dominated by one or two features is pulled toward those features at the edge of the circle, while a data point with balanced feature values sits near the center.
RadViz plots are helpful for displaying cluster attributes or class characteristics in classification problems, because they can show as many features as fit around the circle's perimeter.
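The positioning rule described above can be sketched in a few lines of NumPy. This is a minimal illustration of the commonly described RadViz mapping, assuming min-max scaling of each feature and evenly spaced anchors on the unit circle; the function name radviz_positions and the sample values are made up for this example, and the pandas implementation may differ in its details.

```python
import numpy as np

def radviz_positions(values):
    """Project rows of a 2-D array onto the unit disc, RadViz style (illustrative sketch)."""
    values = np.asarray(values, dtype=float)
    n_features = values.shape[1]

    # Min-max scale each feature (column) to the range [0, 1].
    col_min = values.min(axis=0)
    col_range = values.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid dividing by zero for constant columns
    scaled = (values - col_min) / col_range

    # Place one anchor per feature, evenly spaced on the unit circle.
    angles = 2 * np.pi * np.arange(n_features) / n_features
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])

    # Each point is the average of the anchors, weighted by the row's scaled values.
    weights = scaled.sum(axis=1, keepdims=True)
    weights[weights == 0] = 1.0  # an all-zero row stays at the center
    return scaled @ anchors / weights

# Made-up sample: four rows dominated by one feature each, plus one balanced row.
sample = np.array([
    [10.0, 0.0, 0.0, 0.0],
    [0.0, 10.0, 0.0, 0.0],
    [0.0, 0.0, 10.0, 0.0],
    [0.0, 0.0, 0.0, 10.0],
    [5.0, 5.0, 5.0, 5.0],
])
positions = radviz_positions(sample)
print(positions.round(3))
```

The rows dominated by a single feature land on that feature's anchor at the edge of the circle, while the balanced last row lands at the center.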
Practical use of RadViz plots
Suppose we have a dataset containing diabetes patients. This dataset has 10 features that contain the patients’ names, ages, blood sugar levels, and other relevant characteristics useful in identifying whether a patient has diabetes or not.
In this case, a RadViz plot helps show which features are associated with patients who have diabetes and which are not.
This makes the features that distinguish each class easy to identify.
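As a rough sketch of this scenario, the snippet below builds a tiny patient table and passes it to radviz. All column names and values here are made up for illustration and are not real patient data. The sketch assumes the feature columns must be numeric, so the name column is dropped before plotting.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative values, not real patient data.
patients = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "age": [34, 51, 46, 62, 29, 58],
    "blood_sugar": [92, 168, 101, 181, 88, 174],
    "bmi": [22.1, 31.4, 24.8, 33.0, 21.5, 30.2],
    "diabetic": ["no", "yes", "no", "yes", "no", "yes"],
})

# Keep only numeric features plus the class column; names are identifiers, not features.
features = patients.drop(columns=["name"])

# One scatter group is drawn per value of the class column.
ax = pd.plotting.radviz(features, class_column="diabetic")
ax.set_title("RadViz of hypothetical patient data")
```

Calling plt.show() afterwards displays the figure when running this as a script.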
Implementing RadViz plots in pandas
Syntax
pandas.plotting.radviz(frame, class_column, ax=None, color=None, colormap=None, **kwds)
Parameters
frame: DataFrame containing the data to plot
class_column: Name of the column containing the class labels
ax (optional): Matplotlib axes object on which to draw the plot
color (optional): List or tuple of colors, one for each class
colormap (optional): Colormap to select colors from
**kwds: Options to pass to matplotlib's scatter plotting method
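The optional parameters can be combined as in the sketch below. The DataFrame, its column names, and the chosen colors are made up for illustration. The alpha and s arguments are assumed to be forwarded through **kwds to matplotlib's scatter.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small made-up frame: three numeric features plus a class column.
df = pd.DataFrame({
    "f1": [1.0, 2.0, 8.0, 9.0, 5.0, 4.0],
    "f2": [7.0, 8.0, 1.0, 2.0, 5.0, 6.0],
    "f3": [3.0, 2.0, 4.0, 5.0, 9.0, 8.0],
    "label": ["a", "a", "b", "b", "c", "c"],
})

# Create the axes ourselves so the plot can be sized or combined with other plots.
fig, ax = plt.subplots(figsize=(6, 6))

returned_ax = pd.plotting.radviz(
    df,
    "label",
    ax=ax,                                           # draw on the existing axes
    color=["tab:blue", "tab:orange", "tab:green"],   # one color per class
    alpha=0.7,                                       # passed on to scatter via **kwds
    s=40,                                            # marker size, also via **kwds
)
```

The function returns the axes it drew on, so returned_ax can be used to adjust titles or legends afterwards.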
Coding example
import pandas as pd
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

data = load_iris(as_frame = True)
df = pd.DataFrame(data=data['data'],columns=data['feature_names'])
df['class'] = data.target

plt.figure(figsize = (10,6),dpi=600)
pd.plotting.radviz(frame = df,class_column = 'class')
Explanation
Lines 1–3: We import the needed libraries.
Lines 5–7: We load the iris dataset into a DataFrame and add the target values as the class column.
Lines 9–10: We create a figure and draw the RadViz plot of the dataset.
Conclusion
The RadViz plot shows the three classes of the iris dataset positioned inside a circle according to their feature values.
- Data points with similar feature values are located close to each other on the RadViz plot, while data points with different feature values are located far apart. The points for classes 1 and 2 overlap heavily, which shows that these two classes have similar feature values.
- Further, if the dots for each species are well separated, it suggests that the species can be differentiated based on the features. So, it's clear that class 0 can be differentiated from classes 1 and 2 on the basis of features.
- Lastly, the points for class 0 are pulled toward sepal width, which suggests that sepal width has more influence in separating class 0 from the other classes.