How to plot a parallel coordinates chart in Pandas

Pandas is one of the most popular Python libraries for data manipulation and visualization. Real-world problems are often multivariate: predicting an outcome accurately usually requires several features or variables. A parallel coordinates chart is one way to look at all of those variables at once.

Parallel coordinates charts

A parallel coordinates chart gives each variable its own vertical axis, instead of the shared x and y axes used in most other plots. The axes are placed side by side, and each data point is drawn as a line that connects its values across all of the axes. When the line segments between two adjacent axes run roughly parallel, the two variables have a positive relationship. When most of the segments cross each other, the relationship is negative.
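The difference between parallel and crossing segments can be seen with a small sketch. The data and the column names x, rises_with_x, and falls_with_x are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: 20 evenly spaced values between 0 and 1.
x = np.linspace(0.0, 1.0, 20)
df = pd.DataFrame({
    "x": x,
    "rises_with_x": x,        # positive relationship with x
    "falls_with_x": 1.0 - x,  # negative relationship with rises_with_x
    "label": "all",           # a single class, so every line gets one color
})

fig, ax = plt.subplots(figsize=(8, 4))
pd.plotting.parallel_coordinates(df, "label", color=["tab:blue"], ax=ax)
ax.set_title("Parallel segments vs. crossing segments")
```

Between the first two axes the segments stay parallel, while between the last two axes they all cross. Call plt.show() to display the figure in a script.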

Disadvantages

The downside of these charts is that too many variables or data points crowd the chart until no meaningful pattern is visible. It’s advisable to show only a few variables at a time, or to use a technique called brushing, which emphasizes a chosen subset of the data points and fades out the rest.
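Pandas has no built-in brushing option, so the sketch below only approximates the idea on a static chart: the unselected rows are drawn first in light grey, and the selected rows are drawn on top in a strong color. The data, the helper column brushed, and the selection rule (the top 10% of values on axis a) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: 200 rows and 4 numeric columns.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])

# Hypothetical brush: select the rows with the largest 10% of values on axis "a".
threshold = df["a"].quantile(0.9)
df["brushed"] = np.where(df["a"] > threshold, "selected", "other")

fig, ax = plt.subplots(figsize=(8, 4))
# Background rows first, faded, so they do not hide the selection.
pd.plotting.parallel_coordinates(
    df[df["brushed"] == "other"], "brushed",
    color=["lightgrey"], alpha=0.4, ax=ax,
)
# Selected rows on top, in a color that stands out.
pd.plotting.parallel_coordinates(
    df[df["brushed"] == "selected"], "brushed",
    color=["tab:red"], ax=ax,
)
```

The alpha argument is not a parameter of parallel_coordinates itself. It is forwarded to Matplotlib through the keyword arguments.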

The other disadvantage is that the variables can be measured on very different scales. Pandas draws every column against a single shared y-axis, so a column with a much larger range can flatten the others. In such cases, normalizing the data first is recommended for better results.

These charts work best for data that has distinct classes, as in classification problems. For regression problems, the results might not be satisfactory.
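One common way to normalize is min-max scaling, which maps every column to the range 0 to 1. The sketch below uses a small made-up dataset. The column names and values are assumptions, and other methods such as standardization would work as well.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data whose columns live on very different scales.
df = pd.DataFrame({
    "height_m": [1.6, 1.7, 1.8, 1.9],
    "weight_kg": [55.0, 68.0, 80.0, 95.0],
    "income_usd": [30000.0, 42000.0, 58000.0, 75000.0],
    "group": ["a", "a", "b", "b"],
})
features = ["height_m", "weight_kg", "income_usd"]

# Min-max scaling: (value - column minimum) / (column maximum - column minimum).
scaled = df.copy()
col_min = df[features].min()
col_max = df[features].max()
scaled[features] = (df[features] - col_min) / (col_max - col_min)

fig, ax = plt.subplots(figsize=(8, 4))
pd.plotting.parallel_coordinates(
    scaled, "group", color=["tab:blue", "tab:orange"], ax=ax
)
ax.set_ylabel("Scaled value (0 to 1)")
```

Without the scaling step, income_usd would dominate the shared y-axis and the other two columns would look flat.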

Advantages

Parallel coordinates charts can be used to:

Visualize different features at the same time without plotting univariate charts.
Detect outliers that exist in a dataset.
Compare characteristics in different classes in classification problems.
Support feature selection and dimensionality reduction, since these charts can reveal variables that follow similar patterns.
Communicate complex patterns that exist in the data.

How to plot a parallel coordinates chart in Pandas

The syntax to plot a parallel coordinates chart is as follows:

Syntax

pandas.plotting.parallel_coordinates(
frame, class_column, cols=None, ax=None, color=None, use_columns=False, xticks=None, colormap=None, axvlines=True, axvlines_kwds=None, sort_labels=False, **kwargs
)

Parameters

frame [required] is the DataFrame to plot.
class_column [required] is the name of the column containing the class names.
cols [optional] is the list of column names to use.
ax [optional] is the Matplotlib axes object to draw on.
color [optional] sets the colors to use for the different classes.
use_columns [optional] defaults to False. If True, the columns are used as the xticks.
xticks [optional] is a list of values to use for the xticks.
colormap [optional] sets the colormap to use for the line colors.
axvlines [optional] defaults to True. If True, a vertical line is drawn at each xtick.
axvlines_kwds [optional] provides options for the vertical lines drawn by axvlines.
sort_labels [optional] sorts the class labels, which is useful when colors are assigned.
kwargs stands for keyword arguments. Additional options passed here are forwarded to the Matplotlib plotting method to customize the chart.
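The short sketch below shows how several of these parameters can be combined. The DataFrame, its column names, and the chosen colors are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: three numeric columns and one class column.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 1.0, 4.0, 3.0],
    "c": [0.5, 1.5, 2.5, 3.5],
    "kind": ["y", "x", "y", "x"],
})

fig, ax = plt.subplots(figsize=(6, 4))
pd.plotting.parallel_coordinates(
    df,
    "kind",                               # class_column
    cols=["a", "c"],                      # plot only these two columns
    color=["tab:green", "tab:purple"],    # one color per class
    sort_labels=True,                     # sort class labels before colors are assigned
    axvlines_kwds={"linestyle": "--", "color": "grey"},  # style of the vertical lines
    ax=ax,                                # draw on an existing axes object
    alpha=0.8,                            # extra keyword argument, forwarded to Matplotlib
)
```

Restricting cols to a few columns is also a simple way to keep a crowded chart readable.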

Example code

Note: We are using Python 3.10.4 in this answer.
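A minimal sketch of such an example is shown below. It is not necessarily the original listing. The random seed, the column names f1 to f4, the size of the shift between the classes, and the explicit colors are assumptions, chosen so that the output fits the description that follows. The code is laid out so that the line ranges in the explanation line up with it.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed (an assumption) for repeatable data
y = np.repeat([0, 1], 25)       # 2 classes with 25 rows each
X = rng.normal(loc=3.0 * y[:, None], scale=1.0, size=(50, 4))  # class 1 is shifted upward
df = pd.DataFrame(X, columns=["f1", "f2", "f3", "f4"]).assign(y=y)

# Explicit colors (an assumption) so that class 1 is drawn in blue.
colors = ["tab:orange", "tab:blue"]  # applied in order of first appearance: class 0, then class 1

fig, ax = plt.subplots(figsize=(8, 4))
pd.plotting.parallel_coordinates(df, "y", color=colors, ax=ax)
ax.set_title("Parallel coordinates chart")
ax.set_ylabel("Value")
fig.tight_layout()
```

Call plt.show() after the last line to display the chart when running this as a script.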

Code explanation

Lines 1-3: We import the necessary libraries.

Lines 5-8: We create a DataFrame consisting of 5 columns, 50 rows, and 2 classes.

Lines 13-17: We use Pandas to plot a parallel coordinates chart, passing “y” as the class_column.

The two classes are clearly separated: class 1, drawn in blue, takes higher values on every variable, while class 0 sits on the lower side of each axis.