Data visualization with Matplotlib
Key takeaways:
- Matplotlib supports a wide range of plot types, from basic charts to advanced 3D and animated plots.
- Matplotlib offers extensive options to tailor plots and integrates well with libraries like pandas and NumPy.
- Matplotlib is beginner-friendly: it pairs a simple interface with advanced features for professional-grade visualizations.
Matplotlib is a versatile Python library that lets data scientists and analysts visualize data in many ways, from straightforward line plots to complex 3D representations. Its extensive customization options let users tailor plots to specific needs, which supports data exploration and insight extraction.
Installing Matplotlib
Use the pip command to install this library:
pip install matplotlib
Importing pyplot from matplotlib
The Matplotlib library contains the pyplot module, which offers a MATLAB-like interface for making visualizations.
It offers a stateful approach, meaning that each function call modifies the current figure or axes. This makes it easy to create quick and simple plots without needing to explicitly create figure and axes objects.
import matplotlib.pyplot as plt
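To make the stateful behavior concrete, the following minimal sketch draws the same line twice: once through pyplot's implicit "current" figure and axes, and once through figure and axes objects created explicitly with plt.subplots(). The data values and titles are illustrative assumptions, not taken from this article.

```python
import matplotlib.pyplot as plt

# Illustrative data (an assumption for this sketch)
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# Stateful style: each call acts on the current figure and axes,
# which pyplot creates on demand.
plt.figure()
plt.plot(x, y)
plt.title('Stateful interface')
stateful_ax = plt.gca()  # the axes that the calls above modified

# Explicit style: create the figure and axes first, then call methods on them.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title('Explicit interface')

# plt.show()  # uncomment to display both figures
```

Both styles produce equivalent plots. The stateful style is shorter for quick plots, while the explicit style tends to be easier to follow once a script manages several figures or axes.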
Why use Matplotlib?
- Comprehensive visualization tools: Matplotlib covers a wide variety of plot types and supports advanced features like subplots, annotations, and 3D visualizations.
- Highly customizable: Create professional, publication-ready graphs by tweaking fonts, colors, line styles, and more.
- Seamless integration: Works with other libraries such as NumPy, pandas, and seaborn for extended functionality.
- Open source and widely supported: Free to use, with active community support and extensive documentation.
Plotting with Matplotlib
We can create a whole variety of plots using Matplotlib, with some examples listed below:
- Line charts: Best for visualizing trends over time or other continuous data.
- Bar charts: Ideal for comparing categories or groups.
- Histograms: Represent the frequency distribution of numerical data.
- Scatter plots: Highlight relationships or correlations between two variables.
- Pie charts: Show proportions of a whole.
- Subplots: Enable multiple plots in a single figure for side-by-side comparisons.
Basics of plotting using Matplotlib
A plot contains a few important elements that you can add using this library:
Adding a title: Sets the main title of the plot.
matplotlib.pyplot.title(label, fontdict=None, loc='center', pad=None, **kwargs)
Adding X and Y labels: Sets the x-axis and y-axis labels to describe the data.
matplotlib.pyplot.xlabel(xlabel, fontdict=None, labelpad=None, **kwargs)
matplotlib.pyplot.ylabel(ylabel, fontdict=None, labelpad=None, **kwargs)
Setting limits and tick labels: Defines the range of values displayed on the axes and customizes the tick marks and their labels.
matplotlib.pyplot.xticks([x1, x2, x3], ['label1', 'label2', 'label3'])
matplotlib.pyplot.yticks([y1, y2, y3], ['label1', 'label2', 'label3'])
Adding legends: Creates a legend to identify different plot elements.
matplotlib.pyplot.legend(['label1', 'label2', 'label3'])
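The elements above can be combined in one short script. The sketch below is a hedged illustration with made-up data (the months, sales, and costs values are assumptions). It also uses plt.xlim() and plt.ylim() to set the axis limits, and it passes label arguments to plt.plot() so that plt.legend() can be called without an explicit list of names.

```python
import matplotlib.pyplot as plt

# Illustrative data (assumed for this sketch)
months = [1, 2, 3, 4]
sales = [10, 14, 9, 17]
costs = [8, 9, 7, 11]

plt.figure()
# The label arguments supply the legend entries
plt.plot(months, sales, label='Sales')
plt.plot(months, costs, label='Costs')

plt.title('Monthly figures', loc='left')  # title aligned to the left edge
plt.xlabel('Month')
plt.ylabel('Amount')

plt.xlim(0.5, 4.5)  # visible range of the x-axis
plt.ylim(0, 20)     # visible range of the y-axis
plt.xticks(months, ['Jan', 'Feb', 'Mar', 'Apr'])  # tick positions and their labels

plt.legend()  # built from the label arguments above
basics_ax = plt.gca()

# plt.show()  # uncomment to display the figure
```

Passing labels at plot time keeps each name next to the data it describes, which avoids mismatches when lines are added or reordered.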
Line chart
In Matplotlib, a line chart is a graphic depiction of data points joined by straight lines. It is helpful for displaying correlations, trends, and patterns among continuous variables or over time.
# Importing required libraries
import numpy as np
import matplotlib.pyplot as plt

# Generation of variables
x = np.arange(0, 10)  # Array of range 0 to 9
y = x**3

# Printing the variables
print(x)
print(y)

plt.plot(x, y)           # Function to plot
plt.title('Line Chart')  # Function to give title

# Functions to give x and y labels
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')

# Function to show the graph
plt.show()
plt.plot(x, y): This call generates a line plot in which the x and y values are plotted as points connected by straight lines.
Multiple line chart
A multiple line chart in Matplotlib is a visualization technique used to compare trends of multiple datasets over a common x-axis.
# Importing required libraries
import numpy as np
import matplotlib.pyplot as plt

# Generation of the first set of variables
x = np.arange(0, 11)
y = x**3

# Generation of the second set of variables
x2 = np.arange(0, 11)
y2 = (x2**3) / 2

# Printing all variables
print(x, y, x2, y2, sep="\n")

# "linewidth" specifies the width of the lines
# "color" specifies the colour of the lines
# "label" specifies the name shown for each line in the legend
plt.plot(x, y, color='r', label='first data', linewidth=5)
plt.plot(x2, y2, color='y', linewidth=5, label='second data')

plt.title('Multiline Chart')
plt.ylabel('Y axis')
plt.xlabel('X axis')

# Builds the legend from the label arguments and places it in the best position
plt.legend()
plt.show()
The two plt.plot() calls: These calls draw two lines on the same axes with additional customization: color ('r', 'y'), line width (linewidth=5), and legend entries (label='first data', label='second data').
Bar chart
A bar chart is a data visualization in which categories are represented by rectangular bars or columns. The length of each bar reflects the value it stands for.
# Importing required libraries
import matplotlib.pyplot as plt

# Generation of variables
x = ["India", "USA", "Japan", "Australia", "Italy"]
y = [6, 7, 8, 9, 2]

# Printing the variables
print(x)
print(y)

# Function to plot; the label is used if a legend is added
plt.bar(x, y, label='Bars1', color='r')

# Functions to give x and y labels
plt.xlabel("Country")
plt.ylabel("Inflation Rate %")

# Function to give heading of the chart
plt.title("Bar Graph")

# Function to show the chart
plt.show()
plt.bar(x, y, ...): This call generates a bar chart with one bar per category in x, whose height is the matching value in y. The color is set to red (color='r') and a label is added for reference in a legend.
Multiple bar chart
A multiple bar chart, also known as a grouped bar chart, is used to compare multiple categories across different groups. It’s particularly useful for visualizing comparisons between different groups or time periods.
# Importing required libraries
import matplotlib.pyplot as plt

# Generation of the first set of variables
x = ["India", "USA", "Japan", "Australia", "Italy"]
y = [6, 7, 8, 9, 5]

# Generation of the second set of variables
x2 = ["India", "USA", "Japan", "Australia", "Italy"]
y2 = [5, 1, 3, 4, 2]

# Printing all variables
print(x, y, x2, y2, sep="\n")

# Functions to plot; both series share the same x positions,
# so the second set of bars is drawn in front of the first
plt.bar(x, y, label='Inflation', color='y')
plt.bar(x2, y2, label='Growth', color='g')

# Functions to give x and y labels
plt.xlabel("Country")
plt.ylabel("Inflation & Growth Rate %")

plt.title("Multiple Bar Graph")
plt.legend()
plt.show()
The two plt.bar() calls: These calls draw two bar series from different sets of data on the same axes. Each series is given a label (label='Inflation', label='Growth') and a different color ('y', 'g').
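The listing above draws both series at the same x positions, so the bars overlap rather than sit side by side. One common way to get a side-by-side (grouped) layout is to place the bars at numeric positions and shift each series by half a bar width. The sketch below assumes a bar width of 0.4, which is an arbitrary choice, and reuses the country data from the example.

```python
import numpy as np
import matplotlib.pyplot as plt

countries = ['India', 'USA', 'Japan', 'Australia', 'Italy']
inflation = [6, 7, 8, 9, 5]
growth = [5, 1, 3, 4, 2]

positions = np.arange(len(countries))  # one numeric slot per country
width = 0.4                            # bar width; an arbitrary choice for this sketch

plt.figure()
# Shift each series half a bar width away from the center of its slot
inflation_bars = plt.bar(positions - width / 2, inflation, width,
                         label='Inflation', color='y')
growth_bars = plt.bar(positions + width / 2, growth, width,
                      label='Growth', color='g')

# Put the country names back on the axis at the slot centers
plt.xticks(positions, countries)
plt.xlabel('Country')
plt.ylabel('Inflation & Growth Rate %')
plt.title('Grouped Bar Graph')
plt.legend()

# plt.show()  # uncomment to display the figure
```

With more than two series, the same idea applies: divide the slot width by the number of series and offset each one accordingly.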
Histogram
A histogram graphically represents the distribution of numerical data. It divides the range of the data into bins and counts the number of data points that fall into each bin. The height of each bar shows the frequency of data points within that bin.
# Importing required libraries
import matplotlib.pyplot as plt

# Generation of variable
stock_prices = [32, 67, 43, 56, 45, 43, 42, 46, 48, 53, 73, 55, 54, 56, 43, 55, 54, 20, 33, 65, 62, 51, 79, 31, 27]

# Setting the figure size and plotting the histogram
plt.figure(figsize=(8, 5))
plt.hist(stock_prices, bins=5)

# Function to show the chart
plt.show()
plt.hist(stock_prices, bins=5): This call creates a histogram of the stock_prices data. It divides the data into 5 bins (bins=5) and shows the frequency distribution.
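To see what the binning step does with these numbers, the following sketch computes the bin edges and counts with np.histogram(), which performs the binning without drawing anything, and then compares them with the values returned by plt.hist(). It assumes the default behavior of equal-width bins spanning the minimum to the maximum of the data.

```python
import numpy as np
import matplotlib.pyplot as plt

stock_prices = [32, 67, 43, 56, 45, 43, 42, 46, 48, 53, 73, 55, 54,
                56, 43, 55, 54, 20, 33, 65, 62, 51, 79, 31, 27]

# Binning only: 5 equal-width bins need 6 edges, from the minimum (20) to the maximum (79)
counts, edges = np.histogram(stock_prices, bins=5)
print(edges)   # edges are spaced 11.8 apart: 20, 31.8, 43.6, 55.4, 67.2, 79
print(counts)  # [3 6 9 5 2]

# plt.hist returns its counts and edges along with the drawn bars
plt.figure()
hist_counts, hist_edges, patches = plt.hist(stock_prices, bins=5)

# plt.show()  # uncomment to display the figure
```

The counts add up to 25, the number of prices in the list, and the tallest bar corresponds to the middle bin (43.6 to 55.4) with 9 values.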
Scatter plot
A scatter plot represents data points on a two-dimensional plane. It is helpful for illustrating how two numerical variables relate to one another. Each data point is drawn as a dot whose location is determined by its x- and y-coordinates.
# Importing required libraries
import matplotlib.pyplot as plt

# Generation of x and y variables
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 2, 4, 2, 1, 4, 5, 2]

# Function to plot the graph
plt.scatter(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Scatter Plot')

# Function to show the graph
plt.show()
plt.scatter(x, y): This call generates a scatter plot in which individual points are plotted based on their coordinates (x and y).
Pie chart
A pie chart is a circular diagram with slices that each show a different percentage of the total. It’s useful for visualizing categorical data and showing the relative sizes of different categories.
# Importing required libraries
import pandas as pd
import matplotlib.pyplot as plt

# Collection of raw data
raw_data = {
    'names': ['Nick', 'Sani', 'John', 'Rubi', 'Maya'],
    'jan_score': [123, 124, 125, 126, 128],
    'feb_score': [23, 24, 25, 27, 29],
    'march_score': [3, 5, 7, 6, 9],
}

# Segregating the raw data into usable form/variables
df = pd.DataFrame(raw_data, columns=['names', 'jan_score', 'feb_score', 'march_score'])
df['total_score'] = df['jan_score'] + df['feb_score'] + df['march_score']

# Printing the data
print(df)

# Function to plot the graph
plt.pie(df['total_score'], labels=df['names'], autopct='%.2f%%')

# Equal aspect ratio so the pie is drawn as a circle
plt.axis('equal')
plt.show()
plt.pie(...): This call creates a pie chart in which each slice represents the total_score of one individual. The slices are labeled with the names, and each slice displays its percentage (autopct='%.2f%%').
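The percentage printed on each slice is that slice's share of the sum of all values, formatted with the autopct string. As a rough sketch of that arithmetic, the code below recomputes the shares by hand from the total scores in the example and applies the same '%.2f%%' format string. The variable names are illustrative.

```python
names = ['Nick', 'Sani', 'John', 'Rubi', 'Maya']
total_scores = [149, 153, 157, 159, 166]  # jan + feb + march for each person

grand_total = sum(total_scores)  # 784

# Each slice's percentage is its share of the grand total
percentages = [100 * score / grand_total for score in total_scores]

# '%.2f%%' keeps two decimal places; the doubled %% yields a literal percent sign
slice_labels = ['%.2f%%' % p for p in percentages]

for name, text in zip(names, slice_labels):
    print(name, text)
# Nick 19.01%
# Sani 19.52%
# John 20.03%
# Rubi 20.28%
# Maya 21.17%
```

Because each value is rounded separately, the displayed percentages may not add up to exactly 100.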
Advanced plotting: Subplots
Using subplots, you can create several plots inside a single figure. This is helpful for visualizing several variables, comparing different datasets, and decomposing complex data into smaller, more focused plots.
# Importing required libraries
import numpy as np
import matplotlib.pyplot as plt

# Defining the size of the figure
plt.figure(figsize=(10, 10))

# Generation of variables
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([5, 2, 4, 2, 1, 4, 5, 2])

# Generating 4 subplots in form of 2x2 matrix
# In the line below the arguments of plt.subplot are as follows:
# 2- no. of rows
# 2- no. of columns
# 1- position in matrix

# Position (0,0)
plt.subplot(2, 2, 1)
plt.plot(x, y, 'g')
plt.title('Sub Plot 1')
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')

# Position (0,1)
plt.subplot(2, 2, 2)
plt.plot(y, x, 'b')
plt.title('Sub Plot 2')
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')

# Position (1,0)
plt.subplot(2, 2, 3)
plt.plot(y*2, x*2, 'y')
plt.title('Sub Plot 3')
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')

# Position (1,1)
plt.subplot(2, 2, 4)
plt.plot(x*2, y*2, 'm')
plt.title('Sub Plot 4')
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')

# Function for layout and spacing
plt.tight_layout(h_pad=5, w_pad=10)

# Function to show the figure
plt.show()
plt.subplot(2, 2, 1): This call selects the first cell of a grid of subplots (2 rows and 2 columns) in the same figure. Each subplot contains a different plot, and plt.subplot() is used to specify the position of the plot within the grid.
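The same 2x2 grid can also be built with plt.subplots(), which returns the figure together with an array of axes objects. The following sketch is an alternative to the listing above, not a replacement for it. It loops over the axes instead of repeating the calls for each panel, and the panels list is a helper introduced only for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([5, 2, 4, 2, 1, 4, 5, 2])

# One call creates the figure and a 2x2 array of axes objects
fig, axes = plt.subplots(2, 2, figsize=(10, 10))

# (x data, y data, format string) for each panel, in row-major order
panels = [(x, y, 'g'), (y, x, 'b'), (y * 2, x * 2, 'y'), (x * 2, y * 2, 'm')]

for number, (ax, (px, py, fmt)) in enumerate(zip(axes.flat, panels), start=1):
    ax.plot(px, py, fmt)
    ax.set_title(f'Sub Plot {number}')
    ax.set_xlabel('X-Axis')
    ax.set_ylabel('Y-Axis')

# Adjust spacing so titles and labels do not overlap
fig.tight_layout()

# plt.show()  # uncomment to display the figure
```

Holding the axes in an array makes it straightforward to address a single panel later, for example axes[0, 1] for the top-right plot.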
Conclusion
Matplotlib is a robust and flexible library for data visualization in Python. Its extensive customization options, compatibility with other libraries, and range of visualization types make it an essential tool for anyone working with data. Whether you’re a beginner exploring simple plots or an expert creating complex visualizations, Matplotlib has you covered.