Trusted answers to developer questions
Related Tags

python
pandas

What is the Pandas library in Python?

Hassaan Waqar

Pandas is a widely used Python library for data manipulation and analysis. It is built on top of NumPy, and its plotting methods use Matplotlib by default. Thus, it offers a variety of functions for both handling data and visualizing it.

Structure of Pandas

Pandas stores data as series and dataframes.

  • A series is a single column in Pandas. It has a 1-dimensional structure.
  • A dataframe is a collection of series (multiple columns) and thus has a 2-dimensional structure.

Both series and dataframes have an index.

  • Indices are used to identify individual records (rows) in Pandas.

The illustration below shows a dataframe, series, and indices:

Series and Dataframe
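
The structures described above can be sketched in a few lines. This is a minimal illustration, not part of the original answer; the names, ages, and cities are made-up sample data.

```python
import pandas as pd

# A series: one labelled, 1-dimensional column of values.
ages = pd.Series([31, 25, 47], index=["ann", "bob", "cy"], name="age")

# A dataframe: several columns that share the same index (2-dimensional).
people = pd.DataFrame(
    {"age": [31, 25, 47], "city": ["Oslo", "Lima", "Pune"]},
    index=["ann", "bob", "cy"],
)

print(ages.ndim, people.ndim)  # number of dimensions of each structure
print(people["city"])          # selecting one column gives back a series
print(people.loc["bob"])       # an index label identifies one record (row)
```

Selecting a single column with people["city"] returns a series, which is why a dataframe can be thought of as a collection of series.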

Reading files

Pandas can read a variety of file formats through its reader functions, such as read_csv and read_excel. Once read, the data is typically loaded into a dataframe.

Some widely used file formats are listed below:

  • .csv
  • .xlsx
  • .json
  • .xml
  • .html (tables found in a page)
  • SQL (query results or database tables, read through a database connection)
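
As a rough sketch of how reading works, the example below loads the same made-up records from CSV and JSON text. The in-memory strings are an assumption used to keep the example self-contained; in practice, a file path would usually be passed to the reader instead.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the contents of real files.
csv_text = "name,score\nann,90\nbob,72\n"
json_text = '[{"name": "ann", "score": 90}, {"name": "bob", "score": 72}]'

# Each reader returns a dataframe; io.StringIO makes a string look like a file.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

Other formats follow the same pattern with their own readers, for example read_excel for .xlsx files and read_sql for databases.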

Data manipulation

Pandas can apply operations to individual series and to entire dataframes. This includes computing descriptive statistics (mean, median, and mode), grouping data by the values in one or more columns, filtering rows and columns, merging data, and dealing with missing values.
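
The operations listed above might look like the following on a small table. The sales and regions dataframes are hypothetical sample data, and filling missing values with 0 is just one possible way to handle them.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing value in the "units" column.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "units": [10, np.nan, 7, 3],
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["north", "south"]})

# Missing values: replace the gap with 0 (dropna() would remove the row instead).
sales["units"] = sales["units"].fillna(0)

# Descriptive statistics on a single series.
mean_units = sales["units"].mean()
median_units = sales["units"].median()

# Grouping: total units per store.
per_store = sales.groupby("store")["units"].sum()

# Filtering: keep only the rows with more than 5 units.
big_sales = sales[sales["units"] > 5]

# Merging: attach each store's region, keeping every row of sales.
merged = sales.merge(regions, on="store", how="left")

print(mean_units, median_units)
print(per_store)
print(big_sales)
print(merged)
```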

Data visualization

Pandas plotting methods use Matplotlib by default, which offers extensive support for visualizations. We can draw a variety of plots, including:

  • Histograms
  • Bar plots
  • Pie charts
  • Box plots
  • Line plots
  • Scatter plots
  • Rug plots (with a companion library such as seaborn)
  • Mosaic plots (with a companion library such as statsmodels)
  • Area plots
  • Lag plots

The illustration below shows some of the plots in Pandas:

Plots in Pandas (image from Python Awesome)
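
A minimal plotting sketch follows, assuming Matplotlib is installed alongside Pandas. The data is made up, and the figure is written to an in-memory buffer only to keep the example free of side effects; a file name could be passed to savefig instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 3, 8]})

# One figure with two axes, so each plot gets its own panel.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# The kind argument selects the plot type.
df.plot(x="x", y="y", kind="line", ax=ax_line)
df["y"].plot(kind="hist", bins=4, ax=ax_hist)

# Render the figure as PNG bytes.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Other values of kind, such as "bar", "pie", "box", "scatter", and "area", cover most of the plot types in the list above.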

Data science in Pandas

Pandas is widely used throughout the data science process. This includes reading vast amounts of data from different formats, cleaning the data, performing exploratory data analysis (EDA), plotting visualizations, and preparing data for statistical learning and machine learning.

Copyright ©2022 Educative, Inc. All rights reserved
