Introduction

In this lesson, you will learn about pandas, a Python library for data manipulation.

What is pandas?

Pandas is an open-source Python library that provides powerful, flexible, and high-performance tools for processing and analyzing data. Wes McKinney began developing it in 2008.

Some of its key features are as follows:

  • Provides fast, efficient DataFrame objects with integrated indexing.
  • Provides tools for loading data into memory from many file formats.
  • Performs data merges and joins efficiently.
  • Supports manipulation of time series data.
  • Makes it easy to manipulate rows and columns.
  • Supports SQL-like operations.
  • Supports vectorized operations.
  • Provides label-based slicing, indexing, and subsetting of large data sets.
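Several of the features above can be seen in a few lines. The sketch below is illustrative only: the city names, temperatures, and variable names are hypothetical sample data, not part of the course material. It shows a DataFrame with a label index, a vectorized column calculation, label-based selection and slicing with `.loc`, a SQL-like join on the index, and a simple time series resample.

```python
import pandas as pd

# Hypothetical sample data; "city" becomes the row index (labels).
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "temp_c": [4.0, 19.5, 27.0]}
).set_index("city")

# Vectorized operation: the whole column is converted at once, no Python loop.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# Label-based selection with .loc (row label, column label).
lima_f = df.loc["Lima", "temp_f"]

# Label-based slicing: with .loc, both endpoints are included.
first_two = df.loc["Oslo":"Lima"]

# SQL-like join: match rows of another table on the shared "city" index.
countries = pd.DataFrame(
    {"country": ["Norway", "Peru", "India"]},
    index=pd.Index(["Oslo", "Lima", "Pune"], name="city"),
)
joined = df.join(countries)

# Time series: six daily values, averaged over 2-day bins.
ts = pd.Series(range(6), index=pd.date_range("2024-01-01", periods=6, freq="D"))
two_day_mean = ts.resample("2D").mean()

print(joined)
print(lima_f)
print(two_day_mean)
```

Note that `.loc` slicing by label includes the end label, unlike ordinary Python slicing by position.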

Pandas is not part of the standard Python installation, so install it with pip:

pip install pandas
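To confirm the installation worked, you can import the library and print its version. Importing it under the alias `pd` is a widespread convention rather than a requirement.

```python
import pandas as pd

# pd.__version__ holds the installed pandas version as a string.
print(pd.__version__)
```

If the import fails with `ModuleNotFoundError`, pandas was most likely installed into a different Python environment than the one running the code.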

What you will learn from this course

  • Manipulating tabular data.
  • Aggregating data across multiple dimensions.
  • SQL-like joining, grouping, and sorting of data.
  • Powerful data filtering.
  • Reading and writing multiple file formats.
  • Extensive type support, such as int, float, string, and datetime.
  • Advanced pandas usage, such as reducing memory usage, speeding up file loading, and speeding up operations.
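As a preview of these topics, the sketch below strings several of them together. It is a minimal example under stated assumptions: the sales figures and the names `sales`, `big`, and `by_region` are hypothetical, and an in-memory `io.StringIO` buffer stands in for a CSV file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data; io.StringIO stands in for a CSV file on disk.
csv_text = """region,product,units,price
north,apple,10,0.5
south,apple,4,0.5
north,pear,7,0.8
south,pear,12,0.8
north,apple,3,0.5
"""
sales = pd.read_csv(io.StringIO(csv_text))

# Vectorized column arithmetic.
sales["revenue"] = sales["units"] * sales["price"]

# Boolean filtering: keep only rows with more than 5 units.
big = sales[sales["units"] > 5]

# SQL-like GROUP BY with SUM, then ORDER BY revenue descending.
by_region = (
    sales.groupby("region", as_index=False)["revenue"]
    .sum()
    .sort_values("revenue", ascending=False)
)

# Type handling: columns of repeated strings stored as "category"
# usually take less memory; compare the totals before and after.
before = sales.memory_usage(deep=True).sum()
sales["region"] = sales["region"].astype("category")
sales["product"] = sales["product"].astype("category")
after = sales.memory_usage(deep=True).sum()

# Output: with no path argument, to_csv returns the CSV text as a string.
out_text = by_region.to_csv(index=False)

print(big)
print(by_region)
print(before, after)
```

The exact memory figures depend on the pandas version and platform, so treat the before/after comparison as an illustration of the idea rather than a benchmark.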