Data Structures
Learn about the core data structures that pandas provides.
The pandas library includes two main data structures, Series and DataFrame, and associated functions for manipulating them. First, we’ll look at Series because a DataFrame can be considered a collection of columns represented as Series objects.
pandas data model
One of the keys to understanding pandas is to understand its data model. At the core of pandas are two data structures: Series, for one-dimensional array data, and DataFrame, for two-dimensional tabular data. The table below shows their analogs in the spreadsheet, database, and linear algebra worlds.
Different dimensions of pandas data structures
Data Structure | Dimensionality | Spreadsheet Analog | Database Analog | Linear Algebra Analog |
Series | 1D | Column | Column | Column Vector |
DataFrame | 2D | Single Sheet | Table | Matrix |
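To make the dimensionality column of the table concrete, here is a minimal sketch that builds one Series and one DataFrame and inspects their dimensions. The names `ages` and `people` and the sample values are hypothetical, invented only for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
# A Series is 1D: comparable to a single spreadsheet column.
ages = pd.Series([31, 45, 27], name="age")

# A DataFrame is 2D: comparable to a single sheet or a database table.
people = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "age": [31, 45, 27]}
)

print(ages.ndim)     # number of dimensions of the Series
print(people.ndim)   # number of dimensions of the DataFrame
print(people.shape)  # (rows, columns) of the DataFrame
```

The `ndim` attribute reports 1 for the Series and 2 for the DataFrame, matching the table above.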
DataFrame vs Series
An analogy with the spreadsheet world illustrates the basic differences between these types. A DataFrame is similar to a sheet with rows and columns, while a Series is similar to a single column of data (when we refer to a column of data in this text, we are referring to a Series).
Diving into these core data structures a little more is helpful because a bit of understanding goes a long way toward better use of the library. We will spend a good portion of time discussing Series and DataFrame. The two share several features. For example, they both have an index, which we’ll need to examine to understand how pandas works.
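As a small illustration of the shared index, the sketch below gives a Series and a DataFrame the same row labels and looks up a value by label in each. The labels and scores are hypothetical sample data.

```python
import pandas as pd

# Hypothetical labels and values, invented for illustration.
labels = ["ana", "ben", "cy"]

scores = pd.Series([88, 92, 75], index=labels, name="score")
table = pd.DataFrame(
    {"score": [88, 92, 75], "passed": [True, True, False]},
    index=labels,
)

# Both objects expose their row labels through the .index attribute.
print(scores.index)
print(table.index)
print(scores.index.equals(table.index))  # True: the labels match

# The index lets us look up values by label with .loc.
print(scores.loc["ben"])          # 92
print(table.loc["ben", "score"])  # 92
```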
Also, because a DataFrame can be thought of as a collection of columns that are really Series objects, it’s imperative that we have a comprehensive understanding of Series first.
Note: A DataFrame can have one or many Series.
Additionally (and perhaps oddly to some), when we iterate over rows, each row is represented as a Series. However, if we find ourselves consistently dealing with rows instead of columns, we’re probably not using pandas in an optimal way.
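Both points can be checked directly: a column selected from a DataFrame is a Series, and so is each row produced by row iteration. The following is a minimal sketch using `DataFrame.iterrows()`, with hypothetical item and price data.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({"item": ["pen", "book"], "price": [1.5, 10.0]})

# Selecting one column returns a Series.
column = df["item"]
print(type(column).__name__)

# iterrows() yields (index label, row) pairs; each row is also a Series.
row_types = [type(row).__name__ for _, row in df.iterrows()]
print(row_types)

# Column-wise operations are usually the better fit:
# this doubles every price without an explicit row loop.
doubled = df["price"] * 2
print(doubled.tolist())
```

The last line shows the column-oriented style: the arithmetic applies to the whole column at once, so no row-by-row loop is needed.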
It is tempting to compare these data structures to Python lists or dictionaries. However, doing so doesn’t provide much benefit. Mapping list and dictionary methods onto pandas’ data structures just leads to confusion.
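One example of where the list analogy breaks down is membership testing. In the sketch below, which uses hypothetical values, the `in` operator checks the values of a list but the index labels of a Series, and iterating over a DataFrame yields column labels rather than rows.

```python
import pandas as pd

# Hypothetical values, invented for illustration.
values = [10, 20, 30]
s = pd.Series(values)  # default index labels are 0, 1, 2

print(10 in values)    # True: a list checks its values
print(10 in s)         # False: a Series checks its index labels
print(0 in s)          # True: 0 is an index label
print(10 in s.values)  # True: check the underlying array to test values

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(list(df))        # ['a', 'b']: iteration yields column labels
```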
Summary
The pandas library includes two main data structures and associated functions for manipulating them. This course will focus on the Series and DataFrame data structures, beginning with Series since a DataFrame can be considered a collection of columns represented as Series objects.
If we had a spreadsheet with data, which pandas data structure would we use to hold the data?
Series
DataFrame
List
None of the above