Series Introduction

Learn the basics of pandas' Series data structure in this lesson.

A Series models one-dimensional data. Besides the values themselves, a Series object carries an index and a name. A recurring idea in pandas is the notion of an axis. Because a series is one-dimensional, it has a single axis: the index.

Below is a table of counts of songs several artists composed. We’ll use this to explore the series:

Counts of songs artists composed

Artist    Data
0         145
1         142
2         38
3         13

Data representation in Python

If we wanted to represent this data in pure Python, we could use a dictionary. The series dictionary stores the data points as a list under the data key. Alongside that entry, it has an explicit entry for the corresponding index values (under the index key) and an entry for the name of the data (under the name key):

series = {'index':[0 , 1, 2, 3], 'data':[145 , 142, 38, 13], 'name':'songs'}
print(series)

The get function defined below can pull items out of this data structure based on the index:

series = {'index': [0, 1, 2, 3], 'data': [145, 142, 38, 13], 'name': 'songs'}

def get(series, idx):
    # Find the position of the label in the index, then read the data at that position
    value_idx = series['index'].index(idx)
    return series['data'][value_idx]

print(get(series, 1))

The index abstraction

This double abstraction of the index seems unnecessary at first glance, since a list already has integer indexes. But pandas has a trick up its sleeve. By allowing non-integer values, the data structure supports other index types such as strings and dates, as well as arbitrarily ordered indices and even duplicate index values.

Below is an example that has string values for the index:

songs = {
    'index': ['Paul', 'John', 'George', 'Ringo'],
    'data': [145, 142, 38, 13],
    'name': 'counts'
}
print(get(songs, 'John'))
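To make the point about duplicate index values concrete, here is a minimal sketch that extends the dictionary model. The get_all helper and the credits data are hypothetical, introduced only for illustration; they are not part of pandas.

```python
# Hypothetical data: the label 'Paul' appears twice in the index.
credits = {
    'index': ['Paul', 'John', 'Paul', 'Ringo'],
    'data': [145, 142, 7, 13],
    'name': 'counts',
}

def get_all(series, idx):
    """Return every data value whose index label equals idx."""
    return [value
            for label, value in zip(series['index'], series['data'])
            if label == idx]

print(get_all(credits, 'Paul'))   # [145, 7]
print(get_all(credits, 'Ringo'))  # [13]
```

A lookup built on list.index, like the get function shown earlier, would return only the first match, because list.index stops at the first occurrence of the label.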

The index is a core feature of pandas’ data structures, which reflects the library’s origins in the analysis of financial and time-series data. Many of the operations performed on a Series operate directly on the index or by index lookup.

The pandas Series

With that background in mind, let’s look at how to create a Series in pandas. It’s easy to create a Series object from a list:

import pandas as pd
songs2 = pd.Series([145, 142, 38, 13], name='counts')
print(songs2)

When the interpreter prints our Series, pandas makes a best effort to format it for the current terminal size. The series is one-dimensional, even though the output looks two-dimensional. The leftmost column is the index, which is not part of the values. The generic name for an index is an axis, and the values of the index (0, 1, 2, 3) are called axis labels. The data (145, 142, 38, and 13) are also called the values of the series. The two-dimensional structure in pandas, the DataFrame, has two axes: one for the rows and another for the columns.

The rightmost column in the output contains the values of the series: 145, 142, 38, and 13. In this case, they’re integers (the console representation says dtype: int64, where dtype means data type and int64 means 64-bit integer), but in general, a Series can hold strings, floats, booleans, or arbitrary Python objects.
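The pieces described above (name, data type, and values) can each be read from the Series directly. The following is a small sketch that rebuilds the same songs2 series so it can run on its own:

```python
import pandas as pd

songs2 = pd.Series([145, 142, 38, 13], name='counts')

print(songs2.name)        # the name given at construction
print(songs2.dtype)       # the data type of the values
print(songs2.to_numpy())  # the values as a NumPy array, without the index
print(len(songs2))        # number of values
```

Note that to_numpy() returns only the values; the index labels are not included, which matches the idea that the index is not part of the values.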

To get the best speed (and to leverage vectorized operations), the values should be of the same type, though this is not required.

It’s easy to inspect the index of a Series (or DataFrame), since it’s an attribute of the object:

x = songs2.index
print(x)

The default values for an index are monotonically increasing integers. songs2 has an integer-based index.
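As a quick check of the default index, the sketch below rebuilds songs2 and looks at the index type, its labels, and whether they increase monotonically:

```python
import pandas as pd

songs2 = pd.Series([145, 142, 38, 13], name='counts')
idx = songs2.index

print(type(idx).__name__)           # RangeIndex
print(list(idx))                    # [0, 1, 2, 3]
print(idx.is_monotonic_increasing)  # True
```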

The index can be string-based as well, in which case pandas indicates that the data type for the index is object (not string):

songs3 = pd.Series([145, 142, 38, 13],
                   name='counts',
                   index=['Paul', 'John', 'George', 'Ringo'])
print(songs3)

Note: The dtype that we see when we print a Series is the type of the values, not the index. Even though this looks two-dimensional, remember that the index is not part of the values.

When we inspect the index attribute, we see that the dtype is object:

x = songs3.index
print(x)
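With a string-based index in place, values can be looked up by label, much like the pure-Python get function did with the dictionary model. This is a brief sketch that rebuilds songs3 so it is self-contained:

```python
import pandas as pd

songs3 = pd.Series([145, 142, 38, 13],
                   name='counts',
                   index=['Paul', 'John', 'George', 'Ringo'])

print(songs3['John'])       # 142
print(songs3.loc['Ringo'])  # 13
print('George' in songs3)   # True: membership tests check the index labels
print(145 in songs3)        # False: 145 is a value, not a label
```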

The actual data (or values) for a series does not have to be numeric or homogeneous. We can insert Python objects into a series:

class Foo:
    pass

ringo = pd.Series(['Richard', 'Starkey', 13, Foo()], name='ringo')
print(ringo)

In the above case, the dtype (data type) of the Series is object (meaning a Python object). This can be both good and bad.
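To see what the object dtype means in practice, the sketch below rebuilds the same series and lists the Python type of each element. Each value keeps its own type, and the Series simply stores references to them:

```python
import pandas as pd

class Foo:
    pass

ringo = pd.Series(['Richard', 'Starkey', 13, Foo()], name='ringo')

print(ringo.dtype)  # object
print([type(value).__name__ for value in ringo])  # ['str', 'str', 'int', 'Foo']
```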

The object data type is also used for a series with string values, as well as for values that have heterogeneous or mixed types. If we have only numeric data in a Series, we wouldn’t want it stored as a Python object but rather as an int64 or float64, which allows us to do vectorized numeric operations.
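The difference between a numeric dtype and the object dtype can be sketched as follows. The counts and mixed series are illustrative, and the checks use the helper functions in pd.api.types:

```python
import pandas as pd

counts = pd.Series([145, 142, 38, 13], name='counts')

# Numeric dtype: operations apply to every value at once.
print(counts.sum())                           # 338
print((counts * 2).tolist())                  # [290, 284, 76, 26]
print(pd.api.types.is_numeric_dtype(counts))  # True

# Mixed values (hypothetical data) fall back to the object dtype.
mixed = pd.Series([145, 'many', 38.5])
print(pd.api.types.is_numeric_dtype(mixed))   # False
print(pd.api.types.is_object_dtype(mixed))    # True
```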

If a series of dates reports the object type, it probably holds the dates as strings. Using strings instead of date types is bad because we don’t get the date operations that we would get if the type were datetime64[ns]. For a series of ordinary string data, on the other hand, the object type is expected. Don’t worry; we’ll see how to convert types later in the course.
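As a preview, here is a minimal sketch of the difference between dates stored as strings and dates stored with a datetime type. The sample dates are hypothetical, and pd.to_datetime is only one way to do the conversion; the exact datetime resolution reported by pandas may vary between versions.

```python
import pandas as pd

# Hypothetical release dates stored as plain strings.
raw = pd.Series(['2020-01-15', '2021-06-30'], name='released')
print(pd.api.types.is_datetime64_any_dtype(raw))    # False

# After conversion, date operations such as .dt.year become available.
dates = pd.to_datetime(raw)
print(pd.api.types.is_datetime64_any_dtype(dates))  # True
print(dates.dt.year.tolist())                       # [2020, 2021]
```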