Working with Time Series

Explore how to manipulate and work with time series data.

We'll cover the following...

Loading the Data
Adding timezone information
Exploring the data
Slicing time series
Missing time series data
Exploring seasonality
Resampling data
Rules with offset aliases
Combining offset aliases
Anchored offset aliases
Resampling to the finer-grain frequency
Grouping a date column with pd.Grouper
Summary

When we say “time series,” we’re not talking about the pandas Series object but about data that has a date component. Often that date component lives in the index of a pandas Series or DataFrame, because that makes time-based aggregations easy.
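To see why a date index helps, here is a minimal sketch using made-up values (the name `flow` and the numbers are illustrative only, not taken from the river dataset). Because the index holds dates, `resample` can group the readings by calendar day:

```python
import pandas as pd

# Four made-up readings taken every 12 hours; the dates live in the index.
flow = pd.Series(
    [1.0, 3.0, 5.0, 7.0],
    index=pd.to_datetime(
        [
            "2021-01-01 00:00",
            "2021-01-01 12:00",
            "2021-01-02 00:00",
            "2021-01-02 12:00",
        ]
    ),
)

# With a DatetimeIndex, aggregating by day is a single method chain.
daily = flow.resample("D").mean()
print(daily)
```

Each day's two readings are averaged into one row per day.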

Loading the Data

For this section, we’re going to explore a dataset from the US Geological Survey that records the flow of the Dirty Devil River in Utah. The data is a tab-delimited ASCII file and is described here in detail. The columns are:

agency_cd: Agency collecting data.
site_no: USGS identification number of site.
datetime: Date.
tz_cd: Timezone.
144166_00060: Discharge (cubic feet per second).
144166_00060_cd: Status of discharge. “A” (approved), “P” (provisional), “e” (estimate).
144167_00065: Gauge height (feet).
144167_00065_cd: Status of gauge height. “A” (approved), “P” (provisional), “e” (estimate).

Here’s the code to load the data. We’ve also included a tweak function that converts the date information to actual dates and renames some columns. Note that the file is not a CSV file, but we can specify tab as a separator. Also, we need to skip a few of the rows:
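A minimal sketch of that loading step follows. It reads a small synthetic stand-in for the file rather than the real download, so it runs on its own. Everything in the sample is an assumption: the comment lines, the column-format row, the placeholder site number, and the data values are made up. The rows to skip are chosen for this sample and will differ for the real file. The function name `tweak_river`, the new column names `cfs` and `gage_height`, and the choice to move the dates into the index are also illustrative.

```python
import io

import pandas as pd

# Synthetic stand-in for the USGS file (all values are made up):
# leading comment lines, a header row, a column-format row, then records.
header = [
    "agency_cd", "site_no", "datetime", "tz_cd",
    "144166_00060", "144166_00060_cd", "144167_00065", "144167_00065_cd",
]
format_row = ["5s", "15s", "20d", "6s", "14n", "10s", "14n", "10s"]
records = [
    ["USGS", "1", "2001-05-07 01:00", "MDT", "71.0", "A", "6.0", "A"],
    ["USGS", "1", "2001-05-07 01:15", "MDT", "71.0", "A", "6.0", "A"],
    ["USGS", "1", "2001-05-07 01:30", "MDT", "70.5", "P", "5.9", "P"],
]
lines = ["# illustrative comment line", "# another comment line"]
lines += ["\t".join(row) for row in [header, format_row, *records]]
raw = "\n".join(lines)

# Tab-separated, so pass sep="\t". Skip the comment lines (rows 0-1)
# and the format row (row 3); row 2 is kept as the header.
df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    skiprows=lambda num: num < 2 or num == 3,
)


def tweak_river(df_):
    """Parse the dates, rename the measurement columns, index by date."""
    return (
        df_
        .assign(datetime=pd.to_datetime(df_["datetime"]))
        .rename(columns={"144166_00060": "cfs",
                         "144167_00065": "gage_height"})
        .set_index("datetime")
    )


dd = tweak_river(df)
print(dd)
```

For the real file, the same pattern would apply with the file's path or URL in place of the `io.StringIO` buffer and the skip condition adjusted to match that file's preamble.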
