When we say “time series,” we’re not talking about the pandas Series object but rather data that has a date component. Often we’ll keep that date component in the index of a pandas Series or DataFrame because that allows us to do time aggregations easily.
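As a minimal sketch of why a date index helps, the snippet below builds a small Series with datetimes in the index and aggregates it by day. The readings are made up purely for illustration and are not taken from the dataset used in this section.

```python
import pandas as pd

# Hypothetical readings, made up for illustration only.
readings = pd.Series(
    [1.0, 3.0, 2.0, 6.0],
    index=pd.to_datetime(
        [
            "2021-01-01 00:00",
            "2021-01-01 12:00",
            "2021-01-02 00:00",
            "2021-01-02 12:00",
        ]
    ),
    name="value",
)

# Because the index holds datetimes, resample can group rows by calendar day.
daily_mean = readings.resample("D").mean()
print(daily_mean)
```

With the dates stored in an ordinary column instead, the same aggregation would need an extra step, such as passing `on=` to `resample` or setting the index first.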

Loading the Data

For this section, we’re going to explore a dataset from the US Geological Survey that records the flow of the Dirty Devil River in Utah. The data is a tab-delimited ASCII file and is described here in detail. The columns are:

  • agency_cd: Agency collecting data.
  • site_no: USGS identification number of site.
  • datetime: Date.
  • tz_cd: Timezone.
  • 144166_00060: Discharge (cubic feet per second).
  • 144166_00060_cd: Status of discharge. “A” (approved), “P” (provisional), “e” (estimate).
  • 144167_00065: Gauge height (feet).
  • 144167_00065_cd: Status of gauge height. “A” (approved), “P” (provisional), “e” (estimate).

Here’s the code to load the data. We’ve also included a tweak function that converts the date information to actual dates and renames some columns. Note that the file is not a CSV file, but we can specify tab as a separator. Also, we need to skip a few of the rows:
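The steps described above can be sketched roughly as follows. This is not necessarily the lesson’s own code: the function name `tweak_river`, the new column names `cfs` and `gauge_height`, and the in-memory `SAMPLE` text (with invented values, a placeholder site number, and an assumed layout of comment lines, header row, and format row) are all assumptions made so the example runs on its own. The rows to skip in the real file will differ.

```python
import io

import pandas as pd

# Tiny stand-in for the real file: invented values, same column names.
# Assumed layout: two comment lines, a header row, a format row, then data.
SAMPLE = (
    "# hypothetical comment line\n"
    "# another hypothetical comment line\n"
    "agency_cd\tsite_no\tdatetime\ttz_cd\t"
    "144166_00060\t144166_00060_cd\t144167_00065\t144167_00065_cd\n"
    "5s\t15s\t20d\t6s\t14n\t10s\t14n\t10s\n"
    "USGS\t00000000\t2001-05-07 01:00\tMDT\t71.0\tA\t6.0\tA\n"
    "USGS\t00000000\t2001-05-07 01:15\tMDT\t72.0\tA\t6.1\tA\n"
)


def tweak_river(df):
    """Parse the dates, rename the measurement columns, index by datetime."""
    return (
        df.assign(datetime=pd.to_datetime(df["datetime"]))
        .rename(columns={"144166_00060": "cfs", "144167_00065": "gauge_height"})
        .set_index("datetime")
    )


raw = pd.read_csv(
    io.StringIO(SAMPLE),      # swap in the real file path or URL here
    sep="\t",                 # tab-delimited, not comma-delimited
    skiprows=[0, 1, 3],       # skip the comments and the format row (sample only)
    dtype={"site_no": str},   # keep leading zeros in the site number
)
river = tweak_river(raw)
print(river[["cfs", "gauge_height"]])
```

Keeping the cleanup in one function means the raw file can be reloaded and re-tweaked in a single call whenever the source data changes.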
