Debugging Chains

Explore methods to debug chained operations in pandas effectively. Learn how to use commenting, the pipe function, and Jupyter's pdb debugger to inspect and troubleshoot intermediate DataFrame states safely and efficiently.

In this section, we’ll explore debugging chains of operations on DataFrames or Series. Almost universally, pandas code is a bit messy. We get it. Chaining produces less code. Because pandas is an in-memory library that works by copying data, this argument is a moot point. Let’s address the debugging complaint.

We’re going to see a “tweak” function that analyzes the fuel economy data.

Here is our tweak function:

Python 3.8
# Import pandas library
import pandas as pd
# Read vehicles.csv file from GitHub and store it in a DataFrame named autos
autos = pd.read_csv('https://github.com/mattharrison/datasets/raw/'
                    'master/data/vehicles.csv.zip')
# Define a function to convert a datetime column to a specified timezone
def to_tz(df_, time_col, tz_offset, tz_name):
    return (df_
        .groupby(tz_offset)
        [time_col]
        .transform(lambda s: pd.to_datetime(s)
            .dt.tz_localize(s.name, ambiguous=True)
            .dt.tz_convert(tz_name))
    )
# Define a function to tweak the autos DataFrame
def tweak_autos(autos):
    # Define a list of columns to keep
    cols = ['city08', 'comb08', 'highway08', 'cylinders',
            'displ', 'drive', 'eng_dscr', 'fuelCost08',
            'make', 'model', 'trany', 'range', 'createdOn',
            'year']
    # Return a modified DataFrame with the specified columns and modifications
    return (autos
        [cols]
        .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),
                displ=autos.displ.fillna(0).astype('float16'),
                drive=autos.drive.fillna('Other').astype('category'),
                automatic=autos.trany.str.contains('Auto'),
                speeds=autos.trany.str.extract(r'(\d)+').fillna('20')
                    .astype('int8'),
                offset=autos.createdOn
                    .str.extract(r'\d\d:\d\d ([A-Z]{3}?)')
                    .replace('EDT', 'EST5EDT'),
                str_date=(autos.createdOn.str.slice(4,19) + ' ' +
                          autos.createdOn.str.slice(-4)),
                createdOn=lambda df_: to_tz(df_, 'str_date',
                                            'offset', 'America/New_York'),
                ffs=autos.eng_dscr.str.contains('FFS')
        )
        .astype({'highway08': 'int8', 'city08': 'int16',
                 'comb08': 'int16', 'fuelCost08': 'int16',
                 'range': 'int16', 'year': 'int16',
                 'make': 'category'})
        .drop(columns=['trany', 'eng_dscr'])
    )
# Print the tweaked autos DataFrame
print(tweak_autos(autos))

Say we come across this tweak_autos function, and we want to understand what it does. First of all, realize that it’s written like a recipe, step by step:

  • Pull out the columns listed in cols ([cols]).
  • Create various columns (assign).
  • Convert column types (astype).
  • Drop extra columns that are no longer needed after we’ve created new columns from them (drop).
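To make the recipe concrete, here is a minimal sketch of the same four steps on a tiny, made-up DataFrame. The `toy` frame, its values, and the `unused` column are assumptions for illustration only; they are not part of the vehicles dataset. The sketch also uses lambdas inside assign, which is a stylistic choice here rather than what tweak_autos does.

```python
import pandas as pd

# Hypothetical toy data standing in for the vehicles dataset
toy = pd.DataFrame({
    'make': ['Ford', 'Toyota', 'Ford'],
    'cylinders': [4.0, None, 8.0],
    'trany': ['Automatic 4-spd', 'Manual 5-spd', 'Automatic 6-spd'],
    'unused': [1, 2, 3],
})

cols = ['make', 'cylinders', 'trany']

result = (toy
    [cols]                                   # 1. pull out the columns in cols
    .assign(                                 # 2. create or overwrite columns
        cylinders=lambda df_: df_.cylinders.fillna(0).astype('int8'),
        automatic=lambda df_: df_.trany.str.contains('Auto'))
    .astype({'make': 'category'})            # 3. convert column types
    .drop(columns=['trany'])                 # 4. drop the helper column
)

print(result)
print(result.dtypes)
```

Each line of the chain returns a new DataFrame, so the steps can be read top to bottom in the same order as the list above.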

Those who don’t support chaining say there’s no way to debug this. We have a few ways to debug the chain. The first is by using comments. We comment out all of the operations and then go through them one at a time. This comes in really handy to visually see what’s ...