Handling Missing Data

Explore techniques for handling missing data in pandas, including detecting missing values, filling them with appropriate replacements, and using interpolation for ordered data. This lesson helps you ensure data quality and improve analysis outcomes by managing incomplete data effectively.

Missing data

Filling in missing data is another common operation. It matters because many machine learning algorithms fail when their input contains missing values. It’s also prudent to know how much data is missing, so that we understand what our data can and cannot tell us.

The “cylinders” column in our dataset has missing values. Remember our trick to calculate the count of items that have some property? We can use it here to determine the count of missing entries. We convert the values to booleans with .isna, then call .sum on the result. Each True counts as 1, so the sum is the number of missing entries:

Python 3.8
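# Select the "cylinders" column as a Series
cyl = df.cylinders
# .isna() marks missing entries as True; .sum() counts the True values
print(cyl.isna().sum())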
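
To try the counting trick without the lesson's dataset, here is a minimal, self-contained sketch. The small DataFrame and its values are made up for illustration and are not the lesson's data. It also shows that calling .mean on the same boolean Series gives the fraction of missing entries, which follows from the same True-counts-as-1 behavior.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's dataset; the values are invented.
df = pd.DataFrame({
    "make": ["A", "B", "C", "D"],
    "cylinders": [4.0, np.nan, 6.0, np.nan],
})

cyl = df.cylinders
mask = cyl.isna()            # boolean Series: True where the value is missing
missing_count = mask.sum()   # True counts as 1, False as 0
missing_share = mask.mean()  # fraction of entries that are missing

print(missing_count, missing_share)
```

Reporting the share alongside the count can make it easier to judge how much of a column is affected, since a raw count means little without the total number of rows.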

From the “cylinders” Series alone, it’s hard to determine why these values are missing. Typically we’ll have more context, and a DataFrame gives that to us. ...
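
One way to get that context, sketched here under assumptions, is to use the boolean Series from .isna as a filter on the whole DataFrame, so the other columns of the affected rows become visible. The DataFrame below is a made-up stand-in, and the "make" and "fuel" columns are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's dataset; columns and values are invented.
df = pd.DataFrame({
    "make": ["A", "B", "C", "D"],
    "fuel": ["gas", "electric", "gas", "electric"],
    "cylinders": [4.0, np.nan, 6.0, np.nan],
})

# Keep only the rows where "cylinders" is missing, with all columns intact.
missing_rows = df[df.cylinders.isna()]

print(missing_rows)
```

In this invented data, the rows with missing cylinders share the same "fuel" value, which is the kind of pattern that a single Series cannot reveal.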