
Handling Missing Data

Explore methods for handling missing data in real-world datasets, including identifying types of missing data, visualizing gaps with heatmaps and bar plots, and applying imputation techniques using Python. Understand how to validate imputation effects to preserve data relationships and improve analysis accuracy.

Missing data is present in many real-world datasets and is usually handled either by removing the affected data points or by imputing them. Imputing means replacing the missing values with estimated ones. In this lesson, we'll learn how data storytellers handle missing data.
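To make the two options concrete, here is a minimal sketch using pandas. The toy data and the column names `temperature` and `humidity` are assumptions made up for illustration, and mean imputation is only one of many possible estimation strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "temperature": [21.0, np.nan, 23.0, 25.0],
    "humidity": [40.0, 42.0, np.nan, 46.0],
})

# Option 1: remove every row that has at least one missing value.
dropped = df.dropna()

# Option 2: impute, here by replacing each gap with its column's mean.
imputed = df.fillna(df.mean())

print(dropped)
print(imputed)
```

Dropping rows is simple but discards the valid values that share a row with a gap, while imputing keeps every row at the cost of introducing estimated values.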

Why analyze missing data?

In some cases, analyzing missing data helps us understand potential trends or insights that the recorded values alone do not show. Missing data can arise from several factors, such as:

  • Erroneous reporting. For example, consider a digital surveillance camera that is damaged due to weather conditions and is consistently producing blurry footage, or a damaged temperature sensor on a manufacturing floor that is reporting incorrect measurements.

  • Participants who don't wish to provide certain data for survey/

Depending on the programming framework and libraries we are using, missing data may be represented as NaN, N/A, NA, 0 values, and more.
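As a rough illustration of how these markers might be unified, the sketch below reads a small CSV with pandas. The CSV contents and the column names `sensor` and `reading` are hypothetical. It also assumes that a reading of 0 means "no measurement" in this particular dataset, which is not true in general and should be checked against how the data was collected.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV text; "sensor" and "reading" are made-up column names.
raw = io.StringIO(
    "sensor,reading\n"
    "a,20.5\n"
    "b,N/A\n"
    "c,NA\n"
    "d,0\n"
    "e,NaN\n"
)

# By default, pandas.read_csv parses "N/A", "NA", and "NaN" as missing (NaN).
df = pd.read_csv(raw)
missing_before = int(df["reading"].isna().sum())

# Assumption for this example only: a reading of 0 is a placeholder for
# "no measurement", so we convert it to NaN as well.
df["reading"] = df["reading"].replace(0, np.nan)
missing_after = int(df["reading"].isna().sum())

print(missing_before, missing_after)
```

Converting every marker to a single representation first means that later steps, such as counting or visualizing gaps, only need to look for NaN.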

Missing data also falls into several types, including:

  • Structurally missing data: Data that is missing because it does not exist in the first place.

  • Missing completely at ...