Introduction

Dealing with missing data is an essential aspect of data analysis. The data we receive is often incomplete, with missing values that need to be managed. Given that missing data can significantly affect the outcomes of our analysis or models, it’s important that we know how to work with missing values so that their negative impact is minimized.

Over the next few lessons, we’ll discover how to leverage the robust methods in pandas to represent, detect, analyze, and manage missing data.

Representation of missing data

Let's start by exploring how missing data is represented and displayed in pandas.

General representations

The two common missing data representations in pandas are NaN (an acronym for not a number) and None. Although NaN is considered the default missing value indicator for reasons of computational speed and convenience, it’s important to understand both representations because they have some key differences in their underlying data types.
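To make the difference in underlying data types concrete, here is a minimal sketch. The variable names (`numeric`, `as_object`) are illustrative and not part of the lesson; it only assumes that NumPy and pandas are installed.

```python
import numpy as np
import pandas as pd

# NaN is a Python float, while None is an instance of NoneType.
nan_type = type(np.nan)
none_type = type(None)
print(nan_type, none_type)

# In a numeric Series, pandas converts None to NaN and stores the data as float64.
numeric = pd.Series([1, None, 3])
print(numeric.dtype)

# When we explicitly request dtype=object, the Series holds generic Python objects.
as_object = pd.Series([1, None, 3], dtype=object)
print(as_object.dtype)

# Both representations are detected as missing by isna().
print(numeric.isna().tolist())
print(as_object.isna().tolist())
```

Note that the numeric Series is stored as floating point even though the non-missing values were integers, because NaN itself is a float.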

Here are some details about each missing data representation:

NaN:
- A special floating-point value, available in NumPy as np.nan, that specifically represents missing numerical data.
- The default missing value marker in pandas for real or floating-point values. It is based on the IEEE 754 floating-point standard.
- It’s of the floating-point type (rather than a Python object like None).
- NaN is contagious in computations, which means that almost any operation involving NaN will also result in NaN. For example, if we add NaN to another number, the result is NaN. This phenomenon is also known as the propagation of NaN in mathematical operations, which will be discussed in the next lesson.
- The following code shows two ways we can generate NaN values:
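The lesson's interactive snippet is not available in this text, so the sketch below is a stand-in rather than the original code. It assumes the two ways are NumPy's `np.nan` constant and Python's built-in `float("nan")`; the variable names are illustrative.

```python
import math
import numpy as np

# Way 1 (assumed): NumPy's NaN constant.
nan_from_numpy = np.nan

# Way 2 (assumed): Python's built-in float constructor.
nan_from_float = float("nan")

# Both values are ordinary Python floats.
print(type(nan_from_numpy), type(nan_from_float))

# NaN does not compare equal to anything, including itself,
# so we use a dedicated check such as math.isnan() instead of ==.
print(nan_from_numpy == nan_from_numpy)
print(math.isnan(nan_from_numpy), math.isnan(nan_from_float))

# Adding a number to NaN gives NaN, illustrating the propagation described above.
result = nan_from_numpy + 1
print(math.isnan(result))
```

Because NaN never equals itself, an equality comparison cannot be used to find missing values; pandas provides isna() for that purpose.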
