# Missing Data Representation

Learn the different ways in which missing data is represented in pandas.

## Introduction

Dealing with missing data is an essential aspect of data analysis. The data we receive is often incomplete, with missing values that need to be managed. Given that missing data can significantly affect the outcomes of our analysis or models, it’s important that we know how to work with missing values so that their negative impact is minimized.

Over the next few lessons, we’ll discover how to leverage the robust methods in `pandas` to represent, detect, analyze, and manage missing data.

## Representation of missing data

Let's start by exploring how missing data is represented and displayed in `pandas`.

### General representations

The two common missing data representations in `pandas` are `NaN` (short for "not a number") and `None`. Although `NaN` is considered the default missing value indicator for reasons of computational speed and convenience, it’s important to understand both representations because they have some key differences in their underlying data types.
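The difference in underlying types can be seen in a minimal sketch like the one below. The variable names are illustrative only, and the object-typed series is created with an explicit `dtype=object` so that `None` is stored unchanged.

```python
import numpy as np
import pandas as pd

# NaN is an ordinary Python float; None is Python's null object.
nan_type = type(np.nan)
none_type = type(None)

# A numeric Series stores missing values as NaN and uses a float dtype.
numeric = pd.Series([1, np.nan, 3])

# None placed in numeric data is converted to NaN, so the dtype is still float64.
numeric_from_none = pd.Series([1, None, 3])

# With an explicit object dtype, None is kept as a Python object.
text = pd.Series(["a", None, "c"], dtype=object)

print(nan_type, none_type)
print(numeric.dtype, numeric_from_none.dtype, text.dtype)
```

Here, both numeric series end up as `float64`, while the object series keeps `None` as-is.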

Here are some details about each missing data representation:

`NaN`:

- A special floating-point value from `NumPy` that specifically represents missing numerical data.
- The default missing value marker in `pandas` for real or floating-point values. It is based on the IEEE 754 floating-point standard.
- It’s of the floating-point type (rather than a Python object like `None`).
- `NaN` is contagious in computations, which means that almost any operation involving `NaN` will also result in `NaN`. For example, if we perform an arithmetic operation with `NaN` and another number, the result is always `NaN`. This phenomenon is also known as the propagation of `NaN` in mathematical operations, which will be discussed in the next lesson.

The following code shows two ways we can generate `NaN` values:
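A minimal sketch, assuming the two ways meant here are NumPy's `np.nan` constant and Python's built-in `float("nan")`. It also illustrates the propagation behavior described above. The variable names are illustrative only.

```python
import math
import numpy as np

# Way 1: NumPy's NaN constant.
nan_from_numpy = np.nan

# Way 2: Python's built-in float constructor.
nan_from_python = float("nan")

print(nan_from_numpy, nan_from_python)

# Both are plain Python floats.
print(type(nan_from_numpy), type(nan_from_python))

# NaN propagates through arithmetic: the result is also NaN.
result = nan_from_numpy + 1
print(math.isnan(result))

# NaN does not compare equal to itself, so check for it with isnan().
print(nan_from_numpy == nan_from_numpy)
```

Because `NaN == NaN` evaluates to `False`, functions such as `math.isnan` or `np.isnan` are the reliable way to test for it.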
