
Missing Values

Explore techniques to manage missing data in datasets using Pandas, including detecting missing values, filtering out incomplete rows, and filling gaps with methods like mean, median, or interpolation. Understand how to handle both numerical and non-numerical missing data to prepare your data for accurate analysis.

Missing values

During data collection and entry, some values may be missed, or data may simply not be available for some entries. Missing data is therefore very common in data science applications.

Pandas makes it easy to work with missing data. By default, it excludes missing values from calculations such as sum and mean.
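As a minimal sketch of this default behavior, the snippet below builds a small Series with one missing entry and compares the default result with the result when skipping is turned off. The Series and its values are made up for illustration and are not part of the housing dataset.

```python
import pandas as pd

# Hypothetical data: the None entry is stored as a missing value
s = pd.Series([1.0, None, 3.0])

# By default, missing values are skipped
total = s.sum()
average = s.mean()

# With skipna=False, a single missing value makes the result missing too
strict_total = s.sum(skipna=False)

print("sum:", total)
print("mean:", average)
print("sum with skipna=False:", strict_total)
```

Note that the mean is computed over the two values that are present, not over all three entries.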

Pandas uses the value NaN (Not a Number) to represent a missing value.
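To see where NaN comes from, the sketch below reads a tiny CSV from an in-memory string in which one field is left empty. The column names and values are hypothetical and chosen only for this illustration.

```python
import io

import pandas as pd

# Hypothetical CSV text: the second row has no value for "bedrooms"
csv_text = "rooms,bedrooms\n5,2\n6,\n7,3\n"

df_small = pd.read_csv(io.StringIO(csv_text))

# The empty field is read in as NaN
print(df_small)
print(df_small.dtypes)
```

Because NaN is a floating-point value, the bedrooms column is read as float64 even though the values that are present are whole numbers.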

Detecting missing values

We can detect missing values using the function isnull. It returns True wherever a value is missing and False otherwise.

Python 3.5
import pandas as pd
df = pd.read_csv('housing.csv')
# Count the missing values in each column
print("Missing values in every column : \n", df.isnull().sum())
print("\n\n Missing values total : ", df.isnull().sum().sum())
# Display the rows where total_bedrooms is missing
missing = df['total_bedrooms'].isnull()
print(df[missing])

In the first print statement, we call isnull and then call sum on the result. This gives us the number of missing values in each column. From the list, we see that total_bedrooms has ...