In this shot, we’ll learn about the missing data values in the datasets. The real-world data is never clean and contains many missing values. This usually happens due to many reasons.
Some of the common reasons could be:
Hence, there is a need to handle the missing values while building machine learning models or performing data analytics.
We’ll discuss two approaches to handle the missing values:
In this approach, we delete the entire record or row with missing values. Let’s understand this with an example:
import pandas as pd dataset = pd.read_csv('data.csv') print('Data with missing values') print(dataset) print('\nData without missing values') dataset.dropna(inplace = True) print(dataset)
Line 1: we import the required package.
Line 3: we read the dataset using the
Line 6: We print the dataset. It can be observed in the output that few records contain
NaN values, that is, missing values.
Line 9: We use the
dropna() method on the dataset. This method deletes the records which have missing values. We also pass the parameter
inplace to be
True so that the records are deleted and the new dataset is updated in the same variable.
Line 10: We print the dataset. Now, no records have missing values.
Let’s now explore the second way to handle the missing values:
In this approach, we replace the missing values with a default value. In this case, no records are deleted. Let’s understand this with an example:
import pandas as pd dataset = pd.read_csv("data.csv") print("Data with missing values") print(dataset) print("\nData with default values") dataset.Salary.fillna("45000", inplace = True) dataset.Company.fillna("Google", inplace = True) dataset.Country.fillna("Australia", inplace = True) print(dataset)
The code is almost the same as above, with just one difference.
fillna()method to fill the missing values with a default value.
Here, we have filled a value of
Australia for the columns
Also, a parameter passed as
inplace to be
True so that the operation is performed in the same dataset and the result is stored back in the same variable.
View all Courses