Data Preprocessing: Missing Values

Learn about data preprocessing and how to handle missing values.

Data preparation and cleaning

Our data come in different types. There are numerical columns, such as “Age,” “SibSp,” “Parch,” and “Fare.” There are categorical columns, some represented by numbers (“Survived” and “Pclass”) and some by text (“Sex” and “Embarked”). Finally, there are free-text columns (“Name,” “Ticket,” and “Cabin”).

This is quite a mess for data that we want to feed into a computer. Furthermore, the output of train.info() shows that the non-null counts vary between columns. While most columns have 891 values, “Age” has only 714, “Cabin” 204, and “Embarked” 889. In other words, 177, 687, and 2 entries are missing from these columns, respectively.
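The per-column counts can also be read off directly instead of scanning the train.info() output. The following is a minimal sketch, assuming pandas is available. The small DataFrame is a made-up stand-in for the lesson's `train` data, so its values and counts are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's `train` DataFrame
# (the real dataset has 891 rows and more columns).
train = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan],
    "Embarked": ["S", "C", "S", np.nan, "Q"],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
})

# Non-null entries per column, the same figure that train.info() reports.
non_null = train.count()

# Missing entries per column.
missing = train.isna().sum()

# Side-by-side summary of both counts.
print(pd.DataFrame({"non_null": non_null, "missing": missing}))
```

With the real data, the same two calls would be made on `train` right after loading it.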

Before we can feed our data into any machine learning algorithm, we need to clean it up. Preprocessing covers the following steps:

  1. Missing values

  2. Identifiers

  3. Handling text and categorical attributes

  4. Feature scaling

  5. Training and testing
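As a preview of the first step, the sketch below shows three common ways to deal with missing values: filling a numerical column with its median, filling a categorical column with its most frequent value, and dropping a column that is mostly empty. The text above does not say which strategy the lesson uses, so the choices here are assumptions, and the DataFrame is again a made-up stand-in for `train`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's `train` DataFrame.
train = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan],
    "Embarked": ["S", "C", "S", np.nan, "Q"],
})

# Work on a copy so the raw data stays untouched.
cleaned = train.copy()

# Assumed choice: fill missing ages with the median age.
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())

# Assumed choice: fill missing ports with the most frequent port.
cleaned["Embarked"] = cleaned["Embarked"].fillna(cleaned["Embarked"].mode()[0])

# Assumed choice: drop a column in which most values are missing.
cleaned = cleaned.drop(columns=["Cabin"])

# Remaining missing values per column (all zeros after the steps above).
print(cleaned.isna().sum())
```

Which option fits depends on the column: imputing keeps all rows but invents values, while dropping discards information that may or may not be useful.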
