Summary
Go over a summary of what we have learned in this chapter.
In this chapter, we learned about the following concepts.
The data and exploratory data analysis
The Titanic dataset is a common first step into classification in machine learning. The goal is to predict whether a passenger survived the sinking of the Titanic.
EDA of the data reveals that:
The Cabin column is missing 77.1%, the Age column is missing 19.9%, and the Embarked column is missing 0.2% of its data.
Among the deceased, most were male.
The rate of survival was higher for class-1 passengers than for the other classes.
The S port was the busiest port for each class, so we might expect more of the survivors to have boarded there. However, the rate of survival was higher for port C.
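The checks behind these findings can be sketched with pandas. This is a minimal sketch, not the chapter's own code: the rows below are made-up stand-ins (not the real Titanic data), and the column names Survived, Pclass, Sex, Age, Cabin, and Embarked are assumed to follow the usual Titanic CSV layout.

```python
import pandas as pd

# Toy stand-in rows, made up for illustration (NOT the real Titanic data).
# Column names are assumed to match the usual Titanic CSV layout.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1],
    "Pclass":   [3, 1, 1, 3, 2, 1],
    "Sex":      ["male", "female", "female", "male", "male", "male"],
    "Age":      [22.0, 38.0, None, 35.0, None, 54.0],
    "Cabin":    [None, "C85", None, None, None, "E46"],
    "Embarked": ["S", "C", "S", "S", None, "C"],
})

# Percentage of missing values in each column.
missing_pct = df.isna().mean().mul(100).round(1)

# Sex breakdown among passengers who did not survive.
deceased_by_sex = df.loc[df["Survived"] == 0, "Sex"].value_counts()

# Survival rate per passenger class and per port of embarkation.
# groupby skips rows whose key (here, Embarked) is missing.
survival_by_class = df.groupby("Pclass")["Survived"].mean()
survival_by_port = df.groupby("Embarked")["Survived"].mean()

print(missing_pct)
print(deceased_by_sex)
print(survival_by_class)
print(survival_by_port)
```

Run against the actual dataset, the same four expressions would give the missing-value percentages and survival rates summarized above; the toy rows only show the shape of the calculation.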
Data preprocessing and preparation
Before model training and evaluation, the data must be preprocessed. This involves removing missing values, handling categorical features by creating dummy variables, dropping extraneous features, and creating and scaling the training features and target.
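The preprocessing steps listed above could look roughly like the following. This is a hedged sketch under several assumptions: the rows are made-up stand-ins (not the real Titanic data), PassengerId is treated as the extraneous feature, Cabin is dropped as a column because it is mostly empty, and scaling is done with plain pandas standardization on the features only. The chapter itself may have used different columns or a library scaler.

```python
import pandas as pd

# Toy stand-in rows, made up for illustration (NOT the real Titanic data).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6, 7, 8],
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1],
    "Pclass":   [3, 1, 1, 3, 2, 1, 3, 2],
    "Sex":      ["male", "female", "female", "male",
                 "male", "male", "male", "female"],
    "Age":      [22.0, 38.0, 26.0, 35.0, None, 54.0, 2.0, 27.0],
    "Fare":     [7.25, 71.28, 7.92, 8.05, 8.46, 51.86, 21.08, 11.13],
    "Cabin":    [None, "C85", None, None, None, "E46", None, None],
    "Embarked": ["S", "C", "S", "S", "Q", "S", None, "C"],
})

# 1. Missing values: drop the mostly empty Cabin column (an assumption),
#    then drop the rows that still lack Age or Embarked.
df = df.drop(columns=["Cabin"]).dropna(subset=["Age", "Embarked"])

# 2. Drop an extraneous feature (PassengerId is assumed to be one).
df = df.drop(columns=["PassengerId"])

# 3. Encode categorical features as dummy variables.
df = pd.get_dummies(df, columns=["Sex", "Embarked"],
                    drop_first=True, dtype=float)

# 4. Separate the features from the target.
X = df.drop(columns=["Survived"])
y = df["Survived"]

# 5. Hold out part of the data, then standardize the features using
#    statistics computed on the training rows only.
X_train = X.sample(frac=0.67, random_state=0)
X_test = X.drop(index=X_train.index)
y_train, y_test = y.loc[X_train.index], y.loc[X_test.index]

mean = X_train.mean()
std = X_train.std(ddof=0).replace(0, 1.0)  # guard against constant columns
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
```

Computing the mean and standard deviation from the training rows alone keeps information about the held-out rows from leaking into the scaled features.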