In this chapter, we learned about the following concepts.

The data and exploratory data analysis

  • The Titanic dataset is a common first exercise in machine learning classification. The goal is to predict whether a passenger survived the sinking of the Titanic.

  • EDA of the data reveals that:

    • The Cabin column is missing 77.1% of its values, the Age column 19.9%, and the Embarked column 0.2%.

    • Among the deceased, most were male.

    • The survival rate was highest for first-class (class-1) passengers.

    • Port S (Southampton) was the busiest embarkation port for every class, so we can expect it to account for the most survivors in absolute terms. However, the survival rate was higher for passengers who embarked at port C (Cherbourg).
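The EDA findings above can be reproduced with a few pandas calls. The sketch below assumes the standard Titanic column names (Survived, Pclass, Sex, Age, Cabin, Embarked); the helper names `missing_percentages` and `survival_rate_by` are hypothetical, and the six toy rows are invented for illustration only, so their numbers do not match the percentages reported above. On the real data, you would pass the full DataFrame instead of `toy`.

```python
import numpy as np
import pandas as pd

def missing_percentages(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, largest first."""
    return (df.isna().mean() * 100).round(1).sort_values(ascending=False)

def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Mean of the 0/1 Survived column within each group of `column`."""
    return df.groupby(column)["Survived"].mean()

# Hypothetical toy rows using the Titanic column names; NOT the real data.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 1, 3, 2, 3],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
    "Age":      [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
    "Cabin":    [np.nan, "C85", "C123", np.nan, np.nan, "E46"],
    "Embarked": ["S", "C", "S", "S", "C", "Q"],
})

print(missing_percentages(toy))            # share of missing values per column
print(survival_rate_by(toy, "Pclass"))     # survival rate per passenger class
print(survival_rate_by(toy, "Embarked"))   # survival rate per embarkation port
# Sex breakdown among the deceased (Survived == 0)
print(toy.loc[toy["Survived"] == 0, "Sex"].value_counts())
```

Because `Survived` is coded 0/1, its group mean is directly the survival rate, which is why a plain `groupby(...).mean()` is enough here.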

Data preprocessing and preparation

Moving toward model training and evaluation requires preprocessing the data: removing missing values, handling categorical features by creating dummy variables, dropping extraneous features, and creating the training features and target, then scaling the features.
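A minimal pandas sketch of these preprocessing steps follows. Which columns count as extraneous (PassengerId, Name, Ticket, Cabin), the use of `drop_first=True` for the dummies, and the hand-written standardization are assumptions, as is the hypothetical helper name `preprocess`; scikit-learn's `StandardScaler` is a common alternative for the scaling step. The toy rows are invented for illustration.

```python
import numpy as np
import pandas as pd

def preprocess(df, drop_cols=("PassengerId", "Name", "Ticket", "Cabin"), target="Survived"):
    """Hypothetical helper: returns scaled features X and target y."""
    # 1. Drop extraneous columns (assumed list; Cabin is mostly missing).
    data = df.drop(columns=[c for c in drop_cols if c in df.columns])
    # 2. Remove rows that still contain missing values (e.g. Age, Embarked).
    data = data.dropna()
    # 3. Turn categorical columns into 0/1 dummy columns; drop_first removes
    #    one redundant column per category.
    data = pd.get_dummies(data, columns=["Sex", "Embarked"], drop_first=True, dtype=float)
    # 4. Separate the features from the target.
    X = data.drop(columns=[target]).astype(float)
    y = data[target]
    # 5. Standardize to zero mean and unit (population) variance; constant
    #    columns keep a divisor of 1 to avoid dividing by zero.
    std = X.std(ddof=0).replace(0, 1.0)
    X = (X - X.mean()) / std
    return X, y

# Hypothetical toy rows with Titanic column names; NOT the real data.
raw = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived":    [0, 1, 1, 0, 1, 0],
    "Pclass":      [3, 1, 3, 1, 2, 3],
    "Name":        ["A", "B", "C", "D", "E", "F"],
    "Sex":         ["male", "female", "female", "male", "female", "male"],
    "Age":         [22.0, 38.0, 26.0, np.nan, 27.0, 54.0],
    "Ticket":      ["t1", "t2", "t3", "t4", "t5", "t6"],
    "Fare":        [7.25, 71.28, 7.93, 53.1, 13.0, 51.86],
    "Cabin":       [np.nan, "C85", np.nan, "C123", np.nan, "E46"],
    "Embarked":    ["S", "C", "S", "S", np.nan, "Q"],
})

X, y = preprocess(raw)
print(X.columns.tolist())
print(X.shape, y.tolist())
```

In a real project, the mean and standard deviation used for scaling should be computed on the training split only and then reused for the test split, so that no information from the test data leaks into training.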
