The Dataset and Exploratory Data Analysis

Explore the Titanic dataset to understand its composition, missing data, and feature distributions. Learn to visualize survival patterns by gender, class, and port of embarkation through various plots. This lesson prepares you to preprocess data and apply logistic regression for classification tasks.

Let's explore the Titanic dataset, one of the most famous benchmark datasets in machine learning. Working with it is often treated as a first step toward learning classification.

Dataset

The Titanic dataset contains the columns listed below. We want to predict whether a passenger survived, so the target is the Survived column and the remaining columns are candidate features.

Data dictionary

  • PassengerId: Passenger ID

  • Pclass: Ticket class, where 1 = 1st, 2 = 2nd, and 3 = 3rd

  • Name: Passenger name

  • Sex: Male/female

  • Age: Age in years

  • SibSp: Number of siblings and/or spouses aboard the Titanic

  • Parch: Number of parents and/or children aboard the Titanic

  • Ticket: Ticket number

  • Fare: Passenger fare

  • Cabin: Cabin number

  • Embarked: Port of embarkation, where C = Cherbourg, Q = Queenstown, and S = Southampton

  • Survived: 0 = No, and 1 = Yes
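To make the data dictionary concrete, here is a minimal sketch of a first look at a table with these columns using pandas. The four rows are invented placeholders that only mimic the schema; they are not real passenger records. With the actual data you would load the file instead, for example with pd.read_csv (the file name and location depend on your setup). The variable names missing_counts, X, y, and survival_by_sex are illustrative choices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows that follow the data dictionary above.
# These are NOT real passenger records; replace this block with
# something like pd.read_csv(<path to your Titanic file>) in practice.
df = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3, 4],
        "Survived": [0, 1, 1, 0],
        "Pclass": [3, 1, 3, 2],
        "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, np.nan, 35.0],
        "SibSp": [1, 1, 0, 0],
        "Parch": [0, 0, 0, 0],
        "Ticket": ["T-001", "T-002", "T-003", "T-004"],
        "Fare": [7.25, 71.28, 7.92, 13.0],
        "Cabin": [None, "C85", None, None],
        "Embarked": ["S", "C", "S", "Q"],
    }
)

# Count missing values in every column to see where cleaning is needed.
missing_counts = df.isna().sum()
print(missing_counts)

# Separate the target (Survived) from the remaining columns.
y = df["Survived"]
X = df.drop(columns=["Survived"])

# Mean of a 0/1 column is the share of survivors in each group.
survival_by_sex = df.groupby("Sex")["Survived"].mean()
print(survival_by_sex)
```

The same groupby pattern works for Pclass or Embarked if you swap the column name, which is a quick numeric check before drawing any plots.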

The goal here is to predict if a passenger survived ...