The Dataset and Exploratory Data Analysis
Explore the Titanic dataset to understand its composition, missing data, and feature distributions. Learn to visualize survival patterns by gender, class, and port of embarkation through various plots. This lesson prepares you to preprocess data and apply logistic regression for classification tasks.
Let's explore the Titanic dataset, one of the best-known benchmark datasets in machine learning. It is often used as a first exercise in classification.
Dataset
The Titanic dataset contains the features listed in the data dictionary below. We want to predict whether a passenger survived, so the target is the Survived column.
Data dictionary
PassengerId: Passenger ID
Pclass: Ticket class, where 1 = 1st, 2 = 2nd, and 3 = 3rd
Name: Passenger name
Sex: Male/female
Age: Age in years
SibSp: Number of siblings and/or spouses aboard the Titanic
Parch: Number of parents and/or children aboard the Titanic
Ticket: Ticket number
Fare: Passenger fare
Cabin: Cabin number
Embarked: Port of embarkation, where C = Cherbourg, Q = Queenstown, and S = Southampton
Survived: 0 = No, and 1 = Yes
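To make the data dictionary concrete, here is a minimal sketch of a first look at data shaped like this. It uses only the Python standard library. The four rows in TOY_CSV are hypothetical, made up for illustration, and are not taken from the real dataset. The column order and the use of an empty string to mark a missing value are also assumptions. The sketch separates the Survived target from the features, counts missing entries per column, computes a survival rate by Sex, and decodes the coded columns with the mappings from the data dictionary.

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy rows that follow the data dictionary's columns.
# They are made up for illustration and are NOT real Titanic records.
TOY_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,Passenger A,male,30,0,0,T1,8.00,,S
2,1,1,Passenger B,female,41,1,0,T2,70.00,C10,C
3,1,3,Passenger C,female,,0,0,T3,7.50,,Q
4,0,2,Passenger D,male,52,0,1,T4,13.00,,S
"""

# Code-to-label mappings copied from the data dictionary.
PCLASS = {"1": "1st", "2": "2nd", "3": "3rd"}
EMBARKED = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}
SURVIVED = {"0": "No", "1": "Yes"}

# Each row becomes a dict keyed by column name; all values are strings.
rows = list(csv.DictReader(io.StringIO(TOY_CSV)))

# Separate the target (Survived) from the remaining feature columns.
target = [int(row["Survived"]) for row in rows]
features = [{k: v for k, v in row.items() if k != "Survived"} for row in rows]

# Count missing values per column (assumption: an empty string means missing).
missing = {col: sum(1 for row in rows if row[col] == "") for col in rows[0]}

# Survival rate by Sex: survivors divided by passengers in each group.
counts = defaultdict(lambda: [0, 0])  # sex -> [survivors, total]
for row in rows:
    counts[row["Sex"]][0] += int(row["Survived"])
    counts[row["Sex"]][1] += 1
survival_rate = {sex: s / n for sex, (s, n) in counts.items()}

# Decode the coded columns of the first row into readable labels.
first = rows[0]
decoded = (
    PCLASS[first["Pclass"]],
    EMBARKED[first["Embarked"]],
    SURVIVED[first["Survived"]],
)

print(decoded)
print(missing)
print(survival_rate)
```

With real data, the same three questions (which column is the target, which columns have gaps, and how survival differs between groups) are a reasonable starting point before any plotting or preprocessing.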
The goal here is to predict if a passenger survived ...