Data Preprocessing

Perform data cleaning and create dummies.

We'll cover the following...

Data cleaning
Dealing with categorical features

So, we know from EDA that some data is missing in our dataset. Let's deal with that first.

Data cleaning

The Age column is missing ~19.9% of its data. A convenient way to fix this is to fill the missing values with the mean age of all passengers. We can do better in this case because we know that there are three passenger classes: each missing age can instead be filled with the average age of that passenger's class. Let's use a boxplot() to visually explore whether there is any relationship between passenger class and age.
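The class-wise fill described above can be sketched as follows. This is a minimal illustration rather than the lesson's own code: the column names `Age` and `Pclass`, the helper names `fill_age_by_class` and `plot_age_by_class`, and the toy values are all assumptions made for the example, and the plot helper assumes seaborn is the library behind boxplot().

```python
import pandas as pd

# Toy stand-in for the passenger data; the values are made up for illustration.
toy = pd.DataFrame({
    "Pclass": [1, 1, 1, 2, 2, 3, 3, 3],
    "Age": [38.0, None, 42.0, 30.0, None, 22.0, 26.0, None],
})


def fill_age_by_class(df, age_col="Age", class_col="Pclass"):
    """Return a copy where each missing age is replaced by the mean age of its class."""
    out = df.copy()
    # transform("mean") returns one value per row: the mean of that row's class,
    # computed from the non-missing ages only.
    class_mean = out.groupby(class_col)[age_col].transform("mean")
    out[age_col] = out[age_col].fillna(class_mean)
    return out


def plot_age_by_class(df, age_col="Age", class_col="Pclass"):
    """Draw a box plot of age per class (assumes seaborn and matplotlib are installed)."""
    import seaborn as sns  # imported here so the fill helper works without seaborn

    return sns.boxplot(x=class_col, y=age_col, data=df)


filled = fill_age_by_class(toy)
print(filled)
```

Calling `plot_age_by_class(toy)` before filling shows whether the age distributions differ by class. If they do, a per-class mean is likely a better estimate for a missing age than a single overall mean.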
