Data Science with R: Decision Trees and Random Forests

...

Testing the Titanic Dataset

Apply what you've learned about the random forest algorithm to the Titanic test dataset.

We'll cover the following:

Profiling the Titanic test dataset
Training a model
Preparing the test dataset
Titanic test dataset predictions


#================================================================================================
# Load libraries - suppress messages
#
suppressMessages(library(tidyverse))
library(skimr)
#================================================================================================
# Load the Titanic test data
#
titanic_test <- read_csv("titanic_test.csv", show_col_types = FALSE)
#================================================================================================
# Use the skimr package to get a first pass of the data
#
skim(titanic_test)

When profiling the test dataset for potential preparation issues, focus on the features that are used in the predictive model and look for the following:

Are there any missing features?
Are there any missing feature values?
Do the levels match the training data for features transformed to be factors?
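The skim() output answers these questions visually. The last two checks can also be scripted so they are repeatable. The sketch below is one way to do that in base R. It assumes the training data already stores categorical model features as factors. The function name check_test_features() and the toy data frames are hypothetical and are not the real Titanic files.

```r
# Sketch: for each model feature, count missing values in the test set and
# list test values of factor features that the training data never saw.
# check_test_features(), train, and test are hypothetical illustrations.

check_test_features <- function(train, test, model_features) {
  # Number of NA values in each model feature of the test set
  na_counts <- sapply(test[model_features], function(col) sum(is.na(col)))

  # Model features stored as factors in the training data
  factor_features <- model_features[sapply(train[model_features], is.factor)]

  # Non-missing test values that are not among the training factor levels
  unseen_levels <- lapply(factor_features, function(f) {
    test_values <- unique(as.character(test[[f]]))
    setdiff(test_values[!is.na(test_values)], levels(train[[f]]))
  })
  names(unseen_levels) <- factor_features

  list(na_counts = na_counts, unseen_levels = unseen_levels)
}

# Hypothetical toy data, loosely shaped like the Titanic features
train <- data.frame(
  Pclass = factor(c("1", "2", "3", "3")),
  Sex    = factor(c("male", "female", "female", "male")),
  Age    = c(22, 38, 26, 35)
)
test <- data.frame(
  Pclass = c("1", "3", "4"),
  Sex    = c("male", NA, "female"),
  Age    = c(NA, 30, 41)
)

result <- check_test_features(train, test, c("Pclass", "Sex", "Age"))
print(result)
```

In this toy example, Sex and Age each have one missing value. The value "4" in Pclass does not exist in the training levels. One common way to align levels is factor(test$Pclass, levels = levels(train$Pclass)). Be aware that this converts any unseen value to NA, so those rows still need handling.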

Missing features are typically the result of an error in the code that creates the test dataset, such as forgetting to select a particular feature in dplyr code. However, this doesn’t apply to the Titanic test dataset. Here’s why:

The Titanic datasets are used as part of a Kaggle website competition. ...
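The Titanic test file is not affected by this problem. For test sets that you build yourself, however, a preparation step that silently drops a model feature can be caught before making predictions.

The sketch below compares the names of the prepared test set against the model's feature list. It uses base R column subsetting. A dplyr select() call that leaves out a column fails in the same way. The feature list, build_test_set(), and the toy data are hypothetical.

```r
# Features the model was trained on (hypothetical list for illustration)
model_features <- c("Pclass", "Sex", "Age", "Fare")

# Hypothetical preparation step that forgets to keep "Fare"
build_test_set <- function(raw) {
  raw[, c("PassengerId", "Pclass", "Sex", "Age")]
}

# Hypothetical raw test data
raw_test <- data.frame(
  PassengerId = 1:3,
  Pclass      = c(1, 3, 2),
  Sex         = c("male", "female", "male"),
  Age         = c(34, NA, 28),
  Fare        = c(50.5, 7.25, 13)
)

test_set <- build_test_set(raw_test)

# Any model feature absent from the prepared test set is reported here
missing_features <- setdiff(model_features, names(test_set))
if (length(missing_features) > 0) {
  message("Missing features: ", paste(missing_features, collapse = ", "))
}
```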
