Introduction to the Letter Classification Data Set

This lesson focuses on exploring and preprocessing the letter classification dataset.

The letter classification dataset

The dataset consists of pixel values that render the letters A, B, and C, along with their labels. This is a multiclass classification problem because, given a pixel configuration, we have to predict the probability of each letter: A, B, or C.

📝 Note: A multiclass classification problem requires the labels to be one-hot encoded.

📝 One-hot encoding is used to quantify categorical data, i.e., data having multiple categories. It generates a vector whose length equals the number of categories in the dataset. If a data point belongs to the $i^{th}$ category, every index of this vector is assigned the value 0 except the $i^{th}$ index, which is assigned the value 1. This helps track the categories in a numerically meaningful way.
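
To make the note above concrete, here is a minimal sketch of one-hot encoding for the three letter categories. The function name `one_hot_encode` and the sample labels are hypothetical and chosen only for illustration; they are not taken from the dataset or from any library.

```python
def one_hot_encode(labels, categories):
    """Return one vector per label: 1 at the label's category index, 0 elsewhere."""
    # Map each category to its position in the vector, e.g. {"A": 0, "B": 1, "C": 2}.
    index_of = {category: i for i, category in enumerate(categories)}
    encoded = []
    for label in labels:
        vector = [0] * len(categories)  # vector length equals the number of categories
        vector[index_of[label]] = 1     # set only the label's own index to 1
        encoded.append(vector)
    return encoded


categories = ["A", "B", "C"]
labels = ["A", "C", "B", "A"]  # hypothetical labels, not real dataset rows

encoded = one_hot_encode(labels, categories)
for label, vector in zip(labels, encoded):
    print(label, vector)
# A [1, 0, 0]
# C [0, 0, 1]
# B [0, 1, 0]
# A [1, 0, 0]
```

Each vector contains exactly one 1, so it can be compared directly against a model's predicted probabilities for A, B, and C. In practice, a library utility would usually replace this hand-written loop.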

Dataset exploration

Explore the dataset.
