Unbiased Mislabeling in Image Classification Using CNNs
Explore how an unbiased mislabeled dataset affects the performance of a CNN model.
In this lesson, we’ll learn how a small amount of unbiased mislabeling in a dataset affects a model. To understand the consequences of poor-quality labels, we’ll use a CNN model with two versions of the same dataset: one with clean labels and one with mislabeled images. We’ll then compare the accuracy obtained on each version to gauge the impact of mislabeling.
Implementing unbiased mislabeling
To assess how label quality affects the performance of a CNN model, we’ll work through several steps and compare the results obtained with a clean dataset and a mislabeled one.
Step 1: Importing libraries
The following code imports the libraries necessary to implement unbiased mislabeling:
Step 2: Loading and creating an unbiased mislabeled dataset
The code given below loads the MNIST digit dataset using the Keras library. We assume that the dataset is clean, which means that the label given to each image is correct. Then, we create a new dataset in which we mislabel 10% of the images from each class. This helps us understand the impact of even a small amount of unbiased mislabeling on the model's performance.
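The per-class mislabeling described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the lesson's own code: the helper name `mislabel_per_class` and its parameters are hypothetical, and it assumes that "unbiased" means each selected image receives a wrong label drawn uniformly from the other classes. Synthetic labels stand in for the MNIST labels so that the snippet runs without downloading anything.

```python
import numpy as np

def mislabel_per_class(labels, fraction=0.10, num_classes=10, seed=0):
    """Return a copy of `labels` in which `fraction` of each class is
    reassigned, uniformly at random, to one of the other classes.
    (Hypothetical helper; the uniform choice is an assumption.)"""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)        # positions of class c
        n_flip = int(round(fraction * idx.size))  # how many to mislabel
        if n_flip == 0:
            continue
        flip_idx = rng.choice(idx, size=n_flip, replace=False)
        # An offset in 1..num_classes-1 guarantees the new label differs from c
        # and is equally likely to be any of the other classes.
        offsets = rng.integers(1, num_classes, size=n_flip)
        noisy[flip_idx] = (c + offsets) % num_classes
    return noisy

# Synthetic stand-in for the MNIST labels: 100 examples of each digit.
labels = np.repeat(np.arange(10), 100)
noisy = mislabel_per_class(labels, fraction=0.10)

print("changed labels:", int((noisy != labels).sum()))
print("changed per class:", [int((noisy[labels == c] != c).sum()) for c in range(10)])
```

With the real data, the same helper could be applied to the training labels returned by `keras.datasets.mnist.load_data()`, keeping the original array as the clean version for comparison.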
...