The main objective of this lesson is to simulate unbiased mislabeling noise in a dataset and to visualize its impact. The lesson is structured into the following three steps:

  • Step 1: We’ll examine the MNIST digit dataset and analyze its characteristics in order to understand the dataset thoroughly before introducing mislabeling.

  • Step 2: We’ll simulate unbiased mislabeling in the MNIST dataset by intentionally changing the labels of some data points, which lets us study the effect of label noise on the dataset.

  • Step 3: We’ll create visualizations that show how mislabeling affects each digit in the MNIST dataset. These visualizations will help us observe the effect of unbiased mislabeling across the digit classes.
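
As a rough preview of Step 2, the following is a minimal sketch of how unbiased mislabeling might be simulated with NumPy. It assumes that "unbiased" means the examples to corrupt are picked uniformly at random and each picked label is replaced by a different digit chosen uniformly at random. The function name `add_unbiased_label_noise` and its parameters are hypothetical and are not taken from the lesson's own code.

```python
import numpy as np

def add_unbiased_label_noise(labels, noise_rate, num_classes=10, seed=0):
    """Return a noisy copy of `labels` and the indices that were changed.

    Hypothetical helper: flips a `noise_rate` fraction of labels, each to a
    different class chosen uniformly at random (an assumed noise model).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()

    # Pick which examples to corrupt, uniformly and without repetition.
    n_flip = int(round(noise_rate * labels.size))
    flip_idx = rng.choice(labels.size, size=n_flip, replace=False)

    # A random offset in 1..num_classes-1 guarantees the new label
    # differs from the original and is uniform over the other classes.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    return noisy, flip_idx

# Toy example: 1,000 balanced labels with 20% of them flipped.
toy_labels = np.arange(1000) % 10
noisy_labels, changed = add_unbiased_label_noise(toy_labels, noise_rate=0.2)
```

Because the replacement class does not depend on the original digit, no digit is favored as a source or target of errors, which is what distinguishes this noise model from biased (class-dependent) mislabeling.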

Step 1: Visualizing the MNIST digit dataset

We chose the MNIST digit dataset, which contains 60,000 training images and 10,000 test images of handwritten digits, to observe the impact of unbiased mislabeling on image classification performance. The provided code summarizes the MNIST training set using a bar chart. Each bar in the chart represents one digit class (0 through 9), and the number of instances for that digit is displayed on top of the respective bar. Additionally, the digit labels are printed below each bar. This visualization helps us understand how the training examples are distributed across the digits.

Click the “Run” button to visualize the number of training examples for each digit in the MNIST dataset.
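
The bar chart described above could be produced along the following lines. This is a sketch rather than the lesson's actual code: the helper names `digit_counts` and `plot_digit_counts` are hypothetical, and the plotting function assumes Matplotlib 3.4 or newer for `Axes.bar_label`.

```python
import numpy as np

def digit_counts(labels, num_classes=10):
    """Count how many examples carry each digit label (0..num_classes-1)."""
    return np.bincount(np.asarray(labels), minlength=num_classes)

def plot_digit_counts(counts):
    """Draw one bar per digit with its count printed on top of the bar."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    digits = np.arange(len(counts))
    fig, ax = plt.subplots()
    bars = ax.bar(digits, counts)
    ax.bar_label(bars)       # count above each bar (Matplotlib >= 3.4)
    ax.set_xticks(digits)    # digit labels below each bar
    ax.set_xlabel("Digit")
    ax.set_ylabel("Number of training examples")
    ax.set_title("Training examples per digit")
    return fig

# Toy labels stand in for the real MNIST training labels here.
toy_counts = digit_counts([0, 1, 1, 2, 2, 2])
```

To apply this to the real data, pass the array of MNIST training labels to `digit_counts` and the result to `plot_digit_counts`. One common way to obtain those labels, assuming TensorFlow is installed, is the `y_train` array returned by `tensorflow.keras.datasets.mnist.load_data()`.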
