Gauge the Impact of Imbalanced and Mislabeled Datasets
This project assesses the knowledge acquired in this course. The dataset selected for this project is the MNIST handwritten digit dataset, which is known for its simplicity and is widely used in ML. It comprises 70,000 images, each showing a single handwritten digit from 0 to 9. The images are split into two sets: 60,000 for training and 10,000 for testing. Each image is a 28 × 28 grid of grayscale pixels.
This project contains two sections that together cover the entire course. The first section gauges the impact of mislabeled data on the performance of ML models. The second section evaluates the effect of imbalanced data on the performance of ML models and explores strategies for addressing the imbalance. Together, these sections will help us understand common data-related challenges in ML and practical ways to handle them.
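To make the two problems concrete before we start, here is a minimal sketch of how each one could be simulated on a list of digit labels. It uses only the Python standard library and a synthetic label list as a stand-in for the MNIST training labels. The helper names `flip_labels` and `subsample_class`, the 20% noise rate, and the choice of digit 3 as the underrepresented class are illustrative assumptions, not the exact procedure used later in the project.

```python
import random
from collections import Counter

NUM_CLASSES = 10  # MNIST digits 0 through 9


def flip_labels(labels, noise_rate, rng):
    """Return a copy of labels in which a noise_rate share is changed to a different class.

    Hypothetical helper: simulates a mislabeled dataset.
    """
    noisy = list(labels)
    n_flip = int(round(noise_rate * len(noisy)))
    # Pick distinct positions, then replace each label with any other class.
    for i in rng.sample(range(len(noisy)), n_flip):
        wrong_classes = [c for c in range(NUM_CLASSES) if c != noisy[i]]
        noisy[i] = rng.choice(wrong_classes)
    return noisy


def subsample_class(labels, target_class, keep_fraction, rng):
    """Return the indices to keep so that target_class becomes underrepresented.

    Hypothetical helper: simulates an imbalanced dataset by keeping only
    keep_fraction of the samples of target_class and all other samples.
    """
    target_idx = [i for i, y in enumerate(labels) if y == target_class]
    n_keep = int(round(keep_fraction * len(target_idx)))
    kept_target = set(rng.sample(target_idx, n_keep))
    return [i for i, y in enumerate(labels) if y != target_class or i in kept_target]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-in for the training labels: 1,000 labels, 100 per digit.
labels = [i % NUM_CLASSES for i in range(1000)]

# Mislabeling: corrupt an assumed 20% of the labels.
noisy = flip_labels(labels, 0.2, rng)
n_changed = sum(a != b for a, b in zip(labels, noisy))
print("labels changed:", n_changed)

# Imbalance: keep only an assumed 10% of the samples of digit 3.
kept = subsample_class(labels, 3, 0.1, rng)
counts = Counter(labels[i] for i in kept)
print("samples per class:", sorted(counts.items()))
```

With real data, the same index list returned by `subsample_class` would be used to select both the images and their labels, so the two stay aligned.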