Solution: Undersampling

Learn how to solve the exercise posed in the previous lesson.

Let’s get more familiar with the NearMiss undersampling strategy through practice.

Task

Here, we deal with severely imbalanced training data for a binary classification problem. By default, NearMiss balances the training data so that the classes end up in a (roughly) 1:1 ratio. In this task, we apply a slightly different sampling strategy to end up with a 1:10 ratio of minority to majority samples.
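To make the 1:10 target concrete, the arithmetic can be sketched as below. In imbalanced-learn, a float `sampling_strategy` for an undersampler expresses the minority count divided by the majority count after resampling, so 1:10 corresponds to 0.1. The class counts (100 and 9,900) are made-up illustration values, and `undersampling_targets` is a hypothetical helper, not part of any library.

```python
from fractions import Fraction


def undersampling_targets(n_minority, n_majority, minority_part=1, majority_part=10):
    """Return (ratio as a float, majority samples to keep) for a minority:majority target."""
    ratio = Fraction(minority_part, majority_part)
    keep_majority = n_minority * majority_part // minority_part
    if keep_majority > n_majority:
        # Undersampling only removes samples; it cannot create new majority samples.
        raise ValueError("Target ratio needs more majority samples than are available.")
    return float(ratio), keep_majority


# Illustrative counts only (assumption): 100 minority and 9,900 majority samples.
ratio, keep = undersampling_targets(n_minority=100, n_majority=9_900)
print(ratio, keep)  # 0.1 1000
```

With these counts, the undersampler would keep all 100 minority samples and reduce the majority class from 9,900 to 1,000.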

  1. Configure NearMiss version 3 and choose parameters to meet a 1:10 imbalance ratio after undersampling the majority class.

  2. Apply the sampling strategy on X_train and y_train to create undersampled training data.

  3. Verify that the undersampled data meets the 1:10 imbalance ratio.

Coding workspace

The following workspace has the code solution for the task mentioned above:
