Exercise: Undersampling

Learn how to reduce the imbalance to a fixed ratio with NearMiss.

Let’s get more familiar with the NearMiss undersampling strategy through practice.

Task

Here, we deal with severely imbalanced training data for a binary classification problem. By default, NearMiss balances the training data to a (roughly) 1:1 class ratio. Your task is to apply a different sampling strategy so that the result has a 1:10 ratio, that is, one minority sample for every ten majority samples.
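To make the target concrete, here is a small sketch of the arithmetic behind a 1:10 ratio. The labels and class counts below are made up for illustration and are not the workspace data. Since the minority class is left untouched, the ratio fixes how many majority samples may remain.

```python
from collections import Counter

# Hypothetical labels: 0 is the majority class, 1 is the minority class.
y_example = [0] * 5000 + [1] * 50

counts = Counter(y_example)
n_minority = min(counts.values())
n_majority = max(counts.values())

# Target 1:10 means ten majority samples for every minority sample.
majority_per_minority = 10
n_majority_to_keep = majority_per_minority * n_minority

print(f"ratio before undersampling: 1:{n_majority // n_minority}")
print(f"majority samples to keep: {n_majority_to_keep} of {n_majority}")
```

Only the majority class shrinks; every minority sample is kept.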

  1. Configure NearMiss version 3 and choose parameters to meet a 1:10 imbalance ratio after undersampling the majority class.

  2. Apply the sampling strategy on X_train and y_train to create undersampled training data.

  3. Verify that the undersampled data meets the 1:10 imbalance ratio.

Coding workspace

The X_train and y_train training data is available in memory in the workspace. Let’s try to code the solution.
