Machine Learning and Imbalanced Data
Explore methods to handle imbalanced datasets in multiclass classification using logistic regression. Learn to identify accuracy pitfalls, apply oversampling, and use SMOTE to create synthetic samples, improving model recall and generalization on minority classes.
Since we have the features and the targets from our previous lesson, let's split them into train and test datasets.
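The split could be sketched as follows. The features and targets from the previous lesson are not shown here, so this example substitutes a synthetic three-class dataset from scikit-learn's `make_classification`; the class weights, sample count, and variable names are assumptions chosen only to mimic a rare minority class.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the lesson's features (X) and targets (y):
# three classes, the rarest one at roughly 1.5% of the samples.
X, y = make_classification(
    n_samples=10_000,
    n_features=10,
    n_informative=5,
    n_classes=3,
    weights=[0.885, 0.10, 0.015],
    flip_y=0,          # no label noise, so proportions stay close to the weights
    random_state=42,
)

# stratify=y keeps the class proportions the same in both splits,
# which matters when one class has very few samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

Stratifying is a design choice rather than a requirement: without it, a random split may leave the test set with too few minority samples to evaluate on.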
Imbalanced data
Let's also check the class imbalance for our training data.
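One minimal way to inspect the class distribution is to count the labels in the training targets. The data below is again a synthetic placeholder (an assumption, not the lesson's dataset), and `proportions` is an illustrative variable name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data standing in for the lesson's dataset.
X, y = make_classification(
    n_samples=10_000, n_features=10, n_informative=5, n_classes=3,
    weights=[0.885, 0.10, 0.015], flip_y=0, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Count how many training samples fall into each class.
classes, counts = np.unique(y_train, return_counts=True)
proportions = counts / counts.sum()

for label, count, share in zip(classes, counts, proportions):
    print(f"class {label}: {count} samples ({share:.1%})")
```

If the targets live in a pandas Series, `y_train.value_counts(normalize=True)` gives the same proportions.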
With that, let's train a logistic regression model.
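A training step along these lines might look like the sketch below. It assumes scikit-learn's `LogisticRegression` with default regularization and a raised `max_iter`; the lesson's actual hyperparameters are not shown, and the data is a synthetic placeholder, so the printed accuracy will not match the lesson's figure.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data standing in for the lesson's dataset.
X, y = make_classification(
    n_samples=10_000, n_features=10, n_informative=5, n_classes=3,
    weights=[0.885, 0.10, 0.015], flip_y=0, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# max_iter is raised from the default so the solver has room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Overall accuracy on the held-out test set.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {accuracy:.3f}")
```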
The numbers look impressive, with an accuracy of ~98%. However, the minority class makes up only 1.5% of the data, making the baseline ...
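One way to put an accuracy figure in context is to compare it with a classifier that always predicts the most frequent class, and to look at recall per class. The sketch below assumes scikit-learn's `DummyClassifier` as that reference point and reuses the synthetic placeholder data; it is an illustration, not the lesson's own evaluation.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data standing in for the lesson's dataset.
X, y = make_classification(
    n_samples=10_000, n_features=10, n_informative=5, n_classes=3,
    weights=[0.885, 0.10, 0.015], flip_y=0, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Reference point: ignore the features and always predict the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

baseline_pred = baseline.predict(X_test)
model_pred = model.predict(X_test)

baseline_acc = accuracy_score(y_test, baseline_pred)
model_acc = accuracy_score(y_test, model_pred)

# average=None returns one recall value per class, in label order.
baseline_recall = recall_score(y_test, baseline_pred, average=None)
model_recall = recall_score(y_test, model_pred, average=None)

print(f"baseline accuracy: {baseline_acc:.3f}, recall per class: {baseline_recall}")
print(f"model accuracy:    {model_acc:.3f}, recall per class: {model_recall}")
```

The majority-class predictor scores high accuracy while its recall on every other class is zero, which is why accuracy alone says little about how a model handles minority classes.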