# Summary

Let's go over a summary of what we have learned in this chapter.

In this chapter, we learned about the following concepts.

## Multiclass logistic regression classification

Logistic regression is one of the most popular and widely used classification algorithms. By default, it is limited to binary classification problems, but it can be extended to multiclass classification using strategies such as one-vs-rest (OVR) and multinomial logistic regression.

In OVR, the multiclass problem is first transformed into multiple binary classification problems, and under the hood, a separate binary classifier is trained for each class (that class against all the others).

In the multinomial approach, the solvers fit a single multinomial logistic regression model across all classes. In this case, the probability estimates tend to be better calibrated than with OVR. The model minimizes the cross-entropy loss over all classes, which is equivalent to maximum likelihood estimation.
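The two strategies above can be sketched with scikit-learn; the dataset and solver settings below are illustrative assumptions, not specifics from the chapter:

```python
# A minimal sketch comparing one-vs-rest (OVR) and multinomial logistic
# regression on a toy multiclass dataset (iris is a stand-in example).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# OVR: one binary classifier per class, combined under the hood.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Multinomial: a single model minimizing cross-entropy over all classes
# (the default behavior for multiclass targets in recent scikit-learn).
multi = LogisticRegression(max_iter=1000).fit(X, y)

# Both expose one probability per class for each sample.
print(ovr.predict_proba(X[:1]).round(3))
print(multi.predict_proba(X[:1]).round(3))
```

Either way, the prediction is the class with the highest estimated probability; the multinomial probabilities are typically the better calibrated of the two.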

Multiclass logistic regression is also known as polytomous logistic regression, multinomial logistic regression, softmax regression, multinomial logit, the maximum entropy classifier, and the conditional maximum entropy model.

Multinomial logistic regression assumes that the odds of preferring one class over another do not depend on the presence or absence of other irrelevant alternatives. In other words, the model's choices satisfy independence of irrelevant alternatives (IIA), a core assumption in rational choice theory.
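The "softmax regression" name comes from the softmax function, which maps raw class scores to probabilities. A minimal sketch, using made-up scores for three classes:

```python
# Softmax: the heart of multinomial logistic regression. Raw class
# scores (logits) are mapped to class probabilities that sum to 1.
import numpy as np

def softmax(z):
    # Subtract the max score for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical scores for three classes
probs = softmax(logits)
print(probs)            # one probability per class
print(probs.sum())      # the probabilities always sum to 1
```

The predicted class is simply the one with the largest probability, which is also the one with the largest raw score.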

## Imbalanced datasets and techniques to handle them

Class imbalance is a common problem in classification datasets, where the number of data points or observations is not the same across the classes in the target column. Small differences are usually not a problem. However, an extreme class imbalance can bias the model toward the majority class and must be addressed.

Class imbalance in the dataset can degrade model performance and needs to be treated. The following are options to handle this issue.

We can collect more data. Sometimes, or even most of the time, this is not easy, but it is still one of the best solutions in the long run.

We can generate synthetic data. This is relatively easy and more cost-effective than collecting more data; however, it's a little tricky. One of the most common techniques is SMOTE (Synthetic Minority Oversampling Technique), which creates synthetic data points from the minority class instead of simply copying its instances.
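The core SMOTE idea can be sketched in a few lines of NumPy (this is a simplified illustration, not the full algorithm from the `imbalanced-learn` library, and the data points are made up): a synthetic sample is placed on the line segment between a minority-class point and one of its nearest minority-class neighbors.

```python
# A minimal sketch of the SMOTE idea: interpolate between a minority
# point and a nearby minority point to create a new, non-duplicate sample.
import numpy as np

rng = np.random.default_rng(0)
minority = rng.normal(loc=5.0, size=(10, 2))  # hypothetical minority-class points

def smote_sample(X, rng):
    i = rng.integers(len(X))
    # Find the nearest other minority point (index 0 is the point itself).
    d = np.linalg.norm(X - X[i], axis=1)
    j = np.argsort(d)[1]
    # Place the synthetic point a random fraction of the way to the neighbor.
    lam = rng.random()
    return X[i] + lam * (X[j] - X[i])

synthetic = smote_sample(minority, rng)
print(synthetic)  # a new point, not a copy of any existing instance
```

Because the new point lies between two real minority points, it enriches the minority region of the feature space rather than just duplicating it.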

We can also create copies of the minority class instances (oversampling) or delete instances of the majority class (undersampling). It's important to remember that undersampling discards information, so consider that option carefully unless we have many thousands of majority-class instances. Both strategies should be compared at different ratios of class representation in the data; the ideal is 1:1 for binary classification problems.
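Both resampling strategies reduce to index selection; a minimal sketch on a toy 9:1 imbalanced label array (the class counts are illustrative assumptions):

```python
# Random oversampling of the minority class and random undersampling of
# the majority class, both aiming at the ideal 1:1 ratio.
import numpy as np

rng = np.random.default_rng(42)
y = np.array([0] * 90 + [1] * 10)  # 9:1 imbalance: 90 majority, 10 minority
majority_idx = np.where(y == 0)[0]
minority_idx = np.where(y == 1)[0]

# Oversampling: duplicate minority instances (with replacement) up to 90.
over = rng.choice(minority_idx, size=len(majority_idx), replace=True)

# Undersampling: keep only 10 majority instances -- information is lost.
under = rng.choice(majority_idx, size=len(minority_idx), replace=False)

print(len(over), len(under))  # 90 10
```

In practice, we would select the corresponding rows of the feature matrix with the same indices and compare model performance across several resampling ratios, not just 1:1.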

Along with the above techniques, we can think about the following options:

- Testing different algorithms.
- Decomposing the majority class into smaller datasets with random subsampling and training several subsets using ensemble methods.
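The last idea can be sketched as follows; the synthetic data, the choice of decision trees, and the number of subsets are illustrative assumptions, not specifics from the chapter:

```python
# Split the majority class into random subsets, pair each subset with the
# full minority class (roughly 1:1), train one model per subset, and
# combine the models by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(300, 2))   # majority class (label 0)
X_min = rng.normal(3.0, 1.0, size=(100, 2))   # minority class (label 1)

models = []
for chunk in np.array_split(rng.permutation(300), 3):
    X = np.vstack([X_maj[chunk], X_min])
    y = np.array([0] * len(chunk) + [1] * len(X_min))
    models.append(DecisionTreeClassifier(max_depth=3).fit(X, y))

# Majority vote across the ensemble members.
votes = np.stack([m.predict(X_min) for m in models])
pred = (votes.mean(axis=0) >= 0.5).astype(int)
print(pred.mean())  # fraction of minority points recovered by the vote
```

This way, every majority-class observation is used in some ensemble member, so no information is thrown away, while each individual model still trains on a balanced subset.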
