Named Entity Recognition with RNNs: Training and Evaluation
Explore how to train and evaluate recurrent neural networks for named entity recognition tasks. Understand the challenges of class imbalance and learn to apply macro-averaged accuracy for fair metric evaluation. Gain the skills to prepare sample weights that balance frequent and rare classes, and use them to improve model training and validation.
Evaluation metrics and the loss function
During our previous discussion, we alluded to the fact that NER tasks carry a high class imbalance. It’s quite normal for text to contain far more non-entity tokens than entity tokens, which leads to a large number of “other” (O) labels and comparatively few labels of the remaining types; the short sketch after the following list makes this concrete. We need to take this into account both when training and when evaluating the model. We’ll address the class imbalance in two ways:
We’ll create a new evaluation metric that is resilient to class imbalance.
We’ll use sample weights to penalize more frequent classes and boost the importance of rare classes.
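To make that imbalance concrete, here is a minimal sketch that tallies the labels of a hypothetical, made-up CoNLL-style tag sequence (the tags and counts are purely illustrative, not from a real corpus):

```python
from collections import Counter

# Hypothetical CoNLL-style tags for a short text (illustrative only):
# most tokens fall outside any entity and receive the "O" label.
tags = [
    "O", "O", "B-PER", "I-PER", "O", "O", "O", "O", "B-LOC", "O",
    "O", "O", "O", "O", "B-ORG", "I-ORG", "O", "O", "O", "O",
]

counts = Counter(tags)
total = sum(counts.values())
for tag, count in counts.most_common():
    print(f"{tag:>6}: {count} ({count / total:.0%})")
```

Even in this tiny example, three-quarters of the labels are O; on real corpora the skew is typically even stronger.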
In this lesson, we’ll only address the former; the latter will be addressed in the next lesson. We’ll define a modified version of accuracy called macro-averaged accuracy. In macro averaging, we compute the accuracy for each class separately and then average those per-class values, so every class contributes equally to the final score regardless of how many samples it has. When computing standard metrics like accuracy, precision, or recall, there are several types of averaging available.
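As a minimal sketch of that idea (plain NumPy here, not necessarily the exact implementation used later in this course), macro-averaged accuracy can be computed by measuring accuracy on each class’s samples separately and then taking the unweighted mean:

```python
import numpy as np

def macro_accuracy(y_true, y_pred):
    """Mean of per-class accuracies, so each class contributes equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [
        np.mean(y_pred[y_true == cls] == cls)  # accuracy on this class's samples alone
        for cls in np.unique(y_true)
    ]
    return float(np.mean(per_class))

# Imbalanced toy labels: class 0 ("other") dominates.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # degenerate model: always predicts 0

print(np.mean(np.asarray(y_true) == np.asarray(y_pred)))  # standard accuracy: 0.8
print(macro_accuracy(y_true, y_pred))                     # macro accuracy: ~0.33
```

Standard accuracy rewards the degenerate always-predict-"other" model, while macro accuracy exposes it. Because the per-class accuracy here is accuracy restricted to one class’s samples (that class’s recall), the same value can also be obtained with scikit-learn’s balanced_accuracy_score.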
Different types of metric averaging
The scikit-learn documentation describes these averaging strategies. Consider a simple binary classification example with the following confusion matrix results:
Micro: Computes a global metric by counting the total true positives, false positives, and false negatives over all classes, without distinguishing between classes; every sample counts equally, so frequent classes dominate the result, e.g.,
...
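To see how the choice of averaging plays out, here is a small hypothetical comparison (the labels are made up, and recall_score is used only as a representative metric) of micro and macro averaging in scikit-learn:

```python
from sklearn.metrics import recall_score

# Hypothetical imbalanced labels: class 0 dominates.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
# Perfect on the majority class, only 50% on each rare class.
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 2]

# Micro: pools all samples, so the majority class drives the score.
print(recall_score(y_true, y_pred, average="micro"))  # 0.8
# Macro: averages per-class recall, so each class counts equally.
print(recall_score(y_true, y_pred, average="macro"))  # ~0.67
```

Micro averaging rewards the perfect majority-class predictions, while macro averaging surfaces the weaker performance on the rare classes, which is exactly the behavior we want when evaluating imbalanced NER models.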