The BLEU Score: Evaluating Machine Translation Systems

Learn how the BLEU score is used to evaluate machine translation systems.

We'll cover the following...

Modified precision
Brevity penalty
The final BLEU score

BLEU stands for “bilingual evaluation understudy” and is a way of automatically evaluating machine translation (MT) systems. This metric was first introduced in the paper “BLEU: A Method for Automatic Evaluation of Machine Translation” (Papineni et al., Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311-318). We'll be using an implementation of the BLEU score found on GitHub. Let’s learn how the score is calculated in the context of MT.

Let’s work through an example to see how the BLEU score is calculated. Say we have two candidate sentences (that is, sentences predicted by our MT system) and a reference sentence (that is, the corresponding actual translation) for some given source sentence:

Reference 1: The cat sat on the mat.
Candidate 1: The cat is on the mat.

To see how good the translation is, we can use the precision measure. Precision measures what fraction of the words in the candidate are actually present in the reference. In general, if we consider a classification problem with two classes (denoted by negative and positive), precision is given by the following formula:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Here, $TP$ is the number of true positives and $FP$ is the number of false positives.
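In the word-level setting used here, this amounts to counting the candidate words that appear in the reference and dividing by the number of words in the candidate. The following is a minimal sketch of that computation on the example sentences. The helper names `tokenize` and `unigram_precision` are hypothetical, and the lowercase, punctuation-stripping tokenizer is an assumption made for illustration, not something the lesson or the BLEU paper prescribes.

```python
import string


def tokenize(sentence):
    """Lowercase, strip ASCII punctuation, and split on whitespace.

    This tokenizer is an illustrative assumption, not a standard one.
    """
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def unigram_precision(candidate, reference):
    """Fraction of candidate words that also occur in the reference."""
    cand_tokens = tokenize(candidate)
    ref_vocab = set(tokenize(reference))
    if not cand_tokens:
        return 0.0  # avoid dividing by zero for an empty candidate
    matches = sum(1 for tok in cand_tokens if tok in ref_vocab)
    return matches / len(cand_tokens)


reference = "The cat sat on the mat."
candidate = "The cat is on the mat."

precision = unigram_precision(candidate, reference)
print(f"{precision:.3f}")  # prints 0.833
```

Five of the six candidate words (every word except "is") appear in the reference, so the precision is 5/6.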

Modified precision