Evaluating the Results Quantitatively

Learn about the evaluation metrics for image caption generation.

There are many different techniques for evaluating the quality and relevance of the captions our model generates. We’ll briefly discuss four such metrics: BLEU, ROUGE, METEOR, and CIDEr.

All these measures share a key objective: to measure the adequacy (how well the generated text captures the intended meaning) and the fluency (the grammatical correctness) of the generated text. To calculate these measures, we use a candidate sentence and a reference sentence, where the candidate sentence is the sentence or phrase predicted by our algorithm, and the reference sentence is the true sentence or phrase we want to compare it against.
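As a purely illustrative example (the sentences and the whitespace tokenization below are assumptions, not taken from the lesson), a candidate/reference pair for a single image might look like this, with each sentence split into the tokens that the metrics operate on:

```python
# Hypothetical caption pair for one image (illustrative only).
candidate = "a dog is running on the grass".split()   # predicted by the model
reference = "a dog runs on the green grass".split()   # ground-truth caption

print(candidate)  # ['a', 'dog', 'is', 'running', 'on', 'the', 'grass']
print(reference)  # ['a', 'dog', 'runs', 'on', 'the', 'green', 'grass']
```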

BLEU

BLEU was proposed by Papineni and others in “BLEU: A Method for Automatic Evaluation of Machine Translation,” Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002: 311-318. It measures the n-gram similarity between the reference and candidate phrases in a position-independent manner: if a given n-gram from the candidate appears anywhere in the reference sentence, it is counted as a match. BLEU expresses this n-gram similarity in terms of precision. BLEU comes in several variations (BLEU-1, BLEU-2, BLEU-3, and so on), where the number denotes the value of n in the n-gram.
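To make the idea of position-independent, precision-based n-gram matching concrete, here is a minimal sketch of the clipped n-gram precision at the heart of BLEU-n. The example sentences and the plain Python implementation are assumptions for illustration; the full BLEU score additionally combines several n-gram orders and applies a brevity penalty.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_ngram_precision(candidate, reference, n):
    """Position-independent (clipped) n-gram precision used inside BLEU-n.

    Each candidate n-gram counts as a match only up to the number of times
    it occurs in the reference; the total matches are divided by the number
    of n-grams in the candidate.
    """
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return matched / total if total > 0 else 0.0

candidate = "a dog is running on the grass".split()   # hypothetical prediction
reference = "a dog runs on the green grass".split()   # hypothetical ground truth

print(clipped_ngram_precision(candidate, reference, 1))  # unigram precision: 5/7 ≈ 0.71
print(clipped_ngram_precision(candidate, reference, 2))  # bigram precision:  2/6 ≈ 0.33
```

The full BLEU score combines these per-order precisions with a geometric mean and multiplies by a brevity penalty that penalizes overly short candidates; if you prefer a library routine, NLTK’s `nltk.translate.bleu_score.sentence_bleu` handles those details.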
