How does METEOR evaluation metric calculate the similarity score?
Evaluation metrics are quantitative measures of machine learning models' performance. They are essential to determining whether our model is performing well or poorly for specific tasks.
What is METEOR?
METEOR (Metric for Evaluation of Translation with Explicit Ordering) is a metric used to measure the quality of candidate text based on its alignment with a reference text. Unigrams in the candidate are matched to unigrams in the reference (using exact, stemmed, and synonym matches), and the final score also accounts for word order.
Calculating the METEOR score
Following are the steps to calculate the METEOR score:
Calculate the unigram precision and recall.
Compute the F-score.
Compute chunk penalty.
Calculate the METEOR score.
Calculate the unigram precision and recall
We calculate the unigram precision as the ratio of the overlapping unigrams between the candidate and reference summary to the total number of unigrams in the candidate summary.
The unigram recall is calculated as the ratio of the overlapping unigrams between the candidate and reference summary to the total number of unigrams in the reference summary.
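The two ratios above can be sketched in a few lines of Python. This is a simplified illustration that counts only exact (surface-form) matches; full METEOR also matches stems and WordNet synonyms, and the function name here is ours, not part of any library:

```python
from collections import Counter

def unigram_precision_recall(candidate, reference):
    # Clipped (multiset) overlap between the two unigram bags
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return precision, recall

candidate = ['Machine', 'learning', 'is', 'seen', 'as', 'a', 'subset', 'of', 'artificial', 'intelligence']
reference = ['Machine', 'learning', 'is', 'a', 'subset', 'of', 'artificial', 'intelligence']

p, r = unigram_precision_recall(candidate, reference)
print(p, r)  # 0.8 1.0 -> 8 of 10 candidate unigrams match; all 8 reference unigrams are covered
```

Using a multiset intersection (rather than a plain set) ensures a word repeated in the candidate is only credited as many times as it appears in the reference.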
Compute the F-score
After calculating the unigram precision and recall, we compute the weighted F-score by taking their harmonic mean, with recall being weighted higher than precision:

F_mean = (P × R) / (α × P + (1 − α) × R)

where,

P: Unigram precision
R: Unigram recall
α: The relative weight for precision and recall; values of α above 0.5 weight recall more heavily.

Note: The recall is weighted higher than the precision so that the candidate summary is rewarded for covering the meaning of the reference rather than only for exact word-to-word matches.
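The weighted harmonic mean can be sketched as follows; the α = 0.9 default mirrors nltk's `meteor_score`, and the function name is ours for illustration:

```python
def f_mean(precision, recall, alpha=0.9):
    # Weighted harmonic mean of precision and recall.
    # alpha > 0.5 shifts the weight toward recall (nltk defaults to alpha = 0.9).
    if precision == 0 or recall == 0:
        return 0.0
    return (precision * recall) / (alpha * precision + (1 - alpha) * recall)

# Using the precision and recall from the running example
print(f_mean(0.8, 1.0))  # 0.8 / 0.82 ~ 0.9756
```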
Compute chunk penalty
A chunk is a set of consecutive words in the candidate that also appear consecutively in the reference. The precision, recall, and F-score are based on unigram matches alone and ignore word order, so METEOR adds a fragmentation penalty that grows as the matched words are scattered across more chunks:

Penalty = γ × (c / u_m)^β

Where,

c: The number of chunks
u_m: The number of matched unigrams
γ: The maximum penalty (0.5 by default)
β: Controls how quickly the penalty grows with fragmentation (3 by default)
What would the number of chunks be if the candidate summary were exactly identical to the reference summary?
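Chunk counting can be sketched once we have the alignment, i.e., a list of (candidate index, reference index) pairs for the matched unigrams (computing that alignment is the part nltk handles for us). A new chunk starts whenever the next match is not adjacent on both sides; γ = 0.5 and β = 3 below mirror nltk's defaults, and the function name is ours:

```python
def chunk_penalty(matches, gamma=0.5, beta=3.0):
    # matches: (candidate_index, reference_index) pairs, sorted by candidate index
    if not matches:
        return 0.0
    chunks = 1
    for (c0, r0), (c1, r1) in zip(matches, matches[1:]):
        # A new chunk starts when the next match is not adjacent in both summaries
        if c1 != c0 + 1 or r1 != r0 + 1:
            chunks += 1
    return gamma * (chunks / len(matches)) ** beta

# Alignment for the running example: 'Machine learning is' sits at positions 0-2
# in both summaries; 'a subset of artificial intelligence' sits at candidate
# positions 5-9 and reference positions 3-7, giving two chunks over eight matches.
matches = [(0, 0), (1, 1), (2, 2), (5, 3), (6, 4), (7, 5), (8, 6), (9, 7)]
print(chunk_penalty(matches))  # 0.5 * (2 / 8) ** 3 = 0.0078125
```

Note how the penalty shrinks as the matches become more contiguous; a candidate identical to the reference forms a single chunk.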
Calculate the METEOR score
After computing the F-score and chunk penalty, we are now ready to calculate the METEOR score:

METEOR = F_mean × (1 − Penalty)

METEOR scores are given on a scale of 0 to 1, with higher values indicating greater similarity between the candidate and the reference summary.
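Putting the three steps together for the running example (precision 0.8, recall 1.0, and two chunks over eight matched unigrams are assumed values carried over from the earlier steps, not computed by this snippet):

```python
# Step 1 + 2: weighted F-score with recall weighted higher (alpha = 0.9)
precision, recall, alpha = 0.8, 1.0, 0.9
f_mean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)

# Step 3: fragmentation penalty with gamma = 0.5, beta = 3
chunks, matched, gamma, beta = 2, 8, 0.5, 3.0
penalty = gamma * (chunks / matched) ** beta

# Step 4: combine
meteor = f_mean * (1 - penalty)
print(round(meteor, 4))  # 0.968
```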
Code example
Now, let’s see how to calculate the METEOR score using Python.
import nltk
nltk.download('wordnet')
reference_summary = [['Machine', 'learning', 'is', 'a', 'subset', 'of', 'artificial', 'intelligence']]
candidate_summary = ['Machine', 'learning', 'is', 'seen', 'as', 'a', 'subset', 'of', 'artificial', 'intelligence']
METEORscore = nltk.translate.meteor_score.meteor_score(reference_summary, candidate_summary)
print(METEORscore)
Code explanation
Let’s get an insight into the code above.

Line 1: We import the nltk library, which is widely used in the field of NLP.
Line 2: We download the wordnet corpus from the nltk library; METEOR uses WordNet to match synonyms.
Line 4: We define a list named reference_summary and set “Machine learning is a subset of artificial intelligence” as the reference summary.
Line 5: We define a candidate_summary variable and set its value to “Machine learning is seen as a subset of artificial intelligence.”
Line 7: We use the meteor_score() function from the nltk.translate.meteor_score module to calculate the METEOR score.
Line 8: We print the METEOR score for the provided candidate summary.