How does METEOR evaluation metric calculate the similarity score?
Evaluation metrics are quantitative measures of machine learning models' performance. They are essential to determining whether our model is performing well or poorly for specific tasks.
What is METEOR?
METEOR (Metric for Evaluation of Translation with Explicit Ordering) is a metric used to measure the quality of candidate text based on its alignment with a reference text. Unigrams in the candidate are matched to unigrams in the reference (using exact, stemmed, and synonym matches), and the final score also accounts for word order.
Calculating the METEOR score
Following are the steps to calculate the METEOR score:
Calculate the unigram precision and recall.
Compute the F-score.
Compute chunk penalty.
Calculate the METEOR score.
Calculate the unigram precision and recall
We calculate the unigram precision as the ratio of the overlapping unigrams between the candidate and reference summary to the total number of unigrams in the candidate summary.
The unigram recall is calculated as the ratio of the overlapping unigrams between the candidate and reference summary to the total number of unigrams in the reference summary.
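The two ratios above can be sketched in a few lines of Python. This is a simplified illustration that counts only exact (surface-form) matches; full METEOR also matches stems and WordNet synonyms, and the function name here is ours, not part of any library:

```python
from collections import Counter

def unigram_precision_recall(candidate, reference):
    # Clipped (multiset) overlap between the two unigram bags
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return precision, recall

candidate = ['Machine', 'learning', 'is', 'seen', 'as', 'a', 'subset', 'of', 'artificial', 'intelligence']
reference = ['Machine', 'learning', 'is', 'a', 'subset', 'of', 'artificial', 'intelligence']

p, r = unigram_precision_recall(candidate, reference)
print(p, r)  # 0.8 1.0 -> 8 of 10 candidate unigrams match; all 8 reference unigrams are covered
```

Using a multiset intersection (rather than a plain set) ensures a word repeated in the candidate is only credited as many times as it appears in the reference.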
Compute the F-score
After calculating the unigram precision and recall, we compute the weighted F-score by taking their harmonic mean, with recall being weighted higher than precision:

F_mean = (P × R) / (α × P + (1 − α) × R)

where,

P: Unigram precision
R: Unigram recall
α: The relative weight for precision and recall; values of α above 0.5 weight recall more heavily.

Note: The recall is weighted higher than the precision so that the candidate summary is rewarded for covering the meaning of the reference rather than only for exact word-to-word matches.
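The weighted harmonic mean can be sketched as follows; the α = 0.9 default mirrors nltk's `meteor_score`, and the function name is ours for illustration:

```python
def f_mean(precision, recall, alpha=0.9):
    # Weighted harmonic mean of precision and recall.
    # alpha > 0.5 shifts the weight toward recall (nltk defaults to alpha = 0.9).
    if precision == 0 or recall == 0:
        return 0.0
    return (precision * recall) / (alpha * precision + (1 - alpha) * recall)

# Using the precision and recall from the running example
print(f_mean(0.8, 1.0))  # 0.8 / 0.82 ~ 0.9756
```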
Compute chunk penalty
A chunk is a set of consecutive words in the candidate that also appear consecutively in the reference. The precision, recall, and F-score are based on unigram matches alone and ignore word order, so METEOR adds a fragmentation penalty that grows as the matched words are scattered across more chunks:

Penalty = γ × (c / u_m)^β

Where,

c: The number of chunks
u_m: The number of matched unigrams
γ: The maximum penalty (0.5 by default)
β: Controls how quickly the penalty grows with fragmentation (3 by default)
What would the number of chunks be if the candidate summary were exactly identical to the reference summary?
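Chunk counting can be sketched once we have the alignment, i.e., a list of (candidate index, reference index) pairs for the matched unigrams (computing that alignment is the part nltk handles for us). A new chunk starts whenever the next match is not adjacent on both sides; γ = 0.5 and β = 3 below mirror nltk's defaults, and the function name is ours:

```python
def chunk_penalty(matches, gamma=0.5, beta=3.0):
    # matches: (candidate_index, reference_index) pairs, sorted by candidate index
    if not matches:
        return 0.0
    chunks = 1
    for (c0, r0), (c1, r1) in zip(matches, matches[1:]):
        # A new chunk starts when the next match is not adjacent in both summaries
        if c1 != c0 + 1 or r1 != r0 + 1:
            chunks += 1
    return gamma * (chunks / len(matches)) ** beta

# Alignment for the running example: 'Machine learning is' sits at positions 0-2
# in both summaries; 'a subset of artificial intelligence' sits at candidate
# positions 5-9 and reference positions 3-7, giving two chunks over eight matches.
matches = [(0, 0), (1, 1), (2, 2), (5, 3), (6, 4), (7, 5), (8, 6), (9, 7)]
print(chunk_penalty(matches))  # 0.5 * (2 / 8) ** 3 = 0.0078125
```

Note how the penalty shrinks as the matches become more contiguous; a candidate identical to the reference forms a single chunk.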
Calculate the METEOR score
After computing the F-score and chunk penalty, we are now ready to calculate the METEOR score:

METEOR = F_mean × (1 − Penalty)

METEOR scores are given on a scale of 0 to 1, with higher values indicating greater similarity between the candidate and the reference summary.
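Putting the three steps together for the running example (precision 0.8, recall 1.0, and two chunks over eight matched unigrams are assumed values carried over from the earlier steps, not computed by this snippet):

```python
# Step 1 + 2: weighted F-score with recall weighted higher (alpha = 0.9)
precision, recall, alpha = 0.8, 1.0, 0.9
f_mean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)

# Step 3: fragmentation penalty with gamma = 0.5, beta = 3
chunks, matched, gamma, beta = 2, 8, 0.5, 3.0
penalty = gamma * (chunks / matched) ** beta

# Step 4: combine
meteor = f_mean * (1 - penalty)
print(round(meteor, 4))  # 0.968
```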
Code example
Now, let’s see how to calculate the METEOR score using Python.
import nltk
nltk.download('wordnet')
reference_summary = [['Machine', 'learning', 'is', 'a', 'subset', 'of', 'artificial', 'intelligence']]
candidate_summary = ['Machine', 'learning', 'is', 'seen', 'as', 'a', 'subset', 'of', 'artificial', 'intelligence']
METEORscore = nltk.translate.meteor_score.meteor_score(reference_summary, candidate_summary)
print(METEORscore)
Code explanation
Let’s get an insight into the code above.

Line 1: We import the nltk library, which is widely used in the field of NLP.
Line 2: We download the wordnet corpus from the nltk library; METEOR uses WordNet to match synonyms.
Line 4: We define a list named reference_summary and set “Machine learning is a subset of artificial intelligence” as the reference summary.
Line 5: We define a candidate_summary variable and set its value to “Machine learning is seen as a subset of artificial intelligence.”
Line 7: We use the meteor_score() function from the nltk.translate.meteor_score module to calculate the METEOR score.
Line 8: We print the METEOR score for the provided candidate summary.