Evaluation metrics are quantitative measures used to evaluate the performance of machine learning models. They are essential for understanding how well or poorly our model performs on a specific task.
BLEU (Bilingual Evaluation Understudy) is an evaluation metric commonly used in NLP to evaluate the quality of generated text. The BLEU metric compares the generated text to one or more references and assigns a score based on the n-gram overlap between the two texts. The more n-grams they have in common, the higher the BLEU score.
Here, we'll calculate the BLEU score for a machine-generated text summary, referred to as the candidate summary.
The following are the steps to calculate the BLEU score:
1. Calculate the precision for each n-gram.
2. Compute the geometric mean of the precision scores.
3. Apply the Brevity Penalty (BP).
4. Calculate the BLEU score.
We calculate the precision for each n-gram order to measure how well the candidate summary matches the reference summary. Common values for n range from 1 (unigrams) to 4 (4-grams). The formula to calculate precision for an n-gram is:

precision_n = (number of n-grams in the candidate summary that also appear in the reference summary) / (total number of n-grams in the candidate summary)

In practice, each candidate n-gram's count is clipped by its count in the reference, so a repeated word in the candidate cannot be credited more times than it appears in the reference.
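The clipped precision described above can be sketched in a few lines of Python. The function name `ngram_precision` is ours, not part of any library:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Modified n-gram precision: each candidate n-gram's count is
    clipped by its count in the reference."""
    cand_ngrams = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref_ngrams = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    # Count candidate n-grams, clipped by their frequency in the reference
    overlap = sum(min(count, ref_ngrams[ng]) for ng, count in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return overlap / total if total else 0.0

reference = ['Machine', 'learning', 'is', 'a', 'subset', 'of', 'artificial', 'intelligence']
candidate = ['Machine', 'learning', 'is', 'seen', 'as', 'a', 'subset', 'of', 'artificial', 'intelligence']
print(ngram_precision(candidate, reference, 1))  # 8 of 10 unigrams match -> 0.8
```

For this pair, 8 of the candidate's 10 unigrams appear in the reference (all but "seen" and "as"), giving a unigram precision of 0.8.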
After computing the n-gram precisions, we compute their geometric mean. Usually, we use uniform weights of 1/N over the N n-gram orders:

geometric_mean = exp(Σ w_n × log(precision_n)), where w_n = 1/N
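This weighted geometric mean is a one-liner with `math.exp` and `math.log`; the sketch below assumes uniform weights, with a guard for zero precisions (any zero precision makes the geometric mean zero):

```python
import math

def geometric_mean(precisions):
    """Geometric mean of n-gram precisions with uniform weights 1/N."""
    if min(precisions) == 0:
        # log(0) is undefined; a zero precision zeroes the whole mean
        return 0.0
    weight = 1.0 / len(precisions)
    return math.exp(sum(weight * math.log(p) for p in precisions))

# Unigram and bigram precisions from our running example
print(geometric_mean([0.8, 6 / 9]))  # sqrt(0.8 * 0.6667) ≈ 0.7303
```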
A Brevity Penalty (BP) adjusts the BLEU score if the candidate summary is shorter than the reference summary. It is calculated as follows:

BP = 1, if cand_length > ref_length
BP = exp(1 − ref_length / cand_length), if cand_length ≤ ref_length

Here:

cand_length: Length of the candidate summary.
ref_length: Length of the reference summary.
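The two-case penalty above translates directly into code (the function name `brevity_penalty` is our own):

```python
import math

def brevity_penalty(cand_length, ref_length):
    """BP = 1 when the candidate is longer than the reference,
    exp(1 - ref_length / cand_length) otherwise."""
    if cand_length > ref_length:
        return 1.0
    return math.exp(1 - ref_length / cand_length)

print(brevity_penalty(10, 8))  # candidate longer than reference -> 1.0
print(brevity_penalty(6, 8))   # shorter candidate is penalized (< 1)
```

Note that the penalty only ever reduces the score: a candidate longer than the reference is not rewarded, and a shorter one is scaled down exponentially.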
The final BLEU score is calculated by multiplying the Brevity Penalty by the geometric mean of the n-gram precisions. The formula to calculate the BLEU score is:

BLEU = BP × exp(Σ w_n × log(precision_n))

The BLEU score ranges from 0 to 1, with higher values indicating a closer match between the candidate and the reference summary.
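Putting the four steps together, here is a minimal, self-contained sketch of the whole pipeline, assuming uniform weights over 1- to 4-grams (the defaults used by most BLEU implementations):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sketch of BLEU: clipped n-gram precisions, their geometric
    mean with uniform weights, and the brevity penalty."""
    # Step 1: clipped precision for each n-gram order 1..max_n
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        precisions.append(overlap / total if total else 0.0)
    # Step 2: geometric mean (zero if any precision is zero)
    if min(precisions) == 0:
        geo_mean = 0.0
    else:
        geo_mean = math.exp(sum(math.log(p) / max_n for p in precisions))
    # Step 3: brevity penalty
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    # Step 4: final score
    return bp * geo_mean

reference = ['Machine', 'learning', 'is', 'a', 'subset', 'of', 'artificial', 'intelligence']
candidate = ['Machine', 'learning', 'is', 'seen', 'as', 'a', 'subset', 'of', 'artificial', 'intelligence']
print(bleu(candidate, reference))  # ≈ 0.5254
```

For this pair the candidate is longer than the reference, so BP = 1 and the score is just the geometric mean of the four clipped precisions (0.8, 6/9, 4/8, and 2/7).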
Now, let's see how to calculate the BLEU score using Python.
```python
import nltk

reference_summary = ['Machine', 'learning', 'is', 'a', 'subset', 'of', 'artificial', 'intelligence']
candidate_summary = ['Machine', 'learning', 'is', 'seen', 'as', 'a', 'subset', 'of', 'artificial', 'intelligence']

BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference_summary], candidate_summary)
print(BLEUscore)
```
Line 1: We import the nltk library, which is widely used in the field of NLP.

Line 3: We define a reference_summary variable and set its value to the tokenized sentence "Machine learning is a subset of artificial intelligence".

Line 4: We define a candidate_summary variable and set its value to the tokenized sentence "Machine learning is seen as a subset of artificial intelligence".

Line 6: We calculate the BLEU score using the sentence_bleu() function from the nltk.translate.bleu_score module. Note that the reference is passed inside a list, because sentence_bleu() accepts multiple references.

Line 7: We print the BLEU score for the provided candidate summary.
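By default, sentence_bleu() uses uniform weights over 1- to 4-grams. The nltk API also lets us pass custom weights (e.g. BLEU-2, using only unigrams and bigrams) and a smoothing function, which prevents the score from collapsing to zero when some higher-order n-gram has no match, as often happens with short texts:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference_summary = ['Machine', 'learning', 'is', 'a', 'subset', 'of', 'artificial', 'intelligence']
candidate_summary = ['Machine', 'learning', 'is', 'seen', 'as', 'a', 'subset', 'of', 'artificial', 'intelligence']

# BLEU-2: only unigram and bigram precisions, weighted equally
bleu_2 = sentence_bleu([reference_summary], candidate_summary, weights=(0.5, 0.5))

# Smoothing avoids a zero score when a higher-order n-gram has no match
smooth = SmoothingFunction().method1
bleu_smooth = sentence_bleu([reference_summary], candidate_summary,
                            smoothing_function=smooth)

print(bleu_2)       # geometric mean of unigram and bigram precision
print(bleu_smooth)  # default 4-gram BLEU with smoothing
```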