Trusted answers to developer questions

Related Tags

wer

What is Word Error Rate?

Shahpar Khan


Automated Speech Recognition (ASR)

As the name suggests, Automated Speech Recognition - ASR - is software that converts spoken words, captured through an input device (mic) or read from an audio file, into text. ASR relieves users from tedious data entry by letting them dictate to their device rather than type. Many industries rely on ASR every day. One of the best-known examples is Amazon’s Alexa.


Word Error Rate (WER)

Word Error Rate - WER - is the standard metric for comparing the accuracy of different ASR software. It is computed by comparing the transcript produced by the ASR software against the correct (original) transcript; the lower the WER, the more accurate the transcript. The formula consists of four components:

Component   Stands For
S           Substitution: The number of words in the original transcript that were replaced by a different word.
D           Deletion: The number of words dropped from the original transcript.
I           Insertion: The number of extra words added compared to the original transcript.
N           Number: The total number of words in the original (correct) transcript.

By combining the above components, we get the following formula to compute WER:

$$WER = \frac{S + D + I}{N}$$
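The formula above can be sketched in a few lines of Python. The function name `wer_from_counts` and its parameter names are illustrative assumptions, not part of any library, and the counts in the usage line are made up for the sake of the example.

```python
def wer_from_counts(substitutions: int, deletions: int, insertions: int,
                    reference_length: int) -> float:
    """Apply WER = (S + D + I) / N to counts that are already known."""
    if reference_length <= 0:
        # N is the denominator, so the correct transcript must not be empty.
        raise ValueError("reference_length must be positive")
    return (substitutions + deletions + insertions) / reference_length


# Hypothetical counts: 2 substitutions, 1 deletion, 0 insertions, 10 reference words.
print(wer_from_counts(2, 1, 0, 10))
```

Because N counts only the words in the correct transcript while insertions are not bounded by it, the result can be greater than 1.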

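Let’s look at an example. Suppose the actual phrase Please turn around gets converted into Please burn a round by some ASR software. WER is based on the smallest number of word-level edits that turns the original transcript into the ASR output. Here, we can notice that:

  • The word turn got substituted by burn. Therefore, we have one substitution.
  • The word around got substituted by a. Therefore, we have a second substitution.
  • There is one new word inserted - round. Therefore, we have one insertion.
  • No word is dropped. Therefore, we have zero deletions.
  • There are three words in total in the original transcript.

After putting all of this together, the computed WER for the conversion above turns out to be:

$$WER = \frac{S + D + I}{N} = \frac{2 + 0 + 1}{3} = 1.0$$

Counting around as deleted and both a and round as inserted would also describe the change, but that takes four edits instead of three, so it is not the alignment WER is based on. A WER of 1.0 means there are as many errors as there are words in the original transcript. Since insertions are not limited by N, WER can even exceed 1.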
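In practice, S, D, and I are not counted by hand. They are conventionally taken from the minimum-edit (Levenshtein) alignment between the two word sequences. The following is a minimal sketch of that idea, not a reference implementation: the names `WerResult` and `word_error_rate` are hypothetical, and lowercasing plus whitespace splitting is an assumed normalization step (real evaluation pipelines often also strip punctuation). When several alignments tie on the total number of edits, the split between S, D, and I reported here is just one of them.

```python
from typing import NamedTuple


class WerResult(NamedTuple):
    substitutions: int
    deletions: int
    insertions: int
    reference_length: int
    wer: float


def word_error_rate(reference: str, hypothesis: str) -> WerResult:
    # Assumption: lowercase and split on whitespace as the only normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # table[i][j] = (edits, S, D, I) for the cheapest alignment of ref[:i] with hyp[:j].
    table = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, i, 0)   # every reference word deleted
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, 0, 0, j)   # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, s, d, ins = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (e, s, d, ins)          # words match, no edit
            else:
                diagonal = (e + 1, s + 1, d, ins)  # substitution
            e, s, d, ins = table[i - 1][j]
            deletion = (e + 1, s, d + 1, ins)      # reference word dropped
            e, s, d, ins = table[i][j - 1]
            insertion = (e + 1, s, d, ins + 1)     # extra hypothesis word
            # Tuples compare on total edits first, so this keeps a cheapest alignment.
            table[i][j] = min(diagonal, deletion, insertion)

    _, s, d, ins = table[len(ref)][len(hyp)]
    return WerResult(s, d, ins, len(ref), (s + d + ins) / len(ref))


print(word_error_rate("Please turn around", "Please burn a round"))
```

For the phrase pair used in the example, the cheapest alignment has two substitutions (turn to burn, around to a) and one insertion (round), so this sketch reports three edits over three reference words, a WER of 1.0.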

Fun Fact: Human transcribers are not perfect either. Their WER on conversational speech is commonly cited as roughly 4% (0.04)!


CONTRIBUTOR

Shahpar Khan
Copyright ©2022 Educative, Inc. All rights reserved
