Success Is in Fine Details

In this lesson, we’ll cover the best practices that can lead to high-quality outcomes.

How to get the highest quality results

As someone who consults with Fortune 500 companies regularly, I notice that quality outcomes depend on a few best practices:

  1. Hardware and audio capture technique matter. Beyond what the API can do, much can be done to improve audio capture. Businesses should consult with an audio engineer.
  2. Capture audio with a sampling rate of 16,000 Hz or higher.
  3. To help determine the best configuration, test with audio that represents real-world conditions.
  4. Invest time and money into configuration testing. Skipping this step can result in even more money and time wasted on poor transcription.
  5. Test at least 1 hour of audio; 3 hours is better, 6 hours is great, and anything beyond that yields diminishing returns.
  6. Pay for professional human transcriptions to use as the reference when calculating word error rate (WER). Unless you work for a company full of trained transcriptionists, do not roll your own human transcriptions. If professionals have a 5% WER, imagine the errors introduced by everyday workers at your company.
  7. The API models are trained on raw source audio, so there is no need to upsample (converting an 8,000 Hz file to 16,000 Hz, for example).
  8. Converting the original audio from one encoding to another (MP3 to FLAC, for example) offers no payoff.
  9. There is no need to pre-process the audio to reduce noise or background music, as the models are trained for these situations.
  10. If identifying separate speakers is critical, capture each speaker on a separate audio channel.
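
To make the WER comparison in item 6 concrete, here is a minimal sketch of how a word error rate could be computed from a professional reference transcript and an API transcript. The function name `word_error_rate` and the sample sentences are hypothetical, and the normalization (lowercasing and splitting on whitespace) is an assumption; a real evaluation would likely also normalize punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level edit distance, keeping only the previous row of the table.
    # prev[j] = distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    human = "please send the invoice to accounting today"
    machine = "please send the invoice to a counting today"
    print(f"WER: {word_error_rate(human, machine):.1%}")
```

Running the same function over each candidate configuration's output, against the same professional reference, gives a like-for-like number for comparing configurations.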
