Implementing Speech-to-Text Conversion

Learn how to perform speech-to-text conversion using the Azure Speech SDK for Python.

Introduction

In this lesson, we’re going to explore speech-to-text conversion using the Azure Speech service. Speech-to-text (STT) generates real-time text transcriptions from audio data. The audio can come from a file, from a real-time stream, or directly from a microphone.
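Of the three input sources, streaming is the least obvious. The following is a minimal sketch, assuming a 16 kHz, 16-bit, mono PCM WAV file stands in for live audio: Python's `wave` module strips the WAV header, and the raw frames are written chunk by chunk into the SDK's `PushAudioInputStream` while continuous recognition collects each recognized phrase. The helper names `check_pcm_format`, `wav_chunks`, and `stream_wav_for_recognition`, as well as the chunk size, are hypothetical choices for illustration, not part of the SDK.

```python
import threading
import wave
from typing import Iterator, List


def check_pcm_format(path: str) -> None:
    """Raise ValueError unless the WAV file is 16 kHz, 16-bit, mono PCM (assumed input format)."""
    with wave.open(path, "rb") as wav:
        fmt = (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())
    if fmt != (16000, 2, 1):
        raise ValueError(f"expected (16000 Hz, 2-byte samples, 1 channel), got {fmt}")


def wav_chunks(path: str, frames_per_chunk: int = 1600) -> Iterator[bytes]:
    """Yield raw PCM bytes (WAV header stripped) in fixed-size chunks, imitating live audio."""
    with wave.open(path, "rb") as wav:
        while True:
            data = wav.readframes(frames_per_chunk)  # 1600 frames = 0.1 s at 16 kHz
            if not data:
                break
            yield data


def stream_wav_for_recognition(path: str, key: str, region: str) -> List[str]:
    """Push a WAV file through a PushAudioInputStream and return the recognized phrases."""
    check_pcm_format(path)
    # Imported here so the helpers above can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    # Without an explicit AudioStreamFormat, a push stream expects 16 kHz, 16-bit, mono PCM.
    push_stream = speechsdk.audio.PushAudioInputStream()
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    phrases: List[str] = []
    done = threading.Event()

    def on_recognized(evt):
        # Keep only final results that actually contain recognized speech.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            phrases.append(evt.result.text)

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    for chunk in wav_chunks(path):
        push_stream.write(chunk)
    push_stream.close()  # tells the recognizer that no more audio is coming
    done.wait()  # released when the session stops or is canceled
    recognizer.stop_continuous_recognition()
    return phrases
```

In a real streaming scenario, the `for chunk in wav_chunks(path)` loop would be replaced by whatever delivers live audio (a socket, a capture callback), while the rest of the setup stays the same.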

Behind the scenes, the audio is converted to text by the same model that Microsoft uses in its own products, such as Office and Cortana. The model can transcribe speech in more than 100 languages; the full list is available in Microsoft’s Language Support documentation.
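The language to recognize is chosen per request through the `speech_recognition_language` property of `SpeechConfig`, which also carries the key and region of your Speech resource. The sketch below assumes the key and region are stored in environment variables named `SPEECH_KEY` and `SPEECH_REGION`; those names and the helpers `load_speech_settings` and `make_speech_config` are hypothetical, so adapt them to your own setup.

```python
import os


def load_speech_settings(language: str = "en-US") -> dict:
    """Read the resource key and region from (assumed) environment variable names."""
    key = os.environ.get("SPEECH_KEY")
    region = os.environ.get("SPEECH_REGION")
    if not key or not region:
        raise RuntimeError("Set the SPEECH_KEY and SPEECH_REGION environment variables.")
    return {"key": key, "region": region, "language": language}


def make_speech_config(settings: dict):
    """Build a SpeechConfig that recognizes the requested locale."""
    # Imported here so load_speech_settings() works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(
        subscription=settings["key"], region=settings["region"]
    )
    # Locale code of the spoken language, e.g. "en-US", "de-DE", or "ja-JP".
    config.speech_recognition_language = settings["language"]
    return config
```

For example, `make_speech_config(load_speech_settings("de-DE"))` yields a configuration for German audio; the locale must be one of those listed in the Language Support documentation.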

Dependencies

To work with this chapter and run the code snippets on your local machine, you need to install the following package:

  • azure-cognitiveservices-speech

To learn how to install the packages, please visit the Appendix section.
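After installing the package (for example with `pip install azure-cognitiveservices-speech`), a quick check can confirm which version your environment actually picked up. This is a small sketch using only the standard library's `importlib.metadata`; the function name `installed_speech_sdk_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_speech_sdk_version() -> Optional[str]:
    """Return the installed azure-cognitiveservices-speech version, or None if it is missing."""
    try:
        return metadata.version("azure-cognitiveservices-speech")
    except metadata.PackageNotFoundError:
        return None


print(installed_speech_sdk_version() or "azure-cognitiveservices-speech is not installed")
```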

Implementation

In this lesson, we’re going to perform speech-to-text conversion in two ways:

  • First, we’ll read an audio file by creating an object of the AudioConfig class that points to it. A speech recognizer then uses this object to send the audio to the Speech service and receive the corresponding text.

  • Second, we’ll read the audio from a microphone and then generate the corresponding text.
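Both approaches can be sketched with the same recognition routine; only the `AudioConfig` differs. In the SDK, `AudioConfig` just describes where the audio comes from (`filename=...` for a file, `use_default_microphone=True` for the microphone), and the `SpeechRecognizer` reads that audio and sends it to the service. This is a minimal sketch assuming you already have a Speech resource key and region; `transcribe_file`, `transcribe_microphone`, and `_recognize` are hypothetical helper names.

```python
import os


def _recognize(audio_config, key: str, region: str, language: str) -> str:
    """Run a single recognize_once() call and return the recognized text."""
    # Imported here so the file check in transcribe_file() runs without the SDK.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = language
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    # Blocks until a single utterance has been recognized.
    result = recognizer.recognize_once()

    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    if result.reason == speechsdk.ResultReason.NoMatch:
        return ""  # audio arrived, but no speech could be recognized
    if result.reason == speechsdk.ResultReason.Canceled:
        details = result.cancellation_details
        raise RuntimeError(f"Recognition canceled: {details.reason} {details.error_details}")
    raise RuntimeError(f"Unexpected result reason: {result.reason}")


def transcribe_file(path: str, key: str, region: str, language: str = "en-US") -> str:
    """Approach 1: recognize speech from an audio file."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    import azure.cognitiveservices.speech as speechsdk

    audio_config = speechsdk.audio.AudioConfig(filename=path)
    return _recognize(audio_config, key, region, language)


def transcribe_microphone(key: str, region: str, language: str = "en-US") -> str:
    """Approach 2: recognize speech captured from the default microphone."""
    import azure.cognitiveservices.speech as speechsdk

    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    return _recognize(audio_config, key, region, language)
```

Usage would look like `print(transcribe_file("sample.wav", key, region))`, where `sample.wav` is a placeholder file name. Note that `recognize_once()` returns after the first utterance; for longer recordings, continuous recognition (`start_continuous_recognition()` together with the `recognized` event) is the usual alternative.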
