How to convert speech into text in machine learning

Automatic speech recognition (ASR) is the process of converting spoken audio into text. In machine learning, we can approach this task with several techniques, from training our own models to calling a ready-made recognition service.

This Answer will provide a simple example of building a speech recognition system using Python and the SpeechRecognition library. This example will give us a basic understanding of the process.

First, we have to install the SpeechRecognition library with pip by running the command pip install SpeechRecognition.

Explanation

Line 1: We import the speech_recognition library.
Line 3: We initialize the recognizer object.
Line 5: We load the speech file named h_orig.wav, which we have uploaded to this Answer.
Line 7: We open the speech file as the audio source.
Line 8: We record the speech data from the source into memory.
Line 11: We recognize the speech using the Google Web Speech API.
Line 12: We print the resulting text recognized from the speech file.
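
The steps listed above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function name transcribe_file is hypothetical, the import is placed inside the function only so the sketch loads where the library is missing, and passing show_all=True is an assumption made so that the confidence value is returned along with the text. The comments name the step each statement corresponds to, so the line numbers in the explanation will not match this sketch exactly.

```python
from pathlib import Path


def transcribe_file(path):
    """Return the Google Web Speech API result for an audio file, or None."""
    # Imported inside the function so this sketch can be loaded even where
    # the library is not installed yet; a top-level import is equally valid.
    import speech_recognition as sr

    # Initialize the recognizer object.
    recognizer = sr.Recognizer()

    # Open the speech file as the audio source and record its data.
    with sr.AudioFile(str(path)) as source:
        audio_data = recognizer.record(source)

    try:
        # Assumption: show_all=True requests the full response (alternatives
        # and confidence) rather than only the most likely transcript.
        return recognizer.recognize_google(audio_data, show_all=True)
    except sr.RequestError as error:
        # The API could not be reached or it rejected the request.
        print(f"Request to the Google Web Speech API failed: {error}")
        return None


audio_path = Path("h_orig.wav")  # file name taken from the explanation above
if audio_path.exists():
    # Print the result recognized from the speech file.
    print(transcribe_file(audio_path))
else:
    print(f"{audio_path} was not found; put the audio file next to this script.")
```

The call to recognize_google() sends the audio to an online service, so the script needs an internet connection to return a result.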

The confidence value in the output represents how certain the recognizer is that the transcription (the recognized text) is accurate. The result indicates that the system is approximately 84.42% certain that the spoken audio has been correctly converted into the given text.
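
To read the confidence value programmatically, we can pick the first alternative out of the full response. The sketch below assumes the response is a dictionary with an alternative list whose entries carry a transcript and, optionally, a confidence. The helper name best_alternative and the sample response are hypothetical: the transcript is a placeholder, and 0.8442 stands in for the value discussed above.

```python
def best_alternative(result):
    """Return (transcript, confidence) for the top alternative, or None."""
    # An empty result means nothing was recognized, so there is nothing to read.
    if not result or not result.get("alternative"):
        return None
    top = result["alternative"][0]
    # Confidence may be absent from an alternative, so fall back to None.
    return top["transcript"], top.get("confidence")


# Hypothetical response used only to illustrate the assumed structure.
sample = {
    "alternative": [
        {"transcript": "placeholder transcript", "confidence": 0.8442},
        {"transcript": "placeholder transcripts"},
    ],
    "final": True,
}

transcript, confidence = best_alternative(sample)
# Format the confidence as a percentage with two decimal places.
print(f"{transcript!r} with confidence {confidence:.2%}")
```

Checking for an empty result first keeps the helper from failing when the service recognizes no speech in the audio.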