How to convert speech into text in machine learning

The process of converting speech into text is called automatic speech recognition (ASR). In machine learning, we can accomplish this task using various techniques.

This Answer will provide a simple example of building a speech recognition system using Python and the SpeechRecognition library. This example will give us a basic understanding of the process.

First, we have to install the SpeechRecognition library using the following command:

pip install SpeechRecognition
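
Note that the package is installed as SpeechRecognition but imported as speech_recognition. As a quick sanity check, we can confirm that the module is importable before running the example. The snippet below is a minimal sketch; the helper name is_installed is our own and is not part of the library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("speech_recognition"):
    import speech_recognition as sr
    print("SpeechRecognition version:", sr.__version__)
else:
    print("SpeechRecognition is not installed; run: pip install SpeechRecognition")
```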

Example

The following is a simple example of converting speech into text using the SpeechRecognition library. It transcribes the audio file h_orig.wav:

import speech_recognition as sr

recognizer = sr.Recognizer()

speech_file = "/file/h_orig.wav"

with sr.AudioFile(speech_file) as source:
    speech = recognizer.record(source)

try:
    text = recognizer.recognize_google(speech)
    print("Recognized Text from the speech file:", text)
except sr.UnknownValueError:
    print("Could not understand speech file")
except sr.RequestError as e:
    print("Could not request results from Google Web Speech API; {0}".format(e))

Explanation

  • Line 1: We import the speech_recognition library.

  • Line 3: We initialize the recognizer object.

  • Line 5: We set the path to the speech file named h_orig.wav, which we have uploaded in the Answer.

  • Line 7: We open the speech file as an audio source.

  • Line 8: We record the speech data from the source.

  • Line 11: We recognize the speech using the Google Web Speech API.

  • Line 12: We print the resulting text recognized from the speech file.

  • Lines 13-16: We handle the cases where the speech cannot be understood or the request to the API fails.
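
Before passing a file to sr.AudioFile, it can help to check that it is a readable PCM WAV file and to see its sample rate and duration. The sketch below uses only Python's standard wave module. The helpers describe_wav and write_silence are hypothetical names of our own, and the one-second silent file stands in for a real recording such as h_orig.wav.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def write_silence(path, seconds=1, rate=16000):
    """Write a mono, 16-bit WAV file that contains only silence (demo input)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * seconds))


# Create a stand-in file and inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silence(demo_path)
print(describe_wav(demo_path))
```

If wave.open raises wave.Error for a file, the file is not a WAV file that this module can read, and it will likely need converting before transcription.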

The response from the Google Web Speech API can also include a confidence value, which represents the level of certainty that the transcription (recognized text) is accurate. For this audio file, the result indicates that the system is approximately 84.42% certain that the spoken audio has been accurately transcribed into the given text.
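
To read the confidence value in code, we can ask recognize_google for the full response by passing show_all=True and then pick out the top alternative. The sketch below does not call the API. It assumes the response is a dictionary with an "alternative" list whose entries carry "transcript" and, for the top entry, "confidence". The helper best_alternative and the sample transcript are hypothetical and are used only for illustration.

```python
def best_alternative(response):
    """Return (transcript, confidence) for the top alternative, or None."""
    # An unrecognized utterance may yield an empty, non-dict response.
    if not isinstance(response, dict):
        return None
    alternatives = response.get("alternative") or []
    if not alternatives:
        return None
    top = alternatives[0]
    # Confidence may be missing, so .get() returns None in that case.
    return top.get("transcript"), top.get("confidence")


# Hypothetical response, shaped like the assumed result of
# recognizer.recognize_google(speech, show_all=True).
sample = {
    "alternative": [
        {"transcript": "hello world", "confidence": 0.8442},
        {"transcript": "hello word"},
    ],
    "final": True,
}

result = best_alternative(sample)
if result is not None:
    transcript, confidence = result
    if confidence is not None:
        print(f"{transcript!r} (confidence: {confidence:.2%})")
```

In a real run, we would replace sample with the value returned by the API call and keep the None checks, since the confidence field is not guaranteed to be present.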

Copyright ©2025 Educative, Inc. All rights reserved