Automatic speech recognition (ASR) is the process of converting speech into text. In machine learning, this task can be approached with a variety of techniques.
This Answer provides a simple example of building a speech recognition system using Python and the SpeechRecognition library. The example gives a basic understanding of the process.
First, install the SpeechRecognition library using the following command:
pip install SpeechRecognition
The following is a simple example of converting speech into text using the SpeechRecognition library. The audio file to be converted is provided with this Answer; listen to it while running the example below:
import speech_recognition as sr

recognizer = sr.Recognizer()

speech_file = "/file/h_orig.wav"

with sr.AudioFile(speech_file) as source:
    speech = recognizer.record(source)

try:
    text = recognizer.recognize_google(speech)
    print("Recognized Text from the speech file:", text)
except sr.UnknownValueError:
    print("Could not understand speech file")
except sr.RequestError as e:
    print("Could not request results from Google Web Speech API; {0}".format(e))
Line 1: We import the speech_recognition library.
Line 3: We initialize the recognizer object.
Line 5: We set the path of the speech file, h_orig.wav, which is uploaded with this Answer.
Line 7: We open the speech file as an audio source.
Line 8: We record the speech data from that source.
Line 11: We recognize the speech using the Google Web Speech API.
Line 12: We print the resulting text recognized from the speech file.
Lines 13–16: We handle the two possible errors: the speech could not be understood (UnknownValueError), or the request to the Google Web Speech API failed (RequestError).
The confidence value represents the level of certainty that the transcription (recognized text) is accurate. Here, the result indicates that the system is approximately 84.42% certain that the spoken audio has been accurately transcribed into the given text.
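The example above prints only the recognized text. One way to see a confidence value is to request the full response instead of the transcript alone. The sketch below assumes that response is a dictionary with an "alternative" list whose first entry carries the "transcript" and, usually, a "confidence" score. The helper name best_alternative and the sample data are hypothetical and written by hand for illustration; they are not real API output.

```python
def best_alternative(response):
    """Return (transcript, confidence) from a raw response, or None if empty."""
    # An empty list, or a dict without alternatives, means nothing was recognized.
    if not isinstance(response, dict) or not response.get("alternative"):
        return None
    top = response["alternative"][0]
    # The confidence key may be missing, so fall back to None.
    return top.get("transcript", ""), top.get("confidence")


# Hand-written sample for illustration only (assumed shape, not real output).
sample_response = {
    "alternative": [
        {"transcript": "hello world", "confidence": 0.8442},
        {"transcript": "hello word"},
    ],
    "final": True,
}

result = best_alternative(sample_response)
if result is not None:
    transcript, confidence = result
    if confidence is not None:
        print(f"{transcript} (confidence: {confidence:.2%})")
```

With the library installed, a response of this kind can likely be obtained by calling recognizer.recognize_google(speech, show_all=True) in place of line 11 and passing the returned value to the helper. The score is the service's own estimate, so it is best read as a rough indicator, not a guarantee of accuracy.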