What is the Web Speech API?

Web Speech API

The Web Speech API provides two distinct areas of functionality: speech synthesis (text-to-speech), which lets a web app read text aloud, and speech recognition (speech-to-text), which lets it transcribe what the user says into a microphone.
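Browser support for the two halves differs, so it is worth checking for each one separately before using it. The sketch below shows one way to do that. The function name `detectSpeechSupport` is hypothetical. The sketch also assumes the prefixed `webkitSpeechRecognition` name that Chromium-based browsers have historically used for recognition.

```javascript
// Hypothetical helper: report which halves of the Web Speech API exist.
// `scope` defaults to globalThis, so outside a browser (e.g. Node) it
// simply reports that neither half is available.
function detectSpeechSupport(scope = globalThis) {
  // Some browsers expose recognition only under the webkit prefix.
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  return {
    synthesis:
      "speechSynthesis" in scope &&
      typeof scope.SpeechSynthesisUtterance === "function",
    recognition: typeof Recognition === "function",
    Recognition: Recognition || null, // constructor to call with `new`, if any
  };
}

// Usage: log what the current environment supports.
const support = detectSpeechSupport();
console.log("synthesis:", support.synthesis, "recognition:", support.recognition);
```

Checking the two features independently lets a page keep its text-to-speech button working even when recognition is unavailable.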

With this API, we can issue voice commands to our web apps, much as we do on Android with Google's speech services or on Windows with Cortana.
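The API itself only turns speech into text. Deciding what a spoken phrase should do is up to the app. Below is a minimal sketch of a command matcher. The name `runVoiceCommand` and the command phrases are hypothetical, and the matching is a plain substring check, not something the Web Speech API provides.

```javascript
// Hypothetical helper: find the first command phrase contained in a
// recognized transcript and run its action.
function runVoiceCommand(transcript, commands) {
  // Normalize: lower-case, trim, and drop trailing punctuation that a
  // recognizer may append.
  const spoken = transcript.toLowerCase().trim().replace(/[.!?]+$/, "");
  for (const [phrase, action] of Object.entries(commands)) {
    if (spoken.includes(phrase)) {
      action();
      return phrase; // report which command matched
    }
  }
  return null; // nothing matched
}

// Hypothetical command table for a browser page. In a recognition
// result handler you might call: runVoiceCommand(transcript, commands)
const commands = {
  "scroll down": () => window.scrollBy(0, 300),
  "go to top": () => window.scrollTo(0, 0),
};
```

Substring matching is deliberately simple. A real app might need exact phrases or fuzzier matching, depending on how strict its commands should be.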

Example

Let's look at a simple example that implements both text-to-speech and speech-to-text with the Web Speech API:

<body>
    <header>
        <h2>Web APIs</h2>
    </header>
    <div class="web-api-cnt">
        <div id="error" class="close"></div>
        <div class="web-api-card">
            <div class="web-api-card-head">
                Demo - Text to Speech
            </div>
            <div class="web-api-card-body">
                <div>
                    <input placeholder="Enter text here" type="text" id="textToSpeech" />
                </div>
                <div>
                    <button onclick="speak()">Tap to Speak</button>
                </div>
            </div>
        </div>
        <div class="web-api-card">
            <div class="web-api-card-head">
                Demo - Speech to Text
            </div>
            <div class="web-api-card-body">
                <div>
                    <textarea placeholder="Text will appear here when you start speaking." id="speechToText"></textarea>
                </div>
                <div>
                    <button onclick="tapToSpeak()">Tap and Speak into Mic</button>
                </div>
            </div>
        </div>
    </div>
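    <script>
        try {
            var speech = new SpeechSynthesisUtterance()
            // Chromium-based browsers and Safari expose recognition under a webkit prefix
            var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
            var recognition = new SpeechRecognition()
        } catch(e) {
            // One of the constructors is missing, so reveal the hidden error box
            var errorBox = document.getElementById("error")
            errorBox.innerHTML = "Web Speech API not supported in this browser."
            errorBox.classList.remove("close")
        }
        function speak() {
            if (!speech) return
            speech.text = document.getElementById("textToSpeech").value
            speech.volume = 1 // 0 (silent) to 1 (loudest)
            speech.rate = 1   // 1 is normal speed
            speech.pitch = 1  // 1 is the default pitch
            window.speechSynthesis.speak(speech)
        }
        function tapToSpeak() {
            if (!recognition) return
            recognition.onstart = function() { } // fires when the mic starts listening
            recognition.onresult = function(event) {
                // resultIndex is the newest result; [0] is its most likely alternative
                const curr = event.resultIndex
                const transcript = event.results[curr][0].transcript
                document.getElementById("speechToText").value = transcript
            }
            recognition.onerror = function(ev) {
                console.error(ev.error) // e.g. "not-allowed" if mic access is denied
            }
            recognition.start()
        }
    </script>
</body>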

In the first demo, clicking the Tap to Speak button calls the speak function. It sets the text of the SpeechSynthesisUtterance object (created once when the page loads) to whatever we typed in the input box. It then passes that utterance to speechSynthesis.speak, which reads the text aloud through our speakers.
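The demo sets `volume`, `rate`, and `pitch` to 1, which is the default for each. The Web Speech API specification gives each of these properties a range: volume from 0 to 1, rate from 0.1 to 10, and pitch from 0 to 2. A small helper that clamps user-supplied values to those ranges can therefore be handy. This is a sketch only; `configureUtterance` and `clamp` are hypothetical names. The helper works on any object, so it can be tried outside a browser.

```javascript
// Keep a number within [min, max].
const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Hypothetical helper: copy text and voice settings onto an utterance,
// keeping each setting inside the range the specification describes.
function configureUtterance(utterance, text, { volume = 1, rate = 1, pitch = 1 } = {}) {
  utterance.text = text;
  utterance.volume = clamp(volume, 0, 1); // 0 (silent) to 1 (loudest)
  utterance.rate = clamp(rate, 0.1, 10);  // 1 is normal speed
  utterance.pitch = clamp(pitch, 0, 2);   // 1 is the default pitch
  return utterance;
}

// In a browser, pass a real SpeechSynthesisUtterance:
// const u = configureUtterance(new SpeechSynthesisUtterance(), "Hello", { rate: 1.2 });
// window.speechSynthesis.cancel(); // drop anything still queued
// window.speechSynthesis.speak(u);
```

Calling `speechSynthesis.cancel()` first is optional. Because `speak` queues utterances, cancelling stops repeated clicks from piling up speech.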

The second demo, speech to text, is a voice-recognition demo. We click the Tap and Speak into Mic button and speak into the microphone, and the words we say are transcribed into the text area.
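The demo's onresult handler reads only `event.results[event.resultIndex][0].transcript`, which is the most likely alternative of the newest result. `event.results` actually holds every result of the current session, and each result carries an `isFinal` flag. With `continuous` and `interimResults` turned on, the full text can therefore be rebuilt on every event. A sketch of that approach follows, with `collectTranscript` as a hypothetical helper name.

```javascript
// Hypothetical helper: split a SpeechRecognitionEvent's results into
// finalized text and still-changing (interim) text.
// Each result is a list of alternatives; index 0 is the most likely one.
function collectTranscript(event) {
  let finalText = "";
  let interimText = "";
  for (let i = 0; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript;
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}

// Browser wiring, assuming `recognition` was created as in the demo:
// recognition.continuous = true;      // keep listening after a pause
// recognition.interimResults = true;  // deliver partial guesses too
// recognition.onresult = (event) => {
//   const { finalText, interimText } = collectTranscript(event);
//   document.getElementById("speechToText").value = finalText + interimText;
// };
```

Keeping final and interim text separate lets a page style the unconfirmed part differently, for example in a lighter color, while the recognizer is still deciding.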

The Tap and Speak into Mic button, when clicked, calls the tapToSpeak function: