In today’s digital world, audio and video content is everywhere, from educational courses and webinars to YouTube videos and online meetings. However, understanding and accessibility can be challenging for some audiences due to language barriers or hearing impairments. Subtitles can bridge this gap, making content more accessible and easier to follow. One innovative tool that simplifies the process of generating subtitles with timestamps is OpenAI’s Whisper.
In this Answer, we’ll explore using Whisper to create accurate subtitles for our audio content.
Whisper is an advanced speech recognition system developed by OpenAI, designed to transcribe audio into text efficiently. What sets Whisper apart is its ability to handle various accents, background noises, and even different languages, making it a versatile tool for subtitle generation.
Whisper requires the `ffmpeg` tool to function correctly. `ffmpeg` is a command-line utility used for multimedia processing. Depending on the operating system, `ffmpeg` can be installed with the following commands:
```bash
# On Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# On Arch Linux
sudo pacman -S ffmpeg

# On MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# On Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# On Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
```
Next, we can install or update Whisper using the following commands:
```bash
# Installing Whisper through the GitHub repository
pip install git+https://github.com/openai/whisper.git

# Updating Whisper to the latest version
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
```
Given an audio file, we can generate subtitles with timestamps using the following terminal command:
```bash
whisper /path_to_audio_file --model tiny --language en --word_timestamps True --output_dir /path_for_output --output_format txt
```
Let's break down the arguments in the above command:

- `--model`: The size of the Whisper model to use for the task. The available sizes are `tiny`, `base`, `small`, `medium`, and `large`. Larger models are more accurate but produce output more slowly.
- `--language`: The language of the audio. Whisper supports transcription in more than 99 languages, including English, Chinese, Somali, German, Urdu, and more.
- `--word_timestamps`: Whether Whisper should generate word-level timestamps for the output subtitles.
- `--output_dir`: Where Whisper's output should be saved.
- `--output_format`: The format of Whisper's output. Commonly used formats include `srt`, `json`, and `txt`.
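The `srt` format pairs each numbered subtitle with a start and end timestamp. As a minimal sketch of that structure, the following Python snippet converts a list of Whisper-style segments into SRT text. The `segments` data here is a hypothetical example for illustration; in practice, the segments come from Whisper's transcription output.

```python
def to_srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render a list of {start, end, text} segments as an SRT string."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{to_srt_timestamp(seg['start'])} --> {to_srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Hypothetical segments, mimicking the start/end/text fields Whisper produces
segments = [
    {"start": 0.0, "end": 2.5, "text": "Hello and welcome."},
    {"start": 2.5, "end": 5.0, "text": "Today we look at Whisper."},
]
print(segments_to_srt(segments))
```

Each block in the printed output (index, timestamp range, text) is exactly what subtitle players expect when loading an `.srt` file.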
Run the above command in a terminal to observe how Whisper transcribes audio files with timestamps.
With OpenAI’s Whisper, generating accurate subtitles with timestamps has become simpler and more accessible to content creators. By following the steps outlined in this Answer, we can enhance the accessibility of our multimedia content, ensuring that it can be enjoyed by a wider audience.