How to get subtitles with time stamps using Whisper

In today’s digital world, audio and video content is everywhere, from educational courses and webinars to YouTube videos and online meetings. However, language barriers or hearing impairments can make this content hard to follow for some audiences. Subtitles bridge this gap, making content more accessible and easier to follow. One tool that simplifies generating subtitles with time stamps is OpenAI’s Whisper.

In this Answer, we’ll explore using Whisper to create accurate subtitles for our audio content.

What is Whisper?

Whisper is a speech recognition system developed by OpenAI that transcribes audio into text. What sets Whisper apart is its ability to handle various accents, background noise, and multiple languages, making it a versatile tool for subtitle generation.

Setting up Whisper

Whisper requires the ffmpeg tool to function correctly. The ffmpeg tool is a command-line utility used for multimedia processing. Depending on the operating system, ffmpeg can be installed with one of the following commands:

# On Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# On Arch Linux
sudo pacman -S ffmpeg
# On macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# On Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# On Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
Commands to install the ffmpeg tool for different operating systems and package managers
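
Before moving on, it can help to confirm that the installation worked. The following is a minimal sketch that checks whether an ffmpeg executable is visible on the PATH. The function name find_ffmpeg is a hypothetical helper introduced here for illustration and is not part of Whisper or ffmpeg.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable, or None if it is not on PATH.

    find_ffmpeg is a hypothetical helper for this Answer, not a Whisper API.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; install it with one of the commands above.")
    else:
        print(f"ffmpeg found at {path}")
```

If the script reports that ffmpeg was not found, opening a new terminal after installation is often enough for the updated PATH to take effect.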

Next, we can install or update Whisper using the following commands:

# Installing Whisper through the GitHub repository
pip install git+https://github.com/openai/whisper.git
# Updating Whisper to the latest version
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
Commands to install or update Whisper

Generating subtitles with time stamps using Whisper

Given an audio file, we can generate subtitles with time stamps through the following terminal command:

whisper /path_to_audio_file --model tiny --language en --word_timestamps True --output_dir /path_for_output --output_format srt
Command to generate time stamps using Whisper

Let's break down the arguments in the above command:

  • model: This is the size of the Whisper model to be used for this task. The available sizes are tiny, base, small, medium, and large. Larger models are more accurate, but they are also slower.

  • language: This is the language spoken in the audio. Whisper supports transcription in roughly 100 languages, including English, Chinese, Somali, German, and Urdu. If this argument is omitted, Whisper detects the language automatically.

  • word_timestamps: When set to True, Whisper extracts word-level time stamps and uses them to refine the results. Segment-level time stamps are produced whether or not this argument is set.

  • output_dir: This argument specifies where Whisper's output should be saved.

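  • output_format: This argument specifies the format for Whisper's output. The supported formats include txt, vtt, srt, tsv, and json. Of these, srt and vtt are subtitle formats that include time stamps, whereas txt contains only the transcribed text.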
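
The same task can also be approached from Python. The following is a minimal sketch, assuming the openai-whisper package is installed. It relies on whisper.load_model and the model's transcribe method, whose result contains a "segments" list where each entry has "start", "end", and "text" keys. The helper names format_srt_time, segments_to_srt, and transcribe_to_srt are hypothetical and are introduced here only for illustration.

```python
from typing import Dict, List


def format_srt_time(seconds: float) -> str:
    """Convert a time in seconds to the SRT format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments (dicts with 'start', 'end', 'text') as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "tiny",
                      language: str = "en") -> str:
    """Transcribe an audio file with Whisper and return SRT text.

    Hypothetical wrapper: whisper is imported lazily so that the
    formatting helpers above can be used without it installed.
    """
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language=language,
                              word_timestamps=True)
    return segments_to_srt(result["segments"])
```

Calling transcribe_to_srt with the path to an audio file and writing the returned string to a .srt file should give a result comparable to running the command with the srt output format, although the exact segment boundaries are not guaranteed to match.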

Demonstration

Run the command from the previous section in a terminal to observe how Whisper transcribes an audio file with time stamps.
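
Word-level time stamps also make it possible to build shorter subtitle cues than the default segments. The following is a minimal sketch, assuming word entries shaped like those Whisper attaches to each segment when word_timestamps is enabled: dicts with "word", "start", and "end" keys, where each word usually carries its own leading space. The function group_words and the sample data are hypothetical and serve only as an illustration; they are not real Whisper output.

```python
from typing import Dict, List


def group_words(words: List[Dict], max_words: int = 5) -> List[Dict]:
    """Group word-level entries into cues of at most max_words words.

    Each cue starts at its first word's start time and ends at its
    last word's end time.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        cues.append({
            "start": chunk[0]["start"],
            "end": chunk[-1]["end"],
            # Words are assumed to include their own leading spaces.
            "text": "".join(w["word"] for w in chunk).strip(),
        })
    return cues


if __name__ == "__main__":
    # Hypothetical sample data, not produced by an actual Whisper run.
    sample = [
        {"word": " Subtitles", "start": 0.0, "end": 0.6},
        {"word": " make", "start": 0.6, "end": 0.9},
        {"word": " content", "start": 0.9, "end": 1.4},
        {"word": " accessible.", "start": 1.4, "end": 2.1},
    ]
    for cue in group_words(sample, max_words=2):
        print(cue["start"], cue["end"], cue["text"])
```

Splitting on a fixed word count is the simplest choice. Splitting on punctuation or on pauses between words may read more naturally, at the cost of extra logic.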

Conclusion

With OpenAI’s Whisper, the process of generating accurate subtitles with time stamps has become simpler and more accessible to content creators. By following the steps outlined in this Answer, we can enhance the accessibility of our multimedia content, ensuring that it can be enjoyed by a wider audience.

Copyright ©2024 Educative, Inc. All rights reserved