Sometimes we need the transcript or subtitles of a YouTube video, but getting them normally means opening the video on YouTube and pulling up the transcript by hand. In Python, the youtube_transcript_api package can fetch a video's transcript automatically so that you can work with it as plain text.
First, let us install this package by running:
pip install youtube_transcript_api
Next, we need the ID of the YouTube video whose transcript we want. In a standard watch URL such as https://www.youtube.com/watch?v=Y8Tko2YC5hA, the video ID is the value of the v parameter, here Y8Tko2YC5hA.
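If you start from a full link rather than a bare ID, the ID can also be pulled out with the standard library. The following is a minimal sketch, assuming only the two common link shapes (youtube.com/watch?v=ID and youtu.be/ID); the helper name video_id_from_url() is hypothetical, and other formats such as embed or shorts links are not handled.

```python
from urllib.parse import urlparse, parse_qs


def video_id_from_url(url):
    """Return the video ID from a YouTube URL, or None if it cannot be found.

    Handles only youtube.com/watch?v=<id> and youtu.be/<id> links (an assumption).
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is lowercased and has any port stripped
    if host == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch links carry the ID in the "v" query parameter
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


print(video_id_from_url("https://www.youtube.com/watch?v=Y8Tko2YC5hA&t=5s"))
print(video_id_from_url("https://youtu.be/Y8Tko2YC5hA"))
```

Both calls print Y8Tko2YC5hA, the ID used in the code that follows. Returning None instead of raising lets the caller decide what to do with an unrecognized link.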
Now, let’s see the code:
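from youtube_transcript_api import YouTubeTranscriptApi

def generate_transcript(video_id):
    # get_transcript() returns a list of dicts, one per caption snippet,
    # each with "text", "start" and "duration" keys.
    # (Newer 1.x releases of the package deprecate this static call in favor of
    # YouTubeTranscriptApi().fetch(video_id); check the version you installed.)
    transcript = YouTubeTranscriptApi.get_transcript(video_id)
    script = ""
    for snippet in transcript:
        t = snippet["text"]
        # Skip "[Music]" cues so they don't appear in the final text
        if t != "[Music]":
            script += t + " "
    # Return the text together with its word count
    return script, len(script.split())

video_id = "Y8Tko2YC5hA"
transcript, no_of_words = generate_transcript(video_id)
print(transcript)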
The generate_transcript() function accepts the video id as a parameter and returns the transcript along with the number of words in it.
Inside it, we call the package's get_transcript() method, which fetches the transcript for the video id passed in. It returns a list of dictionaries, one per caption snippet, so we loop over them and concatenate their text into a single string.
We skip any snippet whose text is [Music] so that music cues in the video do not end up in the final transcript string.
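To see what the loop works with, here is a minimal sketch that runs without network access. It uses a hand-written, hypothetical sample shaped like get_transcript()'s output (dicts with text, start and duration keys) and, as a variation on the exact [Music] check, drops any bracketed cue such as [Applause] with a regular expression. The sample data, the CUE pattern and the to_plain_text() helper are illustrative assumptions, not part of the package.

```python
import re

# Hypothetical sample shaped like get_transcript() output: one dict per caption
# snippet, with "start" and "duration" given in seconds.
sample = [
    {"text": "[Music]", "start": 0.0, "duration": 3.2},
    {"text": "hello everyone", "start": 3.2, "duration": 1.8},
    {"text": "[Applause] welcome\nback", "start": 5.0, "duration": 2.1},
]

# Matches any bracketed cue such as [Music] or [Applause]. This is broader than
# the original code, which filters only snippets whose text is exactly "[Music]".
CUE = re.compile(r"\[[^\]]*\]")


def to_plain_text(snippets):
    """Join snippet texts into one string, dropping bracketed cues."""
    parts = []
    for snippet in snippets:
        # Remove cues, then collapse newlines and repeated spaces into single spaces
        text = " ".join(CUE.sub(" ", snippet["text"]).split())
        if text:  # skip snippets that contained nothing but a cue
            parts.append(text)
    script = " ".join(parts)
    return script, len(script.split())


script, word_count = to_plain_text(sample)
print(script)      # hello everyone welcome back
print(word_count)  # 4
```

Joining a list with " ".join also avoids the trailing space that script += t + " " leaves at the end of the string; the word count is the same either way because split() ignores surrounding whitespace.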
The package raises an error if there are no subtitles for the video whose id you passed.
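One way to handle the missing-subtitle case is to catch the error and return None instead of crashing. The sketch below keeps the network call out of the function by taking the fetch function and the exception types as parameters, so it can be tried with a stand-in; transcript_or_none(), NoSubtitles and fake_fetch() are hypothetical names used only for illustration.

```python
from typing import Callable, Optional, Tuple, Type


def transcript_or_none(
    video_id: str,
    fetch: Callable[[str], list],
    errors: Tuple[Type[BaseException], ...],
) -> Optional[list]:
    """Return fetch(video_id), or None if it raises one of the given error types."""
    try:
        return fetch(video_id)
    except errors as exc:
        print(f"No transcript for {video_id}: {type(exc).__name__}")
        return None


# Hypothetical stand-ins so the example runs without the package or a network.
class NoSubtitles(Exception):
    pass


def fake_fetch(video_id: str) -> list:
    if video_id == "no-subs":
        raise NoSubtitles(video_id)
    return [{"text": "hello", "start": 0.0, "duration": 1.0}]


print(transcript_or_none("Y8Tko2YC5hA", fake_fetch, (NoSubtitles,)))
print(transcript_or_none("no-subs", fake_fetch, (NoSubtitles,)))
```

With the real package you would pass YouTubeTranscriptApi.get_transcript as fetch and the package's own exception class as errors; assuming your installed version exports it from the top level, CouldNotRetrieveTranscript is the base class that covers cases such as disabled or missing transcripts. Passing the error types explicitly, rather than catching every Exception, keeps unrelated bugs from being silently swallowed.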