Pre-training Dataset and Applications of VideoBERT

Learn about the dataset used to pre-train VideoBERT, along with some interesting VideoBERT applications.

Data source and preprocessing

For VideoBERT to learn good language and video representations, we need a large number of videos. We don't pre-train on random videos; instead, we use instructional videos. How do we obtain them? The researchers built their dataset from instructional videos on YouTube. Using the YouTube video annotation system, they selected videos related to cooking, and from those they kept only the videos shorter than 15 minutes. The resulting dataset contains 312,000 videos, which amounts to about 23,186 hours, or roughly 966 days, of footage.
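The selection rule and the size figures above can be illustrated with a small sketch. This is not the researchers' actual pipeline: the Video record, its topic field, and the select_cooking_videos helper are hypothetical names made up for illustration, and the topic label only stands in for whatever the YouTube video annotation system returns. Only the cooking criterion, the 15-minute cutoff, and the dataset totals come from the text.

```python
from dataclasses import dataclass

# The 15-minute cutoff described in the text, expressed in seconds.
MAX_DURATION_SECONDS = 15 * 60


@dataclass
class Video:
    # Hypothetical record; the field names are assumptions for illustration.
    video_id: str
    topic: str              # stand-in for the annotation system's label
    duration_seconds: int


def select_cooking_videos(videos):
    """Keep cooking videos that are strictly shorter than 15 minutes."""
    return [
        v for v in videos
        if v.topic == "cooking" and v.duration_seconds < MAX_DURATION_SECONDS
    ]


# Toy data, not taken from the real dataset.
sample = [
    Video("a", "cooking", 600),    # 10 minutes, kept
    Video("b", "cooking", 1200),   # 20 minutes, too long
    Video("c", "travel", 300),     # not about cooking
]
selected = select_cooking_videos(sample)
print([v.video_id for v in selected])  # ['a']

# Sanity-check the dataset totals quoted in the text.
total_videos = 312_000
total_hours = 23_186
print(round(total_hours / 24))                     # 966 days
print(round(total_hours * 60 / total_videos, 1))   # 4.5 minutes per video on average
```

The last line shows that the average video in the dataset is only about four and a half minutes long, well under the 15-minute cutoff.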
