
Pre-training Dataset and Applications of VideoBERT

Explore how VideoBERT is pre-trained on a large dataset of instructional YouTube videos and understand its key applications. Learn how VideoBERT predicts future visual tokens, generates videos from text inputs, and captions videos effectively, enhancing video and language representation learning.

Data source and preprocessing

For VideoBERT to learn rich language and video representations, we need a large number of videos. We don't use random videos for pre-training; instead, we use instructional videos. How do we obtain instructional videos? The researchers built their dataset from instructional videos on YouTube. Using the YouTube video annotation system, they filtered for videos related to cooking, and from these they kept only videos shorter than 15 minutes. In ...
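The filtering step above can be sketched in code. This is a minimal illustration only: the metadata fields (`video_id`, `duration_seconds`, `topic_labels`) and the sample records are hypothetical, since the actual YouTube annotation system and its label format are not public.

```python
from dataclasses import dataclass, field

# Hypothetical video metadata record; the real annotation
# system's schema is an assumption here.
@dataclass
class VideoMeta:
    video_id: str
    duration_seconds: int
    topic_labels: list = field(default_factory=list)

MAX_DURATION = 15 * 60  # keep only videos under 15 minutes

def keep_for_pretraining(video: VideoMeta) -> bool:
    """Keep cooking-related videos under the duration cap."""
    return ("cooking" in video.topic_labels
            and video.duration_seconds < MAX_DURATION)

# Illustrative examples, not real dataset entries.
videos = [
    VideoMeta("a1", 10 * 60, ["cooking"]),   # kept
    VideoMeta("b2", 20 * 60, ["cooking"]),   # too long, filtered out
    VideoMeta("c3", 8 * 60, ["travel"]),     # not cooking, filtered out
]

dataset = [v for v in videos if keep_for_pretraining(v)]
```

The two conditions (topic label and duration cut-off) mirror the two filtering stages the researchers describe: topic filtering first, then a length threshold.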