
Understanding BERT

Explore the architecture and functionality of BERT, an encoder-only transformer model used for natural language processing tasks. Understand how BERT uses special tokens and embeddings to handle sequence classification, token classification, question answering, and multiple-choice tasks. Discover BERT's pretraining with masked language modeling and next sentence prediction, and how it enables strong language understanding for downstream applications.

Bidirectional Encoder Representations from Transformers (BERT) is one of the many transformer models that have come to light over the past few years.

BERT was introduced in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin et al. (https://arxiv.org/pdf/1810.04805.pdf). Transformer models are divided into two main factions:

  • Encoder-based models

  • Decoder-based (autoregressive) models

In other words, either the encoder or the decoder part of the transformer provides the foundation for these models, rather than using both the encoder and the decoder together. The main difference between the two is how attention is used: encoder-based models use bidirectional attention, whereas decoder-based models use autoregressive (that is, left-to-right) attention.
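To make this distinction concrete, here is a minimal sketch (in PyTorch, not part of the original text) contrasting the two attention masks. A value of 1 means a position is allowed to attend to another position:

```python
import torch

seq_len = 5

# Bidirectional (encoder-style) attention: every token may attend to every
# other token in the sequence, so no positions are masked out.
bidirectional_mask = torch.ones(seq_len, seq_len)

# Autoregressive (decoder-style) attention: a token may attend only to itself
# and earlier positions, so everything to its right is masked out.
causal_mask = torch.tril(torch.ones(seq_len, seq_len))

print(bidirectional_mask)
print(causal_mask)
```

Because BERT uses the bidirectional mask, each token's representation can draw on context from both its left and its right.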

BERT is an encoder-based transformer model. It takes an input sequence (a collection of tokens) and produces an encoded output sequence. The figure below depicts the high-level architecture of BERT:

The high-level architecture of BERT

It takes a set of input tokens and produces a sequence of hidden representations generated using several hidden layers.
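As an illustration (not part of the original text), the following sketch uses the Hugging Face transformers library to obtain these hidden representations from a pretrained BERT checkpoint; the bert-base-uncased model is assumed here, but any BERT variant behaves the same way:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a pretrained BERT encoder and its matching tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Convert an input sentence into the token IDs BERT expects.
inputs = tokenizer("BERT encodes this sentence.", return_tensors="pt")

# Run the encoder without tracking gradients (inference only).
with torch.no_grad():
    outputs = model(**inputs)

# One hidden representation per input token:
# shape is (batch_size, sequence_length, hidden_size), with hidden_size = 768
# for the base model.
hidden_states = outputs.last_hidden_state
print(hidden_states.shape)
```

Each row of `last_hidden_state` corresponds to one input token, and these per-token vectors are what downstream task heads consume.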

Now, let’s discuss a few details pertinent to BERT, such as inputs consumed by BERT and the tasks it’s designed to solve. ...