Introduction: BERT

Let's go over what we'll cover in the first section of the course.

In this section, we'll familiarize ourselves with BERT. First, we'll learn how the transformer works, and then we'll explore BERT in detail. We'll also get hands-on with BERT and learn how to use the pre-trained BERT model.

The following chapters are included in this section:

  • A Primer on Transformers

  • Understanding the BERT Model

  • Getting Hands-On with BERT

A primer on transformers

We will begin this chapter by getting a basic idea of the transformer. Then, we will learn how the transformer uses an encoder-decoder architecture for a language translation task. Following this, we will inspect in detail how the encoder of the transformer works by exploring each of its components. After understanding the encoder, we will take a deep dive into the decoder and examine each of its components. At the end of the chapter, we will put the encoder and decoder together and see how the transformer works as a whole.

Understanding the BERT model

We will get started with one of the most popular state-of-the-art text embedding models, BERT. BERT has revolutionized the world of NLP by providing state-of-the-art results on many NLP tasks. We will begin the chapter by understanding what BERT is and how it differs from other embedding models. We will then look at how BERT works and examine its configurations in detail.

Moving on, we will learn in detail how the BERT model is pre-trained using two tasks: masked language modeling and next sentence prediction. We will then look at the pre-training procedure of BERT. At the end of the chapter, we will learn about several interesting subword tokenization algorithms, including byte pair encoding, byte-level byte pair encoding, and WordPiece.

Getting hands-on with BERT

We will learn, in detail, how to use the pre-trained BERT model. First, we will look at the different configurations of the pre-trained BERT model open-sourced by Google. Then, we will learn how to use the pre-trained BERT model as a feature extractor. We will also explore Hugging Face's transformers library and learn how to use it to extract embeddings from the pre-trained BERT model.
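To give a concrete preview, here is a minimal sketch of that feature-extraction workflow with the transformers library; the checkpoint name (bert-base-uncased) and the example sentence are illustrative choices, and the chapter walks through the details step by step:

```python
# Minimal sketch: extracting contextual embeddings from pre-trained BERT
# with Hugging Face's transformers library (checkpoint name is illustrative).
import torch
from transformers import BertModel, BertTokenizer

# Load the pre-trained model and its matching tokenizer.
model = BertModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

sentence = 'I love Paris'
# Tokenize the sentence and return PyTorch tensors.
inputs = tokenizer(sentence, return_tensors='pt')

# Run the sentence through BERT without computing gradients.
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state holds one embedding per token from the final encoder
# layer: shape [batch_size, sequence_length, hidden_size], here [1, 5, 768].
print(outputs.last_hidden_state.shape)

# Passing output_hidden_states=True to the model call also returns the
# embeddings from every encoder layer, which the chapter covers as well.
```

When BERT is used as a feature extractor, the embedding of the [CLS] token, `outputs.last_hidden_state[:, 0]`, is commonly taken as a sentence-level representation.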

Moving on, we will understand how to extract embeddings from all encoder layers of BERT. Next, we will learn how to fine-tune the pre-trained BERT model for downstream tasks:

  • We will learn to fine-tune the pre-trained BERT model for a text classification task.

  • Using the transformers library, we will learn to fine-tune BERT for sentiment analysis tasks (a minimal sketch of this workflow follows the list).

  • We will look into fine-tuning the pre-trained BERT model for natural language inference, question answering, and named entity recognition tasks.
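As a taste of what that fine-tuning looks like in practice, the following is a minimal, illustrative sketch of fine-tuning BERT for sentiment analysis with the transformers Trainer API; the IMDB dataset, the checkpoint name, and the hyperparameters are assumptions made for this example, not the chapter's exact setup:

```python
# Illustrative sketch: fine-tuning pre-trained BERT for sentiment analysis
# as a sequence classification task (dataset and hyperparameters are
# placeholder choices for the example).
from transformers import (BertForSequenceClassification, BertTokenizerFast,
                          Trainer, TrainingArguments)
from datasets import load_dataset

# BertForSequenceClassification adds a classification head on top of BERT.
model = BertForSequenceClassification.from_pretrained('bert-base-uncased',
                                                       num_labels=2)
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

# Load a sentiment dataset (IMDB is used here purely as an example).
dataset = load_dataset('imdb')

def tokenize(batch):
    # Pad/truncate every review to a fixed length so examples batch cleanly.
    return tokenizer(batch['text'], padding='max_length',
                     truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True, batch_size=1000)

# Basic training configuration; values here are illustrative.
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=1,
    per_device_train_batch_size=8,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
)

trainer.train()
```

Because BertForSequenceClassification simply stacks a small task-specific head on top of the pre-trained encoder, the same pattern carries over to the other downstream tasks listed above, using classes such as BertForQuestionAnswering and BertForTokenClassification.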