Introduction: BERT

Let's go over what we'll cover in the first section of the course.

In this section, we'll familiarize ourselves with BERT. First, we'll learn how the transformer works, and then we'll explore BERT in detail. We'll also get hands-on with BERT and learn how to use the pre-trained BERT model.

The following chapters are included in this section:

  • A Primer on Transformers

  • Understanding the BERT Model

  • Getting Hands-On with BERT

A primer on transformers

We will begin this chapter by getting a basic idea of the transformer. Then, we will learn how the transformer uses an encoder-decoder architecture for a language translation task. Following this, we will inspect in detail how the encoder of the transformer works by exploring each of its components. After understanding the encoder, we will take a deep dive into the decoder and examine each of its components. At the end of the chapter, we will put the encoder and decoder together and see how the transformer works as a whole.

Understanding the BERT model

We will get started with one of the most popular state-of-the-art text embedding models, BERT. BERT has revolutionized the world of NLP by providing state-of-the-art results on many NLP tasks. We will begin the chapter by understanding what BERT is and how it differs from other embedding models. We will then look at how BERT works and examine its configurations in detail.

Moving on, we will learn in detail how the BERT model is pre-trained using two tasks: masked language modeling and next sentence prediction. We will then look at the pre-training procedure of BERT. At the end of the chapter, we will learn about several interesting subword tokenization algorithms, including byte pair encoding, byte-level byte pair encoding, and WordPiece.

Getting hands-on with BERT

We will learn, in detail, how to use the pre-trained BERT model. First, we will look at the different configurations of the pre-trained BERT model open-sourced by Google. Then, we will learn how to use the pre-trained BERT model as a feature extractor. We will also explore Hugging Face's transformers library and learn how to use it to extract embeddings from the pre-trained BERT model.
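To give a concrete preview, here is a minimal sketch of that feature-extraction workflow with the transformers library; the checkpoint name (bert-base-uncased) and the example sentence are illustrative choices, and the chapter walks through the details step by step:

```python
# Minimal sketch: extracting contextual embeddings from pre-trained BERT
# with Hugging Face's transformers library (checkpoint name is illustrative).
import torch
from transformers import BertModel, BertTokenizer

# Load the pre-trained model and its matching tokenizer.
model = BertModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

sentence = 'I love Paris'
# Tokenize the sentence and return PyTorch tensors.
inputs = tokenizer(sentence, return_tensors='pt')

# Run the sentence through BERT without computing gradients.
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state holds one embedding per token from the final encoder
# layer: shape [batch_size, sequence_length, hidden_size], here [1, 5, 768].
print(outputs.last_hidden_state.shape)

# Passing output_hidden_states=True to the model call also returns the
# embeddings from every encoder layer, which the chapter covers as well.
```

When BERT is used as a feature extractor, the embedding of the [CLS] token, `outputs.last_hidden_state[:, 0]`, is commonly taken as a sentence-level representation.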

Moving on, we will understand how to extract embeddings from all encoder layers of BERT. Next, we will learn how to fine-tune the pre-trained BERT model for downstream tasks:

  • We will learn to fine-tune the pre-trained BERT model for a text classification task.

  • Using the transformers library, we will learn to fine-tune BERT for sentiment analysis tasks (a minimal sketch of this workflow follows the list).

  • We will look into fine-tuning the pre-trained BERT model for natural language inference, question answering, and named entity recognition tasks.
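As a taste of what that fine-tuning looks like in practice, the following is a minimal, illustrative sketch of fine-tuning BERT for sentiment analysis with the transformers Trainer API; the IMDB dataset, the checkpoint name, and the hyperparameters are assumptions made for this example, not the chapter's exact setup:

```python
# Illustrative sketch: fine-tuning pre-trained BERT for sentiment analysis
# as a sequence classification task (dataset and hyperparameters are
# placeholder choices for the example).
from transformers import (BertForSequenceClassification, BertTokenizerFast,
                          Trainer, TrainingArguments)
from datasets import load_dataset

# BertForSequenceClassification adds a classification head on top of BERT.
model = BertForSequenceClassification.from_pretrained('bert-base-uncased',
                                                       num_labels=2)
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

# Load a sentiment dataset (IMDB is used here purely as an example).
dataset = load_dataset('imdb')

def tokenize(batch):
    # Pad/truncate every review to a fixed length so examples batch cleanly.
    return tokenizer(batch['text'], padding='max_length',
                     truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True, batch_size=1000)

# Basic training configuration; values here are illustrative.
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=1,
    per_device_train_batch_size=8,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
)

trainer.train()
```

Because BertForSequenceClassification simply stacks a small task-specific head on top of the pre-trained encoder, the same pattern carries over to the other downstream tasks listed above, using classes such as BertForQuestionAnswering and BertForTokenClassification.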