Introduction: BERT Variants

Let's go over what we'll learn in the BERT variants section.

In this section, we will explore several popular variants of BERT, such as ALBERT, RoBERTa, ELECTRA, and SpanBERT. We will also explore BERT variants based on knowledge distillation, such as DistilBERT and TinyBERT.

The following chapters are included in this section:

  • Different BERT Variants

  • BERT Variants—Based on Knowledge Distillation

Different BERT Variants

We will start by understanding how ALBERT works. ALBERT stands for A Lite BERT. The ALBERT model introduces a few architectural changes to BERT that reduce the number of parameters and, consequently, the training time. We will cover how ALBERT works and how it differs from BERT in detail.
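To preview one of ALBERT's parameter-reducing changes, the sketch below computes the effect of factorized embedding parameterization: instead of embedding the vocabulary directly into the hidden size H, ALBERT first embeds into a smaller size E and then projects up to H. The vocabulary and hidden sizes below match BERT-base, and E = 128 follows the ALBERT paper; the numbers are illustrative, not a full parameter count of either model.

```python
# Illustrative parameter counts. V and H match BERT-base; E = 128 is the
# embedding size used in the ALBERT paper.
V, H, E = 30522, 768, 128

# BERT embeds tokens directly into the hidden size: one V x H matrix.
bert_embedding_params = V * H

# ALBERT factorizes the embedding into V x E followed by E x H.
albert_embedding_params = V * E + E * H

print(f"BERT embedding parameters:   {bert_embedding_params:,}")
print(f"ALBERT embedding parameters: {albert_embedding_params:,}")
```

ALBERT additionally shares one set of encoder weights across all layers (cross-layer parameter sharing), which shrinks the encoder's parameter count by roughly the number of layers; both changes are covered in the chapter.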

Moving on, we will learn about the RoBERTa model, which stands for Robustly Optimized BERT pre-training Approach. RoBERTa is one of the most popular variants of BERT, and it is used in many state-of-the-art systems. RoBERTa works similarly to BERT but with a few changes in the pre-training steps. We will explore how RoBERTa works and how it differs from the BERT model in detail.
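One of RoBERTa's pre-training changes is dynamic masking: where BERT fixes each sentence's masked positions once during preprocessing, RoBERTa draws a fresh mask pattern every time a sentence is seen. The sketch below is a simplified illustration of that idea; the real procedure also replaces 10% of the selected tokens with random tokens and leaves 10% unchanged, which is omitted here for brevity.

```python
import random

def dynamic_mask(tokens, mask_prob=0.15, rng=None):
    """Return a copy of tokens with each token independently replaced
    by [MASK] with probability mask_prob (simplified sketch)."""
    rng = rng or random.Random()
    return [tok if rng.random() > mask_prob else "[MASK]" for tok in tokens]

sentence = "the quick brown fox jumps over the lazy dog".split()

# Dynamic masking: each epoch draws a new mask pattern for the same
# sentence, unlike BERT's static masking fixed at preprocessing time.
epoch1 = dynamic_mask(sentence, rng=random.Random(1))
epoch2 = dynamic_mask(sentence, rng=random.Random(2))
print(epoch1)
print(epoch2)
```

RoBERTa's other pre-training changes, such as dropping the next sentence prediction task, training with larger batches, and using more data, are discussed in the chapter.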
