ALBERT: Training the Model

Learn about the ALBERT model and how to pre-train it using masked language modeling and sentence order prediction tasks.

Like BERT, the ALBERT model is pre-trained on the English Wikipedia and Toronto BookCorpus datasets. BERT is pre-trained with the masked language modeling (MLM) and next sentence prediction (NSP) tasks. ALBERT is also pre-trained with the MLM task, but instead of the NSP task, it uses a new task called sentence order prediction (SOP). But why not use the NSP task?
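Before turning to that question, here is a minimal sketch of what the MLM objective looks like at the data level, assuming the BERT-style masking rule (select about 15% of tokens as prediction targets; of those, replace 80% with [MASK], 10% with a random token, and leave 10% unchanged). The toy vocabulary and function name are illustrative, not part of any library, and the exact ALBERT preprocessing differs in details.

```python
import random

MASK = "[MASK]"
TOY_VOCAB = ["the", "a", "cat", "dog", "sat", "ran"]  # toy vocabulary for illustration

def mask_tokens(tokens, mask_prob=0.15):
    """Return (masked_tokens, labels) for the MLM objective.

    labels[i] holds the original token at positions the model must
    predict, and None elsewhere.
    """
    masked, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok                           # the model must recover this token
            r = random.random()
            if r < 0.8:
                masked[i] = MASK                      # 80%: replace with [MASK]
            elif r < 0.9:
                masked[i] = random.choice(TOY_VOCAB)  # 10%: replace with a random token
            # remaining 10%: leave the token unchanged
    return masked, labels

print(mask_tokens("the cat sat on the mat".split()))
```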

Sentence order prediction (SOP)

The ALBERT researchers pointed out that pre-training with the NSP task is not very useful because, compared to the MLM task, it is not a difficult task to learn: NSP conflates topic prediction and coherence prediction into a single objective, and topic prediction is much easier than coherence prediction. To address this, the researchers introduced the SOP task, which is based on inter-sentence coherence rather than topic prediction. Let's look at how the SOP task works in detail.
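As a rough sketch of how SOP training pairs can be built, the idea is that a positive example consists of two consecutive segments from the same document in their original order, and a negative example uses the same two segments with their order swapped. The helper function below is illustrative, not part of any library.

```python
import random

def create_sop_example(segment_a, segment_b):
    """Build one SOP pair from two consecutive segments of the same
    document, where segment_a originally appears before segment_b.

    Returns (first, second, label): label 1 = correct order, 0 = swapped.
    """
    if random.random() < 0.5:
        return segment_a, segment_b, 1   # positive example: keep the original order
    return segment_b, segment_a, 0       # negative example: swap the two segments

# Illustrative usage with two consecutive sentences from one document.
seg_a = "She opened the window."
seg_b = "A cool breeze filled the room."
first, second, label = create_sop_example(seg_a, seg_b)
print(f"[CLS] {first} [SEP] {second} [SEP] -> label {label}")
```

Because both segments come from the same document, the model cannot rely on topic cues to tell the classes apart; it has to learn whether the sentences are coherent in the given order.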
