ALBERT: Training the Model

Learn about the ALBERT model and how to pre-train it using masked language modeling and sentence order prediction tasks.

Like BERT, the ALBERT model is pre-trained on the English Wikipedia and Toronto BookCorpus datasets. BERT is pre-trained with the masked language modeling (MLM) and next sentence prediction (NSP) tasks. ALBERT is also pre-trained with the MLM task, but instead of the NSP task, it uses a new task called sentence order prediction (SOP). But why not use the NSP task?
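Before turning to that question, here is a minimal sketch of what the MLM objective looks like at the data level, assuming the BERT-style masking rule (select about 15% of tokens as prediction targets; of those, replace 80% with [MASK], 10% with a random token, and leave 10% unchanged). The toy vocabulary and function name are illustrative, not part of any library, and the exact ALBERT preprocessing differs in details.

```python
import random

MASK = "[MASK]"
TOY_VOCAB = ["the", "a", "cat", "dog", "sat", "ran"]  # toy vocabulary for illustration

def mask_tokens(tokens, mask_prob=0.15):
    """Return (masked_tokens, labels) for the MLM objective.

    labels[i] holds the original token at positions the model must
    predict, and None elsewhere.
    """
    masked, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok                           # the model must recover this token
            r = random.random()
            if r < 0.8:
                masked[i] = MASK                      # 80%: replace with [MASK]
            elif r < 0.9:
                masked[i] = random.choice(TOY_VOCAB)  # 10%: replace with a random token
            # remaining 10%: leave the token unchanged
    return masked, labels

print(mask_tokens("the cat sat on the mat".split()))
```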

Sentence order prediction (SOP)

The ALBERT researchers pointed out that pre-training with the NSP task is not very useful because, compared to the MLM task, it is not a difficult task to learn: NSP conflates topic prediction and coherence prediction into a single objective, and topic prediction is much easier than coherence prediction. To address this, the researchers introduced the SOP task, which is based on inter-sentence coherence rather than topic prediction. Let's look at how the SOP task works in detail.
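As a rough sketch of how SOP training pairs can be built, the idea is that a positive example consists of two consecutive segments from the same document in their original order, and a negative example uses the same two segments with their order swapped. The helper function below is illustrative, not part of any library.

```python
import random

def create_sop_example(segment_a, segment_b):
    """Build one SOP pair from two consecutive segments of the same
    document, where segment_a originally appears before segment_b.

    Returns (first, second, label): label 1 = correct order, 0 = swapped.
    """
    if random.random() < 0.5:
        return segment_a, segment_b, 1   # positive example: keep the original order
    return segment_b, segment_a, 0       # negative example: swap the two segments

# Illustrative usage with two consecutive sentences from one document.
seg_a = "She opened the window."
seg_b = "A cool breeze filled the room."
first, second, label = create_sop_example(seg_a, seg_b)
print(f"[CLS] {first} [SEP] {second} [SEP] -> label {label}")
```

Because both segments come from the same document, the model cannot rely on topic cues to tell the classes apart; it has to learn whether the sentences are coherent in the given order.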
