Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA) is another interesting variant of BERT. We know that BERT is pre-trained using the MLM and NSP tasks, and that in the MLM task, we randomly mask 15% of the tokens and train BERT to predict the masked tokens. Instead of using the MLM task as a pre-training objective, ELECTRA is pre-trained using a task called replaced token detection.
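To make the MLM setup concrete, here is a minimal, simplified sketch of the masking step. The tokens are made up and the `mask_tokens` helper is hypothetical; real BERT pre-training applies additional rules (such as sometimes keeping or randomly replacing a selected token) that are omitted here:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Randomly mask ~15% of tokens; return the masked input and prediction targets."""
    masked, labels = [], []
    for token in tokens:
        if random.random() < mask_prob:
            masked.append(mask_token)   # hide the token from the model
            labels.append(token)        # the model must predict the original token here
        else:
            masked.append(token)
            labels.append(None)         # no prediction target at unmasked positions
    return masked, labels

tokens = ["the", "chef", "cooked", "the", "meal"]
masked, labels = mask_tokens(tokens)
print(masked)  # e.g. ['the', 'chef', '[MASK]', 'the', 'meal']
```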

The replaced token detection task

The replaced token detection task is very similar to MLM, but instead of masking tokens with the [MASK] token, we replace some tokens in the input with different tokens and train the model to classify each token as either an original token or a replaced one.
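As a rough illustration of the training signal, here is a minimal sketch showing how per-token binary labels could be derived for a sentence in which one token has been replaced; the sentences and labeling scheme are illustrative assumptions, not the exact ELECTRA pre-processing pipeline:

```python
# Original sentence:  "the chef cooked the meal"
# Corrupted sentence: "cooked" has been replaced with "ate"
original = ["the", "chef", "cooked", "the", "meal"]
replaced = ["the", "chef", "ate",    "the", "meal"]

# The model is trained as a binary classifier over every token:
# 0 = original token, 1 = replaced token.
labels = [int(o != r) for o, r in zip(original, replaced)]
print(labels)  # [0, 0, 1, 0, 0]
```

Note that, unlike MLM, this objective gives the model a label at every input position, not only at the masked positions.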

Why ELECTRA does not use the MLM task
