The Cross-Lingual Language Model (XLM)

Learn about the XLM model, including its training dataset, different pre-training strategies, and the process for both pre-training and evaluation.

The M-BERT model is pre-trained just like the regular BERT model, without any explicit cross-lingual objective. In this lesson, we'll learn how to pre-train BERT with a cross-lingual objective. We refer to BERT trained with a cross-lingual objective as the cross-lingual language model (XLM). The XLM model learns cross-lingual representations and performs better than M-BERT.

Training dataset

The XLM model is pre-trained using both monolingual and parallel datasets. The parallel dataset consists of text in a language pair; that is, it contains the same text in two different languages. For example, for each English sentence, we have the corresponding sentence in another language, say French. This parallel dataset is also known as a cross-lingual dataset.
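To make the two data formats concrete, here is a minimal sketch in Python of what monolingual and parallel (cross-lingual) samples might look like. The sentences and the dictionary/list structure are illustrative assumptions for this lesson, not an excerpt from the actual XLM training corpora.

```python
# A minimal, illustrative sketch of the two data formats used to pre-train XLM.
# The sentences below are made-up examples, not taken from the real training corpora.

# Monolingual data: independent sentences, each in a single language.
monolingual_data = {
    "en": ["I love Paris", "The weather is nice today"],
    "fr": ["Le ciel est bleu", "J'aime lire des livres"],
}

# Parallel (cross-lingual) data: each sample pairs a sentence with its
# translation, here an English-French language pair.
parallel_data = [
    {"en": "I love Paris", "fr": "J'aime Paris"},
    {"en": "How old are you?", "fr": "Quel âge as-tu ?"},
    {"en": "The book is on the table", "fr": "Le livre est sur la table"},
]

# Monolingual sentences can be used for objectives that need only one language,
# while parallel pairs let the model align representations across two languages.
for pair in parallel_data:
    print(f"EN: {pair['en']}  |  FR: {pair['fr']}")
```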
