
The Cross-Lingual Language Model (XLM)

Explore the Cross-Lingual Language Model (XLM) and its pre-training strategies: Causal Language Modeling (CLM), Masked Language Modeling (MLM), and Translation Language Modeling (TLM). Understand how XLM leverages monolingual and parallel datasets to learn multilingual representations that outperform multilingual BERT, and how it can be fine-tuned for cross-lingual NLP tasks.

The M-BERT model is pre-trained just like the regular BERT model, without any explicit cross-lingual objective. In this lesson, let's learn how to pre-train BERT with a cross-lingual objective. We refer to BERT trained with a cross-lingual objective as a cross-lingual language model (XLM). The XLM model learns cross-lingual representations and performs better than M-BERT.

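Before looking at the pre-training objectives in detail, the following minimal sketch shows how a pre-trained XLM checkpoint can be loaded to obtain multilingual representations. It assumes the Hugging Face transformers and PyTorch libraries are installed and uses the xlm-mlm-enfr-1024 checkpoint (an English-French XLM pre-trained with masked language modeling); any other XLM checkpoint would work the same way.

```python
import torch
from transformers import XLMTokenizer, XLMModel

# Load a pre-trained English-French XLM checkpoint (assumed to be available
# on the Hugging Face Hub); the "1024" in the name is its hidden size.
tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-enfr-1024")
model = XLMModel.from_pretrained("xlm-mlm-enfr-1024")

# Encode a sentence and obtain its contextual (cross-lingual) representations.
inputs = tokenizer("Paris is a beautiful city", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 1024-dimensional vector per token: shape (1, sequence_length, 1024).
print(outputs.last_hidden_state.shape)
```
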
Training dataset

The XLM model is pre-trained using both monolingual and parallel datasets. A parallel dataset consists of text in a language pair; that is, it contains the same text in two different languages. For example, for each English sentence, we also have the corresponding sentence in another language, such as French. A parallel dataset of this kind is also called a cross-lingual dataset; the sketch below illustrates the difference between the two types of data.

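To make the distinction concrete, here is a small illustrative sketch of what monolingual and parallel data look like; the sentences and the Python structures are invented purely for illustration.

```python
# Monolingual data: raw sentences in a single language (here, English).
monolingual_en = [
    "I am a student.",
    "Where is the library?",
]

# Parallel (cross-lingual) data: each entry pairs the same sentence in two
# languages, here English ("en") and French ("fr").
parallel_en_fr = [
    {"en": "I am a student.",       "fr": "Je suis étudiant."},
    {"en": "Where is the library?", "fr": "Où est la bibliothèque ?"},
]
```
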
The monolingual dataset is obtained from Wikipedia, and the parallel dataset is obtained from several sources, including MultiUN (a multilingual corpus of United Nations documents) ...