The XLM-R Model

Explore the XLM-R model, an advanced multilingual transformer trained on extensive monolingual datasets with masked language modeling. Understand its architecture, its training method, and how it outperforms other multilingual models such as multilingual BERT (M-BERT). Learn how it can be fine-tuned for cross-lingual NLP tasks to achieve better accuracy across many languages.

The XLM-RoBERTa (XLM-R) model is an extension of XLM with a few modifications to improve performance. It is a state-of-the-art model for learning cross-lingual representations.

Pre-training the XLM-R model

XLM is trained with both the MLM and TLM tasks. The MLM task uses a monolingual dataset, while the TLM task requires a parallel dataset, which is difficult to obtain for low-resource languages. So, in the XLM-R model, we train only with the MLM objective and drop the TLM objective altogether. Thus, the XLM-R model requires only a monolingual dataset.
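
Because XLM-R relies only on the MLM objective, all it needs during pre-training is raw monolingual text: tokens are masked at random and the model learns to predict them. The snippet below is a minimal sketch of that objective using the Hugging Face transformers library; the checkpoint name, the example sentence, and the choice of masked position are assumptions for illustration, not details from the original text.

```python
from transformers import XLMRobertaTokenizer, XLMRobertaForMaskedLM

# Load a pre-trained XLM-R checkpoint (checkpoint name assumed for illustration)
tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
model = XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-base")

# Raw monolingual text -- no parallel (translated) sentence pairs are needed
text = "Paris is the capital of France."
inputs = tokenizer(text, return_tensors="pt")

# Mask one token, the way the MLM objective hides a fraction of tokens at random
labels = inputs["input_ids"].clone()
masked_index = 5  # position of an arbitrary token to hide
inputs["input_ids"][0, masked_index] = tokenizer.mask_token_id
labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100  # ignore unmasked positions in the loss

# The model is trained to recover the masked token; the loss is cross-entropy
outputs = model(**inputs, labels=labels)
print("MLM loss:", outputs.loss.item())

predicted_id = int(outputs.logits[0, masked_index].argmax(-1))
print("Predicted token:", tokenizer.convert_ids_to_tokens(predicted_id))
```

During actual pre-training, this same loss is computed over large batches of text drawn from all 100 languages, so a single model learns shared representations across languages without any translated data.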

XLM-R is trained on a huge 2.5 TB dataset, obtained by filtering the unlabeled text of 100 languages from the CommonCrawl corpus. We also increase the proportion of low-resource languages in the dataset through sampling (a rough sketch of this follows). The following diagram ...
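
One common way to up-sample low-resource languages, used in XLM-style models, is to exponentially smooth each language's share of the corpus before drawing sentences, so high-resource languages are down-weighted relative to their raw proportions. Below is a minimal sketch of that idea; the smoothing exponent alpha and the example sentence counts are assumed values for illustration, not numbers from the text.

```python
import numpy as np

def smoothed_sampling_probs(sentence_counts, alpha=0.3):
    """Exponentially smooth per-language corpus shares.

    Raising each language's raw share to the power alpha (0 < alpha < 1)
    and renormalizing boosts low-resource languages relative to their
    raw proportions. alpha=0.3 is an assumed value for illustration.
    """
    counts = np.array(list(sentence_counts.values()), dtype=np.float64)
    raw_probs = counts / counts.sum()      # raw share of each language
    smoothed = raw_probs ** alpha          # exponential smoothing
    return dict(zip(sentence_counts, smoothed / smoothed.sum()))

# Hypothetical sentence counts for three languages (not real corpus sizes)
counts = {"en": 300_000_000, "sw": 1_000_000, "ur": 500_000}

total = sum(counts.values())
print("raw   :", {k: round(v / total, 4) for k, v in counts.items()})
print("smooth:", {k: round(v, 4) for k, v in smoothed_sampling_probs(counts).items()})
```

With these smoothed probabilities, training batches are drawn per language, so rare languages appear more often during pre-training than their raw share of the 2.5 TB corpus would suggest.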