Language-Specific BERT

Learn about monolingual BERT models for various languages and gain a detailed understanding of FlauBERT, a pre-trained BERT model for the French language.

The M-BERT model works across many different languages. But instead of relying on a single M-BERT model for all languages, can we train a monolingual BERT for a specific target language? We can, and that is precisely what we will learn in this lesson.

Monolingual BERT for various languages

Below are several interesting and popular monolingual BERT models for various languages; a short loading sketch follows the list:

  • FlauBERT for French

  • BETO for Spanish

  • BERTje for Dutch

  • German BERT

  • Chinese BERT

  • Japanese BERT

  • FinBERT for Finnish

  • UmBERTo for Italian

  • BERTimbau for Portuguese

  • RuBERT for Russian
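
Most of these monolingual models are published on the Hugging Face Hub and can be loaded with the transformers library in the same way as M-BERT. Below is a minimal sketch using BETO; the checkpoint identifier is an assumption based on the name commonly published on the Hub, so substitute the identifier of the model you actually want to use:

```python
from transformers import AutoTokenizer, AutoModel

# Assumed Hub identifier for BETO, the Spanish BERT; replace with the
# checkpoint of whichever monolingual model you need.
checkpoint = "dccuchile/bert-base-spanish-wwm-uncased"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

# Tokenize a Spanish sentence and obtain its contextual representations.
inputs = tokenizer("Me encanta el procesamiento del lenguaje natural",
                   return_tensors="pt")
outputs = model(**inputs)

# One embedding vector per token: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```
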

FlauBERT for French

The French Language Understanding via BERT (FlauBERT) model is a pre-trained BERT model for the French language. The FlauBERT model performs better than the multilingual and cross-lingual models on many downstream French NLP tasks.

FlauBERT is trained on a huge, heterogeneous French corpus consisting of 24 sub-corpora with data from various sources, including Wikipedia, books, internal crawling, WMT19 data, French text from OPUS (an open-source parallel corpus), and Wikimedia.
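
Like the other monolingual models, FlauBERT can be used through the transformers library, which provides dedicated FlauBERT classes. The sketch below assumes the flaubert/flaubert_base_cased checkpoint (FlauBERT is published in several sizes, such as small, base, and large) and shows how to obtain contextual token representations for a French sentence:

```python
import torch
from transformers import FlaubertModel, FlaubertTokenizer

# Assumed checkpoint name for the base cased FlauBERT model on the Hub.
checkpoint = "flaubert/flaubert_base_cased"

tokenizer = FlaubertTokenizer.from_pretrained(checkpoint)
model = FlaubertModel.from_pretrained(checkpoint)

sentence = "Paris est ma ville préférée"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The last hidden state holds one vector per token; the first token's vector
# is often used as a sentence-level representation for downstream tasks.
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)  # (1, sequence_length, hidden_size)
```

These token representations can then be fed to a task-specific classifier, or the whole model can be fine-tuned on a downstream French NLP task.
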
