How Multilingual is Multilingual BERT?

Learn whether the multilingual knowledge transfer of M-BERT depends on the vocabulary overlap.

M-BERT is trained on the Wikipedia text of 104 different languages and is evaluated by fine-tuning it on the XNLI dataset. But how multilingual is M-BERT, really? How is a single model able to transfer knowledge across multiple languages? To understand this, let's investigate the multilingual ability of M-BERT in more detail.

Effect of vocabulary overlap

M-BERT is trained on the Wikipedia text of 104 languages, and it uses a shared WordPiece vocabulary of 110k tokens. In this lesson, let's investigate whether the multilingual knowledge transfer of M-BERT depends on this vocabulary overlap, that is, on how many WordPiece tokens the languages have in common.
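
To make the idea of vocabulary overlap concrete, here is a minimal sketch, assuming the Hugging Face transformers library is installed, of how the WordPiece overlap between text in two languages could be measured with the M-BERT tokenizer. The sample sentences are hypothetical placeholders; in practice, the overlap would be computed over the actual training or evaluation data.

```python
from transformers import BertTokenizer

# Load the tokenizer of the multilingual BERT model
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

# Hypothetical sample sentences standing in for real corpora
english_sentences = [
    "Paris is a beautiful city.",
    "The weather is nice today.",
]
spanish_sentences = [
    "París es una ciudad hermosa.",
    "Hoy hace buen tiempo.",
]

def wordpiece_set(sentences):
    """Collect the set of WordPiece tokens produced for a list of sentences."""
    tokens = set()
    for sentence in sentences:
        tokens.update(tokenizer.tokenize(sentence))
    return tokens

english_tokens = wordpiece_set(english_sentences)
spanish_tokens = wordpiece_set(spanish_sentences)

# Overlap measured as the Jaccard similarity of the two WordPiece sets
overlap = len(english_tokens & spanish_tokens) / len(english_tokens | spanish_tokens)
print(f"WordPiece vocabulary overlap: {overlap:.2%}")
```

A high overlap would mean the two languages share many WordPiece tokens in M-BERT's vocabulary; the question we explore is whether knowledge transfer works only in that case, or also between languages with little or no overlap.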
