Multilingual Sentence-BERT Model

Learn about the different pre-trained multilingual models that are available and how to use them to compute the similarity between two sentences in different languages.

We learned how to make the monolingual model multilingual through knowledge distillation. Now, let's learn how to use the pre-trained multilingual models. The researchers have made their pre-trained models publicly available through the sentence-transformers library, so we can directly download a pre-trained model and use it for our task.
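As a quick sketch, assuming the sentence-transformers library is installed (for example, via pip install sentence-transformers), loading any of the models listed below is a single line; the model is downloaded and cached locally on first use:

    from sentence_transformers import SentenceTransformer

    # Downloads the pre-trained model on first use and caches it locally
    model = SentenceTransformer('distiluse-base-multilingual-cased')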

Pre-trained multilingual models

The available pre-trained multilingual models are as follows:

  • distiluse-base-multilingual-cased: This supports Arabic, Chinese, Dutch, English, French, German, Italian, Korean, Polish, Portuguese, Russian, Spanish, and Turkish.

  • xlm-r-base-en-ko-nli-ststb: This supports Korean and English.

  • xlm-r-large-en-ko-nli-ststb: This supports Korean and English.

Now, let's learn how to use these pre-trained models.

Using the multilingual model

Let's see how to compute the similarity between two sentences written in different languages. First, let's import the SentenceTransformer module:
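    from sentence_transformers import SentenceTransformer, util

Next, we download the pre-trained model, encode one sentence in each language, and compute the cosine similarity of their embeddings. The following is a minimal sketch: the English and French example sentences are illustrative choices, and the util.cos_sim helper assumes a recent version of sentence-transformers (older versions name it util.pytorch_cos_sim):

    # Download the pre-trained multilingual model (cached after the first run)
    model = SentenceTransformer('distiluse-base-multilingual-cased')

    # Two sentences with the same meaning in different languages
    # (illustrative examples)
    eng_sentence = 'It is a beautiful day'
    fr_sentence = "C'est une belle journée"

    # Encode each sentence into a fixed-size embedding vector
    eng_embedding = model.encode(eng_sentence, convert_to_tensor=True)
    fr_embedding = model.encode(fr_sentence, convert_to_tensor=True)

    # Cosine similarity between the two embeddings
    similarity = util.cos_sim(eng_embedding, fr_embedding)
    print('Similarity score:', similarity.item())

Since the model maps a sentence and its translation to nearby points in the same embedding space, the similarity score for this pair should be close to 1, whereas two unrelated sentences would score much lower.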
