
BERT Models for Italian and Portuguese

Explore how to use the UmBERTo and BERTimbau models, specialized for the Italian and Portuguese languages. Understand their training methods and architectures, and learn how to generate sentence representations with the Hugging Face transformers library. Gain hands-on experience with these language-specific BERT variants for effective NLP applications.

UmBERTo for Italian

UmBERTo is a pre-trained BERT-style model for the Italian language developed by Musixmatch Research. UmBERTo inherits the RoBERTa architecture; RoBERTa is essentially BERT with the following changes in pre-training (a short usage sketch follows the list):

  • Dynamic masking is used instead of static masking in the MLM task.

  • The NSP task is removed, and the model is trained using only the MLM task.

  • Training is carried out with a significantly larger batch size.

  • Byte-level BPE is used as the tokenizer.
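
To generate a sentence representation with UmBERTo, you can load the model through the transformers library and pool its token-level hidden states. The following is a minimal sketch, assuming the Musixmatch/umberto-commoncrawl-cased-v1 checkpoint published on the Hugging Face Hub; mean pooling is one common choice here, not the only one:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Checkpoint name on the Hugging Face Hub (assumed here; UmBERTo is
# published by Musixmatch, and its SentencePiece-based tokenizer also
# requires the `sentencepiece` package to be installed).
MODEL_NAME = "Musixmatch/umberto-commoncrawl-cased-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

sentence = "Umberto Eco è stato un grande scrittore."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token-level hidden states into a single sentence vector.
sentence_embedding = outputs.last_hidden_state.mean(dim=1)
print(sentence_embedding.shape)  # e.g. torch.Size([1, 768])
```

Alternatively, the hidden state of the first token, outputs.last_hidden_state[:, 0], can serve as the sentence vector; mean pooling is simply a common default for encoders that have not been fine-tuned for sentence-level tasks.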