Introduction: Applications of BERT
Discover how to fine-tune BERT for text summarization with BERTSUM, apply multilingual BERT for non-English languages, and use domain-specific models like ClinicalBERT and BioBERT. Learn about models such as VideoBERT and BART and gain the skills to leverage these technologies in practical NLP tasks.
In this section, we'll learn how to fine-tune BERT for text summarization tasks using BERTSUM. Then, we will explore how to apply BERT to languages other than English. We will also learn about VideoBERT and other interesting models.
The following chapters are included in this section:
Exploring BERTSUM for Text Summarization
Applying BERT to Other Languages
Exploring Sentence and Domain-Specific BERT
Working with VideoBERT, BART, and More
Exploring BERTSUM for text summarization
Text summarization is one of the most popular applications of natural language processing. This chapter will explain how to fine-tune the pre-trained BERT model for a text summarization task. The BERT model fine-tuned for the text summarization task is often called BERTSUM (BERT for summarization). We will understand what BERTSUM is and how it is used for text summarization in detail.
We will begin by understanding the two types of text summarization: extractive and abstractive. In extractive summarization, we create a summary by selecting only the important sentences from the given text, whereas in abstractive summarization, we create a summary by paraphrasing the given text. First, we will learn how to perform extractive summarization using BERTSUM with a classifier, BERTSUM with a transformer, and BERTSUM with an LSTM. Next, we will look into how BERTSUM is used for performing the abstractive summarization task.
Going forward, we will learn about ROUGE, the evaluation metric used for text summarization. We will understand the ROUGE-N and ROUGE-L evaluation metrics in detail. Next, we will check the performance of the BERTSUM model. Finally, we will take a look at training the BERTSUM model.
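To make the metric concrete before we get there, the following is a minimal Python sketch of ROUGE-N recall, computed as the number of overlapping n-grams between the candidate and reference summaries divided by the total number of n-grams in the reference; the candidate and reference sentences are purely illustrative:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall = overlapping n-grams / total n-grams in the reference."""
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    # Clip each candidate n-gram's count by its count in the reference
    overlap = sum(min(cnt, ref_counts[gram]) for gram, cnt in cand_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

candidate = "the cat was found under the bed"
reference = "the cat was under the bed"
print(rouge_n_recall(candidate, reference, n=1))  # 1.0 (ROUGE-1)
print(rouge_n_recall(candidate, reference, n=2))  # 0.8 (ROUGE-2)
```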
Applying BERT to other languages
We've learned how BERT works and explored its different variants. However, so far we've applied BERT only to the English language. Can we also apply BERT to other languages? The answer is yes, and that's precisely what we will learn in this chapter. We will use multilingual BERT (M-BERT) to compute representations of text in languages other than English. We will begin the chapter by understanding how M-BERT works and how to use it.
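As a quick preview, the following minimal sketch loads the pre-trained M-BERT model with the Hugging Face transformers library and encodes a non-English sentence; the French example sentence is purely illustrative:

```python
import torch
from transformers import BertModel, BertTokenizer

# M-BERT is pre-trained on Wikipedia text in 104 languages with a shared
# vocabulary, so the same model can encode non-English text directly
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model = BertModel.from_pretrained('bert-base-multilingual-cased')

inputs = tokenizer("C'est une belle journée", return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # [1, sequence_length, 768]
```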
Next, we will understand how multilingual the M-BERT model really is by investigating it in detail. Following this, we will learn about the XLM model. XLM stands for cross-lingual language model, and it is used to obtain cross-lingual representations. We will go over how XLM works and how it differs from M-BERT in detail.
Next, we'll learn about XLM-R, the XLM-RoBERTa model, which is a state-of-the-art cross-lingual model. We will explore how XLM-R works and how it differs from XLM. Finally, we will look at some of the pre-trained monolingual BERT models for languages including French, Spanish, Dutch, German, Chinese, Japanese, Finnish, Italian, Portuguese, and Russian.
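For a taste of XLM-R, the following minimal sketch loads the pre-trained model with the transformers library and computes a mean-pooled sentence vector for the same sentence in two languages; the sentences are purely illustrative:

```python
import torch
from transformers import XLMRobertaModel, XLMRobertaTokenizer

# Unlike XLM, XLM-R requires no language embeddings or language IDs,
# so the same call works for any of its supported languages
tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
model = XLMRobertaModel.from_pretrained('xlm-roberta-base')

for sentence in ["I love Paris", "J'aime Paris"]:
    inputs = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():
        # Mean-pool the token representations into one sentence vector
        rep = model(**inputs).last_hidden_state.mean(dim=1)
    print(sentence, rep.shape)  # [1, 768]
```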
Exploring sentence and domain-specific BERT
Sentence-BERT is one of the most interesting variants of BERT and is popularly used for computing sentence representations. We will begin by understanding how Sentence-BERT works in detail. We will explore how Sentence-BERT computes sentence representations using the siamese and triplet network architectures. Next, we will learn about the sentence-transformers library and how to use it with a pre-trained Sentence-BERT model to compute sentence representations.
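As a quick preview, the following minimal sketch computes sentence representations with the sentence-transformers library; bert-base-nli-mean-tokens is one of its published pre-trained Sentence-BERT checkpoints, and the sentences are purely illustrative:

```python
from sentence_transformers import SentenceTransformer

# Load a pre-trained Sentence-BERT model
model = SentenceTransformer('bert-base-nli-mean-tokens')

sentences = ["Paris is a beautiful city", "I love visiting Paris"]
# encode() returns one fixed-length vector per sentence
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768)
```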
Moving on, we will understand in detail how to make a monolingual model multilingual using knowledge distillation. Next, we will learn about several interesting domain-specific BERT models, such as ClinicalBERT and BioBERT. We will learn how ClinicalBERT is trained and how it is used for predicting the probability of readmission.
Next, we will understand how BioBERT is trained and how to fine-tune the pre-trained BioBERT for named-entity recognition (NER) and question-answering tasks.
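As a preview, the following minimal sketch prepares a pre-trained BioBERT checkpoint for NER fine-tuning with the transformers library; the checkpoint name dmis-lab/biobert-v1.1 and the three-label tag set are assumptions made here for illustration:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Checkpoint name assumed from the Hugging Face model hub
MODEL_NAME = 'dmis-lab/biobert-v1.1'

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Attach a fresh token-classification head on top of BioBERT;
# num_labels=3 assumes a simple B/I/O tagging scheme for entities
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=3)
# The model can now be fine-tuned on a biomedical NER dataset
```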
Working with VideoBERT, BART, and more
We will learn about two interesting models: VideoBERT and BART. We will also explore two popular BERT libraries known as ktrain and bert-as-service. We will start by learning how VideoBERT works. We will look at how the VideoBERT model is pre-trained to learn the representation of language and video in a joint manner. Then, we will look into some of the applications of VideoBERT.
Moving on, we will learn what BART is and how it differs from the BERT model. We will understand the different noising techniques used in BART in detail. Then, we will see how to perform text summarization using the pre-trained BART model.
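As a quick preview, the following minimal sketch performs summarization with the transformers library; facebook/bart-large-cnn is BART fine-tuned for summarization on the CNN/DailyMail dataset, and the input text is purely illustrative:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

text = ("BERT is a language representation model pre-trained on large text "
        "corpora. It can be fine-tuned for many downstream NLP tasks, such as "
        "question answering and text summarization, with minimal changes.")

inputs = tokenizer([text], max_length=1024, truncation=True, return_tensors='pt')
# Beam search decoding to generate the summary
summary_ids = model.generate(inputs['input_ids'], num_beams=4,
                             max_length=60, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```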
After that, we will learn about an interesting library called ktrain. We will explore how ktrain works, and we will learn to use ktrain for sentiment analysis, document answering, and document summarization.
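As a taste of the ktrain workflow, the following minimal sketch fine-tunes a BERT classifier for sentiment analysis; the two-line toy dataset and the hyperparameters are purely illustrative:

```python
import ktrain
from ktrain import text

# Toy two-class sentiment data; real usage would load a proper dataset
x_train = ['I loved this movie', 'utterly boring and slow'] * 100
y_train = ['pos', 'neg'] * 100

# Preprocess the data, build a BERT classifier, and wrap it in a learner
trn, val, preproc = text.texts_from_array(
    x_train=x_train, y_train=y_train,
    class_names=['neg', 'pos'], preprocess_mode='bert',
    maxlen=64, val_pct=0.1)
model = text.text_classifier('bert', train_data=trn, preproc=preproc)
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(2e-5, 1)  # one-cycle learning rate policy, 1 epoch

# Make predictions on new text
predictor = ktrain.get_predictor(learner.model, preproc)
print(predictor.predict('an unexpectedly delightful film'))
```

Finally, we will learn how to use bert-as-service to obtain fixed-length sentence representations from BERT.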