BERTSUM for Abstractive Summarization

Learn how to use the pre-trained BERTSUM model as the encoder in an encoder-decoder transformer to generate an abstractive summary.

In this lesson, we will learn how to perform abstractive summarization using BERT. In abstractive summarization, our goal is to create a summary by paraphrasing the given text: we re-express the text in different words while preserving only its essential meaning. But how can we do this with BERT? BERT returns only representations of the input tokens, so how can we generate new text with it? Let's explore this in detail.

Abstractive Summarization using BERT

To perform abstractive summarization, we use a transformer model with an encoder-decoder architecture. We feed the input text to the encoder, and the encoder returns a representation of that text. We then feed this representation as input to the decoder, which uses it to generate the summary.
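To make this flow concrete, here is a minimal PyTorch sketch of the encoder-decoder pipeline. The vocabulary size, model dimension, and token IDs are illustrative assumptions, and positional encodings are omitted for brevity:

```python
import torch
import torch.nn as nn

# A minimal sketch of the encoder-decoder flow; sizes are illustrative,
# and positional encodings are omitted for brevity.
vocab_size, d_model = 30522, 768

embedding = nn.Embedding(vocab_size, d_model)
transformer = nn.Transformer(d_model=d_model, batch_first=True)
lm_head = nn.Linear(d_model, vocab_size)  # decoder states -> vocabulary logits

src_ids = torch.randint(0, vocab_size, (1, 64))  # tokenized input text
tgt_ids = torch.randint(0, vocab_size, (1, 16))  # summary tokens generated so far

# Encoder: input text -> representation (memory)
memory = transformer.encoder(embedding(src_ids))

# Decoder: attends over the encoder's representation to predict the next
# summary token; the causal mask keeps each position from peeking ahead.
causal_mask = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
decoder_states = transformer.decoder(embedding(tgt_ids), memory, tgt_mask=causal_mask)
next_token_logits = lm_head(decoder_states[:, -1])
print(next_token_logits.shape)  # torch.Size([1, 30522])
```

At inference time, this step repeats: the highest-probability token is appended to the summary, and the decoder is run again until an end-of-sequence token is produced.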

Now, in this encoder-decoder transformer, we can use the pre-trained BERTSUM model as the encoder. The pre-trained BERTSUM model generates meaningful representations of the input text, and the decoder uses these representations and learns how to generate the summary.
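As a rough illustration, the following sketch uses the Hugging Face transformers library to pair a pretrained BERT encoder with a decoder. Here the generic bert-base-uncased checkpoint stands in for pre-trained BERTSUM, and the model would still need fine-tuning on text-summary pairs before it produces useful summaries:

```python
from transformers import BertTokenizerFast, EncoderDecoderModel

# A pretrained BERT checkpoint stands in for pre-trained BERTSUM as the
# encoder; the decoder is a BERT-architecture model whose cross-attention
# layers start out randomly initialized and must be learned.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

text = "Machine learning is the study of algorithms that improve through experience."
inputs = tokenizer(text, return_tensors="pt")

# After fine-tuning on (text, summary) pairs, generate() decodes a summary
# token by token from the encoder's representation. Without fine-tuning,
# the decoded text is meaningless.
summary_ids = model.generate(
    inputs.input_ids,
    decoder_start_token_id=tokenizer.cls_token_id,
    max_length=20,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

Because the encoder is pretrained while the decoder starts from scratch, the two halves begin training in very different states, which is why fine-tuning the combined model on summarization data is essential.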
