BERTSUM for Abstractive Summarization

Learn how to use the pre-trained BERTSUM model as the encoder in an encoder-decoder transformer to generate an abstractive summary.

In this lesson, we will learn how to perform abstractive summarization using BERT. In abstractive summarization, our goal is to create a summary by paraphrasing the given text: we re-express the text in different words while preserving only its essential meaning. But how can we do this with BERT? BERT returns only representations of the input tokens, so how can we generate new text with it? Let's explore this in detail.

Abstractive Summarization using BERT

To perform abstractive summarization, we use a transformer model with an encoder-decoder architecture. We feed the input text to the encoder, and the encoder returns a representation of that text. We then feed this representation as input to the decoder, which uses it to generate the summary.
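To make this flow concrete, here is a minimal PyTorch sketch of the encoder-decoder pipeline. The vocabulary size, model dimension, and token IDs are illustrative assumptions, and positional encodings are omitted for brevity:

```python
import torch
import torch.nn as nn

# A minimal sketch of the encoder-decoder flow; sizes are illustrative,
# and positional encodings are omitted for brevity.
vocab_size, d_model = 30522, 768

embedding = nn.Embedding(vocab_size, d_model)
transformer = nn.Transformer(d_model=d_model, batch_first=True)
lm_head = nn.Linear(d_model, vocab_size)  # decoder states -> vocabulary logits

src_ids = torch.randint(0, vocab_size, (1, 64))  # tokenized input text
tgt_ids = torch.randint(0, vocab_size, (1, 16))  # summary tokens generated so far

# Encoder: input text -> representation (memory)
memory = transformer.encoder(embedding(src_ids))

# Decoder: attends over the encoder's representation to predict the next
# summary token; the causal mask keeps each position from peeking ahead.
causal_mask = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
decoder_states = transformer.decoder(embedding(tgt_ids), memory, tgt_mask=causal_mask)
next_token_logits = lm_head(decoder_states[:, -1])
print(next_token_logits.shape)  # torch.Size([1, 30522])
```

At inference time, this step repeats: the highest-probability token is appended to the summary, and the decoder is run again until an end-of-sequence token is produced.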

Now, in this encoder-decoder transformer, we can use the pre-trained BERTSUM model as the encoder. The pre-trained BERTSUM model generates meaningful representations of the input text, and the decoder uses these representations and learns how to generate the summary.
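As a rough illustration, the following sketch uses the Hugging Face transformers library to pair a pretrained BERT encoder with a decoder. Here the generic bert-base-uncased checkpoint stands in for pre-trained BERTSUM, and the model would still need fine-tuning on text-summary pairs before it produces useful summaries:

```python
from transformers import BertTokenizerFast, EncoderDecoderModel

# A pretrained BERT checkpoint stands in for pre-trained BERTSUM as the
# encoder; the decoder is a BERT-architecture model whose cross-attention
# layers start out randomly initialized and must be learned.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

text = "Machine learning is the study of algorithms that improve through experience."
inputs = tokenizer(text, return_tensors="pt")

# After fine-tuning on (text, summary) pairs, generate() decodes a summary
# token by token from the encoder's representation. Without fine-tuning,
# the decoded text is meaningless.
summary_ids = model.generate(
    inputs.input_ids,
    decoder_start_token_id=tokenizer.cls_token_id,
    max_length=20,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

Because the encoder is pretrained while the decoder starts from scratch, the two halves begin training in very different states, which is why fine-tuning the combined model on summarization data is essential.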
