How the transformer model is used for question-answering

Key takeaways:

  • Transformer models replace RNNs/LSTMs in NLP, using attention mechanisms for efficiency. They consist of an encoder for input processing and a decoder for output generation.

  • Pretrained transformer models like BERT answer questions by extracting the answer span from a given context.

  • For question-answering, input text is tokenized, structured with special tokens, and converted to IDs for processing. The model predicts start and end indices to extract and display answers.

  • Transformers are scalable, versatile, and essential for modern NLP tasks.

The transformer is a deep learning architecture that serves as an efficient replacement for recurrent neural networks (RNNs) and long short-term memory (LSTM) networks across natural language processing (NLP) tasks. Introduced by Google researchers in the groundbreaking 2017 paper “Attention Is All You Need,” it is built around the multi-head attention mechanism, which lets it attend to all positions of a sequence in parallel and therefore handle sequential data more efficiently than its predecessors.
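
To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside multi-head attention (the dimensions are toy values chosen purely for illustration):

import torch
import torch.nn.functional as F

# Scaled dot-product attention: each position attends to every other position
def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # query-key similarity, scaled
    weights = F.softmax(scores, dim=-1)            # attention weights sum to 1
    return weights @ v                             # weighted sum of the values

q = k = v = torch.rand(1, 5, 16)  # (batch, seq_len, dim)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 5, 16])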

Let’s see how a pretrained transformer model can be used to implement question answering.

Workflow

Let’s understand how the transformer model works. It has two main components: an encoder and a decoder. The encoder processes the input sequence and passes a representation of it to the decoder, which uses that representation to generate the output sequence token by token. Note that BERT, the model we use below, consists of the encoder stack only: for extractive question answering, it predicts the answer span directly instead of generating text with a decoder.
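
To illustrate the encoder-decoder structure, here is a minimal sketch using PyTorch’s built-in nn.Transformer module (the dimensions are arbitrary toy values, unrelated to the QA model used below):

import torch
import torch.nn as nn

# A toy encoder-decoder transformer (illustration only)
model = nn.Transformer(d_model=64, nhead=4, num_encoder_layers=2, num_decoder_layers=2)
src = torch.rand(10, 1, 64)  # input sequence: (seq_len, batch, d_model)
tgt = torch.rand(7, 1, 64)   # target sequence fed to the decoder
out = model(src, tgt)        # decoder output conditioned on the encoder's representation
print(out.shape)             # torch.Size([7, 1, 64])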


Question answering with transformer model

Suppose we have a question and a relevant paragraph. We want to extract the answer from the paragraph using a Transformer model. Let’s go through the steps:

1. Import libraries

Import necessary Python libraries and modules needed for text processing and question answering.

import torch
from transformers import BertForQuestionAnswering, BertTokenizer
import warnings

# Suppress noisy deprecation warnings from the libraries
warnings.filterwarnings("ignore", category=FutureWarning)

2. Load the tokenizer and model

We use the pretrained bert-large-uncased-whole-word-masking-finetuned-squad model, fine-tuned on SQuAD v1.1. This model is case-insensitive and trained to extract answer spans from a given context. The accompanying tokenizer converts input text into the WordPiece tokens the model expects.

# Download (on first run) and load the fine-tuned QA model and its tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

3. Define the question and paragraph

Input the question and the paragraph from which the answer will be extracted.

question_text = "What is the immune system?"
paragraph = "The immune system is a system of many biological structures and processes within an organism that protects against disease. To function properly, an immune system must detect a wide variety of agents, known as pathogens, from viruses to parasitic worms, and distinguish them from the organism's own healthy tissue."

4. Process the input data

Special tokens are added to mark the beginning of the input ([CLS]), separate the question from the paragraph ([SEP]), and mark the end of the input (a final [SEP]).

# Mark the start of the input and the boundaries of the question and paragraph
question = '[CLS] ' + question_text + ' [SEP]'
paragraph = paragraph + ' [SEP]'

5. Tokenize and generate IDs

The question and paragraph are tokenized into subwords, combined, and converted into numerical IDs that the model can process.

# Split the text into WordPiece subword tokens
tokens_question = tokenizer.tokenize(question)
tokens_paragraph = tokenizer.tokenize(paragraph)
combined_tokens = tokens_question + tokens_paragraph
# Map each token to its integer ID in the model's vocabulary
token_ids = tokenizer.convert_tokens_to_ids(combined_tokens)
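
Printing the tokens shows what the tokenizer produced; with the question above, tokens_question should look roughly like this:

print(tokens_question)
# ['[CLS]', 'what', 'is', 'the', 'immune', 'system', '?', '[SEP]']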

6. Generate segment IDs

Create a list of segment IDs to differentiate tokens from the question (0) and the paragraph (1).

# Label question tokens with segment 0 and paragraph tokens with segment 1
segment_id = [0] * len(tokens_question)
segment_id += [1] * len(tokens_paragraph)

7. Prepare tensors

Convert token IDs and segment IDs into PyTorch tensors to prepare them for the model.

# Wrap each list in a batch dimension of size 1
token_ids_tensor = torch.tensor([token_ids])
segment_id_tensor = torch.tensor([segment_id])

8. Use the model to get scores

Pass the tensors to the model. It returns start_logits and end_logits, which assign every token a score for how likely it is to start or end the answer.

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(token_ids_tensor, token_type_ids=segment_id_tensor)
starting_scores = outputs.start_logits
ending_scores = outputs.end_logits

9. Get the starting and ending index of the answer

Find the indexes of the tokens with the highest scores, representing the start and end of the answer.

# Positions of the highest-scoring start and end tokens
starting_index = torch.argmax(starting_scores)
ending_index = torch.argmax(ending_scores)

10. Display the answer

Using the starting and ending indexes, extract and display the answer from the tokens.

print("Question: ", Question)
print("Answer: ")
print(' '.join(combined_tokens[starting_index:ending_index+1]))
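
Joining tokens with spaces can leave WordPiece artifacts (subwords prefixed with ##) and detached punctuation. If you prefer cleaner output, the tokenizer can reassemble the tokens itself:

print(tokenizer.convert_tokens_to_string(combined_tokens[starting_index:ending_index+1]))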

Output

When you run the code, the model processes the input and outputs:

Question: What is the immune system?
Answer: the immune system is a system of many biological structures and processes within an organism that protects against disease .

This breakdown explains how the transformer model performs question answering using the bert-large-uncased-whole-word-masking-finetuned-squad pretrained model.

Code: Full implementation

Here is the complete implementation of the steps we discussed above.
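
Assembled from the steps above, this script runs end to end:

import torch
from transformers import BertForQuestionAnswering, BertTokenizer
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

# Load the fine-tuned QA model and its tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

# The question and the context paragraph
question_text = "What is the immune system?"
paragraph = "The immune system is a system of many biological structures and processes within an organism that protects against disease. To function properly, an immune system must detect a wide variety of agents, known as pathogens, from viruses to parasitic worms, and distinguish them from the organism's own healthy tissue."

# Add BERT's special tokens
question = '[CLS] ' + question_text + ' [SEP]'
paragraph = paragraph + ' [SEP]'

# Tokenize and convert to vocabulary IDs
tokens_question = tokenizer.tokenize(question)
tokens_paragraph = tokenizer.tokenize(paragraph)
combined_tokens = tokens_question + tokens_paragraph
token_ids = tokenizer.convert_tokens_to_ids(combined_tokens)

# Segment IDs: 0 for question tokens, 1 for paragraph tokens
segment_id = [0] * len(tokens_question)
segment_id += [1] * len(tokens_paragraph)

# Add a batch dimension and run inference
token_ids_tensor = torch.tensor([token_ids])
segment_id_tensor = torch.tensor([segment_id])
with torch.no_grad():
    outputs = model(token_ids_tensor, token_type_ids=segment_id_tensor)

# Extract the highest-scoring answer span and display it
starting_index = torch.argmax(outputs.start_logits)
ending_index = torch.argmax(outputs.end_logits)
print("Question: ", question_text)
print("Answer: ")
print(' '.join(combined_tokens[starting_index:ending_index+1]))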


In conclusion, the transformer model represents a prominent advancement in natural language processing because of its scalability, efficiency, and versatility. Developers can leverage this state-of-the-art deep neural architecture for NLP tasks and other areas of machine learning.

Frequently asked questions



What is the application of question answering system?

A question-answering system extracts relevant answers from a given context. It is used in chatbots, search engines, and virtual assistants.


Is ChatGPT a transformer model?

Yes, ChatGPT is based on the transformer architecture, specifically fine-tuned from OpenAI’s GPT model.


What is a transformer model in AI?

A transformer model is a deep learning architecture designed for NLP tasks, leveraging attention mechanisms to process and generate text efficiently.

