Data Augmentation
Explore how to generate augmented datasets by masking and replacing words in sentences using BERT-based predictions and word similarity. Learn to apply these data augmentation techniques to effectively fine-tune TinyBERT models for improved performance in NLP tasks.
To perform distillation at the fine-tuning step (that is, task-specific distillation), we need more task-specific data points. So we use a data augmentation method to obtain an augmented dataset, and we then fine-tune the general TinyBERT with this augmented dataset.
Steps of the data augmentation method
First, we will explore the algorithm of the data augmentation method step by step, and then we will understand it more clearly with an example.
Suppose we have a sentence: 'Paris is a beautiful city'.
Step 1: Tokenizing the sentence
First, we tokenize the sentence using the BERT tokenizer and store the tokens in a list called X. For our example sentence, X = [paris, is, a, beautiful, city].
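As a rough sketch, step 1 might look like the following with the Hugging Face transformers library (the library and the bert-base-uncased checkpoint are assumptions for illustration; the text only says "the BERT tokenizer"):

```python
from transformers import BertTokenizer

# Load the BERT tokenizer (bert-base-uncased is an illustrative choice)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

sentence = 'Paris is a beautiful city'

# Tokenize the sentence and store the tokens in a list called X
X = tokenizer.tokenize(sentence)
print(X)
# ['paris', 'is', 'a', 'beautiful', 'city']
```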
Step 2: Copy the tokens
We copy X to another list called X_masked, so initially X_masked = [paris, is, a, beautiful, city].
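Continuing the same sketch from step 1, the copy is just a plain Python list copy:

```python
# Copy the token list so that masking entries of X_masked leaves X untouched
X_masked = X.copy()
print(X_masked)
# ['paris', 'is', 'a', 'beautiful', 'city']
```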
Step 3: Data augmentation step
Now, for every element (word) X[i] in the list X, we do the following:

We check whether X[i] is a single-piece word, that is, a word that the BERT tokenizer does not split into multiple sub-word tokens. If it is a single-piece word, then we replace X_masked[i] with the [MASK] token. Next, we use the BERT-base model to predict the masked word. Instead of predicting only one word, we predict the K most likely words and store them in a list called candidates. Say K = 5; then we predict the 5 most likely words and store them in the candidates list.

If X[i] is not a single-piece word, then we will not mask it. Instead, we check for the K most similar words of X[i] using word similarity over pre-trained GloVe embeddings and store them in the candidates list.
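To make step 3 concrete, here is a minimal, self-contained sketch. A Hugging Face fill-mask pipeline stands in for the BERT-base predictions, and gensim's pre-trained GloVe vectors stand in for the word-similarity check; the library choices, the checkpoint names, and K = 5 are assumptions for illustration, not a definitive implementation:

```python
from transformers import BertTokenizer, pipeline
import gensim.downloader as api

K = 5  # number of candidate words per position (illustrative choice)

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
X = tokenizer.tokenize('Paris is a beautiful city')
X_masked = X.copy()

# BERT-base fill-mask pipeline used to predict masked words
predictor = pipeline('fill-mask', model='bert-base-uncased', top_k=K)

# Pre-trained GloVe vectors used for the word-similarity fallback
glove = api.load('glove-wiki-gigaword-100')

for i, word in enumerate(X):
    # A word is "single-piece" if the BERT tokenizer keeps it as one token
    if len(tokenizer.tokenize(word)) == 1:
        # Mask the word and let BERT-base predict the K most likely replacements
        X_masked[i] = tokenizer.mask_token          # '[MASK]'
        preds = predictor(' '.join(X_masked))
        candidates = [p['token_str'] for p in preds]
        X_masked[i] = word                          # restore before the next position
    else:
        # Multi-piece word: take the K most similar words from GloVe instead
        candidates = [w for w, _ in glove.most_similar(word, topn=K)] if word in glove else []
    print(word, '->', candidates)
```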