Data Augmentation

Learn how to use the data augmentation method to obtain an augmented dataset.

To perform task-specific distillation at the fine-tuning step, we need more task-specific data points. So we use a data augmentation method to obtain an augmented dataset, and we then fine-tune the general TinyBERT with this augmented dataset.

Steps for the data augmentation

First, we will explore the algorithm of the data augmentation method step by step, and then we will understand it more clearly with an example.

Suppose we have a sentence: 'Paris is a beautiful city'.

Step 1: Tokenizing the sentence

First, we tokenize the sentence using the BERT tokenizer and store the tokens in a list called X.
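To make the tokenization step concrete, here is a minimal sketch of how the token list X could be produced. It is not the lesson's own code: it imitates BERT's WordPiece tokenization with a tiny hypothetical vocabulary (TOY_VOCAB) and a hypothetical helper (wordpiece_tokenize), and it assumes an uncased model, so the tokens come out lowercased.

```python
# Toy vocabulary (an assumption for illustration); a real BERT
# vocabulary contains roughly 30,000 entries.
TOY_VOCAB = {"[UNK]", "paris", "is", "a", "beautiful", "city"}


def wordpiece_tokenize(sentence, vocab=TOY_VOCAB, unk_token="[UNK]"):
    """Lowercase the sentence, split on whitespace, then greedily match
    the longest vocabulary piece at each position (WordPiece style)."""
    tokens = []
    for word in sentence.lower().split():
        start = 0
        pieces = []
        while start < len(word):
            end = len(word)
            piece = None
            # Shrink the candidate from the right until it is in the vocab.
            while start < end:
                candidate = word[start:end]
                if start > 0:
                    # Pieces that continue a word carry the "##" prefix.
                    candidate = "##" + candidate
                if candidate in vocab:
                    piece = candidate
                    break
                end -= 1
            if piece is None:
                # No piece matched: the whole word becomes unknown.
                pieces = [unk_token]
                break
            pieces.append(piece)
            start = end
        tokens.extend(pieces)
    return tokens


# Store the tokens of the example sentence in the list X.
X = wordpiece_tokenize("Paris is a beautiful city")
print(X)
```

With the Hugging Face transformers library installed, the same list would typically be obtained with BertTokenizer.from_pretrained("bert-base-uncased").tokenize("Paris is a beautiful city"), which downloads the real vocabulary; the checkpoint name here is an assumption, since the lesson does not say which one it uses.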
