Distillation Techniques for Pre-training and Fine-tuning
Explore the two-stage distillation framework used in TinyBERT, which transfers knowledge from large pre-trained BERT models to smaller student models during both pre-training and fine-tuning. Understand how general and task-specific distillation work together to create efficient and specialized BERT variants.
TinyBERT uses a two-stage learning framework consisting of:
General distillation
Task-specific distillation
This two-stage learning framework enables distillation during both the pre-training and fine-tuning stages. Let's take a look at how each stage works in detail, starting with a rough code sketch of the overall pipeline below.
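Before going through each stage, here is a minimal sketch of how the two stages could fit together in code. The model classes, data loaders, loss weights, and function names here are hypothetical placeholders, not the official TinyBERT implementation; for simplicity, the sketch distills only the output logits, whereas TinyBERT also transfers knowledge from the embedding layer, hidden states, and attention matrices.

```python
# A minimal sketch of a two-stage distillation pipeline in PyTorch.
# All models, data loaders, and hyperparameters are illustrative placeholders.

import torch
import torch.nn.functional as F


def distill_step(teacher, student, batch, optimizer):
    """One distillation step: the student mimics the teacher's output logits."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(batch)
    student_logits = student(batch)
    # Soft-target loss: match the student's output distribution to the teacher's.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def two_stage_distillation(pretrained_teacher, finetuned_teacher,
                           student, general_corpus, task_data):
    optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

    # Stage 1: general distillation on a large unlabeled corpus,
    # using the pre-trained (not fine-tuned) BERT as the teacher.
    for batch in general_corpus:
        distill_step(pretrained_teacher, student, batch, optimizer)

    # Stage 2: task-specific distillation on the downstream task data,
    # using a fine-tuned BERT as the teacher.
    for batch in task_data:
        distill_step(finetuned_teacher, student, batch, optimizer)

    return student
```

The key design point captured here is that the same student is trained twice with two different teachers: first against the pre-trained BERT to obtain a general-purpose small model, and then against a fine-tuned BERT to specialize that model for the downstream task.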