
Distillation Techniques for Pre-training and Fine-tuning

Explore the two-stage distillation framework used in TinyBERT, which transfers knowledge from large pre-trained BERT models to smaller student models during both pre-training and fine-tuning. Understand how general and task-specific distillation work together to create efficient and specialized BERT variants.

TinyBERT uses a two-stage learning framework consisting of the following stages:

  • General distillation

  • Task-specific distillation

This two-stage learning framework enables distillation during both the pre-training and fine-tuning stages. Let's take a look at how each of these stages works in detail.
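Before going into each stage, the following is a minimal sketch of how the two stages could be wired together in PyTorch. It is only an illustration of the overall flow: `distill_step`, the `nn.Linear` stand-ins for the teacher (BERT) and student (TinyBERT), the random batches, and the plain MSE objective are all hypothetical placeholders, not TinyBERT's actual models or loss functions.

```python
import torch
import torch.nn as nn

def distill_step(teacher, student, batch, optimizer):
    """One distillation step: push the student's outputs toward the
    (frozen) teacher's outputs using a simple MSE loss."""
    with torch.no_grad():
        teacher_out = teacher(batch)          # teacher is not updated
    student_out = student(batch)
    loss = nn.functional.mse_loss(student_out, teacher_out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical stand-ins: a "large" teacher and a "small" student.
teacher = nn.Linear(128, 64)                  # plays the role of pre-trained BERT
student = nn.Linear(128, 64)                  # plays the role of TinyBERT
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

# Stage 1: general distillation on general-domain (unlabeled) data,
# producing a general TinyBERT from the pre-trained teacher.
for _ in range(3):
    general_batch = torch.randn(8, 128)       # stands in for corpus text features
    distill_step(teacher, student, general_batch, opt)

# Stage 2: task-specific distillation on downstream-task data,
# starting from the stage-1 student and a teacher fine-tuned on that task.
fine_tuned_teacher = teacher                  # placeholder for the fine-tuned BERT
for _ in range(3):
    task_batch = torch.randn(8, 128)          # stands in for task-specific data
    distill_step(fine_tuned_teacher, student, task_batch, opt)
```

The key design point the sketch captures is that the same distillation machinery runs twice: first against a pre-trained teacher on general data, then against a fine-tuned teacher on task data, with the second stage initialized from the student produced by the first.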

General distillation

...