Distillation: The BYOL Algorithm

Learn about self-supervised learning via distillation and get an overview of the BYOL algorithm.

Distillation as similarity maximization

As shown in the figure below, distillation generally refers to transferring knowledge from a fixed (usually large) model, known as the teacher $f^{\text{teacher}}(\cdot)$, to a smaller model, known as the student $f^{\text{student}}(\cdot)$.
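To make the teacher/student setup concrete, here is a minimal sketch in plain Python. It is not BYOL itself, only an illustration of generic distillation read as similarity maximization. Everything in it is an assumption made for the example: the tiny two-layer `teacher`, the single linear layer used as the student, the cosine-based `distillation_loss`, and the finite-difference gradient, which stands in for the backpropagation a real framework would use.

```python
import math
import random


def linear(weights, x):
    """Apply a weight matrix (a list of rows) to the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def teacher(x, w1, w2):
    """Hypothetical 'large' model: one tanh hidden layer. Its weights are never updated."""
    hidden = [math.tanh(h) for h in linear(w1, x)]
    return linear(w2, hidden)


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def distillation_loss(student_out, teacher_out):
    """0 when the two outputs point the same way, 2 when they point in opposite directions."""
    return 1.0 - cosine_similarity(student_out, teacher_out)


def mean_loss(student_w, inputs, targets):
    losses = [distillation_loss(linear(student_w, x), t)
              for x, t in zip(inputs, targets)]
    return sum(losses) / len(losses)


def train_student(student_w, inputs, targets, steps=200, lr=0.5, eps=1e-5):
    """Gradient descent on the student only, using central finite differences."""
    w = [row[:] for row in student_w]  # copy so the caller's weights stay intact
    for _ in range(steps):
        grad = [[0.0] * len(row) for row in w]
        for i in range(len(w)):
            for j in range(len(w[i])):
                old = w[i][j]
                w[i][j] = old + eps
                up = mean_loss(w, inputs, targets)
                w[i][j] = old - eps
                down = mean_loss(w, inputs, targets)
                w[i][j] = old
                grad[i][j] = (up - down) / (2 * eps)
        candidate = [[wij - lr * g for wij, g in zip(row, g_row)]
                     for row, g_row in zip(w, grad)]
        # Accept the step only if it lowers the loss; otherwise shrink the step size.
        if mean_loss(candidate, inputs, targets) < mean_loss(w, inputs, targets):
            w = candidate
        else:
            lr *= 0.5
    return w


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
teacher_w1 = rand_matrix(8, 3)   # fixed teacher parameters
teacher_w2 = rand_matrix(2, 8)
student_w0 = rand_matrix(2, 3)   # smaller student: a single linear layer

inputs = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(16)]
# The teacher is fixed, so its outputs can be computed once and reused as targets.
targets = [teacher(x, teacher_w1, teacher_w2) for x in inputs]

loss_before = mean_loss(student_w0, inputs, targets)
student_w = train_student(student_w0, inputs, targets)
loss_after = mean_loss(student_w, inputs, targets)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

The point of the sketch is the asymmetry: only the student's weights change, while the teacher's outputs act as fixed targets. A practical implementation would swap the finite-difference loop for automatic differentiation.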
