BYOL Training

Explore BYOL training by understanding the student and teacher network architectures, including the use of prediction heads for asymmetry. Learn how to compute BYOL loss through mean squared error of normalized predictions and perform training by updating student weights while maintaining teacher weights as exponential moving averages. This lesson equips you to implement and train BYOL effectively for similarity maximization in self-supervised learning.

Student and teacher architectures

The student and teacher networks in BYOL share the same backbone architecture. However, the student network uses an additional MLP prediction head, $p(\cdot)$, to make the overall student-teacher architecture asymmetric. In other words, in $f^{\text{teacher}} = g \circ h$ and $f^{\text{student}} = p \circ g \circ h$ ...
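The two compositions above can be sketched in a few lines of plain Python. This is only a toy illustration, not the lesson's implementation: it assumes that h is the backbone and g a projection head (the text here does not spell out these roles), replaces every block with a fixed random linear map, and uses hypothetical names and sizes (`make_linear`, `compose`, `INPUT_DIM`, and so on). A real BYOL model would use a deep backbone and MLP heads in a deep learning framework.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Toy stand-in for a network block: a fixed random linear map."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def layer(x):
        # Matrix-vector product: one output value per weight row.
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return layer


def compose(*fns):
    """compose(p, g, h)(x) evaluates p(g(h(x))), i.e. p ∘ g ∘ h."""
    def composed(x):
        for fn in reversed(fns):  # apply the rightmost function first
            x = fn(x)
        return x
    return composed


rng = random.Random(0)
INPUT_DIM, FEATURE_DIM, PROJ_DIM = 8, 6, 4  # hypothetical sizes

# Teacher: h followed by g (assumed here: backbone, then projection head).
h_teacher = make_linear(INPUT_DIM, FEATURE_DIM, rng)
g_teacher = make_linear(FEATURE_DIM, PROJ_DIM, rng)
f_teacher = compose(g_teacher, h_teacher)

# Student: the same h and g shapes, plus the extra prediction head p.
h_student = make_linear(INPUT_DIM, FEATURE_DIM, rng)
g_student = make_linear(FEATURE_DIM, PROJ_DIM, rng)
p_student = make_linear(PROJ_DIM, PROJ_DIM, rng)
f_student = compose(p_student, g_student, h_student)

x = [rng.random() for _ in range(INPUT_DIM)]
teacher_out = f_teacher(x)  # g(h(x))
student_out = f_student(x)  # p(g(h(x)))
```

Both outputs have the same dimensionality (`PROJ_DIM`), so they can be compared directly, while only the student path passes through `p_student`. That extra step is the asymmetry described above.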