The SimSiam Algorithm

Explore the SimSiam algorithm to understand how it achieves effective self-supervised learning by using a stop-gradient operation to create asymmetry between student and teacher networks. Learn about its architecture, training process, and how it maximizes similarity between augmented data views without requiring a momentum encoder.

SimSiam vs. BYOL

Like BYOL, Simple Siamese Representation Learning (SimSiam) is a distillation-based self-supervised algorithm, but unlike BYOL it works without a momentum encoder. The main idea of SimSiam is to use a stop-gradient operation to block the flow of gradients through the teacher branch. This creates the learning asymmetry: only the student branch (its encoder and predictor) is updated during training.
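
To make the role of the stop-gradient concrete, here is a minimal PyTorch-style sketch of one symmetrized SimSiam training step. The names (`simsiam_loss`, `simsiam_step`, `f`, `predictor`) are illustrative assumptions, not part of SimSiam's reference implementation; the key point is that the teacher projection is detached, so gradients flow only through the student's encoder and predictor.

```python
import torch
import torch.nn.functional as F

def simsiam_loss(p, z):
    """Negative cosine similarity between the student's prediction p
    and the teacher's projection z."""
    z = z.detach()  # stop-gradient: no gradients flow through the teacher branch
    return -F.cosine_similarity(p, z, dim=-1).mean()

def simsiam_step(f, predictor, x1, x2):
    """One training step on two augmented views x1, x2 of the same batch.
    f is the shared encoder (backbone + projection head); predictor is the
    student-only prediction head."""
    z1, z2 = f(x1), f(x2)                  # projections from the shared encoder
    p1, p2 = predictor(z1), predictor(z2)  # predictions (student branch only)
    # Symmetrized loss: each view plays both the student and the teacher role.
    loss = 0.5 * simsiam_loss(p1, z2) + 0.5 * simsiam_loss(p2, z1)
    return loss
```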

Siamese networks have been shown to develop better representations with this very simple idea. The figure below illustrates the concept of SimSiam.

Network architecture

In SimSiam, the student and teacher networks share the same backbone architecture $f = g \circ h$ (here, $h(\cdot)$ is the feature extractor and $g(\cdot)$ ...