Masked Siamese Networks: Masking Strategy and Encoder
Explore the concepts behind Masked Siamese Networks and their masking strategy in self-supervised learning. Understand how to generate masked and unmasked views of images and use dual vision transformer encoders. Learn how to update target encoder parameters and derive meaningful feature representations without reconstructing image pixels.
Masked Siamese Networks (MSNs) are a self-supervised framework that combines similarity maximization with masked image modeling. As shown in the figure below, an MSN generates two augmented views of an image, one masked and one left unchanged. The objective is for the network (a vision transformer) to produce similar representations for both views.
MSNs don’t explicitly reconstruct the image pixels from the masked input. Instead, they fold this mask-denoising step into the feature representations themselves by making the representations of the masked and unmasked views similar.
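To make this concrete, here is a minimal sketch of such an objective in PyTorch. The names (`anchor_encoder`, `target_encoder`, `prototypes`) are illustrative assumptions rather than the authors' code, and the full MSN method additionally sharpens the target assignments and adds a mean-entropy regularizer, which this sketch omits.

```python
import torch
import torch.nn.functional as F

def msn_loss(anchor_encoder, target_encoder, masked_view, unmasked_view,
             prototypes, temperature=0.1):
    """Simplified MSN-style objective: match the masked view's prototype
    assignments to those of the unmasked view (illustrative sketch)."""
    # Anchor branch: encode the masked view (gradients flow here).
    z_anchor = F.normalize(anchor_encoder(masked_view), dim=-1)

    # Target branch: encode the unmasked view without gradients; the
    # target encoder is updated by an exponential moving average instead.
    with torch.no_grad():
        z_target = F.normalize(target_encoder(unmasked_view), dim=-1)

    # Soft assignments of each representation to a set of learnable prototypes.
    p_anchor = F.softmax(z_anchor @ prototypes.T / temperature, dim=-1)
    p_target = F.softmax(z_target @ prototypes.T / temperature, dim=-1)

    # Cross-entropy pulls the anchor's assignment toward the target's,
    # so the masked view must be "denoised" in representation space
    # rather than in pixel space.
    return -(p_target * torch.log(p_anchor + 1e-8)).sum(dim=-1).mean()
```

Note that no decoder or pixel-level loss appears anywhere: the denoising happens entirely in the space of prototype assignments.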
Inputs and masking strategy
The first step in training is to generate two kinds of input views: the anchor view and the target view. Given an image, random data augmentations produce the two views; the anchor view is then split into patches and a random subset of those patches is masked, while the target view is left intact.
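As a rough illustration, the random masking step on the anchor view might look like the sketch below, where `patch_tokens` is the sequence of ViT patch embeddings and the mask ratio is an assumed hyperparameter (MSN also explores a focal masking variant not shown here).

```python
import torch

def random_mask(patch_tokens, mask_ratio=0.7):
    """Keep a random subset of patch tokens and drop the rest.

    patch_tokens: (batch, num_patches, dim) ViT patch embeddings.
    mask_ratio: fraction of patches to discard (illustrative value).
    """
    B, N, D = patch_tokens.shape
    num_keep = max(1, int(N * (1 - mask_ratio)))
    # Per-image random permutation of patch indices; keep the first num_keep.
    noise = torch.rand(B, N, device=patch_tokens.device)
    keep_idx = noise.argsort(dim=1)[:, :num_keep]
    return patch_tokens.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
```

Only the surviving tokens are fed to the anchor encoder, so masking both corrupts the input and shortens the sequence the anchor branch must process.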