Introduction to Self-Supervised Learning

Learn about self-supervised learning and its mathematical framework and taxonomy.

We'll cover the following:

- What is self-supervised learning?
- Taxonomy of self-supervised learning
- Self-supervised learning framework
  - Pre-training step
  - Transfer learning
    - Linear classifier
    - Fine-tuning

What is self-supervised learning?

Self-supervised learning methods are a class of machine learning algorithms that learn rich neural network representations without relying on labels. These algorithms derive supervisory signals (pseudo labels) from the structure of the unlabeled data and train the network to predict an unobserved or hidden property of the input.

For example, in computer vision, one can rotate an image by a certain angle and ask the neural network to predict that rotation angle. In this example, no human-annotated labels are used to train the neural network. Instead, we define our own pseudo labels (i.e., the rotation angle of each image), which serve as supervisory signals. Once these pseudo labels are created, we can use a standard supervised loss (e.g., cross-entropy) to train the neural network.
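The rotation example above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper names `make_rotation_batch` and `cross_entropy`, the restriction to rotations by multiples of 90 degrees, and the random arrays standing in for images are all assumptions made for the sketch.

```python
import numpy as np

def make_rotation_batch(images, rng):
    """Rotate each square image by a random multiple of 90 degrees.

    Returns the rotated images and the pseudo labels k in {0, 1, 2, 3},
    where the rotation angle is k * 90 degrees.
    """
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k=int(k)) for img, k in zip(images, labels)])
    return rotated, labels

def cross_entropy(logits, labels):
    """Mean cross-entropy between raw class scores and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
images = rng.random((8, 16, 16))  # 8 unlabeled 16x16 placeholder "images"
rotated, pseudo_labels = make_rotation_batch(images, rng)

# Stand-in for a network's output: uniform scores over the 4 rotation classes.
logits = np.zeros((len(images), 4))
loss = cross_entropy(logits, pseudo_labels)  # equals log(4) for uniform scores
```

No human annotation appears anywhere: the labels come entirely from the transformation applied to the data, and the loss is the ordinary supervised cross-entropy.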

One might confuse self-supervised learning with unsupervised learning (a more widely known term). Although both frameworks assume that training labels are absent, the term unsupervised learning is poorly defined and often misleading, because it suggests learning without any supervision at all. Self-supervised learning, on the other hand, is not unsupervised, since it uses supervisory signals derived from the structure of the data. This difference is illustrated in the figure below.

[Figure: self-supervised learning compared with unsupervised learning]

Self-supervised learning framework

Self-supervised learning aims to learn a neural network $f = h \circ g$, where $g$ is the feature extraction backbone and $h$ is the final classification layer, on an unlabeled source dataset $D_{source} = \{ X_i \}_{i=1}^N$ ($i$ indexes the images and $N$ is the total number of images). The goal is for its representations, $g(.)$, to transfer to a target downstream task with the help of a small labeled target dataset $D_{target}=\{(X_i, Y_i)\}_{i=1}^M$, where $M$ is the total number of labeled images and $M < N$.

The self-supervised learning framework consists of two steps: pre-training and transfer learning.

Pre-training step

The pre-training step involves training a neural network $f = h \circ g$ (here, $g$ is the feature extraction backbone and $h$ is the final classification layer) on an unlabeled source dataset $D_{source} = \{ X_i \}_{i=1}^N$ by minimizing a self-supervised learning loss $\mathcal{L}_{SSL}$.

As discussed above, the self-supervised learning objective helps the neural network learn semantically rich representations by extracting supervisory signals from the structure of the data itself. Mathematically, this step can be written as:

$$g^*, h^* = \arg\min_{g,\,h} \ \mathcal{L}_{SSL}(h \circ g;\ D_{source})$$
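As a rough illustration of the pre-training step, the sketch below minimizes a rotation-prediction cross-entropy loss with plain gradient descent. Everything concrete here is an assumption chosen to keep the example small: the one-layer ReLU backbone `W_g`, the linear head `W_h`, the random placeholder images, and the hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Unlabeled source data: N random 8x8 placeholder images.
N, side, hidden, n_rot = 64, 8, 16, 4
images = rng.random((N, side, side))

# Pseudo labels from the data itself: rotate by k * 90 degrees, predict k.
k = rng.integers(0, n_rot, size=N)
X = np.stack([np.rot90(img, int(r)) for img, r in zip(images, k)]).reshape(N, -1)
Y = np.eye(n_rot)[k]  # one-hot pseudo labels

# f = h o g: g is a one-layer ReLU backbone, h is a linear pretext head.
W_g = rng.normal(0.0, 0.1, size=(side * side, hidden))
W_h = rng.normal(0.0, 0.1, size=(hidden, n_rot))

lr, losses = 0.1, []
for step in range(200):
    Z = np.maximum(X @ W_g, 0.0)  # g(X)
    P = softmax(Z @ W_h)          # h(g(X)) as class probabilities
    losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))

    # Backpropagate the cross-entropy loss through h and then g.
    d_logits = (P - Y) / N
    grad_W_h = Z.T @ d_logits
    grad_W_g = X.T @ ((d_logits @ W_h.T) * (Z > 0))

    W_h -= lr * grad_W_h
    W_g -= lr * grad_W_g

g_star = W_g.copy()  # pre-trained backbone weights carried into transfer learning
```

Only the backbone weights are kept at the end, since the transfer step reuses the representations $g(.)$ rather than the pretext head.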

Transfer learning

Once the network is trained, its feature representations can be transferred to a downstream task using a small labeled target dataset $D_{target}=\{(X_i, Y_i)\}_{i=1}^M$ ($i$ indexes the images and $M$ is the total number of labeled images). Two standard ways to achieve this are linear classifiers and fine-tuning.

Linear classifier

Keeping the feature extractor $g^*(.)$ fixed, we can learn a small linear classifier $c(.)$ on top of it by minimizing the cross-entropy loss $\mathcal{L}_{CE}$ over the target dataset $D_{target}$:

$$c^* = \arg\min_{c} \ \mathcal{L}_{CE}(c \circ g^*;\ D_{target})$$
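The linear classifier setup can be sketched as follows. The weights `W_g_star` are a hypothetical stand-in for a pre-trained backbone (random values here, not real pre-trained weights), and the target data, dimensions, and learning rate are likewise assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical stand-ins: a "pre-trained" backbone g* and a small labeled target set.
M, in_dim, feat_dim, n_classes = 40, 64, 16, 3
W_g_star = rng.normal(0.0, 0.1, size=(in_dim, feat_dim))
X = rng.random((M, in_dim))
Y = np.eye(n_classes)[rng.integers(0, n_classes, size=M)]  # one-hot labels

frozen_copy = W_g_star.copy()               # kept only to check g* stays fixed
features = np.maximum(X @ W_g_star, 0.0)    # g*(X), computed once because g* is frozen

# Linear classifier c: the only trainable parameters.
W_c = np.zeros((feat_dim, n_classes))

lr, losses = 0.1, []
for step in range(200):
    P = softmax(features @ W_c)
    losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))
    W_c -= lr * features.T @ (P - Y) / M    # gradient step on c only
```

Because the backbone never changes, its features can be cached once and reused at every step, which is one reason this evaluation is cheap.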

Fine-tuning

Fine-tuning uses the weights of a trained neural network as initialization and optimizes them further, typically for only a few epochs with a small learning rate, on a target downstream task that usually has few labeled samples. This is unlike regular training, where the neural network is trained from scratch on a large number of data points.

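Using the weights of the feature extractor $g^*(.)$ as a good initialization point, we optimize the whole network (the backbone $g$, initialized to $g^*$, and the classifier $c$) to minimize the cross-entropy loss $\mathcal{L}_{CE}$ over the target dataset $D_{target}$:

$$\hat{g}, \hat{c} = \arg\min_{g,\,c} \ \mathcal{L}_{CE}(c \circ g;\ D_{target}), \quad g \text{ initialized to } g^*$$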
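A minimal fine-tuning sketch is shown below. As before, `W_g_star` is a hypothetical stand-in for pre-trained backbone weights, and the data, layer sizes, learning rate, and epoch count are illustrative assumptions. With full-batch gradient descent, each loop iteration corresponds to one epoch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical stand-ins: "pre-trained" weights g* and a small labeled target set.
M, in_dim, feat_dim, n_classes = 40, 64, 16, 3
W_g_star = rng.normal(0.0, 0.1, size=(in_dim, feat_dim))
X = rng.random((M, in_dim))
Y = np.eye(n_classes)[rng.integers(0, n_classes, size=M)]  # one-hot labels

W_g = W_g_star.copy()  # backbone starts from g* instead of random weights
W_c = rng.normal(0.0, 0.1, size=(feat_dim, n_classes))  # new classifier c

lr, epochs, losses = 0.05, 20, []  # small learning rate, few epochs
for epoch in range(epochs):
    Z = np.maximum(X @ W_g, 0.0)
    P = softmax(Z @ W_c)
    losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))

    # Unlike the linear classifier setup, gradients flow into the backbone too.
    d_logits = (P - Y) / M
    grad_W_c = Z.T @ d_logits
    grad_W_g = X.T @ ((d_logits @ W_c.T) * (Z > 0))

    W_c -= lr * grad_W_c
    W_g -= lr * grad_W_g
```

The only structural difference from the frozen-backbone setup is the update to `W_g`: the backbone drifts away from its pre-trained starting point as it adapts to the target task.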
