
SimCLR Training Objective

Explore the SimCLR training objective: see how two augmented views of an image are processed through a neural network backbone and an MLP projection head to produce feature embeddings. Learn how the contrastive loss maximizes similarity between positive pairs and minimizes similarity among negative pairs by computing a similarity matrix. This lesson guides you step by step through implementing the SimCLR contrastive loss, preparing you to build self-supervised learning models for unlabeled data.

Now that we have two augmented versions of the input batch, $T_1(B)$ and $T_2(B)$, we'll look into the other components of the SimCLR training pipeline.

Network architecture

As shown in the figure below, the two augmented versions of an image $X_i$ (i.e., $T_1(X_i)$ and $T_2(X_i)$) are passed through the neural network $f(\cdot)$ to get the penultimate feature representations $h_{i1}$ and $h_{i2}$, respectively. These feature representations are then passed through a multilayer perceptron (MLP) projection head $g(\cdot)$ to get the feature embeddings $z_{i1}$ and $z_{i2}$ ...
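The forward pass described above can be sketched with plain NumPy. This is a toy illustration, not the lesson's actual network: the single-linear-layer "backbone", the layer sizes, and the additive-noise augmentations are all assumptions standing in for a real encoder (e.g., a ResNet) and real image augmentations.

```python
import numpy as np

rng = np.random.default_rng(0)

def backbone(x, W):
    # Stand-in for f(.): one linear layer + ReLU, playing the
    # role of a real encoder such as a ResNet. Returns the
    # penultimate feature representation h.
    return np.maximum(x @ W, 0.0)

def projection_head(h, W1, W2):
    # MLP projection head g(.): Linear -> ReLU -> Linear,
    # mapping h to the embedding z used by the contrastive loss.
    return np.maximum(h @ W1, 0.0) @ W2

# Toy dimensions (assumptions): a batch B of 4 flattened "images"
# with 8 features, 16-dim penultimate features h, 5-dim embeddings z.
B = rng.normal(size=(4, 8))
Wf = rng.normal(size=(8, 16))
W1 = rng.normal(size=(16, 16))
W2 = rng.normal(size=(16, 5))

# Two augmented views T1(B) and T2(B) -- here simply additive noise.
view1 = B + 0.1 * rng.normal(size=B.shape)
view2 = B + 0.1 * rng.normal(size=B.shape)

# Penultimate representations h_i1, h_i2 and embeddings z_i1, z_i2.
h1, h2 = backbone(view1, Wf), backbone(view2, Wf)
z1, z2 = projection_head(h1, W1, W2), projection_head(h2, W1, W2)

print(h1.shape, z1.shape)  # (4, 16) (4, 5)
```

Note that both views share the same weights: SimCLR uses a single network for both branches, so $z_{i1}$ and $z_{i2}$ live in the same embedding space and can be compared directly by the contrastive loss.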