Masked Autoencoders: Decoder and Loss Function

Learn how to implement the decoder and loss function of Masked Autoencoders (MAE).

Decoder

The input to the MAE decoder consists of the full set of tokens, that is:

  • Encoded visible patches, and

  • Mask tokens.

Similar to SimMIM, a shared, learned mask token vector is used as a substitute for each missing (masked) patch in the input. The full set of tokens is then passed through a transformer network built from self-attention layers.
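For illustration, here is a minimal PyTorch sketch of how the decoder input could be assembled. The tensor shapes, the `mask_token` parameter, and the 4-layer decoder are assumptions made for this example, not the exact MAE configuration; the original implementation additionally restores the original patch order and adds positional embeddings before decoding.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
batch_size, num_patches, dim = 8, 196, 512
num_visible = 49                       # e.g., 25% of patches kept visible
num_masked = num_patches - num_visible

# Encoded visible patches coming out of the MAE encoder (shape assumed).
encoded_visible = torch.randn(batch_size, num_visible, dim)

# A single shared, learned mask token substitutes for every masked patch.
mask_token = nn.Parameter(torch.zeros(1, 1, dim))

# Repeat the shared mask token for all masked positions and concatenate
# it with the encoded visible tokens to form the full token set.
mask_tokens = mask_token.expand(batch_size, num_masked, dim)
full_tokens = torch.cat([encoded_visible, mask_tokens], dim=1)

# The full set of tokens is processed by transformer blocks with self-attention.
decoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
decoder = nn.TransformerEncoder(decoder_layer, num_layers=4)
decoded = decoder(full_tokens)         # (batch_size, num_patches, dim)
```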

The goal of the MAE decoder is to perform the image reconstruction task. Note that the MAE decoder is used only during pre-training; only the encoder is carried over to the transfer learning step. The decoder design can therefore be flexible, and you can opt for a shallow decoder to keep the training overhead minimal.
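The sketch below illustrates this flexibility under assumed hyperparameters: the decoder can be much shallower and narrower than the encoder because it is discarded after pre-training, and a linear head maps each decoded token back to a flattened pixel patch for reconstruction. The specific depths, widths, and patch size here are examples, not the values prescribed by the MAE paper.

```python
import torch
import torch.nn as nn

# Assumed configurations: a large encoder paired with a lightweight decoder.
encoder_cfg = dict(depth=24, dim=1024, heads=16)   # kept for transfer learning
decoder_cfg = dict(depth=4,  dim=512,  heads=8)    # used only during pre-training

# The decoder's final linear head maps each decoded token back to a
# flattened pixel patch (patch_size * patch_size * channels) so the
# reconstruction can be compared against the original image patches.
patch_size, channels = 16, 3
head = nn.Linear(decoder_cfg["dim"], patch_size * patch_size * channels)

decoded = torch.randn(8, 196, decoder_cfg["dim"])  # placeholder decoder output
pred_patches = head(decoded)                        # (8, 196, 768)
```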
