Build a Transformer Encoder

It’s finally time to apply everything we have learned about Transformers. The best way to do that is to build a Transformer Encoder from scratch. We will start by developing all the subcomponents and, in the end, combine them to form the encoder. Let’s start with something simple.

Disclaimer: PyTorch has its own built-in Transformer and attention modules. However, we believe that you can get a solid understanding only if you develop them yourself.

Linear layers

A good first step is to build the linear subcomponent. A two-layer feedforward network with dropout in between is good enough. Here is what the forward pass should look like:

  1. First linear layer
  2. ReLU as an activation function
  3. Dropout
  4. Second linear layer

You can implement this yourself. Jump to the code below and finish the FeedForward module. Note that this is not meant as an exercise; it is only intended to solidify your understanding by revisiting how to build simple PyTorch modules.
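For reference, here is a minimal sketch of what such a FeedForward module might look like. The argument names `d_model`, `d_ff`, and the default dropout rate of 0.1 are our assumptions for illustration, not values fixed by the text:

```python
import torch
import torch.nn as nn


class FeedForward(nn.Module):
    """Position-wise feedforward block: Linear -> ReLU -> Dropout -> Linear.

    d_model, d_ff, and dropout=0.1 are illustrative choices, not
    prescribed by the lesson.
    """

    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.linear_1 = nn.Linear(d_model, d_ff)   # 1. first linear layer
        self.relu = nn.ReLU()                      # 2. activation
        self.dropout = nn.Dropout(dropout)         # 3. dropout
        self.linear_2 = nn.Linear(d_ff, d_model)   # 4. second linear layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear_2(self.dropout(self.relu(self.linear_1(x))))


# Quick shape check: the block maps (batch, seq_len, d_model)
# back to the same shape, as required inside an encoder layer.
ff = FeedForward(d_model=512, d_ff=2048)
x = torch.randn(2, 10, 512)
print(ff(x).shape)  # torch.Size([2, 10, 512])
```

Keeping the input and output dimensions equal to `d_model` is what lets this block slot into the residual connections of the encoder later on.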
