The Transformer's Encoder

Formulate the encoder of a transformer by combining all the building blocks.

Even though multi-head self-attention could be a stand-alone building block, the creators of the transformer add a stack of two linear layers with an activation in between, followed by another skip connection and layer normalization.

Add linear layers to form the encoder

Suppose x is the output of the multi-head self-attention. What we depict as linear in the diagram looks something like this:

import torch
import torch.nn as nn

dim = 512
dim_linear_block = 1024 ## usually a multiple of dim
dropout = 0.1

# x stands for the output of the multi-head self-attention: (batch, tokens, dim)
x = torch.rand(1, 10, dim)

norm = nn.LayerNorm(dim)
linear = nn.Sequential(
            nn.Linear(dim, dim_linear_block),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim_linear_block, dim),
            nn.Dropout(dropout)
        )

# skip connection followed by layer normalization
out = norm(linear(x) + x)

Dropout helps avoid overfitting. Strictly speaking, this block is not a linear model: as we saw in the second chapter, it is a feedforward neural network, also called an MLP (multi-layer perceptron). The code illustrates that there is nothing new here.

The idea of the linear layers after multi-head self-attention is to project the representation into a higher-dimensional space and then back into the original space. This helps address some stability issues and counter bad initializations.

Finally, this is the transformer’s encoder:
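Putting the blocks together, a single encoder layer can be sketched as follows. This is a minimal sketch, not the full implementation built in this course: for brevity it uses PyTorch's built-in nn.MultiheadAttention in place of the multi-head self-attention block we constructed earlier, and the class and variable names are illustrative.

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, dim=512, heads=8, dim_linear_block=1024, dropout=0.1):
        super().__init__()
        # multi-head self-attention (PyTorch's built-in module used here for brevity)
        self.mhsa = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.norm_1 = nn.LayerNorm(dim)
        self.norm_2 = nn.LayerNorm(dim)
        self.linear = nn.Sequential(
            nn.Linear(dim, dim_linear_block),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim_linear_block, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # attention block with skip connection and layer normalization
        attn_out, _ = self.mhsa(x, x, x)
        y = self.norm_1(attn_out + x)
        # feedforward block with skip connection and layer normalization
        return self.norm_2(self.linear(y) + y)

layer = EncoderLayer()
tokens = torch.rand(1, 10, 512)   # (batch, sequence, dim)
print(layer(tokens).shape)        # torch.Size([1, 10, 512])

Note that the output has the same shape as the input, which is what allows the encoder to stack several identical layers of this form (six in the original transformer paper).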
