Summary: Transformers

Review what we've learned in this chapter.

How transformer models work

In this chapter, we talked about transformer models. First, we looked at the transformer at a microscopic level to understand the inner workings of the model. We saw that transformers use self-attention, a powerful technique that lets the model attend to the other inputs in a text sequence while processing a given input. We also saw that, in addition to token embeddings, transformers use positional embeddings to inform the model about the relative positions of tokens. Finally, we discussed how transformers leverage residual connections (that is, shortcut connections) and layer normalization to improve model training.
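To tie these pieces together, here is a minimal NumPy sketch of a single transformer sub-layer. It assumes the standard scaled dot-product formulation of self-attention and sinusoidal positional embeddings; the weight matrices, dimensions, and random inputs are illustrative placeholders, not the exact implementation used elsewhere in the chapter.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    q = x @ W_q                        # queries
    k = x @ W_k                        # keys
    v = x @ W_v                        # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)    # how strongly each token attends to every other token
    weights = softmax(scores, axis=-1)
    return weights @ v                 # weighted sum of value vectors

# Toy example: 4 tokens, model dimension 8 (arbitrary sizes for illustration)
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
token_emb = rng.normal(size=(seq_len, d_model))

# Sinusoidal positional embeddings, added to the token embeddings
pos = np.arange(seq_len)[:, None]
i = np.arange(d_model)[None, :]
angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
pos_emb = np.where(i % 2 == 0, np.sin(angle), np.cos(angle))
x = token_emb + pos_emb

W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
attn_out = self_attention(x, W_q, W_k, W_v)

# Residual (shortcut) connection followed by layer normalization
residual = x + attn_out
mean = residual.mean(axis=-1, keepdims=True)
std = residual.std(axis=-1, keepdims=True)
layer_norm_out = (residual - mean) / (std + 1e-6)
print(layer_norm_out.shape)  # (4, 8)
```

Running the sketch shows the sub-layer preserving the input shape: each token's output is a normalized mix of its own representation and an attention-weighted summary of the whole sequence.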
