
Transformer Architecture: Embedding Layers

Learn how transformer architectures utilize embedding layers to represent words and their positions within sequences. This lesson explains token embeddings and positional embeddings, the mathematical basis for positional encodings, and how these embeddings combine to enable transformers to understand word context and order.

Word embeddings provide a semantic-preserving representation of words based on the context in which the words are used. In other words, if two words appear in similar contexts, they will have similar word vectors. For example, the words “cat” and “dog” will have similar representations, whereas “cat” and “volcano” will have vastly different representations.
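As a quick illustration of this idea, the sketch below compares a few hand-picked toy vectors with cosine similarity. The vectors and values are made up for the example; real embeddings are learned from data and have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 for similar directions, near 0 for unrelated ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors chosen by hand purely to illustrate the idea.
embeddings = {
    "cat":     np.array([0.9, 0.8, 0.1, 0.0]),
    "dog":     np.array([0.8, 0.9, 0.2, 0.1]),
    "volcano": np.array([0.0, 0.1, 0.9, 0.8]),
}

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))      # high (~0.99)
print(cosine_similarity(embeddings["cat"], embeddings["volcano"]))  # low  (~0.12)
```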

Word vectors were initially introduced in the paper Efficient Estimation of Word Representations in Vector Space by Mikolov et al. (https://arxiv.org/pdf/1301.3781.pdf), which proposed two variants: skip-gram and continuous bag-of-words. Embeddings work by first defining a large matrix of size $V \times E$, where $V$ is the size of the vocabulary and $E$ is the dimensionality of the embeddings.
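A minimal sketch of this lookup-table view is shown below, assuming a small, randomly initialized matrix; in practice the matrix is learned during training, and the specific sizes and token ids here are illustrative only.

```python
import numpy as np

# Hypothetical sizes: V = vocabulary size, E = embedding dimension.
V, E = 10_000, 300
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(scale=0.02, size=(V, E))  # the V x E matrix

# Each token in a sequence is an integer id; looking up its row in the
# matrix yields its E-dimensional embedding vector.
token_ids = np.array([42, 7, 1337])             # made-up ids for a short sequence
token_embeddings = embedding_matrix[token_ids]  # one row per token
print(token_embeddings.shape)                   # (3, 300)
```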