Transformer Architecture: Embedding Layers
Learn how transformer architectures utilize embedding layers to represent words and their positions within sequences. This lesson explains token embeddings and positional embeddings, the mathematical basis for positional encodings, and how these embeddings combine to enable transformers to understand word context and order.
Word embeddings provide a representation of words that preserves their semantics, based on the contexts in which the words are used. In other words, two words that appear in similar contexts will have similar word vectors. For example, the words “cat” and “dog” will have similar representations, whereas “cat” and “volcano” will have vastly different ones.
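To make this concrete, here is a minimal sketch of how similarity between word vectors is measured with cosine similarity. The 4-dimensional vectors below are hand-picked toy values chosen purely for illustration; a trained model would learn much higher-dimensional vectors from data.

```python
import numpy as np

# Hypothetical, hand-picked 4-dimensional word vectors for illustration only.
# Real embedding layers learn vectors with hundreds of dimensions during training.
embeddings = {
    "cat":     np.array([0.9, 0.8, 0.1, 0.0]),
    "dog":     np.array([0.8, 0.9, 0.2, 0.1]),
    "volcano": np.array([0.0, 0.1, 0.9, 0.8]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1.0 mean similar direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))      # high (~0.99): similar contexts
print(cosine_similarity(embeddings["cat"], embeddings["volcano"]))  # low  (~0.12): unrelated words
```

Words used in similar contexts end up pointing in similar directions in the embedding space, which is exactly what the cosine similarity captures.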
Word vectors were initially introduced in the paper titled