Multi-Head Self-Attention

Explore how multi-head attention expands upon self-attention.

The idea of self-attention can be extended to multi-head attention. In essence, we run the attention mechanism several times in parallel.

Each time, we map the independent set of Key, Query, and Value matrices into different lower-dimensional spaces and compute the attention there. The individual output is called a "head". The mapping is achieved by multiplying each matrix with a separate weight matrix, denoted as $W_{i}^{K}, W_{i}^{Q} \in \mathbb{R}^{d_{\text{model}} \times d_{k}}$ and $W_{i}^{V} \in \mathbb{R}^{d_{\text{model}} \times d_{k}}$, where $i$ is the head index.

To compensate for the extra complexity, the output vector size is divided by the number of heads. Specifically, the vanilla transformer uses $d_{\text{model}} = 512$ and $h = 8$ heads, which gives us vector representations with $d_k = 64$.
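As a quick sanity check of this dimension bookkeeping, here is the arithmetic in a few lines of Python:

```python
d_model, h = 512, 8         # vanilla transformer settings
d_k = d_model // h          # 64: each head works in a lower-dimensional subspace
assert h * d_k == d_model   # concatenating the heads restores the model dimension
```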

With multi-head attention, the model has multiple independent paths (ways) to understand the input.

The heads are then concatenated and transformed using a square weight matrix $W^{O} \in \mathbb{R}^{d_{\text{model}} \times d_{\text{model}}}$, since $d_{\text{model}} = h \, d_{k}$.

Putting it all together, we get:

$$\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h) \, W^{O}$$

where $\text{head}_i = \text{Attention}\left(Q W_{i}^{Q}, \, K W_{i}^{K}, \, V W_{i}^{V}\right)$

where again:

$$W_{i}^{Q}, W_{i}^{K}, W_{i}^{V} \in \mathbb{R}^{d_{\text{model}} \times d_{k}}$$
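To make the formula concrete, here is a minimal NumPy sketch of multi-head attention. The function names (`attention`, `multi_head`) and the explicit per-head weight lists are our own simplifications for readability; practical implementations fuse the per-head projections into single matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head(Q, K, V, W_q, W_k, W_v, W_o):
    # W_q, W_k, W_v: lists of h projection matrices, each of shape (d_model, d_k)
    # W_o: output projection of shape (h * d_k, d_model)
    heads = [attention(Q @ W_q[i], K @ W_k[i], V @ W_v[i])
             for i in range(len(W_q))]
    return np.concatenate(heads, axis=-1) @ W_o
```

With random (untrained) weights and the vanilla dimensions, a call might look like this; the output keeps the input shape `(seq_len, d_model)`:

```python
d_model, h = 512, 8
d_k = d_model // h
seq_len = 10
rng = np.random.default_rng(0)

X = rng.normal(size=(seq_len, d_model))
W_q = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_k = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_v = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_o = rng.normal(size=(h * d_k, d_model))

out = multi_head(X, X, X, W_q, W_k, W_v, W_o)   # shape: (10, 512)
```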

Since heads are independent of each other, we can perform the self-attention computation in parallel on different workers.
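One common way to exploit that independence on a single device is to compute all heads in one batched operation rather than looping over them. The sketch below illustrates this; it reuses the `softmax` helper from the previous block, and the stacked weight shapes are again our own illustrative convention.

```python
def multi_head_batched(Q, K, V, W_q, W_k, W_v, W_o, h):
    # W_q, W_k, W_v: stacked per-head projections, each of shape (d_model, h * d_k)
    # W_o: output projection of shape (h * d_k, d_model)
    seq_len, _ = Q.shape
    d_k = W_q.shape[1] // h

    def split(x):
        # (seq_len, h * d_k) -> (h, seq_len, d_k): one slice per head
        return x.reshape(seq_len, h, d_k).transpose(1, 0, 2)

    q, k, v = split(Q @ W_q), split(K @ W_k), split(V @ W_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)        # (h, seq_len, seq_len)
    heads = softmax(scores) @ v                             # (h, seq_len, d_k)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, h * d_k)
    return concat @ W_o                                     # (seq_len, d_model)
```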
