Introduction: Transformers

Get an overview of the transformers model.

Transformer models changed the game for most machine learning problems involving sequential data. They advanced the state of the art by a significant margin over the previous leaders, RNN-based models. One of the primary reasons the transformer model is so performant is that it has access to the whole sequence of items (e.g., a sequence of tokens) at once, whereas RNN-based models process one item at a time. The term “transformer” has come up several times in our conversations as a method that has outperformed other sequential models, such as LSTMs and GRUs. Now, we’ll learn more about transformer models.
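
To make that contrast concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the core computation inside a transformer. Everything in it is illustrative: the toy dimensions and the randomly initialized projection matrices (`W_q`, `W_k`, `W_v`) stand in for parameters that a real model would learn. The point to notice is that the score matrix relates every token to every other token in a single step.

```python
# Minimal self-attention sketch (illustrative only; random weights
# stand in for learned projection matrices).
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                     # 4 tokens, 8-dim embeddings

X = rng.normal(size=(seq_len, d_model))     # token embeddings

# Query, key, and value projections (learned in a real model)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Every token scores against every other token at once, so the model
# sees the whole sequence in one step (no recurrence needed).
scores = Q @ K.T / np.sqrt(d_model)         # shape: (seq_len, seq_len)

# Row-wise softmax turns scores into attention weights
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

output = weights @ V                        # contextualized token vectors
print(weights.shape)                        # (4, 4): each token attends to all 4
```

Contrast this with an RNN, which would consume `X` one row at a time and carry information forward only through a fixed-size hidden state.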

Chapter overview

We’ll first learn about the transformer model in detail. Then, we’ll discuss a specific model from the transformer family known as Bidirectional Encoder Representations from Transformers (BERT). Finally, we’ll see how we can use this model to complete a question-answering task (a small hands-on preview appears after the topic list below).

Specifically, we’ll cover the following main topics:

  • Transformer architecture

  • Understanding BERT

  • Using BERT to answer questions
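
As a preview of the last topic, the sketch below uses the Hugging Face transformers library’s question-answering pipeline. The specific checkpoint (distilbert-base-cased-distilled-squad) is an assumption chosen for illustration; the chapter’s own example may use a different BERT variant.

```python
# A hedged preview of question answering with a BERT-family model.
# The checkpoint below is an illustrative assumption, not necessarily
# the one used later in this chapter.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",  # assumed example checkpoint
)

context = (
    "BERT (Bidirectional Encoder Representations from Transformers) is a "
    "transformer-based model pretrained on large text corpora."
)
result = qa(question="What does BERT stand for?", context=context)
print(result["answer"])  # span extracted from the context
```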
