Getting Started

Get to know the motivation behind learning about transformers and taking this course.

Why should we learn about transformers?

Transformer models are a game-changer for Natural Language Understanding (NLU), a subset of Natural Language Processing (NLP), which has become one of the pillars of artificial intelligence in a global digital economy.

Transformer models mark the beginning of a new era in artificial intelligence. Language understanding has become a cornerstone of language modeling, chatbots, personal assistants, question answering, text summarization, speech-to-text, sentiment analysis, machine translation, and more.

We are witnessing the expansion of social networks vs. physical encounters, e-commerce vs. physical shopping, digital vs. physical newspapers, streaming vs. physical theaters, remote doctor consultations vs. physical visits, remote work vs. on-site tasks, and similar trends in hundreds more domains. It would be incredibly difficult for society to use web browsers, streaming services, and any digital activity involving language without Artificial Intelligence (AI) language understanding.

The paradigm shift in our societies from physical interaction to massive digital information has forced artificial intelligence into a new era. Artificial intelligence has evolved to billion-parameter models to meet the challenge of trillion-word datasets.

The transformer architecture is both revolutionary and disruptive. It breaks with the past, leaving the dominance of Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) behind. BERT and GPT models abandoned recurrent network layers and replaced them with self-attention. Transformer models outperform RNNs and CNNs. The 2020s are experiencing a major change in AI.

Transformer encoders and decoders contain attention heads: individual parallel pathways through which the model processes information, allowing it to focus on different parts of the input data and learn different patterns and relationships simultaneously. These heads train separately, taking full advantage of parallel, cutting-edge hardware. Attention heads can run on separate GPUs, opening the door to billion-parameter models and soon-to-come trillion-parameter models. OpenAI trained a 175 billion parameter GPT-3 transformer model on a supercomputer with 10,000 GPUs and 285,000 CPU cores.
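The independence of attention heads can be sketched in a few lines. The following toy example (all shapes, weights, and values are arbitrary illustrations, not taken from any real model) computes scaled dot-product attention for two heads separately, then merges them, mirroring how heads can be parallelized across devices:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention output for one head: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted sum of the values

# Toy sequence: 4 tokens, model dimension 8, split across 2 independent heads
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
heads = []
for h in range(2):                                   # each head could run on its own GPU
    W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
    heads.append(scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v))
out = np.concatenate(heads, axis=-1)                 # heads are merged only after running independently
print(out.shape)  # (4, 8)
```

Because each head only needs the shared input and its own weight matrices, the per-head computations have no dependencies on one another, which is what makes this layer so amenable to parallel hardware.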

The increasing amount of data requires training AI models at scale. As such, transformers pave the way to a new era of parameter-driven AI. Learning to understand how hundreds of millions of words fit together in sentences requires a tremendous number of parameters.

Transformer models such as Google BERT and OpenAI GPT-3 have taken emergent abilities to another level. Transformers can perform hundreds of NLP tasks for which they were not explicitly trained.

Transformers can also learn image classification and reconstruction by embedding images as sequences of words. This course will introduce you to cutting-edge computer vision transformers such as Vision Transformers (ViT), CLIP, and DALL·E.
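The core idea of treating an image as a sequence can be illustrated in a simplified form. The sketch below (purely illustrative, not the actual ViT implementation, which also applies a learned linear projection and position embeddings) cuts an image into fixed-size patches and flattens each one into a vector, producing a token sequence like the one a text transformer consumes:

```python
import numpy as np

def image_to_patch_sequence(image, patch=4):
    """Split an H x W x C image into non-overlapping patches and flatten each one."""
    H, W, C = image.shape
    rows, cols = H // patch, W // patch
    patches = []
    for r in range(rows):
        for c in range(cols):
            block = image[r*patch:(r+1)*patch, c*patch:(c+1)*patch, :]
            patches.append(block.reshape(-1))    # one "token" per patch
    return np.stack(patches)                     # shape: (num_patches, patch*patch*C)

img = np.zeros((16, 16, 3))                      # toy 16x16 RGB image
seq = image_to_patch_sequence(img)
print(seq.shape)  # (16, 48): 16 patch tokens, each a 48-dimensional vector
```

Once the image is in this sequence form, the same attention machinery used for sentences can attend across patches instead of words.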

Intended audience

This course is not an introduction to Python programming or machine learning concepts. Instead, it focuses on deep learning for machine translation, speech-to-text, text-to-speech, language modeling, question answering, and many more NLP domains.

The learners who can benefit the most from this course are:

  • Deep learning and NLP practitioners who are familiar with Python programming.

  • Data analysts and data scientists who want an introduction to AI language understanding in order to process the growing volume of language-driven data and functions.

What is the course about?

The course covers three broad topics; the content of each is outlined below:

Introduction to transformer architecture

  • The basic definition of transformers, alongside their ecosystem and the properties of foundation models

  • The different components present in a standard transformer model

  • The techniques employed to fine-tune and pretrain various transformer models

  • The steps needed to pretrain RoBERTa models

Application of transformers for NLU and Generation (NLUG)

  • The magic of transformer models with downstream NLP tasks via machine translation

  • The aspects of OpenAI's GPT-2 and GPT-3 transformers

  • The concepts and architecture of the T5 transformer model and how the quality of data encoding is improved

  • The ways in which transformers are able to understand the context of a text

Advanced Language Understanding techniques

  • The ways in which transformers are able to understand a text, or a story, and display reasoning skills

  • The methods by which transformers have improved sentiment analysis, helping us to understand the different perspectives of content

  • The hidden details behind the working of transformers, alongside more advanced versions of transformer models

  • The different vision transformers and how they are tested on computer vision tasks, such as generating computer images

The structure of this course can be seen in the diagram below:

Now that we have insight into what’s coming in the course, let’s begin our learning journey!