
Sequence to Sequence Models

Understand the fundamentals of sequence to sequence (Seq2Seq) models, their role in machine translation, and how encoder-decoder architectures process input and output sequences. Learn about the challenges RNNs face with long sequences and why attention mechanisms and transformers improve performance on these tasks.

To understand attention, we need to discuss how it emerged from natural language applications. Machine translation, translating an input sentence from one language to another, is a great use case.

So how did we process text and sentences in natural language applications before attention?

Formulating text in terms of machine learning

Let’s start by formulating the problem in terms of machine learning.

Thinking in terms of input and output first will help us grasp this category of models.

The representation is quite intuitive: sentences can be regarded as sequences of words.
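To make this concrete, here is a minimal sketch in Python. The sentence and the tiny vocabulary are made up for illustration: each word is mapped to an integer index, so a sentence becomes a sequence of indices that a model can consume.

```python
# A toy illustration: map each word to an integer index, so a sentence
# becomes a sequence of indices. The sentence and vocabulary are made up.
sentence = "the cat sat on the mat"

# Build a tiny vocabulary from the words in this one sentence.
vocab = {word: idx for idx, word in enumerate(sorted(set(sentence.split())))}

# The input sequence: one integer per word, in order.
input_sequence = [vocab[word] for word in sentence.split()]

print(vocab)           # {'cat': 0, 'mat': 1, 'on': 2, 'sat': 3, 'the': 4}
print(input_sequence)  # [4, 0, 3, 2, 4, 1]
```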

Before introducing any complicated model, let's take a high-level look: we represent both the input and output sentences as sequences.

[Figure: Seq2Seq. The input sequence passes through an encoder into an intermediate representation z, which a decoder turns into the output sequence.]

Ideally, the model first has to understand the input sentence in the source language. This part is captured by the so-called "encoder", which produces the intermediate representation, denoted as z in the diagram.

Then we need to convert that meaning into the other language, so let's call this second model the "decoder".

In fact, that is exactly the standard terminology: an encoder-decoder architecture.

This category of approaches is called Sequence to Sequence (Seq2Seq), and it works pretty much like the diagram shown above.
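Below is a minimal sketch of this encoder-decoder pipeline in PyTorch. The choice of GRU layers, the embedding and hidden sizes, and the vocabulary sizes are all assumptions made for illustration; the point is only that the encoder compresses the whole input sequence into one vector z, and the decoder generates the output sequence from it.

```python
# A minimal Seq2Seq sketch (illustrative, not a specific published model):
# an RNN encoder compresses the input sequence into a single vector z,
# and an RNN decoder generates the output sequence conditioned on z.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):               # src: (batch, src_len)
        _, z = self.rnn(self.embed(src))  # z: (1, batch, hidden_dim)
        return z                          # the intermediate representation


class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, z):            # tgt: (batch, tgt_len)
        outputs, _ = self.rnn(self.embed(tgt), z)
        return self.out(outputs)          # scores over the target vocabulary


# Toy usage: a batch of 2 source "sentences" of length 5, targets of length 6.
enc = Encoder(vocab_size=100, emb_dim=32, hidden_dim=64)
dec = Decoder(vocab_size=120, emb_dim=32, hidden_dim=64)
src = torch.randint(0, 100, (2, 5))
tgt = torch.randint(0, 120, (2, 6))
logits = dec(tgt, enc(src))               # shape: (2, 6, 120)
```

Note that the entire input sentence is squeezed into the single fixed-size vector z. This bottleneck is precisely what makes long sequences difficult for plain RNN-based Seq2Seq models, and it is the problem that attention mechanisms were introduced to address.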

The elements of the sequence ...