Understanding Neural Machine Translation

Learn the workings of neural machine translation.

Now that we have an appreciation for how MT has evolved over time, let’s try to understand how state-of-the-art NMT works. First, we’ll look at the model architecture used by neural machine translators, and then we’ll move on to the actual training algorithm.

Intuition behind NMT systems

First, let’s understand the intuition underlying an NMT system’s design. Say we’re fluent in English and German and are asked to translate the following sentence into German:

I went home.

This sentence translates to the following:

Ich ging nach Hause.

Although a fluent speaker might take only a few seconds to translate this, a certain process produces the translation. First, we read the English sentence. Then, we form a thought or concept in our mind about what the sentence represents or implies. Finally, we translate the sentence into German. The same idea is used for building NMT systems (see figure below). The encoder reads the source sentence (similar to reading the English sentence). The encoder then outputs a context vector, which corresponds to the thought or concept we formed after reading the sentence. Finally, the decoder takes in the context vector and outputs the translation in German.
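The read-then-translate process above can be sketched in code. The following is a minimal, untrained NumPy sketch of the encoder-decoder idea, not the actual architecture used by any particular NMT system: all vocabularies, weight matrices, and the hidden size are hypothetical placeholders chosen for illustration. The encoder compresses the source sentence into a single context vector, and the decoder generates target tokens from that vector.

```python
import numpy as np

# Hypothetical toy vocabularies for the example sentence pair
src_vocab = {"I": 0, "went": 1, "home": 2}
tgt_vocab = {"<s>": 0, "Ich": 1, "ging": 2, "nach": 3, "Hause": 4, "</s>": 5}

rng = np.random.default_rng(0)
hidden = 8  # size of the context vector (assumed)

# Randomly initialized (untrained) embeddings and recurrent weights
E_src = rng.normal(size=(len(src_vocab), hidden))
W_enc = rng.normal(size=(hidden, hidden)) * 0.1
E_tgt = rng.normal(size=(len(tgt_vocab), hidden))
W_dec = rng.normal(size=(hidden, hidden)) * 0.1
W_out = rng.normal(size=(hidden, len(tgt_vocab))) * 0.1

def encode(src_ids):
    """Read the source sentence token by token; the final hidden
    state plays the role of the context vector (the 'thought')."""
    h = np.zeros(hidden)
    for i in src_ids:
        h = np.tanh(E_src[i] + W_enc @ h)
    return h

def decode(context, max_len=6):
    """Start from the context vector and greedily emit target
    tokens until </s> or max_len is reached."""
    h = context
    tok = tgt_vocab["<s>"]
    out = []
    for _ in range(max_len):
        h = np.tanh(E_tgt[tok] + W_dec @ h)
        tok = int(np.argmax(h @ W_out))  # pick the highest-scoring token
        if tok == tgt_vocab["</s>"]:
            break
        out.append(tok)
    return out

context = encode([src_vocab[w] for w in ["I", "went", "home"]])
translation = decode(context)  # weights are untrained, so output is arbitrary
```

With trained weights, `decode` would ideally emit the token IDs for "Ich ging nach Hause"; here the point is only the data flow: source tokens in, one context vector, target tokens out.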
