
Where Now?

Explore future opportunities in deep learning by understanding diverse machine learning paths such as computer vision, natural language processing, and image generation. Learn about CNNs, RNNs, GANs, and other advanced concepts, and gain insight into broader ML fields like reinforcement and semi-supervised learning to guide your continued growth after this course.

We are about to reach the end of this course, but there is plenty more ahead. There is no shortage of things to learn. In fact, even those who track the field closely find it hard to keep up with the barrage of exciting new ideas and techniques across the many areas of deep learning.

With so many possible paths to mastery, we might wonder which one to take. The next sections describe a few of those paths, including some topics we have not covered so far.

The path of vision

Our first option is to follow the path of vision that we discussed in the previous lessons: computer vision and CNNs. We picked image recognition as the storyline for this course because it makes for nice concrete examples, but even then, we only scratched the surface.

There is a lot more to do and learn in computer vision beyond recognizing images. One prominent subfield of computer vision these days is object detection. While image recognition answers questions like: “Does this picture represent a platypus?” object detection answers questions like: “Where are the platypuses in this picture?” As you can imagine, that’s a crucial technology for self-driving cars.

Computer vision is not just about static images but also about video. There are many fascinating use cases for computer vision applied to video, including pose estimation, which detects the position of a human figure, and motion estimation, which tracks the movement of objects.

To delve deeper into computer vision, we should learn more about CNNs. We may look up a technique called transfer learning, which allows us to reuse a pre-trained model on a different dataset. Transfer learning allows us to download a model that might have been trained on a large cluster of GPUs and complete the training on our home machine. That technique can be useful in all areas of supervised learning, but it’s most commonly used with CNNs.
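The idea of transfer learning can be sketched without any deep learning framework at all. In this toy example, a frozen random projection stands in for the pre-trained base (in practice it would be a CNN trained on a large dataset), and we train only a small classification head on top of it; all names and dimensions here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained base: a frozen projection that maps raw
# inputs to "features". In real transfer learning this would be a CNN
# trained elsewhere on a huge dataset, with its weights frozen.
W_base = rng.normal(size=(20, 8))          # frozen, never updated

def extract_features(x):
    return np.tanh(x @ W_base)

# Tiny synthetic binary classification task.
X = rng.normal(size=(100, 20))
y = (X[:, 0] > 0).astype(float)

# The new "head" we train on top of the frozen base.
w_head = np.zeros(8)
b_head = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(X, y):
    p = sigmoid(extract_features(X) @ w_head + b_head)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

initial = loss(X, y)
lr = 0.5
for _ in range(200):                       # only the head's weights move
    feats = extract_features(X)
    p = sigmoid(feats @ w_head + b_head)
    w_head -= lr * (feats.T @ (p - y) / len(y))
    b_head -= lr * np.mean(p - y)

final = loss(X, y)                         # lower than initial
```

Training only the head is cheap because the expensive part, the base, is computed once and never updated; that is what makes finishing the training on a home machine feasible.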

The path of language

Another area of ML that’s in full bloom these days is natural language processing, which deals with language in forms such as speech and text. Just as computer vision has been transformed by CNNs, natural language processing is the domain of recurrent neural networks, or RNNs for short.

Language works like a sequence of information, where the meaning of a word depends on the words before and after it. Fully connected networks do not work well with this kind of data because they have no memory. During training, such a network takes a batch of independent examples, tweaks its weights to fit them, and then forgets about them. With no memory of what came before, a fully connected network gets stumped in the middle of a sentence, staring at a word out of context, without hope of grokking its meaning.

By contrast, recurrent neural networks have a form of memory. While the information in a regular network only moves forward, the information in RNNs can loop back into the same layer or an earlier one, as shown in the diagram below:

Because of those loops, an RNN can process sequence-like data. It understands a piece of information based on the information that came before it—like our brain is doing right now as we read these sentences.

Loops make a neural network more complicated. In a sense, they also make it deeper. Think about it: if a layer feeds back into itself from one iteration to the other, we can imagine unrolling it into a sequence of regular layers, each representing one iteration of the original layer.
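The equivalence between the looped and the unrolled views can be made concrete with a few lines of numpy. This minimal sketch (with made-up dimensions and random, untrained weights) applies the same recurrent step in a loop, then writes out the same computation as an explicit chain of steps, one per time step:

```python
import numpy as np

rng = np.random.default_rng(1)

# A minimal recurrent layer: the same weights are applied at every step,
# and the hidden state h loops back into the layer.
W_x = rng.normal(size=(3, 4)) * 0.1   # input -> hidden
W_h = rng.normal(size=(4, 4)) * 0.1   # hidden -> hidden (the "loop")

def rnn_step(x_t, h_prev):
    return np.tanh(x_t @ W_x + h_prev @ W_h)

sequence = rng.normal(size=(5, 3))    # 5 time steps, 3 features each

# Looped form: one layer, applied repeatedly.
h = np.zeros(4)
for x_t in sequence:
    h = rnn_step(x_t, h)

# Unrolled form: the same computation written as a chain of "layers",
# one per time step, all sharing the same weights.
h0 = np.zeros(4)
h1 = rnn_step(sequence[0], h0)
h2 = rnn_step(sequence[1], h1)
h3 = rnn_step(sequence[2], h2)
h4 = rnn_step(sequence[3], h3)
h5 = rnn_step(sequence[4], h4)

# Both forms produce the identical final hidden state.
assert np.allclose(h, h5)
```

The unrolled form makes it visible why a shallow RNN behaves like a deep network: gradients flowing back through those five chained steps face the same vanishing-gradient risk as gradients in a five-layer network.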

For that reason, even pretty shallow RNNs suffer from the same problems as deeper networks, like the vanishing gradient. In the late 1990s, researchers did find a reliable way to ameliorate those problems when they invented the long short-term memory architecture. That architecture is the basis of today’s surprisingly accurate text processing systems, such as Google Translate.

So, in general, we should focus more on CNNs if we want to explore computer vision, and RNNs if we are fascinated by natural language processing. Now let’s look at another option that’s a bit more far out.

The path of image generation

Image generation is about synthesizing images, either by modifying existing ones or creating new ones from scratch. The most amazing invention in this field is generative adversarial networks, or GANs, invented by a young researcher called Ian Goodfellow in 2014. GANs are the technology behind those uncanny fake pictures and videos that you might have seen on the Internet.

Here is a concrete example to help us understand GANs. To experiment with this concept, let’s select a bunch of horse pictures from the CIFAR-10 dataset. Here are a few of those:

Imagine building a CNN to differentiate horse pictures from other pictures that do not contain a horse. Let’s call this network the discriminator, as shown in this diagram:

Now imagine a second, somewhat unusual neural network. This network takes a random sequence of bytes as input and passes it through its parameterized model to turn it into an image. It might use an architecture similar to a regular CNN, only tweaked to generate images as output instead of taking them as input, as shown in the next diagram:

This second network is called the generator, even though all it generates before it is trained is a random pixel jam.
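The two networks can be sketched in a few lines of numpy. To keep the sketch short, single fully connected layers stand in for the real architectures (a real generator would use transposed convolutions and a real discriminator would be a CNN), and the weights are random and untrained, so the generator produces exactly the “random pixel jam” described above:

```python
import numpy as np

rng = np.random.default_rng(2)

# Untrained generator stand-in: maps a random noise vector to a
# 32x32x3 "image" (CIFAR-10 dimensions).
noise_dim = 16
W_gen = rng.normal(size=(noise_dim, 32 * 32 * 3)) * 0.05

def generator(z):
    img = np.tanh(z @ W_gen)              # pixel values in [-1, 1]
    return img.reshape(32, 32, 3)

# Untrained discriminator stand-in: maps an image to the probability
# that it is a real horse photo.
W_disc = rng.normal(size=(32 * 32 * 3,)) * 0.01

def discriminator(img):
    logit = img.reshape(-1) @ W_disc
    return 1.0 / (1.0 + np.exp(-logit))   # probability of "real"

z = rng.normal(size=noise_dim)
fake = generator(z)                       # random pixel jam, pre-training
p_real = discriminator(fake)              # some probability in (0, 1)
```

Note the plumbing: the generator’s output has exactly the shape the discriminator expects as input, which is what allows the two to be chained together in the next step.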

Now comes the brilliant idea behind GANs. Connect the output of the generator to the input of the discriminator, and train the two networks in tandem, as shown below:

The trick of this architecture is that the loss functions of the two networks are carefully chosen to pit them against one another:

  • The discriminator receives a random mix of horse pictures and pictures from the generator. It gets a lower loss when it correctly tells apart the horses from the generated images.
  • On its part, the generator gets rewarded when the discriminator gets it wrong. Its loss is higher if the discriminator identifies its output as generated and lower if the discriminator confuses it for a horse.

Can we see where this is going? The whole system works like a competition between an expert that identifies fake images, and a forger that attempts to counterfeit them. As a result, the discriminator and the generator improve together as we train them.
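The opposing losses described in the bullets above can be written out directly. This sketch uses the standard GAN cross-entropy losses, where `d_fake` denotes the discriminator’s estimated probability that a generated image is real; the numeric values are hypothetical, chosen only to illustrate the tug-of-war:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # Low when real images score high and generated images score low.
    return -np.log(d_real) - np.log(1.0 - d_fake)

def generator_loss(d_fake):
    # Low when the discriminator mistakes the fake for a real image.
    return -np.log(d_fake)

# Hypothetical discriminator outputs on a fake, early vs. late in
# training, as the generator gets better at fooling it.
early, late = 0.1, 0.9

# The generator's loss falls as d_fake climbs toward 1...
assert generator_loss(late) < generator_loss(early)

# ...while the discriminator's loss on that same fake rises.
assert discriminator_loss(0.9, late) > discriminator_loss(0.9, early)
```

One network’s gain is literally the other network’s loss, which is why training them in tandem drives both to improve.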

Following are a few randomly chosen images from the generator, taken every 50 iterations of training:

In the beginning, the generator outputs meaningless noise. After 200 iterations, it already seems to output images that are mostly green at the bottom, mostly blue at the top, and contain a vague brownish shape. Those are common traits of horse pictures, which usually involve blue skies and grassy fields. At the same time, the discriminator is improving at recognizing synthetic pictures.

After a few hours and about one million iterations of training, the generator has become quite proficient:

Just to be clear, none of these pictures are a straight copy of a CIFAR-10 horse. Indeed, the generator has never ever seen a horse picture—it just learned what a horse looks like by learning how to cheat the discriminator. As a result, it forges these weirdly crooked but undisputedly horse-like creatures.

These images come from a humble laptop and a pair of quickly hacked-together neural networks. If we want to see the state of the art, the Internet has more generated images than we can handle. The site “This Person Does Not Exist,” for one, uses a GAN to generate jaw-droppingly realistic human faces. (Google’s famous DeepDream images, by contrast, come from a different technique that visualizes the features a CNN has learned, not from a GAN.)

That’s it about GANs. Before leaving the topic of image generation, we should at least mention style transfer—another brilliant idea that has become popular in the last few years. We have seen it before: it’s the technique that redraws photos in the style of pencil drawings, watercolor, or famous painters. Style transfer uses an optimization process similar to the one that trains neural networks, but instead of optimizing the network’s weights, it optimizes the image’s pixels directly. Here is a portrait redrawn in the style of Edvard Munch:

Once again, the Internet has more examples, tutorials, and style transfer web apps than we probably need. If we want to take the path of image generation, then we are well covered.

The broad path

We have tried to keep this course a straight, narrow path through the vast field of ML. We focused on supervised learning, and specifically on supervised learning with neural networks. Going forward, we might aim for broader knowledge, as opposed to deeper knowledge. In that case, we should look at other machine learning flavors and techniques off the path of this course.

For one, the field of reinforcement learning deserves more attention than we could grant it in this course. Its recent successes rival those of supervised learning. One story that hit the front pages was AlphaGo, the program that caused a sensation by beating a human champion at the ancient game of Go; it uses reinforcement learning.

Unsupervised learning is ML applied to unlabeled data, and it’s also a trending topic these days. Among other things, people are coming up with clever ways to use neural networks to compress or cluster data.

Another exciting branch of ML is called semi-supervised learning, and it uses a mixture of labeled and unlabeled data to train a machine learning system. It’s often easier to collect large amounts of unlabeled data than labeled data, and semi-supervised learning can be a brilliant way to put all that extra data to use.

However, even if we stick to supervised learning, we have plenty to look at, especially if we like computer science or statistics. While this course focused on neural networks, there are other algorithms worth learning, and some of them still work better than neural networks in specific circumstances. For a couple of examples, look at support vector machines and random forests.

The hands-on path

Last but not least, here is one final path into the world of machine learning, and the one that developers may find most exciting. Instead of cherry-picking a subfield of ML upfront, keep our options open and join a competition on Kaggle, a popular platform that organizes ML challenges.