Multilayer Perceptrons and Backpropagation
Explore the fundamentals of multilayer perceptrons and the backpropagation algorithm, focusing on how gradients are computed and used to train deep neural networks. Understand the mathematical principles behind weight updates, the role of activation functions like sigmoid and ReLU, and practical implementation details using TensorFlow 2. Gain insights into challenges such as vanishing gradients and the historical evolution of these concepts in deep learning.
While large-scale research funding for neural networks declined after the publication of Perceptrons and did not recover until the 1980s, researchers still recognized that these models had value, particularly when assembled into multilayer networks in which each layer is composed of several perceptron units. Indeed, once the mathematical form of the output function (that is, the output of the model) was relaxed to take on many forms (such as a linear function or a sigmoid), these networks could solve both regression and classification problems, with theoretical results showing that three-layer networks could effectively approximate any continuous function.
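The same idea maps directly onto modern tooling. The sketch below (a hypothetical three-layer model built with TensorFlow 2's Keras API, with arbitrary layer sizes and input dimension) shows how swapping the output activation between sigmoid and linear turns the same architecture into a classifier or a regressor:

```python
import tensorflow as tf

# A minimal three-layer perceptron sketch; the input dimension (4 features)
# and hidden width (16 units) are illustrative assumptions, not fixed choices.
def build_mlp(output_activation="sigmoid"):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),                               # input layer
        tf.keras.layers.Dense(16, activation="sigmoid"),          # hidden layer of perceptron-like units
        tf.keras.layers.Dense(1, activation=output_activation),   # output layer
    ])

classifier = build_mlp("sigmoid")  # sigmoid output for binary classification
regressor = build_mlp("linear")    # linear output for regression
```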
Renewed interest in neural networks came with the popularization of the backpropagation algorithm, which, while discovered in the 1960s, was not widely applied to neural networks until the 1980s, following several studies highlighting its usefulness for learning the weights in these models.
The insight of the backpropagation technique is that we can use the chain rule from calculus to efficiently compute the derivative of a loss function with respect to each parameter of the network; combined with a learning rule, this provides a scalable way to train multilayer networks.
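In TensorFlow 2, this chain-rule computation is handled automatically by `tf.GradientTape`. The following sketch, using made-up data and an arbitrarily sized small network, shows the gradient of a loss being computed for every parameter and then applied with a plain gradient-descent learning rule:

```python
import tensorflow as tf

# Toy data, assumed purely for illustration: 8 examples with 4 features each.
x = tf.random.normal((8, 4))
y = tf.random.normal((8, 1))

# A small two-layer network; the sizes are arbitrary choices.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="sigmoid"),
    tf.keras.layers.Dense(1),
])
learning_rate = 0.1

with tf.GradientTape() as tape:
    predictions = model(x)
    loss = tf.reduce_mean(tf.square(y - predictions))  # mean squared loss

# The tape applies the chain rule to obtain dLoss/dParameter for every weight and bias.
grads = tape.gradient(loss, model.trainable_variables)

# A simple gradient-descent learning rule: w <- w - learning_rate * dLoss/dw
for var, grad in zip(model.trainable_variables, grads):
    var.assign_sub(learning_rate * grad)
```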
Let’s illustrate backpropagation with an example: consider a network like the one shown in the figure below.
Furthermore, the value of the network's output is obtained by propagating the inputs forward through the layers, applying each layer's weights and activation function in turn.
We also need a notion of when the network is performing well or badly at its task. A straightforward error function to use here is the squared loss:

$$E = \frac{1}{2}\sum_{k}\left(y_k - \hat{y}_k\right)^2$$

where $y_k$ is the target value for output unit $k$ and $\hat{y}_k$ is the value the network actually produces; the factor of $\tfrac{1}{2}$ simply makes the derivative cleaner.
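To make the loss and its gradient concrete, here is a small sketch (with made-up target and prediction values) showing that the derivative of this squared loss with respect to each prediction is simply the difference $\hat{y}_k - y_k$, which is the quantity backpropagation then pushes back through the network via the chain rule:

```python
import tensorflow as tf

# Illustrative targets and predictions; the values are assumptions for demonstration.
y_true = tf.constant([1.0, 0.0, 1.0])
y_pred = tf.Variable([0.8, 0.3, 0.4])

with tf.GradientTape() as tape:
    # Squared loss: E = 1/2 * sum_k (y_k - y_hat_k)^2
    loss = 0.5 * tf.reduce_sum(tf.square(y_true - y_pred))

# dE/dy_hat_k = (y_hat_k - y_k)
grad = tape.gradient(loss, y_pred)
print(loss.numpy(), grad.numpy())  # loss ≈ 0.245, grad ≈ [-0.2, 0.3, -0.6]
```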