Multilayer Perceptrons and Backpropagation
Explore the fundamentals of multilayer perceptrons and the backpropagation algorithm, focusing on how gradients are computed and used to train deep neural networks. Understand the mathematical principles behind weight updates, the role of activation functions like sigmoid and ReLU, and practical implementation details using TensorFlow 2. Gain insights into challenges such as vanishing gradients and the historical evolution of these concepts in deep learning.
While large-scale research funding for neural networks declined after the publication of Perceptrons and did not recover until the 1980s, researchers still recognized that these models had value, particularly when assembled into multilayer networks in which each layer is composed of several perceptron units. Indeed, once the mathematical form of the output function (that is, the output of the model) was relaxed to take on many forms (such as a linear function or a sigmoid), these networks could solve both regression and classification problems, with theoretical results showing that three-layer networks could effectively approximate any continuous function.
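The same idea maps directly onto modern tooling. The sketch below (a hypothetical three-layer model built with TensorFlow 2's Keras API, with arbitrary layer sizes and input dimension) shows how swapping the output activation between sigmoid and linear turns the same architecture into a classifier or a regressor:

```python
import tensorflow as tf

# A minimal three-layer perceptron sketch; the input dimension (4 features)
# and hidden width (16 units) are illustrative assumptions, not fixed choices.
def build_mlp(output_activation="sigmoid"):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),                               # input layer
        tf.keras.layers.Dense(16, activation="sigmoid"),          # hidden layer of perceptron-like units
        tf.keras.layers.Dense(1, activation=output_activation),   # output layer
    ])

classifier = build_mlp("sigmoid")  # sigmoid output for binary classification
regressor = build_mlp("linear")    # linear output for regression
```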
Renewed interest in neural networks came with the popularization of the backpropagation algorithm, which, while discovered in the 1960s, was not widely applied to neural networks until the 1980s, following several studies highlighting its usefulness for learning the weights in these models.
The insight of the backpropagation technique is that we can use the chain rule from calculus to efficiently compute the derivative of a loss function with respect to each parameter of the network; combined with a learning rule, this provides a scalable way to train multilayer networks.
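In TensorFlow 2, this chain-rule computation is handled automatically by `tf.GradientTape`. The following sketch, using made-up data and an arbitrarily sized small network, shows the gradient of a loss being computed for every parameter and then applied with a plain gradient-descent learning rule:

```python
import tensorflow as tf

# Toy data, assumed purely for illustration: 8 examples with 4 features each.
x = tf.random.normal((8, 4))
y = tf.random.normal((8, 1))

# A small two-layer network; the sizes are arbitrary choices.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="sigmoid"),
    tf.keras.layers.Dense(1),
])
learning_rate = 0.1

with tf.GradientTape() as tape:
    predictions = model(x)
    loss = tf.reduce_mean(tf.square(y - predictions))  # mean squared loss

# The tape applies the chain rule to obtain dLoss/dParameter for every weight and bias.
grads = tape.gradient(loss, model.trainable_variables)

# A simple gradient-descent learning rule: w <- w - learning_rate * dLoss/dw
for var, grad in zip(model.trainable_variables, grads):
    var.assign_sub(learning_rate * grad)
```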
Let’s illustrate backpropagation with an example: consider a network like the one shown in the figure below.
Furthermore, the value of the network's output is obtained by propagating the inputs forward through the layers, applying each layer's weights and activation function in turn.
We also need a notion of when the network is performing well or badly at its task. A straightforward error function to use here is the squared loss:

$$E = \frac{1}{2}\sum_{k}\left(y_k - \hat{y}_k\right)^2$$

where $y_k$ is the target value for output unit $k$ and $\hat{y}_k$ is the value the network actually produces; the factor of $\tfrac{1}{2}$ simply makes the derivative cleaner.
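To make the loss and its gradient concrete, here is a small sketch (with made-up target and prediction values) showing that the derivative of this squared loss with respect to each prediction is simply the difference $\hat{y}_k - y_k$, which is the quantity backpropagation then pushes back through the network via the chain rule:

```python
import tensorflow as tf

# Illustrative targets and predictions; the values are assumptions for demonstration.
y_true = tf.constant([1.0, 0.0, 1.0])
y_pred = tf.Variable([0.8, 0.3, 0.4])

with tf.GradientTape() as tape:
    # Squared loss: E = 1/2 * sum_k (y_k - y_hat_k)^2
    loss = 0.5 * tf.reduce_sum(tf.square(y_true - y_pred))

# dE/dy_hat_k = (y_hat_k - y_k)
grad = tape.gradient(loss, y_pred)
print(loss.numpy(), grad.numpy())  # loss ≈ 0.245, grad ≈ [-0.2, 0.3, -0.6]
```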