Update Weights

Learn how to reduce errors by updating the weights.

We'll cover the following

Error-controlled weights update

We have not yet discussed the central question of updating the link weights in a neural network. We’ve been working toward this point, and we’re almost there. We have just one more key idea to cover before we unlock this secret.

So far, we’ve propagated the errors back to each layer of the network. Why did we do this? Because the error is used to guide how we adjust the link weights to improve the overall answer given by the neural network. This is basically what we were doing with the linear classifier at the start of this course.

But these nodes aren’t simple linear classifiers. These slightly more sophisticated nodes sum the weighted signals into the node and apply the sigmoid threshold function. So, how do we actually update the weights for links that connect these more sophisticated nodes? Why can’t we use fancy algebra to work out what the weights should be?
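To make the node described above concrete, here is a minimal sketch of a single node that sums its weighted incoming signals and applies the sigmoid function. The function names `sigmoid` and `node_output`, and the example signal and weight values, are illustrative assumptions rather than anything fixed by the lesson.

```python
import math

def sigmoid(x):
    """Sigmoid threshold function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights):
    """Sum the weighted incoming signals, then apply the sigmoid."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(weighted_sum)

# Hypothetical incoming signals and link weights for one node.
signals = [0.9, 0.1, 0.8]
weights = [0.9, 0.3, 0.4]

output = node_output(signals, weights)
print(round(output, 3))
```

Because the sigmoid sits between the weighted sum and the output, the output no longer changes in simple proportion to any one weight, which is what makes these nodes harder to adjust than a linear classifier.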

We can’t calculate the weights directly because the math is so complex. There are just too many combinations of weights and too many functions of functions being combined when we feed the signal forward through the network. Think about just a small neural network with three layers and three neurons in each layer, like our previous example. How would we tweak the weight for a link between the first input node and the second hidden node so that the third output node increased its output by, say, $0.5$? Even if we did get lucky, the effect could be ruined by tweaking another weight to improve a different output node. We can see that doing all that math isn’t trivial at all.
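The thought experiment above can be tried out numerically. The following is only a sketch: the weight values are made up for illustration, the helper names `layer` and `forward` are assumptions, and `weights[i][j]` is taken to mean the weight on the link from node `i` of one layer to node `j` of the next.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    """Compute one layer's outputs; weights[i][j] links input i to node j."""
    n_out = len(weights[0])
    return [
        sigmoid(sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
        for j in range(n_out)
    ]

def forward(inputs, w_input_hidden, w_hidden_output):
    """Feed the signal forward through a three-layer network."""
    hidden = layer(inputs, w_input_hidden)
    return layer(hidden, w_hidden_output)

# Hypothetical inputs and weights for a 3-3-3 network.
inputs = [0.9, 0.1, 0.8]
w_ih = [[0.9, 0.2, 0.1],
        [0.3, 0.8, 0.5],
        [0.4, 0.2, 0.6]]
w_ho = [[0.3, 0.6, 0.8],
        [0.7, 0.5, 0.1],
        [0.5, 0.2, 0.9]]

before = forward(inputs, w_ih, w_ho)

# Tweak only the link from the first input node to the second hidden node.
tweaked = [row[:] for row in w_ih]
tweaked[0][1] += 0.5
after = forward(inputs, tweaked, w_ho)

changes = [a - b for a, b in zip(after, before)]
for k, delta in enumerate(changes, start=1):
    print(f"output node {k}: change = {delta:+.4f}")
```

With these made-up values, a single tweak shifts all three output nodes at once, each by a different amount. That is why we can't simply pick a weight and solve for the change we want at one output.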

To see how nontrivial, just look at the following expression, which shows an output node’s output as a function of the inputs and the link weights for a simple three-layer neural network with three nodes in each layer. The input at node $i$ is $x_i$, and the weight for the link connecting input node $i$ to hidden node $j$ is $w_{i,j}$. Similarly, the output of hidden node $j$ is $x_j$, and the weight for the link connecting hidden node $j$ to output node $k$ is $w_{j,k}$. That funny symbol $\sum_{a}^b$ means sum the subsequent expression for all values between $a$ and $b$.
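The expression itself does not appear in this text. Working only from the definitions in the preceding paragraph, it can be reconstructed as follows. This reconstruction assumes the standard sigmoid $\frac{1}{1+e^{-x}}$ and sums that run over the three nodes of each layer. The label $o_k$ for the output of output node $k$ is our own.

$$x_j = \frac{1}{1 + e^{-\sum_{i=1}^{3} w_{i,j} \cdot x_i}}$$

$$o_k = \frac{1}{1 + e^{-\sum_{j=1}^{3} w_{j,k} \cdot x_j}} = \frac{1}{1 + e^{-\sum_{j=1}^{3} \left( w_{j,k} \cdot \frac{1}{1 + e^{-\sum_{i=1}^{3} w_{i,j} \cdot x_i}} \right)}}$$

Every input-to-hidden weight $w_{i,j}$ sits inside one sigmoid that is itself nested inside another. This is the "functions of functions" problem that makes solving for the weights directly impractical.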
