Choose the Right Weights Iteratively

Explore how to iteratively adjust neural network weights by applying gradient descent on the error function. Understand the calculus behind updating weights, including the use of the sigmoid activation function and the chain rule. This lesson helps you grasp the fundamental method to train and optimize neural networks effectively.

Differentiate the error

Choosing the right weights directly is too difficult. An alternative approach is to improve the weights iteratively by taking small steps down the error function, with each step in the direction of the steepest downward slope from our current position.
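To make this concrete, here is a minimal sketch of stepping a single weight downhill. The one-weight error function and the learning rate are toy choices for illustration only; they stand in for the real network error and whatever step size the lesson uses:

```python
# Toy error with its minimum at w = 3; a stand-in for the network's error E(w).
def error(w):
    return (w - 3.0) ** 2

# Numerical estimate of the slope dE/dw at the current position.
def slope(w, eps=1e-6):
    return (error(w + eps) - error(w - eps)) / (2 * eps)

w = 0.0               # arbitrary starting weight
learning_rate = 0.1   # size of each small step (illustrative value)

for step in range(50):
    w -= learning_rate * slope(w)   # step in the downhill direction

print(w)  # ends up close to 3.0, the weight that minimises the toy error
```

Each pass of the loop measures the local slope and moves the weight a little way downhill, which is exactly the iterative improvement described above.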

This means that the error function doesn't actually need to sum over all the output nodes in the first place. The reason is that the output of a node depends only on its connected links and hence on their weights, so when we differentiate with respect to a particular weight w_{jk}, the terms for every other output node vanish. This fact is sometimes glossed over, and the error function is often simply stated without an explanation.
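To see this explicitly, start from the error written as a sum of squared differences over all the output nodes n (this summed form is assumed from earlier in the lesson):

E = \sum_{n}(t_n - o_n)^2

Only the n = k term depends on w_{jk}; the remaining terms are constants with respect to that weight and differentiate to zero.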

Here is our simpler expression:

\frac{\partial E}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}}(t_k - o_k)^2

Now, we will do a bit of calculus.

That t_k part is a constant, so it doesn't vary as w_{jk} varies. This means t_k isn't a function of w_{jk}. If we think about it, it would be really strange if the truth examples providing the target values changed depending on the weights. That leaves the o_k part, which we know depends on w_{jk} because the weights are used to feed the signal forward to become the outputs o_k.

We’ll use the chain rule to break this differentiation task into more manageable pieces:

\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial o_{k}}\cdot \frac{\partial o_k}{\partial w_{jk}}
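As a quick sanity check on this chain-rule split, here is a small sketch that compares the analytic gradient with a finite-difference estimate for a single output node. It assumes the usual sigmoid output, o_k = \text{sigmoid}\left(\sum_j w_{jk}\, o_j\right), and the numbers are made up for illustration; none of them come from the lesson:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical toy setup: three hidden-node outputs o_j feeding one output node k.
o_j = np.array([0.4, 0.9, 0.2])    # outputs of the previous (hidden) layer
w_jk = np.array([0.3, -0.1, 0.7])  # weights on the links into output node k
t_k = 0.8                          # target value for node k

def error(w):
    o_k = sigmoid(np.dot(w, o_j))  # feed the signal forward through node k
    return (t_k - o_k) ** 2        # squared error at node k

# Analytic gradient via the chain rule:
#   dE/dw_jk = dE/do_k * do_k/dw_jk = -2(t_k - o_k) * o_k(1 - o_k) * o_j
o_k = sigmoid(np.dot(w_jk, o_j))
analytic = -2 * (t_k - o_k) * o_k * (1 - o_k) * o_j

# Finite-difference estimate of the same gradient, one weight at a time.
eps = 1e-6
numeric = np.array([
    (error(w_jk + eps * np.eye(3)[i]) - error(w_jk - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])

print(analytic)
print(numeric)  # should match the analytic values to several decimal places
```

If the two printed vectors agree, the split into the two partial derivatives has been applied correctly.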