Choose the Right Weights Iteratively

Explore how to iteratively adjust neural network weights by applying gradient descent on the error function. Understand the calculus behind updating weights, including the use of the sigmoid activation function and the chain rule. This lesson helps you grasp the fundamental method to train and optimize neural networks effectively.

Differentiate the error

Choosing the right weights directly is too difficult. An alternative approach is to improve the weights iteratively by taking small steps down the error function, with each step in the direction of the steepest downward slope from our current position.
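To make this concrete, here is a minimal sketch of stepping a single weight downhill. The one-weight error function and the learning rate are toy choices for illustration only; they stand in for the real network error and whatever step size the lesson uses:

```python
# Toy error with its minimum at w = 3; a stand-in for the network's error E(w).
def error(w):
    return (w - 3.0) ** 2

# Numerical estimate of the slope dE/dw at the current position.
def slope(w, eps=1e-6):
    return (error(w + eps) - error(w - eps)) / (2 * eps)

w = 0.0               # arbitrary starting weight
learning_rate = 0.1   # size of each small step (illustrative value)

for step in range(50):
    w -= learning_rate * slope(w)   # step in the downhill direction

print(w)  # ends up close to 3.0, the weight that minimises the toy error
```

Each pass of the loop measures the local slope and moves the weight a little way downhill, which is exactly the iterative improvement described above.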

This means that the error function doesn't actually need to sum over all the output nodes in the first place. The reason is that the output of a node depends only on its connected links and hence on their weights, so when we differentiate with respect to a particular weight w_{jk}, the terms for every other output node vanish. This fact is sometimes glossed over, and the error function is often simply stated without an explanation.
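To see this explicitly, start from the error written as a sum of squared differences over all the output nodes n (this summed form is assumed from earlier in the lesson):

E = \sum_{n}(t_n - o_n)^2

Only the n = k term depends on w_{jk}; the remaining terms are constants with respect to that weight and differentiate to zero.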

Here is our simpler expression:

\frac{\partial E}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}}(t_k - o_k)^2

Now, we will do a bit of calculus.

That t_k part is a constant, so it doesn't vary as w_{jk} varies. This means t_k isn't a function of w_{jk}. If we think about it, it would be really strange if the truth examples providing the target values changed depending on the weights. That leaves the o_k part, which we know depends on w_{jk} because the weights are used to feed the signal forward to become the outputs o_k.

We’ll use the chain rule to break this differentiation task into more manageable pieces:

\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial o_{k}}\cdot \frac{\partial o_k}{\partial w_{jk}}
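As a quick sanity check on this chain-rule split, here is a small sketch that compares the analytic gradient with a finite-difference estimate for a single output node. It assumes the usual sigmoid output, o_k = \text{sigmoid}\left(\sum_j w_{jk}\, o_j\right), and the numbers are made up for illustration; none of them come from the lesson:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical toy setup: three hidden-node outputs o_j feeding one output node k.
o_j = np.array([0.4, 0.9, 0.2])    # outputs of the previous (hidden) layer
w_jk = np.array([0.3, -0.1, 0.7])  # weights on the links into output node k
t_k = 0.8                          # target value for node k

def error(w):
    o_k = sigmoid(np.dot(w, o_j))  # feed the signal forward through node k
    return (t_k - o_k) ** 2        # squared error at node k

# Analytic gradient via the chain rule:
#   dE/dw_jk = dE/do_k * do_k/dw_jk = -2(t_k - o_k) * o_k(1 - o_k) * o_j
o_k = sigmoid(np.dot(w_jk, o_j))
analytic = -2 * (t_k - o_k) * o_k * (1 - o_k) * o_j

# Finite-difference estimate of the same gradient, one weight at a time.
eps = 1e-6
numeric = np.array([
    (error(w_jk + eps * np.eye(3)[i]) - error(w_jk - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])

print(analytic)
print(numeric)  # should match the analytic values to several decimal places
```

If the two printed vectors agree, the split into the two partial derivatives has been applied correctly.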