Understand Activation Functions

Learn the role of activation functions while building a deep neural network model.

About this chapter

There’s nothing special about deep networks: they are shallow neural networks, only with more layers. However, when people started experimenting with them, they realized that building deep networks may be easy, but training them is not.

Backpropagation on deep networks comes with its own challenges, such as vanishing gradients and dead neurons, that rarely come up in shallow neural networks.
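To get a feel for why gradients vanish, recall that the sigmoid’s derivative never exceeds 0.25, so each sigmoid layer that backpropagation passes through can shrink the gradient by a factor of four or more. Here is a minimal NumPy sketch (not from this chapter) of that compounding effect, assuming the best case where every input sits at the sigmoid’s steepest point:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Multiply the gradient by the sigmoid's derivative once per layer.
# Even at z = 0, where the derivative peaks at 0.25, ten layers
# shrink the gradient to under one millionth of its original size.
gradient = 1.0
for layer in range(10):
    gradient *= sigmoid_derivative(0.0)
    print(f"after layer {layer + 1}: gradient = {gradient:.10f}")
```

After ten layers the gradient is roughly 0.25¹⁰ ≈ 0.00000095, which is why the early layers of a deep sigmoid network barely learn at all.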

Over the years, neural network researchers developed a collection of strategies to tackle those challenges and tame deep neural networks:

  • New activation functions to replace the sigmoid (see the sketch after this list)
  • Multiple flavors of gradient descent
  • More effective weight initializations
  • Better regularization techniques to counter overfitting
  • Other ideas that work, though they do not quite fit any of these categories
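As a preview of the first item, here is a hedged sketch of the ReLU, the most common sigmoid replacement; the chapter discusses it in detail later. Its derivative is exactly 1 for every positive input, so gradients pass through ReLU layers without shrinking (the flip side, a zero derivative on negative inputs, is the source of the dead neurons mentioned earlier):

```python
import numpy as np

def relu(z):
    # ReLU passes positive inputs through unchanged and zeroes out the rest.
    return np.maximum(0.0, z)

def relu_derivative(z):
    # 1 for positive inputs, 0 otherwise.
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(z))             # [0.  0.  0.5 2. ]
print(relu_derivative(z))  # [0. 0. 1. 1.]
```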

This chapter is a whirlwind tour through these techniques. We’ll spend most of our time on activation functions: why the sigmoid does not pass muster in deep neural networks, and what to replace it with. Then we’ll conclude the chapter with a few choices and approaches from the other categories listed above.
