# Understand Activation Functions

Learn the role of activation functions while building a deep neural network model.

## About this chapter

There’s nothing special about deep networks. They are like shallow neural networks, only with more layers. However, when people started experimenting with deep networks, they realized that **building deep networks may be easy, but training them is not.**

Backpropagation on deep networks comes with its own specific challenges such as **vanishing gradients** and **dead neurons**. Those challenges rarely come up in shallow neural networks.
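To make the vanishing-gradient problem concrete, here is a minimal sketch (not from this chapter) showing that the sigmoid's derivative never exceeds 0.25, so even in the best case the gradient signal shrinks exponentially as backpropagation multiplies one such factor per layer:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # Derivative of the sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative peaks at x = 0, where it equals exactly 0.25,
# and falls off quickly for larger |x|.
peak = sigmoid_prime(0.0)
print(peak)  # 0.25

# During backpropagation, each sigmoid layer contributes a factor of
# at most 0.25 to the gradient. Stacking layers multiplies these
# factors, so the gradient reaching the early layers shrinks fast:
for depth in (2, 5, 10, 20):
    print(depth, peak ** depth)
```

At 20 layers the best-case factor is already below 10⁻¹², which is why gradients in the early layers of a deep sigmoid network can become too small to drive learning.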

Over the years, neural network researchers developed a collection of strategies to tackle those challenges and tame deep neural networks:

- New activation functions to replace the sigmoid
- Multiple flavors of gradient descent
- More effective weight initializations
- Better regularization techniques to counter overfitting
- Other ideas that work, though they do not quite fit any of these categories

This chapter is a whirlwind tour through these techniques. We’ll spend most of our time discussing activation functions: in particular, *why the sigmoid does not pass muster in deep neural networks, and how to replace it*. Then we’ll conclude the chapter with a few choices and approaches from the other categories listed above.