# Activating Neural Network Potential with Advanced Functions

Explore the crucial role of activation functions in neural networks to address vanishing and exploding gradients.

Activation functions are one of the primary drivers of neural networks. An activation introduces nonlinearity into the network.

Note: A network with a linear activation is equivalent to a simple regression model.

The nonlinearity of the activations makes a neural network capable of learning nonlinear patterns in complex problems.

But there is a variety of activations to choose from, such as `tanh`, `elu`, and `relu`.
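As a minimal NumPy sketch, the three activations above can be computed elementwise on a sample input vector (the input values and the ELU `alpha` of 1.0 here are illustrative choices, not from the lesson):

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# tanh: squashes inputs into the open interval (-1, 1)
tanh_out = np.tanh(x)

# ReLU: zero for negative inputs, identity for positive inputs
relu_out = np.maximum(0.0, x)

# ELU (alpha = 1): smooth exponential branch for negative inputs,
# identity for positive inputs
alpha = 1.0
elu_out = np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```

Note how ReLU zeroes out all negative inputs, while ELU keeps a small negative signal, which can help gradients flow for negative pre-activations.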

If appropriately chosen, an activation can significantly improve a model. An appropriate activation does not suffer from **vanishing** and/or **exploding** gradient issues.

## Vanishing and exploding gradients

Deep learning networks are trained with backpropagation, which is a gradient-based method. Gradient-based parameter learning can be generalized as:

$\theta_{n+1} \leftarrow \theta_n - \eta \nabla_\theta \mathcal{L}(\theta_n)$

where

- $n$ is the learning iteration index.
- $\eta$ is the learning rate.
- $\nabla_\theta \mathcal{L}(\theta)$ is the gradient of the loss $\mathcal{L}(\theta)$ with respect to the model parameters $\theta$.

The equation shows that gradient-based learning iteratively estimates $\theta$. In each iteration, the parameter $\theta$ is moved “closer” to its optimal value $\theta^∗$.
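The iterative update above can be sketched in a few lines of NumPy. The quadratic loss $\mathcal{L}(\theta) = (\theta - 3)^2$, its optimum $\theta^* = 3$, the initial $\theta$, and the learning rate are hypothetical values chosen only to illustrate the update rule:

```python
# Hypothetical loss L(theta) = (theta - 3)^2, whose optimum is theta* = 3.
def grad(theta):
    # Gradient of the loss: dL/dtheta = 2 * (theta - 3)
    return 2.0 * (theta - 3.0)

theta = 0.0  # initial parameter value (arbitrary starting point)
eta = 0.1    # learning rate

for _ in range(100):
    # theta_{n+1} <- theta_n - eta * grad(L(theta_n))
    theta = theta - eta * grad(theta)
```

After enough iterations, `theta` moves close to the optimal value 3, illustrating how each step nudges the parameter toward $\theta^*$.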

However, whether the gradient truly brings $\theta$ closer to $\theta^∗$ depends on the gradient itself. This is visually demonstrated in the illustrations below, where the horizontal axis is the model parameter $\theta$, the vertical axis is the loss $\mathcal{L}(\theta)$, and $\theta^∗$ indicates the optimal parameter at the lowest point of the loss.

Note: The gradient, $\nabla$, guides the parameter $\theta$ to its optimal value. An apt gradient is, therefore, critical for the parameter's journey toward the optimum.
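A vanishing gradient can be demonstrated with a small NumPy sketch: backpropagating through a stack of `tanh` layers multiplies the gradient by the local derivative at each layer. The depth of 20 layers and the fixed weight of 0.5 are illustrative assumptions, not values from the lesson:

```python
import numpy as np

depth = 20   # number of stacked tanh "layers" (hypothetical)
w = 0.5      # fixed scalar weight per layer (hypothetical)

a = 2.0      # input to the first layer
grad = 1.0   # gradient flowing back from the output

for _ in range(depth):
    z = w * a
    a = np.tanh(z)
    # Chain rule: each layer multiplies the gradient by
    # w * d tanh(z)/dz, and d tanh(z)/dz = 1 - tanh(z)^2 <= 1
    grad *= w * (1.0 - np.tanh(z) ** 2)
```

Because every factor has magnitude at most `w = 0.5`, the backpropagated gradient shrinks geometrically with depth and effectively vanishes; with weights larger than 1, the same product can instead grow without bound, which is the exploding-gradient case.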
