What is ELU?
Exponential Linear Unit (ELU) is an activation function that can improve a model's accuracy and reduce its training time. It is mathematically represented as follows:

$$
\text{ELU}(x) =
\begin{cases}
x & \text{if } x > 0 \\
\alpha\,(e^{x} - 1) & \text{if } x \leq 0
\end{cases}
$$

In the formula above, $\alpha$ is a positive constant (commonly set to 1) that determines the value to which ELU saturates for large negative inputs.
Need for ELU
The ReLU activation function became popular because it addressed the vanishing gradient problem: the gradients of saturating activation functions, such as the sigmoid, become very small, making it difficult to train larger models.
At the same time, however, ReLU introduced a problem of its own, called the dying ReLU problem. This occurs when a neuron gets stuck outputting 0 for every input; its gradient is then also 0, so its weights stop updating.
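As a small illustration (the numbers below are made up for the example), a neuron whose pre-activations have all drifted negative produces zero output and zero gradient under ReLU, so its weights can no longer update:

```python
import numpy as np

# Hypothetical pre-activations that have all drifted negative
z = np.array([-3.2, -1.5, -0.4, -2.7])

relu_out = np.maximum(0.0, z)         # ReLU output: all zeros
relu_grad = (z > 0).astype(float)     # ReLU gradient: all zeros

print(relu_out)   # [0. 0. 0. 0.]
print(relu_grad)  # [0. 0. 0. 0.] -> no gradient flows back, the neuron is "dead"
```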
In contrast, ELU produces negative values for negative inputs, which (much like batch normalization) push the mean activation closer to 0. This improves training speed.
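A quick sketch of this effect, assuming zero-mean (standard normal) pre-activations and α = 1; the exact numbers are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)        # zero-mean pre-activations

relu = np.maximum(0.0, z)               # ReLU clips negatives to 0
elu = np.where(z > 0, z, np.expm1(z))   # ELU with alpha = 1

print(f"mean of ReLU outputs: {relu.mean():.3f}")  # about 0.40
print(f"mean of ELU outputs:  {elu.mean():.3f}")   # about 0.16, closer to 0
```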
Even though Parametric ReLU and Leaky ReLU also produce negative values, they are not smooth functions. ELU is smooth for negative values, which makes it more robust to noise.
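To see the difference in smoothness, we can compare the derivatives of ELU and Leaky ReLU just below and just above 0. This is a minimal sketch that assumes α = 1 for ELU and a slope of 0.01 for Leaky ReLU (both values are assumptions for the example):

```python
import numpy as np

alpha, slope = 1.0, 0.01   # assumed ELU constant and Leaky ReLU slope

def elu_grad(x):
    # d/dx ELU: 1 for x > 0, alpha * exp(x) for x <= 0
    return np.where(x > 0, 1.0, alpha * np.exp(x))

def leaky_relu_grad(x):
    # d/dx Leaky ReLU: 1 for x > 0, a fixed small slope otherwise
    return np.where(x > 0, 1.0, slope)

for x in (-1e-6, 1e-6):
    print(x, float(elu_grad(x)), float(leaky_relu_grad(x)))
# ELU's gradient approaches 1 from both sides of 0 (continuous),
# while Leaky ReLU's jumps abruptly from 0.01 to 1.
```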
Code
Here, we implement ELU in Python:
```python
import numpy as np
import matplotlib.pyplot as plt

# initializing the constant
α = 1.0

def ELU(x):
    if x > 0:
        return x
    return α * (np.exp(x) - 1)  # negative branch: α(e^x - 1)

x = np.linspace(-5.0, 5.0)  # 50 evenly spaced inputs in [-5, 5]
result = []
for i in x:
    result.append(ELU(i))

plt.plot(x, result)
plt.title("ELU activation function")
plt.xlabel("Input")
plt.ylabel("Output")
plt.grid(True)
plt.savefig('output/elu_plot.png')
```
Code explanation
- Lines 9–10: We implement the equation mentioned above.
- Line 12: We use `np.linspace` to generate evenly spaced numbers between -5.0 and 5.0. By default, it generates a total of 50 numbers.
- Lines 17–22: We use the matplotlib library to plot the output of ELU over the given range.
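As a side note, the Python-level loop above can be replaced with a vectorized version. The sketch below uses `np.where` and `np.expm1`; the function name `elu` and the default `alpha=1.0` are our own choices for the example:

```python
import numpy as np

def elu(x, alpha=1.0):
    # Vectorized ELU: x for positive inputs, alpha * (exp(x) - 1) otherwise
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * np.expm1(x))

print(elu([-2.0, -0.5, 0.0, 0.5, 2.0]))
```

Most deep learning frameworks also ship ELU as a built-in (for example, `torch.nn.ELU` in PyTorch), so in practice it is rarely implemented by hand.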