The **tanh** activation function, also called the **hyperbolic tangent activation function**, is a mathematical function commonly used in the hidden layers of artificial neural networks. It maps input values to output values between -1 and 1. It is expressed as the ratio of the difference between the exponential of the input value and the exponential of its negation to the sum of these exponentials.

Mathematically, the tanh activation function can be represented as:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Where:

- *x* represents the input value.
- *e* is a mathematical constant approximately equal to **2.71828**.
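As a quick numerical check, the definition (e^x − e^(−x)) / (e^x + e^(−x)) can be evaluated directly and compared with NumPy's built-in `np.tanh()`. This is a minimal sketch; `tanh_from_definition` is a name chosen here for illustration:

```python
import numpy as np

def tanh_from_definition(x):
    # tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tanh_from_definition(x))
print(np.tanh(x))  # matches the definition-based values
```

Both lines print the same values, confirming that `np.tanh()` implements this ratio of exponentials.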

In a neural network architecture, the tanh function is advantageous for the following reasons.

It is a smooth, differentiable function that incorporates nonlinearity into the network which means that the output of this function is not a simple linear function of its input. This allows the network to learn a more complex, nonlinear relationship between the input and the output data. Let's illustrate this using a code example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Define input values
x = np.linspace(-5, 5, 100)

# Compute the output of the tanh function for the input values
y = np.tanh(x)

# Plot the output of the tanh function
plt.plot(x, y)
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('tanh function output')
plt.show()
```

- **Line 1:** We import the `numpy` module and assign it the alias `np`.
- **Line 2:** We import the `matplotlib.pyplot` module and assign it the alias `plt`.
- **Line 5:** We define an array of input values for the tanh function using the `np.linspace()` function, which generates `100` evenly spaced points between `-5` and `5` inclusive.
- **Line 8:** We compute the output of the tanh function for each input value in the `x` array using the `np.tanh()` function from `numpy`, and assign it to the variable `y`.
- **Lines 11–15:** We generate a plot of the tanh function using the `plt.plot()` function, with `x` and `y` as the input and output values respectively. We set the x-axis label to `Input`, the y-axis label to `Output`, and the title of the plot to `tanh function output`. Finally, we display the plot using the `plt.show()` function.

The tanh function is symmetric around the origin, meaning that it outputs negative values for negative input values, and positive values for positive input values. Let's illustrate this using a code example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Define the tanh function
def tanh(x):
    return np.tanh(x)

# Create a range of input values from -5 to 5
x = np.arange(-5, 5, 0.1)

# Compute the output of the tanh function for each input value
y = tanh(x)

# Plot the output of the tanh function
plt.plot(x, y)
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('tanh Function')
plt.show()
```

- **Line 1:** We import the `numpy` module and assign it the alias `np`.
- **Line 2:** We import the `matplotlib.pyplot` module and assign it the alias `plt`.
- **Lines 4–6:** We define the `tanh` function, which takes an input `x` and returns the hyperbolic tangent of `x`, computed using the `np.tanh()` function from `numpy`.
- **Line 9:** We create an array of input values for the `tanh` function using the `np.arange()` function, ranging from `-5` up to (but not including) `5` with a step size of `0.1`.
- **Line 12:** We compute the output of the tanh function for each input value in the `x` array by calling the `tanh()` function defined earlier, and assign it to the variable `y`.
- **Lines 15–19:** We generate a plot of the `tanh()` function using the `plt.plot()` function, with `x` and `y` as the input and output values respectively. We set the x-axis label to `Input`, the y-axis label to `Output`, and the title of the plot to `tanh Function`. Finally, we display the plot using the `plt.show()` function.

A limitation of the tanh function is that it suffers from the **vanishing gradient problem**: as the input becomes very large or very small, the gradient of the function approaches zero, making it difficult for the network to update the weights of the earlier layers and learn from the input data. This is a significant problem in deep neural networks with many layers, since the gradients can become extremely small by the time they reach the earlier layers, leading to slow convergence and poor performance. Let's illustrate this using a code example.

```python
import numpy as np
import matplotlib.pyplot as plt


def tanh(x):
    return np.tanh(x)

x = np.linspace(-5, 5, 100)

# Compute the gradients of the tanh function
tanh_grad = 1 - np.tanh(x)**2

# Plot the gradients of the tanh function
plt.plot(x, tanh_grad, label='tanh')
plt.xlabel('Input')
plt.ylabel('Gradient')
plt.title('Gradients of the tanh activation function')
plt.legend()
plt.show()
```

As we can see from the plot, the gradient of the tanh function approaches zero as the input becomes very large or very small.

- **Lines 1–2:** We import the `numpy` module with the alias `np` and the `matplotlib.pyplot` module with the alias `plt`.
- **Lines 5–6:** We define the `tanh()` function that takes in an array of values `x` and returns the hyperbolic tangent of that array using the `np.tanh()` function.
- **Line 8:** We define the input values using the `np.linspace()` function, which returns an array of evenly spaced numbers over a specified interval.
- **Line 11:** We compute the gradients of the tanh function using the formula `1 - np.tanh(x)**2` and assign the output to `tanh_grad`.
- **Lines 14–19:** We plot the gradients of the tanh function using the `plt.plot()` function. We set the input values `x` as the x-axis values and `tanh_grad` as the y-axis values. We label the plot as `tanh`. We set the x-axis label to `Input`, the y-axis label to `Gradient`, and the title to `Gradients of the tanh activation function`. Finally, we add a legend to the plot using the `plt.legend()` function and display the plot using the `plt.show()` function.
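The formula `1 - np.tanh(x)**2` used above follows from differentiating the definition of tanh. Writing $\tanh(x) = \sinh(x)/\cosh(x)$ and using the identity $\cosh^2(x) - \sinh^2(x) = 1$:

$$\frac{d}{dx}\tanh(x) = \frac{\cosh^2(x) - \sinh^2(x)}{\cosh^2(x)} = 1 - \tanh^2(x)$$

Because $\tanh(x) \to \pm 1$ as $x \to \pm\infty$, the derivative tends to zero for large-magnitude inputs, which is why the gradient curve flattens at both ends of the plot.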

To mitigate the vanishing gradient problem, other activation functions such as the **rectified linear unit (ReLU)**, **Leaky ReLU**, **exponential linear unit (ELU)**, **Maxout**, and other ReLU variants have been shown to work well in neural network architectures and can lessen the vanishing gradients that occur when using the tanh activation function.
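As a rough illustration of why these alternatives help, the gradients of tanh and of two ReLU-family functions can be compared for large-magnitude inputs. This is a minimal sketch; the function names are chosen here for illustration:

```python
import numpy as np

# Gradient of tanh: shrinks toward zero for large |x|
def tanh_grad(x):
    return 1 - np.tanh(x) ** 2

# Gradient of ReLU: 1 for positive inputs, 0 otherwise
def relu_grad(x):
    return np.where(x > 0, 1.0, 0.0)

# Gradient of Leaky ReLU: small nonzero slope for negative inputs
def leaky_relu_grad(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)

x = np.array([-10.0, -1.0, 1.0, 10.0])
print(tanh_grad(x))        # nearly zero at both extremes
print(relu_grad(x))        # stays 1 for all positive inputs
print(leaky_relu_grad(x))  # small but nonzero for negative inputs
```

For an input of 10, the tanh gradient is already on the order of 10⁻⁸, while the ReLU gradient remains 1, so gradients propagated through many ReLU layers do not shrink the way tanh gradients do.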

In this answer, we looked at the tanh activation function, a commonly used function in neural network architectures. We covered the main reasons it is used: it introduces a nonlinear relationship between the input and output data, and it is symmetric around the origin. Its main limitation is the vanishing gradient problem, which can be mitigated by using other activation functions such as ReLU and its variants.
