
Introduction to Softmax

Explore the softmax activation function and its use as the last layer in neural networks. Understand how softmax normalizes outputs into probabilities, which helps in interpreting model confidence across multiple classes. This lesson helps you grasp the design of neural networks, including layers, activations, and biases, with a focus on practical classification applications like MNIST.


Softmax

Consider the activation functions in our neural network. So far, we have taken it for granted that both of those functions are sigmoids. However, most neural networks replace the last sigmoid, the one right before the output layer, with another function called the softmax.

Let’s see what the softmax looks like and why it is useful. Like the sigmoid, the softmax takes an array of numbers, which in this case are called the logits, and it returns an array of the same size as the input. Here is the formula of the softmax, in case we want to understand the math behind it:

$$\text{softmax}(l_i) = \frac{e^{l_i}}{\sum_j e^{l_j}}$$

...
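As a quick illustration of the formula, here is a minimal NumPy sketch of a softmax function (an assumption of this write-up, not the lesson's own code). Subtracting the maximum logit before exponentiating is a standard numerical-stability trick; it cancels out in the ratio, so the result is unchanged.

```python
import numpy as np

def softmax(logits):
    """Convert an array of logits into probabilities that sum to 1."""
    # Subtract the max logit for numerical stability; the shift cancels
    # in the numerator and denominator, so the output is the same.
    exponentials = np.exp(logits - np.max(logits))
    return exponentials / np.sum(exponentials)

# Example: three logits for a three-class problem.
logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))        # approximately [0.659 0.242 0.099]
print(softmax(logits).sum())  # 1.0
```

Note how the largest logit gets the largest probability, and all the outputs add up to 1, which is what lets us read them as the model's confidence across the classes.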