How to implement the softmax function in Python
Overview
The softmax function is a mathematical function that converts a vector of real values into a vector of probabilities that sum to 1. Each value in the original vector is converted to a number between 0 and 1.
The formula of the softmax function is shown below:
$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K$$
As shown above, the softmax function accepts a vector z of length K. For each value z_i in z, it applies the standard exponential function to the value, and then divides the result by the sum of the exponentials of all values in z.
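As a rough sketch, the formula can be translated into Python almost term by term. The function name softmax_compact is a placeholder chosen for this illustration, and the code assumes a non-empty list of moderately sized numbers.

```python
import math

def softmax_compact(z):
    # Numerator of the formula: e^(z_i) for every value in z
    exps = [math.exp(value) for value in z]
    # Denominator of the formula: the sum of all exponentials
    total = sum(exps)
    # Each output is one exponential divided by the shared sum
    return [e / total for e in exps]

probs = softmax_compact([1.0, 2.0, 3.0])
print(probs)       # three values, each between 0 and 1
print(sum(probs))  # approximately 1, up to floating-point rounding
```

Larger inputs receive larger probabilities, so the ordering of the original values is preserved in the output.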
Example
Consider the following vector:
z = [5, 2, 8]
First, let’s calculate the exponential of each value in z.
$e^{z_1} = e^{5} = 148.4$
$e^{z_2} = e^{2} = 7.4$
$e^{z_3} = e^{8} = 2981.0$
Next, we can calculate the sum of the exponentials:
$\sum_{j=1}^{3} e^{z_j} = e^{z_1} + e^{z_2} + e^{z_3}$
$= 148.4 + 7.4 + 2981.0 = 3136.8$
Finally, we can calculate the softmax equivalent for each value in z, as shown below:
$\sigma(z)_1 = \frac{148.4}{3136.8} = 0.0473$
$\sigma(z)_2 = \frac{7.4}{3136.8} = 0.0024$
$\sigma(z)_3 = \frac{2981.0}{3136.8} = 0.9503$
So, we end up with a vector of probabilities:
Softmax(z) = [0.0473, 0.0024, 0.9503]
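The arithmetic above can be checked with a few lines of Python. This is only a verification sketch: the variable names are illustrative, and the rounding to one and four decimal places mirrors the rounding used in the worked example.

```python
import math

z = [5, 2, 8]

# Step 1: exponential of each value in z
exps = [math.exp(value) for value in z]
print([round(e, 1) for e in exps])   # [148.4, 7.4, 2981.0]

# Step 2: sum of the exponentials
exp_sum = sum(exps)
print(round(exp_sum, 1))             # 3136.8

# Step 3: divide each exponential by the sum
probs = [e / exp_sum for e in exps]
print([round(p, 4) for p in probs])  # [0.0473, 0.0024, 0.9503]
```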
Code
The code below shows how to implement the softmax function in Python:
    import math

    # softmax function
    def softmax(z):

        # vector to hold exponential values
        exponents = []

        # vector to hold softmax probabilities
        softmax_prob = []

        # sum of exponentials
        exp_sum = 0

        # for each value in the input vector
        for value in z:

            # calculate the exponent
            exp_value = math.exp(value)

            # append to exponent vector
            exponents.append(exp_value)

            # add to exponential sum
            exp_sum += exp_value

        # for each exponential value
        for value in exponents:

            # calculate softmax probability
            probability = value / exp_sum

            # append to probability vector
            softmax_prob.append(probability)

        return softmax_prob

    # define vector
    z = [5, 2, 8]

    # find softmax
    result = softmax(z)
    print(result)
Explanation
In the code above:
- Line 1: We import the math library.
- Line 4: We define the softmax function that accepts a vector as a parameter.
- Lines 7-13: We declare three variables to store the exponential of each value, the corresponding probability, and the sum of all exponentials, respectively.
- Lines 16-25: We use a for-loop to iterate over each value in the given vector. For each value, we calculate its exponential with the math.exp() function and append the result to the exponents list. The sum of exponentials is also updated in each iteration of the loop.
- Lines 28-34: We use another for-loop to find the probability corresponding to each exponential value by dividing the value by exp_sum. Each probability is appended to softmax_prob.
- Lines 39-43: We declare a vector z containing three values and pass it to the softmax function. The vector returned by the function is then printed.
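One practical caveat: math.exp() raises an OverflowError for large inputs (roughly above 709 for Python floats), so the implementation above fails on a vector such as [1000, 1001, 1002]. A common workaround, which is not part of the original code, is to subtract the maximum value from every element before exponentiating. The result is mathematically unchanged because the common factor cancels in the division. The sketch below assumes a non-empty list, and the name stable_softmax is hypothetical.

```python
import math

def stable_softmax(z):
    # Shift every value by the maximum so the largest exponent is exp(0) = 1
    shift = max(z)
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]

# Same probabilities as the worked example, up to floating-point rounding
small = stable_softmax([5, 2, 8])
print([round(p, 4) for p in small])

# Inputs this large would overflow math.exp() without the shift
large = stable_softmax([1000, 1001, 1002])
print(large)
```

Because only the differences between values matter, [1000, 1001, 1002] yields the same probabilities as [0, 1, 2].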