What is cross-entropy loss in PyTorch?
Cross-entropy loss is a loss function frequently used in machine learning classification problems. It measures the difference between the predicted probability distribution and the true probability distribution of the target classes.
Intuitively, cross-entropy loss penalizes the model more the more confident it is in the wrong class. For instance, if the model assigns a low probability to the correct class and a high probability to an incorrect class, the cross-entropy loss will be large.
Mathematically, we can define cross-entropy loss like this:
loss(x, y) = - sum(y * log(x))
Here, x is the predicted probability distribution, y is the true probability distribution (represented as a one-hot encoded vector), log is the natural logarithm, and the sum is taken over all classes.
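As a quick, illustrative sketch of this formula (the probabilities below are made-up numbers for a three-class problem, not the output of any real model), we can see how a confident wrong prediction is penalized far more heavily than a confident correct one:

import torch

# True distribution: class 1 is the correct class (one-hot encoded)
y = torch.tensor([0.0, 1.0, 0.0])

# Two made-up predicted distributions
confident_right = torch.tensor([0.05, 0.90, 0.05]) # most mass on the correct class
confident_wrong = torch.tensor([0.90, 0.05, 0.05]) # most mass on a wrong class

# loss(x, y) = -sum(y * log(x))
print(-torch.sum(y * torch.log(confident_right)).item()) # ~0.105
print(-torch.sum(y * torch.log(confident_wrong)).item()) # ~3.0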
Cross-entropy loss in PyTorch
Cross-entropy loss, also known as log loss or softmax loss, is a commonly used loss function in PyTorch for training classification models. It measures the difference between the predicted class probabilities and the true class labels.
We first import the required libraries and create the input tensors:
import torch
import torch.nn.functional as TF

# Define some sample input data and labels
input_data = torch.randn(4, 10) # 4 samples, 10 classes
labels = torch.LongTensor([2, 5, 1, 9]) # target class indices
In PyTorch, the cross-entropy loss is implemented as the nn.CrossEntropyLoss class. This class combines the nn.LogSoftmax and nn.NLLLoss functions, which serve as its building blocks, to compute the loss in a numerically stable way.
The PyTorch cross-entropy loss can be defined as:
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(outputs, labels)
Here, outputs is a tensor of raw, unnormalized class scores (logits) with shape (batch_size, num_classes), and labels is a tensor of true class indices with shape (batch_size,).
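For instance, a minimal, self-contained sketch of these shapes (the tensor values are random placeholders, and nn is assumed to be imported as torch.nn) could look like this:

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

outputs = torch.randn(4, 10)            # shape (batch_size=4, num_classes=10), raw scores
labels = torch.LongTensor([2, 5, 1, 9]) # shape (batch_size=4,), target class indices

loss = loss_fn(outputs, labels)
print(loss.item()) # a single scalar: the mean loss over the batch (default reduction)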
The nn.CrossEntropyLoss class applies a softmax function to the outputs tensor to obtain the predicted class probabilities. After that, it computes the negative log-likelihood loss between the predicted probabilities and the true labels.
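To make that combination concrete, here is a small sketch (using random placeholder inputs) showing that applying nn.LogSoftmax followed by nn.NLLLoss yields the same value as nn.CrossEntropyLoss:

import torch
import torch.nn as nn

outputs = torch.randn(4, 10)            # raw scores (logits)
labels = torch.LongTensor([2, 5, 1, 9]) # target class indices

# One step: CrossEntropyLoss works directly on the raw scores
ce_loss = nn.CrossEntropyLoss()(outputs, labels)

# Two steps: LogSoftmax followed by the negative log-likelihood loss
log_probs = nn.LogSoftmax(dim=1)(outputs)
nll_loss = nn.NLLLoss()(log_probs, labels)

print(torch.allclose(ce_loss, nll_loss)) # True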
Example
Let's implement all that we have learned:
import torch
import torch.nn.functional as TF

# Define some sample input data and labels
input_data = torch.randn(4, 10) # 4 samples, 10 classes
labels = torch.LongTensor([2, 5, 1, 9]) # target class indices

# Compute the cross entropy loss
loss = TF.cross_entropy(input_data, labels)

# Print the computed loss
print(f"Cross entropy loss: {loss.item()}")

# Compute the softmax probabilities manually
softmax_probs = TF.softmax(input_data, dim=1)

# Print the computed softmax probabilities
print(f"Softmax probabilities:\n{softmax_probs}")

# Compute the cross entropy loss manually
manual_loss = torch.mean(-torch.log(softmax_probs.gather(1, labels.view(-1, 1))))

# Print the manually computed loss
print(f"Manually computed loss: {manual_loss.item()}")
Explanation
Line 1: We import the torch library.
Line 2: We also import torch.nn.functional with the alias TF.
Line 5: We define some sample input data: 4 samples, each with scores for 10 classes.
Line 6: We create a tensor called labels. The tensor is of type LongTensor, which means that it contains 64-bit integer values; here it holds the target class indices.
Line 9: The TF.cross_entropy() function takes two arguments: input_data and labels. The input_data argument is the predicted output of the model, which could be the output of the final layer before applying a softmax activation function. The labels argument is the true label for the corresponding input data.
Line 12: We print the computed loss.
Line 15: We compute the softmax probabilities manually, passing input_data and dim=1, which means that the softmax function is applied along the second dimension (the class dimension) of the input_data tensor.
Line 18: We also print the computed softmax probabilities.
Line 21: We compute the cross-entropy loss manually by gathering the softmax probability of the target class for each sample, taking its negative logarithm, and averaging over all samples (a quick sanity check of this value is shown after this explanation).
Line 24: Finally, we print the manually computed loss.
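As that sanity check, here is a self-contained snippet (repeating the same steps with random placeholder inputs) showing that the built-in and manual computations agree up to floating-point precision:

import torch
import torch.nn.functional as TF

input_data = torch.randn(4, 10)
labels = torch.LongTensor([2, 5, 1, 9])

# Built-in cross-entropy loss
loss = TF.cross_entropy(input_data, labels)

# Manual computation: softmax, pick out the target-class probabilities, average the negative logs
softmax_probs = TF.softmax(input_data, dim=1)
manual_loss = torch.mean(-torch.log(softmax_probs.gather(1, labels.view(-1, 1))))

print(torch.allclose(loss, manual_loss)) # True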
Conclusion
To summarize, cross-entropy loss is a popular loss function in deep learning and is very effective for classification tasks. It is, however, only one of many possible loss functions and may not be the best option for every task or dataset. It is therefore always a good idea to experiment with alternative loss functions and hyperparameters to find the best setup for your particular use case.