
Sami Muzzamil

Learning to choose the right hyperparameters is one of the best ways to get the most out of our machine learning or deep learning models. In this article, we’ll explore five of them:

- No. of epochs
- No. of hidden layers
- Learning rate
- Loss function
- Activation function

The process of tuning hyperparameters is an integral part of deep learning. Understanding what each hyperparameter does before we build a model lets us tune it deliberately and get the best performance the model can offer.

Let’s discuss each hyperparameter individually before jumping into the practice.

The number of epochs is the easiest hyperparameter to tune. Training a model for more epochs generally makes it more accurate, but only up to a point. If we keep training beyond that point, the model starts to overfit the training data, and its accuracy on unseen data begins to drop.
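One common way to pick the number of epochs is to stop once the validation loss stops improving. The sketch below is a minimal illustration of that idea in plain Python. The function name `early_stopping_epoch`, the `patience` parameter, and the loss values are assumptions made up for this example, not part of the original application.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # The loss kept improving (or patience never ran out).
    return len(val_losses), best_epoch


# Hypothetical validation losses: they fall at first, then rise again.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(f"stop after epoch {stop_epoch}, best epoch was {best_epoch}")
```

With these made-up numbers, the lowest validation loss is at epoch 4, and training stops two epochs later.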

We don’t need a hidden layer if our data is linearly separable. Otherwise, we need to figure out how complex our data is and decide how many hidden layers it calls for. Adding more layers can improve performance, but the increased complexity could lead to overfitting. It’s best to stick to one or two digits for the number of layers, since we rarely need more.
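To get a feel for how depth adds complexity, we can count the trainable parameters of a fully connected network. This is a rough sketch: the helper `count_parameters` and the layer sizes (2 inputs, 16 units per hidden layer, 1 output) are assumptions chosen only for illustration.

```python
def count_parameters(n_inputs, hidden_layers, n_outputs):
    """Count the weights and biases of a fully connected network.

    `hidden_layers` is a list with the number of units in each hidden
    layer; an empty list means there is no hidden layer at all.
    """
    sizes = [n_inputs] + list(hidden_layers) + [n_outputs]
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # One weight per connection plus one bias per unit in the next layer.
        total += fan_in * fan_out + fan_out
    return total


for depth in (0, 1, 2, 4):
    params = count_parameters(2, [16] * depth, 1)
    print(f"{depth} hidden layer(s): {params} parameters")
```

Each extra hidden layer adds more parameters that the model can use to fit the training data, which is where the risk of overfitting comes from.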

To understand the trade-off of different learning rates, let’s go back to the basics and visualize gradient descent. The following diagrams show a few steps of gradient descent along a one-dimensional loss curve, with three different values of `lr`.

The red cross marks the starting point, and the green cross marks the minimum:

When we set a large value for `lr`, gradient descent tries to minimize the loss with large steps. This can make quick progress at first, but the steps may overshoot the minimum, so the loss can oscillate around it or even diverge. Note that step size is a separate matter from batch size: in *batch gradient descent*, each update is computed from many training examples in one pass, regardless of how large the step is.

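Using a smaller value for `lr` makes each step small, so the descent is more stable and less likely to overshoot the minimum. The price is speed: we need more steps, and therefore more epochs, to get there, and a value that is too small can make training impractically slow.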
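The effect of the step size can be sketched numerically. The example below runs plain gradient descent on the one-dimensional loss f(x) = x², whose minimum is at x = 0. The loss function, the starting point, and the three `lr` values are assumptions picked for illustration and are not taken from the diagrams.

```python
def gradient_descent(lr, start=4.0, steps=10):
    """Run a few gradient descent steps on f(x) = x**2 and return the path."""
    x = start
    path = [x]
    for _ in range(steps):
        grad = 2 * x          # derivative of x**2
        x = x - lr * grad     # step against the gradient, scaled by lr
        path.append(x)
    return path


for lr in (0.01, 0.3, 1.1):
    final_x = gradient_descent(lr)[-1]
    print(f"lr={lr}: x after 10 steps = {final_x:.4f}")
```

With the smallest value, x has barely moved toward 0 after ten steps. With the middle value, it lands very close to the minimum. With the largest value, every step overshoots and x ends up farther from 0 than where it started.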

The goal of a **loss function** is to measure how well a model’s predictions match the expected outputs. There is no one-size-fits-all loss function. We usually pick one based on the machine learning problem we’re trying to solve, the kind of output the model produces, and so on.

There are two broad categories depending on the learning task we’re dealing with: *regression losses* and *classification losses*.
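As a small illustration, here is one typical loss for each kind of task: mean squared error for regression and binary cross-entropy for binary classification. The article does not name specific losses, so these two choices, the function names, and the sample values are assumptions for the sake of the example.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average negative log-likelihood for 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p away from 0 and 1 to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Regression: targets and predictions are real numbers.
print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))

# Classification: labels are 0 or 1, predictions are probabilities of class 1.
print(binary_cross_entropy([1, 0], [0.9, 0.2]))   # mostly right, small loss
print(binary_cross_entropy([1, 0], [0.1, 0.8]))   # mostly wrong, larger loss
```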

The **activation function** decides how a neuron turns its combined inputs into an output. In effect, it determines whether the neuron is relevant for a given input or should be ignored. The range of the output depends on the function we choose: some return values between 0 and 1, others do not.

We use the **sigmoid activation function** for the output layer of a binary classifier. It maps any real input to a value between 0 and 1, which we can read as the probability of the positive class. The predicted class is 1 if this output is greater than 0.5; otherwise, it is 0.

The **hyperbolic tangent activation function** is similar to the sigmoid function. It takes any real value as input and outputs values in the range of -1 to 1. Just like the sigmoid activation function, hyperbolic tangent activation has an S-shaped curve, but it is centered on zero and levels off at -1 and 1 instead of 0 and 1.
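The two S-shaped functions are easy to compare side by side. This is a minimal sketch using only Python’s `math` module; the `sigmoid` helper and the sample inputs are written for this example.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


for x in (-4.0, 0.0, 4.0):
    s = sigmoid(x)
    t = math.tanh(x)
    label = 1 if s > 0.5 else 0  # thresholding the sigmoid output at 0.5
    print(f"x={x:+.1f}  sigmoid={s:.3f}  tanh={t:.3f}  predicted class={label}")
```

At x = 0, the sigmoid returns 0.5 while tanh returns 0, which shows the different centers of the two curves.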

**ReLU** (rectified linear unit) is one of the simplest and most efficient activation functions in deep learning. It returns the input when the input is positive and 0 otherwise. As a result, only some of the neurons are active at any given time, which makes the network sparse and cheap to compute.

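ReLU is not differentiable at 0. More importantly in practice, a ReLU neuron can become inactive for all inputs: if its weights are pushed to a point where its input is always negative, it outputs 0 everywhere, receives a gradient of 0, and stops learning. This tends to happen when training with high learning rates, and it reduces the model’s capacity to learn.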
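A short sketch makes both the sparsity and the inactive-neuron problem visible. The helper names and input values are assumptions for this example, and treating the gradient at exactly 0 as 0 is a common convention rather than a mathematical fact.

```python
def relu(x):
    """Return x for positive inputs and 0 otherwise."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU; the value at exactly 0 is set to 0 by convention."""
    return 1.0 if x > 0 else 0.0


# Hypothetical pre-activation values of five neurons for one input.
pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
outputs = [relu(z) for z in pre_activations]
grads = [relu_grad(z) for z in pre_activations]

print(outputs)  # only the neurons with positive input are active
print(grads)    # inactive neurons pass no gradient back
```

A neuron whose pre-activation is negative for every training example would have a gradient of 0 on all of them, so its weights would never be updated.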

We can use **Softmax** for multi-class classification. It returns a probability for each class, the probabilities sum to 1, and the class with the highest probability is taken as the prediction.

It’s often used in the last layer of neural networks.
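Here is a minimal sketch of softmax in plain Python. The `softmax` helper and the example scores are assumptions for illustration; subtracting the maximum score before exponentiating is a standard trick to avoid overflow and does not change the result.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores from the last layer for three classes.
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))

print([round(p, 3) for p in probs])
print("predicted class:", predicted_class)
```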

Let’s run the application given below and tune hyperparameters without coding for non-linearly separable data.

```python
# A utility function that plots the training loss and validation loss from
# a Keras history object.
import streamlit as st
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import seaborn as sns


def plot(history):
    plt.clf()
    plt.plot(history.history['loss'], label='Training set', color='blue', linestyle='-')
    plt.plot(history.history['val_loss'], label='Validation set', color='green', linestyle='--')
    plt.xlabel("Epochs")
    plt.ylabel("Loss")
    plt.xlim(0, len(history.history['loss']))
    plt.legend()
    plt.title("Training vs. Validation (loss)", fontsize=10)
    plt.show()
```


Copyright ©2022 Educative, Inc. All rights reserved
