Training a neural network using PyTorch

A neural network is a machine learning model loosely inspired by the human brain. It consists of interconnected nodes, or neurons, arranged in layers that process and pass on information. Neural networks are designed to recognize complex patterns and relationships in data. Training adjusts the connections between neurons so that the network's errors shrink over time, which is how it learns from its mistakes and gradually improves its performance.

Neurons

Think of a neuron as a tiny decision-maker. It takes input, processes it, and produces an output. A neural network is built from many such artificial neurons.

Layers

A neural network is organized into layers. The input layer receives the data, one or more hidden layers transform it, and the output layer produces the final result. Each layer contains one or more neurons.

Weights and bias

Neurons have parameters called weights and a bias. Each weight scales the importance of one input, and the bias shifts the weighted sum before the activation function is applied. The network learns by adjusting these values during training.
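To make the role of weights and bias concrete, here is a minimal sketch of a single artificial neuron in plain Python. The input values, weights, and bias are made-up numbers chosen for illustration, and ReLU is assumed as the activation function because it is the one used later in this article.

```python
def relu(z):
    # ReLU activation: negative values become 0, positive values pass through
    return max(0.0, z)

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, shifted by the bias, then activated
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(weighted_sum)

# Hypothetical values, for illustration only
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.25]
bias = 1.0

output = neuron(inputs, weights, bias)
print(output)  # 0.25, since 0.5 - 2.0 + 0.75 + 1.0 = 0.25

# With a lower bias the weighted sum turns negative and ReLU outputs 0
print(neuron(inputs, weights, -1.0))  # 0.0
```

Changing only the bias is enough to switch this neuron from active to inactive, which is the shifting effect described above.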

Building a neural network

The first step in building a neural network with PyTorch is to import the torch library and its torch.nn module, which provides the layer, activation, and loss classes used in this article.

Then, we define a class that represents our neural network, SimpleNN in our case. This class inherits from nn.Module and acts as a blueprint for the network we want to create. In its __init__ method, we set up the initial configuration of the network by specifying the number of input features (input_size), hidden neurons (hidden_size), and output neurons (output_size).

Inside the SimpleNN class, we create the building blocks of our neural network: a first linear layer that maps the inputs to the hidden neurons, the activation function (ReLU), and a second linear layer that maps the hidden neurons to the outputs.

Imagine our neural network as a series of connected parts.

self.fc1 represents the first part, a linear layer that takes the input data and transforms it using weights and biases to produce some intermediate values.
self.relu represents the activation function. It acts as a filter that replaces negative values with zero and leaves positive values unchanged.
self.fc2 is the second linear layer that takes the filtered data from the activation function and transforms it to produce the final output.
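The building blocks listed above can also be inspected on their own. The following sketch uses small, hypothetical layer sizes (4 inputs, 3 hidden neurons) that are not the ones used in this article; it shows where a linear layer keeps its weights and bias and what ReLU does to a handful of values.

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen to keep the output small
fc1 = nn.Linear(4, 3)   # 4 input features -> 3 hidden neurons
relu = nn.ReLU()

# A linear layer stores one weight per (output neuron, input feature) pair
# and one bias per output neuron
print(fc1.weight.shape)  # torch.Size([3, 4])
print(fc1.bias.shape)    # torch.Size([3])

# ReLU replaces negative values with zero and keeps the rest
values = torch.tensor([-2.0, -0.5, 0.0, 1.5])
filtered = relu(values)
print(filtered.tolist())  # [0.0, 0.0, 0.0, 1.5]
```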

Finally, we need to specify how data flows through the network. This is done by defining the forward method in the neural network class.

We connect the different parts. We first pass the x input through self.fc1, then through the activation function, self.relu, and finally through self.fc2. Each step transforms the data. We return the final result, which is the output of our network, after passing the input through all these layers.
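The data flow described above can be traced step by step. This sketch applies the three parts by hand, outside of any class, and prints the shape of the data after each one. The sizes (4 input features, 8 hidden neurons, 2 outputs, a batch of 5 samples) are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the random input reproducible

# Hypothetical sizes, for illustration only
fc1 = nn.Linear(4, 8)
relu = nn.ReLU()
fc2 = nn.Linear(8, 2)

x = torch.rand(5, 4)        # a batch of 5 samples with 4 features each
hidden = fc1(x)             # first linear layer: (5, 4) -> (5, 8)
activated = relu(hidden)    # same shape, negative values set to zero
out = fc2(activated)        # second linear layer: (5, 8) -> (5, 2)

print(x.shape, hidden.shape, activated.shape, out.shape)
```

Only the linear layers change the shape of the data. The activation function keeps the shape and changes the values.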

Training a neural network

Training a neural network involves four steps: defining its structure, choosing how to measure its mistakes (the loss), computing how much each parameter contributed to that loss (backpropagation), and adjusting the parameters to reduce it (optimization). These steps are repeated for a number of epochs so that the predictions improve iteratively. The complete example below puts them together in PyTorch:

import torch
import torch.nn as nn

# Define the neural network architecture
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # Fully connected layer 1
        self.relu = nn.ReLU()  # ReLU activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Fully connected layer 2

    def forward(self, x):
        out = self.fc1(x)  # Apply the first fully connected layer
        out = self.relu(out)  # Apply the ReLU activation function
        out = self.fc2(out)  # Apply the second fully connected layer
        return out

# Define network hyperparameters
input_size = 64  # Number of input features
hidden_size = 128  # Number of neurons in the hidden layer
output_size = 10  # Number of output classes

# Input data (random values standing in for a real dataset)
input_data = torch.rand(32, input_size)  # 32 is the batch size
target = torch.randint(0, output_size, (32,))  # Random class labels in [0, output_size)

# Create an instance of the SimpleNN model
model = SimpleNN(input_size, hidden_size, output_size)

# Define the loss function (Cross Entropy Loss) and optimizer (Adam)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Example training loop
num_epochs = 10  # Define the number of training epochs
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(input_data)
    loss = criterion(outputs, target)  # Compute the loss

    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()  # Backpropagate to compute gradients
    optimizer.step()  # Update the model parameters

    # Print the loss for each epoch
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}')
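In the code above, nn.CrossEntropyLoss receives the raw outputs of the model (often called logits) together with integer class labels, and applies the softmax step internally. The following sketch checks this on made-up logits and targets by comparing the built-in loss with a manual computation based on log-softmax. The tensor sizes and the variable names are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 3 classes
logits = torch.randn(4, 3)            # raw model outputs, not probabilities
target = torch.tensor([0, 2, 1, 2])   # correct class index for each sample

# Built-in loss, averaged over the batch by default
loss = nn.CrossEntropyLoss()(logits, target)

# Manual version: negative log-probability of the correct class, averaged
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), target].mean()

print(torch.allclose(loss, manual))  # True
```

This is why the model in this article returns the output of its last linear layer directly, without a softmax of its own.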

Code explanation

Lines 28–32: We create an instance of the SimpleNN model with the specified input, hidden, and output sizes. Then, we define the loss function, CrossEntropyLoss(), which compares the model's raw output scores with the target class labels, and the Adam optimizer, which updates the model's parameters to minimize that loss.
Lines 35–39: We set up a training loop that runs for the specified number of epochs, num_epochs. Inside the loop, we pass the input data through the model to obtain predictions and calculate the loss by comparing those predictions with the target values.
Line 42: We call optimizer.zero_grad() to reset the gradients of every parameter the optimizer manages. The gradients themselves are stored on the parameters, in each parameter's .grad attribute, and not in the optimizer; the optimizer only holds references to the parameters it updates.

Note: PyTorch adds newly computed gradients to any gradients that already exist. If we don't clear them before the next backward pass, each update uses the sum of the gradients from several iterations. This gives incorrect gradient information and can make the optimization process ineffective or unstable.

Line 43: We compute the gradients of the loss with respect to the model's parameters using backpropagation. Each gradient is stored in its parameter's .grad attribute, ready to be used in the next step.
Line 44: We call optimizer.step() to update the model's parameters using those gradients, moving them in a direction that reduces the loss.
Line 47: We print the loss for each epoch to monitor the training progress.
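The note above on clearing gradients can be checked with a small experiment. This sketch uses a single two-element parameter and a plain SGD optimizer instead of the model and Adam optimizer from this article, purely to keep the numbers easy to follow. The names w and x and their values are made up for illustration.

```python
import torch

# One trainable parameter and a fixed input (hypothetical values)
w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])
optimizer = torch.optim.SGD([w], lr=0.1)

# First backward pass: the gradient of sum(w * x) with respect to w is x
loss = (w * x).sum()
loss.backward()
first = w.grad.clone()

# Second backward pass without clearing: the new gradient is added on top
loss = (w * x).sum()
loss.backward()
accumulated = w.grad.clone()

# Clearing first gives the correct gradient again
optimizer.zero_grad()
loss = (w * x).sum()
loss.backward()
after_reset = w.grad.clone()

print(first.tolist())        # [3.0, 4.0]
print(accumulated.tolist())  # [6.0, 8.0]
print(after_reset.tolist())  # [3.0, 4.0]
```

The second result is twice the correct gradient, which is the accumulation that optimizer.zero_grad() prevents in the training loop.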



Copyright ©2026 Educative, Inc. All rights reserved