Dynamic weight pruning is a technique for reducing the size and computational cost of neural networks. It identifies and removes unimportant weights from the network while minimizing the impact on its performance. Unlike static pruning, which is applied as a one-off step outside the training loop, dynamic pruning happens during the training process itself. Dynamic weight pruning primarily aims to improve the efficiency of deep learning models, and sometimes their generalization as well. Here's a breakdown of the key advantages:
Reduced model size: Pruning removes unnecessary weights, leading to a smaller and more compact model size. This translates to reduced storage requirements and memory usage during inference.
Faster inference: A smaller model requires fewer computations during the forward pass of the model, resulting in faster predictions.
Improved generalization: Pruning can sometimes slightly improve accuracy by removing redundant weights that contribute little to the model's predictions. It can also have a regularizing effect similar to dropout, helping prevent overfitting to the training data.
During training, we assign each weight in the network an importance score that reflects its contribution to the network's output. This score can be computed in several ways, such as magnitude-based pruning (removing weights with small absolute values) or gradient-based pruning (removing weights with small gradients). A pruning threshold is then defined based on the importance scores: weights with scores below the threshold are considered unimportant and are set to zero, effectively removing them from the network. The remaining weights and biases are updated as usual during backpropagation.
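As a minimal sketch of the thresholding step described above, assuming a hypothetical nn.Linear layer and a user-chosen prune_fraction, magnitude-based masking could look like this:

import torch
import torch.nn as nn

layer = nn.Linear(128, 64)   # hypothetical layer to prune
prune_fraction = 0.2         # fraction of weights treated as unimportant

with torch.no_grad():
    importance = layer.weight.abs()                          # importance score: weight magnitude
    threshold = torch.quantile(importance, prune_fraction)   # scores below this are pruned
    mask = (importance >= threshold).float()                 # 1 = keep, 0 = prune
    layer.weight.mul_(mask)                                  # zero out unimportant weights

Here the threshold is derived from the desired fraction of pruned weights; in practice, it could equally be a fixed value or a schedule that changes over training.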
In a coding environment, let’s see how to perform dynamic weight pruning with PyTorch.
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.nn.utils import prune

# Define a 5-layer fully connected neural network model
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 128)
        self.fc4 = nn.Linear(128, 64)
        self.fc5 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)  # Flatten input image
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        x = torch.relu(self.fc4(x))
        x = self.fc5(x)
        return x

# Load FashionMNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

training_data = datasets.FashionMNIST(root='./data', train=True, transform=transform, download=True)
testing_data = datasets.FashionMNIST(root='./data', train=False, transform=transform, download=True)

training_loader = torch.utils.data.DataLoader(training_data, batch_size=64, shuffle=True)
testing_loader = torch.utils.data.DataLoader(testing_data, batch_size=1000, shuffle=False)

# Instantiate the model and optimizer
model = SimpleNN()
optimizer = optim.Adam(model.parameters(), lr=0.001)

def dynamic_prune(model, pruning_amount):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            # Calculate the number of weights to prune based on the pruning amount
            num_weights_to_prune = int(module.weight.numel() * pruning_amount)
            # Get the indices of the smallest absolute weight values (least important)
            _, prune_indices = torch.topk(torch.abs(module.weight.data).view(-1), num_weights_to_prune, largest=False)
            # Create a mask to zero out the least important weights
            mask = torch.ones_like(module.weight.data)
            mask.view(-1)[prune_indices] = 0
            # Apply the mask to prune weights
            module.weight.data *= mask

# Train the model with pruning
def train(model, optimizer, training_loader, epochs=3, prune_frequency=100, pruning_amount=0.05):
    for epoch in range(epochs):
        running_loss = 0.0
        for i, (data, true_labels) in enumerate(training_loader, 0):
            optimizer.zero_grad()
            pred_outputs = model(data)
            loss = nn.functional.cross_entropy(pred_outputs, true_labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            # Prune every 'prune_frequency' mini-batches
            if i % prune_frequency == prune_frequency - 1:
                dynamic_prune(model, pruning_amount)  # Prune based on the specified pruning amount
            if i % 100 == 99:  # Print every 100 mini-batches
                print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 100))
                running_loss = 0.0

train(model, optimizer, training_loader)
The code above is explained as follows:
Lines 1–5: We import necessary modules from the PyTorch library for neural network training, pruning, handling datasets, and transformations from torchvision.
Lines 8–24: We define a simple, fully connected neural network model named SimpleNN that inherits from nn.Module. We initialize it with five fully connected layers of different sizes and implement the forward method to define the data flow through the network.
Lines 27–30: We define a transformation to be applied to the input data.
Lines 32–36: We load the FashionMNIST dataset for both the training and testing sets, applying the defined transformation, and create data loaders with the specified batch sizes and shuffling settings.
Lines 39–40: We instantiate the neural network model and an optimizer for training.
Lines 42–53: We define a function for dynamic pruning that does the following tasks:
Iterates through the model’s modules and focuses on linear layers.
Calculates the number of weights to prune based on the provided pruning_amount.
Identifies the indices of the least important weights (those with the smallest absolute values) using torch.topk with largest=False.
Creates a mask that sets the elements at those indices to zero.
Finally, applies the mask to the weight tensor, effectively pruning the least important weights. (A variant that uses PyTorch's built-in pruning utilities instead is sketched after this list.)
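Note that the script imports torch.nn.utils.prune but never actually uses it. The same magnitude-based (L1) pruning can also be expressed with that built-in utility; the following is a minimal sketch, assuming the same model and pruning_amount as above:

import torch.nn as nn
from torch.nn.utils import prune

def dynamic_prune_builtin(model, pruning_amount):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            # Zero out the smallest-magnitude weights via a pruning mask
            prune.l1_unstructured(module, name='weight', amount=pruning_amount)
            # Fold the mask into the weight tensor and drop the reparametrization
            prune.remove(module, 'weight')

One practical difference: if prune.remove is skipped, the utility keeps a persistent weight_mask buffer, so pruned weights remain zero across subsequent optimizer steps, whereas the manual mask above is applied only once per call.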
Lines 56–71: We define a function to train a neural network while also performing dynamic weight pruning, which is done in the following way:
Every prune_frequency mini-batches (i.e., when the mini-batch index i satisfies i % prune_frequency == prune_frequency - 1), it calls the dynamic_prune function to prune the model by the specified pruning_amount.
Every 100 mini-batches, it prints the average loss for those mini-batches.
Line 73: We call the training function.
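After training, the testing_loader (created earlier but not used above) can be used to check how well accuracy holds up, and counting zeroed weights gives a rough measure of how sparse the model has become. A minimal sketch, continuing from the script above:

def evaluate(model, testing_loader):
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for data, labels in testing_loader:
            predictions = model(data).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total

def weight_sparsity(model):
    # Fraction of weights that are exactly zero across all linear layers
    zeros, total = 0, 0
    for module in model.modules():
        if isinstance(module, nn.Linear):
            zeros += (module.weight == 0).sum().item()
            total += module.weight.numel()
    return zeros / total

print('Test accuracy: %.3f' % evaluate(model, testing_loader))
print('Weight sparsity: %.3f' % weight_sparsity(model))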
Dynamic weight pruning also has some drawbacks, which include the following:
Accuracy loss: Aggressive pruning can decrease model accuracy. Finding the right balance between pruning and performance is crucial.
Fine-tuning required: After pruning, the network might require further training (fine-tuning) to adjust to the reduced weight space; a sketch of how to keep pruned weights at zero during such fine-tuning follows this list.
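One subtlety of the manual masking approach above is that optimizer updates after a pruning step can make pruned weights nonzero again. If the pruned connections should stay removed while fine-tuning, one option (a sketch, assuming the same model and training loop as above) is to record the masks once and reapply them after every optimizer step:

# Record which weights are currently zero (the pruned connections)
masks = {name: (module.weight != 0).float()
         for name, module in model.named_modules()
         if isinstance(module, nn.Linear)}

def reapply_masks(model, masks):
    # Keep pruned weights at zero after each optimizer update
    with torch.no_grad():
        for name, module in model.named_modules():
            if isinstance(module, nn.Linear):
                module.weight.mul_(masks[name])

Calling reapply_masks(model, masks) immediately after optimizer.step() during fine-tuning keeps the sparsity pattern fixed while the remaining weights continue to learn.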
Overall, dynamic weight pruning is a powerful technique for optimizing deep learning models by reducing their size, improving inference speed, and potentially enhancing generalization. However, careful consideration of the trade-offs and a proper implementation are crucial for achieving these benefits.