Trusted answers to developer questions

Related Tags

ai
neural network
machine learning
deep learning

What are ResNets?

Muhammad Nabeel

Overview

Deep neural networks become harder to train as their depth increases, and the added depth brings several problems with it. Residual Networks, or ResNets, address these problems without adding much overhead to the regular working of the neural network. Before we dive into ResNets, let's understand why we need them.

Degradation in deep neural networks

Neural networks have multiple neurons, which are stacked in layers. The number of layers is the depth of the neural network. The number of neurons in each layer is the width of the neural network.

Increasing the width of a neural network tends to make it prone to memorizing the training data, which leads to overfitting. Increasing the depth, on the other hand, allows the neural network to learn from the training data in a more generalized way, which is our goal.

After addressing problems like vanishing gradients (the gradients of activation functions like sigmoid become very small, which makes it difficult to train bigger models; activation functions like ReLU help mitigate this), there is still the problem of degradation. As the depth of a neural network increases, its accuracy starts degrading, which shows up as a higher training error. Moreover, experiments show that this is not because of overfitting: an overfitted model would have a lower training error, not a higher one.
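To make the vanishing-gradient effect concrete, here is a minimal sketch in plain Python. It assumes a heavily simplified chain of activations in which every weight is 1 and every pre-activation has the same value, so the backpropagated gradient is just a product of activation derivatives. These simplifications and all helper names are illustrative assumptions, not part of the original answer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if z > 0 else 0.0

def chained_gradient(grad_fn, z, depth):
    """Multiply `depth` identical activation derivatives (all weights assumed to be 1)."""
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(z)
    return g

for depth in (5, 20, 50):
    # The sigmoid column shrinks rapidly with depth; the ReLU column stays at 1.0.
    print(depth,
          chained_gradient(sigmoid_grad, 0.0, depth),
          chained_gradient(relu_grad, 1.0, depth))
```

Even in the best case for sigmoid (a derivative of 0.25), the gradient shrinks by a factor of four per layer, while ReLU passes the gradient through unchanged for positive inputs.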

Working of ResNets

ResNets help us counter degradation with the help of identity mapping. An identity mapping in a neural network does nothing but return the same input that was given to the layer.

Note: If you are familiar with electronics, you can think of it as a buffer amplifier.

To understand the usage of identity mapping, let's take an example of a neural network with 50 layers. We can add 20 new layers that perform identity mapping to this neural network. The deeper network will not degrade, because these 20 layers just forward their input, with at most small tweaks to improve the performance.
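The argument above can be sketched in a few lines of plain Python. The "network" here is a hypothetical stand-in: a list of 50 simple functions rather than trained layers, and the names `run`, `base_layers`, and `deeper_layers` are made up for this illustration.

```python
def identity(x):
    # An identity layer returns its input unchanged.
    return x

def run(layers, x):
    """Apply each layer in order, feeding the output of one into the next."""
    for layer in layers:
        x = layer(x)
    return x

# Hypothetical stand-in for a 50-layer network: each layer is a simple function.
base_layers = [lambda v, k=k: v * 1.01 + 0.001 * k for k in range(50)]

# Append 20 identity layers to get a 70-layer network.
deeper_layers = base_layers + [identity] * 20

x = 0.5
print(run(base_layers, x) == run(deeper_layers, x))  # True
```

Because the extra layers only forward their input, the deeper network produces exactly the same output as the shallower one, so in principle it should never be worse.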

A residual block

A ResNet consists of residual blocks. A residual block adds a shortcut (skip) connection around its weighted layers, as shown in the figure above. The working of this block can be represented as:

y = F(x) + x

Here, x is the input to the block, F(x) is the output of the weighted layers, and y is the output of the block. To learn an identity mapping, the neural network simply needs to drive F(x) to 0. This allows the block to forward x as it is.
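As a minimal sketch of this idea in plain Python, the block below computes the weighted layers' output F(x) and adds the input x back through the shortcut. It assumes F is two small fully connected layers with a ReLU in between and omits any activation after the addition. The helper names and the choice of layers are illustrative assumptions, not the layers used in a real ResNet.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def dense(v, weights, bias):
    """Fully connected layer; `weights` is a list of rows, one per output unit."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    """Compute F(x) + x, where F is dense -> ReLU -> dense."""
    f_x = dense(relu(dense(x, w1, b1)), w2, b2)
    # Shortcut connection: add the original input to the output of F.
    return [f + a for f, a in zip(f_x, x)]

x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
no_bias = [0.0] * 3

# With all weights at zero, F(x) = 0 and the block forwards x unchanged.
print(residual_block(x, zeros, no_bias, zeros, no_bias))  # [1.0, -2.0, 3.0]
```

The point of the design is that "do nothing" corresponds to zero weights, which is typically easier for an optimizer to reach than making a stack of plain layers reproduce their input exactly.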

Note: Usually, there is a combination of convolution, batch normalization, and activation layers in the place of one weighted layer.

Code implementation

Here, we'll implement a residual block in TensorFlow:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # to ignore tf warnings
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(shape=(24, 24, 3)):
    x = layers.Input(shape)
    x_copy = x

    # Layer 1
    conv2d_layer1 = layers.Conv2D(3, (3, 3), padding='same')(x)
    batch_norm_layer1 = layers.BatchNormalization()(conv2d_layer1)
    activation_layer1 = layers.Activation('relu')(batch_norm_layer1)
    # Layer 2
    conv2d_layer2 = layers.Conv2D(3, (3, 3), padding='same')(activation_layer1)
    batch_norm_layer2 = layers.BatchNormalization()(conv2d_layer2)

    # skip connection
    addition_layer = layers.Add()([x_copy, batch_norm_layer2])
    activation_layer2 = layers.Activation('relu')(addition_layer)

    model = tf.keras.Model(inputs=x, outputs=activation_layer2)
    return model

residual_block().summary()
The residual block

Code explanation

  • Line 2: We block TensorFlow warnings.
  • Line 7: We create the input layer with the given shape.
  • Lines 11–13: We define the first layer with convolution, batch normalization, and activation layers.

Note: You can set the kernel size and filters according to your needs.

  • Lines 15–16: We apply convolution and batch normalization layers, but before we perform the activation, we need to add the skip connection.
  • Line 19: We complete our skip connection by adding the original input to the output from the batch normalization layer.
  • Line 20: We apply the activation function using the activation layer, and that completes our residual block.
  • Lines 22–23: We create a model using just one residual block. This model is returned to the caller of the residual_block function.

Import ResNets

In this section, we'll learn to import a pretrained ResNet model, ResNet50. It is a relatively small ResNet variant with 50 layers.

Using TensorFlow

We can use a number of models that are already available to us in TensorFlow. Here, we import ResNet50 using tf.keras.applications. It uses ImageNet weights by default.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # to ignore tf warnings
import tensorflow as tf
model = tf.keras.applications.resnet50.ResNet50()
model.summary()
How to import ResNet50 in TensorFlow

Using PyTorch

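Here, we use torchvision to get the models. To print a layer-by-layer summary of a model in PyTorch, we have to install the third-party torchsummary package. Note that models.resnet50() called without arguments builds the architecture with randomly initialized weights. To load pretrained ImageNet weights, request them explicitly (the argument name depends on the torchvision version).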

from torchvision import models
from torchsummary import summary

model = models.resnet50()
print(model)
# torchsummary defaults to device="cuda"; the model here lives on the CPU
summary(model, (3, 512, 512), device="cpu")
How to import ResNet50 in PyTorch

We have to provide the input shape explicitly so that torchsummary can use it to print the summary. Here, the input shape passed to ResNet50 is (3, 512, 512), which is in channels-first order: 3 color channels followed by a 512 × 512 image.

Copyright ©2022 Educative, Inc. All rights reserved
