
Neural Network Building Blocks

Explore the fundamental building blocks of neural networks, including linear layers and common activation functions such as sigmoid, ReLU, and leaky ReLU. Understand how dropout helps reduce overfitting and how layers process data batches using PyTorch. This lesson builds foundational knowledge for constructing and training neural networks effectively.

Neural networks form a class of machine learning models that implement a parameterized function composition. Tensors flow through a neural network via successive transformations called layers.

A neural network is a composition of functions

In this lesson, we’ll get familiar with some of the most common types of layers:

  • The linear layer

  • The logistic, a.k.a. sigmoid, function

  • The ReLU function

  • The leaky ReLU function

  • The dropout layer

The linear layer

The linear layer is the star of neural networks. Almost every time we want to project a one-dimensional tensor, a.k.a. a vector, into another one-dimensional tensor, a linear layer will get involved.

A linear layer performs an affine transformation on a vector:

$$y = Wx + b$$

Here, $y$ is the output vector, $W$ is the weight matrix, $b$ is the bias vector, and $x$ is the input vector.
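
For instance, with a 2×3 weight matrix, a 2-dimensional bias, and a 3-dimensional input (the numbers are chosen purely for illustration):

$$y = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + \begin{pmatrix} 0.5 \\ -0.5 \end{pmatrix} = \begin{pmatrix} 7.5 \\ -1.5 \end{pmatrix}$$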

Let’s create a linear layer with the PyTorch class torch.nn.Linear:

Python
import torch
# Create a linear layer that takes a 3-dimensional vector as input and
# outputs a 2-dimensional vector
linear_layer = torch.nn.Linear(in_features=3, out_features=2)
# Display the randomly initialized trainable parameters
linear_layer_parameters = list(linear_layer.named_parameters())
print(f"linear_layer_parameters =\n{linear_layer_parameters}")

We create a torch.nn.Linear object and extract the layer’s trainable parameters with the named_parameters() method. The internal trainable parameters (in this case, tensors named weight and bias) are initialized randomly at object construction.
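
As a quick check of the parameter shapes (the values themselves differ from run to run, since they are random):

Python
import torch
linear_layer = torch.nn.Linear(in_features=3, out_features=2)
# weight has shape (out_features, in_features); bias has shape (out_features,)
print(linear_layer.weight.shape)  # torch.Size([2, 3])
print(linear_layer.bias.shape)    # torch.Size([2])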

Any layer in PyTorch (and any composition of layers) processes the input tensor through the forward() method.

Python
import torch
linear_layer = torch.nn.Linear(3, 2)
input_tsr = torch.tensor([[1., 2., 3.]]) # input_tsr.shape = (1, 3)
output1_tsr = linear_layer.forward(input_tsr)
print(f"output1_tsr =\n{output1_tsr}")
output2_tsr = linear_layer(input_tsr)
print(f"output2_tsr =\n{output2_tsr}")

We first obtain the output of the linear layer through a call to its forward() method, with a suitable tensor as the argument. We then obtain the same output tensor by calling linear_layer(input_tsr) directly, which internally calls the forward() method.

Notice that the input tensor has a first dimension of size one. Layers process a batch of tensors: the first dimension in the tensor shape is the batch size. For this reason, even though the linear layer we created processes three-dimensional vectors, the input tensor shape must be (N, 3), where N can be any strictly positive integer.
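
For instance (the batch size of 4 below is arbitrary):

Python
import torch
linear_layer = torch.nn.Linear(in_features=3, out_features=2)
# A batch of N = 4 three-dimensional vectors
batch_tsr = torch.randn(4, 3)
# The layer maps each vector in the batch independently
output_tsr = linear_layer(batch_tsr)
print(f"batch_tsr.shape = {batch_tsr.shape}")    # torch.Size([4, 3])
print(f"output_tsr.shape = {output_tsr.shape}")  # torch.Size([4, 2])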

We saw that the linear layer has two named parameters: weight and bias. As you can probably guess from their names, these tensors are the weight and the bias of an affine transformation.

Python
import torch
linear_layer = torch.nn.Linear(3, 2)
# Test the use of the linear layer
input_tsr = torch.tensor([[1., 2., 3.]]) # input_tsr.shape = (1, 3)
output1_tsr = linear_layer(input_tsr)
print(f"output1_tsr =\n{output1_tsr}")
# Do the same computation through an explicit affine transformation
W = linear_layer.weight
b = linear_layer.bias
for obs_ndx in range(input_tsr.shape[0]):
    affine_output_tsr = torch.matmul(W, input_tsr[obs_ndx, :]) + b  # Wx + b
    print(f"affine_output_tsr (obs_ndx={obs_ndx}) =\n{affine_output_tsr}")

In the loop, we compute the affine transformation $y = Wx + b$ for each observation vector in the batch. We can confirm that this is indeed the computation done by the forward() method of the linear layer.

Non-linear activation functions

Neural networks are a composition of parameterized functions. What happens when we compose affine transformations?

Let’s write down an affine transformation of an affine transformation:

$$W_2 (W_1 x + b_1) + b_2 = W_2 W_1 x + (W_2 b_1 + b_2) = W_3 x + b_3$$

Here, we introduce $W_3 = W_2 W_1$ and $b_3 = W_2 b_1 + b_2$ as the weight and bias of the affine transformation resulting from the composition of two affine transformations. In other words, a composition of affine transformations is itself an affine transformation. Since we want to build non-linear function compositions, we must insert non-linear functions between the linear functions (such as affine transformations). These non-linear functions are called activation functions.
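
We can verify this numerically by stacking two linear layers with no activation in between (the layer sizes below are arbitrary):

Python
import torch
layer1 = torch.nn.Linear(3, 4)
layer2 = torch.nn.Linear(4, 2)
x = torch.randn(1, 3)
# Compose the two affine transformations
composed_output = layer2(layer1(x))
# Collapse the composition into a single affine transformation W3 x + b3
W3 = layer2.weight @ layer1.weight              # W3 = W2 W1
b3 = layer2.weight @ layer1.bias + layer2.bias  # b3 = W2 b1 + b2
single_output = x @ W3.T + b3
print(torch.allclose(composed_output, single_output, atol=1e-6))  # True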

Logistic

The logistic, or sigmoid, function is defined in the following way:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
The logistic function

The logistic function maps any real number to a number in the $[0, 1]$ range.

The PyTorch function that implements the logistic function is torch.sigmoid().

Python
import torch
x_tsr = torch.tensor([-5., -2., -1., 0., 1., 2., 5.])
print(torch.sigmoid(x_tsr))
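
We can check this against the definition by computing $1/(1 + e^{-x})$ explicitly:

Python
import torch
x_tsr = torch.tensor([-5., -2., -1., 0., 1., 2., 5.])
# Explicit computation of the logistic function
manual_sigmoid = 1.0 / (1.0 + torch.exp(-x_tsr))
print(torch.allclose(torch.sigmoid(x_tsr), manual_sigmoid))  # True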

ReLU

The rectified linear unit, or ReLU, is a simple activation function:

$$\mathrm{ReLU}(x) = \max(0, x)$$
The ReLU function

The ReLU function is the go-to activation function for deep neural networks. It is implemented by the PyTorch function torch.nn.functional.relu():

Python
import torch
input_tsr = torch.tensor([[-3., 0, 4., -0.5, -1.7, 1.7]])
output_tsr = torch.nn.functional.relu(input_tsr)
print(f"output_tsr = {output_tsr}")

Here, we compute the ReLU function for each entry of the input tensor.
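
Equivalently, since ReLU is just an elementwise maximum with zero, we can reproduce it with torch.clamp():

Python
import torch
input_tsr = torch.tensor([[-3., 0., 4., -0.5, -1.7, 1.7]])
# ReLU(x) = max(0, x), applied elementwise
manual_relu = torch.clamp(input_tsr, min=0.0)
print(torch.equal(torch.nn.functional.relu(input_tsr), manual_relu))  # True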

Leaky ReLU

The flat segment on the left of the ReLU graph can create a situation where, for every observation in the training set, the output of the ReLU function is 0, and the gradient is also 0. When such a situation arises, a neuron is said to be dead because the neuron feeding the ReLU function won’t be updated anymore.
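
We can see the zero gradient directly with autograd (the pre-activation values below are arbitrary):

Python
import torch
# Pre-activation values, some of them negative, tracked for gradients
pre_activation = torch.tensor([-2.0, -0.5, 1.0, 3.0], requires_grad=True)
output = torch.nn.functional.relu(pre_activation)
output.sum().backward()
# The gradient is 0 wherever the pre-activation is negative:
# those entries receive no update signal
print(pre_activation.grad)  # tensor([0., 0., 1., 1.])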

To avoid dead neurons, the leaky ReLU activation function keeps a small but non-zero slope for negative inputs:

$$\mathrm{LeakyReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \varepsilon x & \text{if } x < 0 \end{cases}$$

Here, $\varepsilon$ is a small number, typically in the range $[0.01, 0.1]$.

The Leaky ReLU function

The PyTorch function that implements the leaky ReLU function is torch.nn.functional.leaky_relu():

Python
import torch
input_tsr = torch.tensor([[-3., 0, 4., -0.5, -1.7, 1.7]])
output_tsr = torch.nn.functional.leaky_relu(input_tsr, negative_slope=0.01)
print(f"output_tsr = {output_tsr}")

Here, we apply the leaky ReLU function to each entry of the input tensor; the negative_slope argument plays the role of $\varepsilon$.
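
As a sanity check, the same values can be computed with torch.where(), using $\varepsilon = 0.01$:

Python
import torch
input_tsr = torch.tensor([[-3., 0., 4., -0.5, -1.7, 1.7]])
epsilon = 0.01
# Leaky ReLU: x when x >= 0, epsilon * x otherwise
manual_leaky_relu = torch.where(input_tsr >= 0, input_tsr, epsilon * input_tsr)
print(torch.allclose(
    torch.nn.functional.leaky_relu(input_tsr, negative_slope=epsilon),
    manual_leaky_relu))  # True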

Dropout

The dropout layer behaves differently when the neural network is training vs. when it is doing inference, such as when used in production. During training, the dropout layer masks a given proportion of the activation values and, to keep the overall signal average approximately constant, scales up the unmasked values by a factor of $1/(1-p)$. During inference, it doesn’t mask any values, essentially becoming transparent.

The dropout layer regularizes a neural network. It creates a constraint during training such that the neural network can’t rely too heavily on the value of a particular activation because any activation can be dropped at any time. It helps to reduce overfitting, the undesirable tendency to learn the specifics of the training examples rather than finding general rules.

The dropout layer is implemented by the PyTorch class torch.nn.Dropout.

Python
import torch
dropout = torch.nn.Dropout(p=0.5) # p is the proportion of the activation values that will get masked during training
print(f"dropout.training = {dropout.training}")
input_tsr = torch.randn(4, 4)
print(f"input_tsr =\n{input_tsr}")
output_tsr = dropout(input_tsr)
print(f"output_tsr =\n{output_tsr}")

We pass the input tensor through the dropout layer. Since the layer is in training mode, each entry is masked with probability p = 0.5, so roughly half the entries of the output tensor are zero, and the surviving values are multiplied by a factor of $1/(1-p) = 2$.
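
The scaling is easiest to see with an input of all ones (the positions of the zeros vary from run to run):

Python
import torch
dropout = torch.nn.Dropout(p=0.5)
input_tsr = torch.ones(1, 8)  # all ones, to make the scaling visible
output_tsr = dropout(input_tsr)
# Kept entries are scaled by 1 / (1 - p) = 2; which entries are dropped is random
print(output_tsr)  # e.g. tensor([[2., 0., 2., 2., 0., 0., 2., 2.]])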

If we set the training field of the dropout layer to False, the dropout layer becomes a pipe that lets the input tensor pass without modification.

Python
import torch
dropout = torch.nn.Dropout(p=0.5)
dropout.training = False
print(f"dropout.training = {dropout.training}")
input_tsr = torch.randn(4, 4)
print(f"input_tsr =\n{input_tsr}")
output_tsr = dropout(input_tsr)
print(f"output_tsr =\n{output_tsr}")

By setting the training field to False, we put the dropout layer in inference mode, a.k.a. evaluation mode. As a result, the output tensor is equal to the input tensor.
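
In practice, the training flag is usually toggled with the train() and eval() methods, which also propagate to every sub-module of a composed model:

Python
import torch
dropout = torch.nn.Dropout(p=0.5)
# eval() sets training to False; train() sets it back to True
dropout.eval()
print(f"dropout.training = {dropout.training}")  # False
dropout.train()
print(f"dropout.training = {dropout.training}")  # True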