Neural Network Building Blocks
Explore the fundamental building blocks of neural networks, including linear layers and common activation functions such as sigmoid, ReLU, and leaky ReLU. Understand how dropout helps reduce overfitting and how layers process data batches using PyTorch. This lesson builds foundational knowledge for constructing and training neural networks effectively.
We'll cover the following...
Neural networks form a class of machine learning models that implement a parameterized function composition. Tensors flow through a neural network via successive transformations called layers.
In this lesson, we'll get familiar with some of the most common types of layers:
The linear layer
The logistic, a.k.a. sigmoid, function
The ReLU function
The leaky ReLU function
The dropout layer
The linear layer
The linear layer is the star of neural networks. Almost every time we want to project a one-dimensional tensor, a.k.a. a vector, into another one-dimensional tensor, a linear layer will get involved.
A linear layer performs an affine transformation on a vector:

y = Wx + b

Here, x is the input vector, W is the weight matrix, b is the bias vector, and y is the output vector. The entries of W and b are the layer's trainable parameters.
Let’s create a linear layer with the PyTorch class torch.nn.Linear:
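The lesson's interactive snippet isn't reproduced here; a minimal sketch, assuming a layer that maps 3-dimensional vectors to 2-dimensional ones, might look like this:

```python
import torch

torch.manual_seed(0)  # for reproducible random initialization

# A linear layer mapping 3-dimensional vectors to 2-dimensional vectors
linear_layer = torch.nn.Linear(in_features=3, out_features=2)

# Inspect the trainable parameters, which are created and
# randomly initialized at object construction
for name, param in linear_layer.named_parameters():
    print(name, param.shape)
```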
We create a torch.nn.Linear object and extract the layer’s trainable parameters with the named_parameters() method. The internal trainable parameters (in this case, tensors named weight and bias) are initialized randomly at object construction.
Any layer in PyTorch (and any composition of layers) processes the input tensor through the forward() method.
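A sketch of both ways of invoking the layer, reusing the illustrative 3-to-2 layer from above:

```python
import torch

torch.manual_seed(0)
linear_layer = torch.nn.Linear(in_features=3, out_features=2)

# A batch of one 3-dimensional vector: shape (1, 3)
input_tsr = torch.tensor([[1.0, 2.0, 3.0]])

# Explicit call to forward()
output1 = linear_layer.forward(input_tsr)

# Calling the layer object itself internally invokes forward()
output2 = linear_layer(input_tsr)

print(torch.equal(output1, output2))  # True
```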
We obtain the output of the linear layer through a call to its forward() method, with a suitable tensor as the argument. We can obtain the same output tensor by calling linear_layer(input_tsr), which internally calls the forward() method.
Notice that the input tensor must have a batch dimension as its first dimension. Layers process a batch of tensors: the first dimension in the tensor shape is the batch size. For this reason, even though the linear layer we created processes three-dimensional vectors, the input tensor shape must be (N, 3), where N can be any strictly positive integer.
We saw that the linear layer has two named parameters: weight and bias. As you can probably guess from their names, these tensors are the weight and the bias of an affine transformation.
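We can check this by computing the affine transformation by hand; a sketch, with the same illustrative shapes as before:

```python
import torch

torch.manual_seed(0)
linear_layer = torch.nn.Linear(in_features=3, out_features=2)
input_tsr = torch.tensor([[1.0, 2.0, 3.0]])

weight = linear_layer.weight  # shape (2, 3)
bias = linear_layer.bias      # shape (2,)

# The affine transformation, computed by hand:
# each output row is input_row @ W^T + b
manual_output = input_tsr @ weight.T + bias

# Matches the layer's own output
print(torch.allclose(manual_output, linear_layer(input_tsr)))  # True
```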
We can compute the affine transformation manually from the weight and bias tensors and verify that it matches the output of the linear layer’s forward() method.
Non-linear activation functions
Neural networks are a composition of parameterized functions. What happens when we compose affine transformations?
Let’s write down an affine transformation of an affine transformation:

y = W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2)

Here, we introduce W' = W2W1 and b' = W2b1 + b2, so that y = W'x + b'. In other words, the composition of two affine transformations is just another affine transformation: stacking linear layers alone would be no more expressive than a single linear layer. That is why neural networks insert non-linear activation functions between their linear layers.
Logistic
The logistic, or sigmoid, function is defined in the following way:

sigma(x) = 1 / (1 + e^(-x))

The logistic function maps any real number to a number in the open interval (0, 1).
The PyTorch function that implements the logistic function is torch.sigmoid().
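A quick illustration on a few sample values:

```python
import torch

input_tsr = torch.tensor([-10.0, -1.0, 0.0, 1.0, 10.0])

# Every entry is squashed into the open interval (0, 1)
output_tsr = torch.sigmoid(input_tsr)
print(output_tsr)  # sigmoid(0) = 0.5
```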
ReLU
The rectified linear unit, or ReLU, is a simple activation:

ReLU(x) = max(0, x)
The ReLU function is the go-to activation function for deep neural networks. It is implemented by the PyTorch function torch.nn.functional.relu():
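A sketch with a few sample values:

```python
import torch
import torch.nn.functional as F

input_tsr = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# Negative entries are clamped to zero; non-negative entries pass through
output_tsr = F.relu(input_tsr)
print(output_tsr)
```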
We compute the ReLU function for each entry in the input tensor.
Leaky ReLU
The flat segment at the left of the ReLU graph can create a situation where, for every observation in the training set, the output of the ReLU function is zero. In that case, the gradient flowing through the neuron is also zero, its weights never get updated, and the neuron is said to be "dead."
To avoid dead neurons, the leaky ReLU activation function always outputs a small non-zero value, even when the input number is negative:

LeakyReLU(x) = x if x >= 0, alpha * x otherwise

Here, alpha is a small positive slope applied to negative inputs (PyTorch uses alpha = 0.01 by default).
The PyTorch function that implements the leaky ReLU function is torch.nn.functional.leaky_relu():
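A sketch, using the default negative slope explicitly:

```python
import torch
import torch.nn.functional as F

input_tsr = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# Negative entries are scaled by negative_slope instead of being zeroed
output_tsr = F.leaky_relu(input_tsr, negative_slope=0.01)
print(output_tsr)
```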
We apply the leaky ReLU function to an input tensor.
Dropout
The dropout layer behaves differently when the neural network is training and when it is inferring, such as when used in production. During training, the dropout layer masks a given proportion of the activation values; to keep the overall signal average approximately constant, it scales up the unmasked values. During inference, it doesn’t mask any values, essentially becoming transparent.
The dropout layer regularizes a neural network. It creates a constraint during training such that the neural network can’t rely too heavily on the value of a particular activation because any activation can be dropped at any time. It helps to reduce overfitting, the undesirable tendency to learn the specifics of the training examples rather than finding general rules.
The dropout layer is implemented by the PyTorch class torch.nn.Dropout.
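A sketch of the training-mode behavior, assuming a drop probability of 0.5 and an input of ones (so the scaling is easy to see):

```python
import torch

torch.manual_seed(0)

# p=0.5: each activation is zeroed with probability 0.5 during training
dropout_layer = torch.nn.Dropout(p=0.5)
dropout_layer.train()  # training mode (the default for a fresh module)

input_tsr = torch.ones(1, 10)
output_tsr = dropout_layer(input_tsr)

# Surviving entries are scaled by 1 / (1 - p) = 2
print(output_tsr)
```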
We pass the input tensor through the dropout layer. As a result, roughly half the entries of the output tensor are zero, and the surviving values are multiplied by a factor of two.
If we set the training field of the dropout layer to False, the dropout layer becomes a pipe that lets the input tensor pass without modification.
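A sketch of the inference-mode behavior; calling eval() is equivalent to setting the training field to False:

```python
import torch

dropout_layer = torch.nn.Dropout(p=0.5)
dropout_layer.eval()  # same effect as dropout_layer.training = False

input_tsr = torch.rand(1, 10)
output_tsr = dropout_layer(input_tsr)

# In evaluation mode, dropout is a no-op
print(torch.equal(input_tsr, output_tsr))  # True
```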
We set the dropout layer to inference mode, a.k.a. evaluation mode. As a result, the output tensor is equal to the input tensor.