Introduction to Convolutional Neural Networks
Welcome to the first chapter. Here we discuss how a Convolutional Neural Network (CNN) works, and, in the latter part of the chapter, we build a real-world project that you can extend further on your own.
Before starting, let’s learn how the human brain reads an image.
How does the human brain read an image?
To understand CNNs, we must first understand how our own brain reads an image. The brain detects features and categorizes the objects we see accordingly. You have probably looked at an image, perceived it as one thing, and then, after looking at it more carefully, perceived it as something different. Let’s see this with an example below:
What did you see at first glance in the above image? Did you see a person’s face looking sideways or a person looking at you?
The point that we want to make here is that our brain classifies any image it sees based on the features it detects first. Convolutional Neural Networks work in a similar manner. CNNs try to find the features that are important for the classification of the images.
Note: CNNs are mainly used for image-based tasks, but studies have shown that they can also be applied to NLP tasks, although their use there is more limited.
How does a CNN work?
There are three elements involved when we work with CNNs:
- Input image
- Convolutional Neural Network
- Output (classified image class)
Here, the CNN works in four steps, each of which is discussed in more detail later in the chapter:
- Convolution - the first building block is the convolution operation. In this step, we will discuss the feature detectors, which serve as the neural network’s filters. We will also discuss how the network learns these filters.
- Pooling - in this step, we will cover how pooling works, focusing on one specific type, max pooling. Other types of pooling exist, and we will discuss them in the next chapter.
- Flattening - in this step, we will discuss how flattening converts its input into a single vector.
- Full Connection - in this step, we will see how all of the previous steps come together and how the final predictions are made.
These are the building blocks of a Convolutional Neural Network.
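The four steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a trainable network: the 5x5 image, the 2x2 feature detector, and the weights and bias are made-up values, and the helper names (`convolve`, `max_pool`, `flatten`, `fully_connected`) are hypothetical. In a real CNN, the filter and weight values are learned during training.

```python
# Toy forward pass through the four steps, using only plain Python lists.
# All numbers are invented for illustration; nothing here is learned.

def convolve(image, detector):
    """Slide the feature detector over the image (stride 1, no padding).

    Like most deep learning libraries, this does not flip the detector,
    so strictly speaking it computes a cross-correlation.
    """
    dh, dw = len(detector), len(detector[0])
    out_h = len(image) - dh + 1
    out_w = len(image[0]) - dw + 1
    return [
        [
            sum(image[r + i][c + j] * detector[i][j]
                for i in range(dh) for j in range(dw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def flatten(feature_map):
    """Turn a 2D feature map into a single flat vector."""
    return [value for row in feature_map for value in row]

def fully_connected(vector, weights, bias):
    """One output neuron: weighted sum of the inputs plus a bias."""
    return sum(v * w for v, w in zip(vector, weights)) + bias

# A 5x5 image containing a bright vertical stripe (hypothetical data).
image = [[0, 0, 1, 1, 0] for _ in range(5)]

# A 2x2 detector that responds to dark-to-bright vertical edges (hypothetical).
detector = [[-1, 1],
            [-1, 1]]

feature_map = convolve(image, detector)  # 4x4, every row is [0, 2, 0, -2]
pooled = max_pool(feature_map)           # [[2, 0], [2, 0]]
flat = flatten(pooled)                   # [2, 0, 2, 0]
score = fully_connected(flat, [0.5, -0.5, 0.5, -0.5], bias=0.1)
print(feature_map, pooled, flat, score, sep="\n")
```

The feature map is largest where the image changes from dark to bright, pooling keeps the strongest response in each window, and the flattened vector is what the fully connected step turns into a single score.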
How does a CNN work with images?
A Convolutional Neural Network works not only with black and white images but also with colored images. Although there are some differences, the overall structure of the CNN remains the same.
This will become clearer as we move ahead in the chapter, but please keep these points in mind:
- Images are represented as matrices. A black and white (grayscale) image is a single matrix, whereas a colored image is represented by a combination of three matrices, one for each color channel (red, green, and blue, i.e., RGB).
- Each value in a matrix is the pixel value at the corresponding point in the image. Look at the image below to better understand this.
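- Each pixel value typically occupies eight bits (one byte): one value per pixel in a black and white image, and one value per channel in a colored image.
- With eight bits, each value falls on a scale from 0 to 255.
- In a black and white image, 0 is generally pitch black and 255 is pure white, and the values in between are the various shades of gray.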
- Each pixel in a colored image is represented by three values (RGB). Since any color is a combination of red, green, and blue at different intensities, a single pixel in a colored image is assigned a separate value for each of these channels.
- The network does not perceive colors as such. It only receives the numerical pixel values, which the computer stores in binary (1’s and 0’s).
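The points above can be made concrete with a small sketch in plain Python. The pixel values and the `split_channels` helper are invented for illustration; real images are far larger and are usually loaded with an imaging library rather than typed in by hand.

```python
# Black and white (grayscale) image: one matrix of values from 0 to 255.
gray = [
    [0, 128, 255],   # black, mid gray, white
    [64, 192, 32],
]

# Colored image: every pixel carries three values (red, green, blue).
color = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],          # red, green, blue
    [(255, 255, 255), (0, 0, 0), (128, 128, 128)],    # white, black, gray
]

def split_channels(img):
    """Return three matrices, one per color channel (hypothetical helper)."""
    return [[[pixel[ch] for pixel in row] for row in img] for ch in range(3)]

red, green, blue = split_channels(color)

# Each value fits in eight bits, which gives 2**8 = 256 levels (0 to 255).
levels = 2 ** 8
bits = format(gray[0][1], "08b")  # 128 written as eight binary digits

print(red)     # [[255, 0, 0], [255, 0, 128]]
print(levels)  # 256
print(bits)    # 10000000
```

The three matrices `red`, `green`, and `blue` have the same height and width as the image, which is why a colored image can be treated as three stacked matrices.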
Now let’s look at some applications of CNNs.
Applications of CNN
- Self-driving cars
- Image classification
- Facial expression detection
- Medical diagnosis using X-rays, CT scans, etc.
As these examples suggest, CNNs are useful in many sectors, particularly wherever a task depends on interpreting images.