How image recognition works

Images can be thought of as arrays of pixels. In most cases, each pixel can be viewed as a set of three numbers, corresponding to the pixel’s red, green, and blue (RGB) values. So the first step of image recognition is to convert the image data into arrays containing the numeric RGB values for each pixel.
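As a minimal sketch of this representation, the snippet below stores a made-up 2x2 image as nested Python lists. The pixel values, the helper names `image_shape` and `to_float_array`, and the scaling to the range 0 to 1 are all assumptions for illustration; real pipelines typically use an array library rather than plain lists.

```python
# A hypothetical 2x2 RGB image: rows of pixels, each pixel an (R, G, B)
# triple with channel values in the range 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def image_shape(img):
    """Return (height, width, channels) for a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))


def to_float_array(img):
    """Scale every channel value from [0, 255] to [0.0, 1.0].

    Scaling is an assumed (but common) extra step, not something the
    text above requires.
    """
    return [[[c / 255.0 for c in pixel] for pixel in row] for row in img]


shape = image_shape(image)
scaled = to_float_array(image)
```

Here `shape` describes the array's dimensions (height, width, number of color channels), which is the layout a neural network would expect to receive.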

After processing the images, we feed the data arrays into a convolutional neural network. A convolutional neural network (CNN) is a type of neural network that is particularly well suited to image-related tasks. You’ll learn about the CNN architecture and different variations of CNNs in this course.
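To give a rough idea of the operation that gives CNNs their name, here is a simplified sketch of a 2D convolution on a grayscale image (implemented, as in most deep learning libraries, as a cross-correlation). The function name `conv2d`, the 4x4 image, and the hand-picked kernel are all hypothetical; in a real CNN the kernel values are learned during training rather than chosen by hand.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return
    the sum of element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 grayscale image with a vertical edge down the middle.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-picked 2x2 kernel that responds where intensity increases
# from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)
```

The resulting `feature_map` is nonzero only in the column where the kernel straddles the edge, which illustrates how a convolution can highlight a local pattern in the image.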

The CNN applies a series of computations to the input image data, producing a single probability for binary classification or a set of probabilities, one per class, for multiclass classification. The CNN assigns the image to the class with the highest probability. For more on classification using neural networks, check out the Machine Learning for Software Engineers course.
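The final step can be sketched as follows. This example assumes the network's last layer outputs one raw score per class and that those scores are turned into probabilities with a softmax function, which is a common choice but not the only one. The class names and score values are invented for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, class_names):
    """Return the most probable class name and the full probability list."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs


# Hypothetical class names and raw network outputs for one image.
class_names = ["cat", "dog", "bird"]
scores = [2.0, 0.5, -1.0]

label, probs = predict(scores, class_names)
```

Because softmax preserves the ordering of the scores, `label` is simply the class with the largest raw score; the probabilities are useful when you also want to know how confident the prediction is.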

C. What this course will provide

After taking this course, you’ll be able to process image data, train a CNN on the image data, and use the CNN to perform image recognition. Specifically, you will be able to:

  • Take raw images and process them into usable data for machine learning. This includes techniques such as image standardization, image resizing, and data augmentation.
  • Build a CNN from scratch, as well as variations of the CNN architecture such as SqueezeNet and ResNet.
  • Use a CNN to perform different types of image recognition, from digit recognition to classifying images from the ImageNet dataset.
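As a small preview of the image processing techniques named in the first point above, here is a simplified sketch of standardization, resizing, and one form of data augmentation on a tiny grayscale image. The function names, the nearest-neighbor resizing method, and the horizontal flip are assumptions chosen for brevity; they are not necessarily the methods used later in the course, and real code would normally rely on an image or array library.

```python
import statistics


def standardize(img):
    """Shift and scale pixel values to zero mean and unit variance."""
    pixels = [p for row in img for p in row]
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    if std == 0:
        std = 1.0  # avoid dividing by zero on a completely flat image
    return [[(p - mean) / std for p in row] for row in img]


def resize_nearest(img, new_h, new_w):
    """Resize by copying the nearest source pixel for each output pixel."""
    h, w = len(img), len(img[0])
    return [
        [img[i * h // new_h][j * w // new_w] for j in range(new_w)]
        for i in range(new_h)
    ]


def flip_horizontal(img):
    """Mirror the image left to right (a simple data augmentation)."""
    return [row[::-1] for row in img]


# Hypothetical 2x2 grayscale image.
image = [
    [0, 50],
    [100, 150],
]

standardized = standardize(image)
resized = resize_nearest(image, 4, 4)
flipped = flip_horizontal(image)
```

Standardization keeps pixel values on a consistent scale, resizing gives every image the same dimensions, and augmentation such as flipping creates extra training examples from the images you already have.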