
Introduction to Perceptron

Explore the perceptron's function as a binary classifier within supervised learning models. Understand how perceptrons process inputs, calculate weighted sums, and classify data, while examining their historical significance and why more advanced methods like multilayer perceptrons are necessary. This lesson helps you grasp the basics behind perceptron architecture and prepares you for deeper study in neural networks.

In the previous chapters, we designed a complex program with a lot of detail. It’s time to take a step back and enjoy the big picture.

In the first part of this course, we built a supervised learning system based on a specific architecture called the perceptron. By the end of this chapter, we’ll know what a perceptron looks like and what it can do. Moreover, we’ll also explore what it cannot do, and why we must move forward to more sophisticated algorithms such as neural networks.

We’ll also learn about the history of the perceptron. It won’t be a boring history lesson, but rather, this will explain a clash of ideas that impacted much of what we know about computers.

What is a perceptron?

To understand what a perceptron is, let’s look back at the binary classifier we discussed in the Getting Real chapter. That program sorted MNIST characters into two classes: “5” or “not a 5.” The following picture shows one way to understand it:

This diagram tracks an MNIST image through the system. The process begins with the input variables, from x_1 to x_n. In the case of MNIST, the input variables are the 784 pixels of an image. To those, we add a bias x_0, with a constant value of 1. We also color it a darker shade of gray to make it stand apart from the other input variables.

The next step, the yellow square, is the weighted sum of the input variables. It’s implemented as a multiplication of matrices, so it is marked with the “dot” sign.

The weighted sum flows through one last function: the light blue square. In general, this is called the activation function, and it can be different for different learning systems. In our system, we use a sigmoid. The output of the sigmoid is the predicted label ŷ, ranging from 0 to 1.

During training, the system compares the prediction ŷ with the ground truth to calculate the next step of gradient descent. During classification, it snaps the value of ŷ to one of its extremes, either 1 or 0, meaning “5” or “not a 5,” respectively.
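Putting those steps together, here is a minimal sketch of the pipeline in NumPy. The function names and the toy input values are made up for illustration; only the structure (weighted sum, sigmoid, snapping) comes from the description above:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, w):
    # Weighted sum of the inputs (a matrix multiplication), then sigmoid.
    return sigmoid(np.matmul(X, w))

def classify(X, w):
    # Snap the prediction to 1 ("a 5") or 0 ("not a 5").
    return np.round(forward(X, w))

# One toy example: bias x_0 = 1, followed by two input variables.
x = np.array([[1.0, 0.5, -0.5]])
w = np.array([[0.1], [2.0], [1.0]])
print(forward(x, w))   # a confidence between 0 and 1
print(classify(x, w))  # snapped to 0 or 1
```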

The architecture explained above is the perceptron, the original supervised learning system.

Note: Since the original perceptron training algorithm does not use gradient descent, the vanilla perceptron does not need an activation function with a smooth gradient such as the sigmoid. Instead, it can get away with a simple step function that snaps the value of the weighted sum to either 1 or 0, depending on whether it’s positive or negative.
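As a sketch, that step activation could look like this (the function name is made up, and sending a weighted sum of exactly zero to 0 is a convention that varies between formulations):

```python
import numpy as np

def step(z):
    # Snap positive weighted sums to 1 and the rest to 0 --
    # no smooth gradient, unlike the sigmoid.
    return (z > 0).astype(float)

z = np.array([0.6, -0.3, 0.0])
print(step(z))  # [1. 0. 0.]
```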

Perceptron assembly basics

The perceptron is a great building block for more complex systems. In fact, we have been assembling multiple perceptrons since the very beginning. Let’s unpack that idea.

During training, our system reads all the examples together, rather than reading one example at a time. In a way, that operation is like stacking multiple perceptrons, sending one example to each perceptron, and then collecting all the outputs into a matrix:
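Here is a small sketch of that idea, with toy shapes rather than MNIST’s. A single matrix multiplication computes the weighted sum for every example at once, like one stacked perceptron per row:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Four toy examples, each with a bias column plus two input variables.
X = np.array([[1.0,  0.2,  0.7],
              [1.0, -1.0,  0.5],
              [1.0,  0.0,  0.0],
              [1.0,  2.0, -1.0]])
w = np.array([[0.1], [1.0], [-0.5]])

# One matrix multiplication plays the role of four stacked perceptrons:
# one prediction per example.
y_hat = sigmoid(np.matmul(X, w))
print(y_hat.shape)  # (4, 1)
```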

In The Final Challenge, we also assembled perceptrons in a different way. A perceptron is a binary classifier: it classifies things as either 0 or 1. To classify ten digits, we used ten matrix columns, each dedicated to classifying one digit against all the others. Conceptually, that’s like using ten perceptrons in parallel, as shown here:

Each parallelized perceptron classifies one class, from 0 to 9. During classification, we pick the class that outputs the most confident prediction.
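A sketch of the parallel arrangement, with random weights standing in for trained ones (785 is an assumption based on MNIST: 784 pixels plus the bias):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(42)
W = rng.standard_normal((785, 10))  # one weight column per digit, 0 to 9
X = rng.standard_normal((3, 785))   # three stand-in examples

y_hat = sigmoid(np.matmul(X, W))    # (3, 10): ten confidences per example
labels = np.argmax(y_hat, axis=1)   # pick the most confident class
print(labels)
```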

So we can stack perceptrons, and we can parallelize perceptrons. In both cases, we did it with matrix operations, which is easier and faster than running the same classifier over and over: once per example, and then once per class.

One more way to combine perceptrons is to serialize them, using the output of one perceptron as input to the next. The result is called a multilayer perceptron. We have not used multilayer perceptrons yet, but we’ll use them in the next lessons. For now, let’s keep this idea at the back of our minds.
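As a preview, serializing two layers of perceptrons could be sketched like this. The shapes are made up, and the bias terms between layers are omitted for brevity; the point is only that the first layer’s outputs become the second layer’s inputs:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_two_layers(X, w1, w2):
    # The output of the first perceptron layer feeds the second.
    hidden = sigmoid(np.matmul(X, w1))
    return sigmoid(np.matmul(hidden, w2))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))   # five examples, four inputs each
w1 = rng.standard_normal((4, 3))  # first layer of perceptrons
w2 = rng.standard_normal((3, 1))  # second layer: one final perceptron
print(forward_two_layers(X, w1, w2).shape)  # (5, 1)
```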