Deep learning (DL) is a machine learning method that allows computers to mimic the human brain, usually to complete classification tasks on images or non-visual data sets. Deep learning has recently become an industry-defining tool thanks to advances in GPU technology.
Deep learning is now used in self-driving cars, fraud detection, artificial intelligence programs, and beyond. These technologies are in high demand, so deep learning data scientists and ML engineers are being hired every day.
Today, we’ll help you take the first step toward those exciting careers. You’ll learn how deep learning works and why it’s become so popular, and we’ll show you how to implement your first deep learning model.
Let’s get started!
Deep learning (sometimes called deep structured learning) is a type of machine learning algorithm based on artificial neural networks (ANNs).
Deep learning and other ANN methods allow computers to learn by example in a similar way to the human brain. This is accomplished by passing input data through multiple layers of neural network processing, each of which transforms the data and narrows the possible predictions along the way.
Deep learning algorithms have powerful advantages over other models.
Deep learning is best suited to classification tasks that match input data to a learned type. DL methods are therefore often used for image recognition, speech recognition, and natural language processing (NLP).
More recently, it’s been used to allow self-driving cars to detect signs and obstacles.
A deep learning model learns which features all members of a type share by analyzing structured (labeled) training data.
The algorithm then analyzes each data point and recognizes similarities between all data points of the same label. This process is called feature extraction.
The algorithm then selects which of these features form the most accurate criteria for each label. The boundary these criteria draw between the labels is called the decision boundary.
Once the program has refined these criteria using the available training data, it uses them to classify new, unlabeled input data into the same labels.
For example, an engineer may pass in 10,000 photos, with 5,000 labeled `elephant` and another 5,000 labeled `not elephant`. The model will go through all 10,000 pictures and pull out features shared by elephant pictures, like “four-legged” or “trunk”.
It would learn that many creatures have four legs, so having four legs only means a creature may be an elephant. By contrast, among these animals only elephants have a trunk, so the model can predict that if a pictured animal has a trunk, it’s very likely an elephant.
The algorithm could then use “trunk”, “four-legged”, and other features to form a model that can assign `elephant` or `not elephant` labels to a different, unlabeled set of animal pictures.
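To make the “four legs versus trunk” reasoning concrete, here is a minimal sketch that counts, in a small hand-made table of labeled animals, how often each feature appears on elephants versus non-elephants. The feature table and the helper name `elephant_rate` are hypothetical illustrations; a real deep learning model discovers its features from the raw pixels rather than from a table like this.

```python
import numpy as np

# Hypothetical toy data: each row is one labeled photo.
# Columns: [four_legged, has_trunk]; 1 means the feature is present.
features = np.array([
    [1, 1],  # elephant
    [1, 1],  # elephant
    [1, 1],  # elephant
    [1, 0],  # dog
    [1, 0],  # horse
    [1, 0],  # cow
    [0, 0],  # bird
    [0, 0],  # fish
])
labels = np.array([1, 1, 1, 0, 0, 0, 0, 0])  # 1 = elephant, 0 = not elephant
feature_names = ["four_legged", "has_trunk"]

def elephant_rate(feature_column, labels):
    """Fraction of the photos showing this feature that are labeled elephant."""
    has_feature = feature_column == 1
    return labels[has_feature].mean()

for i, name in enumerate(feature_names):
    rate = elephant_rate(features[:, i], labels)
    print(f"P(elephant | {name}) = {rate:.2f}")
# P(elephant | four_legged) = 0.50
# P(elephant | has_trunk) = 1.00
```

In this toy table, having four legs makes “elephant” a coin flip, while a trunk identifies an elephant every time, which is why the trunk feature deserves more influence on the decision.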
Once this model is formed, we can even reuse it as a starting point for another, similar deep learning task. The process of reusing models this way is called transfer learning.
Comprehensive Training Data:
Our DL model can only be accurate if it is trained on a wide variety of data. Incorrect outcomes from a DL model are often caused by the training set rather than by the model itself.
For example, the model would likely classify a woolly mammoth as an `elephant` if our training data didn’t include any pictures of woolly mammoths labeled `not elephant`.
The key to deep learning is the many hidden layers of processing the input data must go through.
Each layer contains multiple neurons or “nodes” with mathematical functions that collect and classify data. The first and final layers are the input and output layers.
Between them, there are hidden layers with nodes that take the results of previous classifications as input. These nodes run the previous findings through their own classification functions and adjust the weighting of the findings accordingly.
Traditional neural nets before deep learning would only pass data through two or three hidden layers. Deep learning networks can use far more, up to 150 hidden layers or beyond, to increase the accuracy of their results.
The input layer holds the raw data (for example, the pixel values of an image) and passes it along to the nodes of the first hidden layer.
The first hidden layer contains nodes that classify on the broadest criteria.
Each subsequent hidden layer’s nodes get more and more specific to narrow the classification possibilities further via result weighting.
The final output layer then scores the remaining possible labels and chooses the most likely one.
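The flow from the input layer through the hidden layers to the output layer can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the layer sizes, the ReLU activation in the hidden layers, the softmax output, and the random (untrained) weights are all choices made for this sketch, not details from the text. A real network would learn its weights from training data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def relu(z):
    # Hidden-layer activation (assumed): keeps positive values, zeroes out negatives
    return np.maximum(0.0, z)

def softmax(z):
    # Output-layer activation (assumed): turns raw scores into probabilities summing to 1
    exp_z = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return exp_z / exp_z.sum()

# Hypothetical layer sizes: 4 input features -> 8 -> 6 hidden nodes -> 3 labels
layer_sizes = [4, 8, 6, 3]
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    activation = x
    # Each hidden layer takes the previous layer's output as its input
    for W, b in zip(weights[:-1], biases[:-1]):
        activation = relu(activation @ W + b)
    # The output layer produces one probability per label
    return softmax(activation @ weights[-1] + biases[-1])

x = np.array([0.5, -1.2, 3.0, 0.7])  # one raw input example (made up)
probabilities = forward(x)
print("Label probabilities:", probabilities)
print("Predicted label index:", int(np.argmax(probabilities)))
```

Making the network deeper only requires adding more entries to `layer_sizes`; the loop in `forward` passes the data through however many hidden layers there are.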
Deep learning is a specialized form of machine learning. The main difference between deep learning and machine learning processes is how features are extracted.
Machine learning: An engineer with knowledge of both the model and the subject being classified manually selects which features the ML algorithm will use to build its decision boundary. The algorithm then searches for these preset features and uses them to classify data.
Deep learning: Deep learning is a subset of ML that determines target features automatically, without the aid of a human engineer. This speeds up results as the algorithm can find and select features faster than a human can.
DL can also increase accuracy because the algorithm can detect features that are not obvious to the human eye, rather than only those a person thought to select.
Deep learning also avoids the shallow learning plateau encountered by other types of ML. Shallow learning algorithms are ML algorithms that do not gain in accuracy beyond a certain amount of training data.
Deep learning is not shallow learning: it continues to improve in accuracy even with extremely large pools of training data.
The downside of deep learning is that it requires a larger pool of labeled training data to get started. It also requires a powerful machine with an efficient GPU to rapidly process each image.
If you do not have either of these things, other ML algorithms will be a better choice.
Now we’ll look at a hands-on example of classification in Python: the perceptron. The perceptron is a binary linear classifier used in supervised learning to find a line that separates two classes.
Each node in a neural net hidden layer is essentially a small perceptron. As we build this single perceptron, imagine how many of these in sequence could classify data with complex features.
This example works on the same principle as the nodes of a deep learning model, but it uses only a single neural network layer.
The boundary line that separates the two classes is:
$w_1x_1 + w_2x_2 + b = 0$
Here:
$x_1$ and $x_2$ are the inputs
$w_1$ and $w_2$ are the weights
$b$ is the bias
This equation will allow our model to find the boundary line between our two input classes, `star` and `not star`.
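Rearranging the boundary equation gives $x_2 = -(w_1x_1 + b) / w_2$ (assuming $w_2 \neq 0$), and the sign of $w_1x_1 + w_2x_2 + b$ tells us which side of the line a point falls on. The sketch below uses the same weights and bias as the perceptron code that follows ($w_1 = 2$, $w_2 = 3$, $b = 0.1$); `boundary_x2` and `side_of_boundary` are hypothetical helper names for illustration.

```python
import numpy as np

w = np.array([2.0, 3.0])  # w1, w2
b = 0.1                   # bias

def boundary_x2(x1, w, b):
    """Solve w1*x1 + w2*x2 + b = 0 for x2 (requires w2 != 0)."""
    return -(w[0] * x1 + b) / w[1]

def side_of_boundary(point, w, b):
    """Return 1 if w1*x1 + w2*x2 + b > 0 (positive side of the line), else 0."""
    return int(np.dot(w, point) + b > 0)

print("Boundary passes through x1 = 0, x2 =", boundary_x2(0.0, w, b))
print("Side of (2, 3):", side_of_boundary(np.array([2, 3]), w, b))      # 1
print("Side of (-1, -3):", side_of_boundary(np.array([-1, -3]), w, b))  # 0
```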
Now we’ll implement forward propagation to determine whether a point belongs to `star` or not.
This is a discrete prediction because our implementation simply returns “yes” or “no” and not a percentage of certainty about that prediction.
```python
import numpy as np

def step(weighted_sum):
    # Step activation applied to the perceptron output:
    # returns 1 if the weighted sum is greater than 0 and 0 otherwise
    return (weighted_sum > 0) * 1

def forward_propagation(input_data, weights, bias):
    # Computes the forward propagation operation of a perceptron
    # and returns the output after applying the step activation function.
    # Takes the dot product of the input and the weights and adds the bias.
    return step(np.dot(input_data, weights) + bias)

# Initialize parameters
X = np.array([2, 3])            # a single data point with two features
Y = np.array([0])               # label
weights = np.array([2.0, 3.0])  # weights of the perceptron
bias = 0.1                      # bias value

Y_predicted = forward_propagation(X, weights.T, bias)  # predicted label
print("Predicted label:", Y_predicted)
```
Code Explanation:
Call the `forward_propagation` function: After the parameters are initialized, the forward propagation function is called.

`forward_propagation` function:
- Takes in the input variable `X` and `weights`, then calculates their dot product using `np.dot` and adds the bias to compute the weighted sum.
- Applies the step function to the computed weighted sum.

`step` function: Takes the weighted sum and returns 1 if the value is greater than 0 and 0 otherwise.
| Variable | Definition |
|---|---|
| `X` | An input NumPy array with feature values 2 and 3 |
| `Y` | An output label with value 0 |
| `weights` | The weights of the perceptron, with initial values of 2 and 3, respectively |
| `bias` | The bias value, initialized to 0.1 |
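Because `np.dot` multiplies a 2-D array by a 1-D array row by row, the same `forward_propagation` function can classify several points at once. A minimal sketch: it repeats the `step` and `forward_propagation` definitions so it runs on its own, and the extra points are made up for illustration.

```python
import numpy as np

def step(weighted_sum):
    # 1 where the weighted sum is greater than 0, otherwise 0
    return (weighted_sum > 0) * 1

def forward_propagation(input_data, weights, bias):
    # Weighted sum plus bias, followed by the step activation
    return step(np.dot(input_data, weights) + bias)

weights = np.array([2.0, 3.0])
bias = 0.1

# Hypothetical batch: one point per row, two features per point
points = np.array([[2, 3], [-1, -3], [0, 0]])
print("Predicted labels:", forward_propagation(points, weights, bias))
# prints: Predicted labels: [1 0 1]
```

The point `(0, 0)` lands just on the positive side because the bias of 0.1 shifts the boundary slightly away from the origin.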
Now we’ll apply the sigmoid activation function to make our example’s output more informative. The sigmoid widens the range of our program’s predictions from just `0` or `1` to any value between `0` and `1`.
This allows our program to record various levels of certainty and approve those above a certain threshold.
```python
import numpy as np

def sigmoid(x):
    # The sigmoid activation function
    return 1 / (1 + np.exp(-x))

def forward_propagation(input_data, weights, bias):
    # Computes the forward propagation operation of a perceptron and
    # returns the output after applying the sigmoid activation function.
    # Takes the dot product of input and weights and adds the bias.
    return sigmoid(np.dot(input_data, weights) + bias)  # the perceptron equation

# Initializing parameters
X = np.array([2, 3])            # a single data point with two features
Y = np.array([0])               # label
weights = np.array([2.0, 3.0])  # weights of the perceptron
bias = 0.1                      # bias value

output = forward_propagation(X, weights.T, bias)  # sigmoid output
print("Forward propagation output:", output)

Y_predicted = (output > 0.5) * 1  # apply the 0.5 threshold to get a label
print("Label:", Y_predicted)
```
Code Explanation:
`sigmoid` function: For a given input value `x`, the sigmoid is calculated as $\sigma(x) = \frac{1}{1 + e^{-x}}$ (in the code, `1 / (1 + np.exp(-x))`).

The label after the forward propagation operation is predicted as `1` if the sigmoid output is greater than `0.5` and `0` otherwise; in this example, the threshold is set to `0.5`. Thresholding a sigmoid output is how logistic regression classifies data, so this single-neuron model is effectively logistic regression.
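One consequence worth noting: since $\sigma(0) = 0.5$ and the sigmoid is strictly increasing, thresholding the sigmoid output at 0.5 picks exactly the same labels as the step function. The sigmoid does not move the decision boundary; it adds a confidence score on top of it. The quick check below compares both rules on a few arbitrary weighted sums, including 13.1 (the weighted sum $2 \cdot 2 + 3 \cdot 3 + 0.1$ from the example above).

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def step(weighted_sum):
    return (weighted_sum > 0) * 1

# Arbitrary weighted sums on both sides of zero
weighted_sums = np.array([-10.9, -0.5, 0.0, 0.1, 2.0, 13.1])

probabilities = sigmoid(weighted_sums)
threshold_labels = (probabilities > 0.5) * 1  # sigmoid + 0.5 threshold
step_labels = step(weighted_sums)             # step activation

for z, p, label in zip(weighted_sums, probabilities, threshold_labels):
    print(f"weighted sum {z:6.1f} -> sigmoid {p:.4f} -> label {label}")
print("Same labels as step:", np.array_equal(threshold_labels, step_labels))
```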
Finally, we’ll implement an error function that compares the actual value and the predicted value of each point in our example.
Error functions quantify how far a prediction is from the actual label. For example, instead of only checking whether the thresholded “yes” or “no” was right, we’ll be able to see how confident the program was in a right or wrong prediction.
Cross-entropy is a common error function for classification models.
$E = -\left(y \log(y') + (1 - y)\log(1 - y')\right)$, where $y$ is the actual label and $y'$ is the predicted value.
Minimizing cross-entropy is equivalent to maximizing the likelihood that each point is assigned its actual class.
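To see how the formula rewards confident correct predictions and punishes confident wrong ones, here is a small worked example for a point whose actual label is $y = 1$ (the predicted values are made up for illustration). With $y = 1$ the second term vanishes, so $E = -\log(y')$, using the natural logarithm as NumPy’s `np.log` does.

```python
import numpy as np

def binary_cross_entropy(y, y_pred):
    # E = -(y * log(y') + (1 - y) * log(1 - y')), natural log
    return -(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))

actual_label = 1.0
for y_pred in [0.9, 0.5, 0.1]:
    error = binary_cross_entropy(actual_label, y_pred)
    print(f"y' = {y_pred}: E = {error:.4f}")
# y' = 0.9: E = 0.1054
# y' = 0.5: E = 0.6931
# y' = 0.1: E = 2.3026
```

A confident correct prediction costs little, while a confident wrong one costs over twenty times as much. Predictions of exactly 0 or 1 make the logarithm undefined, which is why implementations often clip $y'$ slightly away from those values.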
```python
import numpy as np

def sigmoid(x):
    """The sigmoid activation function."""
    return 1 / (1 + np.exp(-x))

def forward_propagation(input_data, weights, bias):
    """Computes the forward propagation operation of a perceptron and
    returns the output after applying the sigmoid activation function."""
    # Take the dot product of input and weights and add the bias
    return sigmoid(np.dot(input_data, weights) + bias)

def calculate_error(y, y_predicted):
    """Computes the binary cross-entropy error."""
    return - y * np.log(y_predicted) - (1 - y) * np.log(1 - y_predicted)

def ce_two_different_weights(X, Y, weights_0, weights_1, bias):
    """Computes the sum of errors using two different weights and the same bias."""
    sum_error1 = 0.0
    sum_error2 = 0.0
    for j in range(len(X)):
        Y_predicted_1 = forward_propagation(X[j], weights_0.T, bias)    # predicted label
        sum_error1 = sum_error1 + calculate_error(Y[j], Y_predicted_1)  # sum of error with weights_0
        Y_predicted_2 = forward_propagation(X[j], weights_1.T, bias)    # predicted label
        sum_error2 = sum_error2 + calculate_error(Y[j], Y_predicted_2)  # sum of error with weights_1
    return sum_error1, sum_error2

# Initialize parameters
X = np.array([[2, 3], [1, 4], [-1, -3], [-4, -5]])  # four data points, two features each
Y = np.array([1.0, 1.0, 0.0, 0.0])                  # actual labels
weights_0 = np.array([0.0, 0.0])   # first set of perceptron weights
weights_1 = np.array([1.0, -1.0])  # second set of perceptron weights
bias = 0.0                         # bias value

sum_error1, sum_error2 = ce_two_different_weights(X, Y, weights_0, weights_1, bias)
print("sum_error1:", sum_error1, "sum_error2:", sum_error2)
```
Code Explanation:
`ce_two_different_weights` function: The function takes the input data features `X`, the labels `Y`, `weights_0`, `weights_1`, and `bias`. It loops over the training data and calculates the predicted value and the cross-entropy error for each point, once with each set of weights. The errors are accumulated in `sum_error1` (for `weights_0`) and `sum_error2` (for `weights_1`), and the function returns both sums so the two sets of weights can be compared.
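The same comparison can be written without an explicit loop by letting NumPy compute all four predictions at once. This vectorized version is a sketch, not the article’s original code; `total_cross_entropy` is a hypothetical helper name. With these data, the all-zero `weights_0` predicts 0.5 for every point, giving a total error of $4 \ln 2 \approx 2.77$, while `weights_1` puts every point on the wrong side of the boundary and scores about 7.80, so `weights_0` is the better of the two.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def total_cross_entropy(X, Y, weights, bias):
    # Predict all points at once: X is (n_points, n_features), weights is (n_features,)
    Y_predicted = sigmoid(X @ weights + bias)
    # Sum the binary cross-entropy error over every point
    return np.sum(-Y * np.log(Y_predicted) - (1 - Y) * np.log(1 - Y_predicted))

X = np.array([[2, 3], [1, 4], [-1, -3], [-4, -5]])
Y = np.array([1.0, 1.0, 0.0, 0.0])
weights_0 = np.array([0.0, 0.0])
weights_1 = np.array([1.0, -1.0])
bias = 0.0

error_0 = total_cross_entropy(X, Y, weights_0, bias)
error_1 = total_cross_entropy(X, Y, weights_1, bias)
print(f"weights_0: {error_0:.4f}  weights_1: {error_1:.4f}")
print("Lower error:", "weights_0" if error_0 < error_1 else "weights_1")
```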
Congratulations, you’ve now made a simple perceptron classifier! You can now move on to other deep learning projects.
Classification is one of the most common uses of deep learning, so you’ll want to get as much practice with it as possible!
To help you along the way, Educative has created the course A Beginner’s Guide to Deep Learning. The course walks you through core concepts of deep learning at an approachable level. Then, you get the chance to practice each concept with a hands-on example.
By the end of the course, you’ll have the hands-on experience you need to start off your deep learning journey right.
Happy learning!