# Supervised vs unsupervised learning

Apr 13, 2022 - 14 min read
Crystal Song

With the potential to transform entire industries, artificial intelligence has long been recognized as being at the forefront of technological progress.

Today, the field of artificial intelligence is rapidly adapting and evolving to match the expanding scale and increasing complexity of data being generated across all industries and fields of research. As a result, there is a serious demand for engineers, developers, and data scientists with the skills and ambition to drive the field of artificial intelligence forward.

Adding machine learning to your skill set is one way to get started in this field today.

Machine learning is a subset of artificial intelligence that has garnered attention as a powerful tool for tackling high-profile problems that have no clear solution in sight.

Part of the reason why machine learning is so valuable is its ability to handle big data. Machine learning can help identify hidden patterns in vast quantities of data that would overwhelm the average person. Machine learning models allow us to reach into the chaos and extract valuable information that can help us with decision-making and forecasting trends in our data.

If you’re interested in learning more about how machine learning is currently being used to solve problems, and are considering a career working in artificial intelligence, then you’re in the right place! Today, we’ll be talking about some of the key differences between two approaches in data science: supervised and unsupervised machine learning. Afterward, we’ll go over some additional resources to help get you started on your machine learning journey.

## What is machine learning?

Machine learning is the subset of artificial intelligence (AI) that studies the algorithms and statistical models used by computer systems to perform tasks without being explicitly programmed to do so.

The major advantage of machine learning comes from its ability to enable computers to optimize their performance without needing explicit instructions. Instead of writing rules by hand, programmers can rely on a model to learn from the data it has seen and generalize to unseen tasks[1], adjusting its behavior without direct intervention.

As mentioned before, the huge volume of datasets being generated today has led to a proportionate demand in many industries for machine learning to extract relevant data[2] that is capable of driving intelligent business decisions. At a corporate scale, machine learning is well-suited for making massive improvements to the efficiency of supply chains, energy consumption, and other areas with financial impact.

## Supervised vs unsupervised learning

Supervised learning is similar to how a student would learn from their teacher. The teacher acts as a supervisor, or an authoritative source of information that the student can rely on to guide their learning. You can also think of the student’s mind as a computational engine.

Say these students are going on a field trip to the local zoo to learn about animals. The teacher shows the students each animal, and then provides the animal’s name, or label.

If a student makes a mistake when trying to identify certain animals, the teacher corrects the mistake by providing the correct name. As the teacher continues to train the students, they begin to develop a pattern, or model, in their minds.

Computational engines learn to recognize patterns and build models based on the training data provided by a supervisor. When a computational engine is presented with an unknown or unlabeled element, it can predict a label for it based on what it learned from the training data.

Essentially, the supervisor shows the computational engine an animal $\mathbf{x}_i$ and then tells it which label $y_i$ to use for that animal. Showing the computational engine more examples $(\mathbf{x}_i, y_i)$ trains it to develop a model.

The supervisor then shows the computational engine an unknown animal $\mathbf{x}_t$ and asks for its label $y_t$.

The computational engine predicts a label based on what it learned from the training data.
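To make the zoo analogy concrete, here is a minimal sketch in Python. It is not the article's method, just one simple way to "learn" from labeled examples: a nearest-neighbor rule that copies the label of the most similar training animal. The features (weight and height), the numbers, and the function name `predict_label` are all made up for illustration.

```python
import math

# Hypothetical training data: each example is (x_i, y_i), where x_i holds
# made-up features (weight in kg, height in m) and y_i is the animal's label.
training_data = [
    ((4000.0, 3.0), "elephant"),
    ((5000.0, 3.3), "elephant"),
    ((190.0, 1.1), "lion"),
    ((160.0, 1.0), "lion"),
    ((1.2, 0.3), "parrot"),
    ((0.9, 0.25), "parrot"),
]

def predict_label(x_t, examples):
    """Return the label y_i of the training example x_i closest to x_t."""
    _, nearest_label = min(examples, key=lambda pair: math.dist(pair[0], x_t))
    return nearest_label

# An unknown animal x_t: the "model" answers with a predicted label y_t.
unknown_animal = (175.0, 1.05)
predicted = predict_label(unknown_animal, training_data)
print(predicted)
```

Because the features are not rescaled, weight dominates the distance here. A real pipeline would usually normalize the features first.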

Unsupervised learning has no supervisor and no correct answers[3]. In unsupervised learning, the data arrives unlabeled, and the algorithm groups it according to similarities and differences. In other words, unsupervised learning is like letting students explore the zoo on their own and come up with their own ideas for why the zoo is organized the way it is, based solely on what they observe.

To summarize, the main difference is that input data will be accompanied by labels in supervised learning, but won’t have any labels in unsupervised learning.

### Other notable differences

| Feature | Supervised learning | Unsupervised learning |
| --- | --- | --- |
| Accuracy | Results are generally more accurate | Results are generally less accurate and harder to verify |
| Complexity | Less complex and more easily understood | More complex. Requires more computation power to process due to ambiguity in data |
| Input/Output | Input and output variables are given | Only input variables are given |
| Time | Learning typically takes place offline | Learning can take place online and in real time |

## Supervised learning

In this section we will go over a brief comparison between regression vs. classification, and then move on to how those concepts relate to four popular supervised machine learning algorithms:

• Linear regression
• Support Vector Machine (SVM)
• Logistic regression
• Random forest

### Regression vs classification

Supervised learning models are especially well-suited for handling regression problems and classification problems.

#### Classification

Classification refers to the task of taking an input value and using it to predict a discrete output value, typically a class or category. Deciding whether an email is spam or not spam is one example.

#### Regression

Regression refers to the task of predicting continuous output values such as temperature, height, or stock market trends.

#### Training data

Training datasets can come in a variety of formats ranging from text to images, video, and audio. These datasets contain labeled data that helps train your machine learning algorithm to identify specific features and patterns in the data. Eventually, the training will enable your machine learning model to identify the features and patterns in unlabeled data.

Supervised learning focuses on the following sets of labeled data.

• Classification data: Training data where the labels $y_i$ represent different classes rather than meaningful numeric quantities.
• Regression data: Training data where the labels $y_i$ are meaningful numeric quantities, typically real numbers.

Regression and classification are both supervised learning tasks, because in both cases the training data contains labels $y_i$.

Note: Other types of datasets include validation data, which is used to tune your training approach, such as the choice of algorithm and model parameters, and test data, which is held back to benchmark how well the final model predicts answers for examples it has not seen.
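As a rough sketch of how labeled data might be divided into training, validation, and test sets, the snippet below shuffles a list of examples and slices it. The 60/20/20 proportions, the seed, and the function name `split_dataset` are assumptions chosen for illustration, not a recommendation from the article.

```python
import random

def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle labeled examples and split them into train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held back for testing
    return train, validation, test

# Hypothetical labeled examples (x_i, y_i).
examples = [(i, i % 2) for i in range(10)]
train, validation, test = split_dataset(examples)
print(len(train), len(validation), len(test))
```

Shuffling before slicing matters when the original data is ordered, for example sorted by label.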

### Linear regression

Linear regression was first developed in the field of statistics and is used in machine learning to create predictive models that assume a linear relationship between input variables (x) and an output variable (y).

• Simple linear regression: One input variable (x)
• Multiple linear regression: Multiple input variables

One main advantage of using a linear regression model lies in its simplicity. When representing a model using a linear equation, making a prediction can be as simple as solving an equation for the inputs you specify.
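Here is a small sketch of simple linear regression using the standard least-squares formulas for the slope and intercept. The data points are invented so that they lie exactly on the line y = 2x + 1, and the function name is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data lying on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(xs, ys)

# Making a prediction is just evaluating the linear equation for a new input.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```

Once the slope and intercept are known, predicting for a new input is a single multiplication and addition, which is the simplicity the paragraph above refers to.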

### Support Vector Machine (SVM)

SVM is a popular binary-classification algorithm that provides a linear model for both classification and regression problems. For a while, SVM was the default choice because it provided simple models that avoided overfitting. However, one drawback of SVM is that it cannot be extended to multi-class problems as easily as other algorithms.

Note: Non-linear SVMs also exist! Some datasets that can’t be separated well by a linear function can still be separated by a non-linear one, such as a quadratic function.

Support vectors are the data points that lie closest to the decision surface (or hyperplane)[1]. These data points are some of the most difficult ones to classify, and are critical to finding the optimal hyperplane. Removing any of these data points would ultimately change the position of the hyperplane.

The goal of an SVM is to maximize the margin around the hyperplane that separates these data points.

Note: In 2-dimensional space, data points can be separated by a line. In higher-dimensional spaces the separating boundary becomes a hyperplane, and SVM remains especially effective there.
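The ideas of margin and support vectors can be illustrated without training anything. The sketch below takes a hyperplane that was picked by hand (an assumption for illustration; a real SVM would learn `w` and `b` from the data), measures each point's distance to it, and reports the closest points as the support vectors.

```python
import math

def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm

# Hypothetical 2-D data points with labels -1 or +1.
labeled_points = [
    ((1.0, 1.0), -1),
    ((0.0, 1.0), -1),
    ((3.0, 3.0), 1),
    ((4.0, 4.0), 1),
]

# Hyperplane chosen by hand, not learned: x1 + x2 - 4 = 0.
w, b = (1.0, 1.0), -4.0

# Multiplying by the label makes the distance positive for points
# that lie on the correct side of the hyperplane.
distances = [label * signed_distance(p, w, b) for p, label in labeled_points]

# The margin is the distance to the closest point(s): the support vectors.
margin = min(distances)
support_vectors = [
    p for (p, _), d in zip(labeled_points, distances) if math.isclose(d, margin)
]
print(margin, support_vectors)
```

Moving the points that are farther away would leave the margin unchanged, while moving a support vector would change it.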

### Logistic regression

Despite its name, the logistic regression model is actually a linear model for classification. It is referred to as a logistic regression because it performs regression on logits[2], which allows for the classification of data based on model probability predictions.

Unlike SVM, which maximizes the margin between the boundary and the closest data points, logistic regression estimates the classification boundary by maximizing the likelihood of the training labels, so every data point contributes to the fit. Logistic regression can also be extended to multiple classes with relative ease.
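As a minimal sketch, a logistic regression model with one input can be fit by gradient descent on the average log loss, which is a common way to train it. The data (hours studied versus passing an exam), the learning rate, and the number of epochs are assumptions picked for illustration.

```python
import math

def sigmoid(z):
    """Map a logit z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight w and bias b by gradient descent on the average log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error (probability minus label) for every example.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Made-up data: hours studied (x) and whether the student passed (y).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic_regression(hours, passed)

def predict_probability(x):
    """Predicted probability of class 1; the logit w * x + b is linear in x."""
    return sigmoid(w * x + b)

print(predict_probability(1.0), predict_probability(4.0))
```

A class is then chosen by thresholding the probability, typically at 0.5.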

### Random forest

A random forest is referred to as such because it is essentially a group of decision trees!

Each decision tree in the forest learns to predict the values of a target variable by learning rules for making decisions. These decisions can be represented as a tree, with each branch leading to a decision node. Each node tests an attribute and sends the data point down a branch based on the value of that feature. In a random forest, each tree is trained on a random sample of the training data, and the trees' predictions are combined, by majority vote for classification or by averaging for regression.

Random forests are arguably one of the most popular algorithms used in supervised machine learning for regression and classification problems. Individual decision trees are approachable and easy to interpret, and combining many of them tends to produce more robust predictions for a wide range of problems.
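The sketch below shows the "group of trees" idea in a deliberately simplified form. Each tree is a one-level decision stump (a single threshold on a single feature) rather than a full decision tree, each stump is trained on a random sample of the rows drawn with replacement, and the forest predicts by majority vote. The data, the function names, and the use of stumps are assumptions for illustration.

```python
import random
from collections import Counter

def train_stump(rows, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({x[feature] for x, _ in rows}):
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]

def train_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample rows with replacement
        feature = rng.randrange(n_features)        # random feature for this stump
        forest.append(train_stump(sample, feature))
    return forest

def predict(forest, x):
    """Majority vote over the predictions of all stumps."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical labeled rows: (features, label).
rows = [
    ((1.0, 1.0), "cat"), ((1.5, 0.8), "cat"), ((2.0, 1.2), "cat"),
    ((6.0, 5.0), "dog"), ((6.5, 5.5), "dog"), ((7.0, 4.8), "dog"),
]
forest = train_forest(rows)
print(predict(forest, (1.0, 0.8)), predict(forest, (7.0, 5.5)))
```

A single stump trained on an unlucky sample can be wrong, but the vote across many stumps smooths out those individual mistakes.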

### Neural networks

With over 80 billion neurons, the human brain is easily one of the most complex systems on Earth, and even after decades of study, the depth and breadth of its cognitive processes are nowhere close to being fully understood.

Biological neural networks like the human brain inspired the emergence of artificial neural networks (ANN). Deep learning (DL) is a subset of machine learning based on ANN technology, and attempts to expand the functionality of computers by enabling them to learn in a way that is similar to humans.

Neural networks are one of the most fundamental and ambitious concepts related to machine learning. Although traditional computers are great at performing many rapid calculations, they tend to struggle with solving problems that biological brains can handle with ease, like image recognition. Artificial neural networks aim to mimic cognitive processes in ways that can be used to perform interesting and more complex tasks.

One good example of how an artificial neural network is used in machine learning can be found in DeepMind’s AlphaGo, which combined supervised learning from human expert games with reinforcement learning from games of Go played against itself.

### Applications of supervised learning

Machine learning has been successfully applied in a wide variety of fields and industries, including pattern recognition, computer vision, spacecraft engineering, finance, entertainment, computational biology, and medicine[3]. Below are some interesting examples of use cases for supervised learning algorithms.

#### Image classification

Computer vision is a field of artificial intelligence concerned with enabling machines to gain a high-level understanding of images and videos. At the heart of computer vision is the task of image recognition. In image classification, a model such as a neural network is trained on labeled images so that it can assign a label to a raw image it has not seen before.

Image recognition models are essential for many machine-based visual tasks like facial recognition, guiding autonomous robots, or helping self-driving cars avoid accidents.

#### Object detection

Although image classification is essential for categorizing images with labels, object detection is equally important for telling us where objects exist within an image. This is done by using bounding boxes, which use (x, y) coordinates to tell us the location of each object in an image.
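To show what a bounding box might look like as data, here is a small sketch. It assumes one common convention, in which a box is stored as the (x, y) pixel coordinates of two opposite corners. The class names and the example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box stored as two opposite corners in pixel coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    def center(self) -> tuple:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)

    def contains(self, x: float, y: float) -> bool:
        """True if the pixel (x, y) falls inside the box."""
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

@dataclass(frozen=True)
class Detection:
    """One detected object: what it is (label) and where it is (box)."""
    label: str
    box: BoundingBox

# A made-up detection, as an object detector might report it.
detection = Detection(label="lion", box=BoundingBox(10, 20, 110, 70))
print(detection.label, detection.box.center(), detection.box.area())
```

An object detector would output a list of such detections per image, one for each object it finds.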

#### Anomaly detection

Anomaly detection in machine learning refers to the task of identifying outliers, abnormal data points, and other unexpected observations in the dataset. Supervised anomaly detection requires training a classifier using two labeled datasets with one labeled normal and the other labeled as abnormal.

## Unsupervised learning

Unsupervised learning models use datasets without labeled outcomes to discover structure in the data, such as groups of similar data points, which can then be used to interpret unseen data.

There are two main types of unsupervised learning algorithms:

• Clustering algorithms: Data is processed into clusters of data points that bear similar features to other data points in the same cluster.
• Association algorithms: Interesting relationships between variables in large databases are found and used to identify underlying association rules for how and why certain data points are connected.
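As a rough illustration of association rules, the sketch below measures how often items appear together in a set of shopping baskets. It uses two standard measures that the article does not name: support (how often an itemset appears) and confidence (how often the rule holds when its left-hand side appears). The baskets are made up.

```python
# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

# Rule under test: customers who buy bread also buy butter.
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```

Here bread and butter appear together in three of the five baskets, and three of the four baskets with bread also contain butter.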

### K-means clustering

K-means clustering is an iterative process that partitions the dataset into a fixed number of clusters (K). Initially, the cluster centers are picked randomly, and they are then recomputed repeatedly until the inertia, or within-cluster sum of squares, stops decreasing.

The center of the kth cluster is represented by $\mu_k$ and is also referred to as the cluster centroid, average point, or cluster center. A cluster centroid is simply the mean of all points within that cluster, and recomputing the centroids is what reduces the inertia.

Each data point is assigned to the nearest centroid according to a measure of similarity or distance. Then each centroid is recomputed as the new average point of its cluster. Data points are again assigned to the closest centroid, and the averages are recomputed until they no longer change.
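The assign-then-recompute loop can be sketched in a few lines of Python. To keep the result reproducible, this example uses the first K points as the starting centroids instead of random ones. That choice, the data, and the function name `k_means` are assumptions for illustration.

```python
import math

def k_means(points, centroids, max_iterations=100):
    """Alternate between assigning points and recomputing centroids
    until the centroids stop changing."""
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster's points.
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[k]
            for k, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Made-up 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Assumption: start from the first K = 2 points rather than random picks.
centroids, clusters = k_means(points, centroids=points[:2])
print(centroids)
```

With a different starting point, K-means can settle on a different grouping, which is why it is often run several times with different random initializations.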

### Principal Component Analysis (PCA)

Principal component analysis is a very popular method for performing exploratory data analysis, data compression, image processing, and more. However, it’s primarily used for dimensionality reduction. Dimensionality refers to the number of variables and attributes your data possesses.

Having a high number of input variables can severely limit the function and performance of the algorithm used. This problem is known as the curse of dimensionality[4].

Another good reason for reducing input variables and dimensionality is to obtain a statistically sound and reliable result. When dimensionality increases, the amount of data needed to support your result grows exponentially.

Dimension reduction methods like PCA work for data points observed in high-dimensional spaces because they reduce the number of variables in a dataset while preserving the information needed to analyze and explore your data.

Given a dataset, PCA starts by centering the data: the mean of each dimension is subtracted from every element of that dimension. PCA then finds the directions along which the centered data varies the most, called principal components, and projects the data onto the first few of them.
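For two-dimensional data, PCA can be worked through by hand, which makes for a compact sketch. The code below centers the data, builds the 2x2 covariance matrix, and uses the closed-form solution for its largest eigenvalue to find the first principal component. The data points are made up to lie on the line y = 2x, and the function name is hypothetical. Real datasets with more dimensions would normally rely on a linear algebra library instead.

```python
import math

def pca_first_component(points):
    """Center 2-D points and return (centered, component, projected)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Centering: subtract each dimension's mean from its elements.
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    largest = trace / 2 + math.sqrt(max(trace * trace / 4 - det, 0.0))

    # Matching eigenvector: the direction of greatest variance.
    if sxy != 0:
        vx, vy = sxy, largest - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    component = (vx / norm, vy / norm)

    # Projecting onto the component reduces each point from 2 numbers to 1.
    projected = [x * component[0] + y * component[1] for x, y in centered]
    return centered, component, projected

# Made-up points lying on the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
centered, component, projected = pca_first_component(points)
print(component, projected)
```

Because these points all lie on one line, a single principal component captures all of their variation, so nothing is lost by dropping from two dimensions to one.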

### Applications of unsupervised learning

#### Image segmentation

Image segmentation is an extension of image classification that involves partitioning an image into regions to reduce its visual complexity. Simplifying an image can make processing and image analysis quicker and more efficient.

Unsupervised machine learning algorithms like K-means clustering can be used to segment an image based on similarities of pixel attributes like color.

#### Dimensionality reduction

To recap, high-dimensional spaces can be difficult to work with due to the excessive number of variables involved. Excess features and variables can lead to overfitting, a phenomenon in which a statistical model fits its training data too closely and, as a result, predicts poorly on new data[5]. Dimensionality reduction is beneficial for improving the performance of algorithms and for preserving statistical significance in results, because it gets rid of redundant data without eliminating relevant information that predictive models need.

Principal Component Analysis (PCA) reduces dimensionality by combining the original variables into a smaller, more manageable set of new variables that retains most of the variation in the data.

## Bonus topic: self-supervised learning

Self-supervised learning is a relatively new branch of machine learning in which there is no external supervisor. Basically, a self-supervised machine learning model generates its own labels from the unlabeled data it is trained on, for example by hiding part of an input and learning to predict it from the rest. This is especially useful in natural language processing (NLP), which is a branch of machine learning concerned with enabling machines to process and understand human text and speech.
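One way to picture labels coming from the data itself is next-word prediction, a common self-supervised setup for text. The sketch below turns a plain sentence into (context, target) training pairs with no human labeling. The sentence, the context size, and the function name `next_word_pairs` are assumptions for illustration.

```python
def next_word_pairs(text, context_size=2):
    """Build (context, target) pairs where the target is simply the next word."""
    words = text.lower().split()
    return [
        (tuple(words[i - context_size:i]), words[i])
        for i in range(context_size, len(words))
    ]

# Unlabeled text: the "labels" are taken from the text itself.
pairs = next_word_pairs("the zoo has lions and the zoo has parrots")
for context, target in pairs:
    print(context, "->", target)
```

Each pair can then be treated like an ordinary supervised example, with the context as the input and the next word as the label.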

Today, most natural language processing models utilize some form of self-supervised learning.

## Wrapping up and next steps

Machine learning and artificial intelligence are fantastic fields to explore for anyone who enjoys tackling highly complex challenges. If you liked learning about some of the differences between supervised and unsupervised machine learning, and are curious to learn more, you’re in luck.

There is a wealth of resources that are available to satisfy your curiosity and strengthen your knowledge in one of the most exciting fields of computer science.

If you’re eager to get more hands-on experience with machine learning, then Educative has a massive library of fun, interactive courses like Machine Learning for Software Engineers to check out!

Happy learning!

Copyright ©2022 Educative, Inc. All rights reserved.