
Introduction to SVM

Explore the core concepts of Support Vector Machines (SVM), a supervised learning algorithm used for classification and regression. Understand how SVM finds the best hyperplane with maximum margin to separate classes, leverages support vectors, and applies kernels. This lesson also covers SVM advantages, limitations, and practical implementation with Python examples.

Support vector machine (SVM) is a popular and powerful supervised learning algorithm for classification and regression problems. It works by finding the best possible boundary between different classes of data points. In this lesson, we’ll cover the basic concepts and principles behind SVMs and see how they can be applied in practice.

What is SVM?

Suppose a person works for a bank, and their job is to decide whether to approve or reject loan applications based on the applicant’s financial history. They have a loan dataset with various features such as credit score, income, and debt-to-income ratio, along with past approval and rejection records. The task is to use SVM to build a predictive model for future loan applications.

First, they represent each loan application as a point in the feature space defined by its features and label it as either “approved” or “rejected,” which creates two classes in the dataset. Next, they look for a decision boundary that separates the data linearly.
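As a rough sketch of this workflow, the snippet below fits a linear SVM to a tiny, made-up loan dataset using scikit-learn's SVC. The feature values, labels, and the new applicant are invented for illustration and are not part of an actual loan dataset.

```python
# A minimal sketch of the loan-approval setup described above.
# The numbers below are invented purely for illustration.
import numpy as np
from sklearn.svm import SVC

# Each row: [credit_score, annual_income_in_thousands, debt_to_income_ratio]
X = np.array([
    [720, 85, 0.20],
    [680, 60, 0.35],
    [750, 95, 0.15],
    [590, 40, 0.55],
    [610, 45, 0.50],
    [560, 30, 0.60],
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = approved, 0 = rejected

# A linear kernel searches for a separating hyperplane in this 3-D feature space.
model = SVC(kernel="linear")
model.fit(X, y)

# Classify a new (hypothetical) applicant.
new_applicant = np.array([[700, 70, 0.30]])
print(model.predict(new_applicant))
```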

SVM finds the best hyperplane that separates the two classes. A hyperplane is simply a decision boundary: a line in 2D, a plane in 3D, or an $(N-1)$-dimensional flat subspace in an $N$-dimensional feature space.

The best hyperplane is the one that maximizes the margin, which is the distance between the hyperplane and the closest data points from each class. These closest points, which lie on the margin boundaries, are called the support vectors. This means the goal is to position the hyperplane so it is as far away as possible from the nearest approved and rejected loan applications.
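Under the same illustrative setup, a fitted linear SVM in scikit-learn exposes the hyperplane parameters and the support vectors directly; the data are again invented, and only the attribute names (coef_, intercept_, support_vectors_) come from scikit-learn's API. The margin width follows from 2 / ||w||.

```python
# Sketch: inspect the hyperplane and support vectors of a fitted linear SVM.
# Continues the illustrative loan example; the data values are invented.
import numpy as np
from sklearn.svm import SVC

X = np.array([[720, 85, 0.20], [680, 60, 0.35], [750, 95, 0.15],
              [590, 40, 0.55], [610, 45, 0.50], [560, 30, 0.60]])
y = np.array([1, 1, 1, 0, 0, 0])

model = SVC(kernel="linear").fit(X, y)

w = model.coef_[0]        # normal vector of the separating hyperplane
b = model.intercept_[0]   # offset of the hyperplane from the origin

# The support vectors are the training points closest to the hyperplane.
print("Support vectors:\n", model.support_vectors_)

# The distance between the two margin boundaries is 2 / ||w||.
print("Margin width:", 2.0 / np.linalg.norm(w))
```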

Maximizing the margin

Many lines (or planes) can separate the data, but only one of them is the maximum margin classifier. Maximizing this margin achieves two crucial goals:

  1. Increased generalization: A larger margin provides a safety buffer. If the hyperplane is too close to a data point, a small change in a new applicant’s features could cause the model to misclassify them. A large margin ensures the decision boundary is robust and makes the most confident prediction possible for unseen data.

  2. Focus on support vectors: It is not necessary to consider every loan application when determining the hyperplane. Only the support vectors (the loan applications that lie closest to the hyperplane) are used to determine its precise position and orientation. All other points can be removed without affecting the final decision boundary, as the sketch after this list illustrates. This makes SVMs memory-efficient and builds a model driven by the most critical and hardest-to-classify data points.
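The small experiment below illustrates the second point: it fits a (nearly) hard-margin linear SVM on a synthetic, well-separated dataset, refits it using only the support vectors, and prints both hyperplanes. The dataset, the large value of C, and the use of scikit-learn are illustrative assumptions; the two hyperplanes should agree up to small numerical differences.

```python
# Sketch: show that only the support vectors determine the hyperplane.
# Synthetic, well-separated data; a large C approximates a hard margin.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=[[-2, -2], [2, 2]],
                  cluster_std=0.8, random_state=0)

full = SVC(kernel="linear", C=1e6).fit(X, y)

# Refit using only the support vectors found by the first model.
sv_only = SVC(kernel="linear", C=1e6).fit(full.support_vectors_,
                                          y[full.support_])

print("Hyperplane from all points:      ", full.coef_[0], full.intercept_[0])
print("Hyperplane from support vectors: ", sv_only.coef_[0], sv_only.intercept_[0])
```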

SVM vs. other classifier

The plot above shows two classifiers that separate the positive and negative classes of a dataset. The blue line represents the SVM classifier, whereas the green line represents another linear classifier. The points on the dotted lines are called support vectors because they are the closest to the hyperplane, and the distance between the blue dotted lines is the margin, which is what SVM maximizes to obtain the best possible classifier. The green line is not an SVM hyperplane because it does not have the maximum margin.
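To make this comparison concrete, the sketch below fits a linear SVM and another linear classifier (logistic regression, standing in for the green line) on a synthetic separable dataset and reports how far each decision boundary is from its closest training point. The dataset and the choice of logistic regression are illustrative assumptions; the point is only that the SVM places its boundary as far as possible from the nearest points.

```python
# Sketch: compare the margin of a linear SVM with another linear classifier.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=[[-2, -2], [2, 2]],
                  cluster_std=0.8, random_state=1)

def distance_to_closest_point(clf, X):
    """Unsigned distance from the decision boundary to the nearest training point."""
    w, b = clf.coef_[0], clf.intercept_[0]
    return np.min(np.abs(X @ w + b) / np.linalg.norm(w))

svm = SVC(kernel="linear", C=1e6).fit(X, y)   # (near) hard-margin SVM
other = LogisticRegression().fit(X, y)        # stands in for the green line

print("SVM margin:             ", distance_to_closest_point(svm, X))
print("Other linear classifier:", distance_to_closest_point(other, X))
```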

Note: SVM can be thought of as a generalized linear discriminant with maximum margin.

Signed and unsigned distance

In SVM, the hyperplane is defined by a weight vector $w$ and a bias term $b$. The hyperplane equation can be written as $w^T x + b = 0$. Here, $x$ represents a data point, $w$ represents the normal vector to the hyperplane, and $b$ represents the offset of the hyperplane from the origin. The signed distance of a point $x_i$ from the hyperplane measures how far the point lies from the hyperplane, while also indicating which side of the hyperplane it is on. The distance is signed because it can be positive or negative depending on the direction relative to the normal vector.

If the direction from the hyperplane to the point is aligned with the normal vector $w$, the distance is positive. If it points in the opposite direction, the distance is negative. Points above the hyperplane, therefore, have positive distance, and points below it have negative distance.
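As a concrete check of this sign convention, the short sketch below computes the signed distance of a few points from a hand-picked 2-D hyperplane. The weight vector, bias, and points are arbitrary values chosen for illustration.

```python
# Sketch: signed distance of points from the hyperplane w^T x + b = 0.
# The weights, bias, and points are arbitrary illustrative values.
import numpy as np

w = np.array([1.0, 2.0])   # normal vector to the hyperplane
b = -4.0                   # offset from the origin

def signed_distance(x, w, b):
    # Positive when x lies on the side that w points toward, negative otherwise.
    return (w @ x + b) / np.linalg.norm(w)

for x in [np.array([4.0, 2.0]),   # positive side of the hyperplane
          np.array([0.0, 0.0]),   # negative side
          np.array([2.0, 1.0])]:  # exactly on the hyperplane
    print(x, "->", signed_distance(x, w, b))
```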

Note: Unless stated otherwise, we treat the bias parameter as part of the vector $w$ and append a $1$ to each feature vector.

Signed and unsigned distance

A hyperplane in the feature space defined by the mapping $\phi$ can be written as $w^T \phi(x) = 0$. Given a binary classification dataset: $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$ ...