Introduction to SVM
Gain an understanding of SVM and the concepts of signed and unsigned distance.
Support vector machine (SVM) is a popular and powerful supervised learning algorithm for classification and regression problems. It works by finding the best possible boundary between different classes of data points. In this lesson, we’ll cover the basic concepts and principles behind SVMs and see how they can be applied in practice.
What is SVM?
Suppose a person works for a bank, and their job is to decide whether to approve or reject loan applications based on the applicant’s financial history. They have a loan dataset with various features such as credit score, income, and debt-to-income ratio, along with past approval and rejection records. The task is to use SVM to build a predictive model for future loan applications.
First, they map each loan application into a feature space based on its features and label each loan application as either “approved” or “rejected,” which creates two different classes in the dataset. Next, they try to find a decision boundary that will separate the data linearly.
SVM finds the best hyperplane that separates the two classes, that is, the decision boundary that separates the approved and rejected loan applications. It finds the hyperplane that maximizes the margin, which is the distance between the hyperplane and the closest loan applications from each class. This means they want the hyperplane to be as far away as possible from the closest approved and rejected loan applications.
However, they don’t need to consider every loan application when determining the hyperplane. Only the loan applications that lie closest to the hyperplane, called support vectors, are used to determine its position. This approach makes SVM memory-efficient and allows us to create a model that can handle a large number of features.
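To make this concrete, here is a minimal sketch of the loan scenario using scikit-learn's `SVC` with a linear kernel. The feature values, labels, and the new applicant are invented for illustration; only the support vectors are inspected after fitting.

```python
import numpy as np
from sklearn.svm import SVC

# Toy loan applications: [credit_score, income (k$), debt-to-income ratio]
X = np.array([
    [720, 85, 0.20],
    [690, 70, 0.25],
    [750, 95, 0.15],
    [580, 40, 0.55],
    [610, 45, 0.50],
    [550, 30, 0.60],
])
y = np.array([1, 1, 1, -1, -1, -1])  # +1 = approved, -1 = rejected

# A linear SVM finds the maximum-margin hyperplane between the two classes.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Only the support vectors (the closest points) fix the hyperplane's position.
print("Support vectors:\n", clf.support_vectors_)
print("New applicant:", clf.predict([[700, 80, 0.22]]))
```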
The plot above shows two classifiers that separate the positive and negative classes of a dataset. The blue line represents the SVM classifier, whereas the green line represents the other classifier. The points on the dotted lines are called support vectors because they're the closest to the hyperplane, and the distance between the blue dotted lines is called the margin, which is what we want to maximize in SVM to get the best possible classifier. The green line isn't an SVM hyperplane because its margin isn't maximal.
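For a linear SVM in canonical form, the margin boundaries are $\mathbf{w}^T\mathbf{x} + b = \pm 1$, so the margin width is $2/\|\mathbf{w}\|$ and can be read off a fitted classifier. A small sketch, assuming scikit-learn and a made-up separable dataset:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (illustrative data).
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # a large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]
# The gap between the two dotted margin lines is 2 / ||w||.
print("margin width:", 2 / np.linalg.norm(w))
```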
Note: SVM can be thought of as a generalized linear discriminant with maximum margin.
Signed & unsigned distance
In SVM, the hyperplane is defined by a weight vector $\mathbf{w}$ and a bias term $b$. The hyperplane equation can be written as $\mathbf{w}^T\mathbf{x} + b = 0$. Here, $\mathbf{x}$ represents a data point, $\mathbf{w}$ represents the normal vector to the hyperplane, and $b$ represents the offset of the hyperplane from the origin. The signed distance of a point $\mathbf{x}$ from the hyperplane can be defined as

$$d(\mathbf{x}) = \frac{\mathbf{w}^T\mathbf{x} + b}{\|\mathbf{w}\|},$$

that is, the distance between $\mathbf{x}$ and the hyperplane, taking into account the direction of the normal vector. This distance is signed because it can be positive or negative depending on which side of the hyperplane the point is on.
If the vector from the hyperplane to the point points in the same direction as the normal vector $\mathbf{w}$, the distance is positive; if they point in opposite directions, the distance is negative. All the points above the hyperplane (on the side $\mathbf{w}$ points toward) have a positive distance, while all the points below the hyperplane have a negative distance, as shown in the figure below.
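The signed distance is straightforward to compute directly from the formula above. A minimal sketch, assuming illustrative values for $\mathbf{w}$ and $b$:

```python
import numpy as np

w = np.array([2.0, 1.0])  # normal vector (illustrative)
b = -4.0                  # bias (illustrative)

def signed_distance(x, w, b):
    # Distance of x from the hyperplane w·x + b = 0, with sign:
    # positive on the side w points toward, negative on the other side.
    return (np.dot(w, x) + b) / np.linalg.norm(w)

p_above = np.array([3.0, 2.0])
p_below = np.array([0.0, 0.0])
print(signed_distance(p_above, w, b))       # > 0: same side as w
print(signed_distance(p_below, w, b))       # < 0: opposite side
print(abs(signed_distance(p_below, w, b)))  # unsigned distance
```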
Note: Unless stated otherwise, we assume the bias parameter $b$ is part of the vector $\mathbf{w}$, and we'll append $1$ to the feature vectors $\mathbf{x}$.
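To see why this convention works, here is a short sketch (with invented numbers) showing that appending $b$ to $\mathbf{w}$ and $1$ to each feature vector leaves the hyperplane function unchanged:

```python
import numpy as np

w = np.array([2.0, 1.0])
b = -4.0
X = np.array([[3.0, 2.0], [0.0, 0.0]])

# Absorb the bias: append b to w and 1 to each feature vector.
w_aug = np.append(w, b)                       # [2, 1, -4]
X_aug = np.hstack([X, np.ones((len(X), 1))])  # each row gains a trailing 1

print(X @ w + b)      # original form: w·x + b
print(X_aug @ w_aug)  # augmented form: identical values
```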
A hyperplane in the feature space defined by mapping $\phi$ can be defined as $\mathbf{w}^T\phi(\mathbf{x}) = 0$. Given a binary classification dataset ...