
Hard-Margin SVM

Learn how to implement and optimize the hard-margin SVM.

In this lesson, we explore the hard-margin SVM, a type of support vector machine designed to perfectly separate two classes with a maximum-margin hyperplane. The hard-margin SVM is suited to datasets that are linearly separable and free of outliers. We'll start by understanding what it means for a dataset to be linearly separable and the idea of maximizing the margin. Then, we'll carefully derive the equivalent optimization problem, discuss the critical role of support vectors, and implement the hard-margin SVM using cvxpy to visualize the decision boundary and the support vectors.

Linearly separable case

Given a binary classification dataset $D=\{(\bold x_1, y_1), (\bold x_2, y_2), \dots, (\bold x_n, y_n)\}$, where $\bold x_i \in \R^d$ and $y_i \in \{-1, 1\}$, if a hyperplane $\bold w^T\phi(\bold x)=0$ exists that separates the two classes, the dataset is said to be linearly separable in the feature space defined by the mapping $\phi$. We'll assume for now that the dataset $D$ is linearly separable. The goal is to find the hyperplane with a maximum margin by optimizing the following objective:

$$\max_{\bold w}\bigg[\frac{1}{\|\bold w\|}\min_{i}\big(y_i\bold w^T\phi(\bold x_i)\big)\bigg]$$
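To make the objective concrete, here is a minimal numpy sketch (not part of the lesson's code) that evaluates this quantity for one candidate hyperplane on a tiny hypothetical dataset; it assumes the bias is absorbed into $\bold w$ by taking $\phi(\bold x) = [\bold x, 1]$:

```python
import numpy as np

# Tiny hypothetical dataset with labels in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])

# Assumed feature map phi(x) = [x, 1]: the constant feature absorbs the bias.
Phi = np.hstack([X, np.ones((X.shape[0], 1))])

w = np.array([1.0, 1.0, 0.0])           # an arbitrary candidate hyperplane w^T phi(x) = 0
functional_margins = y * (Phi @ w)       # y_i * w^T phi(x_i), positive iff x_i is correctly classified
objective_value = functional_margins.min() / np.linalg.norm(w)
print(objective_value)                   # the quantity the SVM maximizes over w
```

The hard-margin SVM searches over all $\bold w$ for the one that makes this worst-case margin as large as possible.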

The direct optimization of the above objective is complex. Here is the derivation of the equivalent, simplified optimization problem:

$$
\begin{aligned}
\max_{\bold w}\bigg[\frac{1}{\|\bold w\|}\min_{i}\big(y_i\bold w^T\phi(\bold x_i)\big)\bigg]
&=\max_{\bold w}\frac{1}{\|\bold w\|}\bigg[\min_{i}\big(y_i\bold w^T\phi(\bold x_i)\big)\bigg]\\
&=\max_{\bold w,\gamma}\frac{\gamma}{\|\bold w\|} &&\text{s.t. } y_i\bold w^T\phi(\bold x_i)\ge\gamma \quad \forall i\\
&=\max_{\bold w,\gamma}\frac{1}{\|\bold w/\gamma\|} &&\text{s.t. } y_i(\bold w/\gamma)^T\phi(\bold x_i)\ge 1 \quad \forall i\\
&=\max_{\bold w}\frac{1}{\|\bold w\|} &&\text{s.t. } y_i\bold w^T\phi(\bold x_i)\ge 1 \quad \forall i\\
&=\min_{\bold w}\|\bold w\| &&\text{s.t. } y_i\bold w^T\phi(\bold x_i)\ge 1 \quad \forall i
\end{aligned}
$$

Here $\gamma$ is the functional margin of the worst-classified point. In the third step we divide $\bold w$ by $\gamma$, which does not change the hyperplane, and in the fourth step we simply rename $\bold w/\gamma$ back to $\bold w$. Finally, maximizing $1/\|\bold w\|$ is the same as minimizing $\|\bold w\|$, which yields the constrained problem in the last line.
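The final problem is convex, so it can be handed to cvxpy directly. The following is a minimal sketch rather than the lesson's reference implementation: it assumes a small synthetic linearly separable dataset, absorbs the bias into $\bold w$ via a constant feature, solves $\min_{\bold w}\|\bold w\|$ subject to $y_i\bold w^T\phi(\bold x_i)\ge 1$, and reads off the support vectors as the points whose constraints are active.

```python
import numpy as np
import cvxpy as cp

# Synthetic linearly separable dataset (assumed for illustration); labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 2.0], 0.5, size=(20, 2)),
               rng.normal([-2.0, -2.0], 0.5, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

# Assumed feature map phi(x) = [x, 1], so the bias is the last component of w.
Phi = np.hstack([X, np.ones((X.shape[0], 1))])

# Hard-margin SVM: minimize ||w|| subject to y_i * w^T phi(x_i) >= 1 for all i.
w = cp.Variable(Phi.shape[1])
constraints = [cp.multiply(y, Phi @ w) >= 1]
problem = cp.Problem(cp.Minimize(cp.norm(w, 2)), constraints)
problem.solve()

w_opt = w.value
margins = y * (Phi @ w_opt)
# Support vectors lie exactly on the margin: their constraints hold with equality.
support_idx = np.where(np.isclose(margins, 1.0, atol=1e-3))[0]
print("optimal w:", w_opt)
print("support vector indices:", support_idx)
print("geometric margin in input space:", 1.0 / np.linalg.norm(w_opt[:2]))
```

In practice the objective is often written as the equivalent $\tfrac{1}{2}\|\bold w\|^2$, which turns the problem into a quadratic program; the minimizer is the same. Plotting the two classes together with the line $\bold w^T\phi(\bold x)=0$ and the points in `support_idx` visualizes the decision boundary and the support vectors.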