Kernel SVM and Sparsity

Learn how to implement a kernel SVM and observe the sparsity of the solution vector, which aids generalization.


Kernels in SVM

The dual formulation makes kernelizing the SVM straightforward. As the following dual optimization problem shows, the Gram matrix $K$ can be computed using any kernel function:

$$
\begin{aligned}
\max_{\bold a} \quad & \bold a^T\bold 1 - \frac{1}{2}\,\bold a^T_{\bold y}K\bold a_{\bold y}\\
\textrm{s.t.} \quad & 0\le \bold a \le C
\end{aligned}
$$

The prediction $\bold a^T_{\bold y}\Phi(X)^T\phi(\bold x_t)$ can also be made using the same kernel function in place of $\Phi(X)^T\phi(\bold x_t)$.
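As a concrete illustration, here is a minimal sketch of kernelized prediction, assuming a learned dual vector `a`, training labels `y`, and an RBF kernel (all names here are illustrative, not the course's exact code). Only the support vectors, i.e., the entries with $a_i > 0$, contribute to the sum; this is the sparsity the lesson title refers to:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # Pairwise squared Euclidean distances, then the Gaussian kernel.
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * sq)

def predict(a, y, X_train, X_test, kernel=rbf_kernel):
    # Only support vectors (a_i > 0) contribute, so we can drop the rest.
    sv = a > 1e-6
    K = kernel(X_train[sv], X_test)   # shape (n_sv, n_test)
    return np.sign((a[sv] * y[sv]) @ K)
```

Because the non-support entries of `a` are zero, test-time cost scales with the number of support vectors rather than the full training set.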


The following code implements a binary classification SVM using various kernel functions (linear, polynomial, and RBF) on a synthetic dataset. It splits the data into training and test sets, fits the SVM using the training set, and evaluates the accuracy on the test set. The SVM optimization problem is formulated using cvxpy and solved using a convex optimization solver. Additionally, the code generates a decision boundary plot for visualization.
