Kernel Trick Can Be Dangerous

Understand the importance of choosing an appropriate kernel and keeping an eye on model complexity.

The kernel trick seems very effective and efficient. Although not every machine learning algorithm can be reformulated to incorporate kernels, many popular algorithms that rely on inner products or distances can be. However, the use of kernels can become dangerous when the chosen kernel isn’t appropriate for the given problem. For example, using a linear kernel on a highly nonlinear dataset can result in underfitting, while using a high-degree polynomial or RBF kernel on a linearly separable dataset can lead to overfitting.
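To make the underfitting case concrete, here is a minimal sketch that assumes scikit-learn is available. The dataset (two concentric rings from `make_circles`), its noise level, and the train/test split are illustrative choices, not part of the lesson’s own material.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: a dataset that no straight line can separate.
# (Dataset and parameters are assumptions chosen for illustration.)
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = {}
for kernel in ("linear", "rbf"):
    model = SVC(kernel=kernel).fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for each kernel.
    scores[kernel] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{kernel:>6}: train={scores[kernel][0]:.2f}  test={scores[kernel][1]:.2f}")
```

On data like this, we would expect the linear kernel to score poorly on both the training and the test set (the signature of underfitting), while the RBF kernel separates the rings. The reverse experiment, a flexible kernel on linearly separable data, typically shows up as a gap between training and test accuracy instead.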

The model complexity

While using the kernel trick, it’s important to keep an eye on the number of parameters in the feature space. After all, a kernel function computes a dot product in some feature space defined by a mapping ϕ. For example, if we use the RBF kernel in generalized linear regression, the number of parameters in the feature space becomes infinite. This is because the RBF kernel implicitly maps the input vectors to an infinite-dimensional space, and a linear model in that space needs one parameter per dimension. The proof of this mapping to an infinite-dimensional feature space is based on Mercer’s theorem, which expands the kernel as an infinite sum of products of feature functions. Without realizing it, we can end up with a very complex model that tends to overfit.
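One way to see the infinite-dimensional feature space is a sketch for one-dimensional inputs. It relies on the Taylor series of the exponential rather than the Mercer eigenfunction expansion, and it assumes the parameterization k(x, z) = exp(-γ(x - z)²). Under that assumption, k(x, z) = exp(-γx²) exp(-γz²) Σₖ (2γxz)ᵏ / k!, so the k-th feature is exp(-γx²) √((2γ)ᵏ / k!) xᵏ. The function names below are hypothetical.

```python
import math
import numpy as np

def rbf_kernel_1d(x, z, gamma=1.0):
    """Exact RBF kernel for scalar inputs: exp(-gamma * (x - z)^2)."""
    return math.exp(-gamma * (x - z) ** 2)

def truncated_feature_map(x, gamma=1.0, n_terms=10):
    """First n_terms coordinates of an explicit RBF feature map (1-D inputs).

    Coordinate k is exp(-gamma * x^2) * sqrt((2 * gamma)^k / k!) * x^k.
    The full map has infinitely many coordinates; we keep only a prefix.
    """
    ks = np.arange(n_terms)
    factorials = np.array([math.factorial(int(k)) for k in ks], dtype=float)
    return math.exp(-gamma * x ** 2) * np.sqrt((2 * gamma) ** ks / factorials) * x ** ks

x, z = 0.5, -0.3
exact = rbf_kernel_1d(x, z)
for n_terms in (1, 2, 5, 20):
    # A plain dot product of truncated features approximates the kernel.
    approx = float(truncated_feature_map(x, n_terms=n_terms)
                   @ truncated_feature_map(z, n_terms=n_terms))
    print(f"n_terms={n_terms:>2}  |exact - approx| = {abs(exact - approx):.2e}")
```

No finite number of coordinates reproduces the kernel exactly, which is why a linear model written out explicitly in this feature space would need infinitely many weights.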

The choice of the kernel is a hyperparameter

Like any other hyperparameter, the kernel should be chosen to suit the given problem, and the choice should be validated with cross-validation rather than judged by training performance alone.
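Treating the kernel as a hyperparameter can be sketched with a cross-validated grid search. The example below assumes scikit-learn; the `make_moons` dataset, the candidate kernels, and the five-fold setting are illustrative assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative dataset: two interleaving half-moons with some noise.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Scale the features, then fit an SVM; the kernel is the searched hyperparameter.
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__kernel": ["linear", "poly", "rbf"]}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

# Report the mean cross-validated accuracy for each candidate kernel.
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(f"{params['svc__kernel']:>6}: mean CV accuracy = {score:.3f}")
print("Selected kernel:", search.best_params_["svc__kernel"])
```

In practice, kernel-specific settings such as `C`, `gamma`, or the polynomial `degree` would usually be searched together with the kernel, since a kernel with poorly chosen settings can look worse than it really is.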

Note: To showcase how different kernels can impact model performance, we created a Streamlit application that allows us to choose various datasets and kernels and observe the resulting decision boundary of a support vector machine (SVM) classifier, a machine learning algorithm that finds the optimal boundary between data classes. This application’s purpose is to help us gain an intuitive understanding of how different kernels work on different datasets, which can assist us in making informed decisions when selecting a kernel for our machine learning problems.
