
Kernel Linear Regression

Learn to implement kernel linear regression for a single target.

In this lesson, we extend the concepts of kernels and the Gram matrix to linear regression. We show how generalized linear regression can be reformulated using the kernel trick, allowing us to model non-linear relationships without explicitly computing transformed features. By parameterizing the model in terms of the Gram matrix, we can derive a closed-form solution for kernel linear regression. We will also explore how to make predictions using different kernel functions and implement the model in practice, connecting the theory directly to computation.

Single target example

It’s possible to reformulate generalized linear regression to incorporate the kernel trick. For example, the loss function $L(w)$ for generalized linear regression with a single target is as follows:

$$L(w) = \|\phi(X)w - y\|_2^2 + \lambda \|w\|_2^2$$

Note: $w^T w = \|w\|_2^2$
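
To connect this to computation, here is a minimal NumPy sketch that evaluates this loss for a single target vector. The polynomial feature map `phi` below is a hypothetical choice used only for illustration, not one fixed by the lesson:

```python
import numpy as np

def phi(X):
    # Hypothetical explicit feature map: (1, x, x^2) for each 1-D input row.
    return np.hstack([np.ones((X.shape[0], 1)), X, X**2])

def ridge_loss(w, X, y, lam):
    # L(w) = ||phi(X) w - y||_2^2 + lambda * ||w||_2^2
    residual = phi(X) @ w - y
    return residual @ residual + lam * (w @ w)

# Tiny single-target example.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 2.0, 5.0])
w = np.zeros(phi(X).shape[1])
print(ridge_loss(w, X, y, lam=0.1))  # at w = 0 this is just ||y||_2^2 = 30.0
```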

The optimal weight vector $w$ is found by setting the gradient of the loss function, $\nabla L(w)$, to the zero vector $0$.

For calculating the gradient, the derivative with respect to $w$ of the squared error term is $2\phi(X)^T(\phi(X)w - y)$. The derivative of the regularization term is $2\lambda w$.
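
As a quick sanity check on these two derivative terms, the analytical gradient can be compared against central finite differences. This sketch continues the NumPy example above, reusing the hypothetical `phi` and `ridge_loss` defined there:

```python
def grad_ridge_loss(w, X, y, lam):
    # Analytical gradient: 2 * phi(X)^T (phi(X) w - y) + 2 * lambda * w
    P = phi(X)
    return 2 * P.T @ (P @ w - y) + 2 * lam * w

# Central finite-difference estimate of the gradient at a random point.
rng = np.random.default_rng(0)
w = rng.normal(size=phi(X).shape[1])
eps = 1e-6
numeric = np.array([
    (ridge_loss(w + eps * e, X, y, 0.1) - ridge_loss(w - eps * e, X, y, 0.1)) / (2 * eps)
    for e in np.eye(w.size)
])
print(np.allclose(numeric, grad_ridge_loss(w, X, y, 0.1)))  # True
```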

Summing the derivatives and setting the result to zero yields:

$$2\phi(X)^T(\phi(X)w - y) + 2\lambda w = 0$$

Dividing the entire equation by 2 gives the simplified starting point:

$$\phi(X)^T(\phi(X)w - y) + \lambda w = 0$$

Isolate $w$:

$$\begin{align*}
\phi(X)^T(\phi(X)w - y) + \lambda w &= 0 \\
\lambda w &= -\phi(X)^T(\phi(X)w - y) \\
w &= -\frac{1}{\lambda}\phi(X)^T(\phi(X)w - y) \\
w &= \phi(X)^T a
\end{align*}$$
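
In the last step, $a$ collects everything except $\phi(X)^T$, i.e. $a = -\frac{1}{\lambda}(\phi(X)w - y)$. Substituting $w = \phi(X)^T a$ back into this definition and rearranging gives the standard kernel ridge regression solution $a = (K + \lambda I)^{-1} y$, where $K = \phi(X)\phi(X)^T$ is the Gram matrix, so both fitting and prediction need only kernel evaluations against the training points. The following is a minimal NumPy sketch of that closed form; the RBF kernel and the function names here are illustrative assumptions, not the lesson's reference implementation:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gram-style matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2); one common kernel choice.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def fit_dual(X, y, lam, kernel=rbf_kernel):
    # Dual coefficients a solve (K + lambda I) a = y.
    K = kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def predict(X_train, a, X_new, kernel=rbf_kernel):
    # Prediction for each new point is k(x)^T a over the training points.
    return kernel(X_new, X_train) @ a

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 2.0, 5.0])
a = fit_dual(X, y, lam=0.1)
print(predict(X, a, np.array([[1.5]])))  # fitted value at an unseen input
```

Swapping in a different kernel function (for example, a polynomial kernel) only changes how the Gram matrix and the prediction-time kernel values are computed; the solve itself stays the same.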