# Kernel Linear Regression

Learn to implement kernel linear regression for a single target.

## Single target example

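It’s possible to reformulate generalized linear regression to incorporate the kernel trick. Let $\phi(X)$ denote the matrix whose $i$-th row is the feature vector $\phi(\bold x_i)^T$, $\bold y$ the vector of targets, and $\lambda$ the regularization strength. The regularized loss function $L(\bold w)$ for generalized linear regression with a single target is then as follows: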

$L(\bold w)= \|\phi(X) \bold w-\bold y\|_2^2 + \lambda \|\bold w\|_2^2$

Note:

$\bold w^T\bold w = \|\bold w\|_2^2$

Setting the derivative of the loss with respect to $\bold w$ to $\bold 0$, and dividing through by the common factor of $2$, results in the following:

\begin{align*} & \phi(X)^T(\phi(X)\bold w-\bold y)+\lambda \bold w = \bold 0 \\ & \bold w = -\frac{1}{\lambda}\phi(X)^T(\phi(X)\bold w-\bold y) \\ & \bold w = \phi(X)^T\bold a \\ \end{align*}

Here, $\bold a=-\frac{1}{\lambda}(\phi(X)\bold w-\bold y)$.
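The identity $\bold w = \phi(X)^T\bold a$ can be checked numerically. The following is a minimal sketch, assuming random data, a matrix `Phi` standing in for $\phi(X)$, and an arbitrary value `lam = 0.5` for $\lambda$ (none of which come from the lesson itself):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(6, 3))  # assumed stand-in for phi(X): 6 samples, 3 features
y = rng.normal(size=6)         # assumed random targets
lam = 0.5                      # assumed regularization strength (lambda)

# Solve the stationarity condition for w: (Phi^T Phi + lam I) w = Phi^T y
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(3), Phi.T @ y)

# a as defined in the text: a = -(1/lam) (Phi w - y)
a = -(Phi @ w - y) / lam

# w should lie in the span of the training feature vectors: w = Phi^T a
print(np.allclose(w, Phi.T @ a))
```

The check illustrates that the optimal $\bold w$ is a linear combination of the training feature vectors, which is what makes the reparameterization in terms of $\bold a$ possible.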

### Reparameterization

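We can now reparameterize the loss function in terms of the vector $\bold a$ by replacing $\bold w$ with $\phi(X)^T\bold a$. Writing $K=\phi(X)\phi(X)^T$ for the Gram matrix, whose entries are $K_{ij}=\phi(\bold x_i)^T\phi(\bold x_j)=k(\bold x_i,\bold x_j)$, we get the following: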

\begin{align*} L(\bold a)&= \|\phi(X) \phi(X)^T\bold a-\bold y\|_2^2 + \lambda \|\phi(X)^T\bold a\|_2^2 \\ &= \|\phi(X) \phi(X)^T\bold a-\bold y\|_2^2 + \lambda \bold a^T \phi(X)\phi(X)^T\bold a \\ &= \|K\bold a-\bold y\|_2^2 + \lambda \bold a^T K\bold a \\ \end{align*}
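To make the substitution concrete, the sketch below evaluates the loss in both parameterizations and compares them. The random `Phi`, `y`, `a`, and the value of `lam` are assumptions chosen only for illustration; `K` is formed as `Phi @ Phi.T`.

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(5, 4))  # assumed stand-in for phi(X)
y = rng.normal(size=5)
a = rng.normal(size=5)         # an arbitrary dual parameter vector
lam = 0.1                      # assumed regularization strength

K = Phi @ Phi.T                # Gram matrix K = phi(X) phi(X)^T
w = Phi.T @ a                  # corresponding primal weights

# Loss written in terms of w and in terms of a
loss_w = np.sum((Phi @ w - y) ** 2) + lam * (w @ w)
loss_a = np.sum((K @ a - y) ** 2) + lam * (a @ K @ a)

print(np.isclose(loss_w, loss_a))
```

Note that `loss_a` touches the features only through `K`, so any kernel function that returns these inner products can be used in place of an explicit feature map.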

### Closed-form solution

Setting the derivative of the loss $L(\bold a)$ with respect to $\bold a$ to $\bold 0$ results in the following:

$K^T(K\bold a - \bold y)+\lambda K \bold a = \bold 0$

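Because the Gram matrix $K$ is symmetric, that is, $K^T=K$, the above equation can be rewritten and factored. Setting the bracketed term to zero gives a solution, which is the unique one when $K$ is invertible: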

\begin{align*} & K(K\bold a - \bold y)+\lambda K \bold a = \bold 0 \\ & K(K\bold a - \bold y + \lambda \bold a) = \bold 0 \\ & (K + \lambda I)\bold a = \bold y \\ & \bold a = (K + \lambda I)^{-1} \bold y \end{align*}
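The closed form can be sketched in a few lines of NumPy. The example below assumes random data and an explicit `Phi` so that the result can be compared with the primal ridge solution; it uses `np.linalg.solve` rather than forming the inverse explicitly, which is a common numerical choice and not something the lesson prescribes.

```python
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.normal(size=(8, 3))  # assumed stand-in for phi(X)
y = rng.normal(size=8)
lam = 1.0                      # assumed regularization strength
n = Phi.shape[0]

K = Phi @ Phi.T

# Dual solution: (K + lam I) a = y
a = np.linalg.solve(K + lam * np.eye(n), y)

# Recover w from a and compare with the primal solution
w_dual = Phi.T @ a
w_primal = np.linalg.solve(Phi.T @ Phi + lam * np.eye(3), Phi.T @ y)

print(np.allclose(w_dual, w_primal))
```

The dual system is $n \times n$ (one unknown per training point), while the primal system is sized by the number of features, so the dual form pays off when the feature space is large or implicit.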

### Prediction

Once $\bold a$ is computed, the prediction $\hat y_t$ for an input vector $\bold x_t$ follows from $\bold w=\phi(X)^T\bold a$:

\begin{align*} \hat y_t &= \bold w^T \phi(\bold x_t)\\ &= \bold a^T \phi(X) \phi(\bold x_t) \\ &= \begin{bmatrix}a_1 & a_2 & \dots & a_n\end{bmatrix} \begin{bmatrix}\phi(\bold x_1)^T\phi(\bold x_t) \\ \phi(\bold x_2)^T\phi(\bold x_t) \\ \vdots \\ \phi(\bold x_n)^T\phi(\bold x_t)\end{bmatrix} \\ \\ &= \begin{bmatrix}a_1 & a_2 & \dots & a_n\end{bmatrix} \begin{bmatrix}k(\bold x_1,\bold x_t) \\ k(\bold x_2,\bold x_t) \\ \vdots \\ k(\bold x_n,\bold x_t)\end{bmatrix} \end{align*}
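The equivalence between the feature-space and kernel forms of the prediction can be illustrated with a kernel whose feature map is known explicitly. The sketch below assumes the homogeneous quadratic kernel $k(\bold x,\bold z)=(\bold x^T\bold z)^2$, whose feature map lists all pairwise products $x_i x_j$; the helper names `phi` and `k`, the random data, and `lam` are illustrative assumptions.

```python
import numpy as np

def phi(x):
    # Explicit feature map for k(x, z) = (x . z)^2: all products x_i * x_j
    return np.outer(x, x).ravel()

def k(x, z):
    # Kernel function: computes phi(x) . phi(z) without building phi
    return (x @ z) ** 2

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 2))    # assumed training inputs
y = rng.normal(size=5)         # assumed training targets
x_t = rng.normal(size=2)       # assumed test input
lam = 0.3                      # assumed regularization strength

# Fit in the dual: a = (K + lam I)^{-1} y
K = np.array([[k(xi, xj) for xj in X] for xi in X])
a = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Kernel form of the prediction: a^T [k(x_1, x_t), ..., k(x_n, x_t)]
k_t = np.array([k(xi, x_t) for xi in X])
y_hat_kernel = a @ k_t

# Feature-space form: w^T phi(x_t) with w = phi(X)^T a
Phi = np.array([phi(xi) for xi in X])
y_hat_feature = (Phi.T @ a) @ phi(x_t)

print(np.isclose(y_hat_kernel, y_hat_feature))
```

Only the kernel form remains available when the feature map is infinite-dimensional, as with the Gaussian (RBF) kernel.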

## Implementation

We now implement generalized linear regression for a single target using the kernel trick.
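One possible implementation is sketched below. It is not the lesson's original code: the class name `KernelRidgeRegressor`, the choice of a Gaussian (RBF) kernel $k(\bold x,\bold z)=\exp(-\gamma\|\bold x-\bold z\|_2^2)$, the hyperparameter values, and the sine-curve toy data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

class KernelRidgeRegressor:
    """Hypothetical single-target kernel regression with an RBF kernel."""

    def __init__(self, lam=1e-3, gamma=1.0):
        self.lam = lam      # regularization strength (lambda)
        self.gamma = gamma  # RBF kernel width parameter

    def fit(self, X, y):
        self.X_train = X
        K = rbf_kernel(X, X, self.gamma)
        # Closed-form solution: a = (K + lam I)^{-1} y
        self.a = np.linalg.solve(K + self.lam * np.eye(len(X)), y)
        return self

    def predict(self, X_new):
        # Each row holds k(x_t, x_1), ..., k(x_t, x_n) for one test input
        return rbf_kernel(X_new, self.X_train, self.gamma) @ self.a

# Toy data (assumed): noiseless samples of a sine curve
X = np.linspace(0.0, 2.0 * np.pi, 30).reshape(-1, 1)
y = np.sin(X).ravel()

model = KernelRidgeRegressor(lam=1e-3, gamma=1.0).fit(X, y)

X_test = np.array([[1.0], [2.5]])
y_pred = model.predict(X_test)
print(y_pred)
print(np.sin(X_test).ravel())
```

The model stores the training inputs because every prediction needs the kernel values between the test input and all training points. In practice, `lam` and `gamma` would typically be chosen by validation rather than fixed as here.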

