
Multi-target Linear Regression

Explore the concept of multi-target linear regression where multiple output variables are predicted simultaneously from input features. Understand matrix formulations, the Frobenius norm, and how to implement and evaluate this model using Python and sklearn. Gain practical skills in handling multi-output data sets for real-world data science problems.

Multi-target data sets

It’s common in real applications to predict more than one target from the given features. The UCI dataset we’ve discussed in the previous lesson comes from such a scenario. We’ve already approximated a single target given its multiple features. In this lesson, we’ll predict both Y1 (heating load) and Y2 (cooling load), given the features X1 to X8. We can do so using multi-target linear regression. Formally, if $\bold{x_1},\bold{x_2},...,\bold{x_d}$ are the feature columns and $\bold{y_1},\bold{y_2},...,\bold{y_c}$ are the target columns, we can define the data matrix as $A_{n \times d} = \begin{bmatrix}\bold{x_1}&\bold{x_2}&...&\bold{x_d}\end{bmatrix}$ and the target matrix as $Y_{n \times c} = \begin{bmatrix}\bold{y_1}&\bold{y_2}&...&\bold{y_c}\end{bmatrix}$, where $n$ is the number of data points.

The linear system can then be modeled as:

$$AW = Y$$

where $W=\begin{bmatrix}\bold{w_1}&\bold{w_2}&...&\bold{w_c}\end{bmatrix}$ is a $d \times c$ matrix of parameters.
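To make the shapes concrete, here is a minimal sketch with hypothetical sizes ($n=5$ points, $d=3$ features, $c=2$ targets) that builds a random data matrix and parameter matrix and confirms that their product has the shape of the target matrix:

Python
import numpy as np

n, d, c = 5, 3, 2                  # hypothetical sizes: points, features, targets
A = np.random.randn(n, d)          # data matrix, one row per data point
W = np.random.randn(d, c)          # parameter matrix, one column per target
Y = A @ W                          # predictions form an n x c matrix
print(A.shape, W.shape, Y.shape)   # (5, 3) (3, 2) (5, 2)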

Objective

The goal is to find $W$ that minimizes the objective $\|AW-Y\|_F^2$, where the subscript $F$ denotes the Frobenius norm.

Frobenius norm

The Frobenius norm of a matrix is defined as the square root of the sum of squares of all its elements. Formally, for any matrix $A_{m \times n}$, the Frobenius norm is $\|A\|_F=\sqrt{\sum_{i=1}^m\sum_{j=1}^n a_{ij}^2}$.
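For example, for the $2\times 2$ matrix $A=\begin{bmatrix}1&2\\3&4\end{bmatrix}$, we get $\|A\|_F=\sqrt{1^2+2^2+3^2+4^2}=\sqrt{30}\approx 5.48$.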

Frobenius norm in Python

We can use np.linalg.norm to compute different types of norms of vectors and matrices. For a matrix input, the default is the Frobenius norm; we can also request it explicitly with the parameter ord='fro', as shown in the code below:

Python
nrm1 = np.linalg.norm(A)               # default for a matrix input: Frobenius norm
nrm2 = np.linalg.norm(A, ord='fro')    # Frobenius norm requested explicitly
nrm3 = np.sum(A.flatten()**2)**0.5     # same value, computed directly from the definition

An executable and extendable example of computing the Frobenius norm is given below.

Python 3.8
import numpy as np
A = np.round(10*np.random.randn(2,3))
print(f'Matrix A = \n {A}')
print(f'Frobenius Norm of A = {np.linalg.norm(A)}')
print(f'Frobenius Norm of A = {np.linalg.norm(A,ord="fro")}')
print(f'Frobenius Norm of A = {np.sum(A.flatten()**2)**0.5}')

Note: $AW$ and $Y$ are matrices, and so is $AW-Y$. Ideally, we want $AW-Y$ to be a zero matrix!
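Because the Frobenius objective decomposes column by column, the minimizer takes the familiar least-squares form. Below is a minimal sketch, on random data, of solving it with np.linalg.lstsq and checking it against the normal-equations solution $W = (A^TA)^{-1}A^TY$ (valid when $A^TA$ is invertible):

Python
import numpy as np

# Hypothetical sizes: 100 points, 4 features, 2 targets
A = np.random.randn(100, 4)
Y = np.random.randn(100, 2)

# lstsq minimizes ||AW - Y||_F^2 and returns a 4 x 2 parameter matrix
W_lstsq = np.linalg.lstsq(A, Y, rcond=None)[0]

# Normal-equations solution agrees with lstsq when A^T A is invertible
W_normal = np.linalg.solve(A.T @ A, A.T @ Y)
print(np.allclose(W_lstsq, W_normal))  # True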

Implementation

We’ve already worked with a real dataset in the single-target setting. In this lesson, we’ll predict both Y1 and Y2 based on the rest of the features.

Changes in the code

Single target
def getAy(data):
    # Remove the target column and reshape it into an n x 1 vector
    y = data.pop('Y2')
    y = np.array(y)[:, np.newaxis]
    # Remaining columns are the features; prepend a column of ones for the bias term
    A = np.array(data)
    A = np.hstack((np.ones((A.shape[0], 1)), A))
    return A, y
Multiple targets
def getAY(data):
    # Remove both target columns and reshape each into an n x 1 vector
    y1 = data.pop('Y1')
    y1 = np.array(y1)[:, np.newaxis]
    y2 = data.pop('Y2')
    y2 = np.array(y2)[:, np.newaxis]
    # Stack the targets side by side into an n x 2 target matrix Y
    Y = np.hstack((y1, y2))
    # Remaining columns are the features; prepend a column of ones for the bias term
    A = np.array(data)
    A = np.hstack((np.ones((A.shape[0], 1)), A))
    return A, Y
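As a quick check of the multi-target version, here is a minimal sketch on a hypothetical four-row frame with made-up columns X1, X2, Y1, and Y2; it assumes the getAY function defined above is in scope (note that pop removes the target columns from data in place):

Python
import numpy as np
import pandas as pd

# Hypothetical miniature frame: two features and two targets
data = pd.DataFrame({'X1': [1.0, 2.0, 3.0, 4.0],
                     'X2': [0.5, 1.5, 2.5, 3.5],
                     'Y1': [10., 20., 30., 40.],
                     'Y2': [11., 21., 31., 41.]})

A, Y = getAY(data)        # assumes getAY as defined above
print(A.shape, Y.shape)   # (4, 3) and (4, 2): bias column + 2 features, 2 targets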

Computing mean-squared error

For multiple targets, we define the mean-squared error as $\frac{1}{n}\|Y-\hat{Y}\|^2_F$, where $\hat{Y}$ is the matrix of predictions.

mse = (np.linalg.norm(test_Y_pred-test_Y)**2)/test_Y.shape[0]
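As a quick sanity check on this definition, the Frobenius-norm expression equals the average squared length of the per-sample difference vectors. A minimal sketch on small, hypothetical arrays:

Python
import numpy as np

# Hypothetical true and predicted target matrices (n = 4 points, c = 2 targets)
Y      = np.array([[10.0, 20.0], [12.0, 18.0], [ 9.0, 21.0], [11.0, 19.0]])
Y_pred = np.array([[11.0, 19.0], [12.5, 18.5], [ 8.0, 22.0], [11.0, 20.0]])

mse_frobenius = np.linalg.norm(Y - Y_pred)**2 / Y.shape[0]
mse_per_point = np.mean(np.sum((Y - Y_pred)**2, axis=1))
print(mse_frobenius, mse_per_point)  # identical values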

The code below is a complete example of computing the mean-squared error.

Python 3.8
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
# Function to separate features and outputs
def getAY(data):
    y1 = data.pop('Y1')
    y1 = np.array(y1)[:, np.newaxis]
    y2 = data.pop('Y2')
    y2 = np.array(y2)[:, np.newaxis]
    Y = np.hstack((y1, y2))
    A = np.array(data)
    A = np.hstack((np.ones((A.shape[0], 1)), A))
    return A, Y
# Prepare data
df = pd.read_csv('reg_data.csv').drop('Unnamed: 0', axis=1)
train, test = train_test_split(df, test_size=0.2)
train_A, train_Y = getAY(train)
test_A, test_Y = getAY(test)
# Estimate parameters using least squares and test the model
W = np.linalg.lstsq(train_A, train_Y, rcond=None)[0]
test_Y_pred = test_A.dot(W)
mse = (np.linalg.norm(test_Y_pred-test_Y)**2)/test_Y.shape[0]
print(f'Mean Squared Error = {mse}')

Visualization

We can make separate visualizations for each target attribute. However, a more intuitive way is to visualize the error of each prediction vector. If $\begin{bmatrix}\hat y_{i1}\\\hat y_{i2}\end{bmatrix}$ is the prediction vector for the $i^{th}$ test point, we can define the difference vector as $\begin{bmatrix}y_{i1}-\hat y_{i1}\\y_{i2}-\hat y_{i2}\end{bmatrix}$. The squared length of the difference vector is precisely the squared error for that point. For an exact solution, the difference vector is zero. When the difference vectors are plotted as points in the $xy$ plane, good predictions lie close to the origin. We can observe that most predictions fall within a radius of $5$ (the green circle).

Axes are defined as differences between predictions and ground truths. The green circle contains most of the predictions with error magnitude bounded by 5.
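To back up this visual impression numerically, we can count how many difference vectors fall inside a given radius. Below is a minimal sketch; it assumes the test_Y and test_Y_pred arrays computed in the example above are already in scope:

Python
import numpy as np

def fraction_within(Y_true, Y_pred, radius):
    # Length of each difference vector (one row per test point)
    distances = np.linalg.norm(Y_true - Y_pred, axis=1)
    # Fraction of points whose error vector lies inside the circle of this radius
    return np.mean(distances <= radius)

# Assumes test_Y and test_Y_pred from the mean-squared-error example above
for r in (5, 7, 9):
    print(f'Within radius {r}: {fraction_within(test_Y, test_Y_pred, r):.2f}')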

Putting it all together

The complete executable code of the visualization is given below:

Python 3.8
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
# Function to separate features and outputs
def getAY(data):
    y1 = data.pop('Y1')
    y1 = np.array(y1)[:, np.newaxis]
    y2 = data.pop('Y2')
    y2 = np.array(y2)[:, np.newaxis]
    Y = np.hstack((y1, y2))
    A = np.array(data)
    A = np.hstack((np.ones((A.shape[0], 1)), A))
    return A, Y
# Function to visualize results
def plotResults(Y_true, Y_pred, title=''):
    # Plot each difference vector (true minus predicted) as a point
    plt.plot(Y_true[:, 0]-Y_pred[:, 0], Y_true[:, 1]-Y_pred[:, 1], 'r+')
    # Reference circles of radius 5, 7, and 9 centered at the origin
    c1 = plt.Circle((0, 0), 5, color='g', alpha=1); plt.gca().add_patch(c1)
    c2 = plt.Circle((0, 0), 7, color='b', alpha=0.2); plt.gca().add_patch(c2)
    c3 = plt.Circle((0, 0), 9, color='r', alpha=0.2); plt.gca().add_patch(c3)
    plt.title(title); plt.axis('equal'); plt.axis('square')
    # Freeze the current axis limits so the long axis lines below don't rescale the view
    plt.xlim(plt.xlim()); plt.ylim(plt.ylim())
    plt.plot([0, 0], [-100, 100], 'k', linewidth=1.2)
    plt.plot([-100, 100], [0, 0], 'k', linewidth=1.2)
    plt.grid(color='k', linestyle='--', linewidth=0.8)
    plt.legend([c1, c2, c3], ['r=5', 'r=7', 'r=9'])
    plt.savefig('output/graph.png', dpi=300)
# Load and prepare data
df = pd.read_csv('reg_data.csv').drop('Unnamed: 0', axis=1)
train, test = train_test_split(df, test_size=0.2)
train_A, train_Y = getAY(train)
test_A, test_Y = getAY(test)
# Estimate parameters and test the model
W = np.linalg.lstsq(train_A, train_Y, rcond=None)[0]
test_Y_pred = test_A.dot(W)
mse = (np.linalg.norm(test_Y_pred-test_Y)**2)/test_Y.shape[0]
print(f'Mean Squared Error = {mse}')
# Plot the results
plotResults(test_Y, test_Y_pred, title='Multi-target Linear Regression')

Multi-target regression using sklearn

We can also use sklearn to perform multi-target linear regression as shown below:

from sklearn.linear_model import LinearRegression as LR
from sklearn.multioutput import MultiOutputRegressor as MLR
model = MLR(LR()).fit(train_A, train_Y)
test_Y_pred = model.predict(test_A)
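Note that sklearn's LinearRegression can also fit a 2-D target matrix directly, so the MultiOutputRegressor wrapper (which fits one copy of the estimator per target) is optional here; since the least-squares problems for the two targets are independent, both approaches give the same fit. A minimal alternative sketch:

Python
# LinearRegression accepts a multi-column target natively
model = LR().fit(train_A, train_Y)
test_Y_pred = model.predict(test_A)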

The code below utilizes sklearn for multi-target linear regression:

Python 3.8
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression as LR
from sklearn.multioutput import MultiOutputRegressor as MLR
from matplotlib import pyplot as plt
# Function to visualize results
def plotResults(Y_true, Y_pred, title=''):
    plt.plot(Y_true[:, 0]-Y_pred[:, 0], Y_true[:, 1]-Y_pred[:, 1], 'r+')
    c1 = plt.Circle((0, 0), 5, color='g', alpha=1); plt.gca().add_patch(c1)
    c2 = plt.Circle((0, 0), 7, color='b', alpha=0.2); plt.gca().add_patch(c2)
    c3 = plt.Circle((0, 0), 9, color='r', alpha=0.2); plt.gca().add_patch(c3)
    plt.title(title); plt.axis('equal'); plt.axis('square')
    plt.xlim(plt.xlim()); plt.ylim(plt.ylim())
    plt.plot([0, 0], [-100, 100], 'k', linewidth=1.2)
    plt.plot([-100, 100], [0, 0], 'k', linewidth=1.2)
    plt.grid(color='k', linestyle='--', linewidth=0.8)
    plt.legend([c1, c2, c3], ['r=5', 'r=7', 'r=9'])
    plt.savefig('output/graph.png', dpi=300)
# Function to separate features and output
def getAY(data):
    y1 = data.pop('Y1')
    y1 = np.array(y1)[:, np.newaxis]
    y2 = data.pop('Y2')
    y2 = np.array(y2)[:, np.newaxis]
    Y = np.hstack((y1, y2))
    A = np.array(data)
    A = np.hstack((np.ones((A.shape[0], 1)), A))
    return A, Y
# Prepare data
df = pd.read_csv('reg_data.csv').drop('Unnamed: 0', axis=1)
train, test = train_test_split(df, test_size=0.2)
train_A, train_Y = getAY(train)
test_A, test_Y = getAY(test)
# Train multi-linear regression model from sklearn
m = MLR(LR()).fit(train_A, train_Y)
# Compute results and error
test_Y_pred = m.predict(test_A)
mse = (np.linalg.norm(test_Y_pred-test_Y)**2)/test_Y.shape[0]
print(f'Mean Squared Error = {mse}')
# Plot the results
plotResults(test_Y, test_Y_pred, title='Multi-target Linear Regression')