
Add More Dimensions

Explore how to extend linear regression to multiple input variables by using multiple linear regression and matrix operations. Understand the transition from simple lines to planes and hyperspaces in data, enabling more accurate predictions with multiple variables.

What we have covered so far

In the previous two chapters, we predicted an output from a single input: a restaurant's pizza sales from its reservations. Most interesting real-world problems, however, have more than one input. Even something as simple as pizza sales is not likely to depend on reservations alone. For example, if there are many tourists in town, the restaurant will probably sell more pizzas, even if it has as many reservations as yesterday.

If pizza sales have many variables, imagine how many variables we'll have to consider once we get into complex domains, like recognizing pictures. A learning program that only supports one variable will never solve those hairy problems. If we ever want to tackle them, we'd better upgrade our program to support multiple input variables.

We can learn from multiple input variables with an advanced version of linear regression called multiple linear regression. In this chapter, we’ll extend our program to support multiple linear regression. We’ll also add a few tricks to our bag, including a couple of useful matrix operations and several NumPy functions. Let’s dive right in!

More variables, more dimensions

In the previous chapter, we coded a gradient descent-based version of our learning program. The advanced program can potentially scale to complex models with more than one variable.

In a moment of weakness, we mentioned that opportunity to our friend Roberto. That was a mistake. Now Roberto is all pumped up about forecasting pizza sales from a bunch of different input variables besides reservations, such as the weather, or the number of tourists in town.

This is going to be more work for us, and we cannot blame the pizza restaurant owner for wanting to add variables to the model. After all, the more variables we consider, the more likely we'll get accurate predictions of pizza sales.

Let’s start with a detailed version of the old pizza.txt file. Here are the first few lines of this new dataset:

Reservations  Temperature  Pizzas
13            26           44
2             14           23
14            20           28

The owner suspects that more people drop into their pizzeria on warmer days, so they keep track of the temperature in degrees Celsius. (For reference, 2 °C is close to freezing, and 26 °C is a pleasantly warm day.) Now the third column contains the labels (the pizzas), and the first two columns contain the input variables.

First, let’s see what happens to linear regression when we move from one to two input variables. We know that linear regression is about approximating the examples with a line, like this:

As a reminder, here is the formula of that line:

\hat{y} = x * w + b

If we add a second input variable (the temperature), then the examples no longer lie on a plane: they are points in three-dimensional space. To approximate them, we can use the equivalent of a line with one more dimension, called a plane, as shown in the following graph:

Here is the code to generate the above plot:

Python 3.5
# Plot a plane that roughly approximates a dataset with two input variables.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d  # enables the 3D projection in older Matplotlib
import seaborn as sns

# Import the dataset
x1, x2, x3, y = np.loadtxt("pizza_3_vars.txt", skiprows=1, unpack=True)

# These weights came out of the training phase
w = np.array([-3.98230894, 0.37333539, 1.69202346])

# Plot the axes
sns.set(rc={"axes.facecolor": "white", "figure.facecolor": "white"})
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.set_xlabel("Temperature", labelpad=15, fontsize=15)
ax.set_ylabel("Reservations", labelpad=15, fontsize=15)
ax.set_zlabel("Pizzas", labelpad=5, fontsize=15)

# Plot the data points
ax.scatter3D(x1, x2, y, color='b')

# Plot the plane
MARGIN = 10
edges_x = [np.min(x1) - MARGIN, np.max(x1) + MARGIN]
edges_y = [np.min(x2) - MARGIN, np.max(x2) + MARGIN]
xs, ys = np.meshgrid(edges_x, edges_y)
zs = np.array([w[0] + x * w[1] + y * w[2]
               for x, y in zip(np.ravel(xs), np.ravel(ys))])
ax.plot_surface(xs, ys, zs.reshape((2, 2)), alpha=0.2)
plt.show()

We can calculate \hat{y} by using the equation of a plane. That's similar to the equation of a line, but it has two input variables, x_1 and x_2, and two weights, w_1 and w_2:

\large{\hat{y} = x_1 * w_1 + x_2 * w_2 + b}
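To make the plane equation concrete, here is a minimal sketch that computes \hat{y} for the first example in the dataset. The weights and bias below are made-up numbers for illustration, not the result of actual training.

```python
# Made-up weights and bias, purely for illustration.
w1, w2, b = 1.5, 0.8, 2.0

def predict(x1, x2):
    # The equation of a plane: y_hat = x1 * w1 + x2 * w2 + b
    return x1 * w1 + x2 * w2 + b

# First example from the dataset: 13 reservations, 26 degrees Celsius
print(predict(13, 26))  # 13 * 1.5 + 26 * 0.8 + 2.0, roughly 42.3
```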

In case you're wondering why we need a separate weight for each input variable, consider that in Roberto's dataset, x_1 is the number of reservations and x_2 is the temperature. It makes sense that the reservations and the temperature have different impacts on the number of pizzas, so they need different weights.

In the equation of a line, the bias b shifts the line away from the origin. The same goes for a plane: without b, the plane would be constrained to pass through the origin of the axes. To prove that, set all the input variables to 0: without a bias, \hat{y} would also be 0. Thanks to the bias, the plane is free to shift vertically and find the position where it approximates the points as closely as it can.
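The bias argument is easy to check in code. In this sketch (with the same arbitrary weights as before), setting all inputs to 0 collapses the prediction to the bias alone:

```python
# Arbitrary weights, chosen only to demonstrate the role of the bias.
w1, w2, b = 1.5, 0.8, 2.0

def predict(x1, x2, bias):
    return x1 * w1 + x2 * w2 + bias

print(predict(0, 0, bias=0))  # 0.0 -- without a bias, the plane must pass through the origin
print(predict(0, 0, bias=b))  # 2.0 -- the bias shifts the plane vertically
```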

Now see what happens when the pizzeria owner adds yet another column to their dataset:

Reservations  Temperature  Tourists  Pizzas
13            26           9         44
2             14           6         23
14            20           3         28

This new input variable shows the density of tourists in town, downloaded from the local tourist office’s website. It ranges from 1 (not a soul in town) to 10 (tourist invasion).
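This four-column file can be loaded one column per array with np.loadtxt, exactly as the plotting code above does with pizza_3_vars.txt. Here is a self-contained sketch that writes a tiny sample file first (the real dataset has many more rows):

```python
import numpy as np

# A tiny stand-in for pizza_3_vars.txt, so this snippet runs on its own.
SAMPLE = """Reservations Temperature Tourists Pizzas
13 26 9 44
2 14 6 23
14 20 3 28
"""
with open("pizza_3_vars_sample.txt", "w") as f:
    f.write(SAMPLE)

# skiprows=1 skips the header line; unpack=True returns one array per column
x1, x2, x3, y = np.loadtxt("pizza_3_vars_sample.txt", skiprows=1, unpack=True)
print(x1)  # reservations: [13.  2. 14.]
print(y)   # pizzas (the labels): [44. 23. 28.]
```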

We started by approximating bi-dimensional examples with a one-dimensional model. Then we moved on to approximate three-dimensional examples with a bi-dimensional model. Now that we have four-dimensional examples, we have to approximate them with a three-dimensional model. And this process continues as we add more input variables. To approximate examples with n dimensions, we need an (n − 1)-dimensional shape.

Humans cannot perceive more than three spatial dimensions, but math has no problem dealing with those sanity-bending multidimensional spaces. It just calls them hyperspaces and describes them with the same equations that work for bi-dimensional and three-dimensional spaces. No matter how many dimensions we have, we can just keep adding input variables and weights to the formulas of the line and the plane:

\large{\hat{y} = x_1 * w_1 + x_2 * w_2 + x_3 * w_3 + ... + b}

This formula is called the weighted sum of the inputs. The equation of a line is a special case of it: the weighted sum of a single input. So here's a simple plan to upgrade our learning program from one to many input variables: we'll replace the equation of a line with the more generic formula of the weighted sum.
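As a preview of where the matrix operations come in, the weighted sum can be computed for every example at once with a single matrix multiplication. In this sketch, X holds one example per row, and the weights and bias are made-up numbers, not trained values:

```python
import numpy as np

# One example per row: reservations, temperature, tourists
X = np.array([[13, 26, 9],
              [2, 14, 6],
              [14, 20, 3]])
w = np.array([1.5, 0.8, 0.5])  # one made-up weight per input variable
b = 2.0

# y_hat = x1*w1 + x2*w2 + x3*w3 + b, computed for all examples at once
y_hat = X @ w + b  # the @ operator is the same as np.matmul(X, w)
print(y_hat)
```

Each entry of y_hat is the weighted sum of one row of X, so adding an input variable just means adding a column to X and an element to w.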