Multivariate Linear Regression
Now that you understand Regression, this lesson introduces Multivariate Linear Regression.
Multivariate Linear Regression
In Multivariate Linear Regression, we have multiple input or independent features. Based on these features, we predict an output column. Again, let’s use the Tips Dataset.
We will use the following columns from the dataset for Multivariate Analysis.

Total_bill: It is the total bill of food served.

Sex: It is the sex of the bill payer.

Size: It is the number of people in the party.

Smoker: It indicates whether the bill payer is a smoker or not.

Tip: It is the tip given on the meal.
Goal of Multivariate Linear Regression: The goal is to predict the “tip”, given all the independent features above. The Regression model constructs an equation to do so.
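
Before a Regression model can build such an equation, the categorical columns have to become numbers. The sketch below shows one possible way to do that. The helper name `encode_row`, the 0/1 coding of sex and smoker, and the sample row are all assumptions made for illustration; they are not taken from the dataset or from any particular library.

```python
# Hypothetical helper: turn one row of the Tips data into a numeric feature vector.
# The 0/1 coding for sex and smoker is an assumption made for this sketch.
def encode_row(total_bill, sex, size, smoker):
    return [
        1.0,                              # constant input that multiplies the intercept
        float(total_bill),                # numeric feature, used as-is
        1.0 if sex == "Male" else 0.0,    # categorical feature coded as 0/1
        float(size),                      # party size, used as a number
        1.0 if smoker == "Yes" else 0.0,  # categorical feature coded as 0/1
    ]

# Made-up row for illustration only.
x = encode_row(total_bill=20.0, sex="Female", size=2, smoker="No")
print(x)  # [1.0, 20.0, 0.0, 2.0, 0.0]
```

Any other consistent coding (for example, Female as 1 and Male as 0) would work equally well; only the learned weights would change.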

We plot the Scatter plot between the numeric independent variable (total_bill) and the numeric output variable (tip) to analyze the relationship.

We plot the BoxPlot between the independent variables that take only a few distinct values (sex, size and smoker) and the numeric output variable (tip) to analyze the relationship.

You can see that the points in the Scatter plot are mostly scattered along the diagonal.

This indicates that there might be some positive correlation between the total_bill and tip. This will be fruitful in modeling.
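
A visual impression like this can be checked numerically with the Pearson correlation coefficient. The following is a minimal sketch, assuming a handful of made-up (total_bill, tip) pairs rather than the real dataset; the function name `pearson` is our own.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration; not rows from the Tips Dataset.
total_bill = [10.0, 15.0, 20.0, 25.0, 30.0]
tip = [1.5, 2.0, 3.5, 3.0, 4.5]

r = pearson(total_bill, tip)
print(r > 0)  # True: the tip tends to rise with the bill in this sample
```

A value of `r` close to 1 means the points lie close to an upward-sloping line, which is what the Scatter plot suggests.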

We can see that males tend to give more tips than females.

There are some outliers among males who gave exceptionally large tips, visible as points above the upper whisker. There is an outlier among females too.

We can see that the tip tends to increase with the number of people. It is visible from the upward trend of BoxPlots. So, this will be fruitful in modeling.

There are some outliers for party sizes of two and three.

We can see that people who smoke tend to give slightly higher tips.

There are many outliers among the people who do not smoke.
Working
Multivariate Linear Regression comes up with the following equation in higher dimensions:
$\hat{y} = w_0 * x_0 + w_1 * x_1 + w_2 * x_2 + w_3 * x_3 + w_4 * x_4 + \dots + w_n * x_n$
Here, $x_0$ = 1
$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \\ \end{bmatrix} \quad w = \begin{bmatrix} w_0 \\ w_1 \\ w_2 \\ \vdots \\ w_n \\ \end{bmatrix} \quad w^T = \begin{bmatrix} w_0 & w_1 & w_2 & \dots & w_n\\ \end{bmatrix}$
$\hat{y} = w^T * x$
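
In code, $w^T * x$ is simply a dot product. Here is a minimal sketch in plain Python; the function name `predict` and the weight values are hypothetical and chosen only so the arithmetic is easy to follow. In practice, a numerical library would usually handle this step.

```python
def predict(w, x):
    """Return y_hat = w^T x for one example (x[0] is assumed to be 1)."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))

# Hypothetical weights and one made-up feature vector with five entries.
w = [0.5, 0.1, 0.0, 0.2, 0.0]
x = [1.0, 20.0, 0.0, 2.0, 0.0]

y_hat = predict(w, x)     # 0.5*1 + 0.1*20 + 0*0 + 0.2*2 + 0*0
print(round(y_hat, 2))    # 2.9
```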
Goal: Find values of the parameters $w_0$, $w_1$, $w_2$, …, $w_n$ such that the predicted tip ($\hat{y}$) is as close as possible to the actual tip ($y$). Mathematically, we have to minimize the following function.
$J(w) = \frac{1}{2m}\sum_{i=1}^{m}(\hat{y}^i - y^i)^2$
This time, $\hat{y}^i$ incorporates more than one parameter ($w_0, w_1, \dots$) and more than one feature ($x_0, x_1, \dots$), compared to Univariate Linear Regression. Here, $m$ is the number of training examples, and $w$ is a vector with dimensions $(n+1) \times 1$.
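
The cost function translates almost line by line into code. The sketch below assumes each row of `X` already starts with $x_0 = 1$; the function name `cost` and the tiny dataset (one feature plus the constant) are made up for illustration.

```python
def cost(w, X, y):
    """J(w) = 1/(2m) * sum over examples of (y_hat_i - y_i)^2."""
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        y_hat_i = sum(w_j * x_j for w_j, x_j in zip(w, x_i))  # w^T x for example i
        total += (y_hat_i - y_i) ** 2
    return total / (2 * m)

# Made-up data: each row is [x_0, x_1] with x_0 = 1.
X = [[1.0, 10.0], [1.0, 20.0], [1.0, 30.0]]
y = [1.0, 2.0, 3.0]

print(cost([0.0, 0.0], X, y))  # (1 + 4 + 9) / 6, roughly 2.33
print(cost([0.0, 0.1], X, y))  # close to 0, since y is 0.1 * x_1 here
```

Weights that fit the data well give a cost near zero, while poor weights give a larger cost, which is exactly what the minimization exploits.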
Gradient Descent
Gradient descent changes as below:
Repeat until convergence {
$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(w)$
}
Here $j = 0, 1, 2, \dots, n$
$\frac{\partial}{\partial w_j} J(w) = \frac{1}{m} \sum_{i=1}^{m}(\hat{y}^i - y^i) * x_j^i$
So, the above equation becomes
Repeat until convergence {
$w_j = w_j - \alpha \frac{1}{m} \sum_{i=1}^{m}(\hat{y}^i - y^i) * x_j^i$
}
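
The update rule can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a production implementation: "until convergence" is replaced by a fixed number of iterations, the function name `gradient_descent` is our own, and the data is made up so that it follows $y = 1 + 2 * x_1 + 3 * x_2$ exactly.

```python
def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for linear regression.

    Each row of X is assumed to start with x_0 = 1.
    A fixed iteration count stands in for a convergence check.
    """
    m, n = len(X), len(X[0])
    w = [0.0] * n  # start from all-zero weights
    for _ in range(iterations):
        # Prediction errors (y_hat_i - y_i) using the current weights.
        errors = [
            sum(w_j * x_j for w_j, x_j in zip(w, x_i)) - y_i
            for x_i, y_i in zip(X, y)
        ]
        # Partial derivative of J(w) with respect to each w_j.
        grad = [sum(e * x_i[j] for e, x_i in zip(errors, X)) / m for j in range(n)]
        # Update every w_j at once, using gradients from the old weights.
        w = [w_j - alpha * g_j for w_j, g_j in zip(w, grad)]
    return w

# Made-up data generated from y = 1 + 2*x_1 + 3*x_2.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]

w = gradient_descent(X, y)
print([round(w_j, 3) for w_j in w])  # approximately [1.0, 2.0, 3.0]
```

Note that all weights are updated together from the same set of errors. The learning rate `alpha` matters: a value that is too large can make the updates diverge, while a very small one needs many more iterations.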
Acknowledgement
I would like to thank Professor Andrew Ng from Stanford University for providing amazing resources to explain the mathematical foundations of models.