Gradients of Matrices
Explore how to calculate gradients when matrices are involved, a skill that is crucial for optimization tasks in machine learning. Understand the role of Jacobians, partial derivatives, and useful matrix calculus identities so you can compute gradients efficiently within vector calculus.
Gradient with respect to matrices
Machine learning objectives, such as the loss function of linear regression, can often be written using matrices and vectors, which makes them compact and easy to understand. To make computations easier, it is therefore worthwhile to understand how gradients are computed when matrices are involved.
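For concreteness, here is the least-squares loss of linear regression in matrix form together with its gradient (a standard result, stated here for illustration; the design matrix $X$, target vector $y$, and parameter vector $\theta$ are not defined elsewhere in this lesson):

$$L(\theta) = \lVert y - X\theta \rVert^{2} = (y - X\theta)^{\top}(y - X\theta), \qquad \nabla_{\theta} L(\theta) = -2\,X^{\top}(y - X\theta).$$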
Gradients of matrices with respect to vectors (or other matrices) can be computed like the Jacobian of vector-valued functions. Such a Jacobian can be thought of as a multi-dimensional tensor that collects the partial derivatives. For example, the gradient of an $m \times n$ matrix $A$ with respect to a $p \times q$ matrix $B$ is a four-dimensional tensor $J$ of shape $(m \times n) \times (p \times q)$, with entries $J_{ijkl} = \partial A_{ij} / \partial B_{kl}$.
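To see this shape rule in action, here is a minimal sketch using JAX; the function `F`, its factor matrices, and all dimensions are hypothetical choices made for illustration, not part of the lesson:

```python
import jax
import jax.numpy as jnp

# A hypothetical matrix-valued function: F(B) = C @ B @ D maps a
# (p, q) matrix B to an (m, n) matrix, so its Jacobian is a
# four-dimensional tensor of shape (m, n, p, q) -- one partial
# derivative per (output entry, input entry) pair.
m, n, p, q = 2, 3, 4, 5
kC, kD, kB = jax.random.split(jax.random.PRNGKey(0), 3)
C = jax.random.normal(kC, (m, p))
D = jax.random.normal(kD, (q, n))

def F(B):
    return C @ B @ D  # output shape (m, n)

B = jax.random.normal(kB, (p, q))
J = jax.jacobian(F)(B)
print(J.shape)  # (2, 3, 4, 5)
```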
To understand this better, let's consider the function $f(x) = Ax$, where $f \in \mathbb{R}^M$, $A \in \mathbb{R}^{M \times N}$, and $x \in \mathbb{R}^N$, and compute the gradient $\mathrm{d}f/\mathrm{d}A$.
By definition, the gradient is a collection of partial derivatives and, as a result,

$$\frac{\mathrm{d}f}{\mathrm{d}A} = \begin{bmatrix} \dfrac{\partial f_1}{\partial A} \\ \vdots \\ \dfrac{\partial f_M}{\partial A} \end{bmatrix} \in \mathbb{R}^{M \times (M \times N)}.$$
Each partial derivative $\dfrac{\partial f_i}{\partial A} \in \mathbb{R}^{1 \times (M \times N)}$ is itself a collection of partial derivatives with respect to every entry of $A$. Since $f_i = \sum_{j=1}^{N} A_{ij} x_j$, we obtain

$$\frac{\partial f_i}{\partial A_{iq}} = x_q, \qquad \frac{\partial f_i}{\partial A_{kq}} = 0 \quad \text{for } k \neq i,$$

so the slice $\partial f_i / \partial A$ contains $x^\top$ in its $i$-th row and zeros everywhere else.
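This result can also be checked numerically. Below is a small sketch using JAX's `jax.jacobian`; the concrete values of $M$, $N$, $A$, and $x$ are arbitrary choices for illustration:

```python
import jax
import jax.numpy as jnp

M, N = 3, 4
x = jnp.arange(1.0, N + 1.0)  # x = [1., 2., 3., 4.]

def f(A):
    return A @ x  # f(A) = Ax, a vector in R^M

A = jnp.ones((M, N))

# df/dA is an M x (M x N) tensor; JAX stores it with shape (M, M, N).
J = jax.jacobian(f)(A)
print(J.shape)  # (3, 3, 4)

# Slice i holds df_i/dA: row i equals x^T and every other row is zero,
# matching df_i/dA_iq = x_q and df_i/dA_kq = 0 for k != i.
print(J[0])
```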