
Gradients of Matrices

Explore how to calculate gradients when matrices are involved, a skill crucial for optimization tasks in machine learning. Understand the role of Jacobians, partial derivatives, and useful matrix calculus identities to compute gradients efficiently within vector calculus.

Gradient with respect to matrices

Machine learning objectives, such as the loss function of linear regression, can often be written using matrices and vectors, which makes them compact and easy to understand. Therefore, to keep the corresponding computations just as convenient, it is worthwhile to understand how gradients are computed when matrices are involved.

The gradient of a matrix with respect to a vector (or another matrix) can be computed just like the Jacobian of a vector-valued function. The Jacobian can be thought of as a multi-dimensional tensor that collects all of the partial derivatives. For example, the gradient of an $m \times n$ matrix $A$ with respect to a $p \times q$ matrix $B$ is an $(m \times n) \times (p \times q)$ Jacobian $J$ whose entries are given as follows:

$$J_{ijkl} = \frac{\partial A_{ij}}{\partial B_{kl}}, \qquad i = 1, \dots, m, \quad j = 1, \dots, n, \quad k = 1, \dots, p, \quad l = 1, \dots, q.$$
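As a quick sanity check on these dimensions, here is a minimal JAX sketch (not part of the lesson; the map $F$, the fixed factors $C$ and $D$, and all sizes are made-up examples) that computes such a Jacobian by automatic differentiation and prints its $(m \times n) \times (p \times q)$ shape:

```python
import jax

# Hypothetical example: F maps a p x q matrix B to an m x n matrix.
# F(B) = C @ B @ D with fixed C (m x p) and D (q x n) is chosen only to
# produce an m x n output; any differentiable matrix-valued map would do.
m, n, p, q = 2, 3, 4, 5
kC, kD, kB = jax.random.split(jax.random.PRNGKey(0), 3)
C = jax.random.normal(kC, (m, p))
D = jax.random.normal(kD, (q, n))
B = jax.random.normal(kB, (p, q))

def F(B):
    return C @ B @ D  # (m, p) @ (p, q) @ (q, n) -> (m, n)

J = jax.jacobian(F)(B)
print(J.shape)  # (2, 3, 4, 5): one entry dF_ij / dB_kl per index combination
```

JAX lays the result out with the output shape first and the input shape last, which matches the $(m \times n) \times (p \times q)$ arrangement described above.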

To understand this better, let’s consider the function $f = Ax$, where $f \in \R^m$, $A \in \R^{m \times n}$, and $x \in \R^n$. Here, we want to calculate the gradient $df/dA$. To do so, we will start by determining the dimensions of the Jacobian, as shown below:

$$\frac{df}{dA} \in \R^{m \times (m \times n)},$$

since $f$ has $m$ entries and each of them is differentiated with respect to the $m \times n$ entries of $A$.
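Keeping the same illustrative setup (JAX, random example values, and the sizes $m = 3$, $n = 4$ are assumptions for the sketch), these dimensions can be confirmed numerically:

```python
import jax

# Minimal shape check for df/dA with f(A) = A @ x; m, n and the random
# values are arbitrary examples.
m, n = 3, 4
kA, kx = jax.random.split(jax.random.PRNGKey(1))
A = jax.random.normal(kA, (m, n))
x = jax.random.normal(kx, (n,))

def f(A):
    return A @ x  # f lives in R^m

J = jax.jacobian(f)(A)
print(J.shape)  # (3, 3, 4), i.e. m x (m x n): output dimension m, then the shape of A
```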

By definition, the gradient is a collection of partial derivatives and, as a result, $df/dA$ can be written as follows:

$$\frac{df}{dA} = \begin{bmatrix} \dfrac{\partial f_1}{\partial A} \\ \vdots \\ \dfrac{\partial f_m}{\partial A} \end{bmatrix}, \qquad \frac{\partial f_i}{\partial A} \in \R^{1 \times (m \times n)}.$$

Each $f_i$ can be explicitly written as $f_i = \sum_{j=1}^{n} A_{ij} x_j$ ...
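Since $f_i = \sum_{j=1}^{n} A_{ij} x_j$, the only nonzero partial derivatives are $\partial f_i / \partial A_{ij} = x_j$, so each slice $\partial f_i / \partial A$ contains $x^\top$ in its $i$-th row and zeros elsewhere. The sketch below (same illustrative JAX setup and example sizes as above) checks this structure against the automatically computed Jacobian:

```python
import jax
import jax.numpy as jnp

# With f_i = sum_j A_ij x_j, we have df_i/dA_ij = x_j and df_i/dA_kj = 0
# for k != i, so the slice J[i] has x in its i-th row and zeros elsewhere.
# Example sizes and values only.
m, n = 3, 4
kA, kx = jax.random.split(jax.random.PRNGKey(2))
A = jax.random.normal(kA, (m, n))
x = jax.random.normal(kx, (n,))

J = jax.jacobian(lambda A: A @ x)(A)   # shape (m, m, n)

# Analytic Jacobian: J[i, k, j] = x_j if k == i else 0.
expected = jnp.einsum('ik,j->ikj', jnp.eye(m), x)
print(jnp.allclose(J, expected))       # True
```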