Higher-Order Gradients: Hessian

Learn about higher-order gradients and the Hessian matrix.

Higher-order derivatives

Gradients and derivatives as discussed so far are first-order derivatives. Algorithms like Newton's method require computing higher-order derivatives, such as second-order derivatives. Given a multivariate function $f(x_1, x_2, \ldots, x_m): \mathbb{R}^m \rightarrow \mathbb{R}$, higher-order gradients/derivatives are represented as follows:

  • $\frac{\partial^2 f}{\partial x_i^2}$ is known as the second partial derivative of $f$ with respect to $x_i$. In other words, the second-order derivative is the derivative of the first-order derivative: $\frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_i}$.

  • $\frac{\partial^n f}{\partial x_i^n}$ is the $n$th-order derivative of $f$ with respect to $x_i$.

  • $\frac{\partial^2 f}{\partial x_j \partial x_i} = \frac{\partial}{\partial x_j} \frac{\partial f}{\partial x_i}$ is the mixed partial derivative obtained by first differentiating with respect to $x_i$ and then with respect to $x_j$.

  • $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_j}$ is the mixed partial derivative obtained by first differentiating with respect to $x_j$ and then with respect to $x_i$.
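When $f$ has continuous second derivatives, the two mixed partials above are equal, so the order of differentiation does not matter. This can be checked numerically with nested central differences. A minimal sketch; the test function $f(x, y) = x^2 y + \sin(xy)$, the evaluation point, and the step size `h` are illustrative choices, not from the lesson:

```python
import math

def partial_x(f, x, y, h=1e-4):
    # Central-difference estimate of df/dx at (x, y).
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-4):
    # Central-difference estimate of df/dy at (x, y).
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def f(x, y):
    return x**2 * y + math.sin(x * y)

# Differentiate first in x then in y, and in the opposite order.
d_dy_dx = partial_y(lambda x, y: partial_x(f, x, y), 1.0, 0.5)
d_dx_dy = partial_x(lambda x, y: partial_y(f, x, y), 1.0, 0.5)

# Analytic mixed partial for comparison: 2x + cos(xy) - xy*sin(xy).
exact = 2 * 1.0 + math.cos(0.5) - 0.5 * math.sin(0.5)

print(d_dy_dx, d_dx_dy, exact)
```

Both orderings agree with each other and with the analytic value up to the discretization error of the finite-difference scheme.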

The Hessian matrix $H$ is the collection of all the second-order partial derivatives, with the $(i,j)$th entry given as $H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$. In other words,

$$H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_m} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_m} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_m \partial x_1} & \frac{\partial^2 f}{\partial x_m \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_m^2} \end{bmatrix}$$
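Using the entrywise definition above, the full Hessian can be assembled with a four-point central-difference stencil. A sketch under stated assumptions (the test function, evaluation point, and step size `h` are hypothetical, chosen for illustration):

```python
def hessian(f, x, h=1e-4):
    """Approximate the Hessian of f: R^m -> R at the point x (a list of floats)."""
    m = len(x)
    H = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            def shifted(si, sj):
                # Evaluate f at x with x_i shifted by si*h and x_j by sj*h.
                p = list(x)
                p[i] += si * h
                p[j] += sj * h
                return f(p)
            # Four-point stencil for H_ij = d^2 f / (dx_i dx_j);
            # when i == j it reduces to the usual second-difference formula.
            H[i][j] = (shifted(+1, +1) - shifted(+1, -1)
                       - shifted(-1, +1) + shifted(-1, -1)) / (4 * h * h)
    return H

# Example: f(x) = x0^2 * x1 + x1^3, whose analytic Hessian at (1, 2)
# is [[2*x1, 2*x0], [2*x0, 6*x1]] = [[4, 2], [2, 12]].
def f(x):
    return x[0] ** 2 * x[1] + x[1] ** 3

H = hessian(f, [1.0, 2.0])
```

Because $H_{ij}$ and $H_{ji}$ use the same stencil up to the order of the shifts, the numerical Hessian comes out symmetric up to rounding, mirroring the equality of mixed partials.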
