Quasi-Newton Methods

Learn how to approximate Newton’s method when Hessians are difficult or expensive to compute.

Quasi-Newton methods are a class of second-order optimization algorithms that are particularly useful when the function’s Hessian is difficult to compute or not available.

Consider the update rule of the Newton algorithm at time $t$ as follows:

$$x_t = x_{t-1} - H^{-1}(x_{t-1})\, \nabla f(x_{t-1})$$

Here, $H(x)$ is the Hessian of the function $f(x)$ at the point $x$. Recall that the complexity of computing the Hessian is of the order $O(m^2)$, as opposed to the gradient, which is of the order $O(m)$. This means that as the dimensionality $m$ of $x$ increases, the cost of computing the Hessian grows quadratically. Furthermore, the update rule in Newton’s method involves the inverse of the Hessian matrix, which can also become computationally expensive to calculate, especially for high-dimensional problems.
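As a rough sketch of what this looks like in code, the snippet below applies one Newton step to a hypothetical quadratic objective (the matrix `A`, the vector `b`, and the dimensionality are illustrative assumptions, not part of the lesson). The point is that the Hessian has $m \times m$ entries, and the step additionally requires solving a linear system with it:

```python
import numpy as np

m = 500                                # dimensionality of x
rng = np.random.default_rng(0)
M = rng.standard_normal((m, m))
A = M @ M.T + m * np.eye(m)            # symmetric positive definite matrix
b = rng.standard_normal(m)

# Hypothetical quadratic objective f(x) = 0.5 * x^T A x - b^T x, chosen
# only because its gradient and Hessian have simple closed forms.
def grad(x):
    return A @ x - b                   # m entries

def hessian(x):
    return A                           # m x m entries to form and store

x = np.zeros(m)
# One Newton step: x_t = x_{t-1} - H^{-1}(x_{t-1}) * grad f(x_{t-1}).
# Solving the linear system avoids forming H^{-1} explicitly, but a dense
# solve still costs roughly O(m^3) on top of the O(m^2) Hessian storage.
x = x - np.linalg.solve(hessian(x), grad(x))
```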

The key idea behind quasi-Newton methods is to approximate the inverse of the Hessian using only first-order derivative information, i.e., the gradient. This makes these methods more computationally efficient than Newton’s method, especially for problems with many variables. The update rule of quasi-Newton methods is the same as Newton’s method, simply replacing the Hessian inverse $H^{-1}(x_{t-1})$ with its approximation $B_{t-1}$ as follows:

$$x_t = x_{t-1} - B_{t-1}\, \nabla f(x_{t-1})$$
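A minimal sketch of the resulting step, assuming a generic approximation `B` of the inverse Hessian (the identity matrix below is a common but hypothetical initial guess, and the toy gradient is made up purely for illustration):

```python
import numpy as np

def quasi_newton_step(x, grad, B):
    # x_t = x_{t-1} - B_{t-1} * grad f(x_{t-1}): only a matrix-vector
    # product with the stored approximation B; no Hessian is ever formed.
    return x - B @ grad(x)

# With B_0 = I, the first step reduces to a plain gradient step; quasi-Newton
# methods then refine B from observed gradient changes (see BFGS below).
grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])   # gradient of x0^2 + 2*x1^2
x = quasi_newton_step(np.array([1.0, 1.0]), grad, np.eye(2))
print(x)
```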

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method

One of the most popular quasi-Newton methods is the BFGS method. The BFGS algorithm starts with an initial guess of the true Hessian inverse and then, in each subsequent iteration, refines that approximation into a better estimate using an update rule.

The update rule ensures that the updated approximation of the Hessian inverse satisfies the secant equation, which is a necessary condition for the updated approximation to be a good estimate of the true Hessian inverse. The secant equation is given as follows:

$$B_{t+1}\left(\nabla f(x_{t+1}) - \nabla f(x_t)\right) = x_{t+1} - x_t$$

where:

  • $B_{t+1}$ is the updated approximation of the inverse Hessian at iteration $t+1$.
  • $x_{t+1} - x_t$ is the step between the two most recent iterates.
  • $\nabla f(x_{t+1}) - \nabla f(x_t)$ is the corresponding change in the gradient.
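The sketch below shows one common way of writing the BFGS inverse-Hessian update in code, together with a small illustrative run; the quadratic objective, the fixed iteration count, and the function names are assumptions made only to keep the example self-contained:

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS update of the inverse-Hessian approximation B.

    Here s = x_{t+1} - x_t and y = grad f(x_{t+1}) - grad f(x_t); the
    returned matrix satisfies the secant equation B_new @ y == s.
    """
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ B @ V.T + rho * np.outer(s, s)

# Illustrative run on a small quadratic f(x) = 0.5 * x^T A x - b^T x.
A = np.array([[1.0, 0.2], [0.2, 0.8]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x = np.zeros(2)
B = np.eye(2)                          # initial guess for the inverse Hessian
for _ in range(20):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:      # stop once the gradient vanishes
        break
    x_new = x - B @ g                  # quasi-Newton step, B in place of H^{-1}
    s, y = x_new - x, grad(x_new) - grad(x)
    B = bfgs_update(B, s, y)
    x = x_new

print("secant equation holds:", np.allclose(B @ y, s))
print("final x:", x, " exact minimizer:", np.linalg.solve(A, b))
```

The update above is the standard BFGS formula written with the intermediate matrix $V = I - \rho\, s y^\top$; practical implementations typically add a line search to choose the step length rather than always taking the full step.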