Nesterov Accelerated Gradient (NAG)

Learn how Nesterov Accelerated Gradient (NAG) can be used to escape local optima in non-convex optimization.

What is NAG?

Consider a scenario where a company wants to determine the optimal production rate and selling price for one of its products to maximize profit, where the profit is modeled by a non-convex objective function with several local optima.

NAG is a variant of gradient descent with momentum that improves the convergence rate and stability of gradient descent. The main idea is to use a look-ahead term: the gradient is evaluated at an anticipated future point rather than at the current point. This way, the algorithm can anticipate the direction of the optimal solution and reduce overshooting and oscillation. The figure below illustrates the idea:

[Figure: Nesterov momentum vs. NAG update]

The NAG update at time step $t$ is given as follows:

$$
v_t = \beta v_{t-1} + \alpha \, \nabla_\theta J\!\left(\theta_{t-1} - \beta v_{t-1}\right)
$$

$$
\theta_t = \theta_{t-1} - v_t
$$

Here, $v_t$ is the velocity vector that accumulates the gradient over time, $\beta$ is the momentum coefficient that controls how much of the previous velocity is retained, $\alpha$ is the learning rate that scales the gradient, and $\nabla_\theta J$ is the gradient of the objective function $J$ with respect to the parameters $\theta$, evaluated at the look-ahead point $\theta_{t-1} - \beta v_{t-1}$.
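To make the update rule concrete, below is a minimal NumPy sketch of NAG applied to a toy one-dimensional non-convex objective. The objective `J`, its gradient `grad_J`, the starting point, and the hyperparameter values are illustrative assumptions standing in for the profit surface described above, not part of the original lesson.

```python
import numpy as np

# Hypothetical non-convex objective with several local optima,
# standing in for the company's (negated) profit surface.
def J(theta):
    return np.sin(3.0 * theta) + 0.1 * theta**2

def grad_J(theta):
    return 3.0 * np.cos(3.0 * theta) + 0.2 * theta

def nag(theta0, alpha=0.05, beta=0.9, steps=200):
    """Minimize J with Nesterov Accelerated Gradient."""
    theta = theta0
    v = 0.0
    for _ in range(steps):
        # Look-ahead point: where the accumulated momentum alone would take us.
        lookahead = theta - beta * v
        # Evaluate the gradient at the look-ahead point, not at theta.
        v = beta * v + alpha * grad_J(lookahead)
        theta = theta - v
    return theta

theta_star = nag(theta0=2.0)
print(f"theta = {theta_star:.4f}, J(theta) = {J(theta_star):.4f}")
```

The only difference from standard momentum in this sketch is the `lookahead` line: classical momentum would call `grad_J(theta)` directly, whereas NAG peeks ahead along the velocity before computing the gradient, which is what lets it correct course before overshooting.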