Nesterov Accelerated Gradient (NAG)
Learn how Nesterov Accelerated Gradient (NAG) optimizes non-convex problems by anticipating future gradients to improve convergence speed and stability. Discover how NAG differs from standard momentum methods and how it can avoid local optima, enhancing optimization in machine learning tasks using practical examples like the Rosenbrock function.
What is NAG?
Consider a scenario where a company wants to determine the optimal production rate and the optimal selling price for one of its products to maximize profit. The profit is a non-convex function of these variables with several local optima.
NAG is a variant of gradient descent with momentum that improves the convergence rate and the stability of gradient descent. The main idea is to use a look-ahead term to calculate the gradient at a future point rather than the current point. This way, the algorithm can anticipate the direction of the optimal solution and avoid overshooting or oscillating. The figure below illustrates the idea:
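To make the look-ahead idea concrete, the sketch below contrasts a single update step of classical momentum with NAG on the simple objective f(x) = x², whose gradient is 2x. The learning rate, momentum coefficient, and starting values are illustrative choices, not values from the lesson:

```python
# One update step: classical momentum vs. NAG on f(x) = x**2.
# The only difference is WHERE the gradient is evaluated.

def grad(x):
    return 2.0 * x  # derivative of f(x) = x**2

lr, gamma = 0.1, 0.9   # learning rate and momentum coefficient (assumed)
x, v = 5.0, 1.0        # current position and current velocity (assumed)

# Classical momentum: gradient at the current point x
v_cm = gamma * v + lr * grad(x)
x_cm = x - v_cm

# NAG: gradient at the look-ahead point x - gamma * v
v_nag = gamma * v + lr * grad(x - gamma * v)
x_nag = x - v_nag

print(x_cm, x_nag)
```

Because NAG evaluates the gradient at the anticipated next position rather than the current one, its correction kicks in one step earlier, which is what dampens overshooting and oscillation.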
The NAG update at a time step $t$ is:

$v_t = \gamma v_{t-1} + \eta \nabla f(\theta_{t-1} - \gamma v_{t-1})$

$\theta_t = \theta_{t-1} - v_t$

Here, $\theta_t$ is the parameter vector at step $t$, $v_t$ is the velocity (momentum) term, $\gamma$ is the momentum coefficient, $\eta$ is the learning rate, and $\nabla f$ is the gradient of the objective function, evaluated at the look-ahead point $\theta_{t-1} - \gamma v_{t-1}$.
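A minimal sketch of NAG applied to the Rosenbrock function f(x, y) = (1 − x)² + 100(y − x²)², whose global minimum is at (1, 1). The learning rate, momentum coefficient, starting point, and iteration count below are assumed for illustration:

```python
# NAG on the Rosenbrock function.
# Hyperparameters are illustrative, not prescribed by the lesson.

def rosenbrock(x, y):
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def rosenbrock_grad(x, y):
    dx = -2 * (1 - x) - 400 * x * (y - x ** 2)
    dy = 200 * (y - x ** 2)
    return dx, dy

lr, gamma = 1e-4, 0.9   # learning rate and momentum coefficient (assumed)
x, y = 0.0, 0.0         # starting point (assumed)
vx, vy = 0.0, 0.0       # velocity, initialized to zero

for _ in range(20000):
    # Gradient at the look-ahead point (x - gamma*vx, y - gamma*vy)
    gx, gy = rosenbrock_grad(x - gamma * vx, y - gamma * vy)
    vx = gamma * vx + lr * gx
    vy = gamma * vy + lr * gy
    x, y = x - vx, y - vy

print((round(x, 3), round(y, 3)), rosenbrock(x, y))
```

The narrow curved valley of the Rosenbrock function is exactly the setting where plain gradient descent oscillates; the look-ahead gradient lets NAG slow down before overshooting the valley walls.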