Nesterov Momentum
Explore how the Nesterov momentum method improves gradient descent for non-convex problems by maintaining a velocity vector that helps escape local optima. Learn to implement this technique using the Rosenbrock function and visualize its convergence to the global optimum.
Need for momentum
As shown in the figure below, imagine a ball falling down a valley. If its momentum (mass × velocity) is large enough, the ball can roll through shallow dips along the way instead of settling in them, eventually coming to rest at the lowest point of the valley.
When applied to non-convex optimization, gradient descent is not guaranteed to converge to the globally optimal solution. It often gets stuck at a local optimum because the gradient vanishes there, so further updates make no progress.
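A minimal sketch makes this failure mode concrete. The tilted double-well function, starting point, and learning rate below are illustrative choices, not part of this lesson:

```python
def f(x):
    # Tilted double-well: a local minimum near x ≈ 0.96 and a
    # deeper global minimum near x ≈ -1.03 (illustrative example).
    return x**4 - 2 * x**2 + 0.3 * x

def grad(x):
    return 4 * x**3 - 4 * x + 0.3

x, lr = 2.0, 0.01          # start to the right of the local minimum
for _ in range(1000):
    x -= lr * grad(x)      # vanilla gradient descent update

print(f"x = {x:.3f}, f(x) = {f(x):.3f}")
# Prints x ≈ 0.96: the gradient vanishes at the local minimum,
# so the iterate never reaches the global minimum near x ≈ -1.03.
```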
Similar to the “ball falling down a valley” situation above, we also need a sense of momentum in non-convex optimization to escape a local optimum. The Nesterov momentum is a popular technique that mimics this behavior by maintaining a velocity vector that is an exponential moving average of negative gradients.
How does the Nesterov momentum work?
At every step, the method updates the velocity vector and then moves the parameters in the direction of that velocity. In simple terms, the velocity vector is an average direction that can still drive updates even when the current gradient is zero. The distinctive feature of the Nesterov variant is that the gradient is evaluated at a "look-ahead" position, i.e., where the current velocity would carry the parameters.
The Nesterov momentum update at a time step $t$ is given as:

$$v_{t+1} = \mu v_t - \eta \, \nabla f(x_t + \mu v_t)$$

$$x_{t+1} = x_t + v_{t+1}$$

where $\mu \in [0, 1)$ is the momentum coefficient, $\eta > 0$ is the learning rate, $v_t$ is the velocity vector, and $\nabla f(x_t + \mu v_t)$ is the gradient evaluated at the look-ahead point $x_t + \mu v_t$.
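As a concrete illustration, here is a minimal NumPy sketch of this update applied to the Rosenbrock function mentioned above. The starting point and hyperparameters ($\eta$, $\mu$, iteration count) are illustrative choices, not prescribed by the lesson:

```python
import numpy as np

def rosenbrock(p, a=1.0, b=100.0):
    # f(x, y) = (a - x)^2 + b * (y - x^2)^2, global minimum at (a, a^2) = (1, 1)
    x, y = p
    return (a - x)**2 + b * (y - x**2)**2

def rosenbrock_grad(p, a=1.0, b=100.0):
    x, y = p
    return np.array([
        -2.0 * (a - x) - 4.0 * b * x * (y - x**2),
        2.0 * b * (y - x**2),
    ])

def nesterov_momentum(grad_fn, x0, lr=2e-4, mu=0.9, steps=20_000):
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)                 # velocity vector, v_0 = 0
    for _ in range(steps):
        g = grad_fn(x + mu * v)          # look-ahead gradient at x_t + mu * v_t
        v = mu * v - lr * g              # v_{t+1} = mu * v_t - eta * gradient
        x = x + v                        # x_{t+1} = x_t + v_{t+1}
    return x

x_star = nesterov_momentum(rosenbrock_grad, x0=[-1.2, 1.0])
print(x_star, rosenbrock(x_star))        # should approach the optimum (1.0, 1.0)
```

Note that, compared with standard (heavy-ball) momentum, the only change is where the gradient is computed: evaluating it at the look-ahead point $x_t + \mu v_t$ lets the method correct the velocity before overshooting.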