Root Mean Square Propagation (RMSProp)
Explore how RMSProp improves gradient descent by adapting learning rates with an exponential moving average of squared gradients. This lesson helps you understand its advantages over AdaGrad and how RMSProp accelerates convergence in non-convex optimization problems.
Root Mean Square Propagation (RMSProp) is an adaptive learning rate optimization algorithm designed to address the shortcomings of the gradient descent algorithm.
The limitation of AdaGrad is that its adaptive learning rate decreases monotonically with time and can become so small that training effectively stalls before convergence. RMSProp, on the other hand, adapts the learning rate without this monotonic decrease.
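To make this difference concrete, here is a minimal sketch contrasting the two accumulators on a toy gradient sequence (the variable names, the decay value 0.9, and the gradient values are illustrative assumptions, not taken from any particular library):

```python
# AdaGrad accumulates every squared gradient, so its effective learning
# rate eta / sqrt(accumulator) can only shrink over time.
adagrad_sum = 0.0

# RMSProp keeps an exponential moving average instead, so old gradients
# fade out and the effective learning rate can recover.
rmsprop_avg = 0.0
beta = 0.9  # decay rate (a commonly used value)

for g in [1.0, 0.5, 0.1, 0.1, 0.1]:          # toy gradient sequence
    adagrad_sum += g ** 2                                   # grows monotonically
    rmsprop_avg = beta * rmsprop_avg + (1 - beta) * g ** 2  # tracks recent gradients
    print(f"AdaGrad accumulator: {adagrad_sum:.3f}, RMSProp average: {rmsprop_avg:.3f}")
```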
How does RMSProp work?
The key idea behind RMSProp is to keep track of only the recent squared gradients rather than all of them, as in AdaGrad. This is achieved with an exponentially weighted moving average of the squared gradients. By using an exponential moving average, RMSProp avoids the issue of continually shrinking learning rates.
The update rule of RMSProp at a time step $t$ is:

$$v_t = \beta\, v_{t-1} + (1 - \beta)\, g_t^2$$

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon}\, g_t$$

where $g_t$ is the gradient of the loss with respect to the parameters $\theta_t$, $v_t$ is the exponentially weighted moving average of the squared gradients, $\beta$ is the decay rate (typically around 0.9), $\eta$ is the learning rate, and $\epsilon$ is a small constant added for numerical stability.
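The following is a minimal NumPy sketch of this update applied to a toy quadratic objective (the function name `rmsprop_update`, the objective, and all hyperparameter values are illustrative assumptions, not a reference implementation):

```python
import numpy as np

def rmsprop_update(theta, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp step: an EMA of squared gradients scales the learning rate."""
    v = beta * v + (1 - beta) * grad ** 2             # v_t = beta * v_{t-1} + (1 - beta) * g_t^2
    theta = theta - lr * grad / (np.sqrt(v) + eps)    # theta_{t+1} = theta_t - eta * g_t / (sqrt(v_t) + eps)
    return theta, v

# Toy usage: minimize f(theta) = theta_0^2 + 10 * theta_1^2
theta = np.array([2.0, 2.0])
v = np.zeros_like(theta)
for _ in range(500):
    grad = np.array([2 * theta[0], 20 * theta[1]])    # gradient of f
    theta, v = rmsprop_update(theta, grad, v, lr=0.05)

print(theta)  # both coordinates end up close to 0
```

Note that because each parameter has its own moving average $v_t$, the steeper coordinate (scaled by 10) and the flatter one are rescaled independently, which is what lets RMSProp make steady progress on poorly conditioned or non-convex objectives.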