Popular Optimization Algorithms

Discover the most frequently used alternatives to vanilla gradient descent and the intuition behind them.

Concerns with SGD

This basic version of SGD comes with some limitations and problems that can negatively affect training.

  1. If the loss function changes quickly in one direction and slowly in another, the updates oscillate along the steep direction while making very slow progress along the shallow one, which slows down training considerably (see the sketch after this list).

  2. If the loss function has a local minimum or a saddle point, SGD is likely to get stuck there, unable to “jump out” and continue toward a better minimum. This happens because the gradient becomes zero at such points, so there is no update to the weights whatsoever.

A saddle point is a point on the surface of the graph of a function where the slopes (derivatives) are all zero but which is not a local extremum (neither a local maximum nor a local minimum) of the function.
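
To make these two failure modes concrete, here is a minimal NumPy sketch (not part of the lesson; the loss surfaces, learning rate, and function names are illustrative) that applies plain gradient-descent updates to two toy surfaces: an ill-conditioned quadratic, where the steep direction oscillates while the shallow one barely moves, and a saddle surface, where the gradient at the starting point is exactly zero, so the weights never change.

```python
import numpy as np

def gradient_descent(grad_fn, start, lr=0.15, steps=6):
    """Run plain gradient descent and return the visited points."""
    w = np.array(start, dtype=float)
    path = [w.copy()]
    for _ in range(steps):
        w = w - lr * grad_fn(w)  # basic update: w <- w - lr * gradient
        path.append(w.copy())
    return np.array(path)

# Failure mode 1: ill-conditioned loss 0.05*x^2 + 5*y^2.
# Its gradient is [0.1*x, 10*y]: y overshoots and flips sign every step
# (oscillation), while x shrinks by only 1.5% per step (slow progress).
ravine_grad = lambda w: np.array([0.1 * w[0], 10.0 * w[1]])
print(gradient_descent(ravine_grad, start=[5.0, 1.0]))

# Failure mode 2: saddle surface x^2 - y^2.
# The gradient at (0, 0) is exactly zero, so the weights never move.
saddle_grad = lambda w: np.array([2.0 * w[0], -2.0 * w[1]])
print(gradient_descent(saddle_grad, start=[0.0, 0.0]))
```

Running the script shows the y-coordinate flipping sign on every step of the first surface while x creeps toward zero, and the point staying fixed at the origin on the second surface.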
