Steepest Descent

Learn how to find the optimal step size in gradient descent using the method of steepest descent.

The method of steepest descent

So far in gradient descent, we have used a fixed step size $\alpha$ to control how large a step we take at each point. Choosing a large step size can make the algorithm unstable, whereas choosing a small step size requires more iterations to reach convergence.
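To see this trade-off concretely, here is a minimal Python sketch (not from the original lesson) of fixed-step gradient descent on the assumed toy objective $f(x) = x^2$; the starting point and step sizes are placeholder values chosen for illustration:

```python
# Fixed-step gradient descent on f(x) = x^2, whose gradient is 2x.
# The objective, starting point, and step sizes are illustrative assumptions.
def gradient_descent(alpha, x0=1.0, iterations=20):
    x = x0
    for _ in range(iterations):
        x = x - alpha * 2 * x  # x_t = x_{t-1} - alpha * f'(x_{t-1})
    return x

print(gradient_descent(alpha=1.1))   # too large: iterates grow in magnitude, the run diverges
print(gradient_descent(alpha=0.01))  # too small: after 20 steps, still ~0.67, far from the minimum at 0
print(gradient_descent(alpha=0.5))   # well chosen: reaches the minimum at 0
```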

Steepest descent works by finding the optimal step size for gradient descent at every iteration. For a convex function $f(x)$, the steepest descent update at time $t>0$ can be written as follows:

$$x_t = x_{t-1} - \alpha_t\,(\nabla_x f(x_{t-1}))^T$$

The optimal step size $\alpha_t$ at time $t$ can then be found by minimizing the following one-dimensional objective:

$$\alpha_t = \underset{\alpha \ge 0}{\arg\min}\; \phi_t(\alpha)$$

where $\phi_t(\alpha) = f\left(x_{t-1} - \alpha\,(\nabla_x f(x_{t-1}))^T\right)$ is the value of $f$ after taking a step of size $\alpha$ along the negative gradient from $x_{t-1}$.
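As a concrete sketch (not from the original lesson), the Python example below runs steepest descent on an assumed convex quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$, for which the line search $\min_\alpha \phi_t(\alpha)$ has the closed-form solution $\alpha_t = (g^T g)/(g^T A g)$ with $g = \nabla_x f(x_{t-1}) = A x_{t-1} - b$; the matrix $A$ and vector $b$ are placeholder values:

```python
import numpy as np

# Steepest descent on an assumed convex quadratic f(x) = 0.5 x^T A x - b^T x,
# whose gradient is g = A x - b. For this f, the line search
# min_alpha phi_t(alpha) has the closed form alpha_t = (g^T g) / (g^T A g).

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])         # symmetric positive definite, so f is convex
b = np.array([1.0, 1.0])

x = np.zeros(2)                    # x_0
for t in range(50):
    g = A @ x - b                  # gradient at x_{t-1}
    if np.linalg.norm(g) < 1e-10:  # stop once the gradient vanishes
        break
    alpha_t = (g @ g) / (g @ A @ g)  # optimal step size for this direction
    x = x - alpha_t * g              # steepest descent update

print(x)                           # approaches the minimizer of f
print(np.linalg.solve(A, b))       # exact minimizer (solves A x = b) for comparison
```

For a general convex $f$, the update is the same, but $\alpha_t$ would typically be found with a one-dimensional numerical minimizer rather than this quadratic-specific closed form.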