
Steepest Descent

Explore the steepest descent method to optimize step sizes during gradient descent. Understand how it improves convergence speed for convex functions and implement it using Python with practical examples.

The method of steepest descent

So far, in gradient descent, we have used the step size $\alpha$ to control how far we move at a particular point. Choosing a large step size makes the algorithm unstable, whereas choosing a small step size requires more iterations to reach convergence.
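For reference, here is a minimal sketch of plain gradient descent with a fixed step size; the quadratic test function, the value $\alpha = 0.05$, and the iteration count are illustrative assumptions, not part of the lesson:

```python
import numpy as np

def gradient_descent(grad, x0, alpha=0.05, n_iters=100):
    """Vanilla gradient descent: take a fixed-length step against the gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - alpha * grad(x)  # move opposite to the gradient direction
    return x

# Illustrative convex function: f(x) = x[0]**2 + 10*x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(gradient_descent(grad_f, x0=[3.0, 2.0]))  # approaches the minimizer [0, 0]
```

With this fixed $\alpha$, the poorly scaled coordinate converges slowly, which is exactly the trade-off described above.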

Steepest descent works by finding the optimal step size for each gradient descent step. For a convex function $f(x)$, the steepest descent update at a time $t > 0$ can be written as follows:

$$x_t = x_{t-1} - \alpha_t \left(\nabla_x f(x_{t-1})\right)^T$$

The optimal step size $\alpha_t$ at the time $t$ can then be found by optimizing the following objective:

$$\alpha_t = \operatorname*{arg\,min}_{\alpha \ge 0} \; \phi_t(\alpha)$$

where $\phi_t(\alpha) = f\left(x_{t-1} - \alpha \left(\nabla_x f(x_{t-1})\right)^T\right)$ is the value of $f$ after taking a step of size $\alpha$ in the negative gradient direction.
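A minimal sketch of this procedure, assuming SciPy's `minimize_scalar` is used for the one-dimensional search over $\alpha$; the test function, the search bounds, and the iteration count are illustrative choices, not part of the lesson:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, n_iters=50, alpha_max=1.0):
    """Gradient descent where each step size alpha_t minimizes
    phi_t(alpha) = f(x_{t-1} - alpha * grad f(x_{t-1}))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad(x)
        phi = lambda a: f(x - a * g)  # one-dimensional objective in alpha
        res = minimize_scalar(phi, bounds=(0.0, alpha_max), method="bounded")
        x = x - res.x * g             # step with the optimal alpha_t
    return x

# Illustrative convex function: f(x) = x[0]**2 + 10*x[1]**2
f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(steepest_descent(f, grad_f, x0=[3.0, 2.0]))  # approaches [0, 0]
```

Because each step uses the best $\alpha$ along the current gradient direction, the iterates typically reach a given accuracy in fewer steps than with a fixed step size, at the cost of solving a one-dimensional minimization per iteration.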