Know what to expect from this chapter and why optimization algorithms are needed.

We know already how to solve optimization problems using derivatives and gradients. We can solve them no matter the number of variables. Then why isn’t the course over? Why do we need optimization algorithms?

Why optimization algorithms?

First of all, we’ve already seen an optimization algorithm. Using derivatives and gradients the way we did is an algorithm itself. We followed a well-structured, clear, and finite list of steps to achieve the goal of minimizing or maximizing a function.

So technically, these won’t be the first algorithms we’re going to know. But in essence, these are quite different. The analytical method (the method we already know with derivatives and gradients) gives us a list of candidate points by solving some equations. We just need to make some tests on those points to determine which one is a minimum and which one is a maximum.

The algorithms we’re going to see in this chapter implement a different approach. They’re more trial and error algorithms, where we evaluate the function at some point and then move toward some direction in the hope that we’re going toward a minimum. Of course, we’ll use some clever techniques to find a good direction. This is what we call an iterative process. On every iteration, we make a decision that can be based on previous iterations.

Reading both approaches, the analytical and the iterative ones, the former still seems more robust and reliable. But let’s dig deeper and find out why iterative algorithms are necessary.


We’ve mentioned this before. Some functions don’t have defined derivatives at some points. We saw the function x|x|; we know it has a minimum at x=0x = 0, but it’s impossible to reach that minimum with a computer program using the analytical method.

We can think that these are some strange functions that never arise in practice. But they do arise, and they can be even stranger than x|x|. So, the first reason to try other approaches and algorithms is the existence of functions that can’t be optimized with analytical approaches. It’s as simple as that. But even when we can use analytical methods, there are some advantages to using an iterative one.

Execution time

In analytical methods, we have steps like:

  • Solve the equation f=0f' = 0 or the system of equations f=0\nabla f = 0.
  • Calculate the determinant of the Hessian.

We solved examples with one, two, or three variables. But, for example, in machine learning, we usually have problems with hundreds, thousands, or even millions of variables! Steps like solving such a huge system of equations or calculating the determinant of that enormous matrix are too time-consuming even for a computer.

Iterative methods consume fewer resources to find a minimum of a function. We don’t have to calculate determinants or solve huge systems of equations.

These are the main reasons we’re going to study iterative algorithms: the existence of nondifferentiable functions and the speed and lower resource consumption of these new algorithms. But there’s yet another important reason we think is worth mentioning: Today, these algorithms are incredibly effective, fast, and robust. The reliability gap between analytical and iterative has become so thin that, in practice, it’s almost nonexistent.

Iterative algorithms unleash a great new power—the power of solving problems with millions of variables. They bring optimization to our real world. Let’s discover what they are and how to use them!

Iterative algorithms

We’re going to focus on three iterative algorithms:

  • Binary search
  • Gradient descent
  • Newton’s method

Some of these still use derivatives to make decisions (so they still struggle with nondifferentiable functions). The general approach is always the same: find a new point that’s closer to the minimum than all the other points we’ve found so far.

Remember: We can talk about the minimum and forget about the maximum since maxf=minf\max f = \min-f

We’ll see that the two main problems in these algorithms are:

  • What’s the first point to try?
  • How do we determine the next point from the previous one?

We’ll need to deal with a new concept: convergence.

An algorithm converges to a solution when it gets closer to that solution in each subsequent iteration
An algorithm converges to a solution when it gets closer to that solution in each subsequent iteration

An iterative algorithm converges if, in each new iteration, it’s nearer to the desired solution. An algorithm that doesn’t converge is not worthy of our time and effort. Convergence is the most essential property that these algorithms can have.

But the fact that an algorithm converges to the solution isn’t all that we look for. We need that convergence to be fast. The velocity of the convergence can be seen as the number of steps needed by the algorithms to find the optimal solution. This is also important as a slow convergence could make the solution impractical.

Of course, we made the previous definitions of convergences and velocity of convergence informally (so don’t show them to a mathematician!). They have rigorous and formal definitions that allow mathematicians to analyze the behavior of each algorithm in depth. But we’re good with our funnier and easier definitions. Don’t worry though, our approach won’t compromise thorough analysis. Thinking in an intuitive way will allow us to develop a deeper understanding of all these algorithms and use them more effectively.

Now it’s time to begin this new journey!