Gradient Descent

Learn about the gradient descent algorithm and the mathematics behind it.

We learned about the binary search algorithm. One of its benefits is that it can optimize a function even if we don't know its formula. On the other hand, the method requires the function to be monotonic, and any constraint must split the x-axis into two separable, contiguous regions. We can't rely on binary search to solve all optimization problems because these conditions are too restrictive, and it's not clear how to extend the method to multivariate problems.

In this lesson, we’re going to learn another method that requires different conditions and can be naturally extended to solve problems with many variables. This method is called the gradient descent algorithm. We’ll learn what gradient descent is, its advantages and disadvantages, when to use it, and how to implement it using Python.

Gradient descent is widely used in many fields, such as robotics and video game development. But undoubtedly, the main application of gradient descent (and its variants) is in machine learning. Many algorithms, like linear regression and neural networks, can learn because a gradient descent variant carries out the process of automatic learning (which is just solving an optimization problem).

Let’s master gradient descent then!

Gradient descent requirements

As with binary search, we can apply gradient descent only under certain conditions. If some of these conditions aren’t met, then the algorithm isn’t guaranteed to converge. That means that we might not find the minimum using gradient descent.

First of all, the target function should be differentiable and convex. A differentiable function is a function whose first-order derivatives are all defined over its whole domain. These tend to be smooth functions when we draw them, without corners or discontinuities (points where the function isn't defined or jumps). Nondifferentiable functions, on the other hand, always have some kind of discontinuity, corner, or cusp.

Differentiable functions are smooth and continuous

In the previous figure, the function in the middle has a corner at x = 0, and the function on the right is not defined at that same point. That's why those functions are not differentiable at x = 0.
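
To make the idea of a corner concrete, here's a minimal Python sketch (not part of the lesson's implementation) that approximates the left and right derivatives numerically. For a smooth function like x², both one-sided estimates agree at x = 0; for |x|, they disagree, which signals the corner.

```python
def one_sided_derivatives(f, x, h=1e-6):
    """Approximate the left and right derivatives of f at x."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right

# Smooth function: both estimates are close to 0 at x = 0
print(one_sided_derivatives(lambda x: x**2, 0.0))

# |x| has a corner at x = 0: the estimates disagree (about -1 and 1)
print(one_sided_derivatives(abs, 0.0))
```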

A convex function is a function that always curves upward. If we have a convex function of one variable and we draw the curve that describes it, then the segment connecting any two points on that curve always lies on or above the curve and never crosses below it.

Concave and convex functions

There’s a mathematical way of defining convexity. We’re going to include it here for the sake of formality, but don’t worry if it seems too complicated. A function f is convex if it has the following property:

f(\lambda x_1 + (1 - \lambda) x_2) \leq \lambda f(x_1) + (1 - \lambda) f(x_2)

for every pair of points x_1, x_2 in the function's domain and every \lambda \in [0, 1].
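
As an illustrative sanity check (not a proof), the following Python sketch tests this inequality at randomly sampled points; the helper name and the test interval are our own choices. A convex function such as x² never violates the inequality, while a non-convex one such as x³ − x does for some choices of points.

```python
import random

def looks_convex(f, a, b, trials=10_000):
    """Numerically test the convexity inequality at random points in [a, b].

    This only checks sampled points; it is not a proof of convexity.
    """
    for _ in range(trials):
        x1, x2 = random.uniform(a, b), random.uniform(a, b)
        lam = random.uniform(0.0, 1.0)
        lhs = f(lam * x1 + (1 - lam) * x2)
        rhs = lam * f(x1) + (1 - lam) * f(x2)
        if lhs > rhs + 1e-9:  # small tolerance for floating-point error
            return False
    return True

print(looks_convex(lambda x: x**2, -10, 10))      # True: x**2 is convex
print(looks_convex(lambda x: x**3 - x, -10, 10))  # False: not convex
```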