# Conjugate Gradient Descent

Learn how the conjugate gradient descent algorithm can be used to solve problems of the form Ax = b.

## We'll cover the following

**Conjugate gradient descent** is an optimization algorithm primarily used for solving systems of linear equations.

## Why need conjugate gradient descent?

Conjugate gradient descent can be used in several real-world scenarios, such as facial recognition, image reconstruction, etc.

One of the limitations of the gradient descent algorithm is that the optimization route looks zigzaggy. In initial iterations, the cost function decreases rapidly, whereas when approaching the minimum, the rate at which the function converges becomes very slow because the gradient starts vanishing. Furthermore, the steps we take at the beginning are usually in the same direction.

Conjugate gradient circumvents this problem by choosing the best direction at each step and avoiding taking multiple steps that eventually lead to the same direction. Consider the following example to understand better:

When we’re using the gradient descent method, imagine we’re in a mountainous terrain, and our goal is to find the lowest valley. We start at a random point, and at each step, we move in the direction that goes downhill the steepest. However, this approach can lead to a zigzag pattern, especially when the terrain is shaped like a long, narrow valley. We might find ourselves going back and forth across the valley, descending slowly toward the bottom.

Now, the conjugate gradient method is like having a map of the terrain. Instead of just blindly going downhill, we plan our route to take the most direct path to the valley. We do this by choosing a new direction at each step that is conjugate (or orthogonal) to the previous directions. This means that we’re not retracing our steps, and we’re making the most efficient progress toward the goal.

The figure below illustrates the idea above:

Get hands-on with 1200+ tech skills courses.