What is the Newton-Conjugate Gradient method?

The Newton-Conjugate Gradient (NCG) method is an iterative optimization approach for finding the minimum of a scalar-valued objective function. A scalar-valued objective function is a mathematical function that takes one or more input variables and returns a single real number as its output. In other words, it maps a vector of inputs to a scalar value; for example, $f(\theta_1, \theta_2) = \theta_1^2 + \theta_2^2$ maps the vector $(\theta_1, \theta_2)$ to a single number.

The NCG method combines the strengths of Newton's method and the conjugate gradient method, providing efficient convergence for a broad range of optimization problems.

Let’s expand on the mathematical formulation of the NCG method.

Mathematical formulation

Consider an objective function $f(\theta)$ to be minimized, where $\theta$ is the parameter vector. The NCG method iteratively updates the parameter vector as follows:

$\theta_{k+1} = \theta_k - \alpha_k \, \nabla^2 f(\theta_k)^{-1} \nabla f(\theta_k)$

Where:

  • $\theta_k$ is the current parameter vector.

  • $\alpha_k$ is the step size.

  • $\nabla f(\theta_k)$ is the gradient of the objective function at $\theta_k$.

  • $\nabla^2 f(\theta_k)$ is the Hessian matrix of the objective function at $\theta_k$.

  • The product $\nabla^2 f(\theta_k)^{-1} \nabla f(\theta_k)$ is called the Newton step. In practice, NCG never inverts the Hessian explicitly; it solves the linear system $\nabla^2 f(\theta_k) \, p_k = -\nabla f(\theta_k)$ approximately with the conjugate gradient method, as sketched in the code below.
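
To make this concrete, here is a minimal sketch of a single NCG iteration in Python. The helper names conjugate_gradient and ncg_step are illustrative, not from any library, and the negative-curvature handling is deliberately simplified; the point is that the Newton step is obtained from matrix-vector products alone, never from an explicit inverse.

import numpy as np

# Approximately solve H p = b with the conjugate gradient method,
# using only matrix-vector products with H
def conjugate_gradient(H, b, tol=1e-8, max_iter=50):
    p = np.zeros_like(b, dtype=float)
    r = b - H @ p               # residual
    d = r.copy()                # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Hd = H @ d
        curvature = d @ Hd
        if curvature <= 0:
            # Non-positive curvature: stop here (a simplified version
            # of the truncation rules used in practice)
            break
        alpha = rs_old / curvature
        p = p + alpha * d
        r = r - alpha * Hd
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return p

# One NCG update: approximate the Newton step with CG, then move
def ncg_step(theta, grad, hess, step_size=1.0):
    p = conjugate_gradient(hess(theta), -grad(theta))
    return theta + step_size * p

# Example: one step on f(theta) = theta_1^2 + 3*theta_2^2 from (1, 1)
theta = ncg_step(np.array([1.0, 1.0]),
                 grad=lambda t: np.array([2 * t[0], 6 * t[1]]),
                 hess=lambda t: np.array([[2.0, 0.0], [0.0, 6.0]]))
print(theta)  # lands (essentially) on the minimizer (0, 0)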

Let’s understand the NCG method with the help of an example:

Example

Find the minimum value of the following objective function using the NCG method:

$f(\theta_1, \theta_2) = \theta_1^2 + \theta_2^2 - 2\theta_1\theta_2 + 5$

Surface plot of the objective function

From the surface plot above, we can estimate where the minimum value might be located. Next, we will determine the minimum precisely using the NCG method in the code below.

The following steps are involved in the optimization process:

  1. Calculate the gradient: $\nabla f(\theta_1, \theta_2) = [2\theta_1 - 2\theta_2,\ 2\theta_2 - 2\theta_1]$

  2. Calculate the Hessian matrix: $\nabla^2 f(\theta_1, \theta_2) = \begin{bmatrix} 2 & -2 \\ -2 & 2 \end{bmatrix}$

  3. Compute the Newton step, using conjugate gradient iterations to solve the Newton system.

  4. Update the parameters using the NCG update equation. (A worked first iteration is shown below.)
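
To see steps 1–4 in action, take a hypothetical starting point $\theta_0 = (2, -3)$ (the choice is ours; any point off the minimum works). The gradient there is

$\nabla f(2, -3) = (2 \cdot 2 - 2 \cdot (-3),\ 2 \cdot (-3) - 2 \cdot 2) = (10, -10)$

and the Newton system $\nabla^2 f \, p = -\nabla f$ reduces to $p_1 - p_2 = -5$. The conjugate gradient iteration searches along the residual direction $(-1, 1)$ and returns $p = (-2.5, 2.5)$, so with $\alpha_0 = 1$,

$\theta_1 = \theta_0 + p = (-0.5, -0.5)$

where the gradient vanishes and $f(\theta_1) = 5$. Because the objective is quadratic, a single full Newton step reaches a minimizer.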

Luckily, we can use the minimize function from Python's SciPy library to optimize an objective function using the Newton-Conjugate Gradient method. Let's optimize our objective function in the following code:

Code

import numpy as np
from scipy.optimize import minimize

# Objective function to be minimized
def Function(theta):
    return theta[0]**2 + theta[1]**2 - 2 * theta[0] * theta[1] + 5

# Calculate gradient of the function
def Gradient(theta):
    return np.array([2*theta[0] - 2*theta[1], 2*theta[1] - 2*theta[0]])

# Calculate Hessian matrix
def Hessian(theta):
    return np.array([[2, -2], [-2, 2]])

# Apply the Newton-Conjugate Gradient method
result = minimize(Function, [1, 1], method='Newton-CG', jac=Gradient, hess=Hessian)

# Print the optimal parameters and minimum value
print("Optimal parameters:", result.x)
print("Minimum value:", result.fun)

Code explanation

  • Line 1–2: We import numpy and the minimize function from SciPy's scipy.optimize module.

  • Line 5–6: We define our objective function.

  • Line 9–10: We define a function that computes the gradient of the objective function.

  • Line 13–14: We define a function that returns the Hessian matrix of the objective function.

  • Line 17: We optimize the objective function via the Newton-Conjugate Gradient method using the minimize function provided by the scipy library.

  • Line 20–21: We print the optimal parameters and the minimum value of the function.

Limitations of the NCG method

While the NCG method is a powerful optimization method, it has the following limitations:

  • Computational cost: The NCG method requires second-derivative information (the Hessian matrix, or Hessian-vector products) in each iteration, along with an inner conjugate gradient solve. This can be computationally expensive for large-scale optimization problems.

  • Storage requirements: A dense Hessian for $n$ parameters is an $n \times n$ matrix, so the memory needed to store it grows with the square of the number of parameters. Storing and manipulating the matrix becomes memory-intensive for problems with a large number of parameters. (One mitigation, using Hessian-vector products, is sketched after this list.)

  • Non-convexity: The NCG method doesn't work well on non-convex objective functions: it might converge to a local minimum instead of the global minimum.

  • Line search strategy: The NCG method typically uses a line search to determine the step size in each iteration. The choice of line search strategy can affect the convergence and performance of the method.
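
Regarding the storage limitation, SciPy's Newton-CG implementation accepts a Hessian-vector product via the hessp parameter in place of a full Hessian, so the dense matrix never has to be formed. Here is a minimal sketch of that variant for our example (the product is written out by hand; for a 2×2 problem this saves nothing, but the same pattern scales to large parameter counts):

import numpy as np
from scipy.optimize import minimize

# Objective function to be minimized
def Function(theta):
    return theta[0]**2 + theta[1]**2 - 2 * theta[0] * theta[1] + 5

# Gradient of the objective function
def Gradient(theta):
    return np.array([2*theta[0] - 2*theta[1], 2*theta[1] - 2*theta[0]])

# Hessian-vector product: returns H @ p without forming H explicitly
def HessianProduct(theta, p):
    return np.array([2*p[0] - 2*p[1], -2*p[0] + 2*p[1]])

result = minimize(Function, [1, 1], method='Newton-CG',
                  jac=Gradient, hessp=HessianProduct)
print("Optimal parameters:", result.x)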
