What is the Newton-Conjugate Gradient method?

The Newton-Conjugate Gradient (NCG) method is an iterative optimization approach for finding the minimum of a scalar-valued objective function. A scalar-valued objective function is a mathematical function that takes one or more input variables and returns a single real number as its output. In other words, it maps a vector of inputs to a scalar value; for example, $f(\theta_1, \theta_2) = \theta_1^2 + \theta_2^2$ maps the vector $(\theta_1, \theta_2)$ to a single number.

The NCG method combines the strengths of Newton's method and the conjugate gradient method, providing efficient convergence for a broad range of optimization problems.

Let’s expand on the mathematical formulation of the NCG method.

Mathematical formulation

Consider an objective function $f(\theta)$ to be minimized, where $\theta$ is the parameter vector. The NCG method iteratively updates the parameter vector as follows:

$\theta_{k+1} = \theta_k - \alpha_k \, \nabla^2 f(\theta_k)^{-1} \nabla f(\theta_k)$

Where:

  • $\theta_k$ is the current parameter vector.

  • $\alpha_k$ is the step size.

  • $\nabla f(\theta_k)$ is the gradient of the objective function at $\theta_k$.

  • $\nabla^2 f(\theta_k)$ is the Hessian matrix of the objective function at $\theta_k$.

  • The product $\nabla^2 f(\theta_k)^{-1} \nabla f(\theta_k)$ is called the Newton step. In practice, NCG never inverts the Hessian explicitly; it solves the linear system $\nabla^2 f(\theta_k) \, p_k = -\nabla f(\theta_k)$ approximately with the conjugate gradient method, as sketched in the code below.
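
To make this concrete, here is a minimal sketch of a single NCG iteration in Python. The helper names conjugate_gradient and ncg_step are illustrative, not from any library, and the negative-curvature handling is deliberately simplified; the point is that the Newton step is obtained from matrix-vector products alone, never from an explicit inverse.

import numpy as np

# Approximately solve H p = b with the conjugate gradient method,
# using only matrix-vector products with H
def conjugate_gradient(H, b, tol=1e-8, max_iter=50):
    p = np.zeros_like(b, dtype=float)
    r = b - H @ p               # residual
    d = r.copy()                # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Hd = H @ d
        curvature = d @ Hd
        if curvature <= 0:
            # Non-positive curvature: stop here (a simplified version
            # of the truncation rules used in practice)
            break
        alpha = rs_old / curvature
        p = p + alpha * d
        r = r - alpha * Hd
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return p

# One NCG update: approximate the Newton step with CG, then move
def ncg_step(theta, grad, hess, step_size=1.0):
    p = conjugate_gradient(hess(theta), -grad(theta))
    return theta + step_size * p

# Example: one step on f(theta) = theta_1^2 + 3*theta_2^2 from (1, 1)
theta = ncg_step(np.array([1.0, 1.0]),
                 grad=lambda t: np.array([2 * t[0], 6 * t[1]]),
                 hess=lambda t: np.array([[2.0, 0.0], [0.0, 6.0]]))
print(theta)  # lands (essentially) on the minimizer (0, 0)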

Let’s understand the NCG method with the help of an example:

Example

Find the minimum value of the following objective function using the NCG method:

$f(\theta_1, \theta_2) = \theta_1^2 + \theta_2^2 - 2\theta_1\theta_2 + 5$

Surface plot of the objective function

From the surface plot above, we can estimate where the minimum value might be located. Next, we will determine the minimum precisely using the NCG method in the code below.

The following steps are involved in the optimization process:

  1. Calculate the gradient: $\nabla f(\theta_1, \theta_2) = [2\theta_1 - 2\theta_2,\ 2\theta_2 - 2\theta_1]$

  2. Calculate the Hessian matrix: $\nabla^2 f(\theta_1, \theta_2) = \begin{bmatrix} 2 & -2 \\ -2 & 2 \end{bmatrix}$

  3. Compute the Newton step, using conjugate gradient iterations to solve the Newton system.

  4. Update the parameters using the NCG update equation. (A worked first iteration is shown below.)
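
To see steps 1–4 in action, take a hypothetical starting point $\theta_0 = (2, -3)$ (the choice is ours; any point off the minimum works). The gradient there is

$\nabla f(2, -3) = (2 \cdot 2 - 2 \cdot (-3),\ 2 \cdot (-3) - 2 \cdot 2) = (10, -10)$

and the Newton system $\nabla^2 f \, p = -\nabla f$ reduces to $p_1 - p_2 = -5$. The conjugate gradient iteration searches along the residual direction $(-1, 1)$ and returns $p = (-2.5, 2.5)$, so with $\alpha_0 = 1$,

$\theta_1 = \theta_0 + p = (-0.5, -0.5)$

where the gradient vanishes and $f(\theta_1) = 5$. Because the objective is quadratic, a single full Newton step reaches a minimizer.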

Luckily, we can use the minimize function from Python's SciPy library to optimize an objective function using the Newton-Conjugate Gradient method. Let's optimize our objective function in the following code:

Code

import numpy as np
from scipy.optimize import minimize

# Objective function to be minimized
def Function(theta):
    return theta[0]**2 + theta[1]**2 - 2 * theta[0] * theta[1] + 5

# Calculate gradient of the function
def Gradient(theta):
    return np.array([2*theta[0] - 2*theta[1], 2*theta[1] - 2*theta[0]])

# Calculate Hessian matrix
def Hessian(theta):
    return np.array([[2, -2], [-2, 2]])

# Apply the Newton-Conjugate Gradient method
result = minimize(Function, [1, 1], method='Newton-CG', jac=Gradient, hess=Hessian)

# Print the optimal parameters and minimum value
print("Optimal parameters:", result.x)
print("Minimum value:", result.fun)

Code explanation

  • Line 1–2: We import numpy and the minimize function from SciPy's scipy.optimize module.

  • Line 5–6: We define our objective function.

  • Line 9–10: We define a function that computes the gradient of the objective function.

  • Line 13–14: We define a function that returns the Hessian matrix of the objective function.

  • Line 17: We optimize the objective function via the Newton-Conjugate Gradient method using the minimize function provided by the scipy library.

  • Line 20–21: We print the optimal parameters and the minimum value of the function.

Limitations of the NCG method

While the NCG method is a powerful optimization method, it has the following limitations:

  • Computational cost: The NCG method requires second-derivative information (the Hessian matrix, or Hessian-vector products) in each iteration, along with an inner conjugate gradient solve. This can be computationally expensive for large-scale optimization problems.

  • Storage requirements: A dense Hessian for $n$ parameters is an $n \times n$ matrix, so the memory needed to store it grows with the square of the number of parameters. Storing and manipulating the matrix becomes memory-intensive for problems with a large number of parameters. (One mitigation, using Hessian-vector products, is sketched after this list.)

  • Non-convexity: The NCG method doesn't work well on non-convex objective functions: it might converge to a local minimum instead of the global minimum.

  • Line search strategy: The NCG method typically uses a line search to determine the step size in each iteration. The choice of line search strategy can affect the convergence and performance of the method.
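
Regarding the storage limitation, SciPy's Newton-CG implementation accepts a Hessian-vector product via the hessp parameter in place of a full Hessian, so the dense matrix never has to be formed. Here is a minimal sketch of that variant for our example (the product is written out by hand; for a 2×2 problem this saves nothing, but the same pattern scales to large parameter counts):

import numpy as np
from scipy.optimize import minimize

# Objective function to be minimized
def Function(theta):
    return theta[0]**2 + theta[1]**2 - 2 * theta[0] * theta[1] + 5

# Gradient of the objective function
def Gradient(theta):
    return np.array([2*theta[0] - 2*theta[1], 2*theta[1] - 2*theta[0]])

# Hessian-vector product: returns H @ p without forming H explicitly
def HessianProduct(theta, p):
    return np.array([2*p[0] - 2*p[1], -2*p[0] + 2*p[1]])

result = minimize(Function, [1, 1], method='Newton-CG',
                  jac=Gradient, hessp=HessianProduct)
print("Optimal parameters:", result.x)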
