Forcing the Hessian Matrix to Be Positive Definite
Mini-Project by Suphannee Pongkitwitoon

Updated on February 14, 2002

Rewritten by Suphannee Pongkitwitoon

 

The aim of this mini-project on forcing the Hessian matrix to be positive definite is to explain the modified Marquardt and Cholesky factorization methods for making the Hessian matrix positive definite, including their basic ideas and underlying theorems.  The basic concepts that motivate forcing the Hessian matrix to be positive definite are also briefly discussed, and recent work on related aspects of the problem is outlined.

 

 

1. Newton’s Method

 

One effective and robust technique for the numerical optimization of general nonlinear multivariable objective functions with unconstrained variables is Newton's method, an indirect method in which the step is the inverse Hessian matrix multiplied by the negative gradient, with step size α equal to 1.  Suppose that the quadratic approximation of f(x) at x_k, obtained from a Taylor series expansion, is as follows:

    f(x) \approx f(x_k) + \nabla f(x_k)^T \Delta x + \tfrac{1}{2} \Delta x^T H(x_k) \Delta x        (1.1)

where H(x_k) is the Hessian matrix of f(x), defined as the matrix of second partial derivatives with respect to x, evaluated at x_k.  The minimum of the quadratic approximation of f(x) in equation (1.1) is obtained by differentiating equation (1.1) with respect to each component of Δx and equating the resulting expressions to zero, which gives

    \nabla f(x_k) + H(x_k) \Delta x = 0        (1.2)

or

    \Delta x = x_{k+1} - x_k = -[H(x_k)]^{-1} \nabla f(x_k)        (1.3)

 

where [H(x_k)]^{-1} is the inverse of the Hessian matrix H(x_k).  Note that both the direction and the step length are specified as a result of equation (1.2).  If f(x) is actually a quadratic function, only one step is required to reach the minimum.  For general nonlinear objective functions, the minimum of f(x) cannot be reached in one step, so equation (1.3) is modified by introducing a parameter for the step length as follows:

    x_{k+1} = x_k - \alpha_k [H(x_k)]^{-1} \nabla f(x_k)        (1.4)

Consequently, the search direction s_k for minimization is now the negative gradient premultiplied by the inverse Hessian,

    s_k = -[H(x_k)]^{-1} \nabla f(x_k)        (1.5)

and the step length is α_k, which equals 1 on every step in the traditional Newton's method.  However, with α_k = 1 the iteration often fails to converge if the initial point is not close enough to a local minimum.  Thus, even though Newton's method usually requires the fewest iterations of all the methods, it has the disadvantage that a unit step size may not lead to convergence.  To overcome this drawback, two classes of methods modify the pure Newton's method so that it is guaranteed to converge to a local minimum from an arbitrary starting point.  The first, called trust region methods, minimizes the quadratic approximation within an elliptical region whose size is adjusted so that the objective improves at each iteration; the trust region is the region in which the quadratic model is treated as a reliable, convex approximation of the objective, so that the model's local minimum there is also its global minimum.  The second class, line search methods, modifies the pure Newton's method in two ways:

 

1.1) instead of taking a step size of one, a line search is used, and

1.2) if the Hessian matrix H(x_k) is not positive definite, it is replaced by a positive definite matrix that is close enough to H(x_k).

 

If f(x) is convex, H(x) is positive semidefinite at every point x and is usually positive definite, so Newton's method with a line search is convergent.  If f(x) is not strictly convex, which is often the case in regions far from the optimum, H(x) may not be positive definite everywhere, so one approach to forcing convergence is to replace H(x) by another matrix that is positive definite.  The Marquardt-Levenberg method is one way of doing this, forcing the Hessian matrix to be positive definite.
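
To make the line search variant concrete, here is a minimal sketch (written for this note, assuming NumPy; the fallback to steepest descent and the simple backtracking rule are illustrative choices, not prescriptions from the references).  It tests positive definiteness with a Cholesky factorization, as in Section 4, and halves the step until the objective decreases:

import numpy as np

def newton_line_search(f, grad, hess, x0, tol=1e-8, max_iter=100):
    """Newton's method with a simple backtracking line search (sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        H = hess(x)
        try:
            # Cholesky succeeds only if H is positive definite (Section 4).
            L = np.linalg.cholesky(H)
            # Solve H s = -g using the triangular factor: L (L^T s) = -g.
            s = np.linalg.solve(L.T, np.linalg.solve(L, -g))
        except np.linalg.LinAlgError:
            s = -g  # H not positive definite: fall back to steepest descent
        alpha = 1.0  # start from the pure Newton step, equation (1.4)
        while f(x + alpha * s) > f(x) and alpha > 1e-10:
            alpha *= 0.5  # backtrack until the objective decreases
        x = x + alpha * s
    return x

For a strictly convex quadratic objective the first pass accepts α = 1 and the iteration stops at the minimum after one step, as noted in the text.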

 

2. Forcing the Hessian Matrix to Be Positive Definite

 

Marquardt (1963), Levenberg (1944), and others have suggested that the Hessian matrix of f(x) be modified at each stage of the search as needed to ensure that the modified Hessian, \tilde{H}(x), is positive definite and well conditioned.  The procedure adds elements to the diagonal elements of H(x):

    \tilde{H}(x) = H(x) + \beta I        (2.1)

where β is a positive constant large enough to make \tilde{H}(x) positive definite when H(x) is not.
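
A minimal sketch of this idea (assuming NumPy; the starting value and geometric growth of β are common heuristics assumed here, not taken from Marquardt or Levenberg):

import numpy as np

def force_positive_definite(H, beta0=1e-3, max_tries=60):
    """Return H + beta*I, with beta just large enough for positive definiteness."""
    beta = 0.0  # try beta = 0 first: leave H unchanged if it is already PD
    I = np.eye(H.shape[0])
    for _ in range(max_tries):
        try:
            np.linalg.cholesky(H + beta * I)  # succeeds iff H + beta*I is PD
            return H + beta * I
        except np.linalg.LinAlgError:
            beta = beta0 if beta == 0.0 else 2.0 * beta  # grow beta and retry
    raise RuntimeError("failed to make the matrix positive definite")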

 

A simpler procedure that may result in a suitable modification is to apply a modified Cholesky factorization as follows:

    H(x_k) + D = L L^T        (2.2)

where D is a diagonal matrix with nonnegative elements (d_mm = 0 for every m when H(x_k) is already positive definite) and L is a lower triangular matrix.  Upper bounds on the elements in D are calculated using the Gershgorin circle theorem (see Section 5 and Appendix 8.2).

 

3. The Levenberg-Marquardt Method

 

In the Levenberg-Marquardt method, the Gauss-Newton equation for nonlinear least squares is modified into the following, where r(x) is the residual vector and J(x) its Jacobian matrix:

    \left( J(x_k)^T J(x_k) + \lambda I \right) \Delta x = -J(x_k)^T r(x_k)        (3.1)

 

Here the parameter λ is adjusted during the course of the algorithm to ensure convergence.  For very large values of λ, we obtain

    \Delta x \approx -\tfrac{1}{\lambda} J(x_k)^T r(x_k)        (3.2)

 

This is essentially a steepest descent step, so the algorithm simply moves downhill.  The steepest descent step provides very slow but certain convergence.  For small values of λ, the method takes essentially the Gauss-Newton direction.

 

The hard part of this method is determining the right value of λ.  The general idea is to use small values of λ in situations where Gauss-Newton is working well, but to switch to larger values when it is not making progress.  In general, the Levenberg-Marquardt method is the method of choice for small to medium sized nonlinear least squares problems.
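
The sketch below (a simplified illustration assuming NumPy; the factors of 10 and the call signatures of r and J are assumptions of this note) shows the classic adjustment: accept the step and decrease λ when the sum of squares improves, otherwise increase λ and retry with a more steepest-descent-like step:

import numpy as np

def levenberg_marquardt(r, J, x0, lam=1e-3, tol=1e-8, max_iter=100):
    """Minimize 0.5*||r(x)||^2, where r returns the residual vector
    and J its Jacobian matrix."""
    x = np.asarray(x0, dtype=float)
    cost = 0.5 * np.dot(r(x), r(x))
    for _ in range(max_iter):
        rx, Jx = r(x), J(x)
        g = Jx.T @ rx  # gradient of 0.5*||r(x)||^2
        if np.linalg.norm(g) < tol:
            break
        # Equation (3.1): (J^T J + lambda*I) * step = -J^T r
        A = Jx.T @ Jx + lam * np.eye(x.size)
        step = np.linalg.solve(A, -g)
        new_cost = 0.5 * np.dot(r(x + step), r(x + step))
        if new_cost < cost:        # progress: trust the Gauss-Newton model more
            x, cost = x + step, new_cost
            lam /= 10.0
        else:                      # no progress: lean toward steepest descent
            lam *= 10.0
    return x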

 

4. Cholesky Factorization

 

Cholesky factorization factors an n × n symmetric, positive definite matrix A into the product of a lower triangular matrix L and its transpose, i.e., A = LL^T (equivalently A = U^T U, where U is an upper triangular matrix).  It is assumed that the lower triangular portion of A is stored in the lower triangle of a two-dimensional array and that the computed elements of L overwrite the given elements of A.  At the k-th step, the matrix and its factors are partitioned, separating the leading element and column from the trailing submatrix, and the system is written as

 

    \begin{bmatrix} a_{11} & a_{21}^T \\ a_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} l_{11} & 0 \\ l_{21} & L_{22} \end{bmatrix} \begin{bmatrix} l_{11} & l_{21}^T \\ 0 & L_{22}^T \end{bmatrix}        (4.1)

    l_{11} = \sqrt{a_{11}}, \qquad l_{21} = a_{21}/l_{11}, \qquad L_{22} L_{22}^T = A_{22} - l_{21} l_{21}^T        (4.2)
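
As a concrete illustration (a minimal NumPy sketch written for this note, which assumes the input is symmetric positive definite), the routine below carries out the recurrence (4.1)-(4.2) in its unblocked column-by-column form, overwriting the lower triangle of A with L exactly as described above:

import numpy as np

def cholesky_in_place(A):
    """Overwrite the lower triangle of A with its Cholesky factor L."""
    n = A.shape[0]
    for k in range(n):
        # Diagonal element: l_kk = sqrt(a_kk - sum_{j<k} l_kj^2), cf. (4.2)
        A[k, k] = np.sqrt(A[k, k] - np.dot(A[k, :k], A[k, :k]))
        for i in range(k + 1, n):
            # Subdiagonal column: l_ik = (a_ik - sum_{j<k} l_ij*l_kj) / l_kk
            A[i, k] = (A[i, k] - np.dot(A[i, :k], A[k, :k])) / A[k, k]
    return np.tril(A)  # L, with A = L @ L.T for the original A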

 

 

5. The Gershgorin Circle Theorem

 

If all the eigenvalues of A lie in the left half-plane, i.e., have negative real parts, then every solution of z' = Az collapses to zero.  If at least one eigenvalue lies in the right half-plane, then some solutions grow without bound.  Consequently, it is very important to know where the eigenvalues of A are, even when the matrix is such that the eigenvalues are hard to compute.  The Gershgorin circle theorem states that if S is the union of the disks of complex numbers z satisfying |z - a_mm| ≤ Σ_{j≠m} |a_jm|, then every eigenvalue of A lies in S, as in equation (5.1):

 

    |z - a_{mm}| \le \sum_{j \ne m} |a_{jm}|        (5.1)
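
A small sketch (assuming NumPy, and using column sums to match the notation above) that computes the Gershgorin disks and the interval they imply for the eigenvalues of a real symmetric matrix:

import numpy as np

def gershgorin_disks(A):
    """Return (centers, radii) of the Gershgorin disks, using column sums."""
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=0) - np.abs(centers)  # sum_{j != m} |a_jm|
    return centers, radii

# Symmetric matrices have real eigenvalues, so all of them lie in the
# interval [min(c - r), max(c + r)] taken over the disks.
A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 1.0],
              [0.5, 1.0, 2.0]])
c, rad = gershgorin_disks(A)
print("eigenvalues lie in [%.2f, %.2f]" % ((c - rad).min(), (c + rad).max()))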

6. Recent Developments in Marquardt, Cholesky, and Related Topics

 

For Newton's method, many modified applications exist for solving nonlinear equations, both single equations and simultaneous systems, in order to obtain new strategies or better optimized results (CERN, 1998).

 

The method of Marquardt is an intermediate step between the Taylor series (Gauss-Newton) method and the steepest descent (gradient) method.  A further modification of Marquardt's method by Fletcher is used in the MULTI and MULTI-FORTE 3D graphical programs to calculate upper and lower limits, including weights for each parameter along each grid intersection (W.A. David, 1999).

 

A modified Cholesky algorithm based on a symmetric indefinite factorization (Cheng and Higham, 1998) is an example of a method that sidesteps the requirement that the Hessian matrix be positive definite by computing the factorization P(A+E)P^T = R^T R, yielding an algorithm for locating the optimum that is effective in both theory and practice.

 

7. References

 

1) Borchers, Brian (2000). Nonlinear Problems Part I. NJ.

2) CERN (1998). Application of Newton's Method for Solution of Nonlinear Equations. U.S. Geological Investigation Report 96-4240.

3) Dennis, J.E., and Schnabel, R.B. (1986). Numerical Methods for Unconstrained Optimization and Nonlinear Equations, 22.

4) Edgar, T.F., and Lasdon, L.S. (2001). Optimization of Chemical Processes, Chapter 6. McGraw-Hill, Singapore.

5) Ostrouchov, Susan (1995). Factorization Methods.

6) Cheng, Sheung Hun, and Higham, Nicholas J. (1998). A Modified Cholesky Algorithm Based on a Symmetric Indefinite Factorization. SIAM Journal on Matrix Analysis and Applications, 1097-1110.

 

 

8. Appendices

 

8.1 Theory and supporting ideas to prove that LL^T is positive definite

 

From equation (2.2),

    H(x_k) + D = L L^T

if LL^T is positive definite, then H(x_k) + D is also positive definite.  To prove that LL^T is positive definite, consider the quadratic form x^T L L^T x for an arbitrary nonzero vector x:

 

    x^T L L^T x = (L^T x)^T (L^T x) = \| L^T x \|^2        (8.1.1)

 

For that reason, provided L is nonsingular, \| L^T x \|^2 > 0 for all x ≠ 0, which guarantees that LL^T is positive definite.

8.2 Sizing D and a method to size it

 

The lower limit on the values in D is whatever makes the Hessian matrix factorizable as LL^T, whereas the upper limit on D is designated by the inequalities in equations (8.2.2)-(8.2.4).

 

The magnitude of D is limited by the Gershgorin circle theorem, which states that the distance from each diagonal element to any eigenvalue z is at most the sum of the magnitudes of the other elements in the same column, as in equation (8.2.1).

 

    |z - A_{mm}| \le \sum_{j \ne m} |A_{jm}|        (8.2.1)

where A_mm is the element in column m, row m of the Hessian matrix and A_jm is the element in column m and row j, for all j ≠ m.

 

This restriction means that the largest value ever needed for an element d_mm of matrix D is governed by the sum of the magnitudes of the off-diagonal elements in column m.  The following equations make this idea precise.  Applying (8.2.1) to H + D, and noting that D contributes only to the diagonal, gives

 

    |z - (H_{mm} + d_{mm})| \le \sum_{j \ne m} |H_{jm}|        (8.2.2)

All solutions stay bounded when every eigenvalue of H + D lies in the right half-plane (compare Section 5), i.e., when every Gershgorin disk lies to the right of z = 0; imposing this on equation (8.2.2) replaces it with the condition in equation (8.2.3):

 

    H_{mm} + d_{mm} \ge \sum_{j \ne m} |H_{jm}|        (8.2.3)

Equivalently, this may be read as equation (8.2.4): no element d_mm of D ever needs to be larger than the worst-case deficit by which a column sum Σ_{j≠m} |H_jm| exceeds the corresponding diagonal element H_mm.

 

    d_{mm} \le \max_m \left( \sum_{j \ne m} |H_{jm}| - H_{mm},\ 0 \right)        (8.2.4)
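
A minimal sketch of this bound (assuming NumPy; it adds a single uniform shift d·I rather than a per-element diagonal D, which is a simplification of the modified Cholesky procedure):

import numpy as np

def gershgorin_shift(H, margin=1e-10):
    """Smallest uniform shift d such that every Gershgorin disk of
    H + d*I lies in the right half-plane, per equation (8.2.4)."""
    off = np.sum(np.abs(H), axis=0) - np.abs(np.diag(H))  # sum_{j != m} |H_jm|
    deficit = off - np.diag(H)  # how far each disk center must move right
    # The small margin keeps the result strictly (not just semi-) definite
    # when a disk boundary touches zero.
    return max(deficit.max(), 0.0) + margin

H = np.array([[1.0, 3.0],
              [3.0, 4.0]])               # indefinite: one negative eigenvalue
d = gershgorin_shift(H)
np.linalg.cholesky(H + d * np.eye(2))    # succeeds: H + d*I is positive definite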

 

                                                                                                                                      

 

 

 

 

       
