GLOBALLY CONVERGENT GAUSS-NEWTON METHODS


by C. Fraley

Technical Report No. 200, March 1991

Department of Statistics, GN-22, University of Washington, Seattle, Washington USA

C. Fraley, Department of Statistics, GN-22, University of Washington, Seattle, WA USA, and Statistical Sciences, Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA USA

Abstract

This paper introduces a linesearch algorithm for nonlinear least squares and nonlinear equations. Each iteration involves a one-dimensional minimization along an unmodified Gauss-Newton direction, which may differ from the classical Gauss-Newton direction. Global convergence to a stationary point is proved by showing that in the worst case the algorithm uses the steepest-descent direction, which is itself a Gauss-Newton direction.

1 Introduction

This paper addresses the problem of minimizing a sum of squares of smooth nonlinear functions

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2} \sum_{i=1}^{m} \phi_i(x)^2 \tag{1.1}$$

or, equivalently, $\min_{x \in \mathbb{R}^n} \frac{1}{2} f(x)^T f(x)$, where

$$f(x) = \begin{pmatrix} \phi_1(x) \\ \vdots \\ \phi_m(x) \end{pmatrix}.$$

The factor 1/2 is introduced to avoid a factor of 2 in derivatives and does not affect the location of the minimum. We place no restrictions on $m$ relative to $n$, so that the discussion applies to nonlinear equations ($m \le n$) as well as to overdetermined systems ($m > n$). In what follows we often drop the argument $x$, assuming that $f$ (and functions of $f$) are evaluated at the current iterate.

The classical Gauss-Newton method for (1.1) is the sequential one-dimensional minimization of $f^T f$ along the solution to the linear least-squares problem

$$\min_{p \in \mathbb{R}^n} \; \| J p + f \|_2. \tag{1.2}$$

Here $J$ denotes the $m \times n$ Jacobian matrix for $f$,

$$J(x) = \begin{pmatrix} \dfrac{\partial \phi_1(x)}{\partial x_1} & \cdots & \dfrac{\partial \phi_1(x)}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial \phi_m(x)}{\partial x_1} & \cdots & \dfrac{\partial \phi_m(x)}{\partial x_n} \end{pmatrix}.$$

Solving (1.2) is equivalent to solving the normal equations

$$J^T J p = -J^T f. \tag{1.3}$$

Since $J^T f = \frac{1}{2} \nabla (f^T f)$, the classical Gauss-Newton method can be interpreted as a modified Newton method in which the matrix $J^T J$ replaces the Hessian matrix $\frac{1}{2} \nabla^2 (f^T f) = J^T J + \sum_{i=1}^{m} \phi_i \nabla^2 \phi_i$ (see, e.g., Dennis and Schnabel [1]). [...]
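The equivalence between the linear least-squares problem (1.2) and the normal equations (1.3) can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a small hypothetical test problem with m = 3 and n = 2; the functions `residuals`, `jacobian`, and `gauss_newton_direction` are illustrative names and do not come from the report.

```python
import numpy as np

def residuals(x):
    # Hypothetical test problem (not from the report): m = 3, n = 2.
    return np.array([x[0] ** 2 + x[1] - 1.0,
                     x[0] - x[1] ** 2,
                     x[0] * x[1] - 0.5])

def jacobian(x):
    # Analytic m x n Jacobian of the hypothetical residuals above.
    return np.array([[2.0 * x[0], 1.0],
                     [1.0, -2.0 * x[1]],
                     [x[1], x[0]]])

def gauss_newton_direction(J, f):
    # Minimum 2-norm solution of min_p ||J p + f||_2, as in (1.2).
    p, *_ = np.linalg.lstsq(J, -f, rcond=None)
    return p

x = np.array([1.0, 2.0])
f, J = residuals(x), jacobian(x)

# Direction from the least-squares problem (1.2).
p_lstsq = gauss_newton_direction(J, f)

# Direction from the normal equations (1.3); valid here because J has
# full column rank, so J^T J is nonsingular.
p_normal = np.linalg.solve(J.T @ J, -J.T @ f)

# Directional derivative of (1/2) f^T f along p; negative means descent.
slope = (J.T @ f) @ p_lstsq
```

When the Jacobian is ill-conditioned, forming $J^T J$ squares the condition number, which is why the sketch computes the direction from (1.2) directly and uses (1.3) only as a cross-check.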

In general, linesearch algorithms for unconstrained minimization are globally convergent if each search direction is a descent direction for the objective function, and if the sequence of search directions is bounded away from orthogonality to the gradient (see, e.g., [...]). For the nonlinear least-squares problem (1.1), the descent condition is $f^T J p < 0$. The classical Gauss-Newton direction $p_{GN}$ is a descent direction for $f^T f$ when $J^T f$ is nonzero. This can be seen as follows. Let

$$J = U \Sigma V^T$$

be the singular value decomposition of the Jacobian matrix, in which $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $\Sigma$ is a $\min(m,n) \times \min(m,n)$ diagonal matrix (padded with zeros to $m \times n$ when $m \ne n$) whose diagonal entries are the singular values of $J$ in decreasing order of magnitude:

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} \ge 0$$

(see, e.g., Golub and Van Loan [7]). The rank of $J$ is the number of nonzero singular values. If $u_i$ denotes the $i$th column of $U$ and $v_i$ denotes the $i$th column of $V$, then

$$J = \sum_{i=1}^{\min(m,n)} \sigma_i u_i v_i^T = \sum_{i=1}^{\mathrm{rank}(J)} \sigma_i u_i v_i^T$$

and

$$J^{+} = \sum_{i=1}^{\mathrm{rank}(J)} \frac{1}{\sigma_i} v_i u_i^T.$$

Therefore

$$p_{GN} = -J^{+} f = -\sum_{i=1}^{\mathrm{rank}(J)} \frac{u_i^T f}{\sigma_i} v_i, \qquad J^T f = \sum_{i=1}^{\mathrm{rank}(J)} \sigma_i (u_i^T f) v_i,$$

and so

$$f^T J p_{GN} = -\sum_{i=1}^{\mathrm{rank}(J)} (u_i^T f)^2 < 0.$$

Moreover, if $\theta$ denotes the angle between $p_{GN}$ and the negative gradient $-J^T f$, then

$$\cos\theta = \frac{-f^T J p_{GN}}{\|J^T f\|_2 \, \|p_{GN}\|_2} = \frac{\sum_{i=1}^{\mathrm{rank}(J)} (u_i^T f)^2}{\left( \sum_{i=1}^{\mathrm{rank}(J)} \sigma_i^2 (u_i^T f)^2 \right)^{1/2} \left( \sum_{i=1}^{\mathrm{rank}(J)} \sigma_i^{-2} (u_i^T f)^2 \right)^{1/2}} \ge \frac{\sum_{i=1}^{\mathrm{rank}(J)} (u_i^T f)^2}{\left( \sigma_1^2 \sum_{i=1}^{\mathrm{rank}(J)} (u_i^T f)^2 \right)^{1/2} \left( \sigma_{\mathrm{rank}(J)}^{-2} \sum_{i=1}^{\mathrm{rank}(J)} (u_i^T f)^2 \right)^{1/2}} = \frac{\sigma_{\mathrm{rank}(J)}}{\sigma_1} = \frac{1}{\mathrm{cond}(J)}.$$

Thus, either $J^T f$ vanishes in the limit, implying that the iterates converge to a local minimizer of $f^T f$, or else the sequence of Jacobians approaches a loss of rank. In the next section we show that it is possible to guarantee that the second case will never occur while remaining within the Gauss-Newton framework.

3 Globally-Convergent Algorithm

The problem of computing a Gauss-Newton direction need not be viewed as that of solving a fixed linear least-squares problem. Because of nonlinearity, there are many Gauss-Newton directions associated with a given nonlinear objective and a given iterate (Nocedal and Overton [10], Fraley [4]). If $Q(x)$ is an $l \times m$ smooth, nonconstant matrix function satisfying $Q(x)^T Q(x) = I$ for all $x$, then $Q(x) f(x)$ defines the same nonlinear least-squares problem as does $f(x)$, although the Jacobian matrices (and consequently the Gauss-Newton directions) would be different. A globally-convergent algorithm can be obtained by choosing from among several possible Gauss-Newton directions. Define, for $1 \le k \le m$,

$$f_{[k]}(x) = \begin{pmatrix} \phi_1(x) \\ \phi_2(x) \\ \vdots \\ \phi_{k-1}(x) \\ \|f^{(k)}(x)\|_2 \end{pmatrix}, \qquad f^{(k)}(x) = \begin{pmatrix} \phi_k(x) \\ \vdots \\ \phi_m(x) \end{pmatrix}, $$

[...]

where $e_k$ is the $k$th standard unit vector. The matrix $H_{[k]}(x)$ is an (orthogonal) Householder matrix (see, e.g., [7]). It is defined and differentiable in a neighborhood of $x$ whenever $h_{[k]}(x) \ne 0$, or equivalently, whenever $f^{(k)}(x) \ne 0$. When $f^{(k)}(x) \ne 0$, the first $k-1$ rows of the Jacobian matrix $J_{[k]}$ of $f_{[k]}$ are identical to those of the Jacobian matrix $J$ of $f$, and the $k$th row of $J_{[k]}$ is the following linear combination of the last $m-k+1$ rows of $J$:

$$\frac{1}{\|f^{(k)}(x)\|_2} \left( \sum_{i=k}^{m} \phi_i(x) \frac{\partial \phi_i(x)}{\partial x_1}, \; \ldots, \; \sum_{i=k}^{m} \phi_i(x) \frac{\partial \phi_i(x)}{\partial x_n} \right).$$

(Note that $\nabla (f_{[k]}^T f_{[k]})$ is independent of $k$ whenever it is defined, as is $f_{[k]}^T f_{[k]}$, although $f_{[k]}$ and $J_{[k]}$ vary with $k$.)

The case $k = 1$ is of particular interest, since then the Jacobian matrix consists of a single row, a scalar multiple of the transpose of the steepest-descent direction for $f^T f$. In this instance, the Gauss-Newton direction is the minimum $l_2$-norm solution to the one-dimensional linear least-squares problem

$$\min_{p \in \mathbb{R}^n} \; \| J_{[1]} p + f_{[1]} \|_2$$

and is therefore a negative multiple of the gradient $J^T f$ when $f \ne 0$. Hence the steepest-descent direction is one possible Gauss-Newton direction.

The derivation of a globally-convergent algorithm is now straightforward. Choose in advance an upper bound $\kappa \ge 1$ on the allowed condition number of the Jacobian. At the beginning of an iteration, sort the components $\phi_i$ of $f$ so that they are arranged in decreasing order of magnitude. Let $k$ be the largest index for which $\phi_k(x) \ne 0$. Decrease $k$ as necessary until $\mathrm{rank}(J_{[k]}) = k$ and $\mathrm{cond}(J_{[k]}) \le \kappa$. These conditions are met when $k = 1$ and $J^T f \ne 0$, since then $\mathrm{rank}(J_{[1]}) = \mathrm{cond}(J_{[1]}) = 1$. A linesearch along the resulting Gauss-Newton direction then gives the next iterate.
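The properties of $f_{[k]}$ and $J_{[k]}$ stated above can be verified on a small example. The sketch below assumes that $f_{[k]}$ keeps the first $k-1$ components of $f$ and replaces the remaining ones by their 2-norm, which is consistent with the row formula for $J_{[k]}$; the helper name `reduced_problem` and the sample values of `f` and `J` are hypothetical and chosen only for illustration.

```python
import numpy as np

def reduced_problem(f, J, k):
    """Return (f_k, J_k) for 1 <= k <= m, with k 1-based as in the text.

    Assumption: f_[k] = (phi_1, ..., phi_{k-1}, ||f^(k)||_2), where
    f^(k) = (phi_k, ..., phi_m) must be nonzero.
    """
    tail = f[k - 1:]
    norm_tail = np.linalg.norm(tail)
    if norm_tail == 0.0:
        raise ValueError("f^(k) must be nonzero")
    f_k = np.concatenate([f[:k - 1], [norm_tail]])
    # k-th row of J_[k]: sum_{i>=k} phi_i * (row i of J) / ||f^(k)||_2.
    last_row = (tail @ J[k - 1:, :]) / norm_tail
    J_k = np.vstack([J[:k - 1, :], last_row])
    return f_k, J_k

# Hypothetical residual vector and Jacobian at some iterate (m = 3, n = 2).
f = np.array([2.0, -3.0, 1.5])
J = np.array([[2.0, 1.0],
              [1.0, -4.0],
              [2.0, 1.0]])
g = J.T @ f  # gradient of (1/2) f^T f

# For every k, the sum of squares and its gradient are unchanged.
invariants_hold = all(
    np.isclose(fk @ fk, f @ f) and np.allclose(Jk.T @ fk, g)
    for fk, Jk in (reduced_problem(f, J, k) for k in (1, 2, 3))
)

# For k = 1 the minimum-norm Gauss-Newton direction is parallel to -g.
f_1, J_1 = reduced_problem(f, J, 1)
p_1, *_ = np.linalg.lstsq(J_1, -f_1, rcond=None)
cos_1 = (-g @ p_1) / (np.linalg.norm(g) * np.linalg.norm(p_1))
```

Here `cos_1` is the cosine of the angle between the $k = 1$ direction and the steepest-descent direction, so a value of 1 (up to rounding) confirms that the two coincide in direction.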

Globally Convergent Gauss-Newton Algorithm

    initialization: starting value $x$; $\kappa \ge 1$
    loop
        sort the components $\phi_i$ of $f$ so that $|\phi_1(x)| \ge |\phi_2(x)| \ge \cdots \ge |\phi_m(x)|$
        $k \leftarrow \max\{ i \mid \phi_i(x) \ne 0 \}$
        while ($\mathrm{rank}(J_{[k]}) < k$ or $\mathrm{cond}(J_{[k]}) > \kappa$) do $k \leftarrow k - 1$
        $p_{[k]} \leftarrow \arg\min_{p \in \mathbb{R}^n} \| J_{[k]} p + f_{[k]} \|_2$
        $\alpha_{[k]} \leftarrow \arg\min_{\alpha \in \mathbb{R}} \| f(x + \alpha p_{[k]}) \|_2$
        $x \leftarrow x + \alpha_{[k]} p_{[k]}$
    forever

The results of the previous section hold not only for the classical Gauss-Newton directions, but for any sequence of Gauss-Newton directions. The vectors $p_{[k]}$ are descent directions for $f^T f$ since they are all unmodified full-rank Gauss-Newton directions. Moreover, these vectors are bounded away from orthogonality to the gradient because the condition numbers of the associated Jacobians are bounded above by $\kappa$. Hence the sequence of iterates produced by the algorithm converges to a stationary point of $f^T f$.

4 Additional Remarks

Alternative definitions for $f_{[k]}$ are possible. For example, the contribution of the last $m - k$ elements of $f$ could be distributed differently among the first $k$ components. The essential properties that $f_{[k]}$ must possess are that $f_{[k]}^T f_{[k]}$ be equal to $f^T f$ for all $x$, that $f_{[k]}$ be differentiable, and that $k = 1$ corresponds to steepest descent for the global convergence result.

Practical implications remain to be explored. [...]
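The algorithm above can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the report's implementation: the exact one-dimensional minimization is replaced by a simple backtracking (Armijo) linesearch, $f_{[k]}$ is taken to be the first $k-1$ components of $f$ followed by the 2-norm of the rest, and all names (`reduced_problem`, `gauss_newton_globalized`, `kappa`, `tol`, `max_iter`, and the test problem) are hypothetical.

```python
import numpy as np

def reduced_problem(f, J, k):
    # Assumed form of f_[k] and J_[k]; k is 1-based and f[k-1:] is nonzero.
    tail = f[k - 1:]
    norm_tail = np.linalg.norm(tail)
    f_k = np.concatenate([f[:k - 1], [norm_tail]])
    J_k = np.vstack([J[:k - 1, :], (tail @ J[k - 1:, :]) / norm_tail])
    return f_k, J_k

def gauss_newton_globalized(residuals, jacobian, x0, kappa=1e6,
                            tol=1e-10, max_iter=200):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        f, J = residuals(x), jacobian(x)
        g = J.T @ f                      # gradient of (1/2) f^T f
        if np.linalg.norm(g) <= tol:     # stationary point reached
            break
        # Sort components by decreasing magnitude; nonzeros come first.
        order = np.argsort(-np.abs(f), kind="stable")
        f_s, J_s = f[order], J[order, :]
        k = int(np.count_nonzero(f_s))   # largest index with phi_k != 0
        # Decrease k until J_[k] has rank k and cond(J_[k]) <= kappa;
        # k = 1 always qualifies because g is nonzero here.
        while True:
            f_k, J_k = reduced_problem(f_s, J_s, k)
            if k == 1 or (np.linalg.matrix_rank(J_k) == k
                          and np.linalg.cond(J_k) <= kappa):
                break
            k -= 1
        p, *_ = np.linalg.lstsq(J_k, -f_k, rcond=None)
        # Backtracking linesearch (an assumption; the text uses an exact
        # one-dimensional minimization along p).
        phi0, slope, alpha = 0.5 * (f @ f), g @ p, 1.0
        while alpha > 1e-12:
            f_new = residuals(x + alpha * p)
            if 0.5 * (f_new @ f_new) <= phi0 + 1e-4 * alpha * slope:
                break
            alpha *= 0.5
        x = x + alpha * p
    return x

# Hypothetical zero-residual test problem with solution (1, 1).
def circle_line_residuals(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])

def circle_line_jacobian(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

x_star = gauss_newton_globalized(circle_line_residuals,
                                 circle_line_jacobian, [2.0, 0.5])
```

The rank test is evaluated before the condition number so that `np.linalg.cond` is only called on full-rank matrices. A practical code would likely obtain both from a single factorization of $J_{[k]}$ rather than recomputing them for each trial value of $k$.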

[...] which $f(x^*) = 0$ and $J(x)$ has full column rank in a neighborhood of $x^*$ (see, e.g., [...]). Quadratic convergence cannot be assured in the new algorithm under these conditions on account of the upper bound on the condition number of $J_{[k]}$ and the requirement that $\phi_k \ne 0$. However, if $f_{[k]}$ is redefined as follows:

$$f_{[k]}(x) = \begin{cases} \left( \phi_1(x), \; \ldots, \; \phi_{k-1}(x), \; \sqrt{\textstyle\sum_{i=k}^{m} [\phi_i(x)]^2} \right)^T & \text{for } k \le l, \\[1ex] \left( \phi_1(x), \; \ldots, \; \phi_{l-1}(x), \; \sqrt{[\phi_l(x)]^2 + \textstyle\sum_{i=k+1}^{m} [\phi_i(x)]^2}, \; \phi_{l+1}(x), \; \ldots, \; \phi_k(x) \right)^T & \text{for } k > l, \end{cases}$$

where $l = \max\{ i \mid \phi_i(x) \ne 0 \}$, then quadratic convergence will occur whenever it would in the classical method as long as $\mathrm{cond}(J(x^*)) \le \kappa$. It would be necessary to replace $\mathrm{rank}(J_{[k]}) < k$ by $\mathrm{rank}(J_{[k]}) < \min(k, n)$ as a condition for decreasing $k$, and to start each iteration with $k = m$. Note that this definition of $f_{[k]}$ differs from the previous one only if $k \ge l$, and that $f_{[k]}$ can still be interpreted as the application of a nonlinear Householder transformation to the original problem.

The rate of convergence of the algorithms introduced here can be no faster than linear for problems on which the classical Gauss-Newton method has linear convergence, [...]

[...] may also find some use within other algorithms for nonlinear least squares (e.g., Moré [9], Dennis, Gay, and Welsch [2]), as well as for the underdetermined equations that arise in nonlinearly-constrained optimization (e.g., Fletcher [3], Gill, Murray, and Wright [6]).

Acknowledgement. Many thanks to Ariela Sofer, Stephen Nash, Karla Hoffman, and Nat Wilson.

References

[1] J. E. Dennis Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall (1983).
[2] J. E. Dennis Jr., D. M. Gay, and R. E. Welsch, "An adaptive nonlinear least squares algorithm", ACM Transactions on Mathematical Software 7 (1981).
[3] R. Fletcher, Practical Methods of Optimization (2nd ed.), Wiley (1987).
[4] C. Fraley, "Computational behavior of Gauss-Newton methods", SIAM Journal on Scientific and Statistical Computing 10 (1989).
[5] P. E. Gill and W. Murray, "Algorithms for the solution of the nonlinear least-squares problem", SIAM Journal on Numerical Analysis 15 (1978).
[6] P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization, Academic Press (1981).
[7] G. H. Golub and C. F. Van Loan, Matrix Computations (2nd ed.), Johns Hopkins (1989).
[8] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, Prentice-Hall (1974).
[9] J. J. Moré, "The Levenberg-Marquardt algorithm: implementation and theory", in Numerical Analysis - Proceedings Dundee 1977, Lecture Notes in Mathematics 630, Springer-Verlag (1978).
[10] J. Nocedal and M. L. Overton, "Projected Hessian updating algorithms for nonlinearly constrained optimization", SIAM Journal on Numerical Analysis 22 (1985).
