Quasi-Newton Methods


Quasi-Newton Methods

Werner C. Rheinboldt

These are excerpts of material relating to the books [OR00] and [Rhe98] and of write-ups prepared for courses held at the University of Pittsburgh. Some further references are [Kel95], [Kel99], [DS98].

1 Broyden's Method

Let

    F(x) = 0,   F : R^n -> R^n,                                   (1)

be a given system of nonlinear equations defined by a sufficiently smooth function F. A linearization method for the numerical solution of (1) has the general form

    x_{k+1} = x_k - B_k^{-1} F(x_k),   k = 0, 1, ...,             (2)

where the n x n matrices B_k are suitably chosen. The integral mean value theorem for F states that

    [ Int_0^1 DF(x + t(y - x)) dt ] (y - x) = F(y) - F(x),   x, y in R^n.

The matrix in brackets can be interpreted as the average of the Jacobian matrix on the line segment between the points x and y. This suggests requiring the matrices B_k to satisfy the so-called quasi-Newton condition

    B_{k+1}(x_{k+1} - x_k) = F(x_{k+1}) - F(x_k).

In several papers around 1967, C. G. Broyden suggested that it is numerically advantageous to choose the matrices in (2) such that rank(B_{k+1} - B_k) is small. This led to the development of the so-called quasi-Newton methods, which can be characterized by the following three properties:

    (a) B_k(x_{k+1} - x_k) + F(x_k) = 0,
    (b) B_{k+1}(x_{k+1} - x_k) = F(x_{k+1}) - F(x_k),
    (c) B_{k+1} = B_k + Delta B_k,   rank Delta B_k = m,   k = 0, 1, ....   (3)

Up to now only the values m = 1 or m = 2 have been used in the design of quasi-Newton methods. From (3) we obtain some frequently used relations

    (a) (B_{k+1} - B_k) s_k = F(x_{k+1}),   s_k = x_{k+1} - x_k,
    (b) F(x_{k+1}) = y_k - B_k s_k,   y_k = F(x_{k+1}) - F(x_k).  (4)

C. G. Broyden himself developed two quasi-Newton methods with m = 1 and called one of them his "good" method. This terminology has persisted. The good method uses

    B_{k+1} := B_k + F(x_{k+1}) s_k^T / (s_k^T s_k),              (5)

or, in view of (4)(b),

    B_{k+1} := B_k + (y_k - B_k s_k) s_k^T / (s_k^T s_k).         (6)

As for all standard linearization methods, the matrices B_k should be invertible. Recall the well-known Sherman-Morrison formula:

1.1. For u, v in R^n the matrix I + u v^T is invertible if and only if 1 + v^T u != 0, and in that case

    (I + u v^T)^{-1} = I - u v^T / (1 + v^T u).

If in the Broyden method the matrix B_k is nonsingular, then 1.1 shows that

    B_{k+1} = B_k [ I + (B_k^{-1} F(x_{k+1})) s_k^T / (s_k^T s_k) ]   (7)

is again nonsingular, provided that

    s_k^T s_k + s_k^T B_k^{-1} F(x_{k+1}) != 0,

in which case the inverse is

    B_{k+1}^{-1} = [ I - (B_k^{-1} F(x_{k+1})) s_k^T / (s_k^T s_k + s_k^T B_k^{-1} F(x_{k+1})) ] B_k^{-1}.   (8)

With H_k = B_k^{-1} and H_{k+1} = B_{k+1}^{-1} this can be written in the form

    H_{k+1} = H_k + (s_k - H_k y_k) s_k^T H_k / (s_k^T H_k y_k).  (9)

Various convergence results for Broyden's method have been proved. We refer to the cited references and cite only a simplified version of such a result:

1.2. Let F : Omega subset R^n -> R^n be continuously differentiable on an open set Omega. Suppose that x* in Omega is a solution of F(x) = 0 where DF(x*) is invertible and

    ||DF(x) - DF(x*)|| <= gamma ||x - x*||,   x in Omega.

Then there exist delta, eta > 0 such that for ||x_0 - x*|| < delta and ||B_0 - DF(x*)|| < eta Broyden's method converges to x*, and the rate of convergence is superlinear in the sense that

    lim_{k -> inf} ||x_{k+1} - x*|| / ||x_k - x*|| = 0.           (10)
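As an illustration, the iteration (2) with the rank-one update (6) fits in a few lines of Python. The 2-D test system, the starting point, and the choice B_0 = DF(x_0) below are our own, not from the text; the numerator of the update is formed as F(x_{k+1}), which by (4)(b) equals y_k - B_k s_k.

```python
# Broyden's "good" method, eqs. (2) and (6), sketched for n = 2.
# Hypothetical test system: x0 + x1 = 3, x0^2 + x1^2 = 9.

def F(x):
    return [x[0] + x[1] - 3.0, x[0]**2 + x[1]**2 - 9.0]

def solve2(B, r):
    # Solve the 2x2 system B s = r by Cramer's rule.
    det = B[0][0]*B[1][1] - B[0][1]*B[1][0]
    return [(r[0]*B[1][1] - r[1]*B[0][1]) / det,
            (r[1]*B[0][0] - r[0]*B[1][0]) / det]

def broyden_good(F, x, B, tol=1e-10, kmax=50):
    Fx = F(x)
    for _ in range(kmax):
        s = solve2(B, [-Fx[0], -Fx[1]])      # (2): B_k s_k = -F(x_k)
        x = [x[0] + s[0], x[1] + s[1]]
        Fx = F(x)
        ss = s[0]*s[0] + s[1]*s[1]
        # (6): B_{k+1} = B_k + (y_k - B_k s_k) s_k^T / (s_k^T s_k),
        # where y_k - B_k s_k = F(x_{k+1}) by (4)(b).
        for i in range(2):
            for j in range(2):
                B[i][j] += Fx[i] * s[j] / ss
        if max(abs(c) for c in Fx) < tol:
            break
    return x

x = broyden_good(F, [2.0, 1.0], [[1.0, 1.0], [4.0, 2.0]])  # B_0 = DF(x_0)
```

Note that each B_{k+1} differs from B_k by a rank-one matrix only; this is precisely what the recursive implementation of the next section exploits to avoid forming and factoring any B_k beyond B_0.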

2 Recursive Implementation

With the notation

    w_k = B_k^{-1} F(x_{k+1}),                                    (11)

the inverse formula (8) is

    B_{k+1}^{-1} = [ I - w_k s_k^T / (s_k^T s_k + w_k^T s_k) ] B_k^{-1},   (12)

and the next step equals

    s_{k+1} = -B_{k+1}^{-1} F(x_{k+1})
            = -[ I - w_k s_k^T / (s_k^T s_k + w_k^T s_k) ] w_k
            = - (s_k^T s_k) / (s_k^T s_k + w_k^T s_k) * w_k.      (13)

From (13) it follows that

    - w_k s_k^T / (s_k^T s_k + w_k^T s_k) = s_{k+1} s_k^T / (s_k^T s_k),

whence (12) becomes

    B_{k+1}^{-1} = [ I + s_{k+1} s_k^T / (s_k^T s_k) ] B_k^{-1}
                 = Prod_{j=0}^{k} [ I + s_{j+1} s_j^T / (s_j^T s_j) ] B_0^{-1},   (14)

while (13) can be written as

    s_{k+1} = -[ I + s_{k+1} s_k^T / (s_k^T s_k) ] w_k,           (15)

that is,

    [ 1 + s_k^T w_k / (s_k^T s_k) ] s_{k+1} = -w_k.               (16)

Suppose now that the steps s_j, j = 0, 1, ..., k, and their norms have been stored. Then (14) and (16) imply that

    w_k = Prod_{j=0}^{k-1} [ I + s_{j+1} s_j^T / (s_j^T s_j) ] w,   w = B_0^{-1} F(x_{k+1}),   (17)

    s_{k+1} = - w_k / (1 + tau_k),   tau_k = s_k^T w_k / (s_k^T s_k),   (18)

which can be evaluated by the recursive algorithm

    w := B_0^{-1} F(x_{k+1});
    for j = 0, ..., k-1
        tau := (s_j^T w) / ||s_j||^2;  w := w + tau s_{j+1};
    tau := (s_k^T w) / ||s_k||^2;
    s_{k+1} := -w / (1 + tau);

In order to complete this algorithm, we need some divergence and convergence criteria. In the convergence proof a controlling quantity is the quotient

    Theta_k := ||B_k^{-1} F(x_{k+1})|| / ||s_k||,   k >= 0,       (19)

and it turns out that we should declare divergence if the condition

    Theta_k < 1/2                                                 (20)

is violated. In view of the superlinear convergence it suffices to declare convergence as soon as ||s_{k+1}|| <= tol. Altogether the Broyden algorithm can now be formulated as follows, where in contrast to (18) we work with v = -w:

    input: x_0, B_0, kmax, tol;
    solve B_0 s_0 = -F(x_0);  xi_0 := ||s_0||;  store xi_0, s_0;
    for k = 0, 1, ..., kmax
        x_{k+1} := x_k + s_k;
        solve B_0 v = -F(x_{k+1});
        if k > 0
            for j = 1, ..., k
                tau := (s_{j-1}^T v) / xi_{j-1}^2;  v := v + tau s_j;
        endif
        tau := (s_k^T v) / xi_k^2;  Theta := ||v|| / xi_k;
        if Theta >= 1/2 then return {divergence};
        s_{k+1} := v / (1 - tau);
        xi_{k+1} := ||s_{k+1}||;  store xi_{k+1}, s_{k+1};
        if xi_{k+1} <= tol then return {x* := x_{k+1} + s_{k+1}};
    return {maximal number of steps}

An implementation of this algorithm is the FORTRAN program NLEQ1 of P. Deuflhard, U. Nowak, and L. Weimann, available in the ZIB-Elib library. There exists also a Matlab version. A somewhat different Matlab program is brsol.m by C. T. Kelley [Kel03].

The recursive form of the Broyden method has shown itself to be very economical in practice. But it has been observed occasionally that the condition of the matrices may deteriorate over several steps, causing the method to become unstable. For any matrix

    A = I + u v^T / ||v||^2,   u, v in R^n,   kappa = ||u|| / ||v|| < 1,

we have ||u v^T|| <= ||u|| ||v|| and therefore

    1 - kappa <= 1 - ||u v^T|| / ||v||^2 <= ||A|| <= 1 + ||u v^T|| / ||v||^2 <= 1 + kappa.

This shows that ||A^{-1}|| <= (1 - kappa)^{-1} and

    cond(A) := ||A|| ||A^{-1}|| <= (1 + kappa) / (1 - kappa).

Hence, for the Broyden matrices (7) it follows from the convergence conditions (19), (20) that

    cond(B_{k+1}) <= (1 + Theta_k) / (1 - Theta_k) * cond(B_k) < 3 cond(B_k),

and hence that the growth of the condition numbers is not unduly fast and can be controlled by means of these estimates.

3 Linear Equations

The recursive form of the Broyden method also provides a very useful iterative method for linear problems

    A x = b,   A in GL(R^n).

In that case (5) has the form

    B_{k+1} := B_k - (b - A x_{k+1}) s_k^T / (s_k^T s_k),

and with (B_k - A) s_k = b - A x_{k+1} it follows that

    B_{k+1} - A = (B_k - A)(I - P_k),   P_k = s_k s_k^T / (s_k^T s_k).   (21)

Here I - P_k is the orthogonal projection onto the orthogonal complement of the linear space spanned by s_k. We introduce now the matrices E_j = A^{-1} B_j - I. Then it follows from (21) that ||E_{j+1}|| <= ||E_j||, j >= 0. Moreover,

    B_j s_j = b - A x_j = -A(x_j - x*),   x* = A^{-1} b,

implies that

    x_j - x* = -A^{-1} B_j s_j = -(E_j + I) s_j,

and hence that

    ( 1 - ||E_j s_j|| / ||s_j|| ) ||s_j|| <= ||x_j - x*|| <= ( 1 + ||E_j s_j|| / ||s_j|| ) ||s_j||.

Under the conditions of the local convergence theorem 1.2 one can show that lim_{j -> inf} ||E_j s_j|| / ||s_j|| = 0. This leads to the asymptotic error estimate

    ||x_j - x*|| ~ ||s_j||.

In order to smooth any possible erratic behavior, it is useful here to work with the average of several steps and to declare convergence if

    eps_j := (1/2) [ ||s_{j-1}||^2 + ||s_j||^2 + ||s_{j+1}||^2 ]^{1/2} <= beta ||x_j|| tol,   (22)

with some given safety factor beta < 1. Then the algorithm has the form:

    input: A, b, y, B_0, kmax, tau_min, beta, tol;
    r := b - A y;  solve B_0 s_0 = r;  eta_0 := s_0^T s_0;  store s_0, eta_0;
    for k = 0, 1, ..., kmax
        q := A s_k;  solve B_0 z = q;
        if k > 0
            for j = 1, ..., k
                z := z + [ (s_{j-1}^T z) / eta_{j-1} ] (s_j - s_{j-1});
        endif
        tau := eta_k / (s_k^T z);
        if |tau| < tau_min then return {restart};
        x_{k+1} := x_k + s_k;
        s_{k+1} := tau (s_k - z);  eta_{k+1} := s_{k+1}^T s_{k+1};  store s_{k+1}, eta_{k+1};
        eps := (1/2) [ eta_{k-1} + eta_k + eta_{k+1} ]^{1/2};
        if eps <= beta ||x_{k+1}|| tol then return {x* := x_{k+1} + s_{k+1}};
    return {maximal number of steps}

A FORTRAN implementation is the GBITR program in the ZIB-Elib library. Note that for the matrix A only a facility for computing the product A x, x in R^n, has to be provided.

4 Rank-Two Updates

The variety of possible methods increases considerably in the rank-two case. Many of these methods have been developed for application in optimization problems. In that case the interest centers on update formulas which preserve the symmetry of the matrices. Evidently, the direct updates should then have the form

    B_{k+1} = B_k + ( b  c ) Sigma ( b  c )^T,   Sigma = ( sigma1  sigma2 ; sigma2  sigma3 ),   b, c in R^n,   (23)

where ( b  c ) denotes the n x 2 matrix with columns b and c. Some examples show that here the matrix Sigma should be nonsingular with a negative determinant; otherwise there may be convergence problems.

Since the vectors b, c are essentially free, some suitable basis in R^2 may be chosen in which Sigma assumes a simpler form. In particular, because of det Sigma < 0 we may transform Sigma such that either sigma1 or sigma3 is zero. In fact, if, say, sigma3 != 0, then a simple calculation shows that

    Sigma = ( 1  mu ; 0  1 ) ( 0  delta ; delta  sigma3 ) ( 1  0 ; mu  1 ),   mu = (sigma2 - delta) / sigma3,   delta = (-det Sigma)^{1/2}.

Thus, there is no loss of generality to assume that sigma1 = 0 in (23). As before we use the abbreviations

    s_k = x_{k+1} - x_k,   y_k = F(x_{k+1}) - F(x_k).             (24)

The condition (4) requires y_k - B_k s_k to be in the subspace spanned by b and c, and hence it is no restriction to set b = y_k - B_k s_k. Then, for any c in R^n such that c^T s_k != 0, it follows that sigma2 = 1/(c^T s_k) and sigma3 = -(y_k - B_k s_k)^T s_k / (c^T s_k)^2. In other words, all symmetric direct update formulas with nonpositive determinant can be written in the form

    B_{k+1} = B_k + [ (y_k - B_k s_k) c^T + c (y_k - B_k s_k)^T ] / (c^T s_k)
                  - [ (y_k - B_k s_k)^T s_k / (c^T s_k)^2 ] c c^T,   (25)

provided, of course, that c^T s_k != 0.

For c = s_k, (25) becomes the Powell-symmetric-Broyden (PSB) update formula

    B_{k+1} = B_k + [ (y_k - B_k s_k) s_k^T + s_k (y_k - B_k s_k)^T ] / (s_k^T s_k)
                  - [ (y_k - B_k s_k)^T s_k / (s_k^T s_k)^2 ] s_k s_k^T   (26)

of M. J. D. Powell, while for c = y_k we obtain the Davidon-Fletcher-Powell (DFP) update formula

    B_{k+1} = B_k + [ (y_k - B_k s_k) y_k^T + y_k (y_k - B_k s_k)^T ] / (y_k^T s_k)
                  - [ (y_k - B_k s_k)^T s_k / (y_k^T s_k)^2 ] y_k y_k^T   (27)

given by W. C. Davidon and independently by R. Fletcher and M. J. D. Powell.

Instead of working with the direct update (23) we may consider updating the inverses H_k = B_k^{-1} such that H_{k+1} - H_k has rank two. Here we can begin with H_{k+1} - H_k in a form analogous with (23) and then proceed as before. We will not go into details, but mention only one of the formulas that can be obtained in this way. It was independently suggested by C. G. Broyden, R. Fletcher, D. Goldfarb, and D. F. Shanno, and is generally called the BFGS formula, reflecting the first letters of the four authors' names:

    H_{k+1} = ( I - s_k y_k^T / (y_k^T s_k) ) H_k ( I - y_k s_k^T / (y_k^T s_k) ) + s_k s_k^T / (y_k^T s_k).   (28)

This is widely considered the most effective update formula for minimization problems. As before, we can apply here the Sherman-Morrison formula 1.1 and then obtain the direct-update form of the BFGS update

    B_{k+1} = B_k + y_k y_k^T / (y_k^T s_k) - B_k s_k (B_k s_k)^T / (s_k^T B_k s_k).   (29)
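The pair of BFGS formulas is easy to verify numerically. The following Python sketch (the test matrix and vectors are our own choices) checks that the direct update satisfies the quasi-Newton condition B_{k+1} s_k = y_k, that the inverse update satisfies H_{k+1} y_k = s_k, and that the two updates remain exact inverses of one another when H_k = B_k^{-1}.

```python
# Consistency check of the BFGS updates (28) and (29) for n = 2.

def dot(a, b): return a[0]*b[0] + a[1]*b[1]
def matvec(M, v): return [M[0][0]*v[0] + M[0][1]*v[1],
                          M[1][0]*v[0] + M[1][1]*v[1]]
def outer(a, b): return [[a[0]*b[0], a[0]*b[1]], [a[1]*b[0], a[1]*b[1]]]
def madd(A, B, c): return [[A[i][j] + c*B[i][j] for j in range(2)] for i in range(2)]
def mmul(A, B): return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)]
                        for i in range(2)]

def bfgs_direct(B, s, y):
    # (29): B + y y^T/(y^T s) - (B s)(B s)^T/(s^T B s)
    ys, Bs = dot(y, s), matvec(B, s)
    return madd(madd(B, outer(y, y), 1.0/ys), outer(Bs, Bs), -1.0/dot(s, Bs))

def bfgs_inverse(H, s, y):
    # (28): (I - s y^T/(y^T s)) H (I - y s^T/(y^T s)) + s s^T/(y^T s)
    ys = dot(y, s)
    I = [[1.0, 0.0], [0.0, 1.0]]
    L = madd(I, outer(s, y), -1.0/ys)
    R = madd(I, outer(y, s), -1.0/ys)
    return madd(mmul(mmul(L, H), R), outer(s, s), 1.0/ys)

B = [[2.0, 0.5], [0.5, 1.0]]                        # symmetric positive definite
H = [[1.0/1.75, -0.5/1.75], [-0.5/1.75, 2.0/1.75]]  # H = B^{-1}, det B = 1.75
s, y = [1.0, 2.0], [3.0, 1.0]                       # y^T s = 5 > 0
B1, H1 = bfgs_direct(B, s, y), bfgs_inverse(H, s, y)
```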

5 The BFGS Method in Optimization

Extremal problems are of foremost importance in almost all applications of mathematics. Many boundary value problems of mathematical physics may be phrased as variational problems. For instance, holonomic equilibrium problems in Lagrangian mechanics derive from the minimization of a suitable energy function. Similarly, the determination of a geodesic between two points on a manifold is a minimization problem, and so are optimal control problems in engineering, or problems involving the optimal determination of unknown parameters of a technical process.

There are close connections between such extremal problems and the solution of nonlinear equations, as is readily seen in the finite-dimensional case. Let g : Omega subset R^n -> R^1 be some functional on some set Omega. A point x* in Omega is a local minimizer of g in Omega if there exists an open neighborhood U of x* in R^n such that

    g(x) >= g(x*),   x in U intersect Omega,                      (30)

and a global minimizer on Omega if the inequality (30) holds for all x in Omega. A point x* in the interior int(Omega) of Omega is a critical point of g if g has a derivative at x* and Dg(x*) = 0. A well-known result states that if x* in int(Omega) is a local minimizer where g is differentiable, then x* is a critical point of g. Of course, a critical point need not be a local minimizer. But if g has a continuous second derivative at a critical point x* in int(Omega) and the Hessian matrix D^2 g(x*) is positive definite, then x* is a proper local minimizer; that is, strict inequality holds in (30) for all x in U intersect Omega, x != x*. Conversely, at a local minimizer x*, D^2 g(x*) is positive semi-definite.

For a differentiable functional g : Omega subset R^n -> R^1 we call the transposed first derivative grad g(x) = Dg(x)^T in R^n the gradient of g at x in Omega. The problem of finding critical points of g is precisely that of solving the gradient system

    grad g(x) = 0,   x in Omega.                                  (31)

Conversely, a differentiable mapping F : Omega subset R^n -> R^n is called a gradient or potential mapping on Omega if there exists a differentiable functional g : Omega subset R^n -> R^1 such that F(x) = grad g(x) for all x in Omega. A continuously differentiable mapping F on an open convex set Omega is a gradient mapping on Omega if and only if DF(x) is symmetric for all x in Omega. This is called Kerner's theorem. For any gradient mapping the problem of solving F(x) = 0 may be replaced by that of minimizing the functional g, provided, of course, we keep in mind that a local minimizer of g need not be a critical point, nor that a critical point is necessarily a minimizer.

Let g : Omega subset R^n -> R^1 be a (sufficiently smooth) functional for which we want to compute a minimizer. Many of the iterative methods for this purpose have the general form

    x_{k+1} = x_k - lambda_k d_k,   k = 0, 1, ...,                (32)

involving a direction vector d_k in R^n and a steplength lambda_k >= 0 chosen such that

    g(x_k) > g(x_{k+1}),   k = 0, 1, ....                         (33)

Obviously, it will not suffice to ensure only a decrease of the value of g; we must require that the decrease (33) is sufficiently large. Thus, at the k-th step of the methods the major tasks are the selection of a suitable direction vector d_k and the construction of an appropriate steplength lambda_k. The literature in this area is very extensive; see, e.g., [Kel99] and [Rhe98] for an introduction and further references.

Clearly, given a current point x in Omega, we want to use a (nonzero) direction vector d such that for some delta > 0 we have g(x - td) <= g(x) for t in [0, delta). From

    lim_{t -> 0} (1/t) [ g(x) - g(x - td) ] = Dg(x) d

it follows that, in order for this to hold, it is sufficient that Dg(x) d > 0 and necessary that Dg(x) d >= 0. Accordingly, we call a vector d != 0 an admissible direction of g at a point x if Dg(x) d > 0.

In accordance with the linearization methods (2) we consider now methods of the form

    x_{k+1} = x_k - lambda_k B_k^{-1} grad g(x_k),   k = 0, 1, ....   (34)

Hence the direction vectors are here

    d_k := B_k^{-1} grad g(x_k),   k = 0, 1, ....                 (35)

If the matrices B_k are assumed to be symmetric, positive definite, then we have

    Dg(x_k) d_k = Dg(x_k) B_k^{-1} grad g(x_k) = (grad g(x_k))^T B_k^{-1} grad g(x_k) > 0   if grad g(x_k) != 0,   (36)

that is, the directions (35) are admissible. This is the reason why in Section 4 the emphasis was placed on the construction of update formulas that preserve symmetry. Actually, many of these update methods also preserve positive definiteness. In particular, this holds for the BFGS formula:

5.1. With the abbreviations (24) suppose that B_k is symmetric, positive definite, and that y_k^T s_k > 0. Then B_{k+1} given by (29) is also symmetric, positive definite.

Proof. By (28) we have

    H_{k+1} = ( I - s_k y_k^T / (y_k^T s_k) ) H_k ( I - y_k s_k^T / (y_k^T s_k) ) + s_k s_k^T / (y_k^T s_k).   (37)

Under the stated conditions the Cauchy-Schwarz inequality in the inner product induced by B_k gives

    (z^T B_k s_k)^2 <= (s_k^T B_k s_k)(z^T B_k z),   z in R^n,

with equality only if z and s_k are linearly dependent. Moreover, it follows from (29) that

    z^T B_{k+1} z = z^T B_k z + (z^T y_k)^2 / (y_k^T s_k) - (z^T B_k s_k)^2 / (s_k^T B_k s_k),

whence, for z != 0,

    z^T B_{k+1} z >= (z^T y_k)^2 / (y_k^T s_k) >= 0,

and the two inequalities cannot both hold with equality, so that z^T B_{k+1} z > 0, as claimed.
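The assertion of 5.1 can also be observed numerically: a single direct update (29) applied to a symmetric positive definite B_k with y_k^T s_k > 0 again yields a symmetric matrix with positive leading principal minors. The 2 x 2 test data below are our own choices.

```python
# Illustration of 5.1: positive definiteness is preserved by (29).

def dot(a, b): return a[0]*b[0] + a[1]*b[1]
def matvec(M, v): return [M[0][0]*v[0] + M[0][1]*v[1],
                          M[1][0]*v[0] + M[1][1]*v[1]]

def bfgs_direct(B, s, y):
    # (29): B + y y^T/(y^T s) - (B s)(B s)^T/(s^T B s)
    Bs, ys = matvec(B, s), dot(y, s)
    sBs = dot(s, Bs)
    return [[B[i][j] + y[i]*y[j]/ys - Bs[i]*Bs[j]/sBs for j in range(2)]
            for i in range(2)]

B = [[3.0, 1.0], [1.0, 2.0]]     # symmetric positive definite, det = 5
s, y = [1.0, -1.0], [2.0, 1.0]   # y^T s = 1 > 0
B1 = bfgs_direct(B, s, y)
```

For a 2 x 2 symmetric matrix, positivity of the (1,1) entry and of the determinant (Sylvester's criterion) is equivalent to positive definiteness; the quasi-Newton condition B_{k+1} s_k = y_k holds as well.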

A step of a descent method of the form (34) with the BFGS update formula now has the generic form:

    Compute the search direction d_k = H_k grad g(x_k);
    Determine a suitable lambda_k > 0 such that g(x_k) - g(x_k - lambda_k d_k) > 0;
    s_k = -lambda_k d_k;  x_{k+1} = x_k + s_k;
    y_k = grad g(x_{k+1}) - grad g(x_k);
    If y_k^T s_k <= 0 then return;
    Update H_k to H_{k+1} by means of the BFGS formula (28).

Numerous algorithms have been proposed for constructing an acceptable steplength lambda_k. One of the simplest is the so-called Armijo rule, where we search along the line t > 0 -> x_k - t d_k for a point such that

    g(x_k) - g(x_k - t d_k) > t alpha ||grad g(x_k)||^2,          (38)

where, say, alpha = 10^{-4}. More specifically, we use a backtracking approach and test (38) first with t = 1 and then with successively smaller t = beta^j, j = 0, 1, ..., j_max, where 0 < beta < 1. In other words, the algorithm has the generic form:

    input: g, x_k, d_k, alpha, beta, j_max;
    g_k = g(x_k);  p = grad g(x_k);  gamma = alpha ||p||^2;  t = 1;
    for j = 0 : j_max
        if g_k - g(x_k - t d_k) > t gamma then return {lambda_k = t};
        t = beta t;
    return {failure}

For the implementation of the overall algorithm one has to decide on the storage of all needed data and on a strategy for a more effective handling of the error case y_k^T s_k <= 0. These issues are discussed, e.g., in Chapter 4 of [Kel99]. The simplest approach is to store the entire matrix H_k, which then allows for the computation of the update once the vectors s_k and y_k are available. Clearly, this is costly in storage for large dimensions. A second possibility is to store the sequences {s_k} and {y_k} and then to recompute the matrices recursively by means of (29) when they are needed. It turns out that with only a modest increase in complexity the required storage can be decreased to one vector per iteration step. We will not enter into the details, but refer to the discussion in Section 4.2.1 of [Kel99], where a Matlab implementation bfgsopt involving the above Armijo algorithm is also given. Certainly the BFGS updates are not the only possible choice.
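Putting the pieces together, a bare-bones BFGS minimizer with Armijo backtracking fits in well under a hundred lines of Python for n = 2. The test functional, starting point, and parameter values below are our own choices; the inverse update follows (28) and is skipped whenever y_k^T s_k <= 0, in line with 5.1.

```python
# BFGS with the Armijo rule (38), following (34) and (28), for n = 2.
# Hypothetical test functional with minimizer at (1, -0.5).

def g(x):    return (x[0] - 1.0)**2 + 2.0*(x[1] + 0.5)**2
def grad(x): return [2.0*(x[0] - 1.0), 4.0*(x[1] + 0.5)]

def dot(a, b): return a[0]*b[0] + a[1]*b[1]
def matvec(M, v): return [M[0][0]*v[0] + M[0][1]*v[1],
                          M[1][0]*v[0] + M[1][1]*v[1]]
def outer(a, b): return [[a[0]*b[0], a[0]*b[1]], [a[1]*b[0], a[1]*b[1]]]
def madd(A, B, c): return [[A[i][j] + c*B[i][j] for j in range(2)] for i in range(2)]
def mmul(A, B): return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)]
                        for i in range(2)]

def bfgs_inverse(H, s, y):
    # (28): (I - s y^T/(y^T s)) H (I - y s^T/(y^T s)) + s s^T/(y^T s)
    ys = dot(y, s)
    I = [[1.0, 0.0], [0.0, 1.0]]
    L, R = madd(I, outer(s, y), -1.0/ys), madd(I, outer(y, s), -1.0/ys)
    return madd(mmul(mmul(L, H), R), outer(s, s), 1.0/ys)

def armijo(x, d, p, alpha=1e-4, beta=0.5, jmax=40):
    # Backtracking line search for condition (38); p = grad g(x).
    gx, gamma, t = g(x), alpha*dot(p, p), 1.0
    for _ in range(jmax):
        if gx - g([x[0] - t*d[0], x[1] - t*d[1]]) > t*gamma:
            return t
        t *= beta
    return 0.0                                  # failure

def bfgs_min(x, kmax=500, tol=1e-9):
    H, p = [[1.0, 0.0], [0.0, 1.0]], grad(x)
    for _ in range(kmax):
        if dot(p, p) ** 0.5 < tol:
            break
        d = matvec(H, p)                        # (35): d_k = H_k grad g(x_k)
        lam = armijo(x, d, p)
        s = [-lam*d[0], -lam*d[1]]              # (34): x_{k+1} = x_k - lam_k d_k
        x = [x[0] + s[0], x[1] + s[1]]
        p_new = grad(x)
        y = [p_new[0] - p[0], p_new[1] - p[1]]
        if dot(y, s) > 0.0:                     # update only if (28) stays s.p.d. (5.1)
            H = bfgs_inverse(H, s, y)
        p = p_new
    return x

x_star = bfgs_min([3.0, 2.0])
```

The guard on y_k^T s_k is the crude "skip the update" strategy; the more effective remedies mentioned above (and limited-memory storage of the pairs {s_k}, {y_k}) are discussed in [Kel99].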
In fact, numerous other software packages implementing quasi-Newton methods for minimization problems have been written; see, e.g., [MW93].

References

[Deu04] P. Deuflhard, Newton Methods for Nonlinear Problems, Springer-Verlag, Heidelberg, New York, 2004.

[DS98] J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Classics in Applied Mathematics, Vol. 16, SIAM Publications, Philadelphia, PA, 1998. Originally published by Prentice Hall, 1983.

[Kel03] C. T. Kelley, Solving Nonlinear Equations with Newton's Method, Fundamentals of Algorithms, SIAM Publications, Philadelphia, PA, 2003.

[Kel95] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, Frontiers in Appl. Math., Vol. 16, SIAM Publications, Philadelphia, PA, 1995.

[Kel99] C. T. Kelley, Iterative Methods for Optimization, Frontiers in Appl. Math., Vol. 18, SIAM Publications, Philadelphia, PA, 1999.

[MW93] J. J. Moré and S. J. Wright, Optimization Software Guide, SIAM Publications, Philadelphia, PA, 1993.

[OR00] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Classics in Applied Mathematics, Vol. 30, SIAM Publications, Philadelphia, PA, 2000. Originally published by Academic Press, 1970; Russian translation 1976, Chinese translation 1982.

[Rhe98] W. C. Rheinboldt, Methods for Solving Systems of Nonlinear Equations, Regional Conf. Series in Appl. Math., Vol. 70, SIAM Publications, Philadelphia, PA, 1998.