Handout on Newton's Method for Systems

The following summarizes the main points of our class discussion of Newton's method for approximately solving a system of nonlinear equations $F(x) = 0$, $F : \mathbb{R}^n \to \mathbb{R}^n$.

Conventions: Notation for the nonlinear system is
\[
x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad
F(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_n(x) \end{pmatrix}, \qquad
F'(x) = \begin{pmatrix}
\partial f_1(x)/\partial x_1 & \cdots & \partial f_1(x)/\partial x_n \\
\vdots & & \vdots \\
\partial f_n(x)/\partial x_1 & \cdots & \partial f_n(x)/\partial x_n
\end{pmatrix}.
\]
Subscripts are used to denote vector components and matrix entries. Superscripts are used to denote members of sequences, e.g., $x^{(0)}$ is the initial member of $\{x^{(k)}\}$. The norm $\|\cdot\|$ is an arbitrary norm of interest. The phrase "$x$ is sufficiently near $x_*$" means that $\|x - x_*\|$ is sufficiently small. Similarly, "$x$ is near $x_*$" means that $\|x - x_*\|$ is appropriately small.

Newton's method. The basic method is

Newton's Method: Given an initial $x$,
    Iterate: Update $x \leftarrow x - F'(x)^{-1} F(x)$.

A more appropriate framework for practical implementation is

Newton's Method: Given an initial $x$, evaluate $F(x)$.
    Iterate:
        Solve $F'(x)\,s = -F(x)$ for the step $s$.
        Update $x \leftarrow x + s$ and evaluate $F(x)$.
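As a concrete illustration (not part of the original handout), the following is a minimal Python sketch of this practical framework. The test problem, tolerance, and iteration cap are arbitrary choices, and numpy.linalg.solve stands in for whatever linear solver one would use for $F'(x)\,s = -F(x)$.

```python
import numpy as np

def newton_system(F, J, x, tol=1e-10, max_iter=50):
    """Basic Newton iteration: solve F'(x) s = -F(x), then update x <- x + s."""
    Fx = F(x)                               # evaluate F at the initial x
    for _ in range(max_iter):
        s = np.linalg.solve(J(x), -Fx)      # Newton step from the linear system
        x = x + s                           # update the approximate solution
        Fx = F(x)                           # re-evaluate F at the new point
        if np.linalg.norm(Fx) <= tol:       # stop once the residual is small
            break
    return x

# Illustrative 2x2 system: f1 = x0^2 + x1^2 - 1, f2 = x0 - x1.
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2.0*x[0], 2.0*x[1]], [1.0, -1.0]])
print(newton_system(F, J, np.array([2.0, 0.5])))   # -> approximately (0.7071, 0.7071)
```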
The following is our basic local convergence theorem for Newton's method.

Theorem 1: Suppose that $F$ is continuously differentiable near $x_* \in \mathbb{R}^n$ such that $F(x_*) = 0$ and $F'(x_*)$ is non-singular. Then whenever $x^{(0)}$ is sufficiently near $x_*$, the Newton iterates $\{x^{(k)}\}$ converge to $x_*$ superlinearly, i.e.,
\[
\|x^{(k+1)} - x_*\| \le \beta_k\,\|x^{(k)} - x_*\|, \qquad k = 0, 1, \ldots,
\]
where $\beta_k \to 0$. If $F'$ also satisfies an inequality
\[
\|F'(x) - F'(x_*)\| \le L\,\|x - x_*\| \tag{1}
\]
for $x$ near $x_*$, then the convergence is quadratic, i.e.,
\[
\|x^{(k+1)} - x_*\| \le \beta\,\|x^{(k)} - x_*\|^2, \qquad k = 0, 1, \ldots,
\]
for a constant $\beta$ independent of $k$.

Remark: The property (1) is called Lipschitz continuity of $F'$ at $x_*$. A proof of local quadratic convergence assuming (1) is given in [1, Th. 5.2.1]. With only a little effort, this can be extended to a proof of local superlinear convergence assuming only continuity of $F'$ near $x_*$.

Newton's method with backtracking. In this, we augment Newton's method with a globalization procedure that tests each step for adequate progress toward a solution and, if necessary, modifies it to obtain a step that gives adequate progress. The globalization considered here is backtracking: at the current approximate solution $x$, the procedure begins with the Newton step $s^N = -F'(x)^{-1} F(x)$ and shortens it, if necessary, to obtain an acceptable step $s = \lambda s^N$ for some $\lambda \in (0, 1]$. Our test for adequate progress is based on the actual reduction in $\|F\|$ and the predicted reduction in $\|F\|$, given, respectively, by
\[
\text{ared} = \|F(x)\| - \|F(x+s)\|, \qquad \text{pred} = \|F(x)\| - \|F(x) + F'(x)\,s\|.
\]
We accept a step $s$ from the current approximate solution $x$ if
\[
\text{ared} \ge t \cdot \text{pred} > 0 \tag{2}
\]
for a prescribed $t \in (0, 1)$. The following proposition confirms that a sufficiently short step obtained by backtracking will be acceptable.

Proposition 2: If $F$ is differentiable at $x$ and $F(x) \ne 0$, then a step $s = \lambda s^N$ satisfies (2) for all sufficiently small $\lambda > 0$.

Proof. Note that if $s = \lambda s^N$ and $0 < \lambda \le 1$, then
\[
\begin{aligned}
\text{pred} &= \|F(x)\| - \|F(x) + F'(x)\,s\| \\
&= \|F(x)\| - \|F(x) + F'(x)(\lambda s^N)\| \\
&= \|F(x)\| - \|(1 - \lambda) F(x) + \lambda\,[F(x) + F'(x)\,s^N]\| \\
&= \|F(x)\| - (1 - \lambda)\,\|F(x)\| \\
&= \lambda\,\|F(x)\|. 
\end{aligned} \tag{3}
\]
To justify the next-to-last equality in (3), we note that $F(x) + F'(x)\,s^N = 0$ since $s^N = -F'(x)^{-1} F(x)$, and that $\|(1 - \lambda) F(x)\| = (1 - \lambda)\,\|F(x)\|$ since $1 - \lambda \ge 0$. Then
\[
\begin{aligned}
\text{ared} &= \|F(x)\| - \|F(x+s)\| \\
&= \|F(x)\| - \|F(x) + F'(x)\,s + o(\|s\|)\| \\
&\ge \|F(x)\| - \|F(x) + F'(x)\,s\| + o(\|s\|) \\
&= \text{pred} + o(\|\lambda s^N\|) = \lambda\,\|F(x)\| + o(\lambda).
\end{aligned}
\]
It follows that if $F(x) \ne 0$ and $t \in (0, 1)$, then $\text{ared} \ge t\,(\lambda\,\|F(x)\|) = t \cdot \text{pred}$ for all sufficiently small $\lambda > 0$. $\square$
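As a small illustration (again not from the handout), the acceptance test (2) translates directly into code. Here Fx and Jx denote precomputed values of $F(x)$ and $F'(x)$, and the Euclidean norm is an illustrative choice.

```python
import numpy as np

def step_acceptable(F, x, Fx, Jx, s, t):
    """Sufficient-decrease test (2): accept s if ared >= t * pred > 0."""
    ared = np.linalg.norm(Fx) - np.linalg.norm(F(x + s))       # actual reduction
    pred = np.linalg.norm(Fx) - np.linalg.norm(Fx + Jx @ s)    # predicted reduction
    return ared >= t * pred and pred > 0.0
```

Note that, by (3), for $s = \lambda s^N$ one has $\text{pred} = \lambda\,\|F(x)\| > 0$ automatically whenever $F(x) \ne 0$.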
Our first method is the following somewhat general formulation.

Newton's Method with Backtracking: Given $t \in (0,1)$, $0 < \theta_{\min} < \theta_{\max} < 1$, and an initial $x$, evaluate $F(x)$.
    Iterate:
        Solve $F'(x)\,s = -F(x)$ for the initial step $s$ and evaluate $F(x+s)$.
        While $\text{ared} < t \cdot \text{pred}$ do:
            Choose $\theta \in [\theta_{\min}, \theta_{\max}]$.
            Update $s \leftarrow \theta s$ and re-evaluate $F(x+s)$.
        Update $x \leftarrow x + s$ and $F(x) \leftarrow F(x+s)$.

The backtracking globalization is implemented in the while loop. At each pass through the loop, the step $s$ is shortened by a factor $\theta \in [\theta_{\min}, \theta_{\max}]$, where $0 < \theta_{\min} < \theta_{\max} < 1$. This is known as safeguarded backtracking. The requirement $\theta \le \theta_{\max} < 1$ ensures that at each pass the step length is multiplied by at most $\theta_{\max}$, and it follows from Proposition 2 that an acceptable step will be determined after at most a finite number of passes through the loop. The requirement $0 < \theta_{\min} \le \theta$ ensures that step lengths will not be reduced so much that the iterates cannot converge to a solution. (A Python sketch of this method appears after the convergence discussion below.)

The following is the global convergence result for the method.

Theorem 3 [2, Cor. 6.2]: Suppose that $F$ is continuously differentiable and that $\{x^{(k)}\}$ is a sequence of iterates produced by the method. If $x_*$ is a limit point$^1$ of $\{x^{(k)}\}$ such that $F'(x_*)$ is non-singular, then $F(x_*) = 0$, $x^{(k)} \to x_*$, and $s^{(k)} \equiv x^{(k+1)} - x^{(k)} = -F'(x^{(k)})^{-1} F(x^{(k)})$ for all sufficiently large $k$.

$^1$ We say $x_*$ is a limit point of $\{x^{(k)}\}$ if, for every $\delta > 0$, there are infinitely many $x^{(k)}$ such that $\|x^{(k)} - x_*\| < \delta$. Note that if $\{x^{(k)}\}$ is bounded, i.e., there exists an $M$ such that $\|x^{(k)}\| \le M$ for all $k$, then $\{x^{(k)}\}$ converges to $x_*$ if and only if $x_*$ is the only limit point of $\{x^{(k)}\}$.

Note that the theorem does not guarantee that the iterates will always converge to a solution. (Indeed, there can be no such guarantee; some problems have no solutions!) Rather, it asserts only that the iterates will behave about as desirably as the function $F$ allows. Another way of stating the result, which may offer additional insight, is that exactly one of the following must hold:

(i) $\|x^{(k)}\| \to \infty$;
(ii) $\{x^{(k)}\}$ has one or more limit points, and $F'$ is singular at each of them;
(iii) $\{x^{(k)}\}$ converges to a solution $x_*$ such that $F'(x_*)$ is non-singular, and the iterates are ultimately those of Newton's method.

In the case of alternative (i), the iterates diverge. In the case of (ii), the iterates may or may not converge, depending on additional properties of $F$. Alternative (iii) is the desirable outcome; in this case the iterates converge to a solution, ultimately with the speed of Newton iterates (at least superlinearly and typically quadratically).
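The following is one possible Python rendering of the backtracking method stated above (a sketch, not the handout's code). For simplicity it uses a fixed reduction factor $\theta = 1/2 \in [\theta_{\min}, \theta_{\max}]$ and, by (3), evaluates pred as $\lambda\,\|F(x)\|$. The iteration cap and the floor on $\lambda$ are practical safeguards not present in the stated method.

```python
import numpy as np

def newton_backtracking(F, J, x, t=1e-4, theta=0.5, tol=1e-10, max_iter=100):
    """Newton's method with safeguarded backtracking (fixed theta for simplicity)."""
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) <= tol:
            break
        s = np.linalg.solve(J(x), -Fx)      # full Newton step s^N
        norm_Fx = np.linalg.norm(Fx)
        F_trial = F(x + s)
        lam = 1.0
        # Backtrack while ared < t * pred, where ared = ||F(x)|| - ||F(x+s)||
        # and, by (3), pred = lam * ||F(x)||.  The floor on lam is a safeguard.
        while norm_Fx - np.linalg.norm(F_trial) < t * lam * norm_Fx and lam > 1e-12:
            s = theta * s                   # shorten the step: s <- theta * s
            lam = theta * lam
            F_trial = F(x + s)
        x = x + s                           # accept the step
        Fx = F_trial
    return x
```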
We now work toward a more refined version of the method. With $s = \lambda s^N$ for $\lambda \in (0, 1]$ and with $\text{pred} = \lambda\,\|F(x)\|$ by (3), the condition $\text{ared} < t \cdot \text{pred}$ can be simplified to
\[
\|F(x+s)\| \,/\, \|F(x)\| > 1 - t\lambda.
\]
Also, we can make a more sophisticated choice of each $\theta \in [\theta_{\min}, \theta_{\max}]$ in an important (and common) special case: that in which the norm is an inner-product norm, i.e., $\|v\| = \langle v, v \rangle^{1/2}$ for all $v \in \mathbb{R}^n$, where $\langle \cdot, \cdot \rangle$ is an inner product on $\mathbb{R}^n$.$^2$ Then, in the while loop, we can choose each $\theta$ to minimize over $[\theta_{\min}, \theta_{\max}]$ a quadratic $p(\theta) = a + b\theta + c\theta^2$ that satisfies
\[
p(0) = \|F(x)\|^2, \qquad p(1) = \|F(x+s)\|^2, \qquad
p'(0) = \frac{d}{d\theta}\,\|F(x+\theta s)\|^2 \Big|_{\theta = 0} = 2\,\langle F(x), F'(x)\,s \rangle.
\]
The quadratic satisfying these conditions is
\[
p(\theta) = \|F(x)\|^2 + 2\,\langle F(x), F'(x)\,s \rangle\,\theta + \left\{ \|F(x+s)\|^2 - \|F(x)\|^2 - 2\,\langle F(x), F'(x)\,s \rangle \right\}\theta^2.
\]
Writing $s = \lambda s^N$ and noting that $F'(x)\,s = \lambda F'(x)\,s^N = -\lambda F(x)$, we have $\langle F(x), F'(x)\,s \rangle = -\lambda\,\|F(x)\|^2$ and
\[
p(\theta) = \|F(x)\|^2 \left[\, 1 - 2\lambda\theta + \left\{ \|F(x+s)\|^2 / \|F(x)\|^2 - 1 + 2\lambda \right\}\theta^2 \,\right].
\]
We have that $p'(\theta) = 0$ if and only if
\[
\theta = \lambda \,\big/\, \left\{ \|F(x+s)\|^2 / \|F(x)\|^2 - 1 + 2\lambda \right\},
\]
and this $\theta$ minimizes $p$ if
\[
p''(\theta) = 2\,\|F(x)\|^2 \left\{ \|F(x+s)\|^2 / \|F(x)\|^2 - 1 + 2\lambda \right\} > 0.
\]

$^2$ An inner product on $\mathbb{R}^n$ is a function $\langle \cdot, \cdot \rangle$ from pairs of vectors $(u, v)$ to scalars in $\mathbb{R}$ that satisfies (a) $\langle v, v \rangle \ge 0$ for all $v \in \mathbb{R}^n$, with $\langle v, v \rangle = 0$ if and only if $v = 0$; and, for all $u \in \mathbb{R}^n$ and $v \in \mathbb{R}^n$, (b) $\langle u, v \rangle = \langle v, u \rangle$, (c) $\langle \alpha u, v \rangle = \alpha\,\langle u, v \rangle$ for all $\alpha \in \mathbb{R}$, and (d) $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$ for all $w \in \mathbb{R}^n$. The most familiar example is the Euclidean inner product (the usual dot product), given by $\langle u, v \rangle = \sum_{i=1}^n u_i v_i$.
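In code, the safeguarded choice of $\theta$ derived above might look as follows (a sketch, with names of my choosing). Here rho is $\|F(x+s)\| / \|F(x)\|$ and lam is the current step-length fraction $\lambda$, matching the quantities used in the refined method that follows.

```python
def choose_theta(rho, lam, theta_min=0.1, theta_max=0.5):
    """Safeguarded minimizer of the interpolating quadratic p(theta)."""
    delta = rho**2 - 1.0 + 2.0*lam      # positive multiple of p''(theta)
    if delta <= 0.0:
        return theta_max                # p has no interior minimizer; fall back
    theta = lam / delta                 # unconstrained minimizer of p
    return min(max(theta, theta_min), theta_max)   # clip to [theta_min, theta_max]
```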
These observations lead to the following more refined method.

Newton's Method with Backtracking: Given $t \in (0,1)$, $0 < \theta_{\min} < \theta_{\max} < 1$, and an initial $x$, evaluate $F(x)$.
    Iterate:
        Solve $F'(x)\,s = -F(x)$ for the initial step $s$.
        Evaluate $F(x+s)$ and set $\lambda = 1$.
        While $\rho \equiv \|F(x+s)\| \,/\, \|F(x)\| > 1 - t\lambda$ do:
            If $\delta \equiv \rho^2 - 1 + 2\lambda \le 0$, set $\theta = \theta_{\max}$.
            Else do:
                Set $\theta = \lambda/\delta$.
                If $\theta > \theta_{\max}$, set $\theta \leftarrow \theta_{\max}$.
                If $\theta < \theta_{\min}$, set $\theta \leftarrow \theta_{\min}$.
            Update $s \leftarrow \theta s$, $\lambda \leftarrow \theta\lambda$, and re-evaluate $F(x+s)$.
        Update $x \leftarrow x + s$ and $F(x) \leftarrow F(x+s)$.

Remarks: Common practical recommendations are to take $t = 10^{-4}$, $\theta_{\min} = 1/10$, and $\theta_{\max} = 1/2$. An additional refinement can be added to the backtracking, as follows: after the first step-length reduction in the while loop, there is enough information about $\|F(x+\theta s)\|$ to construct a cubic interpolating polynomial, and one can choose $\theta$ to minimize this cubic over $[\theta_{\min}, \theta_{\max}]$. See [1, Ch. 6] for details.

References.

1. J. E. Dennis, Jr., and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Classics in Applied Mathematics, SIAM, Philadelphia, 1996; originally published in Series in Automatic Computation, Prentice-Hall, Englewood Cliffs, NJ, 1983.

2. S. C. Eisenstat and H. F. Walker, Globally convergent inexact Newton methods, SIAM J. Optimization, 4 (1994), pp. 393-422.