Numerical Methods for Large-Scale Nonlinear Systems


Handouts by Ronald H.W. Hoppe

following the monograph

P. Deuflhard, Newton Methods for Nonlinear Problems, Springer, Berlin-Heidelberg-New York, 2004

1. Classical Newton Convergence Theorems

1.1 Classical Newton-Kantorovich Theorem

Theorem 1.1 (Classical Newton-Kantorovich Theorem)
Let X and Y be Banach spaces, D ⊂ X a convex subset, and suppose that F : D ⊂ X → Y is continuously Fréchet differentiable on D with an invertible Fréchet derivative F'(x^0) for some initial guess x^0 ∈ D. Assume further that the following conditions hold true:
\[ \|F'(x^0)^{-1}F(x^0)\| \le \alpha, \tag{1.1} \]
\[ \|F'(y)-F'(x)\| \le \gamma\,\|y-x\|, \quad x,y \in D, \tag{1.2} \]
\[ h_0 := \alpha\,\gamma\,\|F'(x^0)^{-1}\| < \tfrac12, \tag{1.3} \]
\[ \bar B(x^0,\rho_0) \subset D, \quad \rho_0 := \frac{1-\sqrt{1-2h_0}}{\gamma\,\|F'(x^0)^{-1}\|}. \tag{1.4} \]
Then, for the sequence {x^k}, k ∈ ℕ_0, of Newton iterates
\[ F'(x^k)\,\Delta x^k = -F(x^k), \qquad x^{k+1} = x^k + \Delta x^k, \]
there holds:
(i) F'(x) is invertible for all Newton iterates x = x^k, k ∈ ℕ_0,
(ii) the sequence {x^k}, k ∈ ℕ, of Newton iterates is well defined with x^k ∈ B̄(x^0,ρ_0), k ∈ ℕ_0, and x^k → x^* ∈ B̄(x^0,ρ_0) (k → ∞), where F(x^*) = 0,
(iii) the convergence x^k → x^* (k → ∞) is quadratic,
(iv) the solution x^* of F(x) = 0 is unique in B̄(x^0,ρ_0) ∪ (D ∩ B(x^0,\bar ρ_0)), \bar ρ_0 := \frac{1+\sqrt{1-2h_0}}{\gamma\,\|F'(x^0)^{-1}\|}.

Proof. We have ‖F'(x^k) − F'(x^0)‖ ≤ γ‖x^k − x^0‖ ≤ t_k for some upper bound t_k, k ∈ ℕ. If we can prove x^k ∈ B̄(x^0,ρ_0) and \bar t_k := ‖F'(x^0)^{-1}‖ t_k < 1, k ∈ ℕ, then by the Banach perturbation lemma F'(x^k) is invertible with
\[ \|F'(x^k)^{-1}\| \le \frac{\|F'(x^0)^{-1}\|}{1-\|F'(x^0)^{-1}\|\,\|F'(x^k)-F'(x^0)\|} \le \frac{\|F'(x^0)^{-1}\|}{1-\bar t_k} =: \beta_k. \tag{1.5} \]
We prove x^k ∈ B̄(x^0,ρ_0) and \bar t_k < 1, k ∈ ℕ, by induction on k. For k = 1 we have
\[ \|x^1-x^0\| = \|F'(x^0)^{-1}F(x^0)\| \le \alpha = \frac{h_0}{\gamma\,\|F'(x^0)^{-1}\|} < \frac{1-\sqrt{1-2h_0}}{\gamma\,\|F'(x^0)^{-1}\|} = \rho_0, \]
since h_0 < 1 − \sqrt{1-2h_0}, and
\[ \bar t_1 := \|F'(x^0)^{-1}\|\,t_1 = \gamma\,\|F'(x^0)^{-1}\|\,\|x^1-x^0\| \le \alpha\,\gamma\,\|F'(x^0)^{-1}\| = h_0 < \tfrac12. \]
Assuming the assertion to be true for some k ∈ ℕ, for k+1, using (1.2) we obtain
\[ \|x^{k+1}-x^k\| = \|F'(x^k)^{-1}F(x^k)\| = \Big\|F'(x^k)^{-1}\big(F(x^k)-F(x^{k-1})-F'(x^{k-1})\Delta x^{k-1}\big)\Big\| \]
\[ = \Big\|F'(x^k)^{-1}\int_0^1\big(F'(x^{k-1}+s\Delta x^{k-1})-F'(x^{k-1})\big)\Delta x^{k-1}\,ds\Big\| \le \beta_k\,\frac{\gamma}{2}\,\|x^k-x^{k-1}\|^2, \tag{1.6} \]
since ‖F'(x^{k-1}+sΔx^{k-1}) − F'(x^{k-1})‖ ≤ s γ ‖Δx^{k-1}‖. Setting h_k := γ‖x^{k+1} − x^k‖, we thus get the recursion
\[ h_k \le \tfrac12\,\beta_k\,h_{k-1}^2, \quad k \in \mathbb{N}. \tag{1.7} \]
In view of the relationship
\[ \gamma\,\|x^{k+1}-x^0\| \le \gamma\,\|x^{k+1}-x^k\| + \gamma\,\|x^k-x^0\| \le h_k + t_k, \]

we consider the recursion
\[ t_{k+1} = t_k + h_k. \tag{1.8} \]
Observing (1.5) and (1.7), we find
\[ t_{k+1} - t_k \le \frac12\,\frac{\|F'(x^0)^{-1}\|}{1-\bar t_k}\,(t_k-t_{k-1})^2. \]
Hence, multiplying both sides by ‖F'(x^0)^{-1}‖, we end up with the following three-term recursion for \bar t_k := ‖F'(x^0)^{-1}‖ t_k:
\[ \bar t_{k+1} - \bar t_k = \frac{(\bar t_k-\bar t_{k-1})^2}{2\,(1-\bar t_k)}, \qquad \bar t_0 = 0, \ \ \bar t_1 = h_0. \tag{1.9} \]
The famous Ortega trick allows to reduce (1.9) to a two-term recursion which can be interpreted as a Newton method in ℝ^1: multiplying both sides in (1.9) by 1 − \bar t_k results in
\[ (\bar t_{k+1}-\bar t_k)(1-\bar t_k) = \tfrac12\,(\bar t_k-\bar t_{k-1})^2, \]
from which we deduce
\[ \psi(\bar t_{k+1},\bar t_k) := \bar t_{k+1} - \bar t_{k+1}\bar t_k + \tfrac12\,\bar t_k^2 = \bar t_k - \bar t_k\bar t_{k-1} + \tfrac12\,\bar t_{k-1}^2 = \psi(\bar t_k,\bar t_{k-1}). \]
It follows that
\[ \psi(\bar t_{k+1},\bar t_k) = \psi(\bar t_1,\bar t_0) = h_0, \]
from which we deduce
\[ \bar t_{k+1} - \bar t_k = \frac{h_0 - \bar t_k + \tfrac12\bar t_k^2}{1-\bar t_k} = -\frac{\varphi(\bar t_k)}{\varphi'(\bar t_k)}, \]
where φ : ℝ → ℝ is given by
\[ \varphi(\bar t) := h_0 - \bar t + \tfrac12\,\bar t^2. \]
Obviously, φ has the zeroes
\[ \bar t^{*,1} := 1 - \sqrt{1-2h_0}, \qquad \bar t^{*,2} := 1 + \sqrt{1-2h_0}. \]

Since φ is convex, the Newton method started at \bar t_1 converges monotonically to \bar t^{*,1}. It follows from the definition of \bar t_k that x^k ∈ B̄(x^0,ρ_0). Moreover, as a consequence of (1.6) we readily find that {x^k}, k ∈ ℕ, is a Cauchy sequence in B̄(x^0,ρ_0). Hence, there exists x^* ∈ B̄(x^0,ρ_0) such that x^k → x^* (k → ∞) and
\[ \Delta x^k = -F'(x^k)^{-1}F(x^k) \ \longrightarrow\ -F'(x^*)^{-1}F(x^*) = 0, \]
whence F(x^*) = 0. The quadratic convergence can be deduced from (1.6) as well. Finally, the uniqueness of x^* in B̄(x^0,ρ_0) ∪ (D ∩ B(x^0,\bar ρ_0)) follows readily from the properties of the function φ.

1.2 Classical Newton-Mysovskikh Theorem

Theorem 1.2 (Classical Newton-Mysovskikh Theorem)
Let X and Y be Banach spaces, D ⊂ X a convex subset, and suppose that F : D ⊂ X → Y is continuously Fréchet differentiable on D with invertible Fréchet derivatives F'(x), x ∈ D, and let x^0 ∈ D be some initial guess. Assume further that the following conditions hold true:
\[ \|F'(x^0)^{-1}F(x^0)\| \le \alpha, \tag{1.10} \]
\[ \|F'(x)^{-1}\| \le \beta, \quad x \in D, \tag{1.11} \]
\[ \|F'(y)-F'(x)\| \le \omega\,\|y-x\|, \quad x,y \in D, \tag{1.12} \]
\[ h_0 := \tfrac12\,\beta\,\omega\,\|F'(x^0)^{-1}F(x^0)\| \le \tfrac12\,\alpha\,\beta\,\omega < 1, \tag{1.13} \]
\[ \bar B(x^0,\rho) \subset D, \quad \rho := \alpha\sum_{j=0}^{\infty}h_0^{2^j-1} \le \frac{\alpha}{1-h_0}. \tag{1.14} \]
Then, for the sequence {x^k}, k ∈ ℕ_0, of Newton iterates
\[ F'(x^k)\,\Delta x^k = -F(x^k), \qquad x^{k+1} = x^k + \Delta x^k, \]
there holds:
(i) x^k ∈ B̄(x^0,ρ), k ∈ ℕ_0, and there exists x^* ∈ B̄(x^0,ρ) such that F(x^*) = 0 and x^k → x^* (k → ∞),
(ii) ‖x^{k+1} − x^k‖ ≤ ½ β ω ‖x^k − x^{k-1}‖², k ∈ ℕ,
(iii) ‖x^k − x^*‖ ≤ ε_k ‖x^k − x^{k-1}‖², where
\[ \varepsilon_k := \tfrac12\,\beta\,\omega\Big(1+\sum_{j=1}^{\infty}\big(h_0^{2^k}\big)^{2^j-1}\Big) \le \frac{\tfrac12\,\beta\,\omega}{1-h_0^{2^k}}. \]

Proof. Observing F'(x^{k-1})Δx^{k-1} + F(x^{k-1}) = 0, we obtain
\[ \|\Delta x^k\| = \|F'(x^k)^{-1}F(x^k)\| = \Big\|F'(x^k)^{-1}\big(F(x^k)-F(x^{k-1})-F'(x^{k-1})\Delta x^{k-1}\big)\Big\| \]
\[ = \Big\|F'(x^k)^{-1}\int_0^1\big(F'(x^{k-1}+s\Delta x^{k-1})-F'(x^{k-1})\big)\Delta x^{k-1}\,ds\Big\| \le \tfrac12\,\beta\,\omega\,\|\Delta x^{k-1}\|^2, \]
which gives the assertion (ii).
We now prove that {x^k}, k ∈ ℕ_0, is a Cauchy sequence in B̄(x^0,ρ). By induction on k we show
\[ \|\Delta x^k\| \le \frac{2}{\beta\omega}\,h_0^{2^k}, \quad k \in \mathbb{N}_0. \tag{1.15} \]
For k = 0 we have, in view of (1.10) and (1.13),
\[ \|\Delta x^0\| = \frac{2}{\beta\omega}\,h_0 \le \alpha. \]
Assuming (1.15) to be true for some k ∈ ℕ, we get
\[ \|\Delta x^{k+1}\| \le \tfrac12\,\beta\,\omega\,\|\Delta x^k\|^2 \le \tfrac12\,\beta\,\omega\,\Big(\frac{2}{\beta\omega}\,h_0^{2^k}\Big)^2 = \frac{2}{\beta\omega}\,h_0^{2^{k+1}}. \]
It follows readily from (1.15) that x^{k+1} ∈ B̄(x^0,ρ):
\[ \|x^{k+1}-x^0\| \le \|x^{k+1}-x^k\| + \dots + \|x^1-x^0\| \le \frac{2}{\beta\omega}\big(h_0^{2^k}+\dots+h_0\big) \le \frac{2}{\beta\omega}\,h_0\sum_{j=0}^{\infty}h_0^{2^j-1} \le \alpha\sum_{j=0}^{\infty}h_0^{2^j-1} = \rho. \]
Similarly, it can be shown that ‖x^{m+k} − x^m‖ → 0 (m, k → ∞).

Since {x^k}, k ∈ ℕ_0, is a Cauchy sequence in B̄(x^0,ρ), there exists x^* ∈ B̄(x^0,ρ) such that x^k → x^* (k → ∞). Hence,
\[ \Delta x^k = -F'(x^k)^{-1}F(x^k) \ \longrightarrow\ -F'(x^*)^{-1}F(x^*) = 0, \]
and thus F(x^*) = 0, which proves (i).
The assertion (iii) is shown as follows. Setting h_k := ½ β ω ‖Δx^k‖, we obtain
\[ \|x^k-x^*\| = \lim_{k<m\to\infty}\|x^k-x^m\| \le \lim_{k<m\to\infty}\big[\|x^m-x^{m-1}\|+\dots+\|x^{k+1}-x^k\|\big] \]
\[ \le \frac{2}{\beta\omega}\lim_{k<m\to\infty}\big[h_{m-1}+\dots+h_k\big] = \frac{2h_k}{\beta\omega}\lim_{k<m\to\infty}\Big[1+\frac{h_{k+1}}{h_k}+\dots+\frac{h_{m-1}}{h_k}\Big]. \]
On the other hand, taking (ii) into account,
\[ h_k \le \Big(\frac{\beta\omega}{2}\Big)^2\|\Delta x^{k-1}\|^2 = h_{k-1}^2, \]
whence
\[ h_{k+l} \le h_k^{2^l}, \quad k, l \in \mathbb{N}_0. \]
We conclude
\[ \|x^k-x^*\| \le \tfrac12\,\beta\,\omega\,\|\Delta x^{k-1}\|^2\Big(1+\sum_{j=1}^{\infty}h_k^{2^j-1}\Big) \le \tfrac12\,\beta\,\omega\Big(1+\sum_{j=1}^{\infty}\big(h_0^{2^k}\big)^{2^j-1}\Big)\|x^k-x^{k-1}\|^2, \]
which proves (iii).
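Both classical theorems analyze the same basic iteration. As a concrete illustration, the following is a minimal sketch of the exact Newton iteration in Python; the test problem, the tolerance, and the increment-based stopping rule are illustrative choices and not part of the handout.

```python
import numpy as np

def newton(F, dF, x0, tol=1e-12, kmax=25):
    """Exact Newton iteration: solve F'(x^k) dx = -F(x^k), set x^{k+1} = x^k + dx."""
    x = np.array(x0, dtype=float)
    for k in range(kmax):
        dx = np.linalg.solve(dF(x), -F(x))   # Newton correction
        x = x + dx
        if np.linalg.norm(dx) <= tol:        # simple increment-based stopping rule
            break
    return x, k

# Illustrative test problem: intersection of the unit circle with the line x1 = x2
F  = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
dF = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
x_star, k = newton(F, dF, [1.0, 0.5])
print(x_star, k)   # quadratic convergence to (1/sqrt(2), 1/sqrt(2))
```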

2. Affine Invariant/Conjugate Newton Convergence Theorems

2.1 Affine Covariant Newton Convergence Theorems

Theorem 2.1 (Affine Covariant Newton-Kantorovich Theorem)
Let F : D ⊂ ℝ^n → ℝ^n be continuously differentiable on D with an invertible Jacobian F'(x^0) for some initial guess x^0 ∈ D. Assume further that the following conditions hold true:
\[ \|F'(x^0)^{-1}F(x^0)\| \le \alpha, \tag{1.16} \]
\[ \|F'(x^0)^{-1}\big(F'(y)-F'(x)\big)\| \le \gamma\,\|y-x\|, \quad x,y \in D, \tag{1.17} \]
\[ h_0 := \alpha\,\gamma < \tfrac12, \tag{1.18} \]
\[ \bar B(x^0,\rho_0) \subset D, \quad \rho_0 := \frac{1-\sqrt{1-2h_0}}{\gamma}. \tag{1.19} \]
Then, for the sequence {x^k}, k ∈ ℕ_0, of Newton iterates
\[ F'(x^k)\,\Delta x^k = -F(x^k), \qquad x^{k+1} = x^k + \Delta x^k, \]
there holds:
(i) F'(x) is invertible for all Newton iterates x = x^k, k ∈ ℕ_0,
(ii) the sequence {x^k}, k ∈ ℕ, of Newton iterates is well defined with x^k ∈ B̄(x^0,ρ_0), k ∈ ℕ_0, and x^k → x^* ∈ B̄(x^0,ρ_0) (k → ∞), where F(x^*) = 0,
(iii) the convergence x^k → x^* (k → ∞) is quadratic,
(iv) the solution x^* of F(x) = 0 is unique in B̄(x^0,ρ_0) ∪ (D ∩ B(x^0,\bar ρ_0)), \bar ρ_0 := \frac{1+\sqrt{1-2h_0}}{\gamma}.

Proof. First homework assignment.

Theorem 2.2 (Affine Covariant Newton-Mysovskikh Theorem)
Let F : D ⊂ ℝ^n → ℝ^n, D ⊂ ℝ^n convex, be continuously differentiable on D with invertible Jacobians F'(x), x ∈ D, and let x^0 ∈ D be some initial guess. Assume further that the following conditions hold true:
\[ \|F'(x^0)^{-1}F(x^0)\| \le \alpha, \tag{1.20} \]
\[ \|F'(z)^{-1}\big(F'(y)-F'(x)\big)(y-x)\| \le \omega\,\|y-x\|^2, \quad x,y,z \in D, \tag{1.21} \]
\[ h_0 := \omega\,\|\Delta x^0\| \le \alpha\,\omega < 2, \tag{1.22} \]
\[ \bar B(x^0,\rho) \subset D, \quad \rho := \frac{\|\Delta x^0\|}{1-\tfrac{h_0}{2}}. \tag{1.23} \]
Then, for the sequence {x^k}, k ∈ ℕ_0, of Newton iterates
\[ F'(x^k)\,\Delta x^k = -F(x^k), \qquad x^{k+1} = x^k + \Delta x^k, \]
there holds x^k ∈ B̄(x^0,ρ), k ∈ ℕ_0, and there exists x^* ∈ B̄(x^0,ρ) such that F(x^*) = 0 and x^k → x^* (k → ∞) with
\[ \|x^{k+1}-x^k\| \le \tfrac12\,\omega\,\|x^k-x^{k-1}\|^2, \qquad \|x^k-x^*\| \le \frac{\|x^{k+1}-x^k\|}{1-\tfrac12\,\omega\,\|x^k-x^{k-1}\|}. \]
Proof. Slight modification of the proof of the Classical Newton-Mysovskikh Theorem.

2.2 Affine Contravariant Newton Convergence Theorem

Theorem 2.3 (Affine Contravariant Newton-Mysovskikh Theorem)
Let F : D ⊂ ℝ^n → ℝ^n, D ⊂ ℝ^n convex, be continuously differentiable on D with invertible Jacobians F'(x), x ∈ D, and let x^0 ∈ D be some initial guess. Assume further that the following conditions hold true:
\[ \|\big(F'(y)-F'(x)\big)(y-x)\| \le \omega\,\|F'(x)(y-x)\|^2, \quad x,y \in D, \tag{1.24} \]
\[ \bar L_\omega \subset D, \quad L_\omega := \{x \in D \mid \|F(x)\| < \tfrac{2}{\omega}\}, \tag{1.25} \]
\[ h_0 := \omega\,\|F(x^0)\| < 2. \tag{1.26} \]

Then, the sequence {x^k}, k ∈ ℕ_0, of Newton iterates stays in L_ω, and there exists an x^* ∈ \bar L_ω such that x^k → x^* (k ∈ ℕ', k → ∞) for some subsequence ℕ' ⊂ ℕ and F(x^*) = 0. Moreover, for the residuals F(x^k) there holds
\[ \|F(x^{k+1})\| \le \tfrac12\,\omega\,\|F(x^k)\|^2. \]
Proof. We first prove x^k ∈ L_ω by induction on k:
(i) k = 0: in view of (1.26), ‖F(x^0)‖ < 2/ω, whence x^0 ∈ L_ω.
(ii) Assume that the assertion holds true for some k ∈ ℕ.
(iii) For any λ ∈ [0,1] such that x^k + tΔx^k ∈ L_ω, t ∈ [0,λ], we have
\[ F(x^k+\lambda\Delta x^k) = F(x^k) + \int_0^\lambda F'(x^k+t\Delta x^k)\,\Delta x^k\,dt. \]
Since F(x^k) = −F'(x^k)Δx^k, we have F(x^k) = −λ F'(x^k)Δx^k + (1−λ)F(x^k), and hence
\[ \|F(x^k+\lambda\Delta x^k)\| = \Big\|\int_0^\lambda\big(F'(x^k+t\Delta x^k)-F'(x^k)\big)\Delta x^k\,dt + (1-\lambda)\,F(x^k)\Big\| \tag{1.27} \]
\[ \le \int_0^\lambda\underbrace{\|\big(F'(x^k+t\Delta x^k)-F'(x^k)\big)\Delta x^k\|}_{\le\,\omega\,t\,\|F'(x^k)\Delta x^k\|^2}\,dt + (1-\lambda)\,\|F(x^k)\| \le \omega\,\|F'(x^k)\Delta x^k\|^2\int_0^\lambda t\,dt + (1-\lambda)\,\|F(x^k)\|, \]
and, since ‖F'(x^k)Δx^k‖ = ‖F(x^k)‖,

\[ \|F(x^k+\lambda\Delta x^k)\| \le \Big(1-\lambda+\frac{\omega\lambda^2}{2}\,\|F(x^k)\|\Big)\,\|F(x^k)\|. \]
We assume x^{k+1} = x^k + Δx^k ∉ L_ω. Then there exists
\[ \bar\lambda := \min\{\lambda \in (0,1] \mid x^k+\lambda\Delta x^k \notin L_\omega\}. \]
It follows from (1.27) that
\[ \|F(x^k+\bar\lambda\Delta x^k)\| \le \Big(1-\bar\lambda+\frac{\omega\bar\lambda^2}{2}\underbrace{\|F(x^k)\|}_{<\,2/\omega}\Big)\,\|F(x^k)\| < \underbrace{\big(1-\bar\lambda+\bar\lambda^2\big)}_{\le\,1}\,\|F(x^k)\| < \frac{2}{\omega}, \]
and hence x^k + \bar\lambda Δx^k ∈ L_ω, which is a contradiction.
For λ = 1, (1.27) gives the asserted residual estimate.
For the proof of the rest of the assertion, we define the residual oriented Kantorovich quantities
\[ h_k := \omega\,\|F(x^k)\|. \]
Then the residual estimate implies
\[ \omega\,\|F(x^{k+1})\| \le \frac{\omega^2}{2}\,\|F(x^k)\|^2, \quad\text{i.e.,}\quad h_{k+1} \le \tfrac12\,h_k^2 = \tfrac12\,h_k\,h_k. \]
Since h_0 < 2, for k = 0 we obtain
\[ h_1 \le \tfrac12\,h_0\,h_0 < h_0, \]

and an induction argument shows
\[ h_{k+1} < h_k < 2, \quad k \in \mathbb{N}_0. \]
Moreover,
\[ \|F(x^{k+1})\| < \|F(x^k)\| < \frac{2}{\omega} \quad\text{and}\quad \lim_{k\to\infty}\|F(x^k)\| = 0, \]
which implies x^k ∈ L_ω ⊂ D, k ∈ ℕ. Since \bar L_ω is bounded, there exist x^* ∈ \bar L_ω and a subsequence ℕ' ⊂ ℕ such that x^k → x^* (k ∈ ℕ', k → ∞) and F(x^*) = 0.

Affine conjugacy

Assume that D ⊂ ℝ^n is a convex set and that f : D → ℝ is a strictly convex functional. Consider the minimization problem
\[ \min_{x\in D} f(x). \]
Then, a necessary and sufficient optimality condition is given by the nonlinear equation
\[ F(x) := \operatorname{grad} f(x) = f'(x)^T = 0, \quad x \in D. \]
We note that the Jacobian F'(x) = f''(x) is symmetric and uniformly positive definite on D. In particular, F'(x)^{1/2} is well defined and symmetric positive definite as well. Consequently, the energy product
\[ (u,v)_E := u^T F'(x)\,v, \quad u,v \in \mathbb{R}^n,\ x \in D, \]
defines locally an inner product with associated norm
\[ \|u\|_E^2 = u^T F'(x)\,u = \|F'(x)^{1/2}u\|^2, \]
which is referred to as a local energy norm.
For regular B ∈ ℝ^{n×n}, we consider the transformed minimization problem
\[ \min_y g(y), \qquad g(y) := f(By), \quad x = By. \]

We obtain the optimality condition
\[ G(y) := \operatorname{grad} g(y) = \big(f'(By)B\big)^T = B^T f'(x)^T = B^T F(By) = 0 \]
with the transformed Jacobian
\[ G'(y) = B^T F'(x)\,B. \]
Hence, the Jacobian transforms by conjugation, which motivates the notion of affine conjugacy.

An appropriate affine conjugate Lipschitz condition is as follows:
\[ \|F'(z)^{-1/2}\big(F'(y)-F'(x)\big)(y-x)\| \le \omega\,\|F'(x)^{1/2}(y-x)\|^2. \]

2.3 Affine Conjugate Newton Convergence Theorem

Theorem 2.4 (Affine Conjugate Newton-Mysovskikh Theorem)
Assume that D ⊂ ℝ^n is a convex domain and f : D → ℝ a strictly convex, twice continuously differentiable functional. Let F(x) := f'(x)^T and F'(x) = f''(x). Consider the minimization problem
\[ \min_{x\in D} f(x) \tag{1.28} \]
and the associated optimality condition
\[ F(x) = \operatorname{grad} f(x) = 0, \quad x \in D. \tag{1.29} \]
Note that (1.28) has a unique solution x^* ∈ D. Let x^0 ∈ D be an initial guess and assume that the following conditions are satisfied:
\[ \|F'(z)^{-1/2}\big(F'(y)-F'(x)\big)(y-x)\| \le \omega\,\|F'(x)^{1/2}(y-x)\|^2 \tag{1.30} \]
for collinear x, y, z ∈ D, and
\[ h_0 := \omega\,\|F'(x^0)^{1/2}\Delta x^0\| < 2, \tag{1.31} \]
\[ L_0 := \{x \in D \mid f(x) \le f(x^0)\} \ \text{is compact.} \tag{1.32} \]
Then, for the Newton iterates x^k, k ∈ ℕ_0, there holds:
(i) x^k ∈ L_0, k ∈ ℕ_0, and x^k → x^* (k → ∞) with
\[ \|F'(x^{k+1})^{1/2}\Delta x^{k+1}\| \le \tfrac12\,\omega\,\|F'(x^k)^{1/2}\Delta x^k\|^2. \tag{1.33} \]
(ii) For ε_k := ‖F'(x^k)^{1/2}Δx^k‖² and the Kantorovich quantities h_k := ω ε_k^{1/2} we have
\[ \tfrac12\,\varepsilon_k - \tfrac16\,h_k\,\varepsilon_k \le f(x^k) - f(x^{k+1}) \le \tfrac12\,\varepsilon_k + \tfrac16\,h_k\,\varepsilon_k, \tag{1.34} \]
\[ \tfrac16\,\varepsilon_k \le f(x^k) - f(x^{k+1}) \le \tfrac56\,\varepsilon_k. \tag{1.35} \]
(iii) We have the a priori estimate
\[ f(x^0) - f(x^*) \le \frac{\tfrac56\,\varepsilon_0}{1-\tfrac12 h_0}. \tag{1.36} \]

Proof. Assertion (i) and (1.33) can be verified as in the proof of the affine contravariant version of the Newton-Mysovskikh theorem. For the proof of (1.34) in (ii), observing F'(x^k)Δx^k = −F(x^k), we obtain
\[ f(x^{k+1}) - f(x^k) + \tfrac12\,\|F'(x^k)^{1/2}\Delta x^k\|^2 = \int_0^1\langle F(x^k+s\Delta x^k),\Delta x^k\rangle\,ds - \langle F(x^k),\Delta x^k\rangle - \tfrac12\,\langle F'(x^k)\Delta x^k,\Delta x^k\rangle \]
\[ = \int_0^1\langle F(x^k+s\Delta x^k)-F(x^k),\Delta x^k\rangle\,ds - \tfrac12\,\langle F'(x^k)\Delta x^k,\Delta x^k\rangle \]
\[ = \int_0^1 s\int_0^1\langle F'(x^k+st\Delta x^k)\Delta x^k,\Delta x^k\rangle\,dt\,ds - \tfrac12\,\langle F'(x^k)\Delta x^k,\Delta x^k\rangle \]
\[ = \int_0^1 s\int_0^1\big\langle\underbrace{\big(F'(x^k+st\Delta x^k)-F'(x^k)\big)\Delta x^k}_{=:\,w^k},\ \Delta x^k\big\rangle\,dt\,ds = \int_0^1 s\int_0^1\langle F'(x^k)^{-1/2}w^k,\ F'(x^k)^{1/2}\Delta x^k\rangle\,dt\,ds. \]
By the affine conjugate Lipschitz condition (1.30),
\[ \|F'(x^k)^{-1/2}w^k\| \le \omega\,s\,t\,\|F'(x^k)^{1/2}\Delta x^k\|^2, \]
so that
\[ \Big|f(x^{k+1}) - f(x^k) + \tfrac12\,\varepsilon_k\Big| \le \omega\,\|F'(x^k)^{1/2}\Delta x^k\|^3\int_0^1 s^2\,ds\int_0^1 t\,dt = \tfrac16\,h_k\,\varepsilon_k, \]
which proves (1.34). Using the right-hand side of (1.34) and h_k < 2 yields
\[ f(x^k) - f(x^{k+1}) \le \big(\tfrac12+\tfrac16 h_k\big)\,\varepsilon_k < \tfrac56\,\varepsilon_k. \]

Likewise, using the left-hand side of (1.34) and h_k < 2,
\[ f(x^k) - f(x^{k+1}) \ge \big(\tfrac12-\tfrac16 h_k\big)\,\varepsilon_k > \tfrac16\,\varepsilon_k. \]
Together, this proves (1.35).
In order to prove (iii), we use (1.34) and obtain
\[ \omega^2\big(f(x^0)-f(x^*)\big) = \sum_{k=0}^{\infty}\omega^2\big(f(x^k)-f(x^{k+1})\big) < \tfrac56\sum_{k=0}^{\infty}\omega^2\varepsilon_k = \tfrac56\sum_{k=0}^{\infty}h_k^2 = \tfrac{10}{3}\sum_{k=0}^{\infty}\big(\tfrac12 h_k\big)^2. \]
Using
\[ \tfrac12\,h_{k+1} \le \big(\tfrac12\,h_k\big)^2, \qquad \tfrac12\,h_0 < 1, \]
we further get
\[ \big(\tfrac12 h_0\big)^2 + \big(\tfrac12 h_1\big)^2 + \dots \le \tfrac14\,h_0^2\sum_{k=0}^{\infty}\big(\tfrac12 h_0\big)^k = \frac{\tfrac14\,h_0^2}{1-\tfrac12 h_0}, \]
which proves (1.36).

3. Inexact Newton Methods

We recall that Newton's method computes iterates successively as the solution of linear algebraic systems
\[ F'(x^k)\,\Delta x^k = -F(x^k), \quad k \in \mathbb{N}_0, \qquad x^{k+1} = x^k + \Delta x^k. \tag{1.37} \]
The classical convergence theorems of Newton-Kantorovich and Newton-Mysovskikh and their affine covariant, affine contravariant, and affine conjugate versions assume the exact solution of (1.37). In practice, however, in particular if the dimension n is large, (1.37) will be solved by an iterative method. In this case, we end up with an outer/inner iteration, where the outer iterations are the Newton steps and the inner iterations result from the application of an iterative scheme to (1.37). It is important to tune the outer and inner iterations and to keep track of the iteration errors.

With regard to affine covariance, affine contravariance, and affine conjugacy, the iterative scheme for the inner iterations has to be chosen in such a way that it easily provides information about the error norm in case of affine covariance, the residual norm in case of affine contravariance, and the energy norm in case of affine conjugacy.

Except for convex optimization, we cannot expect F'(x), x ∈ D, to be symmetric positive definite. Hence, for affine covariance and affine contravariance we have to pick iterative solvers that are designed for nonsymmetric matrices. Appropriate candidates are CGNE (Conjugate Gradient for the Normal Equations) in case of affine covariance, GMRES (Generalized Minimum RESidual) in case of affine contravariance, and PCG (Preconditioned Conjugate Gradient) in case of affine conjugacy.
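The outer/inner structure described above can be summarized by the following schematic Python sketch. The functions inner_solve and inner_tol are placeholders for an iterative solver (CGNE, GMRES, or PCG) and for the adaptive accuracy control developed in the remainder of this section; the test problem and tolerances are illustrative.

```python
import numpy as np

def inexact_newton(F, dF, x0, inner_solve, inner_tol, tol=1e-10, kmax=50):
    """Generic outer/inner (inexact Newton) loop.

    inner_solve(A, b, tol): approximate solution of A dx = b by an inner iteration.
    inner_tol(k): accuracy demanded from the inner iteration at outer step k.
    """
    x = np.array(x0, dtype=float)
    for k in range(kmax):
        b = -F(x)
        if np.linalg.norm(b) <= tol:     # outer termination on the residual
            break
        dx = inner_solve(dF(x), b, inner_tol(k))   # inner iteration
        x = x + dx                                  # outer (Newton) step
    return x, k

# Illustrative use with a trivial "inner solver" (a direct solve that ignores tol)
F  = lambda x: np.array([x[0] + x[1] - 3.0, x[0] * x[1] - 2.0])
dF = lambda x: np.array([[1.0, 1.0], [x[1], x[0]]])
x_star, k = inexact_newton(F, dF, [0.5, 2.5],
                           inner_solve=lambda A, b, t: np.linalg.solve(A, b),
                           inner_tol=lambda k: 1e-2 / (k + 1))
print(x_star, k)
```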

3.1 Affine Covariant Inexact Newton Methods

CGNE (Conjugate Gradient for the Normal Equations)

We assume A ∈ ℝ^{n×n} to be a regular, nonsymmetric matrix and b ∈ ℝ^n to be given, and we look for y ∈ ℝ^n as the unique solution of the linear algebraic system
\[ Ay = b. \tag{1.38} \]
As the name already suggests, CGNE is the conjugate gradient method applied to the normal equations: it solves the system
\[ AA^T z = b \tag{1.39} \]
for z and then computes y according to
\[ y = A^T z. \tag{1.40} \]
The implementation of CGNE is as follows:

CGNE Initialization:
Given an initial guess y_0 ∈ ℝ^n, compute the residual r_0 = b − A y_0 and set
\[ p_0 = 0, \quad \beta_0 = 0, \quad \sigma_0 = \|r_0\|^2. \]
CGNE Iteration Loop: For 1 ≤ i ≤ i_max compute
\[ p_i = A^T r_{i-1} + \beta_{i-1}\,p_{i-1}, \qquad \alpha_i = \frac{\sigma_{i-1}}{\|p_i\|^2}, \qquad y_i = y_{i-1} + \alpha_i\,p_i, \qquad \gamma_{i-1}^2 = \alpha_i\,\sigma_{i-1}\ \big(= \|y_i-y_{i-1}\|^2\big), \]
\[ r_i = r_{i-1} - \alpha_i\,A p_i, \qquad \sigma_i = \|r_i\|^2, \qquad \beta_i = \frac{\sigma_i}{\sigma_{i-1}}. \]
CGNE has the error minimizing property
\[ \|y-y_i\| = \min_{v\,\in\,y_0+\mathcal K_i(A^Tr_0,\,A^TA)}\|y-v\|, \tag{1.41} \]
where K_i(A^T r_0, A^T A) stands for the Krylov subspace
\[ \mathcal K_i(A^Tr_0,A^TA) := \operatorname{span}\{A^Tr_0,\,(A^TA)A^Tr_0,\,\dots,\,(A^TA)^{i-1}A^Tr_0\}. \tag{1.42} \]

Lemma 3.1 (Representation of the iteration error)
Let ε_i := ‖y − y_i‖² be the square of the CGNE iteration error with respect to the i-th iterate. Then, there holds
\[ \varepsilon_i = \sum_{j=i}^{n-1}\gamma_j^2. \tag{1.43} \]
Proof. CGNE has the Galerkin orthogonality
\[ (y_i-y_0,\ y_{i+m}-y_i) = 0, \quad m \in \mathbb{N}. \tag{1.44} \]
Setting m = 1, this implies the orthogonal decomposition
\[ \|y_{i+1}-y_0\|^2 = \|y_{i+1}-y_i\|^2 + \|y_i-y_0\|^2, \tag{1.45} \]
which readily gives
\[ \|y_i-y_0\|^2 = \sum_{j=0}^{i-1}\|y_{j+1}-y_j\|^2 = \sum_{j=0}^{i-1}\gamma_j^2. \tag{1.46} \]
On the other hand, observing y_n = y, for m = n − i the Galerkin orthogonality yields
\[ \underbrace{\|y-y_0\|^2}_{=\,\sum_{j=0}^{n-1}\gamma_j^2} = \underbrace{\|y-y_i\|^2}_{=\,\varepsilon_i} + \underbrace{\|y_i-y_0\|^2}_{=\,\sum_{j=0}^{i-1}\gamma_j^2}. \tag{1.47} \]

Computable lower bound for the iteration error

It follows readily from Lemma 3.1 that the computable quantity
\[ [\varepsilon_i] := \sum_{j=i}^{i+m}\gamma_j^2, \quad m \in \mathbb{N}, \tag{1.48} \]
provides a lower bound for the iteration error. In practice, we will test the relative error norm according to
\[ \delta_i := \frac{\|y-y_i\|}{\|y_i\|} \approx \frac{\sqrt{[\varepsilon_i]}}{\|y_i\|} \le \delta, \tag{1.49} \]
where δ is a user specified accuracy.
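A minimal NumPy transcription of the CGNE loop might look as follows. Variable names follow the handout; the stopping test is a simplified use of (1.48)-(1.49) in which the last m squared step lengths serve as an estimate of the remaining error, with m an illustrative choice.

```python
import numpy as np

def cgne(A, b, y0=None, i_max=200, delta=1e-8, m=4):
    """CGNE sketch: CG applied to A A^T z = b with y = A^T z, cf. (1.38)-(1.40).

    Returns the iterate y together with the squared step lengths gamma_j^2,
    whose partial sums provide the computable error bound [eps_i] of (1.48)."""
    n = A.shape[0]
    y = np.zeros(n) if y0 is None else np.array(y0, dtype=float)
    r = b - A @ y
    p = np.zeros(n)
    beta, sigma = 0.0, float(r @ r)
    gammas = []
    for _ in range(i_max):
        p = A.T @ r + beta * p
        alpha = sigma / float(p @ p)
        y = y + alpha * p
        gammas.append(alpha * sigma)      # gamma^2 = ||y_i - y_{i-1}||^2
        r = r - alpha * (A @ p)
        sigma_new = float(r @ r)
        beta = sigma_new / sigma
        sigma = sigma_new
        # simplified relative error test in the spirit of (1.48)-(1.49)
        eps_est = sum(gammas[-m:])
        if np.sqrt(eps_est) <= delta * np.linalg.norm(y):
            break
    return y, gammas

# Illustrative use on a small nonsymmetric system
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)) + 6.0 * np.eye(6)
b = rng.standard_normal(6)
y, _ = cgne(A, b)
print(np.linalg.norm(A @ y - b))
```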

Convergence of affine covariant inexact Newton methods

We denote by δx^k ∈ ℝ^n the result of an inner iteration, e.g., CGNE, for the solution of (1.37). Then, it is easy to see that the iteration error δx^k − Δx^k satisfies the error equation
\[ F'(x^k)\big(\delta x^k-\Delta x^k\big) = F(x^k) + F'(x^k)\,\delta x^k =: r^k. \tag{1.50} \]
We will measure the impact of the inexact solution of (1.37) by the relative error
\[ \delta_k := \frac{\|\delta x^k-\Delta x^k\|}{\|\delta x^k\|}. \tag{1.51} \]

Theorem 3.1 (Affine covariant convergence theorem for the inexact Newton method. Part I: Linear convergence)
Suppose that F : D ⊂ ℝ^n → ℝ^n is continuously differentiable on D with invertible Jacobians F'(x), x ∈ D. Assume further that the following affine covariant Lipschitz condition is satisfied:
\[ \|F'(z)^{-1}\big(F'(y)-F'(x)\big)v\| \le \omega\,\|y-x\|\,\|v\|, \tag{1.52} \]
where x, y, z ∈ D, v ∈ ℝ^n. Assume that x^0 ∈ D is an initial guess for the outer Newton iteration and that δx^k_0 = 0 is chosen as the start iterate for the inner iterations. Consider the Kantorovich quantities
\[ h_k := \omega\,\|\Delta x^k\|, \qquad h_k^\delta := \omega\,\|\delta x^k\| = \frac{h_k}{\sqrt{1+\delta_k^2}} \tag{1.53} \]
associated with the outer and inner iteration. Assume that
\[ h_0 \le 2\,\Theta, \quad 0 \le \Theta < 1, \tag{1.54} \]
and control the inner iterations according to
\[ \vartheta(h_k,\delta_k) := \frac{\tfrac12 h_k^\delta + \delta_k\,(1+h_k^\delta)}{\sqrt{1+\delta_k^2}} \le \Theta < 1. \tag{1.55} \]

Then, there holds:
(i) The inexact Newton CGNE iterates x^k, k ∈ ℕ_0, stay in B̄(x^0,ρ),
\[ \rho := \frac{\|\delta x^0\|}{1-\Theta}, \tag{1.56} \]
and converge linearly to some x^* ∈ B̄(x^0,ρ) with F(x^*) = 0.
(ii) The exact Newton increments decrease monotonically according to
\[ \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \Theta, \tag{1.57} \]
whereas for the inexact Newton increments we have
\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} \le \sqrt{\frac{1+\delta_k^2}{1+\delta_{k+1}^2}}\;\Theta. \tag{1.58} \]
Proof. By elementary calculations we find
\[ \Delta x^{k+1} = -F'(x^{k+1})^{-1}F(x^{k+1}) \tag{1.59} \]
\[ = -F'(x^{k+1})^{-1}\Big[F(x^{k+1}) - \underbrace{\big(F(x^k)+F'(x^k)\delta x^k\big)}_{=\,r^k}\Big] - F'(x^{k+1})^{-1}r^k \]
\[ = \underbrace{-F'(x^{k+1})^{-1}\int_0^1\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\,dt}_{=:\,I} \ \underbrace{-\ F'(x^{k+1})^{-1}F'(x^k)\big(\delta x^k-\Delta x^k\big)}_{=:\,II}. \]

Using the affine covariant Lipschitz condition (1.52), the first term on the right-hand side in (1.59) can be estimated according to
\[ \|I\| \le \omega\,\|\delta x^k\|^2\int_0^1 t\,dt = \tfrac12\,\omega\,\|\delta x^k\|^2. \tag{1.60} \]
For the second term we obtain by the same argument
\[ \|II\| = \Big\|F'(x^{k+1})^{-1}\big[F'(x^k) \pm F'(x^{k+1})\big]\big(\delta x^k-\Delta x^k\big)\Big\| \tag{1.61} \]
\[ \le \|F'(x^{k+1})^{-1}\big(F'(x^{k+1})-F'(x^k)\big)\big(\delta x^k-\Delta x^k\big)\| + \|\delta x^k-\Delta x^k\| \le \omega\,\|\delta x^k\|\,\|\delta x^k-\Delta x^k\| + \|\delta x^k-\Delta x^k\|. \]
Combining (1.60) and (1.61) yields
\[ \|\Delta x^{k+1}\| \le \Big(\tfrac12 h_k^\delta + \delta_k\,(1+h_k^\delta)\Big)\,\|\delta x^k\|. \]
Observing (1.53), i.e. ‖Δx^k‖ = \sqrt{1+\delta_k^2}\,‖δx^k‖, we finally get
\[ \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \frac{\tfrac12 h_k^\delta + \delta_k\,(1+h_k^\delta)}{\sqrt{1+\delta_k^2}} = \vartheta(h_k,\delta_k) \le \Theta < 1, \tag{1.62} \]
which implies linear convergence. Note that a necessary condition for ϑ(h_k,δ_k) ≤ Θ is that it holds true for δ_k = 0, which is satisfied due to assumption (1.54).
For the contraction of the inexact Newton increments we get
\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} = \sqrt{\frac{1+\delta_k^2}{1+\delta_{k+1}^2}}\;\frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \sqrt{\frac{1+\delta_k^2}{1+\delta_{k+1}^2}}\;\Theta. \tag{1.63} \]
It can easily be shown that {x^k}, k ∈ ℕ_0, is a Cauchy sequence in B̄(x^0,ρ). Consequently, there exists x^* ∈ B̄(x^0,ρ) such that x^k → x^* (k → ∞). Since
\[ F'(x^k)\,\delta x^k = -F(x^k) + r^k \]
and δx^k → 0, r^k → 0 (k → ∞), we conclude F(x^*) = 0.

Theorem 3.2 (Affine covariant convergence theorem for the inexact Newton method. Part II: Quadratic convergence)
Under the same assumptions on F : D ⊂ ℝ^n → ℝ^n as in Theorem 3.1, suppose that the initial guess x^0 ∈ D satisfies
\[ h_0 < \frac{2}{1+\rho} \tag{1.64} \]
for some appropriate ρ > 0 and control the inner iterations such that
\[ \delta_k \le \frac{\rho}{2}\,\frac{h_k^\delta}{1+h_k^\delta}. \tag{1.65} \]
Then, there holds:
(i) The inexact Newton CGNE iterates x^k, k ∈ ℕ_0, stay in B̄(x^0,\bar\rho),
\[ \bar\rho := \frac{\|\delta x^0\|}{1-\tfrac{1+\rho}{2}\,h_0}, \tag{1.66} \]
and converge quadratically to some x^* ∈ B̄(x^0,\bar\rho) with F(x^*) = 0.
(ii) The exact Newton increments and the inexact Newton increments decrease quadratically according to
\[ \|\Delta x^{k+1}\| \le \frac{1+\rho}{2}\,\omega\,\|\Delta x^k\|^2, \tag{1.67} \]
\[ \|\delta x^{k+1}\| \le \frac{1+\rho}{2}\,\omega\,\|\delta x^k\|^2. \tag{1.68} \]
Proof. We proceed as in the proof of Theorem 3.1 to obtain
\[ \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \vartheta(h_k,\delta_k) = \frac{\tfrac12 h_k^\delta + \delta_k(1+h_k^\delta)}{\sqrt{1+\delta_k^2}} \quad\text{and}\quad \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} = \sqrt{\frac{1+\delta_k^2}{1+\delta_{k+1}^2}}\;\frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|}. \]
In view of (1.65) we get the further estimates
\[ \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \frac{1+\rho}{2}\,\frac{h_k}{1+\delta_k^2} \le \frac{1+\rho}{2}\,h_k \]

and
\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} \le \frac{1+\rho}{2}\,\frac{h_k^\delta}{\sqrt{1+\delta_{k+1}^2}} \le \frac{1+\rho}{2}\,h_k^\delta, \]
from which (1.67) and (1.68) follow by the definition of the Kantorovich quantities.
In order to deduce quadratic convergence we have to make sure that the initial increments (k = 0) are small enough, i.e.,
\[ \frac{1+\rho}{2}\,h_0^\delta \le \frac{1+\rho}{2}\,h_0 < 1. \tag{1.69} \]
Furthermore, (1.68) and (1.69) allow us to show that the iterates x^k, k ∈ ℕ, stay in B̄(x^0,\bar\rho). Indeed, (1.68) implies
\[ \|\delta x^j\| \le \frac{1+\rho}{2}\,h_0\,\|\delta x^{j-1}\| \le \Big(\frac{1+\rho}{2}\,h_0\Big)^j\,\|\delta x^0\|, \quad j \in \mathbb{N}, \]
and hence
\[ \|x^k-x^0\| \le \sum_{j=0}^{k-1}\|\delta x^j\| \le \sum_{j=0}^{\infty}\Big(\frac{1+\rho}{2}\,h_0\Big)^j\,\|\delta x^0\| = \frac{\|\delta x^0\|}{1-\tfrac{1+\rho}{2}\,h_0}. \]

Algorithmic aspects of affine covariant inexact Newton methods

(i) Convergence monitor
Let us assume that the quantity Θ < 1 in both the linear convergence mode and the quadratic convergence mode has been specified, and let us further assume that we use CGNE with δx^k_0 = 0 in the inner iteration. Then, (1.58) suggests the monotonicity test
\[ \bar\Theta_k := \sqrt{\frac{1+\bar\delta_{k+1}^2}{1+\bar\delta_k^2}}\;\frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} \le \Theta, \tag{1.70} \]
where \bar δ_k and \bar δ_{k+1} are computationally available estimates of δ_k and δ_{k+1}.
(ii) Termination criterion
We recall that the termination criterion for the exact Newton iteration with respect to a user specified accuracy XTOL is given by
\[ \frac{\|\Delta x^k\|}{1-\Theta_{k-1}^2} \le \mathrm{XTOL}. \]

According to (1.53) we have
\[ \|\Delta x^k\| = \sqrt{1+\delta_k^2}\;\|\delta x^k\|. \]
Consequently, replacing Θ_{k-1} and δ_k by the computable quantities \bar Θ_{k-1} and \bar δ_k, we arrive at the termination criterion
\[ \sqrt{1+\bar\delta_k^2}\;\frac{\|\delta x^k\|}{1-\bar\Theta_{k-1}^2} \le \mathrm{XTOL}. \tag{1.71} \]
(iii) Balancing outer and inner iterations
According to (1.55) of Theorem 3.1, in the linear convergence mode the adaptive termination criterion for the inner iteration is
\[ \vartheta(h_k,\delta_k) = \frac{\tfrac12 h_k^\delta + \delta_k\,(1+h_k^\delta)}{\sqrt{1+\delta_k^2}} \le \Theta < 1. \]
On the other hand, in view of (1.65) of Theorem 3.2, in the quadratic convergence mode the termination criterion is
\[ \delta_k \le \frac{\rho}{2}\,\frac{h_k^\delta}{1+h_k^\delta}. \]
Since the theoretical Kantorovich quantities (cf. (1.53))
\[ h_k^\delta = \omega\,\|\delta x^k\| = \frac{h_k}{\sqrt{1+\delta_k^2}} \]
are not directly accessible, we have to replace them by computationally available estimates [h_k^δ]. We recall that for h_k we have the a priori estimate
\[ [h_k] := 2\,\bar\Theta_{k-1}^2 \le h_k. \]
Consequently, replacing δ_k by \bar δ_k, h_k by [h_k], and Θ_{k-1} by \bar Θ_{k-1} (cf. (1.70)), we get the a priori estimates
\[ [h_k^\delta] := \frac{[h_k]}{\sqrt{1+\bar\delta_k^2}}, \qquad [h_k] := 2\,\bar\Theta_{k-1}^2, \quad k \in \mathbb{N}. \tag{1.72} \]
For k = 0, we choose δ_0 = \bar δ_0 = 1/4. In practice, for k ≥ 1 we begin with the quadratic convergence mode and switch

to the linear convergence mode as soon as the approximate contraction factor \bar Θ_k is below some prespecified threshold value \bar Θ ≤ 1/2.
(iii.1) Quadratic convergence mode
The computationally realizable termination criterion for the inner iteration in the quadratic convergence mode is
\[ \bar\delta_k \le \frac{\rho}{2}\,\frac{[h_k^\delta]}{1+[h_k^\delta]}. \tag{1.73} \]
Inserting (1.72) into (1.73), we obtain a simple nonlinear equation in \bar δ_k (see the sketch after the remarks below).
Remark 3.1 (Validity of the approximate termination criterion)
Observing that the right-hand side in (1.73) is a monotonically increasing function of [h_k^δ], and taking [h_k^δ] ≤ h_k^δ into account, it follows that for \bar δ_k ≤ δ_k the approximate termination criterion (1.73) implies the exact termination criterion (1.65).
Remark 3.2 (Computational work in the quadratic convergence mode)
Since δ_k → 0 (k → ∞) is enforced, it follows that the more the iterates x^k approach the solution x^*, the more computational work is required for the inner iterations to guarantee quadratic convergence of the outer iteration.
(iii.2) Linear convergence mode
We switch to the linear convergence mode once the criterion
\[ \bar\Theta_k < \bar\Theta \tag{1.74} \]
is met. The computationally realizable termination criterion for the inner iteration in the linear convergence mode is
\[ [\vartheta(h_k,\delta_k)] := \vartheta([h_k],\bar\delta_k) = \frac{\tfrac12[h_k^\delta] + \bar\delta_k\,(1+[h_k^\delta])}{\sqrt{1+\bar\delta_k^2}} \le \Theta. \tag{1.75} \]
Remark 3.3 (Validity of the approximate termination criterion)
Since the right-hand side in (1.75) is a monotonically increasing function in [h_k^δ] and [h_k^δ] ≤ h_k^δ, the estimate provided by (1.75) may be too small and thus result in an overestimation of \bar δ_k. However, since the exact quantities and their a priori estimates both tend to zero as k approaches infinity, asymptotically we may rely on (1.75).
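As an illustration of the "simple nonlinear equation" mentioned above, the following sketch solves (1.73), with [h_k^δ] taken from the a priori estimate (1.72) as reconstructed here, for the threshold value of \bar δ_k by a fixed-point iteration. The function name, the default ρ = 1, and the starting value are illustrative; the starting value 1/4 mirrors the choice δ_0 = 1/4 made in the handout.

```python
def delta_bar_quadratic_mode(h_k_est, rho=1.0, tol=1e-12, it_max=100):
    """Solve  delta = (rho/2) * [h^delta] / (1 + [h^delta]),
              [h^delta] = [h_k] / sqrt(1 + delta^2),
    for delta by a simple fixed-point iteration (cf. (1.72)-(1.73))."""
    delta = 0.25
    for _ in range(it_max):
        h_delta = h_k_est / (1.0 + delta**2) ** 0.5
        new = 0.5 * rho * h_delta / (1.0 + h_delta)
        if abs(new - delta) <= tol:
            return new
        delta = new
    return delta

print(delta_bar_quadratic_mode(h_k_est=0.5))
```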

In practice, we require the monotonicity test (1.70) in CGNE and run the inner iterations until \bar δ_k satisfies (1.75) or divergence occurs, i.e.,
\[ \bar\Theta_k > 2\,\Theta. \]
Remark 3.4 (Computational work in the linear convergence mode)
As opposed to the quadratic convergence mode, we observe: the more the iterates x^k approach the solution x^*, the less computational work is required for the inner iterations to guarantee linear convergence of the outer iteration.

3.2 Affine Contravariant Inexact Newton Methods

GMRES (Generalized Minimum RESidual)

The Generalized Minimum RESidual method (GMRES) is an iterative solver for nonsymmetric linear algebraic systems which generates an orthogonal basis of the Krylov subspace
\[ \mathcal K_i(r_0,A) := \operatorname{span}\{r_0,\,Ar_0,\,\dots,\,A^{i-1}r_0\} \tag{1.76} \]
by a modified Gram-Schmidt orthogonalization called the Arnoldi method. The inner product coefficients are stored in an upper Hessenberg matrix so that an approximate solution can be obtained by the solution of a least squares problem in terms of that Hessenberg matrix.

GMRES Initialization:
Given an initial guess y_0 ∈ ℝ^n, compute the residual r_0 = b − A y_0 and set
\[ \beta := \|r_0\|, \quad v_1 := \frac{r_0}{\beta}, \quad V_1 := (v_1). \tag{1.77} \]
GMRES Iteration Loop: For 1 ≤ i ≤ i_max:
I. Orthogonalization:
\[ \hat v_{i+1} = A v_i - V_i\,h_i, \tag{1.78} \]
where
\[ h_i = V_i^T A v_i. \tag{1.79} \]
II. Normalization:
\[ v_{i+1} = \frac{\hat v_{i+1}}{\|\hat v_{i+1}\|}. \tag{1.80} \]
III. Update:
\[ V_{i+1} = (V_i\ \ v_{i+1}), \tag{1.81} \]
\[ \bar H_i = \begin{pmatrix} h_i \\ \|\hat v_{i+1}\| \end{pmatrix}, \quad i = 1, \tag{1.82} \]
\[ \bar H_i = \begin{pmatrix} \bar H_{i-1} & h_i \\ 0 & \|\hat v_{i+1}\| \end{pmatrix}, \quad i > 1. \tag{1.83} \]

IV. Least squares problem: Compute z_i as the solution of
\[ \|\beta e_1 - \bar H_i z_i\| = \min_{z\in\mathbb{R}^i}\|\beta e_1 - \bar H_i z\|. \tag{1.84} \]
V. Approximate solution:
\[ y_i = y_0 + V_i\,z_i. \tag{1.85} \]
GMRES has the residual norm minimizing property
\[ \|b-Ay_i\| = \min_{z\,\in\,y_0+\mathcal K_i(r_0,A)}\|b-Az\|. \tag{1.86} \]
Moreover, the inner residuals decrease monotonically,
\[ \|r_{i+1}\| \le \|r_i\|, \quad i \in \mathbb{N}_0. \tag{1.87} \]

Termination criterion for the GMRES iteration

The residuals satisfy the orthogonality relation
\[ (r_i,\ r_i-r_0) = 0, \quad i \in \mathbb{N}, \tag{1.88} \]
from which we readily deduce
\[ \|r_0\|^2 = \|r_i-r_0\|^2 + \|r_i\|^2, \quad i \in \mathbb{N}. \tag{1.89} \]
We define the relative residual norm error
\[ \eta_i := \frac{\|r_i\|}{\|r_0\|}. \tag{1.90} \]
Clearly, η_i < 1, i ∈ ℕ, and
\[ \eta_{i+1} < \eta_i \quad\text{if } \eta_i \ne 0. \tag{1.91} \]
Consequently, given a user specified accuracy \bar η, an appropriate adaptive termination criterion is
\[ \eta_i \le \bar\eta. \tag{1.92} \]
We note that, in terms of η_i, (1.89) can be written as
\[ \|r_i-r_0\|^2 = \big(1-\eta_i^2\big)\,\|r_0\|^2. \tag{1.93} \]
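A compact NumPy version of the Arnoldi/least-squares steps (1.77)-(1.85), including the relative residual test (1.90)-(1.92), might look as follows. This is a plain, restart-free sketch rather than an optimized implementation; the random test system is illustrative.

```python
import numpy as np

def gmres(A, b, y0=None, i_max=50, eta_bar=1e-8):
    """GMRES sketch: Arnoldi (modified Gram-Schmidt) plus a small
    least-squares problem with the Hessenberg matrix, cf. (1.77)-(1.85)."""
    n = A.shape[0]
    y0 = np.zeros(n) if y0 is None else np.array(y0, dtype=float)
    r0 = b - A @ y0
    beta = np.linalg.norm(r0)
    if beta == 0.0:
        return y0, 0.0
    V = np.zeros((n, i_max + 1))
    H = np.zeros((i_max + 1, i_max))
    V[:, 0] = r0 / beta
    for i in range(i_max):
        w = A @ V[:, i]
        for j in range(i + 1):                      # modified Gram-Schmidt
            H[j, i] = V[:, j] @ w
            w = w - H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)
        if H[i + 1, i] > 0.0:
            V[:, i + 1] = w / H[i + 1, i]
        e1 = np.zeros(i + 2); e1[0] = beta          # least squares problem (1.84)
        z, *_ = np.linalg.lstsq(H[:i + 2, :i + 1], e1, rcond=None)
        eta = np.linalg.norm(e1 - H[:i + 2, :i + 1] @ z) / beta   # (1.90)
        if eta <= eta_bar or H[i + 1, i] == 0.0:
            break
    return y0 + V[:, :i + 1] @ z, eta

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8)) + 8.0 * np.eye(8)
b = rng.standard_normal(8)
y, eta = gmres(A, b)
print(np.linalg.norm(A @ y - b), eta)
```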

Convergence of affine contravariant inexact Newton methods

We denote by δx^k ∈ ℝ^n the result of the inner GMRES iteration. As initial values for GMRES we choose
\[ \delta x^k_0 = 0, \qquad r^k_0 = F(x^k). \tag{1.94} \]
Consequently, during the inner GMRES iteration the relative error η_i, i ∈ ℕ_0, in the residuals satisfies
\[ \eta_i = \frac{\|r^k_i\|}{\|F(x^k)\|} \le 1, \qquad \eta_{i+1} < \eta_i \ \text{ if } \eta_i \ne 0. \tag{1.95} \]
In the sequel, we drop the subindices i for the inner iterations and refer to η_k as the final value of the inner iterations at each outer iteration step k.

Theorem 3.3 (Affine contravariant convergence theorem for the inexact Newton GMRES method. Part I: Linear convergence)
Suppose that F : D ⊂ ℝ^n → ℝ^n is continuously differentiable on D and let x^0 ∈ D be some initial guess. Let further the following affine contravariant Lipschitz condition be satisfied:
\[ \|\big(F'(y)-F'(x)\big)(y-x)\| \le \omega\,\|F'(x)(y-x)\|^2, \quad x,y \in D,\ \omega \ge 0. \tag{1.96} \]
Assume further that the level set
\[ L_0 := \{x \in \mathbb{R}^n \mid \|F(x)\| \le \|F(x^0)\|\} \tag{1.97} \]
is a compact subset of D. In terms of the Kantorovich quantities
\[ h_k := \omega\,\|F(x^k)\|, \quad k \in \mathbb{N}_0, \tag{1.98} \]
the outer residual norms can be bounded according to
\[ \|F(x^{k+1})\| \le \Big(\eta_k + \tfrac12\,(1-\eta_k^2)\,h_k\Big)\,\|F(x^k)\|. \tag{1.99} \]
Assume that
\[ h_0 < 2 \tag{1.100} \]
and control the inner iterations according to
\[ \eta_k \le \Theta - \tfrac12\,h_k \tag{1.101} \]

for some h_0/2 < Θ < 1. Then, the inexact Newton GMRES iterates x^k, k ∈ ℕ_0, stay in L_0 and converge linearly to some x^* ∈ L_0 with F(x^*) = 0 at an estimated rate
\[ \|F(x^{k+1})\| \le \Theta\,\|F(x^k)\|. \tag{1.102} \]
Proof. We recall that the inexact Newton GMRES iterates satisfy
\[ F'(x^k)\,\delta x^k = -F(x^k) + r^k, \tag{1.103} \]
\[ x^{k+1} = x^k + \delta x^k. \tag{1.104} \]
It follows from the generalized mean value theorem that
\[ F(x^{k+1}) = F(x^k) + \int_0^1 F'(x^k+t\delta x^k)\,\delta x^k\,dt. \tag{1.105} \]
Consequently, using (1.103) in (1.105), we obtain
\[ F(x^{k+1}) = \int_0^1\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\,dt + r^k, \]
whence
\[ \|F(x^{k+1})\| \le \int_0^1\|\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\|\,dt + \|r^k\| \le \tfrac12\,\omega\,\|F'(x^k)\delta x^k\|^2 + \|r^k\| = \tfrac12\,\omega\,\|F(x^k)-r^k\|^2 + \|r^k\|. \]
We recall (1.93),
\[ \|F(x^k)-r^k\|^2 = (1-\eta_k^2)\,\|F(x^k)\|^2, \]
from which (1.99) can be immediately deduced. Now, in view of (1.101), (1.99) yields
\[ \|F(x^{k+1})\| \le \Big(\eta_k + \tfrac12\,(1-\eta_k^2)\,h_k\Big)\|F(x^k)\| \le \Big(\Theta - \tfrac12\,\eta_k^2\,h_k\Big)\|F(x^k)\| \le \Theta\,\|F(x^k)\|. \]
Taking advantage of the previous inequality, by induction on k it follows that x^k ∈ L_0 ⊂ D, k ∈ ℕ_0.

Hence, there exist a subsequence ℕ' ⊂ ℕ and an x^* ∈ L_0 such that x^k → x^* (k ∈ ℕ', k → ∞) and F(x^*) = 0. Moreover, since
\[ \|F(x^{k+l})-F(x^k)\| \le \|F(x^{k+l})\| + \|F(x^k)\| \le (1+\Theta^l)\,\|F(x^k)\| \le (1+\Theta^l)\,\Theta^k\,\|F(x^0)\| \ \longrightarrow\ 0 \quad (k\to\infty), \]
the whole sequence must converge to x^*.

Theorem 3.4 (Affine contravariant convergence theorem for the inexact Newton GMRES method. Part II: Quadratic convergence)
Under the same assumptions on F : D ⊂ ℝ^n → ℝ^n as in Theorem 3.3, suppose that the initial guess x^0 ∈ D satisfies
\[ h_0 < \frac{2}{1+\rho} \tag{1.106} \]
for some appropriate ρ > 0 and control the inner iterations such that
\[ \frac{\eta_k}{1-\eta_k^2} \le \frac{\rho}{2}\,h_k. \tag{1.107} \]
Then, the inexact Newton GMRES iterates x^k, k ∈ ℕ_0, stay in L_0 and converge quadratically to some x^* ∈ L_0 with F(x^*) = 0 at an estimated rate
\[ \|F(x^{k+1})\| \le \tfrac12\,\omega\,(1+\rho)\,(1-\eta_k^2)\,\|F(x^k)\|^2. \tag{1.108} \]
Proof. Inserting (1.107) into (1.99) and observing h_k = ω‖F(x^k)‖ gives the assertion.

Algorithmic aspects of affine contravariant inexact Newton methods

(i) Convergence monitor
Throughout the inexact Newton GMRES iteration we use the residual monotonicity test
\[ \bar\Theta_k := \frac{\|F(x^{k+1})\|}{\|F(x^k)\|} \le \Theta < 1. \tag{1.109} \]
The iteration is considered as divergent if
\[ \bar\Theta_k > \Theta. \tag{1.110} \]

(ii) Termination criterion
As in the exact Newton iteration, specifying a residual accuracy FTOL, the termination criterion for the inexact Newton GMRES iteration is
\[ \|F(x^k)\| \le \mathrm{FTOL}. \tag{1.111} \]
(iii) Balancing outer and inner iterations
With regard to (1.101) of Theorem 3.3, in the linear convergence mode the adaptive termination criterion for the inner GMRES iteration is
\[ \eta_k \le \Theta - \tfrac12\,h_k, \]
whereas, in view of (1.107) of Theorem 3.4, in the quadratic convergence mode the termination criterion is
\[ \frac{\eta_k}{1-\eta_k^2} \le \frac{\rho}{2}\,h_k. \]
Again, we replace the theoretical Kantorovich quantities h_k by some computationally easily available a priori estimates. We distinguish between the quadratic and the linear convergence mode.
(iii.1) Quadratic convergence mode
We recall the termination criterion (1.107) for the quadratic convergence mode,
\[ \frac{\eta_k}{1-\eta_k^2} \le \frac{\rho}{2}\,h_k. \]
It suggests the a posteriori estimate
\[ [h_k]_2 := \frac{2\,\bar\Theta_k}{(1+\rho)\,(1-\eta_k^2)} \le h_k. \]
In view of h_{k+1} = \bar Θ_k h_k, this implies the a priori estimate
\[ [h_{k+1}] := \bar\Theta_k\,[h_k]_2 \le \bar\Theta_k\,h_k = h_{k+1}. \tag{1.112} \]
Using (1.112) in (1.107) results in the computationally feasible termination criterion
\[ \frac{\eta_k}{1-\eta_k^2} \le \frac{\rho}{2}\,[h_k], \qquad \rho \approx 1.0. \tag{1.113} \]

(iii.2) Linear convergence mode
We switch from the quadratic to the linear convergence mode if the local contraction factor satisfies
\[ \bar\Theta_k < \Theta. \tag{1.114} \]
The proof of the previous theorems reveals
\[ \|F(x^{k+1})-r^k\| \le \frac{\omega}{2}\,\|F(x^k)-r^k\|^2 = \tfrac12\,(1-\eta_k^2)\,h_k\,\|F(x^k)\|. \tag{1.115} \]
The above inequality (1.115) implies the a posteriori estimate
\[ [h_k]_1 := \frac{2\,\|F(x^{k+1})-r^k\|}{(1-\eta_k^2)\,\|F(x^k)\|} \le h_k \tag{1.116} \]
and the a priori estimate
\[ [h_{k+1}] := \bar\Theta_k\,[h_k]_1 \le h_{k+1}. \tag{1.117} \]
Based on (1.117) we define
\[ \bar\eta_{k+1} := \Theta - \tfrac12\,[h_{k+1}]. \tag{1.118} \]
If we find
\[ \bar\eta_{k+1} < \eta_{k+1} \tag{1.119} \]
with η_{k+1} from (1.113), we continue the iteration in the quadratic convergence mode. Otherwise, we realize the linear convergence mode with some
\[ \eta_{k+1} \le \bar\eta_{k+1}. \tag{1.120} \]

3.3 Affine Conjugate Inexact Newton Methods

PCG (Preconditioned Conjugate Gradient)

The Preconditioned Conjugate Gradient method (PCG) is an iterative solver for linear algebraic systems with a symmetric positive definite coefficient matrix A ∈ ℝ^{n×n}. We recall that any symmetric positive definite matrix C ∈ ℝ^{n×n} defines an energy inner product (·,·)_C according to
\[ (u,v)_C := (u,\,Cv), \quad u,v \in \mathbb{R}^n. \]
The associated energy norm is denoted by ‖·‖_C. The PCG method with a symmetric positive definite preconditioner B ∈ ℝ^{n×n} corresponds to the CG method applied to the transformed linear algebraic system
\[ B^{1/2}AB^{1/2}\,\big(B^{-1/2}y\big) = B^{1/2}b. \]
The PCG method is implemented as follows:

PCG Initialization:
Given an initial guess y_0 ∈ ℝ^n, compute the residual r_0 = b − A y_0 and the preconditioned residual \bar r_0 = B r_0, and set
\[ p_0 := \bar r_0, \qquad \sigma_0 := (r_0,\bar r_0) = \|r_0\|_B^2. \]
PCG Iteration Loop: For 0 ≤ i ≤ i_max compute
\[ \alpha_i = \frac{\|p_i\|_A^2}{\sigma_i}, \qquad y_{i+1} = y_i + \frac{1}{\alpha_i}\,p_i, \qquad \gamma_i^2 = \frac{\sigma_i}{\alpha_i}\ \big(= \|y_{i+1}-y_i\|_A^2\big), \]
\[ r_{i+1} = r_i - \frac{1}{\alpha_i}\,A p_i, \qquad \bar r_{i+1} = B r_{i+1}, \qquad \sigma_{i+1} = \|r_{i+1}\|_B^2, \qquad p_{i+1} = \bar r_{i+1} + \frac{\sigma_{i+1}}{\sigma_i}\,p_i. \]
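The loop above translates into NumPy as follows. The preconditioner is passed as a function applying B (an approximation of A^{-1}); the stopping test is a simplified relative energy-error check in the spirit of the computable bound derived below, with an illustrative look-ahead depth m.

```python
import numpy as np

def pcg(A, b, apply_B=lambda r: r, y0=None, i_max=200, delta=1e-8, m=4):
    """PCG sketch following the loop above; apply_B realizes the preconditioner B."""
    n = A.shape[0]
    y = np.zeros(n) if y0 is None else np.array(y0, dtype=float)
    r = b - A @ y
    rbar = apply_B(r)
    p = rbar.copy()
    sigma = float(r @ rbar)
    gammas = []
    for _ in range(i_max):
        Ap = A @ p
        step = sigma / float(p @ Ap)        # = 1/alpha_i in the handout's notation
        y = y + step * p
        gammas.append(sigma * step)         # gamma_i^2 = ||y_{i+1} - y_i||_A^2
        r = r - step * Ap
        rbar = apply_B(r)
        sigma_new = float(r @ rbar)
        # simplified relative energy-error test (cf. the bound derived below)
        eps_est = sum(gammas[-m:])
        yA = np.sqrt(float(y @ (A @ y)))
        if yA > 0.0 and np.sqrt(eps_est) <= delta * yA:
            break
        p = rbar + (sigma_new / sigma) * p
        sigma = sigma_new
    return y, gammas

# Illustrative use with a diagonal (Jacobi) preconditioner
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6.0 * np.eye(6)               # symmetric positive definite
b = rng.standard_normal(6)
y, _ = pcg(A, b, apply_B=lambda r: r / np.diag(A))
print(np.linalg.norm(A @ y - b))
```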

PCG minimizes the energy error norm,
\[ \|y-y_i\|_A = \min_{z\,\in\,y_0+\mathcal K_i(r_0,A)}\|y-z\|_A, \tag{1.121} \]
where K_i(r_0,A) denotes the Krylov subspace
\[ \mathcal K_i(r_0,A) := \operatorname{span}\{r_0,\,\dots,\,A^{i-1}r_0\}. \tag{1.122} \]
PCG satisfies the Galerkin orthogonality
\[ (y_i-y_0,\ y_{i+m}-y_i)_A = 0, \quad m \in \mathbb{N}. \tag{1.123} \]
Denoting by y ∈ ℝ^n the unique solution of Ay = b and by ε_i := ‖y − y_i‖²_A the square of the iteration error in the energy norm, we have the following error representation:

Lemma 3.2 (Representation of the iteration error)
The PCG iteration error satisfies
\[ \varepsilon_i = \sum_{j=i}^{n-1}\gamma_j^2. \tag{1.124} \]
Proof. For m = 1 the Galerkin orthogonality implies the orthogonal decompositions
\[ \|y_{i+1}-y_0\|_A^2 = \underbrace{\|y_{i+1}-y_i\|_A^2}_{=\,\gamma_i^2} + \|y_i-y_0\|_A^2, \tag{1.125} \]
\[ \|y_i-y_0\|_A^2 = \sum_{j=0}^{i-1}\|y_{j+1}-y_j\|_A^2 = \sum_{j=0}^{i-1}\gamma_j^2. \tag{1.126} \]
On the other hand, observing y_n = y, for m = n − i the Galerkin orthogonality yields
\[ \underbrace{\|y-y_0\|_A^2}_{=\,\sum_{j=0}^{n-1}\gamma_j^2} = \underbrace{\|y-y_i\|_A^2}_{=\,\varepsilon_i} + \underbrace{\|y_i-y_0\|_A^2}_{=\,\sum_{j=0}^{i-1}\gamma_j^2}. \tag{1.127} \]

Computable lower bound for the iteration error

A lower bound for the iteration error in the energy norm is obviously given by
\[ [\varepsilon_i] := \sum_{j=i}^{i+m}\gamma_j^2. \tag{1.128} \]
In the inexact Newton PCG method we will control the inner PCG iterations by the relative energy error norms
\[ \delta_i := \frac{\|y-y_i\|_A}{\|y_i\|_A} \approx \frac{\sqrt{[\varepsilon_i]}}{\|y_i\|_A} \tag{1.129} \]
and use the termination criterion
\[ \delta_i \le \delta, \tag{1.130} \]
where δ is a user specified accuracy.

Convergence of affine conjugate inexact Newton methods

We denote by δx^k ∈ ℝ^n the result of the inner PCG iteration. As initial value for PCG we choose
\[ \delta x^k_0 = 0. \tag{1.131} \]
Again, we will drop the subindices i for the inner PCG iterations and refer to δ_k as the final value of the inner iterations at each outer iteration step k. We recall the Galerkin orthogonality (cf. (1.123))
\[ \big(\delta x^k,\ F'(x^k)(\delta x^k-\Delta x^k)\big) = \big(\delta x^k,\,r^k\big) = 0. \tag{1.132} \]

Theorem 3.5 (Affine conjugate convergence theorem for the inexact Newton PCG method. Part I: Linear convergence)
Suppose that f : D ⊂ ℝ^n → ℝ is a twice continuously differentiable strictly convex functional on D with F := grad f and the Hessian F' = f'', which is symmetric and uniformly positive definite. Assume that x^0 ∈ D is some initial guess such that the level set
\[ L_0 := \{x \in D \mid f(x) \le f(x^0)\} \]

is compact. Let further the following affine conjugate Lipschitz condition be satisfied:
\[ \|F'(z)^{-1/2}\big(F'(y)-F'(x)\big)v\| \le \omega\,\|F'(x)^{1/2}(y-x)\|\;\|F'(x)^{1/2}v\|, \quad x,y,z \in D,\ \omega \ge 0. \tag{1.133} \]
For the inner Newton PCG iterations consider the exact error terms
\[ \varepsilon_k := \|F'(x^k)^{1/2}\Delta x^k\|^2 \]
and the Kantorovich quantities
\[ h_k := \omega\,\|F'(x^k)^{1/2}\Delta x^k\|, \]
as well as their inexact analogues
\[ \varepsilon_k^\delta := \|F'(x^k)^{1/2}\delta x^k\|^2 = \frac{\varepsilon_k}{1+\delta_k^2}, \qquad h_k^\delta := \omega\,\|F'(x^k)^{1/2}\delta x^k\| = \frac{h_k}{\sqrt{1+\delta_k^2}}, \]
where δ_k characterizes the inner PCG iteration error,
\[ \delta_k := \frac{\|F'(x^k)^{1/2}(\delta x^k-\Delta x^k)\|}{\|F'(x^k)^{1/2}\delta x^k\|}. \]
Assume that
\[ h_0 \le 2\,\Theta < 2 \tag{1.134} \]
for some Θ < 1 and that
\[ \delta_{k+1} \ge \delta_k, \quad k \in \mathbb{N}_0, \tag{1.135} \]
holds true throughout the outer Newton iterations. Control the inner iterations according to
\[ \vartheta(h_k^\delta,\delta_k) := \frac{\tfrac12 h_k^\delta + \tfrac12\,\delta_k\Big(h_k^\delta+\sqrt{4+(h_k^\delta)^2}\Big)}{\sqrt{1+\delta_k^2}} \le \Theta. \tag{1.136} \]

Then, the inexact Newton PCG iterates x^k, k ∈ ℕ_0, stay in L_0 and converge linearly to some x^* ∈ L_0 with f(x^*) = min_{x∈D} f(x). The following estimates hold true:
\[ \|F'(x^{k+1})^{1/2}\Delta x^{k+1}\| \le \Theta\,\|F'(x^k)^{1/2}\Delta x^k\|, \quad k \in \mathbb{N}_0, \tag{1.137} \]
\[ \|F'(x^{k+1})^{1/2}\delta x^{k+1}\| \le \Theta\,\|F'(x^k)^{1/2}\delta x^k\|, \quad k \in \mathbb{N}_0. \tag{1.138} \]
Moreover, the objective functional is reduced according to
\[ \tfrac12\,\varepsilon_k^\delta - \tfrac16\,h_k^\delta\,\varepsilon_k^\delta \le f(x^k)-f(x^{k+1}) \le \tfrac12\,\varepsilon_k^\delta + \tfrac16\,h_k^\delta\,\varepsilon_k^\delta. \tag{1.139} \]
Proof. Observing
\[ r^k = F(x^k) + F'(x^k)\,\delta x^k, \quad k \in \mathbb{N}_0, \]
we obtain for λ ∈ [0,1]
\[ f(x^k+\lambda\delta x^k) - f(x^k) = \int_0^\lambda\big(\delta x^k,\,F(x^k+s\delta x^k)\big)\,ds \tag{1.140} \]
\[ = \int_0^\lambda\big(\delta x^k,\,F(x^k+s\delta x^k)-F(x^k)\big)\,ds + \int_0^\lambda\big(\delta x^k,\,F(x^k)\big)\,ds \]
\[ = \int_0^\lambda\int_0^s\big(\delta x^k,\,\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\big)\,dt\,ds + \int_0^\lambda\int_0^s\big(\delta x^k,\,F'(x^k)\delta x^k\big)\,dt\,ds \]
\[ \quad + \underbrace{\int_0^\lambda\big(\delta x^k,\,r^k\big)\,ds}_{=\,0\ \text{due to}\ (1.132)} - \int_0^\lambda\big(\delta x^k,\,F'(x^k)\delta x^k\big)\,ds. \]

The integrand of the first term on the right-hand side can be estimated by means of the affine conjugate Lipschitz condition (1.133):
\[ \big|\big(\delta x^k,\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\big)\big| = \big|\big(F'(x^k)^{1/2}\delta x^k,\ F'(x^k)^{-1/2}\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\big)\big| \le \omega\,t\,\|F'(x^k)^{1/2}\delta x^k\|^3 = t\,h_k^\delta\,\varepsilon_k^\delta, \]
while the remaining terms evaluate to \tfrac{\lambda^2}{2}\varepsilon_k^\delta and -\lambda\,\varepsilon_k^\delta. It readily follows from (1.140) that
\[ f(x^k+\lambda\delta x^k) \le f(x^k) + \frac{\lambda^3}{6}\,h_k^\delta\,\varepsilon_k^\delta + \Big(\frac{\lambda^2}{2}-\lambda\Big)\,\varepsilon_k^\delta. \tag{1.141} \]
Denoting by L_k the level set
\[ L_k := \{x \in D \mid f(x) \le f(x^k)\}, \]
by induction on k we prove
\[ h_k < 2 \quad\text{and hence}\quad x^{k+1} \in L_k. \tag{1.142} \]
For k = 0, we have h_0 < 2 by assumption (1.134). Since h_0^δ ≤ h_0, (1.141) readily shows f(x^1) < f(x^0), whence x^1 ∈ L_0. Now, assuming (1.142) to hold true for some k ∈ ℕ, again taking advantage of h_k^δ ≤ h_k < 2, (1.141) yields f(x^{k+1}) < f(x^k) and thus x^{k+1} ∈ L_k.
Moreover, choosing λ = 1 in (1.141), we obtain the left-hand side of the functional descent property (1.139). We note that we get the right-hand side of (1.139) if in (1.140) we estimate by the other direction of the Cauchy-Schwarz inequality.
Finally, in order to prove the contraction properties (1.137), (1.138) and linear convergence, we estimate the local energy norms as follows:
\[ \|F'(x^{k+1})^{1/2}\Delta x^{k+1}\| = \|F'(x^{k+1})^{-1/2}F'(x^{k+1})\Delta x^{k+1}\| = \|F'(x^{k+1})^{-1/2}F(x^{k+1})\| = \|F'(x^{k+1})^{-1/2}\big(F(x^{k+1}) \pm F(x^k)\big)\| \]

\[ \le \|F'(x^{k+1})^{-1/2}\big(F(x^{k+1})-F(x^k)\big)\| + \|F'(x^{k+1})^{-1/2}F(x^k)\|. \]
Observing F(x^k) = −F'(x^k)δx^k + r^k and using the affine conjugate Lipschitz condition, we obtain
\[ \|F'(x^{k+1})^{1/2}\Delta x^{k+1}\| = \Big\|F'(x^{k+1})^{-1/2}\Big(\int_0^1\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\,dt + r^k\Big)\Big\| \le \tfrac12\,\omega\,\|F'(x^k)^{1/2}\delta x^k\|^2 + \|F'(x^{k+1})^{-1/2}r^k\|. \tag{1.143} \]
Setting z := δx^k − Δx^k, for the second term on the right-hand side of the previous inequality we get the implicit estimate
\[ \|F'(x^{k+1})^{-1/2}r^k\|^2 \le \|F'(x^k)^{1/2}z\|^2 + h_k^\delta\,\|F'(x^k)^{1/2}z\|\,\|F'(x^{k+1})^{-1/2}r^k\|, \]
which gives the explicit bound
\[ \|F'(x^{k+1})^{-1/2}r^k\| \le \tfrac12\Big(h_k^\delta+\sqrt{4+(h_k^\delta)^2}\Big)\,\|F'(x^k)^{1/2}z\|. \tag{1.144} \]
Using (1.144) in (1.143) results in
\[ \omega\,\|F'(x^{k+1})^{1/2}\Delta x^{k+1}\| \le \tfrac12\,(h_k^\delta)^2 + \tfrac12\Big(h_k^\delta+\sqrt{4+(h_k^\delta)^2}\Big)\underbrace{\omega\,\|F'(x^k)^{1/2}z\|}_{=\,\delta_k\,h_k^\delta}. \]
Taking (1.136) into account, we thus get the contraction factor estimate
\[ \Theta_k := \frac{\omega\,\|F'(x^{k+1})^{1/2}\Delta x^{k+1}\|}{\omega\,\|F'(x^k)^{1/2}\Delta x^k\|} = \frac{\omega\,\|F'(x^{k+1})^{1/2}\Delta x^{k+1}\|}{\sqrt{1+\delta_k^2}\;h_k^\delta} \le \vartheta(h_k^\delta,\delta_k) \le \Theta, \tag{1.145} \]

which proves (1.137) and linear convergence. For the proof of (1.138) we observe
\[ \|F'(x^l)^{1/2}\Delta x^l\|^2 = (1+\delta_l^2)\,\|F'(x^l)^{1/2}\delta x^l\|^2, \quad l = k,\,k+1, \]
as well as δ_{k+1} ≥ δ_k, and obtain
\[ \frac{\|F'(x^{k+1})^{1/2}\delta x^{k+1}\|}{\|F'(x^k)^{1/2}\delta x^k\|} = \sqrt{\frac{1+\delta_k^2}{1+\delta_{k+1}^2}}\;\Theta_k \le \Theta_k \le \Theta. \tag{1.146} \]
By standard arguments we further show that the sequence {x^k}, k ∈ ℕ_0, of inexact Newton PCG iterates is a Cauchy sequence in L_0 and there exists an x^* ∈ L_0 such that x^k → x^* (k → ∞) with F(x^*) = 0.

Theorem 3.6 (Affine conjugate convergence theorem for the inexact Newton PCG method. Part II: Quadratic convergence)
Under the same assumptions on f and F as in Theorem 3.5, suppose that the initial guess x^0 ∈ D satisfies
\[ h_0^\delta < \frac{2}{1+\rho} \tag{1.147} \]
for some appropriate ρ > 0 and control the inner iterations such that
\[ \delta_k \le \frac{\rho\,h_k^\delta}{h_k^\delta+\sqrt{4+(h_k^\delta)^2}}. \tag{1.148} \]
Then, there holds:
(i) The inexact Newton PCG iterates x^k, k ∈ ℕ_0, stay in L_0 and converge quadratically to some x^* ∈ L_0 with F(x^*) = 0.
(ii) The exact Newton increments and the inexact Newton increments decrease quadratically according to
\[ \|F'(x^{k+1})^{1/2}\Delta x^{k+1}\| \le \frac{1+\rho}{2}\,\omega\,\|F'(x^k)^{1/2}\Delta x^k\|^2, \tag{1.149} \]
\[ \|F'(x^{k+1})^{1/2}\delta x^{k+1}\| \le \frac{1+\rho}{2}\,\omega\,\|F'(x^k)^{1/2}\delta x^k\|^2. \tag{1.150} \]

Proof. Using (1.148) in (1.145) yields
\[ \frac{\|F'(x^{k+1})^{1/2}\Delta x^{k+1}\|}{\|F'(x^k)^{1/2}\Delta x^k\|} \le \frac{\tfrac12 h_k^\delta + \tfrac12\,\delta_k\big(h_k^\delta+\sqrt{4+(h_k^\delta)^2}\big)}{\sqrt{1+\delta_k^2}} \le \tfrac12\,(1+\rho)\,h_k^\delta, \]
which proves (1.149) in view of h_k^δ ≤ h_k ≤ h_0 < 2Θ. The proof of (1.150) follows along the same lines by using (1.148) in (1.146).

Algorithmic aspects of the affine conjugate inexact Newton PCG method

(i) Convergence monitor
Let us assume that the quantity Θ < 1 in both the linear convergence mode and the quadratic convergence mode has been specified, and let us further assume that we use the start iterate δx^k_0 = 0 in the inner PCG iteration. Denoting by \bar δ_k an easily computable estimate of the relative energy norm iteration error δ_k, we accept a new iterate x^{k+1} if either the condition
\[ f(x^{k+1}) \le f(x^k) - \tfrac{1}{10}\,\varepsilon_k = f(x^k) - \tfrac{1}{10}\,(1+\bar\delta_k^2)\,\varepsilon_k^\delta \tag{1.151} \]
or the monotonicity test
\[ \bar\Theta_k := \Big(\frac{\varepsilon_{k+1}}{\varepsilon_k}\Big)^{1/2} = \Big(\frac{(1+\bar\delta_{k+1}^2)\,\varepsilon_{k+1}^\delta}{(1+\bar\delta_k^2)\,\varepsilon_k^\delta}\Big)^{1/2} \le \Theta < 1 \tag{1.152} \]
is satisfied. We consider the outer iteration as divergent if neither (1.151) nor (1.152) holds true.
(ii) Termination criterion
With respect to a user specified accuracy ETOL, the inexact Newton PCG iteration will be terminated if either
\[ \varepsilon_k = (1+\bar\delta_k^2)\,\varepsilon_k^\delta \le \mathrm{ETOL}^2 \tag{1.153} \]
or
\[ f(x^k) - f(x^{k+1}) \le \mathrm{ETOL}^2. \tag{1.154} \]
(iii) Balancing outer and inner iterations
For k = 0, we choose δ_0 = \bar δ_0 = 1/4. As in case of the inexact Newton CGNE iteration, for k ≥ 1 we begin with the

quadratic convergence mode and switch to the linear convergence mode as soon as the approximate contraction factor \bar Θ_k is below some prespecified threshold value \bar Θ ≤ 1/2.
(iii.1) Quadratic convergence mode
A computationally realizable termination criterion for the inner PCG iteration in the quadratic convergence mode is given by
\[ \bar\delta_k \le \frac{\rho\,[h_k^\delta]}{[h_k^\delta]+\sqrt{4+[h_k^\delta]^2}}, \tag{1.155} \]
where [h_k^δ] is an appropriate a priori estimate of the inexact Kantorovich quantity h_k^δ. In view of (1.139), we have the a posteriori estimates
\[ [h_k^\delta]_2 := \frac{10}{\varepsilon_k^\delta}\Big(f(x^{k+1})-f(x^k)+\tfrac12\,\varepsilon_k^\delta\Big) \tag{1.156} \]
and
\[ [h_k]_2 := \sqrt{1+\bar\delta_k^2}\;[h_k^\delta]_2. \tag{1.157} \]
We note that (1.157) yields the a priori estimate
\[ [h_k] := \bar\Theta_{k-1}\,[h_{k-1}]_2. \tag{1.158} \]
Using (1.158) in (1.157), for the inexact Kantorovich quantity we obtain the following a priori estimate:
\[ [h_k^\delta] := \frac{[h_k]}{\sqrt{1+\bar\delta_k^2}}. \tag{1.159} \]
Inserting (1.159) into (1.155), we obtain a simple nonlinear equation in \bar δ_k.
Remark 3.5 (Computational work in the quadratic convergence mode)
Since δ_k → 0 (k → ∞) is enforced, it follows that the more the iterates x^k approach the solution x^*, the more computational work is required for the inner iterations to guarantee quadratic convergence of the outer iteration.
(iii.2) Linear convergence mode
We switch to the linear convergence mode if
\[ \bar\Theta_k < \bar\Theta \tag{1.160} \]

is satisfied. The computationally realizable termination criterion for the inner iteration in the linear convergence mode is
\[ [\vartheta(h_k^\delta,\delta_k)] := \vartheta([h_k^\delta],\bar\delta_k) \le \Theta. \tag{1.161} \]
Since asymptotically there holds
\[ \bar\delta_k \ \longrightarrow\ \frac{\Theta}{\sqrt{1-\Theta^2}} \quad (k\to\infty), \]
we observe:
Remark 3.6 (Computational work in the linear convergence mode)
The more the iterates x^k approach the solution x^*, the less computational work is required for the inner iterations to guarantee linear convergence of the outer iteration.

4. Quasi-Newton Methods

4.1 Introduction

Given F : D ⊂ ℝ^n → ℝ^n as well as x^k, x^{k+1} ∈ D, x^k ≠ x^{k+1}, the idea is to approximate F locally around x^{k+1} by an affine function
\[ S_{k+1}(x) := F(x^{k+1}) + J_{k+1}\,(x-x^{k+1}), \quad J_{k+1} \in \mathbb{R}^{n\times n}, \tag{1.162} \]
such that
\[ S_{k+1}(x^k) = F(x^k). \tag{1.163} \]
The requirement (1.163) gives rise to the so-called secant condition
\[ J_{k+1}\,\underbrace{(x^{k+1}-x^k)}_{=:\,\delta x^k} = \underbrace{F(x^{k+1})-F(x^k)}_{=:\,y^k}. \tag{1.164} \]
The matrix J_{k+1} is not uniquely determined by (1.164), since
\[ \dim\mathcal S_{k+1} = (n-1)\,n, \tag{1.165} \]
where
\[ \mathcal S_{k+1} := \{J \in \mathbb{R}^{n\times n} \mid J\,\delta x^k = y^k\}. \tag{1.166} \]
There are different criteria to select an appropriate J ∈ S_{k+1}.

The Good Broyden rank 1 update

Let us consider the change in the affine model as given by
\[ S_{k+1}(x) - S_k(x) = (J_{k+1}-J_k)\,(x-x^k). \tag{1.167} \]
An appropriate idea is to choose J_{k+1} ∈ S_{k+1} such that there is a least change in the affine model in the sense
\[ \|J_{k+1}-J_k\|_F = \min_{J\in\mathcal S_{k+1}}\|J-J_k\|_F, \tag{1.168} \]
where ‖·‖_F stands for the Frobenius norm (observe J = (J_{ik})_{i,k=1}^n)
\[ \|J\|_F := \Big(\sum_{i,k=1}^n J_{ik}^2\Big)^{1/2}. \tag{1.169} \]

The solution of (1.168) can be heuristically motivated as follows: choose t^k ⊥ δx^k such that
\[ x - x^k = \alpha\,\delta x^k + t^k. \]
Then, (1.167) reads
\[ S_{k+1}(x) - S_k(x) = \alpha\,(J_{k+1}-J_k)\,\delta x^k + (J_{k+1}-J_k)\,t^k = \alpha\,(y^k-J_k\delta x^k) + (J_{k+1}-J_k)\,t^k. \tag{1.170} \]
Now, choose J_{k+1} ∈ S_{k+1} such that
\[ (J_{k+1}-J_k)\,t^k = 0. \]
It follows that
\[ \operatorname{rank}(J_{k+1}-J_k) = 1, \qquad J_{k+1}-J_k = v^k\,(\delta x^k)^T. \tag{1.171} \]
Inserting (1.171) into (1.170) yields
\[ \alpha\,v^k\,(\delta x^k)^T\delta x^k = \alpha\,(y^k-J_k\delta x^k), \]
which results in
\[ v^k = \frac{y^k-J_k\delta x^k}{(\delta x^k)^T\delta x^k}. \]
Altogether, this gives us Broyden's rank 1 update (Good Broyden):
\[ J_{k+1} = J_k + \frac{\big[F(x^{k+1})-F(x^k)-J_k\delta x^k\big]\,(\delta x^k)^T}{(\delta x^k)^T\delta x^k}. \tag{1.172} \]
For the solution of nonlinear systems, we are more interested in updates of the inverse of J_k. Such an update can be provided by the Sherman-Morrison-Woodbury formula
\[ \big(A+uv^T\big)^{-1} = A^{-1} - \frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}. \tag{1.173} \]
Setting
\[ A := J_k, \qquad u := F(x^{k+1})-F(x^k)-J_k\delta x^k, \qquad v := \frac{\delta x^k}{(\delta x^k)^T\delta x^k}, \]
we obtain
\[ J_{k+1}^{-1} = J_k^{-1} + \frac{\big[\delta x^k - J_k^{-1}\big(F(x^{k+1})-F(x^k)\big)\big]\,(\delta x^k)^TJ_k^{-1}}{(\delta x^k)^TJ_k^{-1}\big(F(x^{k+1})-F(x^k)\big)}. \tag{1.174} \]
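A minimal quasi-Newton iteration using the Good Broyden update, with the inverse approximate Jacobian propagated directly via the Sherman-Morrison formula (1.173)-(1.174), could look as follows; the test problem, the initial Jacobian approximation, and the stopping tolerance are illustrative choices.

```python
import numpy as np

def broyden_good(F, x0, J0, tol=1e-10, kmax=50):
    """Quasi-Newton iteration with Broyden's 'good' rank-1 update,
    propagating the inverse approximate Jacobian via (1.174)."""
    x = np.array(x0, dtype=float)
    Jinv = np.linalg.inv(np.array(J0, dtype=float))
    Fx = F(x)
    for k in range(kmax):
        dx = -Jinv @ Fx                      # quasi-Newton increment
        x = x + dx
        Fx_new = F(x)
        if np.linalg.norm(Fx_new) <= tol:
            break
        yk = Fx_new - Fx                     # y^k = F(x^{k+1}) - F(x^k)
        Jinv_y = Jinv @ yk
        Jinv = Jinv + np.outer(dx - Jinv_y, dx @ Jinv) / float(dx @ Jinv_y)
        Fx = Fx_new
    return x, k

# Illustrative test problem: a line intersecting an ellipse
F = lambda x: np.array([x[0] + 2.0 * x[1] - 2.0, x[0]**2 + 4.0 * x[1]**2 - 4.0])
J0 = np.array([[1.0, 2.0], [1.0, 6.4]])      # Jacobian at the starting point
x_star, k = broyden_good(F, x0=[0.5, 0.8], J0=J0)
print(x_star, np.linalg.norm(F(x_star)), k)
```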

The Bad Broyden rank 1 update

Instead of (1.168), an alternative is to choose J_{k+1} ∈ S_{k+1} such that there is a least change in the solution of the affine model, i.e.,
\[ \|J_{k+1}^{-1}-J_k^{-1}\|_F = \min_{J\in\mathcal S_{k+1}}\|J^{-1}-J_k^{-1}\|_F. \tag{1.175} \]
Similar considerations as before lead us to Broyden's alternative rank 1 update (Bad Broyden):
\[ J_{k+1}^{-1} = J_k^{-1} + \frac{\big[\delta x^k - J_k^{-1}\big(F(x^{k+1})-F(x^k)\big)\big]\,\big(F(x^{k+1})-F(x^k)\big)^T}{\big(F(x^{k+1})-F(x^k)\big)^T\big(F(x^{k+1})-F(x^k)\big)}. \tag{1.176} \]

4.2 Affine covariant Quasi-Newton method

Affine covariant Quasi-Newton convergence theory

Affine covariant Quasi-Newton methods require the secant condition (1.164) to be stated by means of affine covariant terms in the domain of definition of the nonlinear mapping F. Observing that we compute the Quasi-Newton increment δx^k as the solution of
\[ J_k\,\delta x^k = -F(x^k), \tag{1.177} \]
we can rewrite (1.164) according to
\[ (J_k-J_{k+1})\,\delta x^k = -F(x^{k+1}). \]
Multiplication by J_k^{-1} yields the affine covariant secant condition
\[ \overline{\delta x}^{\,k+1} := \big(\underbrace{I-J_k^{-1}J}_{=:\,E_k(J)}\big)\,\delta x^k = -J_k^{-1}F(x^{k+1}), \quad J \in \mathcal S_{k+1}. \tag{1.178} \]
We note that any rank 1 update of the form
\[ J_{k+1} = J_k\Big(I - \frac{\overline{\delta x}^{\,k+1}\,v^T}{v^T\delta x^k}\Big), \quad v \in \mathbb{R}^n\setminus\{0\}, \tag{1.179} \]
satisfies the affine covariant secant condition (1.178). In particular, for v = δx^k we recover the Good Broyden.

Theorem 4.1 (Properties of the affine covariant Quasi-Newton method)
For Broyden's affine covariant rank 1 update (Good Broyden)
\[ J_{k+1} = J_k\Big(I - \frac{\overline{\delta x}^{\,k+1}\,(\delta x^k)^T}{\|\delta x^k\|^2}\Big) \tag{1.180} \]
assume that the local contraction condition
\[ \Theta_k := \frac{\|\overline{\delta x}^{\,k+1}\|}{\|\delta x^k\|} < \frac12 \tag{1.181} \]
is satisfied. Then, there holds:
(i) The update matrix J_{k+1} is a least change update in the sense that
\[ \|E_k(J_{k+1})\| \le \|E_k(J)\|, \quad J \in \mathcal S_{k+1}, \tag{1.182} \]
\[ \|E_k(J_{k+1})\| \le \Theta_k. \tag{1.183} \]
(ii) If J_k is regular, then J_{k+1} is regular as well with the inverse given by
\[ J_{k+1}^{-1} = \Big(I + \frac{\overline{\delta x}^{\,k+1}\,(\delta x^k)^T}{(1-\alpha_{k+1})\,\|\delta x^k\|^2}\Big)\,J_k^{-1}, \tag{1.184} \]
where
\[ \alpha_{k+1} := \frac{(\delta x^k)^T\,\overline{\delta x}^{\,k+1}}{\|\delta x^k\|^2} < \frac12. \]
(iii) The Quasi-Newton increment δx^{k+1} is given by
\[ \delta x^{k+1} = -J_{k+1}^{-1}F(x^{k+1}) = \frac{\overline{\delta x}^{\,k+1}}{1-\alpha_{k+1}}. \tag{1.185} \]
(iv) The Quasi-Newton increments decrease according to
\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} \le \frac{\Theta_k}{1-\alpha_{k+1}} < 1. \tag{1.186} \]
Proof. In view of (1.178) we have
\[ \|E_k(J_{k+1})\| = \Big\|\frac{\overline{\delta x}^{\,k+1}\,(\delta x^k)^T}{\|\delta x^k\|^2}\Big\| = \Big\|E_k(J)\,\frac{\delta x^k\,(\delta x^k)^T}{\|\delta x^k\|^2}\Big\| \le \|E_k(J)\|, \quad J \in \mathcal S_{k+1}, \]

which proves (1.182). Moreover, (1.183) follows readily from
\[ \|E_k(J_{k+1})\| = \Big\|\frac{\overline{\delta x}^{\,k+1}\,(\delta x^k)^T}{\|\delta x^k\|^2}\Big\| \le \frac{\|\overline{\delta x}^{\,k+1}\|}{\|\delta x^k\|} = \Theta_k. \]
The same argument shows
\[ \alpha_{k+1} \le \Theta_k < \tfrac12, \]
and hence (1.186) follows from
\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} = \frac{\Theta_k}{1-\alpha_{k+1}} \le \frac{\Theta_k}{1-\Theta_k} < 1. \]
Finally, the proofs of (ii) and (iii) are direct consequences of the Sherman-Morrison-Woodbury formula (1.173).

Theorem 4.2 (Convergence of the affine covariant Quasi-Newton method)
Suppose that F : D ⊂ ℝ^n → ℝ^n, D ⊂ ℝ^n convex, is continuously differentiable on D. Let x^* ∈ D be the unique solution of F(x) = 0 in D with invertible Jacobian F'(x^*). Assume that the following affine covariant Lipschitz condition is satisfied:
\[ \|F'(x^*)^{-1}\big(F'(x)-F'(x^*)\big)v\| \le \omega\,\|x-x^*\|\,\|v\|, \tag{1.187} \]
where x, x + v ∈ D, v ∈ ℝ^n. For some 0 < Θ < 1 assume further that:
(a) the initial approximate Jacobian J_0 satisfies
\[ \delta_0 := \|F'(x^*)^{-1}\big(J_0-F'(x^*)\big)\| \le \frac{1-\Theta}{2}\,\frac{\Theta}{1+\Theta}, \tag{1.188} \]
(b) the initial guess x^0 ∈ D satisfies
\[ t_0 := \omega\,\|x^0-x^*\| \le \frac{\Theta}{1+\Theta} - \delta_0. \tag{1.189} \]
Then, there holds:
(i) The Quasi-Newton iterates x^k, k ∈ ℕ_0, converge to x^* according to
\[ \|x^{k+1}-x^*\| < \Theta\,\|x^k-x^*\|. \tag{1.190} \]


MATH 4211/6211 Optimization Quasi-Newton Method

MATH 4211/6211 Optimization Quasi-Newton Method MATH 4211/6211 Optimization Quasi-Newton Method Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 Quasi-Newton Method Motivation:

More information

Programming, numerics and optimization

Programming, numerics and optimization Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428

More information

Termination criteria for inexact fixed point methods

Termination criteria for inexact fixed point methods Termination criteria for inexact fixed point methods Philipp Birken 1 October 1, 2013 1 Institute of Mathematics, University of Kassel, Heinrich-Plett-Str. 40, D-34132 Kassel, Germany Department of Mathematics/Computer

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

5 Quasi-Newton Methods

5 Quasi-Newton Methods Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min

More information

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University Lecture Note 7: Iterative methods for solving linear systems Xiaoqun Zhang Shanghai Jiao Tong University Last updated: December 24, 2014 1.1 Review on linear algebra Norms of vectors and matrices vector

More information

X. Linearization and Newton s Method

X. Linearization and Newton s Method 163 X. Linearization and Newton s Method ** linearization ** X, Y nls s, f : G X Y. Given y Y, find z G s.t. fz = y. Since there is no assumption about f being linear, we might as well assume that y =.

More information

Basic Concepts of Adaptive Finite Element Methods for Elliptic Boundary Value Problems

Basic Concepts of Adaptive Finite Element Methods for Elliptic Boundary Value Problems Basic Concepts of Adaptive Finite lement Methods for lliptic Boundary Value Problems Ronald H.W. Hoppe 1,2 1 Department of Mathematics, University of Houston 2 Institute of Mathematics, University of Augsburg

More information

The Steepest Descent Algorithm for Unconstrained Optimization

The Steepest Descent Algorithm for Unconstrained Optimization The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem

More information

Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations

Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations Boris T. Polyak Andrey A. Tremba V.A. Trapeznikov Institute of Control Sciences RAS, Moscow, Russia Profsoyuznaya, 65, 117997

More information

Matrix Secant Methods

Matrix Secant Methods Equation Solving g(x) = 0 Newton-Lie Iterations: x +1 := x J g(x ), where J g (x ). Newton-Lie Iterations: x +1 := x J g(x ), where J g (x ). 3700 years ago the Babylonians used the secant method in 1D:

More information

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books.

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books. Applied Analysis APPM 44: Final exam 1:3pm 4:pm, Dec. 14, 29. Closed books. Problem 1: 2p Set I = [, 1]. Prove that there is a continuous function u on I such that 1 ux 1 x sin ut 2 dt = cosx, x I. Define

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

Chapter 3. Differentiable Mappings. 1. Differentiable Mappings

Chapter 3. Differentiable Mappings. 1. Differentiable Mappings Chapter 3 Differentiable Mappings 1 Differentiable Mappings Let V and W be two linear spaces over IR A mapping L from V to W is called a linear mapping if L(u + v) = Lu + Lv for all u, v V and L(λv) =

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

Chapter 1 Foundations of Elliptic Boundary Value Problems 1.1 Euler equations of variational problems

Chapter 1 Foundations of Elliptic Boundary Value Problems 1.1 Euler equations of variational problems Chapter 1 Foundations of Elliptic Boundary Value Problems 1.1 Euler equations of variational problems Elliptic boundary value problems often occur as the Euler equations of variational problems the latter

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

Quasi-Newton Methods

Quasi-Newton Methods Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications

More information

Solutions and Notes to Selected Problems In: Numerical Optimzation by Jorge Nocedal and Stephen J. Wright.

Solutions and Notes to Selected Problems In: Numerical Optimzation by Jorge Nocedal and Stephen J. Wright. Solutions and Notes to Selected Problems In: Numerical Optimzation by Jorge Nocedal and Stephen J. Wright. John L. Weatherwax July 7, 2010 wax@alum.mit.edu 1 Chapter 5 (Conjugate Gradient Methods) Notes

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 11-14 On the convergence of inexact Newton methods R. Idema, D.J.P. Lahaye, and C. Vuik ISSN 1389-6520 Reports of the Department of Applied Mathematical Analysis Delft

More information

Quasi-Newton Methods

Quasi-Newton Methods Quasi-Newton Methods Werner C. Rheinboldt These are excerpts of material relating to the boos [OR00 and [Rhe98 and of write-ups prepared for courses held at the University of Pittsburgh. Some further references

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Some definitions. Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization. A-inner product. Important facts

Some definitions. Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization. A-inner product. Important facts Some definitions Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization M. M. Sussman sussmanm@math.pitt.edu Office Hours: MW 1:45PM-2:45PM, Thack 622 A matrix A is SPD (Symmetric

More information

FEM and sparse linear system solving

FEM and sparse linear system solving FEM & sparse linear system solving, Lecture 9, Nov 19, 2017 1/36 Lecture 9, Nov 17, 2017: Krylov space methods http://people.inf.ethz.ch/arbenz/fem17 Peter Arbenz Computer Science Department, ETH Zürich

More information

An analysis for the DIIS acceleration method used in quantum chemistry calculations

An analysis for the DIIS acceleration method used in quantum chemistry calculations An analysis for the DIIS acceleration method used in quantum chemistry calculations Thorsten Rohwedder and Reinhold Schneider Abstract. This work features an analysis for the acceleration technique DIIS

More information

EECS 275 Matrix Computation

EECS 275 Matrix Computation EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 20 1 / 20 Overview

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

1. Search Directions In this chapter we again focus on the unconstrained optimization problem. lim sup ν

1. Search Directions In this chapter we again focus on the unconstrained optimization problem. lim sup ν 1 Search Directions In this chapter we again focus on the unconstrained optimization problem P min f(x), x R n where f : R n R is assumed to be twice continuously differentiable, and consider the selection

More information

Exercise Solutions to Functional Analysis

Exercise Solutions to Functional Analysis Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n

More information

Static unconstrained optimization

Static unconstrained optimization Static unconstrained optimization 2 In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e. min f(x) x X ad (2.) with X ad R

More information

HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION. Darin Griffin Mohr. An Abstract

HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION. Darin Griffin Mohr. An Abstract HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION by Darin Griffin Mohr An Abstract Of a thesis submitted in partial fulfillment of the requirements for the Doctor of

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method The minimization problem We are given a symmetric positive definite matrix R n n and a right hand side vector b R n We want to solve the linear system Find u R n such that

More information

Iterative Methods for Linear Systems of Equations

Iterative Methods for Linear Systems of Equations Iterative Methods for Linear Systems of Equations Projection methods (3) ITMAN PhD-course DTU 20-10-08 till 24-10-08 Martin van Gijzen 1 Delft University of Technology Overview day 4 Bi-Lanczos method

More information

Chapter 7. Iterative methods for large sparse linear systems. 7.1 Sparse matrix algebra. Large sparse matrices

Chapter 7. Iterative methods for large sparse linear systems. 7.1 Sparse matrix algebra. Large sparse matrices Chapter 7 Iterative methods for large sparse linear systems In this chapter we revisit the problem of solving linear systems of equations, but now in the context of large sparse systems. The price to pay

More information

Preconditioned inverse iteration and shift-invert Arnoldi method

Preconditioned inverse iteration and shift-invert Arnoldi method Preconditioned inverse iteration and shift-invert Arnoldi method Melina Freitag Department of Mathematical Sciences University of Bath CSC Seminar Max-Planck-Institute for Dynamics of Complex Technical

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control

Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control Chun-Hua Guo Dedicated to Peter Lancaster on the occasion of his 70th birthday We consider iterative methods for finding the

More information

Inexact Newton Methods Applied to Under Determined Systems. Joseph P. Simonis. A Dissertation. Submitted to the Faculty

Inexact Newton Methods Applied to Under Determined Systems. Joseph P. Simonis. A Dissertation. Submitted to the Faculty Inexact Newton Methods Applied to Under Determined Systems by Joseph P. Simonis A Dissertation Submitted to the Faculty of WORCESTER POLYTECHNIC INSTITUTE in Partial Fulfillment of the Requirements for

More information

Simple Iteration, cont d

Simple Iteration, cont d Jim Lambers MAT 772 Fall Semester 2010-11 Lecture 2 Notes These notes correspond to Section 1.2 in the text. Simple Iteration, cont d In general, nonlinear equations cannot be solved in a finite sequence

More information

Lecture 3: Inexact inverse iteration with preconditioning

Lecture 3: Inexact inverse iteration with preconditioning Lecture 3: Department of Mathematical Sciences CLAPDE, Durham, July 2008 Joint work with M. Freitag (Bath), and M. Robbé & M. Sadkane (Brest) 1 Introduction 2 Preconditioned GMRES for Inverse Power Method

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

Lecture 11: CMSC 878R/AMSC698R. Iterative Methods An introduction. Outline. Inverse, LU decomposition, Cholesky, SVD, etc.

Lecture 11: CMSC 878R/AMSC698R. Iterative Methods An introduction. Outline. Inverse, LU decomposition, Cholesky, SVD, etc. Lecture 11: CMSC 878R/AMSC698R Iterative Methods An introduction Outline Direct Solution of Linear Systems Inverse, LU decomposition, Cholesky, SVD, etc. Iterative methods for linear systems Why? Matrix

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 3: Iterative Methods PD

More information

A derivative-free nonmonotone line search and its application to the spectral residual method

A derivative-free nonmonotone line search and its application to the spectral residual method IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral

More information

Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation

Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation E. Bergou S. Gratton L. N. Vicente June 26, 204 Abstract The Levenberg-Marquardt

More information

Numerical Methods for Differential Equations Mathematical and Computational Tools

Numerical Methods for Differential Equations Mathematical and Computational Tools Numerical Methods for Differential Equations Mathematical and Computational Tools Gustaf Söderlind Numerical Analysis, Lund University Contents V4.16 Part 1. Vector norms, matrix norms and logarithmic

More information

A nonlinear equation is any equation of the form. f(x) = 0. A nonlinear equation can have any number of solutions (finite, countable, uncountable)

A nonlinear equation is any equation of the form. f(x) = 0. A nonlinear equation can have any number of solutions (finite, countable, uncountable) Nonlinear equations Definition A nonlinear equation is any equation of the form where f is a nonlinear function. Nonlinear equations x 2 + x + 1 = 0 (f : R R) f(x) = 0 (x cos y, 2y sin x) = (0, 0) (f :

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

1 Conjugate gradients

1 Conjugate gradients Notes for 2016-11-18 1 Conjugate gradients We now turn to the method of conjugate gradients (CG), perhaps the best known of the Krylov subspace solvers. The CG iteration can be characterized as the iteration

More information

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization

More information

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS RALPH HOWARD DEPARTMENT OF MATHEMATICS UNIVERSITY OF SOUTH CAROLINA COLUMBIA, S.C. 29208, USA HOWARD@MATH.SC.EDU Abstract. This is an edited version of a

More information

An Iteratively Regularized Projection Method with Quadratic Convergence for Nonlinear Ill-posed Problems

An Iteratively Regularized Projection Method with Quadratic Convergence for Nonlinear Ill-posed Problems Int. Journal of Math. Analysis, Vol. 4, 1, no. 45, 11-8 An Iteratively Regularized Projection Method with Quadratic Convergence for Nonlinear Ill-posed Problems Santhosh George Department of Mathematical

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,

More information

STOP, a i+ 1 is the desired root. )f(a i) > 0. Else If f(a i+ 1. Set a i+1 = a i+ 1 and b i+1 = b Else Set a i+1 = a i and b i+1 = a i+ 1

STOP, a i+ 1 is the desired root. )f(a i) > 0. Else If f(a i+ 1. Set a i+1 = a i+ 1 and b i+1 = b Else Set a i+1 = a i and b i+1 = a i+ 1 53 17. Lecture 17 Nonlinear Equations Essentially, the only way that one can solve nonlinear equations is by iteration. The quadratic formula enables one to compute the roots of p(x) = 0 when p P. Formulas

More information

Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization

Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization Denis Ridzal Department of Computational and Applied Mathematics Rice University, Houston, Texas dridzal@caam.rice.edu

More information

Numerical Methods I Solving Nonlinear Equations

Numerical Methods I Solving Nonlinear Equations Numerical Methods I Solving Nonlinear Equations Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 16th, 2014 A. Donev (Courant Institute)

More information

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS)

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS) Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno (BFGS) Limited memory BFGS (L-BFGS) February 6, 2014 Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb

More information

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give

More information

Iterative methods for Linear System

Iterative methods for Linear System Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline Basics: Matrices and their properties Eigenvalues, Condition Number Iterative Methods Direct and

More information

Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation

Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation E. Bergou S. Gratton L. N. Vicente May 24, 206 Abstract The Levenberg-Marquardt

More information

Chapter 8 Gradient Methods

Chapter 8 Gradient Methods Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point

More information

Self-Concordant Barrier Functions for Convex Optimization

Self-Concordant Barrier Functions for Convex Optimization Appendix F Self-Concordant Barrier Functions for Convex Optimization F.1 Introduction In this Appendix we present a framework for developing polynomial-time algorithms for the solution of convex optimization

More information

A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function

A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function Zhongyi Liu, Wenyu Sun Abstract This paper proposes an infeasible interior-point algorithm with

More information

Mathematics Department Stanford University Math 61CM/DM Inner products

Mathematics Department Stanford University Math 61CM/DM Inner products Mathematics Department Stanford University Math 61CM/DM Inner products Recall the definition of an inner product space; see Appendix A.8 of the textbook. Definition 1 An inner product space V is a vector

More information