Numerical Methods for Large-Scale Nonlinear Systems


Handouts by Ronald H.W. Hoppe

following the monograph

P. Deuflhard, Newton Methods for Nonlinear Problems, Springer, Berlin-Heidelberg-New York, 2004

1. Classical Newton Convergence Theorems

1.1 Classical Newton-Kantorovich Theorem

Theorem 1.1 (Classical Newton-Kantorovich Theorem)
Let X and Y be Banach spaces, D ⊂ X a convex subset, and suppose that F : D ⊂ X → Y is continuously Fréchet differentiable on D with an invertible Fréchet derivative F'(x^0) for some initial guess x^0 ∈ D. Assume further that the following conditions hold true:
\[ \|F'(x^0)^{-1}F(x^0)\| \le \alpha, \tag{1.1} \]
\[ \|F'(y)-F'(x)\| \le \gamma\,\|y-x\|, \quad x,y \in D, \tag{1.2} \]
\[ h_0 := \alpha\,\gamma\,\|F'(x^0)^{-1}\| < \tfrac12, \tag{1.3} \]
\[ \bar B(x^0,\rho_0) \subset D, \quad \rho_0 := \frac{1-\sqrt{1-2h_0}}{\gamma\,\|F'(x^0)^{-1}\|}. \tag{1.4} \]
Then, for the sequence {x^k}, k ∈ ℕ_0, of Newton iterates
\[ F'(x^k)\,\Delta x^k = -F(x^k), \qquad x^{k+1} = x^k + \Delta x^k, \]
there holds:
(i) F'(x) is invertible for all Newton iterates x = x^k, k ∈ ℕ_0,
(ii) the sequence {x^k}, k ∈ ℕ, of Newton iterates is well defined with x^k ∈ B̄(x^0,ρ_0), k ∈ ℕ_0, and x^k → x^* ∈ B̄(x^0,ρ_0) (k → ∞), where F(x^*) = 0,
(iii) the convergence x^k → x^* (k → ∞) is quadratic,
(iv) the solution x^* of F(x) = 0 is unique in B̄(x^0,ρ_0) ∪ (D ∩ B(x^0,\bar ρ_0)), \bar ρ_0 := \frac{1+\sqrt{1-2h_0}}{\gamma\,\|F'(x^0)^{-1}\|}.

Proof. We have ‖F'(x^k) − F'(x^0)‖ ≤ γ‖x^k − x^0‖ ≤ t_k for some upper bound t_k, k ∈ ℕ. If we can prove x^k ∈ B̄(x^0,ρ_0) and \bar t_k := ‖F'(x^0)^{-1}‖ t_k < 1, k ∈ ℕ, then by the Banach perturbation lemma F'(x^k) is invertible with
\[ \|F'(x^k)^{-1}\| \le \frac{\|F'(x^0)^{-1}\|}{1-\|F'(x^0)^{-1}\|\,\|F'(x^k)-F'(x^0)\|} \le \frac{\|F'(x^0)^{-1}\|}{1-\bar t_k} =: \beta_k. \tag{1.5} \]
We prove x^k ∈ B̄(x^0,ρ_0) and \bar t_k < 1, k ∈ ℕ, by induction on k. For k = 1 we have
\[ \|x^1-x^0\| = \|F'(x^0)^{-1}F(x^0)\| \le \alpha = \frac{h_0}{\gamma\,\|F'(x^0)^{-1}\|} < \frac{1-\sqrt{1-2h_0}}{\gamma\,\|F'(x^0)^{-1}\|} = \rho_0, \]
since h_0 < 1 − \sqrt{1-2h_0}, and
\[ \bar t_1 := \|F'(x^0)^{-1}\|\,t_1 = \gamma\,\|F'(x^0)^{-1}\|\,\|x^1-x^0\| \le \alpha\,\gamma\,\|F'(x^0)^{-1}\| = h_0 < \tfrac12. \]
Assuming the assertion to be true for some k ∈ ℕ, for k+1, using (1.2) we obtain
\[ \|x^{k+1}-x^k\| = \|F'(x^k)^{-1}F(x^k)\| = \Big\|F'(x^k)^{-1}\big(F(x^k)-F(x^{k-1})-F'(x^{k-1})\Delta x^{k-1}\big)\Big\| \]
\[ = \Big\|F'(x^k)^{-1}\int_0^1\big(F'(x^{k-1}+s\Delta x^{k-1})-F'(x^{k-1})\big)\Delta x^{k-1}\,ds\Big\| \le \beta_k\,\frac{\gamma}{2}\,\|x^k-x^{k-1}\|^2, \tag{1.6} \]
since ‖F'(x^{k-1}+sΔx^{k-1}) − F'(x^{k-1})‖ ≤ s γ ‖Δx^{k-1}‖. Setting h_k := γ‖x^{k+1} − x^k‖, we thus get the recursion
\[ h_k \le \tfrac12\,\beta_k\,h_{k-1}^2, \quad k \in \mathbb{N}. \tag{1.7} \]
In view of the relationship
\[ \gamma\,\|x^{k+1}-x^0\| \le \gamma\,\|x^{k+1}-x^k\| + \gamma\,\|x^k-x^0\| \le h_k + t_k, \]

we consider the recursion
\[ t_{k+1} = t_k + h_k. \tag{1.8} \]
Observing (1.5) and (1.7), we find
\[ t_{k+1} - t_k \le \frac12\,\frac{\|F'(x^0)^{-1}\|}{1-\bar t_k}\,(t_k-t_{k-1})^2. \]
Hence, multiplying both sides by ‖F'(x^0)^{-1}‖, we end up with the following three-term recursion for \bar t_k := ‖F'(x^0)^{-1}‖ t_k:
\[ \bar t_{k+1} - \bar t_k = \frac{(\bar t_k-\bar t_{k-1})^2}{2\,(1-\bar t_k)}, \qquad \bar t_0 = 0, \ \ \bar t_1 = h_0. \tag{1.9} \]
The famous Ortega trick allows to reduce (1.9) to a two-term recursion which can be interpreted as a Newton method in ℝ^1: multiplying both sides in (1.9) by 1 − \bar t_k results in
\[ (\bar t_{k+1}-\bar t_k)(1-\bar t_k) = \tfrac12\,(\bar t_k-\bar t_{k-1})^2, \]
from which we deduce
\[ \psi(\bar t_{k+1},\bar t_k) := \bar t_{k+1} - \bar t_{k+1}\bar t_k + \tfrac12\,\bar t_k^2 = \bar t_k - \bar t_k\bar t_{k-1} + \tfrac12\,\bar t_{k-1}^2 = \psi(\bar t_k,\bar t_{k-1}). \]
It follows that
\[ \psi(\bar t_{k+1},\bar t_k) = \psi(\bar t_1,\bar t_0) = h_0, \]
from which we deduce
\[ \bar t_{k+1} - \bar t_k = \frac{h_0 - \bar t_k + \tfrac12\bar t_k^2}{1-\bar t_k} = -\frac{\varphi(\bar t_k)}{\varphi'(\bar t_k)}, \]
where φ : ℝ → ℝ is given by
\[ \varphi(\bar t) := h_0 - \bar t + \tfrac12\,\bar t^2. \]
Obviously, φ has the zeroes
\[ \bar t^{*,1} := 1 - \sqrt{1-2h_0}, \qquad \bar t^{*,2} := 1 + \sqrt{1-2h_0}. \]

Since φ is convex, the Newton method started at \bar t_1 converges monotonically to \bar t^{*,1}. It follows from the definition of \bar t_k that x^k ∈ B̄(x^0,ρ_0). Moreover, as a consequence of (1.6) we readily find that {x^k}, k ∈ ℕ, is a Cauchy sequence in B̄(x^0,ρ_0). Hence, there exists x^* ∈ B̄(x^0,ρ_0) such that x^k → x^* (k → ∞) and
\[ \Delta x^k = -F'(x^k)^{-1}F(x^k) \ \longrightarrow\ -F'(x^*)^{-1}F(x^*) = 0, \]
whence F(x^*) = 0. The quadratic convergence can be deduced from (1.6) as well. Finally, the uniqueness of x^* in B̄(x^0,ρ_0) ∪ (D ∩ B(x^0,\bar ρ_0)) follows readily from the properties of the function φ.

1.2 Classical Newton-Mysovskikh Theorem

Theorem 1.2 (Classical Newton-Mysovskikh Theorem)
Let X and Y be Banach spaces, D ⊂ X a convex subset, and suppose that F : D ⊂ X → Y is continuously Fréchet differentiable on D with invertible Fréchet derivatives F'(x), x ∈ D, and let x^0 ∈ D be some initial guess. Assume further that the following conditions hold true:
\[ \|F'(x^0)^{-1}F(x^0)\| \le \alpha, \tag{1.10} \]
\[ \|F'(x)^{-1}\| \le \beta, \quad x \in D, \tag{1.11} \]
\[ \|F'(y)-F'(x)\| \le \omega\,\|y-x\|, \quad x,y \in D, \tag{1.12} \]
\[ h_0 := \tfrac12\,\beta\,\omega\,\|F'(x^0)^{-1}F(x^0)\| \le \tfrac12\,\alpha\,\beta\,\omega < 1, \tag{1.13} \]
\[ \bar B(x^0,\rho) \subset D, \quad \rho := \alpha\sum_{j=0}^{\infty}h_0^{2^j-1} \le \frac{\alpha}{1-h_0}. \tag{1.14} \]
Then, for the sequence {x^k}, k ∈ ℕ_0, of Newton iterates
\[ F'(x^k)\,\Delta x^k = -F(x^k), \qquad x^{k+1} = x^k + \Delta x^k, \]
there holds:
(i) x^k ∈ B̄(x^0,ρ), k ∈ ℕ_0, and there exists x^* ∈ B̄(x^0,ρ) such that F(x^*) = 0 and x^k → x^* (k → ∞),
(ii) ‖x^{k+1} − x^k‖ ≤ ½ β ω ‖x^k − x^{k-1}‖², k ∈ ℕ,
(iii) ‖x^k − x^*‖ ≤ ε_k ‖x^k − x^{k-1}‖², where
\[ \varepsilon_k := \tfrac12\,\beta\,\omega\Big(1+\sum_{j=1}^{\infty}\big(h_0^{2^k}\big)^{2^j-1}\Big) \le \frac{\tfrac12\,\beta\,\omega}{1-h_0^{2^k}}. \]

Proof. Observing F'(x^{k-1})Δx^{k-1} + F(x^{k-1}) = 0, we obtain
\[ \|\Delta x^k\| = \|F'(x^k)^{-1}F(x^k)\| = \Big\|F'(x^k)^{-1}\big(F(x^k)-F(x^{k-1})-F'(x^{k-1})\Delta x^{k-1}\big)\Big\| \]
\[ = \Big\|F'(x^k)^{-1}\int_0^1\big(F'(x^{k-1}+s\Delta x^{k-1})-F'(x^{k-1})\big)\Delta x^{k-1}\,ds\Big\| \le \tfrac12\,\beta\,\omega\,\|\Delta x^{k-1}\|^2, \]
which gives the assertion (ii).
We now prove that {x^k}, k ∈ ℕ_0, is a Cauchy sequence in B̄(x^0,ρ). By induction on k we show
\[ \|\Delta x^k\| \le \frac{2}{\beta\omega}\,h_0^{2^k}, \quad k \in \mathbb{N}_0. \tag{1.15} \]
For k = 0 we have, in view of (1.10) and (1.13),
\[ \|\Delta x^0\| = \frac{2}{\beta\omega}\,h_0 \le \alpha. \]
Assuming (1.15) to be true for some k ∈ ℕ, we get
\[ \|\Delta x^{k+1}\| \le \tfrac12\,\beta\,\omega\,\|\Delta x^k\|^2 \le \tfrac12\,\beta\,\omega\,\Big(\frac{2}{\beta\omega}\,h_0^{2^k}\Big)^2 = \frac{2}{\beta\omega}\,h_0^{2^{k+1}}. \]
It follows readily from (1.15) that x^{k+1} ∈ B̄(x^0,ρ):
\[ \|x^{k+1}-x^0\| \le \|x^{k+1}-x^k\| + \dots + \|x^1-x^0\| \le \frac{2}{\beta\omega}\big(h_0^{2^k}+\dots+h_0\big) \le \frac{2}{\beta\omega}\,h_0\sum_{j=0}^{\infty}h_0^{2^j-1} \le \alpha\sum_{j=0}^{\infty}h_0^{2^j-1} = \rho. \]
Similarly, it can be shown that ‖x^{m+k} − x^m‖ → 0 (m, k → ∞).

Since {x^k}, k ∈ ℕ_0, is a Cauchy sequence in B̄(x^0,ρ), there exists x^* ∈ B̄(x^0,ρ) such that x^k → x^* (k → ∞). Hence,
\[ \Delta x^k = -F'(x^k)^{-1}F(x^k) \ \longrightarrow\ -F'(x^*)^{-1}F(x^*) = 0, \]
and thus F(x^*) = 0, which proves (i).
The assertion (iii) is shown as follows. Setting h_k := ½ β ω ‖Δx^k‖, we obtain
\[ \|x^k-x^*\| = \lim_{k<m\to\infty}\|x^k-x^m\| \le \lim_{k<m\to\infty}\big[\|x^m-x^{m-1}\|+\dots+\|x^{k+1}-x^k\|\big] \]
\[ \le \frac{2}{\beta\omega}\lim_{k<m\to\infty}\big[h_{m-1}+\dots+h_k\big] = \frac{2h_k}{\beta\omega}\lim_{k<m\to\infty}\Big[1+\frac{h_{k+1}}{h_k}+\dots+\frac{h_{m-1}}{h_k}\Big]. \]
On the other hand, taking (ii) into account,
\[ h_k \le \Big(\frac{\beta\omega}{2}\Big)^2\|\Delta x^{k-1}\|^2 = h_{k-1}^2, \]
whence
\[ h_{k+l} \le h_k^{2^l}, \quad k, l \in \mathbb{N}_0. \]
We conclude
\[ \|x^k-x^*\| \le \tfrac12\,\beta\,\omega\,\|\Delta x^{k-1}\|^2\Big(1+\sum_{j=1}^{\infty}h_k^{2^j-1}\Big) \le \tfrac12\,\beta\,\omega\Big(1+\sum_{j=1}^{\infty}\big(h_0^{2^k}\big)^{2^j-1}\Big)\|x^k-x^{k-1}\|^2, \]
which proves (iii).
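Both classical theorems analyze the same basic iteration. As a concrete illustration, the following is a minimal sketch of the exact Newton iteration in Python; the test problem, the tolerance, and the increment-based stopping rule are illustrative choices and not part of the handout.

```python
import numpy as np

def newton(F, dF, x0, tol=1e-12, kmax=25):
    """Exact Newton iteration: solve F'(x^k) dx = -F(x^k), set x^{k+1} = x^k + dx."""
    x = np.array(x0, dtype=float)
    for k in range(kmax):
        dx = np.linalg.solve(dF(x), -F(x))   # Newton correction
        x = x + dx
        if np.linalg.norm(dx) <= tol:        # simple increment-based stopping rule
            break
    return x, k

# Illustrative test problem: intersection of the unit circle with the line x1 = x2
F  = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
dF = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
x_star, k = newton(F, dF, [1.0, 0.5])
print(x_star, k)   # quadratic convergence to (1/sqrt(2), 1/sqrt(2))
```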

2. Affine Invariant/Conjugate Newton Convergence Theorems

2.1 Affine Covariant Newton Convergence Theorems

Theorem 2.1 (Affine Covariant Newton-Kantorovich Theorem)
Let F : D ⊂ ℝ^n → ℝ^n be continuously differentiable on D with an invertible Jacobian F'(x^0) for some initial guess x^0 ∈ D. Assume further that the following conditions hold true:
\[ \|F'(x^0)^{-1}F(x^0)\| \le \alpha, \tag{1.16} \]
\[ \|F'(x^0)^{-1}\big(F'(y)-F'(x)\big)\| \le \gamma\,\|y-x\|, \quad x,y \in D, \tag{1.17} \]
\[ h_0 := \alpha\,\gamma < \tfrac12, \tag{1.18} \]
\[ \bar B(x^0,\rho_0) \subset D, \quad \rho_0 := \frac{1-\sqrt{1-2h_0}}{\gamma}. \tag{1.19} \]
Then, for the sequence {x^k}, k ∈ ℕ_0, of Newton iterates
\[ F'(x^k)\,\Delta x^k = -F(x^k), \qquad x^{k+1} = x^k + \Delta x^k, \]
there holds:
(i) F'(x) is invertible for all Newton iterates x = x^k, k ∈ ℕ_0,
(ii) the sequence {x^k}, k ∈ ℕ, of Newton iterates is well defined with x^k ∈ B̄(x^0,ρ_0), k ∈ ℕ_0, and x^k → x^* ∈ B̄(x^0,ρ_0) (k → ∞), where F(x^*) = 0,
(iii) the convergence x^k → x^* (k → ∞) is quadratic,
(iv) the solution x^* of F(x) = 0 is unique in B̄(x^0,ρ_0) ∪ (D ∩ B(x^0,\bar ρ_0)), \bar ρ_0 := \frac{1+\sqrt{1-2h_0}}{\gamma}.

Proof. First homework assignment.

Theorem 2.2 (Affine Covariant Newton-Mysovskikh Theorem)
Let F : D ⊂ ℝ^n → ℝ^n, D ⊂ ℝ^n convex, be continuously differentiable on D with invertible Jacobians F'(x), x ∈ D, and let x^0 ∈ D be some initial guess. Assume further that the following conditions hold true:
\[ \|F'(x^0)^{-1}F(x^0)\| \le \alpha, \tag{1.20} \]
\[ \|F'(z)^{-1}\big(F'(y)-F'(x)\big)(y-x)\| \le \omega\,\|y-x\|^2, \quad x,y,z \in D, \tag{1.21} \]
\[ h_0 := \omega\,\|\Delta x^0\| \le \alpha\,\omega < 2, \tag{1.22} \]
\[ \bar B(x^0,\rho) \subset D, \quad \rho := \frac{\|\Delta x^0\|}{1-\tfrac{h_0}{2}}. \tag{1.23} \]
Then, for the sequence {x^k}, k ∈ ℕ_0, of Newton iterates
\[ F'(x^k)\,\Delta x^k = -F(x^k), \qquad x^{k+1} = x^k + \Delta x^k, \]
there holds x^k ∈ B̄(x^0,ρ), k ∈ ℕ_0, and there exists x^* ∈ B̄(x^0,ρ) such that F(x^*) = 0 and x^k → x^* (k → ∞) with
\[ \|x^{k+1}-x^k\| \le \tfrac12\,\omega\,\|x^k-x^{k-1}\|^2, \qquad \|x^k-x^*\| \le \frac{\|x^{k+1}-x^k\|}{1-\tfrac12\,\omega\,\|x^k-x^{k-1}\|}. \]
Proof. Slight modification of the proof of the Classical Newton-Mysovskikh Theorem.

2.2 Affine Contravariant Newton Convergence Theorem

Theorem 2.3 (Affine Contravariant Newton-Mysovskikh Theorem)
Let F : D ⊂ ℝ^n → ℝ^n, D ⊂ ℝ^n convex, be continuously differentiable on D with invertible Jacobians F'(x), x ∈ D, and let x^0 ∈ D be some initial guess. Assume further that the following conditions hold true:
\[ \|\big(F'(y)-F'(x)\big)(y-x)\| \le \omega\,\|F'(x)(y-x)\|^2, \quad x,y \in D, \tag{1.24} \]
\[ \bar L_\omega \subset D, \quad L_\omega := \{x \in D \mid \|F(x)\| < \tfrac{2}{\omega}\}, \tag{1.25} \]
\[ h_0 := \omega\,\|F(x^0)\| < 2. \tag{1.26} \]

Then, the sequence {x^k}, k ∈ ℕ_0, of Newton iterates stays in L_ω, and there exists an x^* ∈ \bar L_ω such that x^k → x^* (k ∈ ℕ', k → ∞) for some subsequence ℕ' ⊂ ℕ and F(x^*) = 0. Moreover, for the residuals F(x^k) there holds
\[ \|F(x^{k+1})\| \le \tfrac12\,\omega\,\|F(x^k)\|^2. \]
Proof. We first prove x^k ∈ L_ω by induction on k:
(i) k = 0: in view of (1.26), ‖F(x^0)‖ < 2/ω, whence x^0 ∈ L_ω.
(ii) Assume that the assertion holds true for some k ∈ ℕ.
(iii) For any λ ∈ [0,1] such that x^k + tΔx^k ∈ L_ω, t ∈ [0,λ], we have
\[ F(x^k+\lambda\Delta x^k) = F(x^k) + \int_0^\lambda F'(x^k+t\Delta x^k)\,\Delta x^k\,dt. \]
Since F(x^k) = −F'(x^k)Δx^k, we have F(x^k) = −λ F'(x^k)Δx^k + (1−λ)F(x^k), and hence
\[ \|F(x^k+\lambda\Delta x^k)\| = \Big\|\int_0^\lambda\big(F'(x^k+t\Delta x^k)-F'(x^k)\big)\Delta x^k\,dt + (1-\lambda)\,F(x^k)\Big\| \tag{1.27} \]
\[ \le \int_0^\lambda\underbrace{\|\big(F'(x^k+t\Delta x^k)-F'(x^k)\big)\Delta x^k\|}_{\le\,\omega\,t\,\|F'(x^k)\Delta x^k\|^2}\,dt + (1-\lambda)\,\|F(x^k)\| \le \omega\,\|F'(x^k)\Delta x^k\|^2\int_0^\lambda t\,dt + (1-\lambda)\,\|F(x^k)\|, \]
and, since ‖F'(x^k)Δx^k‖ = ‖F(x^k)‖,

\[ \|F(x^k+\lambda\Delta x^k)\| \le \Big(1-\lambda+\frac{\omega\lambda^2}{2}\,\|F(x^k)\|\Big)\,\|F(x^k)\|. \]
We assume x^{k+1} = x^k + Δx^k ∉ L_ω. Then there exists
\[ \bar\lambda := \min\{\lambda \in (0,1] \mid x^k+\lambda\Delta x^k \notin L_\omega\}. \]
It follows from (1.27) that
\[ \|F(x^k+\bar\lambda\Delta x^k)\| \le \Big(1-\bar\lambda+\frac{\omega\bar\lambda^2}{2}\underbrace{\|F(x^k)\|}_{<\,2/\omega}\Big)\,\|F(x^k)\| < \underbrace{\big(1-\bar\lambda+\bar\lambda^2\big)}_{\le\,1}\,\|F(x^k)\| < \frac{2}{\omega}, \]
and hence x^k + \bar\lambda Δx^k ∈ L_ω, which is a contradiction.
For λ = 1, (1.27) gives the asserted residual estimate.
For the proof of the rest of the assertion, we define the residual oriented Kantorovich quantities
\[ h_k := \omega\,\|F(x^k)\|. \]
Then the residual estimate implies
\[ \omega\,\|F(x^{k+1})\| \le \frac{\omega^2}{2}\,\|F(x^k)\|^2, \quad\text{i.e.,}\quad h_{k+1} \le \tfrac12\,h_k^2 = \tfrac12\,h_k\,h_k. \]
Since h_0 < 2, for k = 0 we obtain
\[ h_1 \le \tfrac12\,h_0\,h_0 < h_0, \]

and an induction argument shows
\[ h_{k+1} < h_k < 2, \quad k \in \mathbb{N}_0. \]
Moreover,
\[ \|F(x^{k+1})\| < \|F(x^k)\| < \frac{2}{\omega} \quad\text{and}\quad \lim_{k\to\infty}\|F(x^k)\| = 0, \]
which implies x^k ∈ L_ω ⊂ D, k ∈ ℕ. Since \bar L_ω is bounded, there exist x^* ∈ \bar L_ω and a subsequence ℕ' ⊂ ℕ such that x^k → x^* (k ∈ ℕ', k → ∞) and F(x^*) = 0.

Affine conjugacy

Assume that D ⊂ ℝ^n is a convex set and that f : D → ℝ is a strictly convex functional. Consider the minimization problem
\[ \min_{x\in D} f(x). \]
Then, a necessary and sufficient optimality condition is given by the nonlinear equation
\[ F(x) := \operatorname{grad} f(x) = f'(x)^T = 0, \quad x \in D. \]
We note that the Jacobian F'(x) = f''(x) is symmetric and uniformly positive definite on D. In particular, F'(x)^{1/2} is well defined and symmetric positive definite as well. Consequently, the energy product
\[ (u,v)_E := u^T F'(x)\,v, \quad u,v \in \mathbb{R}^n,\ x \in D, \]
defines locally an inner product with associated norm
\[ \|u\|_E^2 = u^T F'(x)\,u = \|F'(x)^{1/2}u\|^2, \]
which is referred to as a local energy norm.
For regular B ∈ ℝ^{n×n}, we consider the transformed minimization problem
\[ \min_y g(y), \qquad g(y) := f(By), \quad x = By. \]

We obtain the optimality condition
\[ G(y) := \operatorname{grad} g(y) = \big(f'(By)B\big)^T = B^T f'(x)^T = B^T F(By) = 0 \]
with the transformed Jacobian
\[ G'(y) = B^T F'(x)\,B. \]
Hence, the Jacobian transforms by conjugation, which motivates the notion of affine conjugacy.

An appropriate affine conjugate Lipschitz condition is as follows:
\[ \|F'(z)^{-1/2}\big(F'(y)-F'(x)\big)(y-x)\| \le \omega\,\|F'(x)^{1/2}(y-x)\|^2. \]

2.3 Affine Conjugate Newton Convergence Theorem

Theorem 2.4 (Affine Conjugate Newton-Mysovskikh Theorem)
Assume that D ⊂ ℝ^n is a convex domain and f : D → ℝ a strictly convex, twice continuously differentiable functional. Let F(x) := f'(x)^T and F'(x) = f''(x). Consider the minimization problem
\[ \min_{x\in D} f(x) \tag{1.28} \]
and the associated optimality condition
\[ F(x) = \operatorname{grad} f(x) = 0, \quad x \in D. \tag{1.29} \]
Note that (1.28) has a unique solution x^* ∈ D. Let x^0 ∈ D be an initial guess and assume that the following conditions are satisfied:
\[ \|F'(z)^{-1/2}\big(F'(y)-F'(x)\big)(y-x)\| \le \omega\,\|F'(x)^{1/2}(y-x)\|^2 \tag{1.30} \]
for collinear x, y, z ∈ D, and
\[ h_0 := \omega\,\|F'(x^0)^{1/2}\Delta x^0\| < 2, \tag{1.31} \]
\[ L_0 := \{x \in D \mid f(x) \le f(x^0)\} \ \text{is compact.} \tag{1.32} \]
Then, for the Newton iterates x^k, k ∈ ℕ_0, there holds:
(i) x^k ∈ L_0, k ∈ ℕ_0, and x^k → x^* (k → ∞) with
\[ \|F'(x^{k+1})^{1/2}\Delta x^{k+1}\| \le \tfrac12\,\omega\,\|F'(x^k)^{1/2}\Delta x^k\|^2. \tag{1.33} \]
(ii) For ε_k := ‖F'(x^k)^{1/2}Δx^k‖² and the Kantorovich quantities h_k := ω ε_k^{1/2} we have
\[ \tfrac12\,\varepsilon_k - \tfrac16\,h_k\,\varepsilon_k \le f(x^k) - f(x^{k+1}) \le \tfrac12\,\varepsilon_k + \tfrac16\,h_k\,\varepsilon_k, \tag{1.34} \]
\[ \tfrac16\,\varepsilon_k \le f(x^k) - f(x^{k+1}) \le \tfrac56\,\varepsilon_k. \tag{1.35} \]
(iii) We have the a priori estimate
\[ f(x^0) - f(x^*) \le \frac{\tfrac56\,\varepsilon_0}{1-\tfrac12 h_0}. \tag{1.36} \]

Proof. Assertion (i) and (1.33) can be verified as in the proof of the affine contravariant version of the Newton-Mysovskikh theorem. For the proof of (1.34) in (ii), observing F'(x^k)Δx^k = −F(x^k), we obtain
\[ f(x^{k+1}) - f(x^k) + \tfrac12\,\|F'(x^k)^{1/2}\Delta x^k\|^2 = \int_0^1\langle F(x^k+s\Delta x^k),\Delta x^k\rangle\,ds - \langle F(x^k),\Delta x^k\rangle - \tfrac12\,\langle F'(x^k)\Delta x^k,\Delta x^k\rangle \]
\[ = \int_0^1\langle F(x^k+s\Delta x^k)-F(x^k),\Delta x^k\rangle\,ds - \tfrac12\,\langle F'(x^k)\Delta x^k,\Delta x^k\rangle \]
\[ = \int_0^1 s\int_0^1\langle F'(x^k+st\Delta x^k)\Delta x^k,\Delta x^k\rangle\,dt\,ds - \tfrac12\,\langle F'(x^k)\Delta x^k,\Delta x^k\rangle \]
\[ = \int_0^1 s\int_0^1\big\langle\underbrace{\big(F'(x^k+st\Delta x^k)-F'(x^k)\big)\Delta x^k}_{=:\,w^k},\ \Delta x^k\big\rangle\,dt\,ds = \int_0^1 s\int_0^1\langle F'(x^k)^{-1/2}w^k,\ F'(x^k)^{1/2}\Delta x^k\rangle\,dt\,ds. \]
By the affine conjugate Lipschitz condition (1.30),
\[ \|F'(x^k)^{-1/2}w^k\| \le \omega\,s\,t\,\|F'(x^k)^{1/2}\Delta x^k\|^2, \]
so that
\[ \Big|f(x^{k+1}) - f(x^k) + \tfrac12\,\varepsilon_k\Big| \le \omega\,\|F'(x^k)^{1/2}\Delta x^k\|^3\int_0^1 s^2\,ds\int_0^1 t\,dt = \tfrac16\,h_k\,\varepsilon_k, \]
which proves (1.34). Using the right-hand side of (1.34) and h_k < 2 yields
\[ f(x^k) - f(x^{k+1}) \le \big(\tfrac12+\tfrac16 h_k\big)\,\varepsilon_k < \tfrac56\,\varepsilon_k. \]

Likewise, using the left-hand side of (1.34) and h_k < 2,
\[ f(x^k) - f(x^{k+1}) \ge \big(\tfrac12-\tfrac16 h_k\big)\,\varepsilon_k > \tfrac16\,\varepsilon_k. \]
Together, this proves (1.35).
In order to prove (iii), we use (1.34) and obtain
\[ \omega^2\big(f(x^0)-f(x^*)\big) = \sum_{k=0}^{\infty}\omega^2\big(f(x^k)-f(x^{k+1})\big) < \tfrac56\sum_{k=0}^{\infty}\omega^2\varepsilon_k = \tfrac56\sum_{k=0}^{\infty}h_k^2 = \tfrac{10}{3}\sum_{k=0}^{\infty}\big(\tfrac12 h_k\big)^2. \]
Using
\[ \tfrac12\,h_{k+1} \le \big(\tfrac12\,h_k\big)^2, \qquad \tfrac12\,h_0 < 1, \]
we further get
\[ \big(\tfrac12 h_0\big)^2 + \big(\tfrac12 h_1\big)^2 + \dots \le \tfrac14\,h_0^2\sum_{k=0}^{\infty}\big(\tfrac12 h_0\big)^k = \frac{\tfrac14\,h_0^2}{1-\tfrac12 h_0}, \]
which proves (1.36).

3. Inexact Newton Methods

We recall that Newton's method computes iterates successively as the solution of linear algebraic systems
\[ F'(x^k)\,\Delta x^k = -F(x^k), \quad k \in \mathbb{N}_0, \qquad x^{k+1} = x^k + \Delta x^k. \tag{1.37} \]
The classical convergence theorems of Newton-Kantorovich and Newton-Mysovskikh and their affine covariant, affine contravariant, and affine conjugate versions assume the exact solution of (1.37). In practice, however, in particular if the dimension n is large, (1.37) will be solved by an iterative method. In this case, we end up with an outer/inner iteration, where the outer iterations are the Newton steps and the inner iterations result from the application of an iterative scheme to (1.37). It is important to tune the outer and inner iterations and to keep track of the iteration errors.

With regard to affine covariance, affine contravariance, and affine conjugacy, the iterative scheme for the inner iterations has to be chosen in such a way that it easily provides information about the error norm in case of affine covariance, the residual norm in case of affine contravariance, and the energy norm in case of affine conjugacy.

Except for convex optimization, we cannot expect F'(x), x ∈ D, to be symmetric positive definite. Hence, for affine covariance and affine contravariance we have to pick iterative solvers that are designed for nonsymmetric matrices. Appropriate candidates are CGNE (Conjugate Gradient for the Normal Equations) in case of affine covariance, GMRES (Generalized Minimum RESidual) in case of affine contravariance, and PCG (Preconditioned Conjugate Gradient) in case of affine conjugacy.
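The outer/inner structure described above can be summarized by the following schematic Python sketch. The functions inner_solve and inner_tol are placeholders for an iterative solver (CGNE, GMRES, or PCG) and for the adaptive accuracy control developed in the remainder of this section; the test problem and tolerances are illustrative.

```python
import numpy as np

def inexact_newton(F, dF, x0, inner_solve, inner_tol, tol=1e-10, kmax=50):
    """Generic outer/inner (inexact Newton) loop.

    inner_solve(A, b, tol): approximate solution of A dx = b by an inner iteration.
    inner_tol(k): accuracy demanded from the inner iteration at outer step k.
    """
    x = np.array(x0, dtype=float)
    for k in range(kmax):
        b = -F(x)
        if np.linalg.norm(b) <= tol:     # outer termination on the residual
            break
        dx = inner_solve(dF(x), b, inner_tol(k))   # inner iteration
        x = x + dx                                  # outer (Newton) step
    return x, k

# Illustrative use with a trivial "inner solver" (a direct solve that ignores tol)
F  = lambda x: np.array([x[0] + x[1] - 3.0, x[0] * x[1] - 2.0])
dF = lambda x: np.array([[1.0, 1.0], [x[1], x[0]]])
x_star, k = inexact_newton(F, dF, [0.5, 2.5],
                           inner_solve=lambda A, b, t: np.linalg.solve(A, b),
                           inner_tol=lambda k: 1e-2 / (k + 1))
print(x_star, k)
```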

3.1 Affine Covariant Inexact Newton Methods

CGNE (Conjugate Gradient for the Normal Equations)

We assume A ∈ ℝ^{n×n} to be a regular, nonsymmetric matrix and b ∈ ℝ^n to be given, and we look for y ∈ ℝ^n as the unique solution of the linear algebraic system
\[ Ay = b. \tag{1.38} \]
As the name already suggests, CGNE is the conjugate gradient method applied to the normal equations: it solves the system
\[ AA^T z = b \tag{1.39} \]
for z and then computes y according to
\[ y = A^T z. \tag{1.40} \]
The implementation of CGNE is as follows:

CGNE Initialization:
Given an initial guess y_0 ∈ ℝ^n, compute the residual r_0 = b − A y_0 and set
\[ p_0 = 0, \quad \beta_0 = 0, \quad \sigma_0 = \|r_0\|^2. \]
CGNE Iteration Loop: For 1 ≤ i ≤ i_max compute
\[ p_i = A^T r_{i-1} + \beta_{i-1}\,p_{i-1}, \qquad \alpha_i = \frac{\sigma_{i-1}}{\|p_i\|^2}, \qquad y_i = y_{i-1} + \alpha_i\,p_i, \qquad \gamma_{i-1}^2 = \alpha_i\,\sigma_{i-1}\ \big(= \|y_i-y_{i-1}\|^2\big), \]
\[ r_i = r_{i-1} - \alpha_i\,A p_i, \qquad \sigma_i = \|r_i\|^2, \qquad \beta_i = \frac{\sigma_i}{\sigma_{i-1}}. \]
CGNE has the error minimizing property
\[ \|y-y_i\| = \min_{v\,\in\,y_0+\mathcal K_i(A^Tr_0,\,A^TA)}\|y-v\|, \tag{1.41} \]
where K_i(A^T r_0, A^T A) stands for the Krylov subspace
\[ \mathcal K_i(A^Tr_0,A^TA) := \operatorname{span}\{A^Tr_0,\,(A^TA)A^Tr_0,\,\dots,\,(A^TA)^{i-1}A^Tr_0\}. \tag{1.42} \]

Lemma 3.1 (Representation of the iteration error)
Let ε_i := ‖y − y_i‖² be the square of the CGNE iteration error with respect to the i-th iterate. Then, there holds
\[ \varepsilon_i = \sum_{j=i}^{n-1}\gamma_j^2. \tag{1.43} \]
Proof. CGNE has the Galerkin orthogonality
\[ (y_i-y_0,\ y_{i+m}-y_i) = 0, \quad m \in \mathbb{N}. \tag{1.44} \]
Setting m = 1, this implies the orthogonal decomposition
\[ \|y_{i+1}-y_0\|^2 = \|y_{i+1}-y_i\|^2 + \|y_i-y_0\|^2, \tag{1.45} \]
which readily gives
\[ \|y_i-y_0\|^2 = \sum_{j=0}^{i-1}\|y_{j+1}-y_j\|^2 = \sum_{j=0}^{i-1}\gamma_j^2. \tag{1.46} \]
On the other hand, observing y_n = y, for m = n − i the Galerkin orthogonality yields
\[ \underbrace{\|y-y_0\|^2}_{=\,\sum_{j=0}^{n-1}\gamma_j^2} = \underbrace{\|y-y_i\|^2}_{=\,\varepsilon_i} + \underbrace{\|y_i-y_0\|^2}_{=\,\sum_{j=0}^{i-1}\gamma_j^2}. \tag{1.47} \]

Computable lower bound for the iteration error

It follows readily from Lemma 3.1 that the computable quantity
\[ [\varepsilon_i] := \sum_{j=i}^{i+m}\gamma_j^2, \quad m \in \mathbb{N}, \tag{1.48} \]
provides a lower bound for the iteration error. In practice, we will test the relative error norm according to
\[ \delta_i := \frac{\|y-y_i\|}{\|y_i\|} \approx \frac{\sqrt{[\varepsilon_i]}}{\|y_i\|} \le \delta, \tag{1.49} \]
where δ is a user specified accuracy.
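A minimal NumPy transcription of the CGNE loop might look as follows. Variable names follow the handout; the stopping test is a simplified use of (1.48)-(1.49) in which the last m squared step lengths serve as an estimate of the remaining error, with m an illustrative choice.

```python
import numpy as np

def cgne(A, b, y0=None, i_max=200, delta=1e-8, m=4):
    """CGNE sketch: CG applied to A A^T z = b with y = A^T z, cf. (1.38)-(1.40).

    Returns the iterate y together with the squared step lengths gamma_j^2,
    whose partial sums provide the computable error bound [eps_i] of (1.48)."""
    n = A.shape[0]
    y = np.zeros(n) if y0 is None else np.array(y0, dtype=float)
    r = b - A @ y
    p = np.zeros(n)
    beta, sigma = 0.0, float(r @ r)
    gammas = []
    for _ in range(i_max):
        p = A.T @ r + beta * p
        alpha = sigma / float(p @ p)
        y = y + alpha * p
        gammas.append(alpha * sigma)      # gamma^2 = ||y_i - y_{i-1}||^2
        r = r - alpha * (A @ p)
        sigma_new = float(r @ r)
        beta = sigma_new / sigma
        sigma = sigma_new
        # simplified relative error test in the spirit of (1.48)-(1.49)
        eps_est = sum(gammas[-m:])
        if np.sqrt(eps_est) <= delta * np.linalg.norm(y):
            break
    return y, gammas

# Illustrative use on a small nonsymmetric system
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)) + 6.0 * np.eye(6)
b = rng.standard_normal(6)
y, _ = cgne(A, b)
print(np.linalg.norm(A @ y - b))
```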

Convergence of affine covariant inexact Newton methods

We denote by δx^k ∈ ℝ^n the result of an inner iteration, e.g., CGNE, for the solution of (1.37). Then, it is easy to see that the iteration error δx^k − Δx^k satisfies the error equation
\[ F'(x^k)\big(\delta x^k-\Delta x^k\big) = F(x^k) + F'(x^k)\,\delta x^k =: r^k. \tag{1.50} \]
We will measure the impact of the inexact solution of (1.37) by the relative error
\[ \delta_k := \frac{\|\delta x^k-\Delta x^k\|}{\|\delta x^k\|}. \tag{1.51} \]

Theorem 3.1 (Affine covariant convergence theorem for the inexact Newton method. Part I: Linear convergence)
Suppose that F : D ⊂ ℝ^n → ℝ^n is continuously differentiable on D with invertible Jacobians F'(x), x ∈ D. Assume further that the following affine covariant Lipschitz condition is satisfied:
\[ \|F'(z)^{-1}\big(F'(y)-F'(x)\big)v\| \le \omega\,\|y-x\|\,\|v\|, \tag{1.52} \]
where x, y, z ∈ D, v ∈ ℝ^n. Assume that x^0 ∈ D is an initial guess for the outer Newton iteration and that δx^k_0 = 0 is chosen as the start iterate for the inner iterations. Consider the Kantorovich quantities
\[ h_k := \omega\,\|\Delta x^k\|, \qquad h_k^\delta := \omega\,\|\delta x^k\| = \frac{h_k}{\sqrt{1+\delta_k^2}} \tag{1.53} \]
associated with the outer and inner iteration. Assume that
\[ h_0 \le 2\,\Theta, \quad 0 \le \Theta < 1, \tag{1.54} \]
and control the inner iterations according to
\[ \vartheta(h_k,\delta_k) := \frac{\tfrac12 h_k^\delta + \delta_k\,(1+h_k^\delta)}{\sqrt{1+\delta_k^2}} \le \Theta < 1. \tag{1.55} \]

Then, there holds:
(i) The inexact Newton CGNE iterates x^k, k ∈ ℕ_0, stay in B̄(x^0,ρ),
\[ \rho := \frac{\|\delta x^0\|}{1-\Theta}, \tag{1.56} \]
and converge linearly to some x^* ∈ B̄(x^0,ρ) with F(x^*) = 0.
(ii) The exact Newton increments decrease monotonically according to
\[ \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \Theta, \tag{1.57} \]
whereas for the inexact Newton increments we have
\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} \le \sqrt{\frac{1+\delta_k^2}{1+\delta_{k+1}^2}}\;\Theta. \tag{1.58} \]
Proof. By elementary calculations we find
\[ \Delta x^{k+1} = -F'(x^{k+1})^{-1}F(x^{k+1}) \tag{1.59} \]
\[ = -F'(x^{k+1})^{-1}\Big[F(x^{k+1}) - \underbrace{\big(F(x^k)+F'(x^k)\delta x^k\big)}_{=\,r^k}\Big] - F'(x^{k+1})^{-1}r^k \]
\[ = \underbrace{-F'(x^{k+1})^{-1}\int_0^1\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\,dt}_{=:\,I} \ \underbrace{-\ F'(x^{k+1})^{-1}F'(x^k)\big(\delta x^k-\Delta x^k\big)}_{=:\,II}. \]

Using the affine covariant Lipschitz condition (1.52), the first term on the right-hand side in (1.59) can be estimated according to
\[ \|I\| \le \omega\,\|\delta x^k\|^2\int_0^1 t\,dt = \tfrac12\,\omega\,\|\delta x^k\|^2. \tag{1.60} \]
For the second term we obtain by the same argument
\[ \|II\| = \Big\|F'(x^{k+1})^{-1}\big[F'(x^k) \pm F'(x^{k+1})\big]\big(\delta x^k-\Delta x^k\big)\Big\| \tag{1.61} \]
\[ \le \|F'(x^{k+1})^{-1}\big(F'(x^{k+1})-F'(x^k)\big)\big(\delta x^k-\Delta x^k\big)\| + \|\delta x^k-\Delta x^k\| \le \omega\,\|\delta x^k\|\,\|\delta x^k-\Delta x^k\| + \|\delta x^k-\Delta x^k\|. \]
Combining (1.60) and (1.61) yields
\[ \|\Delta x^{k+1}\| \le \Big(\tfrac12 h_k^\delta + \delta_k\,(1+h_k^\delta)\Big)\,\|\delta x^k\|. \]
Observing (1.53), i.e. ‖Δx^k‖ = \sqrt{1+\delta_k^2}\,‖δx^k‖, we finally get
\[ \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \frac{\tfrac12 h_k^\delta + \delta_k\,(1+h_k^\delta)}{\sqrt{1+\delta_k^2}} = \vartheta(h_k,\delta_k) \le \Theta < 1, \tag{1.62} \]
which implies linear convergence. Note that a necessary condition for ϑ(h_k,δ_k) ≤ Θ is that it holds true for δ_k = 0, which is satisfied due to assumption (1.54).
For the contraction of the inexact Newton increments we get
\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} = \sqrt{\frac{1+\delta_k^2}{1+\delta_{k+1}^2}}\;\frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \sqrt{\frac{1+\delta_k^2}{1+\delta_{k+1}^2}}\;\Theta. \tag{1.63} \]
It can easily be shown that {x^k}, k ∈ ℕ_0, is a Cauchy sequence in B̄(x^0,ρ). Consequently, there exists x^* ∈ B̄(x^0,ρ) such that x^k → x^* (k → ∞). Since
\[ F'(x^k)\,\delta x^k = -F(x^k) + r^k \]
and δx^k → 0, r^k → 0 (k → ∞), we conclude F(x^*) = 0.

Theorem 3.2 (Affine covariant convergence theorem for the inexact Newton method. Part II: Quadratic convergence)
Under the same assumptions on F : D ⊂ ℝ^n → ℝ^n as in Theorem 3.1, suppose that the initial guess x^0 ∈ D satisfies
\[ h_0 < \frac{2}{1+\rho} \tag{1.64} \]
for some appropriate ρ > 0 and control the inner iterations such that
\[ \delta_k \le \frac{\rho}{2}\,\frac{h_k^\delta}{1+h_k^\delta}. \tag{1.65} \]
Then, there holds:
(i) The inexact Newton CGNE iterates x^k, k ∈ ℕ_0, stay in B̄(x^0,\bar\rho),
\[ \bar\rho := \frac{\|\delta x^0\|}{1-\tfrac{1+\rho}{2}\,h_0}, \tag{1.66} \]
and converge quadratically to some x^* ∈ B̄(x^0,\bar\rho) with F(x^*) = 0.
(ii) The exact Newton increments and the inexact Newton increments decrease quadratically according to
\[ \|\Delta x^{k+1}\| \le \frac{1+\rho}{2}\,\omega\,\|\Delta x^k\|^2, \tag{1.67} \]
\[ \|\delta x^{k+1}\| \le \frac{1+\rho}{2}\,\omega\,\|\delta x^k\|^2. \tag{1.68} \]
Proof. We proceed as in the proof of Theorem 3.1 to obtain
\[ \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \vartheta(h_k,\delta_k) = \frac{\tfrac12 h_k^\delta + \delta_k(1+h_k^\delta)}{\sqrt{1+\delta_k^2}} \quad\text{and}\quad \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} = \sqrt{\frac{1+\delta_k^2}{1+\delta_{k+1}^2}}\;\frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|}. \]
In view of (1.65) we get the further estimates
\[ \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \frac{1+\rho}{2}\,\frac{h_k}{1+\delta_k^2} \le \frac{1+\rho}{2}\,h_k \]

and
\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} \le \frac{1+\rho}{2}\,\frac{h_k^\delta}{\sqrt{1+\delta_{k+1}^2}} \le \frac{1+\rho}{2}\,h_k^\delta, \]
from which (1.67) and (1.68) follow by the definition of the Kantorovich quantities.
In order to deduce quadratic convergence we have to make sure that the initial increments (k = 0) are small enough, i.e.,
\[ \frac{1+\rho}{2}\,h_0^\delta \le \frac{1+\rho}{2}\,h_0 < 1. \tag{1.69} \]
Furthermore, (1.68) and (1.69) allow us to show that the iterates x^k, k ∈ ℕ, stay in B̄(x^0,\bar\rho). Indeed, (1.68) implies
\[ \|\delta x^j\| \le \frac{1+\rho}{2}\,h_0\,\|\delta x^{j-1}\| \le \Big(\frac{1+\rho}{2}\,h_0\Big)^j\,\|\delta x^0\|, \quad j \in \mathbb{N}, \]
and hence
\[ \|x^k-x^0\| \le \sum_{j=0}^{k-1}\|\delta x^j\| \le \sum_{j=0}^{\infty}\Big(\frac{1+\rho}{2}\,h_0\Big)^j\,\|\delta x^0\| = \frac{\|\delta x^0\|}{1-\tfrac{1+\rho}{2}\,h_0}. \]

Algorithmic aspects of affine covariant inexact Newton methods

(i) Convergence monitor
Let us assume that the quantity Θ < 1 in both the linear convergence mode and the quadratic convergence mode has been specified, and let us further assume that we use CGNE with δx^k_0 = 0 in the inner iteration. Then, (1.58) suggests the monotonicity test
\[ \bar\Theta_k := \sqrt{\frac{1+\bar\delta_{k+1}^2}{1+\bar\delta_k^2}}\;\frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} \le \Theta, \tag{1.70} \]
where \bar δ_k and \bar δ_{k+1} are computationally available estimates of δ_k and δ_{k+1}.
(ii) Termination criterion
We recall that the termination criterion for the exact Newton iteration with respect to a user specified accuracy XTOL is given by
\[ \frac{\|\Delta x^k\|}{1-\Theta_{k-1}^2} \le \mathrm{XTOL}. \]

According to (1.53) we have
\[ \|\Delta x^k\| = \sqrt{1+\delta_k^2}\;\|\delta x^k\|. \]
Consequently, replacing Θ_{k-1} and δ_k by the computable quantities \bar Θ_{k-1} and \bar δ_k, we arrive at the termination criterion
\[ \sqrt{1+\bar\delta_k^2}\;\frac{\|\delta x^k\|}{1-\bar\Theta_{k-1}^2} \le \mathrm{XTOL}. \tag{1.71} \]
(iii) Balancing outer and inner iterations
According to (1.55) of Theorem 3.1, in the linear convergence mode the adaptive termination criterion for the inner iteration is
\[ \vartheta(h_k,\delta_k) = \frac{\tfrac12 h_k^\delta + \delta_k\,(1+h_k^\delta)}{\sqrt{1+\delta_k^2}} \le \Theta < 1. \]
On the other hand, in view of (1.65) of Theorem 3.2, in the quadratic convergence mode the termination criterion is
\[ \delta_k \le \frac{\rho}{2}\,\frac{h_k^\delta}{1+h_k^\delta}. \]
Since the theoretical Kantorovich quantities (cf. (1.53))
\[ h_k^\delta = \omega\,\|\delta x^k\| = \frac{h_k}{\sqrt{1+\delta_k^2}} \]
are not directly accessible, we have to replace them by computationally available estimates [h_k^δ]. We recall that for h_k we have the a priori estimate
\[ [h_k] := 2\,\bar\Theta_{k-1}^2 \le h_k. \]
Consequently, replacing δ_k by \bar δ_k, h_k by [h_k], and Θ_{k-1} by \bar Θ_{k-1} (cf. (1.70)), we get the a priori estimates
\[ [h_k^\delta] := \frac{[h_k]}{\sqrt{1+\bar\delta_k^2}}, \qquad [h_k] := 2\,\bar\Theta_{k-1}^2, \quad k \in \mathbb{N}. \tag{1.72} \]
For k = 0, we choose δ_0 = \bar δ_0 = 1/4. In practice, for k ≥ 1 we begin with the quadratic convergence mode and switch

to the linear convergence mode as soon as the approximate contraction factor \bar Θ_k is below some prespecified threshold value \bar Θ ≤ 1/2.
(iii.1) Quadratic convergence mode
The computationally realizable termination criterion for the inner iteration in the quadratic convergence mode is
\[ \bar\delta_k \le \frac{\rho}{2}\,\frac{[h_k^\delta]}{1+[h_k^\delta]}. \tag{1.73} \]
Inserting (1.72) into (1.73), we obtain a simple nonlinear equation in \bar δ_k (see the sketch after the remarks below).
Remark 3.1 (Validity of the approximate termination criterion)
Observing that the right-hand side in (1.73) is a monotonically increasing function of [h_k^δ], and taking [h_k^δ] ≤ h_k^δ into account, it follows that for \bar δ_k ≤ δ_k the approximate termination criterion (1.73) implies the exact termination criterion (1.65).
Remark 3.2 (Computational work in the quadratic convergence mode)
Since δ_k → 0 (k → ∞) is enforced, it follows that the more the iterates x^k approach the solution x^*, the more computational work is required for the inner iterations to guarantee quadratic convergence of the outer iteration.
(iii.2) Linear convergence mode
We switch to the linear convergence mode once the criterion
\[ \bar\Theta_k < \bar\Theta \tag{1.74} \]
is met. The computationally realizable termination criterion for the inner iteration in the linear convergence mode is
\[ [\vartheta(h_k,\delta_k)] := \vartheta([h_k],\bar\delta_k) = \frac{\tfrac12[h_k^\delta] + \bar\delta_k\,(1+[h_k^\delta])}{\sqrt{1+\bar\delta_k^2}} \le \Theta. \tag{1.75} \]
Remark 3.3 (Validity of the approximate termination criterion)
Since the right-hand side in (1.75) is a monotonically increasing function in [h_k^δ] and [h_k^δ] ≤ h_k^δ, the estimate provided by (1.75) may be too small and thus result in an overestimation of \bar δ_k. However, since the exact quantities and their a priori estimates both tend to zero as k approaches infinity, asymptotically we may rely on (1.75).
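As an illustration of the "simple nonlinear equation" mentioned above, the following sketch solves (1.73), with [h_k^δ] taken from the a priori estimate (1.72) as reconstructed here, for the threshold value of \bar δ_k by a fixed-point iteration. The function name, the default ρ = 1, and the starting value are illustrative; the starting value 1/4 mirrors the choice δ_0 = 1/4 made in the handout.

```python
def delta_bar_quadratic_mode(h_k_est, rho=1.0, tol=1e-12, it_max=100):
    """Solve  delta = (rho/2) * [h^delta] / (1 + [h^delta]),
              [h^delta] = [h_k] / sqrt(1 + delta^2),
    for delta by a simple fixed-point iteration (cf. (1.72)-(1.73))."""
    delta = 0.25
    for _ in range(it_max):
        h_delta = h_k_est / (1.0 + delta**2) ** 0.5
        new = 0.5 * rho * h_delta / (1.0 + h_delta)
        if abs(new - delta) <= tol:
            return new
        delta = new
    return delta

print(delta_bar_quadratic_mode(h_k_est=0.5))
```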

In practice, we require the monotonicity test (1.70) in CGNE and run the inner iterations until \bar δ_k satisfies (1.75) or divergence occurs, i.e.,
\[ \bar\Theta_k > 2\,\Theta. \]
Remark 3.4 (Computational work in the linear convergence mode)
As opposed to the quadratic convergence mode, we observe: the more the iterates x^k approach the solution x^*, the less computational work is required for the inner iterations to guarantee linear convergence of the outer iteration.

3.2 Affine Contravariant Inexact Newton Methods

GMRES (Generalized Minimum RESidual)

The Generalized Minimum RESidual method (GMRES) is an iterative solver for nonsymmetric linear algebraic systems which generates an orthogonal basis of the Krylov subspace
\[ \mathcal K_i(r_0,A) := \operatorname{span}\{r_0,\,Ar_0,\,\dots,\,A^{i-1}r_0\} \tag{1.76} \]
by a modified Gram-Schmidt orthogonalization called the Arnoldi method. The inner product coefficients are stored in an upper Hessenberg matrix so that an approximate solution can be obtained by the solution of a least squares problem in terms of that Hessenberg matrix.

GMRES Initialization:
Given an initial guess y_0 ∈ ℝ^n, compute the residual r_0 = b − A y_0 and set
\[ \beta := \|r_0\|, \quad v_1 := \frac{r_0}{\beta}, \quad V_1 := (v_1). \tag{1.77} \]
GMRES Iteration Loop: For 1 ≤ i ≤ i_max:
I. Orthogonalization:
\[ \hat v_{i+1} = A v_i - V_i\,h_i, \tag{1.78} \]
where
\[ h_i = V_i^T A v_i. \tag{1.79} \]
II. Normalization:
\[ v_{i+1} = \frac{\hat v_{i+1}}{\|\hat v_{i+1}\|}. \tag{1.80} \]
III. Update:
\[ V_{i+1} = (V_i\ \ v_{i+1}), \tag{1.81} \]
\[ \bar H_i = \begin{pmatrix} h_i \\ \|\hat v_{i+1}\| \end{pmatrix}, \quad i = 1, \tag{1.82} \]
\[ \bar H_i = \begin{pmatrix} \bar H_{i-1} & h_i \\ 0 & \|\hat v_{i+1}\| \end{pmatrix}, \quad i > 1. \tag{1.83} \]

IV. Least squares problem: Compute z_i as the solution of
\[ \|\beta e_1 - \bar H_i z_i\| = \min_{z\in\mathbb{R}^i}\|\beta e_1 - \bar H_i z\|. \tag{1.84} \]
V. Approximate solution:
\[ y_i = y_0 + V_i\,z_i. \tag{1.85} \]
GMRES has the residual norm minimizing property
\[ \|b-Ay_i\| = \min_{z\,\in\,y_0+\mathcal K_i(r_0,A)}\|b-Az\|. \tag{1.86} \]
Moreover, the inner residuals decrease monotonically,
\[ \|r_{i+1}\| \le \|r_i\|, \quad i \in \mathbb{N}_0. \tag{1.87} \]

Termination criterion for the GMRES iteration

The residuals satisfy the orthogonality relation
\[ (r_i,\ r_i-r_0) = 0, \quad i \in \mathbb{N}, \tag{1.88} \]
from which we readily deduce
\[ \|r_0\|^2 = \|r_i-r_0\|^2 + \|r_i\|^2, \quad i \in \mathbb{N}. \tag{1.89} \]
We define the relative residual norm error
\[ \eta_i := \frac{\|r_i\|}{\|r_0\|}. \tag{1.90} \]
Clearly, η_i < 1, i ∈ ℕ, and
\[ \eta_{i+1} < \eta_i \quad\text{if } \eta_i \ne 0. \tag{1.91} \]
Consequently, given a user specified accuracy \bar η, an appropriate adaptive termination criterion is
\[ \eta_i \le \bar\eta. \tag{1.92} \]
We note that, in terms of η_i, (1.89) can be written as
\[ \|r_i-r_0\|^2 = \big(1-\eta_i^2\big)\,\|r_0\|^2. \tag{1.93} \]
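A compact NumPy version of the Arnoldi/least-squares steps (1.77)-(1.85), including the relative residual test (1.90)-(1.92), might look as follows. This is a plain, restart-free sketch rather than an optimized implementation; the random test system is illustrative.

```python
import numpy as np

def gmres(A, b, y0=None, i_max=50, eta_bar=1e-8):
    """GMRES sketch: Arnoldi (modified Gram-Schmidt) plus a small
    least-squares problem with the Hessenberg matrix, cf. (1.77)-(1.85)."""
    n = A.shape[0]
    y0 = np.zeros(n) if y0 is None else np.array(y0, dtype=float)
    r0 = b - A @ y0
    beta = np.linalg.norm(r0)
    if beta == 0.0:
        return y0, 0.0
    V = np.zeros((n, i_max + 1))
    H = np.zeros((i_max + 1, i_max))
    V[:, 0] = r0 / beta
    for i in range(i_max):
        w = A @ V[:, i]
        for j in range(i + 1):                      # modified Gram-Schmidt
            H[j, i] = V[:, j] @ w
            w = w - H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)
        if H[i + 1, i] > 0.0:
            V[:, i + 1] = w / H[i + 1, i]
        e1 = np.zeros(i + 2); e1[0] = beta          # least squares problem (1.84)
        z, *_ = np.linalg.lstsq(H[:i + 2, :i + 1], e1, rcond=None)
        eta = np.linalg.norm(e1 - H[:i + 2, :i + 1] @ z) / beta   # (1.90)
        if eta <= eta_bar or H[i + 1, i] == 0.0:
            break
    return y0 + V[:, :i + 1] @ z, eta

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8)) + 8.0 * np.eye(8)
b = rng.standard_normal(8)
y, eta = gmres(A, b)
print(np.linalg.norm(A @ y - b), eta)
```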

Convergence of affine contravariant inexact Newton methods

We denote by δx^k ∈ ℝ^n the result of the inner GMRES iteration. As initial values for GMRES we choose
\[ \delta x^k_0 = 0, \qquad r^k_0 = F(x^k). \tag{1.94} \]
Consequently, during the inner GMRES iteration the relative error η_i, i ∈ ℕ_0, in the residuals satisfies
\[ \eta_i = \frac{\|r^k_i\|}{\|F(x^k)\|} \le 1, \qquad \eta_{i+1} < \eta_i \ \text{ if } \eta_i \ne 0. \tag{1.95} \]
In the sequel, we drop the subindices i for the inner iterations and refer to η_k as the final value of the inner iterations at each outer iteration step k.

Theorem 3.3 (Affine contravariant convergence theorem for the inexact Newton GMRES method. Part I: Linear convergence)
Suppose that F : D ⊂ ℝ^n → ℝ^n is continuously differentiable on D and let x^0 ∈ D be some initial guess. Let further the following affine contravariant Lipschitz condition be satisfied:
\[ \|\big(F'(y)-F'(x)\big)(y-x)\| \le \omega\,\|F'(x)(y-x)\|^2, \quad x,y \in D,\ \omega \ge 0. \tag{1.96} \]
Assume further that the level set
\[ L_0 := \{x \in \mathbb{R}^n \mid \|F(x)\| \le \|F(x^0)\|\} \tag{1.97} \]
is a compact subset of D. In terms of the Kantorovich quantities
\[ h_k := \omega\,\|F(x^k)\|, \quad k \in \mathbb{N}_0, \tag{1.98} \]
the outer residual norms can be bounded according to
\[ \|F(x^{k+1})\| \le \Big(\eta_k + \tfrac12\,(1-\eta_k^2)\,h_k\Big)\,\|F(x^k)\|. \tag{1.99} \]
Assume that
\[ h_0 < 2 \tag{1.100} \]
and control the inner iterations according to
\[ \eta_k \le \Theta - \tfrac12\,h_k \tag{1.101} \]

for some h_0/2 < Θ < 1. Then, the inexact Newton GMRES iterates x^k, k ∈ ℕ_0, stay in L_0 and converge linearly to some x^* ∈ L_0 with F(x^*) = 0 at an estimated rate
\[ \|F(x^{k+1})\| \le \Theta\,\|F(x^k)\|. \tag{1.102} \]
Proof. We recall that the inexact Newton GMRES iterates satisfy
\[ F'(x^k)\,\delta x^k = -F(x^k) + r^k, \tag{1.103} \]
\[ x^{k+1} = x^k + \delta x^k. \tag{1.104} \]
It follows from the generalized mean value theorem that
\[ F(x^{k+1}) = F(x^k) + \int_0^1 F'(x^k+t\delta x^k)\,\delta x^k\,dt. \tag{1.105} \]
Consequently, using (1.103) in (1.105), we obtain
\[ F(x^{k+1}) = \int_0^1\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\,dt + r^k, \]
whence
\[ \|F(x^{k+1})\| \le \int_0^1\|\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\|\,dt + \|r^k\| \le \tfrac12\,\omega\,\|F'(x^k)\delta x^k\|^2 + \|r^k\| = \tfrac12\,\omega\,\|F(x^k)-r^k\|^2 + \|r^k\|. \]
We recall (1.93),
\[ \|F(x^k)-r^k\|^2 = (1-\eta_k^2)\,\|F(x^k)\|^2, \]
from which (1.99) can be immediately deduced. Now, in view of (1.101), (1.99) yields
\[ \|F(x^{k+1})\| \le \Big(\eta_k + \tfrac12\,(1-\eta_k^2)\,h_k\Big)\|F(x^k)\| \le \Big(\Theta - \tfrac12\,\eta_k^2\,h_k\Big)\|F(x^k)\| \le \Theta\,\|F(x^k)\|. \]
Taking advantage of the previous inequality, by induction on k it follows that x^k ∈ L_0 ⊂ D, k ∈ ℕ_0.

Hence, there exist a subsequence ℕ' ⊂ ℕ and an x^* ∈ L_0 such that x^k → x^* (k ∈ ℕ', k → ∞) and F(x^*) = 0. Moreover, since
\[ \|F(x^{k+l})-F(x^k)\| \le \|F(x^{k+l})\| + \|F(x^k)\| \le (1+\Theta^l)\,\|F(x^k)\| \le (1+\Theta^l)\,\Theta^k\,\|F(x^0)\| \ \longrightarrow\ 0 \quad (k\to\infty), \]
the whole sequence must converge to x^*.

Theorem 3.4 (Affine contravariant convergence theorem for the inexact Newton GMRES method. Part II: Quadratic convergence)
Under the same assumptions on F : D ⊂ ℝ^n → ℝ^n as in Theorem 3.3, suppose that the initial guess x^0 ∈ D satisfies
\[ h_0 < \frac{2}{1+\rho} \tag{1.106} \]
for some appropriate ρ > 0 and control the inner iterations such that
\[ \frac{\eta_k}{1-\eta_k^2} \le \frac{\rho}{2}\,h_k. \tag{1.107} \]
Then, the inexact Newton GMRES iterates x^k, k ∈ ℕ_0, stay in L_0 and converge quadratically to some x^* ∈ L_0 with F(x^*) = 0 at an estimated rate
\[ \|F(x^{k+1})\| \le \tfrac12\,\omega\,(1+\rho)\,(1-\eta_k^2)\,\|F(x^k)\|^2. \tag{1.108} \]
Proof. Inserting (1.107) into (1.99) and observing h_k = ω‖F(x^k)‖ gives the assertion.

Algorithmic aspects of affine contravariant inexact Newton methods

(i) Convergence monitor
Throughout the inexact Newton GMRES iteration we use the residual monotonicity test
\[ \bar\Theta_k := \frac{\|F(x^{k+1})\|}{\|F(x^k)\|} \le \Theta < 1. \tag{1.109} \]
The iteration is considered as divergent if
\[ \bar\Theta_k > \Theta. \tag{1.110} \]

(ii) Termination criterion
As in the exact Newton iteration, specifying a residual accuracy FTOL, the termination criterion for the inexact Newton GMRES iteration is
\[ \|F(x^k)\| \le \mathrm{FTOL}. \tag{1.111} \]
(iii) Balancing outer and inner iterations
With regard to (1.101) of Theorem 3.3, in the linear convergence mode the adaptive termination criterion for the inner GMRES iteration is
\[ \eta_k \le \Theta - \tfrac12\,h_k, \]
whereas, in view of (1.107) of Theorem 3.4, in the quadratic convergence mode the termination criterion is
\[ \frac{\eta_k}{1-\eta_k^2} \le \frac{\rho}{2}\,h_k. \]
Again, we replace the theoretical Kantorovich quantities h_k by some computationally easily available a priori estimates. We distinguish between the quadratic and the linear convergence mode.
(iii.1) Quadratic convergence mode
We recall the termination criterion (1.107) for the quadratic convergence mode,
\[ \frac{\eta_k}{1-\eta_k^2} \le \frac{\rho}{2}\,h_k. \]
It suggests the a posteriori estimate
\[ [h_k]_2 := \frac{2\,\bar\Theta_k}{(1+\rho)\,(1-\eta_k^2)} \le h_k. \]
In view of h_{k+1} = \bar Θ_k h_k, this implies the a priori estimate
\[ [h_{k+1}] := \bar\Theta_k\,[h_k]_2 \le \bar\Theta_k\,h_k = h_{k+1}. \tag{1.112} \]
Using (1.112) in (1.107) results in the computationally feasible termination criterion
\[ \frac{\eta_k}{1-\eta_k^2} \le \frac{\rho}{2}\,[h_k], \qquad \rho \approx 1.0. \tag{1.113} \]

(iii.2) Linear convergence mode
We switch from the quadratic to the linear convergence mode if the local contraction factor satisfies
\[ \bar\Theta_k < \Theta. \tag{1.114} \]
The proof of the previous theorems reveals
\[ \|F(x^{k+1})-r^k\| \le \frac{\omega}{2}\,\|F(x^k)-r^k\|^2 = \tfrac12\,(1-\eta_k^2)\,h_k\,\|F(x^k)\|. \tag{1.115} \]
The above inequality (1.115) implies the a posteriori estimate
\[ [h_k]_1 := \frac{2\,\|F(x^{k+1})-r^k\|}{(1-\eta_k^2)\,\|F(x^k)\|} \le h_k \tag{1.116} \]
and the a priori estimate
\[ [h_{k+1}] := \bar\Theta_k\,[h_k]_1 \le h_{k+1}. \tag{1.117} \]
Based on (1.117) we define
\[ \bar\eta_{k+1} := \Theta - \tfrac12\,[h_{k+1}]. \tag{1.118} \]
If we find
\[ \bar\eta_{k+1} < \eta_{k+1} \tag{1.119} \]
with η_{k+1} from (1.113), we continue the iteration in the quadratic convergence mode. Otherwise, we realize the linear convergence mode with some
\[ \eta_{k+1} \le \bar\eta_{k+1}. \tag{1.120} \]

3.3 Affine Conjugate Inexact Newton Methods

PCG (Preconditioned Conjugate Gradient)

The Preconditioned Conjugate Gradient method (PCG) is an iterative solver for linear algebraic systems with a symmetric positive definite coefficient matrix A ∈ ℝ^{n×n}. We recall that any symmetric positive definite matrix C ∈ ℝ^{n×n} defines an energy inner product (·,·)_C according to
\[ (u,v)_C := (u,\,Cv), \quad u,v \in \mathbb{R}^n. \]
The associated energy norm is denoted by ‖·‖_C. The PCG method with a symmetric positive definite preconditioner B ∈ ℝ^{n×n} corresponds to the CG method applied to the transformed linear algebraic system
\[ B^{1/2}AB^{1/2}\,\big(B^{-1/2}y\big) = B^{1/2}b. \]
The PCG method is implemented as follows:

PCG Initialization:
Given an initial guess y_0 ∈ ℝ^n, compute the residual r_0 = b − A y_0 and the preconditioned residual \bar r_0 = B r_0, and set
\[ p_0 := \bar r_0, \qquad \sigma_0 := (r_0,\bar r_0) = \|r_0\|_B^2. \]
PCG Iteration Loop: For 0 ≤ i ≤ i_max compute
\[ \alpha_i = \frac{\|p_i\|_A^2}{\sigma_i}, \qquad y_{i+1} = y_i + \frac{1}{\alpha_i}\,p_i, \qquad \gamma_i^2 = \frac{\sigma_i}{\alpha_i}\ \big(= \|y_{i+1}-y_i\|_A^2\big), \]
\[ r_{i+1} = r_i - \frac{1}{\alpha_i}\,A p_i, \qquad \bar r_{i+1} = B r_{i+1}, \qquad \sigma_{i+1} = \|r_{i+1}\|_B^2, \qquad p_{i+1} = \bar r_{i+1} + \frac{\sigma_{i+1}}{\sigma_i}\,p_i. \]
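The loop above translates into NumPy as follows. The preconditioner is passed as a function applying B (an approximation of A^{-1}); the stopping test is a simplified relative energy-error check in the spirit of the computable bound derived below, with an illustrative look-ahead depth m.

```python
import numpy as np

def pcg(A, b, apply_B=lambda r: r, y0=None, i_max=200, delta=1e-8, m=4):
    """PCG sketch following the loop above; apply_B realizes the preconditioner B."""
    n = A.shape[0]
    y = np.zeros(n) if y0 is None else np.array(y0, dtype=float)
    r = b - A @ y
    rbar = apply_B(r)
    p = rbar.copy()
    sigma = float(r @ rbar)
    gammas = []
    for _ in range(i_max):
        Ap = A @ p
        step = sigma / float(p @ Ap)        # = 1/alpha_i in the handout's notation
        y = y + step * p
        gammas.append(sigma * step)         # gamma_i^2 = ||y_{i+1} - y_i||_A^2
        r = r - step * Ap
        rbar = apply_B(r)
        sigma_new = float(r @ rbar)
        # simplified relative energy-error test (cf. the bound derived below)
        eps_est = sum(gammas[-m:])
        yA = np.sqrt(float(y @ (A @ y)))
        if yA > 0.0 and np.sqrt(eps_est) <= delta * yA:
            break
        p = rbar + (sigma_new / sigma) * p
        sigma = sigma_new
    return y, gammas

# Illustrative use with a diagonal (Jacobi) preconditioner
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6.0 * np.eye(6)               # symmetric positive definite
b = rng.standard_normal(6)
y, _ = pcg(A, b, apply_B=lambda r: r / np.diag(A))
print(np.linalg.norm(A @ y - b))
```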

PCG minimizes the energy error norm,
\[ \|y-y_i\|_A = \min_{z\,\in\,y_0+\mathcal K_i(r_0,A)}\|y-z\|_A, \tag{1.121} \]
where K_i(r_0,A) denotes the Krylov subspace
\[ \mathcal K_i(r_0,A) := \operatorname{span}\{r_0,\,\dots,\,A^{i-1}r_0\}. \tag{1.122} \]
PCG satisfies the Galerkin orthogonality
\[ (y_i-y_0,\ y_{i+m}-y_i)_A = 0, \quad m \in \mathbb{N}. \tag{1.123} \]
Denoting by y ∈ ℝ^n the unique solution of Ay = b and by ε_i := ‖y − y_i‖²_A the square of the iteration error in the energy norm, we have the following error representation:

Lemma 3.2 (Representation of the iteration error)
The PCG iteration error satisfies
\[ \varepsilon_i = \sum_{j=i}^{n-1}\gamma_j^2. \tag{1.124} \]
Proof. For m = 1 the Galerkin orthogonality implies the orthogonal decompositions
\[ \|y_{i+1}-y_0\|_A^2 = \underbrace{\|y_{i+1}-y_i\|_A^2}_{=\,\gamma_i^2} + \|y_i-y_0\|_A^2, \tag{1.125} \]
\[ \|y_i-y_0\|_A^2 = \sum_{j=0}^{i-1}\|y_{j+1}-y_j\|_A^2 = \sum_{j=0}^{i-1}\gamma_j^2. \tag{1.126} \]
On the other hand, observing y_n = y, for m = n − i the Galerkin orthogonality yields
\[ \underbrace{\|y-y_0\|_A^2}_{=\,\sum_{j=0}^{n-1}\gamma_j^2} = \underbrace{\|y-y_i\|_A^2}_{=\,\varepsilon_i} + \underbrace{\|y_i-y_0\|_A^2}_{=\,\sum_{j=0}^{i-1}\gamma_j^2}. \tag{1.127} \]

Computable lower bound for the iteration error

A lower bound for the iteration error in the energy norm is obviously given by
\[ [\varepsilon_i] := \sum_{j=i}^{i+m}\gamma_j^2. \tag{1.128} \]
In the inexact Newton PCG method we will control the inner PCG iterations by the relative energy error norms
\[ \delta_i := \frac{\|y-y_i\|_A}{\|y_i\|_A} \approx \frac{\sqrt{[\varepsilon_i]}}{\|y_i\|_A} \tag{1.129} \]
and use the termination criterion
\[ \delta_i \le \delta, \tag{1.130} \]
where δ is a user specified accuracy.

Convergence of affine conjugate inexact Newton methods

We denote by δx^k ∈ ℝ^n the result of the inner PCG iteration. As initial value for PCG we choose
\[ \delta x^k_0 = 0. \tag{1.131} \]
Again, we will drop the subindices i for the inner PCG iterations and refer to δ_k as the final value of the inner iterations at each outer iteration step k. We recall the Galerkin orthogonality (cf. (1.123))
\[ \big(\delta x^k,\ F'(x^k)(\delta x^k-\Delta x^k)\big) = \big(\delta x^k,\,r^k\big) = 0. \tag{1.132} \]

Theorem 3.5 (Affine conjugate convergence theorem for the inexact Newton PCG method. Part I: Linear convergence)
Suppose that f : D ⊂ ℝ^n → ℝ is a twice continuously differentiable strictly convex functional on D with F := grad f and the Hessian F' = f'', which is symmetric and uniformly positive definite. Assume that x^0 ∈ D is some initial guess such that the level set
\[ L_0 := \{x \in D \mid f(x) \le f(x^0)\} \]

is compact. Let further the following affine conjugate Lipschitz condition be satisfied:
\[ \|F'(z)^{-1/2}\big(F'(y)-F'(x)\big)v\| \le \omega\,\|F'(x)^{1/2}(y-x)\|\;\|F'(x)^{1/2}v\|, \quad x,y,z \in D,\ \omega \ge 0. \tag{1.133} \]
For the inner Newton PCG iterations consider the exact error terms
\[ \varepsilon_k := \|F'(x^k)^{1/2}\Delta x^k\|^2 \]
and the Kantorovich quantities
\[ h_k := \omega\,\|F'(x^k)^{1/2}\Delta x^k\|, \]
as well as their inexact analogues
\[ \varepsilon_k^\delta := \|F'(x^k)^{1/2}\delta x^k\|^2 = \frac{\varepsilon_k}{1+\delta_k^2}, \qquad h_k^\delta := \omega\,\|F'(x^k)^{1/2}\delta x^k\| = \frac{h_k}{\sqrt{1+\delta_k^2}}, \]
where δ_k characterizes the inner PCG iteration error,
\[ \delta_k := \frac{\|F'(x^k)^{1/2}(\delta x^k-\Delta x^k)\|}{\|F'(x^k)^{1/2}\delta x^k\|}. \]
Assume that
\[ h_0 \le 2\,\Theta < 2 \tag{1.134} \]
for some Θ < 1 and that
\[ \delta_{k+1} \ge \delta_k, \quad k \in \mathbb{N}_0, \tag{1.135} \]
holds true throughout the outer Newton iterations. Control the inner iterations according to
\[ \vartheta(h_k^\delta,\delta_k) := \frac{\tfrac12 h_k^\delta + \tfrac12\,\delta_k\Big(h_k^\delta+\sqrt{4+(h_k^\delta)^2}\Big)}{\sqrt{1+\delta_k^2}} \le \Theta. \tag{1.136} \]

Then, the inexact Newton PCG iterates x^k, k ∈ ℕ_0, stay in L_0 and converge linearly to some x^* ∈ L_0 with f(x^*) = min_{x∈D} f(x). The following estimates hold true:
\[ \|F'(x^{k+1})^{1/2}\Delta x^{k+1}\| \le \Theta\,\|F'(x^k)^{1/2}\Delta x^k\|, \quad k \in \mathbb{N}_0, \tag{1.137} \]
\[ \|F'(x^{k+1})^{1/2}\delta x^{k+1}\| \le \Theta\,\|F'(x^k)^{1/2}\delta x^k\|, \quad k \in \mathbb{N}_0. \tag{1.138} \]
Moreover, the objective functional is reduced according to
\[ \tfrac12\,\varepsilon_k^\delta - \tfrac16\,h_k^\delta\,\varepsilon_k^\delta \le f(x^k)-f(x^{k+1}) \le \tfrac12\,\varepsilon_k^\delta + \tfrac16\,h_k^\delta\,\varepsilon_k^\delta. \tag{1.139} \]
Proof. Observing
\[ r^k = F(x^k) + F'(x^k)\,\delta x^k, \quad k \in \mathbb{N}_0, \]
we obtain for λ ∈ [0,1]
\[ f(x^k+\lambda\delta x^k) - f(x^k) = \int_0^\lambda\big(\delta x^k,\,F(x^k+s\delta x^k)\big)\,ds \tag{1.140} \]
\[ = \int_0^\lambda\big(\delta x^k,\,F(x^k+s\delta x^k)-F(x^k)\big)\,ds + \int_0^\lambda\big(\delta x^k,\,F(x^k)\big)\,ds \]
\[ = \int_0^\lambda\int_0^s\big(\delta x^k,\,\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\big)\,dt\,ds + \int_0^\lambda\int_0^s\big(\delta x^k,\,F'(x^k)\delta x^k\big)\,dt\,ds \]
\[ \quad + \underbrace{\int_0^\lambda\big(\delta x^k,\,r^k\big)\,ds}_{=\,0\ \text{due to}\ (1.132)} - \int_0^\lambda\big(\delta x^k,\,F'(x^k)\delta x^k\big)\,ds. \]

The integrand of the first term on the right-hand side can be estimated by means of the affine conjugate Lipschitz condition (1.133):
\[ \big|\big(\delta x^k,\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\big)\big| = \big|\big(F'(x^k)^{1/2}\delta x^k,\ F'(x^k)^{-1/2}\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\big)\big| \le \omega\,t\,\|F'(x^k)^{1/2}\delta x^k\|^3 = t\,h_k^\delta\,\varepsilon_k^\delta, \]
while the remaining terms evaluate to \tfrac{\lambda^2}{2}\varepsilon_k^\delta and -\lambda\,\varepsilon_k^\delta. It readily follows from (1.140) that
\[ f(x^k+\lambda\delta x^k) \le f(x^k) + \frac{\lambda^3}{6}\,h_k^\delta\,\varepsilon_k^\delta + \Big(\frac{\lambda^2}{2}-\lambda\Big)\,\varepsilon_k^\delta. \tag{1.141} \]
Denoting by L_k the level set
\[ L_k := \{x \in D \mid f(x) \le f(x^k)\}, \]
by induction on k we prove
\[ h_k < 2 \quad\text{and hence}\quad x^{k+1} \in L_k. \tag{1.142} \]
For k = 0, we have h_0 < 2 by assumption (1.134). Since h_0^δ ≤ h_0, (1.141) readily shows f(x^1) < f(x^0), whence x^1 ∈ L_0. Now, assuming (1.142) to hold true for some k ∈ ℕ, again taking advantage of h_k^δ ≤ h_k < 2, (1.141) yields f(x^{k+1}) < f(x^k) and thus x^{k+1} ∈ L_k.
Moreover, choosing λ = 1 in (1.141), we obtain the left-hand side of the functional descent property (1.139). We note that we get the right-hand side of (1.139) if in (1.140) we estimate by the other direction of the Cauchy-Schwarz inequality.
Finally, in order to prove the contraction properties (1.137), (1.138) and linear convergence, we estimate the local energy norms as follows:
\[ \|F'(x^{k+1})^{1/2}\Delta x^{k+1}\| = \|F'(x^{k+1})^{-1/2}F'(x^{k+1})\Delta x^{k+1}\| = \|F'(x^{k+1})^{-1/2}F(x^{k+1})\| = \|F'(x^{k+1})^{-1/2}\big(F(x^{k+1}) \pm F(x^k)\big)\| \]

\[ \le \|F'(x^{k+1})^{-1/2}\big(F(x^{k+1})-F(x^k)\big)\| + \|F'(x^{k+1})^{-1/2}F(x^k)\|. \]
Observing F(x^k) = −F'(x^k)δx^k + r^k and using the affine conjugate Lipschitz condition, we obtain
\[ \|F'(x^{k+1})^{1/2}\Delta x^{k+1}\| = \Big\|F'(x^{k+1})^{-1/2}\Big(\int_0^1\big(F'(x^k+t\delta x^k)-F'(x^k)\big)\delta x^k\,dt + r^k\Big)\Big\| \le \tfrac12\,\omega\,\|F'(x^k)^{1/2}\delta x^k\|^2 + \|F'(x^{k+1})^{-1/2}r^k\|. \tag{1.143} \]
Setting z := δx^k − Δx^k, for the second term on the right-hand side of the previous inequality we get the implicit estimate
\[ \|F'(x^{k+1})^{-1/2}r^k\|^2 \le \|F'(x^k)^{1/2}z\|^2 + h_k^\delta\,\|F'(x^k)^{1/2}z\|\,\|F'(x^{k+1})^{-1/2}r^k\|, \]
which gives the explicit bound
\[ \|F'(x^{k+1})^{-1/2}r^k\| \le \tfrac12\Big(h_k^\delta+\sqrt{4+(h_k^\delta)^2}\Big)\,\|F'(x^k)^{1/2}z\|. \tag{1.144} \]
Using (1.144) in (1.143) results in
\[ \omega\,\|F'(x^{k+1})^{1/2}\Delta x^{k+1}\| \le \tfrac12\,(h_k^\delta)^2 + \tfrac12\Big(h_k^\delta+\sqrt{4+(h_k^\delta)^2}\Big)\underbrace{\omega\,\|F'(x^k)^{1/2}z\|}_{=\,\delta_k\,h_k^\delta}. \]
Taking (1.136) into account, we thus get the contraction factor estimate
\[ \Theta_k := \frac{\omega\,\|F'(x^{k+1})^{1/2}\Delta x^{k+1}\|}{\omega\,\|F'(x^k)^{1/2}\Delta x^k\|} = \frac{\omega\,\|F'(x^{k+1})^{1/2}\Delta x^{k+1}\|}{\sqrt{1+\delta_k^2}\;h_k^\delta} \le \vartheta(h_k^\delta,\delta_k) \le \Theta, \tag{1.145} \]

which proves (1.137) and linear convergence. For the proof of (1.138) we observe
\[ \|F'(x^l)^{1/2}\Delta x^l\|^2 = (1+\delta_l^2)\,\|F'(x^l)^{1/2}\delta x^l\|^2, \quad l = k,\,k+1, \]
as well as δ_{k+1} ≥ δ_k, and obtain
\[ \frac{\|F'(x^{k+1})^{1/2}\delta x^{k+1}\|}{\|F'(x^k)^{1/2}\delta x^k\|} = \sqrt{\frac{1+\delta_k^2}{1+\delta_{k+1}^2}}\;\Theta_k \le \Theta_k \le \Theta. \tag{1.146} \]
By standard arguments we further show that the sequence {x^k}, k ∈ ℕ_0, of inexact Newton PCG iterates is a Cauchy sequence in L_0 and there exists an x^* ∈ L_0 such that x^k → x^* (k → ∞) with F(x^*) = 0.

Theorem 3.6 (Affine conjugate convergence theorem for the inexact Newton PCG method. Part II: Quadratic convergence)
Under the same assumptions on f and F as in Theorem 3.5, suppose that the initial guess x^0 ∈ D satisfies
\[ h_0^\delta < \frac{2}{1+\rho} \tag{1.147} \]
for some appropriate ρ > 0 and control the inner iterations such that
\[ \delta_k \le \frac{\rho\,h_k^\delta}{h_k^\delta+\sqrt{4+(h_k^\delta)^2}}. \tag{1.148} \]
Then, there holds:
(i) The inexact Newton PCG iterates x^k, k ∈ ℕ_0, stay in L_0 and converge quadratically to some x^* ∈ L_0 with F(x^*) = 0.
(ii) The exact Newton increments and the inexact Newton increments decrease quadratically according to
\[ \|F'(x^{k+1})^{1/2}\Delta x^{k+1}\| \le \frac{1+\rho}{2}\,\omega\,\|F'(x^k)^{1/2}\Delta x^k\|^2, \tag{1.149} \]
\[ \|F'(x^{k+1})^{1/2}\delta x^{k+1}\| \le \frac{1+\rho}{2}\,\omega\,\|F'(x^k)^{1/2}\delta x^k\|^2. \tag{1.150} \]

Proof. Using (1.148) in (1.145) yields
\[ \frac{\|F'(x^{k+1})^{1/2}\Delta x^{k+1}\|}{\|F'(x^k)^{1/2}\Delta x^k\|} \le \frac{\tfrac12 h_k^\delta + \tfrac12\,\delta_k\big(h_k^\delta+\sqrt{4+(h_k^\delta)^2}\big)}{\sqrt{1+\delta_k^2}} \le \tfrac12\,(1+\rho)\,h_k^\delta, \]
which proves (1.149) in view of h_k^δ ≤ h_k ≤ h_0 < 2Θ. The proof of (1.150) follows along the same lines by using (1.148) in (1.146).

Algorithmic aspects of the affine conjugate inexact Newton PCG method

(i) Convergence monitor
Let us assume that the quantity Θ < 1 in both the linear convergence mode and the quadratic convergence mode has been specified, and let us further assume that we use the start iterate δx^k_0 = 0 in the inner PCG iteration. Denoting by \bar δ_k an easily computable estimate of the relative energy norm iteration error δ_k, we accept a new iterate x^{k+1} if either the condition
\[ f(x^{k+1}) \le f(x^k) - \tfrac{1}{10}\,\varepsilon_k = f(x^k) - \tfrac{1}{10}\,(1+\bar\delta_k^2)\,\varepsilon_k^\delta \tag{1.151} \]
or the monotonicity test
\[ \bar\Theta_k := \Big(\frac{\varepsilon_{k+1}}{\varepsilon_k}\Big)^{1/2} = \Big(\frac{(1+\bar\delta_{k+1}^2)\,\varepsilon_{k+1}^\delta}{(1+\bar\delta_k^2)\,\varepsilon_k^\delta}\Big)^{1/2} \le \Theta < 1 \tag{1.152} \]
is satisfied. We consider the outer iteration as divergent if neither (1.151) nor (1.152) holds true.
(ii) Termination criterion
With respect to a user specified accuracy ETOL, the inexact Newton PCG iteration will be terminated if either
\[ \varepsilon_k = (1+\bar\delta_k^2)\,\varepsilon_k^\delta \le \mathrm{ETOL}^2 \tag{1.153} \]
or
\[ f(x^k) - f(x^{k+1}) \le \mathrm{ETOL}^2. \tag{1.154} \]
(iii) Balancing outer and inner iterations
For k = 0, we choose δ_0 = \bar δ_0 = 1/4. As in case of the inexact Newton CGNE iteration, for k ≥ 1 we begin with the

quadratic convergence mode and switch to the linear convergence mode as soon as the approximate contraction factor \bar Θ_k is below some prespecified threshold value \bar Θ ≤ 1/2.
(iii.1) Quadratic convergence mode
A computationally realizable termination criterion for the inner PCG iteration in the quadratic convergence mode is given by
\[ \bar\delta_k \le \frac{\rho\,[h_k^\delta]}{[h_k^\delta]+\sqrt{4+[h_k^\delta]^2}}, \tag{1.155} \]
where [h_k^δ] is an appropriate a priori estimate of the inexact Kantorovich quantity h_k^δ. In view of (1.139), we have the a posteriori estimates
\[ [h_k^\delta]_2 := \frac{10}{\varepsilon_k^\delta}\Big(f(x^{k+1})-f(x^k)+\tfrac12\,\varepsilon_k^\delta\Big) \tag{1.156} \]
and
\[ [h_k]_2 := \sqrt{1+\bar\delta_k^2}\;[h_k^\delta]_2. \tag{1.157} \]
We note that (1.157) yields the a priori estimate
\[ [h_k] := \bar\Theta_{k-1}\,[h_{k-1}]_2. \tag{1.158} \]
Using (1.158) in (1.157), for the inexact Kantorovich quantity we obtain the following a priori estimate:
\[ [h_k^\delta] := \frac{[h_k]}{\sqrt{1+\bar\delta_k^2}}. \tag{1.159} \]
Inserting (1.159) into (1.155), we obtain a simple nonlinear equation in \bar δ_k.
Remark 3.5 (Computational work in the quadratic convergence mode)
Since δ_k → 0 (k → ∞) is enforced, it follows that the more the iterates x^k approach the solution x^*, the more computational work is required for the inner iterations to guarantee quadratic convergence of the outer iteration.
(iii.2) Linear convergence mode
We switch to the linear convergence mode if
\[ \bar\Theta_k < \bar\Theta \tag{1.160} \]

is satisfied. The computationally realizable termination criterion for the inner iteration in the linear convergence mode is
\[ [\vartheta(h_k^\delta,\delta_k)] := \vartheta([h_k^\delta],\bar\delta_k) \le \Theta. \tag{1.161} \]
Since asymptotically there holds
\[ \bar\delta_k \ \longrightarrow\ \frac{\Theta}{\sqrt{1-\Theta^2}} \quad (k\to\infty), \]
we observe:
Remark 3.6 (Computational work in the linear convergence mode)
The more the iterates x^k approach the solution x^*, the less computational work is required for the inner iterations to guarantee linear convergence of the outer iteration.

4. Quasi-Newton Methods

4.1 Introduction

Given F : D ⊂ ℝ^n → ℝ^n as well as x^k, x^{k+1} ∈ D, x^k ≠ x^{k+1}, the idea is to approximate F locally around x^{k+1} by an affine function
\[ S_{k+1}(x) := F(x^{k+1}) + J_{k+1}\,(x-x^{k+1}), \quad J_{k+1} \in \mathbb{R}^{n\times n}, \tag{1.162} \]
such that
\[ S_{k+1}(x^k) = F(x^k). \tag{1.163} \]
The requirement (1.163) gives rise to the so-called secant condition
\[ J_{k+1}\,\underbrace{(x^{k+1}-x^k)}_{=:\,\delta x^k} = \underbrace{F(x^{k+1})-F(x^k)}_{=:\,y^k}. \tag{1.164} \]
The matrix J_{k+1} is not uniquely determined by (1.164), since
\[ \dim\mathcal S_{k+1} = (n-1)\,n, \tag{1.165} \]
where
\[ \mathcal S_{k+1} := \{J \in \mathbb{R}^{n\times n} \mid J\,\delta x^k = y^k\}. \tag{1.166} \]
There are different criteria to select an appropriate J ∈ S_{k+1}.

The Good Broyden rank 1 update

Let us consider the change in the affine model as given by
\[ S_{k+1}(x) - S_k(x) = (J_{k+1}-J_k)\,(x-x^k). \tag{1.167} \]
An appropriate idea is to choose J_{k+1} ∈ S_{k+1} such that there is a least change in the affine model in the sense
\[ \|J_{k+1}-J_k\|_F = \min_{J\in\mathcal S_{k+1}}\|J-J_k\|_F, \tag{1.168} \]
where ‖·‖_F stands for the Frobenius norm (observe J = (J_{ik})_{i,k=1}^n)
\[ \|J\|_F := \Big(\sum_{i,k=1}^n J_{ik}^2\Big)^{1/2}. \tag{1.169} \]

The solution of (1.168) can be heuristically motivated as follows: choose t^k ⊥ δx^k such that
\[ x - x^k = \alpha\,\delta x^k + t^k. \]
Then, (1.167) reads
\[ S_{k+1}(x) - S_k(x) = \alpha\,(J_{k+1}-J_k)\,\delta x^k + (J_{k+1}-J_k)\,t^k = \alpha\,(y^k-J_k\delta x^k) + (J_{k+1}-J_k)\,t^k. \tag{1.170} \]
Now, choose J_{k+1} ∈ S_{k+1} such that
\[ (J_{k+1}-J_k)\,t^k = 0. \]
It follows that
\[ \operatorname{rank}(J_{k+1}-J_k) = 1, \qquad J_{k+1}-J_k = v^k\,(\delta x^k)^T. \tag{1.171} \]
Inserting (1.171) into (1.170) yields
\[ \alpha\,v^k\,(\delta x^k)^T\delta x^k = \alpha\,(y^k-J_k\delta x^k), \]
which results in
\[ v^k = \frac{y^k-J_k\delta x^k}{(\delta x^k)^T\delta x^k}. \]
Altogether, this gives us Broyden's rank 1 update (Good Broyden):
\[ J_{k+1} = J_k + \frac{\big[F(x^{k+1})-F(x^k)-J_k\delta x^k\big]\,(\delta x^k)^T}{(\delta x^k)^T\delta x^k}. \tag{1.172} \]
For the solution of nonlinear systems, we are more interested in updates of the inverse of J_k. Such an update can be provided by the Sherman-Morrison-Woodbury formula
\[ \big(A+uv^T\big)^{-1} = A^{-1} - \frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}. \tag{1.173} \]
Setting
\[ A := J_k, \qquad u := F(x^{k+1})-F(x^k)-J_k\delta x^k, \qquad v := \frac{\delta x^k}{(\delta x^k)^T\delta x^k}, \]
we obtain
\[ J_{k+1}^{-1} = J_k^{-1} + \frac{\big[\delta x^k - J_k^{-1}\big(F(x^{k+1})-F(x^k)\big)\big]\,(\delta x^k)^TJ_k^{-1}}{(\delta x^k)^TJ_k^{-1}\big(F(x^{k+1})-F(x^k)\big)}. \tag{1.174} \]
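A minimal quasi-Newton iteration using the Good Broyden update, with the inverse approximate Jacobian propagated directly via the Sherman-Morrison formula (1.173)-(1.174), could look as follows; the test problem, the initial Jacobian approximation, and the stopping tolerance are illustrative choices.

```python
import numpy as np

def broyden_good(F, x0, J0, tol=1e-10, kmax=50):
    """Quasi-Newton iteration with Broyden's 'good' rank-1 update,
    propagating the inverse approximate Jacobian via (1.174)."""
    x = np.array(x0, dtype=float)
    Jinv = np.linalg.inv(np.array(J0, dtype=float))
    Fx = F(x)
    for k in range(kmax):
        dx = -Jinv @ Fx                      # quasi-Newton increment
        x = x + dx
        Fx_new = F(x)
        if np.linalg.norm(Fx_new) <= tol:
            break
        yk = Fx_new - Fx                     # y^k = F(x^{k+1}) - F(x^k)
        Jinv_y = Jinv @ yk
        Jinv = Jinv + np.outer(dx - Jinv_y, dx @ Jinv) / float(dx @ Jinv_y)
        Fx = Fx_new
    return x, k

# Illustrative test problem: a line intersecting an ellipse
F = lambda x: np.array([x[0] + 2.0 * x[1] - 2.0, x[0]**2 + 4.0 * x[1]**2 - 4.0])
J0 = np.array([[1.0, 2.0], [1.0, 6.4]])      # Jacobian at the starting point
x_star, k = broyden_good(F, x0=[0.5, 0.8], J0=J0)
print(x_star, np.linalg.norm(F(x_star)), k)
```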

The Bad Broyden rank 1 update

Instead of (1.168), an alternative is to choose J_{k+1} ∈ S_{k+1} such that there is a least change in the solution of the affine model, i.e.,
\[ \|J_{k+1}^{-1}-J_k^{-1}\|_F = \min_{J\in\mathcal S_{k+1}}\|J^{-1}-J_k^{-1}\|_F. \tag{1.175} \]
Similar considerations as before lead us to Broyden's alternative rank 1 update (Bad Broyden):
\[ J_{k+1}^{-1} = J_k^{-1} + \frac{\big[\delta x^k - J_k^{-1}\big(F(x^{k+1})-F(x^k)\big)\big]\,\big(F(x^{k+1})-F(x^k)\big)^T}{\big(F(x^{k+1})-F(x^k)\big)^T\big(F(x^{k+1})-F(x^k)\big)}. \tag{1.176} \]

4.2 Affine covariant Quasi-Newton method

Affine covariant Quasi-Newton convergence theory

Affine covariant Quasi-Newton methods require the secant condition (1.164) to be stated by means of affine covariant terms in the domain of definition of the nonlinear mapping F. Observing that we compute the Quasi-Newton increment δx^k as the solution of
\[ J_k\,\delta x^k = -F(x^k), \tag{1.177} \]
we can rewrite (1.164) according to
\[ (J_k-J_{k+1})\,\delta x^k = -F(x^{k+1}). \]
Multiplication by J_k^{-1} yields the affine covariant secant condition
\[ \overline{\delta x}^{\,k+1} := \big(\underbrace{I-J_k^{-1}J}_{=:\,E_k(J)}\big)\,\delta x^k = -J_k^{-1}F(x^{k+1}), \quad J \in \mathcal S_{k+1}. \tag{1.178} \]
We note that any rank 1 update of the form
\[ J_{k+1} = J_k\Big(I - \frac{\overline{\delta x}^{\,k+1}\,v^T}{v^T\delta x^k}\Big), \quad v \in \mathbb{R}^n\setminus\{0\}, \tag{1.179} \]
satisfies the affine covariant secant condition (1.178). In particular, for v = δx^k we recover the Good Broyden.

Theorem 4.1 (Properties of the affine covariant Quasi-Newton method)
For Broyden's affine covariant rank 1 update (Good Broyden)
\[ J_{k+1} = J_k\Big(I - \frac{\overline{\delta x}^{\,k+1}\,(\delta x^k)^T}{\|\delta x^k\|^2}\Big) \tag{1.180} \]
assume that the local contraction condition
\[ \Theta_k := \frac{\|\overline{\delta x}^{\,k+1}\|}{\|\delta x^k\|} < \frac12 \tag{1.181} \]
is satisfied. Then, there holds:
(i) The update matrix J_{k+1} is a least change update in the sense that
\[ \|E_k(J_{k+1})\| \le \|E_k(J)\|, \quad J \in \mathcal S_{k+1}, \tag{1.182} \]
\[ \|E_k(J_{k+1})\| \le \Theta_k. \tag{1.183} \]
(ii) If J_k is regular, then J_{k+1} is regular as well with the inverse given by
\[ J_{k+1}^{-1} = \Big(I + \frac{\overline{\delta x}^{\,k+1}\,(\delta x^k)^T}{(1-\alpha_{k+1})\,\|\delta x^k\|^2}\Big)\,J_k^{-1}, \tag{1.184} \]
where
\[ \alpha_{k+1} := \frac{(\delta x^k)^T\,\overline{\delta x}^{\,k+1}}{\|\delta x^k\|^2} < \frac12. \]
(iii) The Quasi-Newton increment δx^{k+1} is given by
\[ \delta x^{k+1} = -J_{k+1}^{-1}F(x^{k+1}) = \frac{\overline{\delta x}^{\,k+1}}{1-\alpha_{k+1}}. \tag{1.185} \]
(iv) The Quasi-Newton increments decrease according to
\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} \le \frac{\Theta_k}{1-\alpha_{k+1}} < 1. \tag{1.186} \]
Proof. In view of (1.178) we have
\[ \|E_k(J_{k+1})\| = \Big\|\frac{\overline{\delta x}^{\,k+1}\,(\delta x^k)^T}{\|\delta x^k\|^2}\Big\| = \Big\|E_k(J)\,\frac{\delta x^k\,(\delta x^k)^T}{\|\delta x^k\|^2}\Big\| \le \|E_k(J)\|, \quad J \in \mathcal S_{k+1}, \]

which proves (1.182). Moreover, (1.183) follows readily from
\[ \|E_k(J_{k+1})\| = \Big\|\frac{\overline{\delta x}^{\,k+1}\,(\delta x^k)^T}{\|\delta x^k\|^2}\Big\| \le \frac{\|\overline{\delta x}^{\,k+1}\|}{\|\delta x^k\|} = \Theta_k. \]
The same argument shows
\[ \alpha_{k+1} \le \Theta_k < \tfrac12, \]
and hence (1.186) follows from
\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} = \frac{\Theta_k}{1-\alpha_{k+1}} \le \frac{\Theta_k}{1-\Theta_k} < 1. \]
Finally, the proofs of (ii) and (iii) are direct consequences of the Sherman-Morrison-Woodbury formula (1.173).

Theorem 4.2 (Convergence of the affine covariant Quasi-Newton method)
Suppose that F : D ⊂ ℝ^n → ℝ^n, D ⊂ ℝ^n convex, is continuously differentiable on D. Let x^* ∈ D be the unique solution of F(x) = 0 in D with invertible Jacobian F'(x^*). Assume that the following affine covariant Lipschitz condition is satisfied:
\[ \|F'(x^*)^{-1}\big(F'(x)-F'(x^*)\big)v\| \le \omega\,\|x-x^*\|\,\|v\|, \tag{1.187} \]
where x, x + v ∈ D, v ∈ ℝ^n. For some 0 < Θ < 1 assume further that:
(a) the initial approximate Jacobian J_0 satisfies
\[ \delta_0 := \|F'(x^*)^{-1}\big(J_0-F'(x^*)\big)\| \le \frac{1-\Theta}{2}\,\frac{\Theta}{1+\Theta}, \tag{1.188} \]
(b) the initial guess x^0 ∈ D satisfies
\[ t_0 := \omega\,\|x^0-x^*\| \le \frac{\Theta}{1+\Theta} - \delta_0. \tag{1.189} \]
Then, there holds:
(i) The Quasi-Newton iterates x^k, k ∈ ℕ_0, converge to x^* according to
\[ \|x^{k+1}-x^*\| < \Theta\,\|x^k-x^*\|. \tag{1.190} \]


MATH 4211/6211 Optimization Quasi-Newton Method

MATH 4211/6211 Optimization Quasi-Newton Method MATH 4211/6211 Optimization Quasi-Newton Method Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 Quasi-Newton Method Motivation:

More information

Programming, numerics and optimization

Programming, numerics and optimization Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428

More information

Termination criteria for inexact fixed point methods

Termination criteria for inexact fixed point methods Termination criteria for inexact fixed point methods Philipp Birken 1 October 1, 2013 1 Institute of Mathematics, University of Kassel, Heinrich-Plett-Str. 40, D-34132 Kassel, Germany Department of Mathematics/Computer

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

5 Quasi-Newton Methods

5 Quasi-Newton Methods Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min

More information

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University Lecture Note 7: Iterative methods for solving linear systems Xiaoqun Zhang Shanghai Jiao Tong University Last updated: December 24, 2014 1.1 Review on linear algebra Norms of vectors and matrices vector

More information

X. Linearization and Newton s Method

X. Linearization and Newton s Method 163 X. Linearization and Newton s Method ** linearization ** X, Y nls s, f : G X Y. Given y Y, find z G s.t. fz = y. Since there is no assumption about f being linear, we might as well assume that y =.

More information

Basic Concepts of Adaptive Finite Element Methods for Elliptic Boundary Value Problems

Basic Concepts of Adaptive Finite Element Methods for Elliptic Boundary Value Problems Basic Concepts of Adaptive Finite lement Methods for lliptic Boundary Value Problems Ronald H.W. Hoppe 1,2 1 Department of Mathematics, University of Houston 2 Institute of Mathematics, University of Augsburg

More information

The Steepest Descent Algorithm for Unconstrained Optimization

The Steepest Descent Algorithm for Unconstrained Optimization The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem

More information

Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations

Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations Boris T. Polyak Andrey A. Tremba V.A. Trapeznikov Institute of Control Sciences RAS, Moscow, Russia Profsoyuznaya, 65, 117997

More information

Matrix Secant Methods

Matrix Secant Methods Equation Solving g(x) = 0 Newton-Lie Iterations: x +1 := x J g(x ), where J g (x ). Newton-Lie Iterations: x +1 := x J g(x ), where J g (x ). 3700 years ago the Babylonians used the secant method in 1D:

More information

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books.

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books. Applied Analysis APPM 44: Final exam 1:3pm 4:pm, Dec. 14, 29. Closed books. Problem 1: 2p Set I = [, 1]. Prove that there is a continuous function u on I such that 1 ux 1 x sin ut 2 dt = cosx, x I. Define

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

Chapter 3. Differentiable Mappings. 1. Differentiable Mappings

Chapter 3. Differentiable Mappings. 1. Differentiable Mappings Chapter 3 Differentiable Mappings 1 Differentiable Mappings Let V and W be two linear spaces over IR A mapping L from V to W is called a linear mapping if L(u + v) = Lu + Lv for all u, v V and L(λv) =

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

Chapter 1 Foundations of Elliptic Boundary Value Problems 1.1 Euler equations of variational problems

Chapter 1 Foundations of Elliptic Boundary Value Problems 1.1 Euler equations of variational problems Chapter 1 Foundations of Elliptic Boundary Value Problems 1.1 Euler equations of variational problems Elliptic boundary value problems often occur as the Euler equations of variational problems the latter

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

Quasi-Newton Methods

Quasi-Newton Methods Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications

More information

Solutions and Notes to Selected Problems In: Numerical Optimzation by Jorge Nocedal and Stephen J. Wright.

Solutions and Notes to Selected Problems In: Numerical Optimzation by Jorge Nocedal and Stephen J. Wright. Solutions and Notes to Selected Problems In: Numerical Optimzation by Jorge Nocedal and Stephen J. Wright. John L. Weatherwax July 7, 2010 wax@alum.mit.edu 1 Chapter 5 (Conjugate Gradient Methods) Notes

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 11-14 On the convergence of inexact Newton methods R. Idema, D.J.P. Lahaye, and C. Vuik ISSN 1389-6520 Reports of the Department of Applied Mathematical Analysis Delft

More information

Quasi-Newton Methods

Quasi-Newton Methods Quasi-Newton Methods Werner C. Rheinboldt These are excerpts of material relating to the boos [OR00 and [Rhe98 and of write-ups prepared for courses held at the University of Pittsburgh. Some further references

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Some definitions. Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization. A-inner product. Important facts

Some definitions. Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization. A-inner product. Important facts Some definitions Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization M. M. Sussman sussmanm@math.pitt.edu Office Hours: MW 1:45PM-2:45PM, Thack 622 A matrix A is SPD (Symmetric

More information

FEM and sparse linear system solving

FEM and sparse linear system solving FEM & sparse linear system solving, Lecture 9, Nov 19, 2017 1/36 Lecture 9, Nov 17, 2017: Krylov space methods http://people.inf.ethz.ch/arbenz/fem17 Peter Arbenz Computer Science Department, ETH Zürich

More information

An analysis for the DIIS acceleration method used in quantum chemistry calculations

An analysis for the DIIS acceleration method used in quantum chemistry calculations An analysis for the DIIS acceleration method used in quantum chemistry calculations Thorsten Rohwedder and Reinhold Schneider Abstract. This work features an analysis for the acceleration technique DIIS

More information

EECS 275 Matrix Computation

EECS 275 Matrix Computation EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 20 1 / 20 Overview

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

1. Search Directions In this chapter we again focus on the unconstrained optimization problem. lim sup ν

1. Search Directions In this chapter we again focus on the unconstrained optimization problem. lim sup ν 1 Search Directions In this chapter we again focus on the unconstrained optimization problem P min f(x), x R n where f : R n R is assumed to be twice continuously differentiable, and consider the selection

More information

Exercise Solutions to Functional Analysis

Exercise Solutions to Functional Analysis Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n

More information

Static unconstrained optimization

Static unconstrained optimization Static unconstrained optimization 2 In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e. min f(x) x X ad (2.) with X ad R

More information

HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION. Darin Griffin Mohr. An Abstract

HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION. Darin Griffin Mohr. An Abstract HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION by Darin Griffin Mohr An Abstract Of a thesis submitted in partial fulfillment of the requirements for the Doctor of

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method The minimization problem We are given a symmetric positive definite matrix R n n and a right hand side vector b R n We want to solve the linear system Find u R n such that

More information

Iterative Methods for Linear Systems of Equations

Iterative Methods for Linear Systems of Equations Iterative Methods for Linear Systems of Equations Projection methods (3) ITMAN PhD-course DTU 20-10-08 till 24-10-08 Martin van Gijzen 1 Delft University of Technology Overview day 4 Bi-Lanczos method

More information

Chapter 7. Iterative methods for large sparse linear systems. 7.1 Sparse matrix algebra. Large sparse matrices

Chapter 7. Iterative methods for large sparse linear systems. 7.1 Sparse matrix algebra. Large sparse matrices Chapter 7 Iterative methods for large sparse linear systems In this chapter we revisit the problem of solving linear systems of equations, but now in the context of large sparse systems. The price to pay

More information

Preconditioned inverse iteration and shift-invert Arnoldi method

Preconditioned inverse iteration and shift-invert Arnoldi method Preconditioned inverse iteration and shift-invert Arnoldi method Melina Freitag Department of Mathematical Sciences University of Bath CSC Seminar Max-Planck-Institute for Dynamics of Complex Technical

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control

Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control Chun-Hua Guo Dedicated to Peter Lancaster on the occasion of his 70th birthday We consider iterative methods for finding the

More information

Inexact Newton Methods Applied to Under Determined Systems. Joseph P. Simonis. A Dissertation. Submitted to the Faculty

Inexact Newton Methods Applied to Under Determined Systems. Joseph P. Simonis. A Dissertation. Submitted to the Faculty Inexact Newton Methods Applied to Under Determined Systems by Joseph P. Simonis A Dissertation Submitted to the Faculty of WORCESTER POLYTECHNIC INSTITUTE in Partial Fulfillment of the Requirements for

More information

Simple Iteration, cont d

Simple Iteration, cont d Jim Lambers MAT 772 Fall Semester 2010-11 Lecture 2 Notes These notes correspond to Section 1.2 in the text. Simple Iteration, cont d In general, nonlinear equations cannot be solved in a finite sequence

More information

Lecture 3: Inexact inverse iteration with preconditioning

Lecture 3: Inexact inverse iteration with preconditioning Lecture 3: Department of Mathematical Sciences CLAPDE, Durham, July 2008 Joint work with M. Freitag (Bath), and M. Robbé & M. Sadkane (Brest) 1 Introduction 2 Preconditioned GMRES for Inverse Power Method

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

Lecture 11: CMSC 878R/AMSC698R. Iterative Methods An introduction. Outline. Inverse, LU decomposition, Cholesky, SVD, etc.

Lecture 11: CMSC 878R/AMSC698R. Iterative Methods An introduction. Outline. Inverse, LU decomposition, Cholesky, SVD, etc. Lecture 11: CMSC 878R/AMSC698R Iterative Methods An introduction Outline Direct Solution of Linear Systems Inverse, LU decomposition, Cholesky, SVD, etc. Iterative methods for linear systems Why? Matrix

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 3: Iterative Methods PD

More information

A derivative-free nonmonotone line search and its application to the spectral residual method

A derivative-free nonmonotone line search and its application to the spectral residual method IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral

More information

Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation

Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation E. Bergou S. Gratton L. N. Vicente June 26, 204 Abstract The Levenberg-Marquardt

More information

Numerical Methods for Differential Equations Mathematical and Computational Tools

Numerical Methods for Differential Equations Mathematical and Computational Tools Numerical Methods for Differential Equations Mathematical and Computational Tools Gustaf Söderlind Numerical Analysis, Lund University Contents V4.16 Part 1. Vector norms, matrix norms and logarithmic

More information

A nonlinear equation is any equation of the form. f(x) = 0. A nonlinear equation can have any number of solutions (finite, countable, uncountable)

A nonlinear equation is any equation of the form. f(x) = 0. A nonlinear equation can have any number of solutions (finite, countable, uncountable) Nonlinear equations Definition A nonlinear equation is any equation of the form where f is a nonlinear function. Nonlinear equations x 2 + x + 1 = 0 (f : R R) f(x) = 0 (x cos y, 2y sin x) = (0, 0) (f :

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

1 Conjugate gradients

1 Conjugate gradients Notes for 2016-11-18 1 Conjugate gradients We now turn to the method of conjugate gradients (CG), perhaps the best known of the Krylov subspace solvers. The CG iteration can be characterized as the iteration

More information

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization

More information

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS RALPH HOWARD DEPARTMENT OF MATHEMATICS UNIVERSITY OF SOUTH CAROLINA COLUMBIA, S.C. 29208, USA HOWARD@MATH.SC.EDU Abstract. This is an edited version of a

More information

An Iteratively Regularized Projection Method with Quadratic Convergence for Nonlinear Ill-posed Problems

An Iteratively Regularized Projection Method with Quadratic Convergence for Nonlinear Ill-posed Problems Int. Journal of Math. Analysis, Vol. 4, 1, no. 45, 11-8 An Iteratively Regularized Projection Method with Quadratic Convergence for Nonlinear Ill-posed Problems Santhosh George Department of Mathematical

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,

More information

STOP, a i+ 1 is the desired root. )f(a i) > 0. Else If f(a i+ 1. Set a i+1 = a i+ 1 and b i+1 = b Else Set a i+1 = a i and b i+1 = a i+ 1

STOP, a i+ 1 is the desired root. )f(a i) > 0. Else If f(a i+ 1. Set a i+1 = a i+ 1 and b i+1 = b Else Set a i+1 = a i and b i+1 = a i+ 1 53 17. Lecture 17 Nonlinear Equations Essentially, the only way that one can solve nonlinear equations is by iteration. The quadratic formula enables one to compute the roots of p(x) = 0 when p P. Formulas

More information

Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization

Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization Denis Ridzal Department of Computational and Applied Mathematics Rice University, Houston, Texas dridzal@caam.rice.edu

More information

Numerical Methods I Solving Nonlinear Equations

Numerical Methods I Solving Nonlinear Equations Numerical Methods I Solving Nonlinear Equations Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 16th, 2014 A. Donev (Courant Institute)

More information

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS)

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS) Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno (BFGS) Limited memory BFGS (L-BFGS) February 6, 2014 Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb

More information

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give

More information

Iterative methods for Linear System

Iterative methods for Linear System Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline Basics: Matrices and their properties Eigenvalues, Condition Number Iterative Methods Direct and

More information

Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation

Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation E. Bergou S. Gratton L. N. Vicente May 24, 206 Abstract The Levenberg-Marquardt

More information

Chapter 8 Gradient Methods

Chapter 8 Gradient Methods Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point

More information

Self-Concordant Barrier Functions for Convex Optimization

Self-Concordant Barrier Functions for Convex Optimization Appendix F Self-Concordant Barrier Functions for Convex Optimization F.1 Introduction In this Appendix we present a framework for developing polynomial-time algorithms for the solution of convex optimization

More information

A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function

A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function Zhongyi Liu, Wenyu Sun Abstract This paper proposes an infeasible interior-point algorithm with

More information

Mathematics Department Stanford University Math 61CM/DM Inner products

Mathematics Department Stanford University Math 61CM/DM Inner products Mathematics Department Stanford University Math 61CM/DM Inner products Recall the definition of an inner product space; see Appendix A.8 of the textbook. Definition 1 An inner product space V is a vector

More information