Numerical Methods for Large-Scale Nonlinear Systems
Numerical Methods for Large-Scale Nonlinear Systems
Handouts by Ronald H.W. Hoppe, following the monograph:
P. Deuflhard, Newton Methods for Nonlinear Problems. Springer, Berlin-Heidelberg-New York, 2004.
1. Classical Newton Convergence Theorems

1.1 Classical Newton–Kantorovich Theorem

Theorem 1.1 (Classical Newton–Kantorovich Theorem)
Let X and Y be Banach spaces, D ⊂ X a convex subset, and suppose that F : D ⊂ X → Y is continuously Fréchet differentiable on D with an invertible Fréchet derivative F′(x⁰) for some initial guess x⁰ ∈ D. Assume further that the following conditions hold true:

  ‖F′(x⁰)⁻¹ F(x⁰)‖ ≤ α,                                          (1.1)
  ‖F′(y) − F′(x)‖ ≤ γ ‖y − x‖,  x, y ∈ D,                         (1.2)
  h₀ := α γ ‖F′(x⁰)⁻¹‖ < 1/2,                                     (1.3)
  B̄(x⁰, ρ₀) ⊂ D,  ρ₀ := (1 − √(1 − 2h₀)) / (γ ‖F′(x⁰)⁻¹‖).       (1.4)

Then, for the sequence {x^k}_{k∈ℕ₀} of Newton iterates

  F′(x^k) Δx^k = −F(x^k),  x^{k+1} = x^k + Δx^k,

there holds:
(i) F′(x) is invertible for all Newton iterates x = x^k, k ∈ ℕ₀,
(ii) the sequence {x^k}_{k∈ℕ} of Newton iterates is well defined with x^k ∈ B̄(x⁰, ρ₀), k ∈ ℕ₀, and x^k → x* ∈ B̄(x⁰, ρ₀) (k → ∞), where F(x*) = 0,
(iii) the convergence x^k → x* (k → ∞) is quadratic,
(iv) the solution x* of F(x) = 0 is unique in B̄(x⁰, ρ₀) ∪ (D ∩ B(x⁰, ρ̄₀)),  ρ̄₀ := (1 + √(1 − 2h₀)) / (γ ‖F′(x⁰)⁻¹‖).

Proof. We have ‖F′(x^k) − F′(x⁰)‖ ≤ γ ‖x^k − x⁰‖ ≤ t_k for some upper bound t_k, k ∈ ℕ. If we can prove x^k ∈ B(x⁰, ρ₀) and t̄_k := ‖F′(x⁰)⁻¹‖ t_k < 1, k ∈ ℕ, then by the
Banach perturbation lemma F′(x^k) is invertible with

  ‖F′(x^k)⁻¹‖ ≤ ‖F′(x⁰)⁻¹‖ / (1 − ‖F′(x⁰)⁻¹‖ ‖F′(x^k) − F′(x⁰)‖)   (1.5)
             ≤ ‖F′(x⁰)⁻¹‖ / (1 − t̄_k) =: β_k.

We prove x^k ∈ B(x⁰, ρ₀) and t̄_k < 1, k ∈ ℕ, by induction on k: For k = 1 we have

  ‖x¹ − x⁰‖ = ‖F′(x⁰)⁻¹ F(x⁰)‖ ≤ α = h₀ / (γ ‖F′(x⁰)⁻¹‖) < ρ₀,

since h₀ < 1 − √(1 − 2h₀), and

  t̄₁ := ‖F′(x⁰)⁻¹‖ t₁ = γ ‖F′(x⁰)⁻¹‖ ‖x¹ − x⁰‖
      = γ ‖F′(x⁰)⁻¹‖ ‖F′(x⁰)⁻¹ F(x⁰)‖ ≤ α γ ‖F′(x⁰)⁻¹‖ = h₀ < 1/2.

Assuming the assertion to be true for some k ∈ ℕ, for k + 1, using (1.2) and F(x^{k−1}) + F′(x^{k−1}) Δx^{k−1} = 0, we obtain

  ‖x^{k+1} − x^k‖ = ‖F′(x^k)⁻¹ F(x^k)‖                              (1.6)
    = ‖F′(x^k)⁻¹ (F(x^k) − F(x^{k−1}) − F′(x^{k−1}) Δx^{k−1})‖
    = ‖F′(x^k)⁻¹ ∫₀¹ (F′(x^{k−1} + s Δx^{k−1}) − F′(x^{k−1})) Δx^{k−1} ds‖
    ≤ (1/2) β_k γ ‖x^k − x^{k−1}‖²,

in view of the relationship

  ‖(F′(x^{k−1} + s Δx^{k−1}) − F′(x^{k−1})) Δx^{k−1}‖ ≤ s γ ‖Δx^{k−1}‖².

Setting

  h_k := γ ‖x^{k+1} − x^k‖,

we thus get the recursion

  h_k ≤ (1/2) β_k h_{k−1}²,  k ∈ ℕ.                                 (1.7)

In view of γ ‖x^{k+1} − x⁰‖ ≤ γ ‖x^{k+1} − x^k‖ + γ ‖x^k − x⁰‖ ≤ h_k + t_k,
we consider the recursion

  t_{k+1} = t_k + h_k.                                              (1.8)

Observing (1.5) and (1.7), we find

  t_{k+1} − t_k ≤ (1/2) ‖F′(x⁰)⁻¹‖ (t_k − t_{k−1})² / (1 − t̄_k).

Hence, multiplying both sides by ‖F′(x⁰)⁻¹‖, we end up with the following three-term recursion for t̄_k:

  t̄_{k+1} − t̄_k = (t̄_k − t̄_{k−1})² / (2 (1 − t̄_k)),  t̄₀ = 0,  t̄₁ = h₀.   (1.9)

The famous Ortega trick allows to reduce (1.9) to a two-term recursion which can be interpreted as a Newton method in ℝ¹: Multiplying both sides in (1.9) by (1 − t̄_k) results in

  (t̄_{k+1} − t̄_k)(1 − t̄_k) = (1/2) (t̄_k − t̄_{k−1})²,

from which we deduce

  t̄_{k+1} − t̄_{k+1} t̄_k + (1/2) t̄_k² = ψ(t̄_{k+1}, t̄_k)
    = t̄_k − t̄_k t̄_{k−1} + (1/2) t̄_{k−1}² = ψ(t̄_k, t̄_{k−1}),

where ψ(u, v) := u − u v + (1/2) v². It follows that

  ψ(t̄_{k+1}, t̄_k) = ψ(t̄₁, t̄₀) = h₀,

from which we deduce

  t̄_{k+1} = t̄_k − ϕ(t̄_k) / ϕ′(t̄_k),

where ϕ : ℝ → ℝ is given by ϕ(t̄) := h₀ − t̄ + (1/2) t̄². Obviously, ϕ has the zeroes

  t̄*₁ := 1 − √(1 − 2h₀),  t̄*₂ := 1 + √(1 − 2h₀).
Since ϕ is convex, the Newton iteration converges monotonically from t̄₀ = 0 to the smaller zero t̄*₁ = 1 − √(1 − 2h₀). It follows from the definition of t̄_k that x^k ∈ B(x⁰, ρ₀). Moreover, as a consequence of (1.6), we readily find that {x^k}_{k∈ℕ} is a Cauchy sequence in B̄(x⁰, ρ₀). Hence, there exists x* ∈ B̄(x⁰, ρ₀) such that x^k → x* (k → ∞) and

  Δx^k = −F′(x^k)⁻¹ F(x^k) → −F′(x*)⁻¹ F(x*) = 0,

whence F(x*) = 0. The quadratic convergence can be deduced from (1.6) as well. Finally, the uniqueness of x* in B̄(x⁰, ρ₀) ∪ (D ∩ B(x⁰, ρ̄₀)) follows readily from the properties of the function ϕ. □

1.2 Classical Newton–Mysovskikh Theorem

Theorem 1.2 (Classical Newton–Mysovskikh Theorem)
Let X and Y be Banach spaces, D ⊂ X a convex subset, and suppose that F : D ⊂ X → Y is continuously Fréchet differentiable on D with invertible Fréchet derivatives F′(x), x ∈ D, and let x⁰ ∈ D be some initial guess. Assume further that the following conditions hold true:

  ‖F′(x⁰)⁻¹ F(x⁰)‖ ≤ α,                                           (1.10)
  ‖F′(x)⁻¹‖ ≤ β,  x ∈ D,                                          (1.11)
  ‖F′(y) − F′(x)‖ ≤ ω ‖y − x‖,  x, y ∈ D,                         (1.12)
  h₀ := (1/2) β ω ‖F′(x⁰)⁻¹ F(x⁰)‖ ≤ (1/2) α β ω < 1,             (1.13)
  B̄(x⁰, ρ) ⊂ D,  ρ := α ∑_{j=0}^∞ h₀^{2ʲ−1} ≤ α / (1 − h₀).      (1.14)

Then, for the sequence {x^k}_{k∈ℕ₀} of Newton iterates

  F′(x^k) Δx^k = −F(x^k),  x^{k+1} = x^k + Δx^k,

there holds:
(i) x^k ∈ B̄(x⁰, ρ), k ∈ ℕ₀, and there exists x* ∈ B̄(x⁰, ρ) such that F(x*) = 0 and x^k → x* (k → ∞),
(ii) ‖x^{k+1} − x^k‖ ≤ (1/2) β ω ‖x^k − x^{k−1}‖²,  k ∈ ℕ,
(iii) ‖x^k − x*‖ ≤ ε_k ‖x^k − x^{k−1}‖², where

  ε_k := (1/2) β ω (1 + ∑_{j=1}^∞ (h₀^{2ᵏ})^{2ʲ−1}) ≤ (1/2) β ω / (1 − h₀^{2ᵏ}).
Proof. Observing F(x^{k−1}) + F′(x^{k−1}) Δx^{k−1} = 0, we obtain

  ‖Δx^k‖ = ‖F′(x^k)⁻¹ F(x^k)‖
        = ‖F′(x^k)⁻¹ (F(x^k) − F(x^{k−1}) − F′(x^{k−1}) Δx^{k−1})‖
        = ‖F′(x^k)⁻¹ ∫₀¹ (F′(x^{k−1} + s Δx^{k−1}) − F′(x^{k−1})) Δx^{k−1} ds‖
        ≤ (1/2) β ω ‖Δx^{k−1}‖²,

which gives the assertion (ii). We now prove that {x^k}_{k∈ℕ₀} is a Cauchy sequence in B̄(x⁰, ρ). By induction on k we show

  ‖x^{k+1} − x^k‖ ≤ (2/(βω)) h₀^{2ᵏ},  k ∈ ℕ₀.                     (1.15)

For k = 0, we have in view of (1.10) and (1.13)

  ‖x¹ − x⁰‖ = ‖Δx⁰‖ = (2/(βω)) h₀ ≤ α.

Assuming (1.15) to be true for some k ∈ ℕ, we get

  ‖x^{k+2} − x^{k+1}‖ = ‖Δx^{k+1}‖ ≤ (1/2) β ω ‖Δx^k‖²
    ≤ (1/2) β ω ((2/(βω)) h₀^{2ᵏ})² = (2/(βω)) h₀^{2ᵏ⁺¹}.

It follows readily from (1.15) that x^{k+1} ∈ B̄(x⁰, ρ):

  ‖x^{k+1} − x⁰‖ ≤ ‖x^{k+1} − x^k‖ + … + ‖x¹ − x⁰‖
    ≤ (2/(βω)) (h₀^{2ᵏ} + … + h₀) = (2/(βω)) h₀ ∑_{j=0}^k h₀^{2ʲ−1}
    ≤ α ∑_{j=0}^∞ h₀^{2ʲ−1} = ρ.

Similarly, it can be shown that ‖x^{m+k} − x^m‖ → 0 (m, k → ∞).
Since {x^k}_{k∈ℕ₀} is a Cauchy sequence in B̄(x⁰, ρ), there exists x* ∈ B̄(x⁰, ρ) such that x^k → x* (k → ∞). Hence,

  Δx^k = −F′(x^k)⁻¹ F(x^k) → −F′(x*)⁻¹ F(x*) = 0,

and thus F(x*) = 0, which proves (i). The assertion (iii) is shown as follows: Setting

  h_k := (1/2) β ω ‖Δx^k‖,

we obtain

  ‖x^k − x*‖ = lim_{m→∞} ‖x^k − x^m‖
    ≤ lim_{m→∞} [‖x^{k+1} − x^k‖ + … + ‖x^m − x^{m−1}‖]
    = (2/(βω)) lim_{m→∞} [h_k + … + h_{m−1}]
    = (2/(βω)) h_k lim_{m→∞} [1 + h_{k+1}/h_k + … + h_{m−1}/h_k].

On the other hand, taking (ii) into account,

  h_{k+1} = (1/2) β ω ‖Δx^{k+1}‖ ≤ ((1/2) β ω ‖Δx^k‖)² = h_k²,

whence

  h_{k+l} ≤ h_k^{2ˡ},  l ∈ ℕ₀.

We conclude, using (2/(βω)) h_k = ‖Δx^k‖ ≤ (1/2) β ω ‖x^k − x^{k−1}‖²,

  ‖x^k − x*‖ ≤ (1/2) β ω ‖x^k − x^{k−1}‖² (1 + ∑_{j=1}^∞ h_k^{2ʲ−1}),

and the bound for ε_k follows from h_k ≤ h₀^{2ᵏ}, which proves (iii). □
2. Affine Invariant/Conjugate Newton Convergence Theorems

2.1 Affine Covariant Newton Convergence Theorems

Theorem 2.1 (Affine Covariant Newton–Kantorovich Theorem)
Let F : D ⊂ ℝⁿ → ℝⁿ be continuously differentiable on D with an invertible Jacobian F′(x⁰) for some initial guess x⁰ ∈ D. Assume further that the following conditions hold true:

  ‖F′(x⁰)⁻¹ F(x⁰)‖ ≤ α,                                          (1.16)
  ‖F′(x⁰)⁻¹ (F′(y) − F′(x))‖ ≤ γ ‖y − x‖,  x, y ∈ D,              (1.17)
  h₀ := α γ < 1/2,                                                (1.18)
  B̄(x⁰, ρ₀) ⊂ D,  ρ₀ := (1 − √(1 − 2h₀)) / γ.                    (1.19)

Then, for the sequence {x^k}_{k∈ℕ₀} of Newton iterates

  F′(x^k) Δx^k = −F(x^k),  x^{k+1} = x^k + Δx^k,

there holds:
(i) F′(x) is invertible for all Newton iterates x = x^k, k ∈ ℕ₀,
(ii) the sequence {x^k}_{k∈ℕ} of Newton iterates is well defined with x^k ∈ B̄(x⁰, ρ₀), k ∈ ℕ₀, and x^k → x* ∈ B̄(x⁰, ρ₀) (k → ∞), where F(x*) = 0,
(iii) the convergence x^k → x* (k → ∞) is quadratic,
(iv) the solution x* of F(x) = 0 is unique in B̄(x⁰, ρ₀) ∪ (D ∩ B(x⁰, ρ̄₀)),  ρ̄₀ := (1 + √(1 − 2h₀)) / γ.

Proof. First homework assignment.
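The Newton iteration F′(x^k) Δx^k = −F(x^k), x^{k+1} = x^k + Δx^k analyzed in Theorems 1.1 and 2.1 can be illustrated in a few lines. The following Python sketch is illustrative only: the 2×2 test system and the Cramer-rule linear solve are our own choices, not taken from the text.

```python
# Minimal sketch of the exact Newton iteration F'(x^k) dx^k = -F(x^k),
# x^{k+1} = x^k + dx^k, for a 2x2 system; the test problem and the
# Cramer-rule linear solve are illustrative choices only.

def F(v):
    x, y = v
    return (x * x + y * y - 4.0, x * y - 1.0)

def J(v):
    x, y = v
    return (2 * x, 2 * y,   # row 1: dF1/dx, dF1/dy
            y, x)           # row 2: dF2/dx, dF2/dy

def newton(F, J, v, tol=1e-12, kmax=25):
    """Plain (exact) Newton method for a 2x2 system."""
    for _ in range(kmax):
        f1, f2 = F(v)
        a, b, c, d = J(v)
        det = a * d - b * c            # assumed nonzero (invertible Jacobian)
        dx = -(d * f1 - b * f2) / det  # Cramer's rule for J dx = -F
        dy = -(a * f2 - c * f1) / det
        v = (v[0] + dx, v[1] + dy)
        if max(abs(dx), abs(dy)) < tol:
            break
    return v

root = newton(F, J, (2.0, 0.5))
residual = F(root)
```

Started close enough to a root (here from (2, 0.5)), the increments ‖Δx^k‖ shrink quadratically, in line with assertion (iii) of the theorems.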
Theorem 2.2 (Affine Covariant Newton–Mysovskikh Theorem)
Let F : D ⊂ ℝⁿ → ℝⁿ, D ⊂ ℝⁿ convex, be continuously differentiable on D with invertible Jacobians F′(x), x ∈ D, and let x⁰ ∈ D be some initial guess. Assume further that the following conditions hold true:

  ‖F′(x⁰)⁻¹ F(x⁰)‖ ≤ α,                                           (1.20)
  ‖F′(z)⁻¹ (F′(y) − F′(x)) (y − x)‖ ≤ ω ‖y − x‖²,  x, y, z ∈ D,    (1.21)
  h₀ := ω ‖Δx⁰‖ ≤ α ω < 2,                                        (1.22)
  B̄(x⁰, ρ) ⊂ D,  ρ := ‖Δx⁰‖ / (1 − h₀/2).                        (1.23)

Then, for the sequence {x^k}_{k∈ℕ₀} of Newton iterates

  F′(x^k) Δx^k = −F(x^k),  x^{k+1} = x^k + Δx^k,

there holds x^k ∈ B̄(x⁰, ρ), k ∈ ℕ₀, and there exists x* ∈ B̄(x⁰, ρ) such that F(x*) = 0 and x^k → x* (k → ∞) with

  ‖x^{k+1} − x^k‖ ≤ (1/2) ω ‖x^k − x^{k−1}‖²,
  ‖x^k − x*‖ ≤ ‖x^{k+1} − x^k‖ / (1 − (1/2) ω ‖x^{k+1} − x^k‖).

Proof. Slight modification of the proof of the Classical Newton–Mysovskikh Theorem.

2.2 Affine Contravariant Newton Convergence Theorem

Theorem 2.3 (Affine Contravariant Newton–Mysovskikh Theorem)
Let F : D ⊂ ℝⁿ → ℝⁿ, D ⊂ ℝⁿ convex, be continuously differentiable on D with invertible Jacobians F′(x), x ∈ D, and let x⁰ ∈ D be some initial guess. Assume further that the following conditions hold true:

  ‖(F′(y) − F′(x)) (y − x)‖ ≤ ω ‖F′(x) (y − x)‖²,  x, y ∈ D,      (1.24)
  L̄_ω ⊂ D,  L_ω := {x ∈ D | ‖F(x)‖ < 2/ω},                       (1.25)
  h₀ := ω ‖F(x⁰)‖ < 2.                                            (1.26)
Then, the sequence {x^k}_{k∈ℕ₀} of Newton iterates stays in L_ω, and there exists x* ∈ L̄_ω such that x^k → x* (k ∈ ℕ′) for some subsequence ℕ′ ⊂ ℕ and F(x*) = 0. Moreover, for the residuals F(x^k) there holds

  ‖F(x^{k+1})‖ ≤ (ω/2) ‖F(x^k)‖².

Proof. We first prove x^k ∈ L_ω by induction on k:
(i) k = 0: in view of (1.26), ‖F(x⁰)‖ < 2/ω, whence x⁰ ∈ L_ω.
(ii) Assume that the assertion holds true for some k ∈ ℕ.
(iii) For any λ ∈ [0, 1] such that x^k + t Δx^k ∈ L_ω, t ∈ [0, λ], we have

  F(x^k + λ Δx^k) = F(x^k) + ∫₀^λ F′(x^k + t Δx^k) Δx^k dt.

Since F(x^k) = −F′(x^k) Δx^k, we may write F(x^k) = −λ F′(x^k) Δx^k + (1 − λ) F(x^k), and hence

  F(x^k + λ Δx^k) = ∫₀^λ (F′(x^k + t Δx^k) − F′(x^k)) Δx^k dt + (1 − λ) F(x^k).   (1.27)

In view of

  ‖(F′(x^k + t Δx^k) − F′(x^k)) Δx^k‖ ≤ ω t ‖F′(x^k) Δx^k‖² = ω t ‖F(x^k)‖²,

we obtain

  ‖F(x^k + λ Δx^k)‖ ≤ ω ‖F(x^k)‖² ∫₀^λ t dt + (1 − λ) ‖F(x^k)‖
  = (1 − λ + (ω λ²/2) ‖F(x^k)‖) ‖F(x^k)‖.

We assume x^{k+1} = x^k + Δx^k ∉ L_ω. Then there exists

  λ̄ := min{λ ∈ (0, 1] | x^k + λ Δx^k ∉ L_ω}.

It follows from (1.27) and ‖F(x^k)‖ < 2/ω that

  ‖F(x^k + λ̄ Δx^k)‖ ≤ (1 − λ̄ + (ω λ̄²/2) ‖F(x^k)‖) ‖F(x^k)‖
                     < (1 − λ̄ + λ̄²) ‖F(x^k)‖ ≤ ‖F(x^k)‖ < 2/ω,

and hence x^k + λ̄ Δx^k ∈ L_ω, which is a contradiction. For λ = 1, (1.27) gives the asserted residual estimate. For the proof of the rest of the assertion, we define the residual oriented Kantorovich quantities

  h_k := ω ‖F(x^k)‖;

then the residual estimate implies

  ω ‖F(x^{k+1})‖ ≤ (ω²/2) ‖F(x^k)‖²,

i.e.,

  h_{k+1} ≤ (1/2) h_k² = (h_k/2) h_k.

Since h₀ < 2, for k = 0 we obtain

  h₁ ≤ (h₀/2) h₀ < h₀ < 2,
and an induction argument shows

  h_{k+1} < h_k < 2,  k ∈ ℕ₀.

Moreover,

  ‖F(x^{k+1})‖ < ‖F(x^k)‖ < 2/ω  and  lim_{k→∞} ‖F(x^k)‖ = 0,

which implies x^k ∈ L_ω ⊂ D, k ∈ ℕ. Since L_ω is bounded, there exist x* ∈ L̄_ω and a subsequence ℕ′ ⊂ ℕ such that x^k → x* (k ∈ ℕ′) and F(x*) = 0. □

Affine conjugacy
Assume that D ⊂ ℝⁿ is a convex set and that f : D → ℝ is a strictly convex functional. Consider the minimization problem

  min_{x∈D} f(x).

Then, a necessary and sufficient optimality condition is given by the nonlinear equation

  F(x) = grad f(x) = f′(x)ᵀ = 0,  x ∈ D.

We note that the Jacobian F′(x) = f″(x) is symmetric and uniformly positive definite on D. In particular, F′(x)^{1/2} is well defined and symmetric positive definite as well. Consequently, the energy product

  (u, v)_E := uᵀ F′(x) v,  u, v ∈ ℝⁿ, x ∈ D,

defines locally an inner product with associated norm

  ‖u‖²_E = uᵀ F′(x) u = ‖F′(x)^{1/2} u‖²,

which is referred to as a local energy norm. For regular B ∈ ℝⁿˣⁿ, we consider the transformed minimization problem

  min_y g(y),  g(y) := f(By),  x = By.
We obtain the optimality condition

  G(y) = grad g(y) = (f′(By) B)ᵀ = Bᵀ f′(x)ᵀ = Bᵀ F(By) = 0

with the transformed Jacobian

  G′(y) = Bᵀ F′(x) B.

Hence, the Jacobian transformation is conjugate, which motivates the notion of affine conjugacy.
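The conjugacy relation G′(y) = BᵀF′(x)B can be checked numerically. Below is a small Python sketch for the quadratic functional f(x) = ½xᵀAx − bᵀx (so F′(x) = A); the concrete A, B, and b are our own illustrative choices, and the Hessian of the transformed functional g(y) = f(By) is approximated by central finite differences, which are exact for quadratics up to roundoff.

```python
# Numerical check of G'(y) = B^T F'(x) B for f(x) = 0.5 x^T A x - b^T x,
# where F'(x) = A; the concrete A, B, b are illustrative choices only.

A = [[3.0, 1.0], [1.0, 2.0]]   # symmetric positive definite
B = [[2.0, 0.0], [1.0, 1.0]]   # regular transformation
b = [1.0, 1.0]

def f(x):
    ax = [A[i][0] * x[0] + A[i][1] * x[1] for i in range(2)]
    return 0.5 * (x[0] * ax[0] + x[1] * ax[1]) - (b[0] * x[0] + b[1] * x[1])

def g(y):  # transformed functional g(y) = f(By)
    return f([B[0][0] * y[0] + B[0][1] * y[1],
              B[1][0] * y[0] + B[1][1] * y[1]])

# expected transformed Jacobian (Hessian) B^T A B
AB = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
BtAB = [[sum(B[k][i] * AB[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# central finite-difference Hessian of g at an arbitrary point y0
h = 1e-3
y0 = [0.3, -0.2]
def shifted(i, j, si, sj):
    y = list(y0); y[i] += si * h; y[j] += sj * h; return g(y)
H = [[(shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
       - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)) / (4 * h * h)
      for j in range(2)] for i in range(2)]
```

Since g is quadratic, H agrees with BᵀAB to roundoff, independently of the evaluation point y0.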
An appropriate affine conjugate Lipschitz condition is as follows:

  ‖F′(z)^{−1/2} (F′(y) − F′(x)) (y − x)‖ ≤ ω ‖F′(z)^{1/2} (y − x)‖².

2.3 Affine Conjugate Newton Convergence Theorem

Theorem 2.4 (Affine Conjugate Newton–Mysovskikh Theorem)
Assume that D ⊂ ℝⁿ is a convex domain and f : D → ℝ a strictly convex, twice continuously differentiable functional. Let F(x) = f′(x)ᵀ and F′(x) = f″(x). Consider the minimization problem

  min_{x∈D} f(x)                                                  (1.28)

and the associated optimality condition

  F(x) = grad f(x) = 0,  x ∈ D.                                   (1.29)

Note that (1.28) has a unique solution x* ∈ D. Let x⁰ ∈ D be an initial guess and assume that the following conditions are satisfied:

  ‖F′(z)^{−1/2} (F′(y) − F′(x)) (y − x)‖ ≤ ω ‖F′(z)^{1/2} (y − x)‖²   (1.30)

for collinear x, y, z ∈ D, and

  h₀ := ω ‖F′(x⁰)^{1/2} Δx⁰‖ < 2,                                 (1.31)
  L₀ := {x ∈ D | f(x) ≤ f(x⁰)} is compact.                        (1.32)

Then, for the Newton iterates x^k, k ∈ ℕ₀, there holds:
(i) x^k ∈ L₀, k ∈ ℕ₀, and x^k → x* (k → ∞) with

  ‖F′(x^{k+1})^{1/2} Δx^{k+1}‖ ≤ (1/2) ω ‖F′(x^k)^{1/2} Δx^k‖².    (1.33)

(ii) For ε_k := ‖F′(x^k)^{1/2} Δx^k‖² and the Kantorovich quantities h_k := ω ε_k^{1/2} we have

  (1/2 − (1/6) h_k) ε_k ≤ f(x^k) − f(x^{k+1}) ≤ (1/2 + (1/6) h_k) ε_k,   (1.34)
  (1/6) ε_k ≤ f(x^k) − f(x^{k+1}) ≤ (5/6) ε_k.                            (1.35)

(iii) We have the a priori estimate

  f(x⁰) − f(x*) ≤ (5/6) ε₀ / (1 − (h₀/2)²).                       (1.36)
Proof. Assertion (i) and (1.33) can be verified as in the proof of the affine contravariant version of the Newton–Mysovskikh theorem. For the proof of (1.34) in (ii), observe F′(x^k) Δx^k = −F(x^k) and ε_k = ⟨F′(x^k) Δx^k, Δx^k⟩. Then

  f(x^{k+1}) − f(x^k) + ε_k = ∫₀¹ ⟨F(x^k + s Δx^k) − F(x^k), Δx^k⟩ ds
    = ∫₀¹ ∫₀¹ s ⟨F′(x^k + s t Δx^k) Δx^k, Δx^k⟩ dt ds,

and, subtracting (1/2) ε_k = ∫₀¹ ∫₀¹ s ⟨F′(x^k) Δx^k, Δx^k⟩ dt ds,

  f(x^{k+1}) − f(x^k) + (1/2) ε_k = ∫₀¹ ∫₀¹ s ⟨w_k(s, t), Δx^k⟩ dt ds,
  w_k(s, t) := (F′(x^k + s t Δx^k) − F′(x^k)) Δx^k.

Writing ⟨w_k, Δx^k⟩ = ⟨F′(x^k)^{−1/2} w_k, F′(x^k)^{1/2} Δx^k⟩ and using (1.30) in the form

  ‖F′(x^k)^{−1/2} w_k(s, t)‖ ≤ ω s t ‖F′(x^k)^{1/2} Δx^k‖²,

we obtain

  |f(x^{k+1}) − f(x^k) + (1/2) ε_k| ≤ ω ε_k^{3/2} ∫₀¹ s² ds ∫₀¹ t dt = (1/6) h_k ε_k,

which proves (1.34). Using the right-hand side of (1.34) and h_k < 2 yields

  f(x^k) − f(x^{k+1}) ≤ (1/2 + (1/6) h_k) ε_k < (5/6) ε_k.
Likewise, using the left-hand side of (1.34) and h_k < 2,

  f(x^k) − f(x^{k+1}) ≥ (1/2 − (1/6) h_k) ε_k > (1/6) ε_k.

Together, this proves (1.35). In order to prove (iii), we use (1.35) and obtain

  ω² (f(x⁰) − f(x*)) = ∑_{k=0}^∞ ω² (f(x^k) − f(x^{k+1})) ≤ (5/6) ∑_{k=0}^∞ ω² ε_k = (5/6) ∑_{k=0}^∞ h_k².

Using

  (1/2) h_{k+1} ≤ ((1/2) h_k)²,  (1/2) h_k < 1,

we further get

  ∑_{k=0}^∞ ((1/2) h_k)² ≤ ((1/2) h₀)² + ((1/2) h₀)⁴ + ((1/2) h₀)⁸ + …
    ≤ ∑_{j=1}^∞ (((1/2) h₀)²)ʲ = (1/4) h₀² / (1 − (h₀/2)²),

and hence ∑_{k=0}^∞ h_k² ≤ h₀² / (1 − (h₀/2)²). Observing h₀² = ω² ε₀, this gives

  f(x⁰) − f(x*) ≤ (5/6) ε₀ / (1 − (h₀/2)²),

which proves (1.36). □
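For a scalar strictly convex example the quantities appearing in Theorem 2.4 are easy to monitor. The following Python sketch uses the functional f(x) = x⁴/4 + x²/2 (our own illustrative choice, with F(x) = x³ + x and F′(x) = 3x² + 1 > 0) and tracks ε_k = F′(x_k)Δx_k² against the energy decrease f(x_k) − f(x_{k+1}), checking the bracket (1.35) over the first few steps.

```python
# Newton's method for min f(x), f(x) = x^4/4 + x^2/2 (illustrative choice),
# i.e. F(x) = f'(x) = x^3 + x, F'(x) = 3x^2 + 1 > 0 (strictly convex).
# We monitor eps_k = F'(x_k) * dx_k^2 and the decrease f(x_k) - f(x_{k+1}).

def f(x):  return 0.25 * x**4 + 0.5 * x**2
def Fn(x): return x**3 + x              # F(x) = grad f(x)
def dF(x): return 3.0 * x * x + 1.0     # F'(x) = f''(x)

x = 0.5
history = []          # records (eps_k, f(x_k) - f(x_{k+1})) per step
for _ in range(5):
    dx = -Fn(x) / dF(x)                 # Newton correction
    eps = dF(x) * dx * dx               # eps_k = ||F'(x_k)^{1/2} dx_k||^2
    fold, x = f(x), x + dx
    history.append((eps, fold - f(x)))

# the unique minimizer of f is x* = 0
```

Near the solution the ratio (f(x_k) − f(x_{k+1}))/ε_k approaches 1/2, the midpoint of the interval [1/6, 5/6] predicted by (1.35), since h_k → 0.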
3. Inexact Newton Methods

We recall that Newton's method computes iterates successively as the solution of the linear algebraic systems

  F′(x^k) Δx^k = −F(x^k),  k ∈ ℕ₀,                                (1.37)
  x^{k+1} = x^k + Δx^k.

The classical convergence theorems of Newton–Kantorovich and Newton–Mysovskikh and their affine covariant, affine contravariant, and affine conjugate versions assume the exact solution of (1.37). In practice, however, in particular if the dimension n is large, (1.37) will be solved by an iterative method. In this case, we end up with an outer/inner iteration, where the outer iterations are the Newton steps and the inner iterations result from the application of an iterative scheme to (1.37). It is important to tune the outer and inner iterations and to keep track of the iteration errors.

With regard to affine covariance, affine contravariance, and affine conjugacy, the iterative scheme for the inner iterations has to be chosen in such a way that it easily provides information about the error norm in case of affine covariance, the residual norm in case of affine contravariance, and the energy norm in case of affine conjugacy.

Except for convex optimization, we cannot expect F′(x), x ∈ D, to be symmetric positive definite. Hence, for affine covariance and affine contravariance we have to pick iterative solvers that are designed for nonsymmetric matrices. Appropriate candidates are CGNE (Conjugate Gradient for the Normal Equations) in case of affine covariance, GMRES (Generalized Minimum RESidual) in case of affine contravariance, and PCG (Preconditioned Conjugate Gradient) in case of affine conjugacy.
3.1 Affine Covariant Inexact Newton Methods

CGNE (Conjugate Gradient for the Normal Equations)
We assume A ∈ ℝⁿˣⁿ to be a regular, nonsymmetric matrix and b ∈ ℝⁿ to be given, and look for y ∈ ℝⁿ as the unique solution of the linear algebraic system

  A y = b.                                                        (1.38)

As the name already suggests, CGNE is the conjugate gradient method applied to the normal equations: It solves the system

  A Aᵀ z = b                                                      (1.39)

for z and then computes y according to

  y = Aᵀ z.                                                       (1.40)

The implementation of CGNE is as follows:

CGNE Initialization:
Given an initial guess y₀ ∈ ℝⁿ, compute the residual r₀ = b − A y₀ and set

  p₀ = 0,  β₀ = 0,  σ₀ = ‖r₀‖².

CGNE Iteration Loop: For 1 ≤ i ≤ i_max compute

  p_i = Aᵀ r_{i−1} + β_{i−1} p_{i−1},
  α_i = σ_{i−1} / ‖p_i‖²,
  y_i = y_{i−1} + α_i p_i,
  γ²_{i−1} = α_i σ_{i−1}  (= ‖y_i − y_{i−1}‖²),
  r_i = r_{i−1} − α_i A p_i,
  σ_i = ‖r_i‖²,
  β_i = σ_i / σ_{i−1}.

CGNE has the error minimizing property

  ‖y − y_i‖ = min_{v ∈ y₀ + K_i(Aᵀr₀, AᵀA)} ‖y − v‖,               (1.41)

where K_i(Aᵀr₀, AᵀA) stands for the Krylov subspace

  K_i(Aᵀr₀, AᵀA) := span{Aᵀr₀, (AᵀA) Aᵀr₀, …, (AᵀA)^{i−1} Aᵀr₀}.   (1.42)
Lemma 3.1 (Representation of the iteration error)
Let ε_i := ‖y − y_i‖² be the square of the CGNE iteration error with respect to the i-th iterate. Then, there holds

  ε_i = ∑_{j=i}^{n−1} γ_j².                                       (1.43)

Proof. CGNE has the Galerkin orthogonality

  (y_i − y₀, y_{i+m} − y_i) = 0,  m ∈ ℕ.                          (1.44)

Setting m = 1, this implies the orthogonal decomposition

  ‖y_{i+1} − y₀‖² = ‖y_{i+1} − y_i‖² + ‖y_i − y₀‖²,               (1.45)

which readily gives

  ‖y_i − y₀‖² = ∑_{j=0}^{i−1} ‖y_{j+1} − y_j‖² = ∑_{j=0}^{i−1} γ_j².   (1.46)

On the other hand, observing y_n = y, for m = n − i the Galerkin orthogonality yields

  ∑_{j=0}^{n−1} γ_j² = ‖y − y₀‖² = ‖y − y_i‖² + ‖y_i − y₀‖² = ε_i + ∑_{j=0}^{i−1} γ_j².   (1.47)

□

Computable lower bound for the iteration error
It follows readily from Lemma 3.1 that the computable quantity

  [ε_i] := ∑_{j=i}^{i+m} γ_j²,  m ∈ ℕ,                            (1.48)

provides a lower bound for the iteration error ε_i. In practice, we will test the relative error norm according to

  δ_i := ‖y − y_i‖ / ‖y_i‖ ≈ [ε_i]^{1/2} / ‖y_i‖ ≤ δ,             (1.49)

where δ is a user specified accuracy.
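The CGNE loop above, together with the quantities γ²_j entering the error representation of Lemma 3.1, can be sketched in a few lines of Python. Dense matrices are stored as nested lists, and the concrete 2×2 test system is our own illustrative choice.

```python
# CGNE following the loop above: CG applied to A A^T z = b with y = A^T z,
# accumulating gamma_j^2 = ||y_{j+1} - y_j||^2 for the bound of Lemma 3.1.

def cgne(A, b, y0, imax):
    n = len(b)
    mv  = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    mtv = lambda M, v: [sum(M[j][i] * v[j] for j in range(n)) for i in range(n)]  # A^T v
    y = list(y0)
    Ay = mv(A, y)
    r = [b[i] - Ay[i] for i in range(n)]
    p, beta = [0.0] * n, 0.0
    sigma = sum(ri * ri for ri in r)
    gamma2 = []                          # gamma_{i-1}^2 = alpha_i * sigma_{i-1}
    for _ in range(imax):
        if sigma == 0.0:                 # exact solution already reached
            break
        atr = mtv(A, r)
        p = [atr[i] + beta * p[i] for i in range(n)]
        alpha = sigma / sum(pi * pi for pi in p)
        y = [y[i] + alpha * p[i] for i in range(n)]
        gamma2.append(alpha * sigma)
        Ap = mv(A, p)
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        sigma_new = sum(ri * ri for ri in r)
        beta = sigma_new / sigma
        sigma = sigma_new
    return y, gamma2

# nonsymmetric 2x2 test system (illustrative); exact solution y = (-1, 1)
A = [[1.0, 2.0], [3.0, 4.0]]
b = [1.0, 1.0]
y, gamma2 = cgne(A, b, [0.0, 0.0], imax=2)
```

In exact arithmetic CGNE terminates after n steps, so with y₀ = 0 the accumulated sum of all γ²_j equals ‖y − y₀‖², the full squared error of the start iterate, in accordance with (1.43) and (1.46).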
Convergence of affine covariant inexact Newton methods
We denote by δx^k ∈ ℝⁿ the result of an inner iteration, e.g., CGNE, for the solution of (1.37). Then, it is easy to see that the iteration error δx^k − Δx^k satisfies the error equation

  F′(x^k) (δx^k − Δx^k) = F(x^k) + F′(x^k) δx^k =: r^k.            (1.50)

We will measure the impact of the inexact solution of (1.37) by the relative error

  δ_k := ‖δx^k − Δx^k‖ / ‖δx^k‖.                                  (1.51)

Theorem 3.1 (Affine covariant convergence theorem for the inexact Newton method. Part I: Linear convergence)
Suppose that F : D ⊂ ℝⁿ → ℝⁿ is continuously differentiable on D with invertible Jacobians F′(x), x ∈ D. Assume further that the following affine covariant Lipschitz condition is satisfied:

  ‖F′(z)⁻¹ (F′(y) − F′(x)) v‖ ≤ ω ‖y − x‖ ‖v‖,                    (1.52)

where x, y, z ∈ D, v ∈ ℝⁿ. Assume that x⁰ ∈ D is an initial guess for the outer Newton iteration and that δx^k₀ = 0 is chosen as the start iterate for each inner iteration. Consider the Kantorovich quantities

  h_k := ω ‖Δx^k‖,  h_k^δ := ω ‖δx^k‖ = h_k / √(1 + δ_k²)          (1.53)

associated with the outer and inner iteration (note that the Galerkin orthogonality of CGNE gives ‖Δx^k‖² = ‖δx^k‖² + ‖δx^k − Δx^k‖² = (1 + δ_k²) ‖δx^k‖²). Assume that

  h₀ ≤ 2Θ,  0 ≤ Θ < 1,                                            (1.54)

and control the inner iterations according to

  ϑ(h_k, δ_k) := ((1/2) h_k^δ + δ_k (1 + h_k^δ)) / √(1 + δ_k²) ≤ Θ < 1.   (1.55)

Note that a necessary condition for ϑ(h_k, δ_k) ≤ Θ is that this inequality holds true for δ_k = 0, which is satisfied due to assumption (1.54).
Then, there holds:
(i) The inexact Newton iterates x^k, k ∈ ℕ₀, stay in B̄(x⁰, ρ),

  ρ := ‖δx⁰‖ / (1 − Θ̄),                                           (1.56)

and converge linearly to some x* ∈ B̄(x⁰, ρ) with F(x*) = 0.
(ii) The exact Newton increments decrease monotonically according to

  ‖Δx^{k+1}‖ / ‖Δx^k‖ ≤ Θ,                                        (1.57)

whereas for the inexact Newton increments we have

  ‖δx^{k+1}‖ / ‖δx^k‖ ≤ √((1 + δ_k²) / (1 + δ_{k+1}²)) Θ =: Θ̄.     (1.58)

Proof. By elementary calculations we find

  Δx^{k+1} = −F′(x^{k+1})⁻¹ F(x^{k+1})                             (1.59)
    = −F′(x^{k+1})⁻¹ (F(x^{k+1}) − F(x^k) − F′(x^k) δx^k) − F′(x^{k+1})⁻¹ r^k,

observing F(x^k) = r^k − F′(x^k) δx^k and r^k = F′(x^k) (δx^k − Δx^k). Hence,

  Δx^{k+1} = −F′(x^{k+1})⁻¹ ∫₀¹ (F′(x^k + t δx^k) − F′(x^k)) δx^k dt   =: I
             − F′(x^{k+1})⁻¹ F′(x^k) (δx^k − Δx^k)                     =: II.
Using the affine covariant Lipschitz condition (1.52), the first term on the right-hand side in (1.59) can be estimated according to

  ‖I‖ ≤ ω ‖δx^k‖² ∫₀¹ t dt = (1/2) ω ‖δx^k‖².                      (1.60)

For the second term we obtain by the same argument

  II = −F′(x^{k+1})⁻¹ (F′(x^k) − F′(x^{k+1})) (δx^k − Δx^k) − (δx^k − Δx^k),
  ‖II‖ ≤ ω ‖x^{k+1} − x^k‖ ‖δx^k − Δx^k‖ + ‖δx^k − Δx^k‖ = (1 + h_k^δ) ‖δx^k − Δx^k‖.   (1.61)

Combining (1.60) and (1.61) yields

  ‖Δx^{k+1}‖ ≤ (1/2) ω ‖δx^k‖² + (1 + h_k^δ) δ_k ‖δx^k‖ = ((1/2) h_k^δ + δ_k (1 + h_k^δ)) ‖δx^k‖.

Observing (1.53), i.e. ‖Δx^k‖ = √(1 + δ_k²) ‖δx^k‖, we finally get

  ‖Δx^{k+1}‖ / ‖Δx^k‖ ≤ ((1/2) h_k^δ + δ_k (1 + h_k^δ)) / √(1 + δ_k²) = ϑ(h_k, δ_k) ≤ Θ < 1,   (1.62)

which implies linear convergence. For the contraction of the inexact Newton increments we get

  ‖δx^{k+1}‖ / ‖δx^k‖ = √((1 + δ_k²) / (1 + δ_{k+1}²)) ‖Δx^{k+1}‖ / ‖Δx^k‖
                      ≤ √((1 + δ_k²) / (1 + δ_{k+1}²)) Θ ≤ Θ̄.      (1.63)

It can be easily shown that {x^k}_{k∈ℕ₀} is a Cauchy sequence in B̄(x⁰, ρ). Consequently, there exists x* ∈ B̄(x⁰, ρ) such that x^k → x* (k → ∞). Since

  F′(x^k) δx^k = −F(x^k) + r^k  with  δx^k → 0, r^k → 0  (k → ∞),

we conclude F(x*) = 0. □
Theorem 3.2 (Affine covariant convergence theorem for the inexact Newton method. Part II: Quadratic convergence)
Under the same assumptions on F : D ⊂ ℝⁿ → ℝⁿ as in Theorem 3.1, suppose that the initial guess x⁰ ∈ D satisfies

  h₀ < 2 / (1 + ρ̄)                                                (1.64)

for some appropriate ρ̄ > 0 and control the inner iterations such that

  δ_k ≤ (ρ̄/2) h_k^δ / (1 + h_k^δ).                                 (1.65)

Then, there holds:
(i) The inexact Newton iterates x^k, k ∈ ℕ₀, stay in B̄(x⁰, ρ),

  ρ := ‖δx⁰‖ / (1 − ((1 + ρ̄)/2) h₀),                               (1.66)

and converge quadratically to some x* ∈ B̄(x⁰, ρ) with F(x*) = 0.
(ii) The exact Newton increments and the inexact Newton increments decrease quadratically according to

  ‖Δx^{k+1}‖ ≤ ((1 + ρ̄)/2) ω ‖Δx^k‖²,                              (1.67)
  ‖δx^{k+1}‖ ≤ ((1 + ρ̄)/2) ω ‖δx^k‖².                              (1.68)

Proof. We proceed as in the proof of Theorem 3.1 to obtain

  ‖Δx^{k+1}‖ / ‖Δx^k‖ ≤ ϑ(h_k, δ_k) = ((1/2) h_k^δ + δ_k (1 + h_k^δ)) / √(1 + δ_k²)

and

  ‖δx^{k+1}‖ / ‖δx^k‖ = √((1 + δ_k²) / (1 + δ_{k+1}²)) ‖Δx^{k+1}‖ / ‖Δx^k‖.

In view of (1.65) we get the further estimate

  ‖Δx^{k+1}‖ ≤ ((1/2) h_k^δ + (ρ̄/2) h_k^δ) ‖δx^k‖ = ((1 + ρ̄)/2) ω ‖δx^k‖² ≤ ((1 + ρ̄)/2) ω ‖Δx^k‖²
and, since ‖δx^{k+1}‖ ≤ ‖Δx^{k+1}‖,

  ‖δx^{k+1}‖ ≤ ((1 + ρ̄)/2) ω ‖δx^k‖²,

from which (1.67) and (1.68) follow by the definition of the Kantorovich quantities. In order to deduce quadratic convergence we have to make sure that the initial increments (k = 0) are small enough, i.e.,

  ((1 + ρ̄)/2) h₀ < 1,                                              (1.69)

which is guaranteed by (1.64). Furthermore, (1.68) and (1.69) allow us to show that the iterates x^k, k ∈ ℕ, stay in B̄(x⁰, ρ). Indeed, (1.68) implies

  ‖δx^j‖ ≤ ((1 + ρ̄)/2) h₀ ‖δx^{j−1}‖ ≤ (((1 + ρ̄)/2) h₀)ʲ ‖δx⁰‖,  j ∈ ℕ,

and hence

  ‖x^{k+1} − x⁰‖ ≤ ∑_{j=0}^k ‖δx^j‖ ≤ ∑_{j=0}^∞ (((1 + ρ̄)/2) h₀)ʲ ‖δx⁰‖
    = ‖δx⁰‖ / (1 − ((1 + ρ̄)/2) h₀) = ρ.  □

Algorithmic aspects of affine covariant inexact Newton methods

(i) Convergence monitor
Let us assume that the quantity Θ < 1 in both the linear convergence mode and the quadratic convergence mode has been specified, and let us further assume that we use CGNE with δx^k₀ = 0 in the inner iteration. Then, (1.58) suggests the monotonicity test

  Θ̂_k := √((1 + [δ_{k+1}]²) / (1 + [δ_k]²)) ‖δx^{k+1}‖ / ‖δx^k‖ ≤ Θ,   (1.70)

where [δ_k] and [δ_{k+1}] are computationally available estimates of δ_k and δ_{k+1}.

(ii) Termination criterion
We recall that the termination criterion for the exact Newton iteration with respect to a user specified accuracy XTOL is given by

  ‖Δx^k‖ / (1 − Θ²_{k−1}) ≤ XTOL.
According to (1.53) we have

  ‖Δx^k‖ = √(1 + δ_k²) ‖δx^k‖.

Consequently, replacing Θ_{k−1} and δ_k by the computable quantities Θ̂_{k−1} and [δ_k], we arrive at the termination criterion

  √(1 + [δ_k]²) ‖δx^k‖ / (1 − Θ̂²_{k−1}) ≤ XTOL.                    (1.71)

(iii) Balancing outer and inner iterations
According to (1.55) of Theorem 3.1, in the linear convergence mode the adaptive termination criterion for the inner iteration is

  ϑ(h_k, δ_k) = ((1/2) h_k^δ + δ_k (1 + h_k^δ)) / √(1 + δ_k²) ≤ Θ.

On the other hand, in view of (1.65) of Theorem 3.2, in the quadratic convergence mode the termination criterion is

  δ_k ≤ (ρ̄/2) h_k^δ / (1 + h_k^δ).

Since the theoretical Kantorovich quantities (cf. (1.53))

  h_k^δ = ω ‖δx^k‖ = h_k / √(1 + δ_k²)

are not directly accessible, we have to replace them by computationally available estimates [h_k^δ]. We recall that for h_k we have the a priori estimate

  [h_k] := 2 Θ̂²_{k−1} ≤ h_k.

Consequently, replacing δ_k by [δ_k], h_k by [h_k], and Θ_{k−1} by Θ̂_{k−1} (cf. (1.70)), we get the a priori estimates

  [h_k^δ] := [h_k] / √(1 + [δ_k]²),  [h_k] := 2 Θ̂²_{k−1},  k ∈ ℕ.   (1.72)

For k = 0, we choose δ₀ = [δ₀] = 1/4. In practice, for k ≥ 1 we begin with the quadratic convergence mode and switch
to the linear convergence mode as soon as the approximate contraction factor Θ̂_k drops below some prespecified threshold value Θ̄ ≤ 1/2.

(iii)₁ Quadratic convergence mode
The computationally realizable termination criterion for the inner iteration in the quadratic convergence mode is

  [δ_k] ≤ (ρ̄/2) [h_k^δ] / (1 + [h_k^δ]).                           (1.73)

Inserting (1.72) into (1.73), we obtain a simple nonlinear equation in [δ_k].

Remark 3.1 (Validity of the approximate termination criterion)
Observing that the right-hand side in (1.73) is a monotonically increasing function of [h_k^δ], and taking [h_k^δ] ≤ h_k^δ into account, it follows that for δ_k ≤ [δ_k] the approximate termination criterion (1.73) implies the exact termination criterion (1.65).

Remark 3.2 (Computational work in the quadratic convergence mode)
Since δ_k → 0 (k → ∞) is enforced, it follows: The more the iterates x^k approach the solution x*, the more computational work is required for the inner iterations to guarantee quadratic convergence of the outer iteration.

(iii)₂ Linear convergence mode
We switch to the linear convergence mode once the criterion

  Θ̂_k < Θ̄                                                         (1.74)

is met. The computationally realizable termination criterion for the inner iteration in the linear convergence mode is

  [ϑ(h_k, δ_k)] := ϑ([h_k], [δ_k]) = ((1/2) [h_k^δ] + [δ_k] (1 + [h_k^δ])) / √(1 + [δ_k]²) ≤ Θ.   (1.75)

Remark 3.3 (Validity of the approximate termination criterion)
Since the right-hand side in (1.75) is a monotonically increasing function of [h_k^δ] and [h_k^δ] ≤ h_k^δ, the estimate provided by (1.75) may be too small and thus result in an overestimation of the admissible δ_k. However, since the exact quantities and their a priori estimates both tend to zero as k approaches infinity, asymptotically we may rely on (1.75).
In practice, we require the monotonicity test (1.70) in CGNE and run the inner iterations until [δ_k] satisfies (1.75) or divergence occurs, i.e.,

  Θ̂_k > 2Θ.

Remark 3.4 (Computational work in the linear convergence mode)
As opposed to the quadratic convergence mode, we observe: The more the iterates x^k approach the solution x*, the less computational work is required for the inner iterations to guarantee linear convergence of the outer iteration.
3.2 Affine Contravariant Inexact Newton Methods

GMRES (Generalized Minimum RESidual)
The Generalized Minimum RESidual method (GMRES) is an iterative solver for nonsymmetric linear algebraic systems which generates an orthonormal basis of the Krylov subspace

  K_i(r₀, A) := span{r₀, A r₀, …, A^{i−1} r₀}                      (1.76)

by a modified Gram–Schmidt orthogonalization called the Arnoldi method. The inner product coefficients are stored in an upper Hessenberg matrix, so that an approximate solution can be obtained by the solution of a least squares problem in terms of that Hessenberg matrix:

GMRES Initialization:
Given an initial guess y₀ ∈ ℝⁿ, compute the residual r₀ = b − A y₀ and set

  β := ‖r₀‖,  v₁ := r₀ / β,  V₁ := (v₁).                           (1.77)

GMRES Iteration Loop: For 1 ≤ i ≤ i_max:

I. Orthogonalization:

  v̂_{i+1} = A v_i − V_i h_i,                                      (1.78)

where

  h_i = V_iᵀ A v_i.                                               (1.79)

II. Normalization:

  v_{i+1} = v̂_{i+1} / ‖v̂_{i+1}‖.                                  (1.80)

III. Update:

  V_{i+1} = (V_i  v_{i+1}),                                       (1.81)

  H_i = ( h_i ; ‖v̂_{i+1}‖ ),  i = 1,                              (1.82)

  H_i = ( H_{i−1}  h_i ; 0  ‖v̂_{i+1}‖ ),  i > 1.                  (1.83)
IV. Least squares problem: Compute z_i as the solution of

  ‖β e₁ − H_i z_i‖ = min_{z ∈ ℝⁱ} ‖β e₁ − H_i z‖.                  (1.84)

V. Approximate solution:

  y_i = y₀ + V_i z_i.                                             (1.85)

GMRES has the residual norm minimizing property

  ‖b − A y_i‖ = min_{z ∈ y₀ + K_i(r₀, A)} ‖b − A z‖.               (1.86)

Moreover, the inner residuals decrease monotonically:

  ‖r_{i+1}‖ ≤ ‖r_i‖,  i ∈ ℕ₀.                                     (1.87)

Termination criterion for the GMRES iteration
The residuals satisfy the orthogonality relation

  (r_i, r_i − r₀) = 0,  i ∈ ℕ,                                    (1.88)

from which we readily deduce

  ‖r₀‖² = ‖r_i − r₀‖² + ‖r_i‖²,  i ∈ ℕ.                            (1.89)

We define the relative residual norm error

  η_i := ‖r_i‖ / ‖r₀‖.                                            (1.90)

Clearly, η_i ≤ 1, i ∈ ℕ, and

  η_{i+1} < η_i  if η_i ≠ 0.                                      (1.91)

Consequently, given a user specified accuracy η̄, an appropriate adaptive termination criterion is

  η_i ≤ η̄.                                                       (1.92)

We note that, in terms of η_i, (1.89) can be written as

  ‖r_i − r₀‖² = (1 − η_i²) ‖r₀‖².                                  (1.93)
Convergence of affine contravariant inexact Newton methods
We denote by δx^k ∈ ℝⁿ the result of the inner GMRES iteration. As initial values for GMRES we choose

  δx^k₀ = 0,  r^k₀ = −F(x^k).                                     (1.94)

Consequently, during the inner GMRES iteration the relative error η_i, i ∈ ℕ₀, in the residuals satisfies

  η_i = ‖r^k_i‖ / ‖F(x^k)‖,  η_{i+1} < η_i  if η_i ≠ 0.            (1.95)

In the sequel, we drop the subindices i for the inner iterations and refer to η_k as the final value of the inner iterations at each outer iteration step k.

Theorem 3.3 (Affine contravariant convergence theorem for the inexact Newton GMRES method. Part I: Linear convergence)
Suppose that F : D ⊂ ℝⁿ → ℝⁿ is continuously differentiable on D and let x⁰ ∈ D be some initial guess. Let further the following affine contravariant Lipschitz condition be satisfied:

  ‖(F′(y) − F′(x)) (y − x)‖ ≤ ω ‖F′(x) (y − x)‖²,  x, y ∈ D,  ω ≥ 0.   (1.96)

Assume further that the level set

  L₀ := {x ∈ D | ‖F(x)‖ ≤ ‖F(x⁰)‖}                                 (1.97)

is a compact subset of D. In terms of the Kantorovich quantities

  h_k := ω ‖F(x^k)‖,  k ∈ ℕ₀,                                     (1.98)

the outer residual norms can be bounded according to

  ‖F(x^{k+1})‖ ≤ (η_k + (1/2) (1 − η_k²) h_k) ‖F(x^k)‖.            (1.99)

Assume that

  h₀ < 2,                                                         (1.100)

and control the inner iterations according to

  η_k ≤ Θ − (1/2) h_k                                             (1.101)
for some h₀/2 < Θ < 1. Then, the inexact Newton GMRES iterates x^k, k ∈ ℕ₀, stay in L₀ and converge linearly to some x* ∈ L₀ with F(x*) = 0 at an estimated rate

  ‖F(x^{k+1})‖ ≤ Θ ‖F(x^k)‖.                                      (1.102)

Proof. We recall that the inexact Newton GMRES iterates satisfy

  F′(x^k) δx^k = −F(x^k) + r^k,                                   (1.103)
  x^{k+1} = x^k + δx^k.                                           (1.104)

It follows from the generalized mean value theorem that

  F(x^{k+1}) = F(x^k) + ∫₀¹ F′(x^k + t δx^k) δx^k dt.              (1.105)

Consequently, replacing F(x^k) in (1.105) by means of (1.103), we obtain

  F(x^{k+1}) = ∫₀¹ (F′(x^k + t δx^k) − F′(x^k)) δx^k dt + r^k,

whence, by (1.96),

  ‖F(x^{k+1})‖ ≤ (ω/2) ‖F′(x^k) δx^k‖² + ‖r^k‖ = (ω/2) ‖F(x^k) − r^k‖² + ‖r^k‖.

We recall (1.93), which here reads

  ‖r^k − F(x^k)‖² = (1 − η_k²) ‖F(x^k)‖²,

from which (1.99) can be immediately deduced. Now, in view of (1.101), (1.99) yields

  ‖F(x^{k+1})‖ ≤ (η_k + (1/2) (1 − η_k²) h_k) ‖F(x^k)‖
              ≤ (Θ − (1/2) η_k² h_k) ‖F(x^k)‖ ≤ Θ ‖F(x^k)‖.

Taking advantage of the previous inequality, by induction on k it follows that x^k ∈ L₀ ⊂ D, k ∈ ℕ₀.
Hence, there exist a subsequence ℕ′ ⊂ ℕ and an x* ∈ L₀ such that x^k → x* (k ∈ ℕ′) and F(x*) = 0. Moreover, since

  ‖F(x^{k+l}) − F(x^k)‖ ≤ ‖F(x^{k+l})‖ + ‖F(x^k)‖ ≤ (1 + Θˡ) ‖F(x^k)‖ ≤ (1 + Θˡ) Θᵏ ‖F(x⁰)‖ → 0  (k → ∞),

the whole sequence must converge to x*. □

Theorem 3.4 (Affine contravariant convergence theorem for the inexact Newton GMRES method. Part II: Quadratic convergence)
Under the same assumptions on F : D ⊂ ℝⁿ → ℝⁿ as in Theorem 3.3, suppose that the initial guess x⁰ ∈ D satisfies

  h₀ < 2 / (1 + ρ̄)                                                (1.106)

for some appropriate ρ̄ > 0 and control the inner iterations such that

  η_k / (1 − η_k²) ≤ (ρ̄/2) h_k.                                    (1.107)

Then, the inexact Newton GMRES iterates x^k, k ∈ ℕ₀, stay in L₀ and converge quadratically to some x* ∈ L₀ with F(x*) = 0 at an estimated rate

  ‖F(x^{k+1})‖ ≤ (ω/2) (1 + ρ̄) (1 − η_k²) ‖F(x^k)‖².               (1.108)

Proof. Inserting (1.107) into (1.99) and observing h_k = ω ‖F(x^k)‖ gives the assertion. □

Algorithmic aspects of affine contravariant inexact Newton methods

(i) Convergence monitor
Throughout the inexact Newton GMRES iteration we use the residual monotonicity test

  Θ_k := ‖F(x^{k+1})‖ / ‖F(x^k)‖ ≤ Θ < 1.                          (1.109)

The iteration is considered as divergent if

  Θ_k > Θ.                                                        (1.110)
(ii) Termination criterion
As in the exact Newton iteration, specifying a residual accuracy FTOL, the termination criterion for the inexact Newton GMRES iteration is

  ‖F(x^k)‖ ≤ FTOL.                                                (1.111)

(iii) Balancing outer and inner iterations
With regard to (1.101) of Theorem 3.3, in the linear convergence mode the adaptive termination criterion for the inner GMRES iteration is

  η_k ≤ Θ − (1/2) h_k,

whereas, in view of (1.107) of Theorem 3.4, in the quadratic convergence mode the termination criterion is

  η_k / (1 − η_k²) ≤ (ρ̄/2) h_k.

Again, we replace the theoretical Kantorovich quantities h_k by some computationally easily available a priori estimates. We distinguish between the quadratic and the linear convergence mode:

(iii)₁ Quadratic convergence mode
We recall the termination criterion (1.107) for the quadratic convergence mode,

  η_k / (1 − η_k²) ≤ (ρ̄/2) h_k.

The estimate (1.108) suggests the a posteriori estimate

  [h_k]₂ := 2 Θ_k / ((1 + ρ̄) (1 − η_k²)) ≤ h_k.

In view of h_{k+1} = Θ_k h_k, this implies the a priori estimate

  [h_{k+1}] := Θ_k [h_k]₂ ≤ Θ_k h_k = h_{k+1}.                     (1.112)

Using (1.112) in (1.107) results in the computationally feasible termination criterion

  η_k / (1 − η_k²) ≤ (ρ̄/2) [h_k],  ρ̄ ≈ 1.0.                        (1.113)
(iii)$_2$ Linear convergence mode

We switch from the quadratic to the linear convergence mode, if the local contraction factor satisfies

$$\Theta_k < \bar{\Theta}. \qquad (1.114)$$

The proof of the previous theorems reveals

$$\|F(x_{k+1}) - r_k\| \le \frac{\omega}{2}\,\|F(x_k) - r_k\|^2 = \frac{1}{2}\,(1 - \eta_k^2)\, h_k\,\|F(x_k)\|. \qquad (1.115)$$

The above inequality (1.115) implies the a posteriori estimate

$$[h_k]_1 := \frac{2\,\|F(x_{k+1}) - r_k\|}{(1 - \eta_k^2)\,\|F(x_k)\|} \le h_k \qquad (1.116)$$

and the a priori estimate

$$[h_{k+1}] := \Theta_k\,[h_k]_1 \le h_{k+1}. \qquad (1.117)$$

Based on (1.117) we define

$$\bar{\eta}_{k+1} := \bar{\Theta} - \tfrac{1}{2}\,[h_{k+1}]. \qquad (1.118)$$

If we find

$$\bar{\eta}_{k+1} < \eta_k \qquad (1.119)$$

with $\eta_k$ from (1.113), we continue the iteration in the quadratic convergence mode. Otherwise, we realize the linear convergence mode with some

$$\eta_{k+1} \le \bar{\eta}_{k+1}. \qquad (1.120)$$
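The mode-dependent choice of the inner GMRES tolerance $\eta_k$ can be sketched in a few lines. This is only an illustrative sketch of the control logic described above: the function names are placeholders, the estimate $[h_k]$ is assumed to be supplied by the convergence monitor, and the quadratic-mode rule (1.113) is solved for $\eta_k$ in closed form (the scalar equation $\eta/(1-\eta^2) = c$ is a quadratic in $\eta$).

```python
import numpy as np

def eta_quadratic(h_est, rho=1.0):
    """Quadratic-mode tolerance, cf. (1.113): solve eta/(1 - eta^2) = c
    with c = (rho/2)*h_est.  Rearranging gives c*eta^2 + eta - c = 0,
    whose root in (0, 1) is returned."""
    c = 0.5 * rho * h_est
    if c == 0.0:
        return 0.0
    return (-1.0 + np.sqrt(1.0 + 4.0 * c * c)) / (2.0 * c)

def eta_linear(h_est, theta_bar):
    """Linear-mode tolerance, cf. (1.118): eta = Theta_bar - h/2,
    clipped at zero."""
    return max(theta_bar - 0.5 * h_est, 0.0)
```

A driver would compare the two candidate tolerances as in (1.119)–(1.120) to decide in which mode the next inner iteration is run.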
Affine Conjugate Inexact Newton Methods

PCG (Preconditioned Conjugate Gradient)

The Preconditioned Conjugate Gradient Method (PCG) is an iterative solver for linear algebraic systems with a symmetric positive definite coefficient matrix $A \in \mathbb{R}^{n \times n}$.

We recall that any symmetric positive definite matrix $C \in \mathbb{R}^{n \times n}$ defines an energy inner product $(\cdot,\cdot)_C$ according to

$$(u, v)_C := (u, Cv), \qquad u, v \in \mathbb{R}^n.$$

The associated energy norm is denoted by $\|\cdot\|_C$. The PCG Method with a symmetric positive definite preconditioner $B \in \mathbb{R}^{n \times n}$ corresponds to the CG Method applied to the transformed linear algebraic system

$$B^{1/2} A B^{1/2}\,(B^{-1/2} y) = B^{1/2} b.$$

The PCG Method is implemented as follows:

PCG Initialization: Given an initial guess $y_0 \in \mathbb{R}^n$, compute the residual $r_0 = b - A y_0$ and the preconditioned residual $\bar{r}_0 = B r_0$, and set

$$p_0 := \bar{r}_0, \qquad \sigma_0 := (r_0, \bar{r}_0) = \|r_0\|_B^2.$$

PCG Iteration Loop: For $0 \le i \le i_{\max}$ compute:

$$\alpha_i = \frac{\|p_i\|_A^2}{\sigma_i}, \qquad y_{i+1} = y_i + \frac{1}{\alpha_i}\, p_i, \qquad r_{i+1} = r_i - \frac{1}{\alpha_i}\, A p_i, \qquad \bar{r}_{i+1} = B r_{i+1},$$

$$\gamma_i^2 = \frac{\sigma_i}{\alpha_i} \;\bigl(= \|y_{i+1} - y_i\|_A^2\bigr), \qquad \sigma_{i+1} = \|r_{i+1}\|_B^2, \qquad p_{i+1} = \bar{r}_{i+1} + \frac{\sigma_{i+1}}{\sigma_i}\, p_i.$$
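The loop above translates almost line by line into NumPy. In this sketch the preconditioner is passed as a function applying $B$ (an assumption of the implementation, convenient when $B$ is only available as an operator), and the quantities $\gamma_i^2 = \sigma_i/\alpha_i$ are accumulated because they drive the energy-error estimates discussed next.

```python
import numpy as np

def pcg(A, b, apply_B, y0, i_max=200, tol=1e-12):
    """PCG exactly as stated above: alpha_i = ||p_i||_A^2 / sigma_i and the
    step length is 1/alpha_i.  Returns the iterate and the array of
    gamma_i^2 = ||y_{i+1} - y_i||_A^2 values."""
    y = np.asarray(y0, dtype=float).copy()
    r = b - A @ y
    r_bar = apply_B(r)
    p = r_bar.copy()
    sigma = float(r @ r_bar)           # sigma_0 = ||r_0||_B^2
    gammas = []
    for _ in range(i_max):
        if np.sqrt(sigma) <= tol:      # ||r_i||_B small -> converged
            break
        Ap = A @ p
        alpha = float(p @ Ap) / sigma  # alpha_i = ||p_i||_A^2 / sigma_i
        y += p / alpha                 # step length 1/alpha_i
        r -= Ap / alpha
        gammas.append(sigma / alpha)   # gamma_i^2
        r_bar = apply_B(r)
        sigma_new = float(r @ r_bar)   # sigma_{i+1} = ||r_{i+1}||_B^2
        p = r_bar + (sigma_new / sigma) * p
        sigma = sigma_new
    return y, np.array(gammas)
```

In exact arithmetic the tail sums of the returned $\gamma_j^2$ reproduce the squared energy error (Lemma 3.2 below), which is precisely what the computable lower bound (1.128) exploits.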
PCG minimizes the energy error norm

$$\|y - y_i\|_A = \min_{z \in y_0 + K_i(\bar{r}_0, A)} \|y - z\|_A, \qquad (1.121)$$

where $K_i(\bar{r}_0, A)$ denotes the Krylov subspace

$$K_i(\bar{r}_0, A) := \mathrm{span}\{\bar{r}_0, \ldots, A^{i-1}\bar{r}_0\}. \qquad (1.122)$$

PCG satisfies the Galerkin orthogonality

$$(y_i - y_0,\, y_{i+m} - y_i)_A = 0, \qquad m \in \mathbb{N}. \qquad (1.123)$$

Denoting by $y \in \mathbb{R}^n$ the unique solution of $Ay = b$ and by $\varepsilon_i := \|y - y_i\|_A^2$ the square of the iteration error in the energy norm, we have the following error representation:

Lemma 3.2 Representation of the iteration error

The PCG iteration error satisfies

$$\varepsilon_i = \sum_{j=i}^{n-1} \gamma_j^2. \qquad (1.124)$$

Proof. For $m = 1$ the Galerkin orthogonality implies the orthogonal decompositions

$$\|y_{i+1} - y_0\|_A^2 = \|y_{i+1} - y_i\|_A^2 + \|y_i - y_0\|_A^2 = \gamma_i^2 + \|y_i - y_0\|_A^2, \qquad (1.125)$$

$$\|y_i - y_0\|_A^2 = \sum_{j=0}^{i-1} \|y_{j+1} - y_j\|_A^2 = \sum_{j=0}^{i-1} \gamma_j^2. \qquad (1.126)$$

On the other hand, observing $y_n = y$, for $m = n - i$ the Galerkin orthogonality yields

$$\|y - y_0\|_A^2 = \sum_{j=0}^{n-1} \gamma_j^2 = \|y - y_i\|_A^2 + \|y_i - y_0\|_A^2 = \varepsilon_i + \sum_{j=0}^{i-1} \gamma_j^2. \qquad (1.127)$$
Computable lower bound for the iteration error

A lower bound for the iteration error in the energy norm is obviously given by

$$[\varepsilon_i] := \sum_{j=i}^{i+m} \gamma_j^2 \le \varepsilon_i. \qquad (1.128)$$

In the inexact Newton PCG method we will control the inner PCG iterations by the relative energy error norms

$$\delta_i = \frac{\|y - y_i\|_A}{\|y_i\|_A} \approx \frac{\sqrt{[\varepsilon_i]}}{\|y_i\|_A} \qquad (1.129)$$

and use the termination criterion

$$\delta_i \le \delta, \qquad (1.130)$$

where $\delta$ is a user specified accuracy.

Convergence of affine conjugate inexact Newton methods

We denote by $\delta x_k \in \mathbb{R}^n$ the result of the inner PCG iteration. As initial value for PCG we choose

$$\delta x_k^0 = 0. \qquad (1.131)$$

Again, we will drop the subindices $i$ for the inner PCG iterations and refer to $\delta_k$ as the final value of the inner iteration error measure at each outer iteration step $k$.

We recall the Galerkin orthogonality (cf. (1.123))

$$(\delta x_k,\, F'(x_k)(\delta x_k - \Delta x_k)) = (\delta x_k, r_k) = 0. \qquad (1.132)$$

Theorem 3.5 Affine conjugate convergence theorem for the inexact Newton PCG method. Part I: Linear convergence

Suppose that $f : D \subset \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable strictly convex functional on $D$ with the first derivative $F := f'$ and the Hessian $F' = f''$, which is symmetric and uniformly positive definite. Assume that $x_0 \in D$ is some initial guess such that the level set

$$L_0 := \{x \in D \mid f(x) \le f(x_0)\}$$
is compact. Let further the following affine conjugate Lipschitz condition be satisfied:

$$\|F'(z)^{-1/2}\,(F'(y) - F'(x))\,v\| \le \omega\,\|F'(x)^{1/2}(y - x)\|\;\|F'(x)^{1/2} v\|, \qquad x, y, z \in D, \quad \omega \ge 0. \qquad (1.133)$$

For the inner Newton PCG iterations consider the exact error terms

$$\varepsilon_k := \|F'(x_k)^{1/2}\,\Delta x_k\|^2$$

and the Kantorovich quantities

$$h_k := \omega\,\|F'(x_k)^{1/2}\,\Delta x_k\|$$

as well as their inexact analogues

$$\varepsilon_k^{\delta} := \|F'(x_k)^{1/2}\,\delta x_k\|^2 = \frac{\varepsilon_k}{1 + \delta_k^2}, \qquad h_k^{\delta} := \omega\,\|F'(x_k)^{1/2}\,\delta x_k\| = \frac{h_k}{\sqrt{1 + \delta_k^2}},$$

where $\delta_k$ characterizes the inner PCG iteration error

$$\delta_k := \frac{\|F'(x_k)^{1/2}\,(\delta x_k - \Delta x_k)\|}{\|F'(x_k)^{1/2}\,\delta x_k\|}.$$

Assume that for some $\bar{\Theta} < 1$

$$h_0 \le 2\,\bar{\Theta} < 2 \qquad (1.134)$$

and that

$$\delta_{k+1} \ge \delta_k, \qquad k \in \mathbb{N}_0, \qquad (1.135)$$

holds true throughout the outer Newton iterations. Control the inner iterations according to

$$\vartheta(h_k^{\delta}, \delta_k) := \frac{h_k^{\delta} + \delta_k\,\bigl(h_k^{\delta} + \sqrt{4 + (h_k^{\delta})^2}\bigr)}{2\,\sqrt{1 + \delta_k^2}} \le \bar{\Theta}. \qquad (1.136)$$
Then, the inexact Newton PCG iterates $x_k$, $k \in \mathbb{N}_0$, stay in $L_0$ and converge linearly to some $x^* \in L_0$ with $f(x^*) = \min_{x \in D} f(x)$. The following estimates hold true:

$$\|F'(x_{k+1})^{1/2}\,\Delta x_{k+1}\| \le \bar{\Theta}\,\|F'(x_k)^{1/2}\,\Delta x_k\|, \qquad k \in \mathbb{N}_0, \qquad (1.137)$$

$$\|F'(x_{k+1})^{1/2}\,\delta x_{k+1}\| \le \bar{\Theta}\,\|F'(x_k)^{1/2}\,\delta x_k\|, \qquad k \in \mathbb{N}_0. \qquad (1.138)$$

Moreover, the objective functional is reduced according to

$$\tfrac{2}{3}\,\varepsilon_k^{\delta} - \tfrac{1}{10}\,h_k^{\delta}\,\varepsilon_k^{\delta} \;\le\; f(x_k) - f(x_{k+1}) \;\le\; \tfrac{2}{3}\,\varepsilon_k^{\delta} + \tfrac{1}{10}\,h_k^{\delta}\,\varepsilon_k^{\delta}. \qquad (1.139)$$

Proof. Observing

$$r_k = F(x_k) + F'(x_k)\,\delta x_k, \qquad k \in \mathbb{N}_0,$$

for $\lambda \in [0, 1]$ we obtain

$$f(x_k + \lambda\,\delta x_k) - f(x_k) = \int_{s=0}^{\lambda} (\delta x_k,\, F(x_k + s\,\delta x_k))\, ds \qquad (1.140)$$

$$= \int_{s=0}^{\lambda} (\delta x_k,\, F(x_k + s\,\delta x_k) - F(x_k))\, ds + \int_{s=0}^{\lambda} (\delta x_k,\, F(x_k))\, ds$$

$$= \int_{s=0}^{\lambda} s \int_{t=0}^{s} (\delta x_k,\, F'(x_k + st\,\delta x_k)\,\delta x_k)\, dt\, ds + \int_{s=0}^{\lambda} (\delta x_k,\, F(x_k))\, ds$$

$$= \int_{s=0}^{\lambda} s \int_{t=0}^{s} \bigl(\delta x_k,\, (F'(x_k + st\,\delta x_k) - F'(x_k))\,\delta x_k\bigr)\, dt\, ds + \int_{s=0}^{\lambda} s \int_{t=0}^{s} (\delta x_k,\, F'(x_k)\,\delta x_k)\, dt\, ds + \int_{s=0}^{\lambda} (\delta x_k,\, \underbrace{F(x_k)}_{=\, r_k - F'(x_k)\,\delta x_k})\, ds =$$
$$= \int_{s=0}^{\lambda} s \int_{t=0}^{s} \underbrace{\bigl(F'(x_k)^{1/2}\delta x_k,\; F'(x_k)^{-1/2}(F'(x_k + st\,\delta x_k) - F'(x_k))\,\delta x_k\bigr)}_{\le\; \omega\, s\, t\, \|F'(x_k)^{1/2}\delta x_k\|^3 \;=\; s\, t\; h_k^{\delta}\, \varepsilon_k^{\delta}}\, dt\, ds$$

$$\qquad + \int_{s=0}^{\lambda} s \int_{t=0}^{s} (\delta x_k,\, F'(x_k)\,\delta x_k)\, dt\, ds - \int_{s=0}^{\lambda} (\delta x_k,\, F'(x_k)\,\delta x_k)\, ds + \int_{s=0}^{\lambda} \underbrace{(\delta x_k,\, r_k)}_{=\, 0 \text{ due to } (1.132)}\, ds$$

$$\le \tfrac{1}{10}\,\lambda^5\, h_k^{\delta}\,\varepsilon_k^{\delta} + \tfrac{1}{3}\,\lambda^3\,\varepsilon_k^{\delta} - \lambda\,\varepsilon_k^{\delta}.$$

It readily follows from (1.140) that

$$f(x_k + \lambda\,\delta x_k) \le f(x_k) + \lambda\,\Bigl(\tfrac{1}{10}\,\lambda^4\, h_k^{\delta}\,\varepsilon_k^{\delta} + \bigl(\tfrac{1}{3}\,\lambda^2 - 1\bigr)\,\varepsilon_k^{\delta}\Bigr). \qquad (1.141)$$

Denoting by $L_k$ the level set

$$L_k := \{x \in D \mid f(x) \le f(x_k)\},$$

by induction on $k$ we prove

$$h_k < 2 \quad \text{and hence} \quad x_{k+1} \in L_k. \qquad (1.142)$$

For $k = 0$, we have $h_0 < 2$ by assumption (1.134). Since $h_0^{\delta} \le h_0$, (1.141) readily shows $f(x_1) < f(x_0)$, whence $x_1 \in L_0$. Now, assuming (1.142) to hold true for some $k \in \mathbb{N}$, again taking advantage of $h_k^{\delta} \le h_k < 2$, (1.141) yields $f(x_{k+1}) < f(x_k)$ and thus $x_{k+1} \in L_k$.

Moreover, choosing $\lambda = 1$ in (1.141), we obtain the left-hand side of the functional descent property (1.139). We note that we get the right-hand side of (1.139), if in (1.140) we estimate by the other direction of the Cauchy-Schwarz inequality.

Finally, in order to prove the contraction properties (1.137), (1.138) and linear convergence, we estimate the local energy norms as follows:

$$\|F'(x_{k+1})^{1/2}\,\Delta x_{k+1}\| = \|F'(x_{k+1})^{-1/2}\, F(x_{k+1})\|.$$
Observing $F(x_k) = -F'(x_k)\,\delta x_k + r_k$, i.e.,

$$F(x_{k+1}) = \int_0^1 \bigl(F'(x_k + t\,\delta x_k) - F'(x_k)\bigr)\,\delta x_k\, dt + r_k,$$

and using the affine conjugate Lipschitz condition, we obtain

$$\|F'(x_{k+1})^{1/2}\,\Delta x_{k+1}\| = \Bigl\|F'(x_{k+1})^{-1/2}\Bigl(\int_0^1 \bigl(F'(x_k + t\,\delta x_k) - F'(x_k)\bigr)\,\delta x_k\, dt + r_k\Bigr)\Bigr\| \qquad (1.143)$$

$$\le \tfrac{1}{2}\,\omega\,\|F'(x_k)^{1/2}\,\delta x_k\|^2 + \|F'(x_{k+1})^{-1/2}\, r_k\|.$$

Setting $z := \delta x_k - \Delta x_k$, for the second term on the right-hand side of the previous inequality we get the implicit estimate

$$\|F'(x_{k+1})^{-1/2}\, r_k\|^2 \le \|F'(x_k)^{1/2}\, z\|^2 + h_k^{\delta}\,\|F'(x_k)^{1/2}\, z\|\;\|F'(x_{k+1})^{-1/2}\, r_k\|,$$

which gives the explicit bound

$$\|F'(x_{k+1})^{-1/2}\, r_k\| \le \tfrac{1}{2}\,\Bigl(h_k^{\delta} + \sqrt{4 + (h_k^{\delta})^2}\Bigr)\,\|F'(x_k)^{1/2}\, z\|. \qquad (1.144)$$

Using (1.144) in (1.143) results in

$$\omega\,\|F'(x_{k+1})^{1/2}\,\Delta x_{k+1}\| \le \tfrac{1}{2}\,\omega^2\,\|F'(x_k)^{1/2}\,\delta x_k\|^2 + \tfrac{1}{2}\,\Bigl(h_k^{\delta} + \sqrt{4 + (h_k^{\delta})^2}\Bigr)\,\underbrace{\omega\,\|F'(x_k)^{1/2}\, z\|}_{=\, \delta_k\, h_k^{\delta}}$$

$$= \tfrac{1}{2}\,\Bigl(h_k^{\delta} + \delta_k\,\bigl(h_k^{\delta} + \sqrt{4 + (h_k^{\delta})^2}\bigr)\Bigr)\, h_k^{\delta}.$$

Taking (1.136) into account and using $h_k = \sqrt{1 + \delta_k^2}\; h_k^{\delta}$, we thus get the contraction factor estimate

$$\Theta_k := \frac{\omega\,\|F'(x_{k+1})^{1/2}\,\Delta x_{k+1}\|}{\omega\,\|F'(x_k)^{1/2}\,\Delta x_k\|} = \frac{h_{k+1}}{h_k} \le \frac{h_k^{\delta} + \delta_k\,\bigl(h_k^{\delta} + \sqrt{4 + (h_k^{\delta})^2}\bigr)}{2\,\sqrt{1 + \delta_k^2}} = \vartheta(h_k^{\delta}, \delta_k) \le \bar{\Theta}, \qquad (1.145)$$
which proves (1.137) and linear convergence.

For the proof of (1.138) we observe

$$\|F'(x_l)^{1/2}\,\Delta x_l\|^2 = (1 + \delta_l^2)\,\|F'(x_l)^{1/2}\,\delta x_l\|^2, \qquad l = k, k+1,$$

as well as $\delta_{k+1} \ge \delta_k$ and obtain

$$\frac{\|F'(x_{k+1})^{1/2}\,\delta x_{k+1}\|}{\|F'(x_k)^{1/2}\,\delta x_k\|} = \sqrt{\frac{1 + \delta_k^2}{1 + \delta_{k+1}^2}}\;\Theta_k \le \Theta_k \le \bar{\Theta}. \qquad (1.146)$$

By standard arguments we further show that the sequence $\{x_k\}_{\mathbb{N}_0}$ of inexact Newton PCG iterates is a Cauchy sequence in $L_0$, and there exists an $x^* \in L_0$ such that $x_k \to x^*$ $(k \to \infty)$ with $F(x^*) = 0$.

Theorem 3.6 Affine conjugate convergence theorem for the inexact Newton PCG method. Part II: Quadratic convergence

Under the same assumptions on $f : D \subset \mathbb{R}^n \to \mathbb{R}$ as in Theorem 3.5, suppose that the initial guess $x_0 \in D$ satisfies

$$h_0^{\delta} \le \rho \qquad (1.147)$$

for some appropriate $\rho > 0$ and control the inner iterations such that

$$\delta_k \le \frac{\rho\, h_k^{\delta}}{h_k^{\delta} + \sqrt{4 + (h_k^{\delta})^2}}. \qquad (1.148)$$

Then, there holds:

(i) The Newton PCG iterates $x_k$, $k \in \mathbb{N}_0$, stay in $L_0$ and converge quadratically to some $x^* \in L_0$ with $F(x^*) = 0$.

(ii) The exact Newton increments and the inexact Newton increments decrease quadratically according to

$$\|F'(x_{k+1})^{1/2}\,\Delta x_{k+1}\| \le \tfrac{1}{2}\,(1 + \rho)\,\omega\,\|F'(x_k)^{1/2}\,\Delta x_k\|^2, \qquad (1.149)$$

$$\|F'(x_{k+1})^{1/2}\,\delta x_{k+1}\| \le \tfrac{1}{2}\,(1 + \rho)\,\omega\,\|F'(x_k)^{1/2}\,\delta x_k\|^2. \qquad (1.150)$$
Proof. Using (1.148) in (1.145) yields

$$\frac{\|F'(x_{k+1})^{1/2}\,\Delta x_{k+1}\|}{\|F'(x_k)^{1/2}\,\Delta x_k\|} \le \frac{h_k^{\delta} + \delta_k\,\bigl(h_k^{\delta} + \sqrt{4 + (h_k^{\delta})^2}\bigr)}{2\,\sqrt{1 + \delta_k^2}} \le \tfrac{1}{2}\,(1 + \rho)\, h_k^{\delta},$$

which proves (1.149) in view of $h_k^{\delta} \le h_k \le h_0 < 2\,\bar{\Theta}$. The proof of (1.150) follows along the same line by using (1.148) in (1.146).

Algorithmic aspects of the affine conjugate inexact Newton PCG method

(i) Convergence monitor

Let us assume that the quantity $\bar{\Theta} < 1$ in both the linear convergence mode and the quadratic convergence mode has been specified, and let us further assume that we use the start iterate $\delta x_k^0 = 0$ in the inner PCG iteration. Denoting by $[\delta_k]$ an easily computable estimate of the relative energy norm iteration error $\delta_k$, we accept a new iterate $x_{k+1}$, if the condition

$$f(x_{k+1}) - f(x_k) \le -\tfrac{1}{10}\,\varepsilon_k = -\tfrac{1}{10}\,(1 + \delta_k^2)\,\varepsilon_k^{\delta} \qquad (1.151)$$

or the monotonicity test

$$\Theta_k := \Bigl(\frac{\varepsilon_{k+1}}{\varepsilon_k}\Bigr)^{1/2} = \Bigl(\frac{(1 + \delta_{k+1}^2)\,\varepsilon_{k+1}^{\delta}}{(1 + \delta_k^2)\,\varepsilon_k^{\delta}}\Bigr)^{1/2} \le \bar{\Theta} < 1 \qquad (1.152)$$

is satisfied. We consider the outer iteration as divergent, if neither (1.151) nor (1.152) holds true.

(ii) Termination criterion

With respect to a user specified accuracy ETOL, the inexact Newton PCG iteration will be terminated, if either

$$\varepsilon_k = (1 + \delta_k^2)\,\varepsilon_k^{\delta} \le \mathrm{ETOL}^2 \qquad (1.153)$$

or

$$f(x_k) - f(x_{k+1}) \le \mathrm{ETOL}^2. \qquad (1.154)$$

(iii) Balancing outer and inner iterations

For $k = 0$, we choose $\delta_0 = [\delta_0] = \tfrac{1}{4}$. As in case of the inexact Newton CGNE iteration, for $k \ge 1$ we begin with the
quadratic convergence mode and switch to the linear convergence mode as soon as the approximate contraction factor $\Theta_k$ is below some prespecified threshold value $\bar{\Theta} \le \tfrac{1}{2}$.

(iii)$_1$ Quadratic convergence mode

A computationally realizable termination criterion for the inner PCG iteration in the quadratic convergence mode is given by

$$\delta_k \le \frac{\rho\,[h_k^{\delta}]}{[h_k^{\delta}] + \sqrt{4 + [h_k^{\delta}]^2}}, \qquad (1.155)$$

where $[h_k^{\delta}]$ is an appropriate a priori estimate of the inexact Kantorovich quantity $h_k^{\delta}$. In view of (1.139), we have the a posteriori estimates

$$[h_k^{\delta}]_2 := \frac{10}{\varepsilon_k^{\delta}}\,\Bigl| f(x_{k+1}) - f(x_k) + \tfrac{2}{3}\,\varepsilon_k^{\delta} \Bigr| \le h_k^{\delta} \qquad (1.156)$$

and

$$[h_k]_2 := \sqrt{1 + \delta_k^2}\;[h_k^{\delta}]_2. \qquad (1.157)$$

We note that (1.157) yields the a priori estimate

$$[h_k] := \Theta_{k-1}\,[h_{k-1}]_2. \qquad (1.158)$$

Using (1.158) in (1.157), for the inexact Kantorovich quantity we obtain the following a priori estimate

$$[h_k^{\delta}] := \frac{[h_k]}{\sqrt{1 + \delta_k^2}}. \qquad (1.159)$$

Inserting (1.159) into (1.155), we obtain a simple nonlinear equation in $\delta_k$.

Remark 3.5 Computational work in the quadratic convergence mode

Since $\delta_k \to 0$ $(k \to \infty)$ is enforced, it follows that: the more the iterates $x_k$ approach the solution $x^*$, the more computational work is required for the inner iterations to guarantee quadratic convergence of the outer iteration.

(iii)$_2$ Linear convergence mode

We switch to the linear convergence mode, if

$$\Theta_k < \bar{\Theta} \qquad (1.160)$$
is satisfied. The computationally realizable termination criterion for the inner iteration in the linear convergence mode is

$$[\vartheta(h_k^{\delta}, \delta_k)] := \vartheta([h_k^{\delta}], \delta_k) \le \bar{\Theta}. \qquad (1.161)$$

Since asymptotically there holds

$$\delta_k \to \frac{\bar{\Theta}}{\sqrt{1 - \bar{\Theta}^2}} \qquad (k \to \infty),$$

we observe:

Remark 3.6 Computational work in the linear convergence mode

The more the iterates $x_k$ approach the solution $x^*$, the less computational work is required for the inner iterations to guarantee linear convergence of the outer iteration.
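The asymptotic value of $\delta_k$ quoted in Remark 3.6 can be checked numerically from the contraction bound $\vartheta$ of (1.136): as $h_k^{\delta} \to 0$, the equation $\vartheta(0, \delta) = \bar{\Theta}$ reduces to $\delta/\sqrt{1+\delta^2} = \bar{\Theta}$, whose solution is $\delta = \bar{\Theta}/\sqrt{1-\bar{\Theta}^2}$. This is a small illustrative sketch assuming the form of $\vartheta$ stated in (1.136).

```python
import numpy as np

def vartheta(h_delta, delta):
    """Contraction bound from (1.136):
    [h + delta*(h + sqrt(4 + h^2))] / (2*sqrt(1 + delta^2))."""
    return (h_delta + delta * (h_delta + np.sqrt(4.0 + h_delta**2))) \
           / (2.0 * np.sqrt(1.0 + delta**2))
```

For $h^{\delta} = 0$ and $\delta = \bar{\Theta}/\sqrt{1-\bar{\Theta}^2}$ the bound evaluates exactly to $\bar{\Theta}$, which is why the inner accuracy requirement stagnates (rather than tightens) in the linear convergence mode.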
4. Quasi-Newton Methods

4.1 Introduction

Given $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$ as well as $x_k, x_{k+1} \in D$, $x_k \ne x_{k+1}$, the idea is to approximate $F$ locally around $x_{k+1}$ by an affine function

$$S_{k+1}(x) := F(x_{k+1}) + J_{k+1}\,(x - x_{k+1}), \qquad J_{k+1} \in \mathbb{R}^{n \times n}, \qquad (1.162)$$

such that

$$S_{k+1}(x_k) = F(x_k). \qquad (1.163)$$

The requirement (1.163) gives rise to the so-called secant condition

$$J\,\underbrace{(x_{k+1} - x_k)}_{=:\, \delta x_k} = \underbrace{F(x_{k+1}) - F(x_k)}_{=:\, y_k}. \qquad (1.164)$$

The matrix $J$ is not uniquely determined by (1.164), since

$$\dim S_{k+1} = (n - 1)\, n, \qquad (1.165)$$

where

$$S_{k+1} := \{J \in \mathbb{R}^{n \times n} \mid J\,\delta x_k = y_k\}. \qquad (1.166)$$

There are different criteria to select an appropriate $J \in S_{k+1}$.

4.1.1 The Good Broyden rank 1 update

Let us consider the change in the affine model as given by

$$S_{k+1}(x) - S_k(x) = (J_{k+1} - J_k)\,(x - x_k). \qquad (1.167)$$

An appropriate idea is to choose $J_{k+1} \in S_{k+1}$ such that there is a least change in the affine model in the sense

$$\|J_{k+1} - J_k\|_F = \min_{J \in S_{k+1}} \|J - J_k\|_F, \qquad (1.168)$$

where $\|\cdot\|_F$ stands for the Frobenius norm (observe $J = (J_{ik})_{i,k=1}^n$)

$$\|J\|_F := \Bigl(\sum_{i,k=1}^n J_{ik}^2\Bigr)^{1/2}. \qquad (1.169)$$
The solution of (1.168) can be heuristically motivated as follows: Choose $t_k \perp \delta x_k$ such that

$$x - x_k = \alpha\,\delta x_k + t_k.$$

Then, (1.167) reads

$$S_{k+1}(x) - S_k(x) = \alpha\,(J_{k+1} - J_k)\,\delta x_k + (J_{k+1} - J_k)\, t_k = \alpha\,(y_k - J_k\,\delta x_k) + (J_{k+1} - J_k)\, t_k. \qquad (1.170)$$

Now, choose $J_{k+1} \in S_{k+1}$ such that

$$(J_{k+1} - J_k)\, t_k = 0.$$

It follows that

$$\mathrm{rank}\,(J_{k+1} - J_k) = 1, \qquad J_{k+1} - J_k = v_k\,(\delta x_k)^T. \qquad (1.171)$$

Inserting (1.171) into (1.170) yields

$$\alpha\, v_k\,(\delta x_k)^T \delta x_k = \alpha\,(y_k - J_k\,\delta x_k),$$

which results in

$$v_k = \frac{y_k - J_k\,\delta x_k}{(\delta x_k)^T \delta x_k}.$$

Altogether, this gives us Broyden's rank 1 update (Good Broyden)

$$J_{k+1} = J_k + \frac{\bigl[F(x_{k+1}) - F(x_k) - J_k\,\delta x_k\bigr]\,(\delta x_k)^T}{(\delta x_k)^T \delta x_k}. \qquad (1.172)$$

For the solution of nonlinear systems, we are more interested in updates of the inverse of $J_k$. Such an update can be provided by the Sherman-Morrison-Woodbury formula

$$(A + u\, v^T)^{-1} = A^{-1} - \frac{A^{-1} u\, v^T A^{-1}}{1 + v^T A^{-1} u}. \qquad (1.173)$$

Setting

$$A := J_k, \qquad u := F(x_{k+1}) - F(x_k) - J_k\,\delta x_k, \qquad v := \frac{\delta x_k}{(\delta x_k)^T \delta x_k},$$

we obtain

$$J_{k+1}^{-1} = J_k^{-1} + \frac{\bigl[\delta x_k - J_k^{-1}\,(F(x_{k+1}) - F(x_k))\bigr]\,(\delta x_k)^T J_k^{-1}}{(\delta x_k)^T J_k^{-1}\,\bigl[F(x_{k+1}) - F(x_k)\bigr]}. \qquad (1.174)$$
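The update (1.172) together with its inverse form (1.174) can be turned into a compact quasi-Newton iteration. The following sketch maintains $J_k^{-1}$ directly via the Sherman-Morrison formula instead of refactorizing in each step; the test problem and starting data below are made up purely for illustration.

```python
import numpy as np

def broyden_good(F, x0, J0, max_iter=50, tol=1e-10):
    """Quasi-Newton iteration with Broyden's 'good' rank-1 update (1.172),
    keeping the inverse J_k^{-1} up to date via (1.174)."""
    x = np.asarray(x0, dtype=float).copy()
    Jinv = np.linalg.inv(J0)
    Fx = F(x)
    for _ in range(max_iter):
        dx = -Jinv @ Fx                    # solve J_k dx_k = -F(x_k)
        x_new = x + dx
        Fx_new = F(x_new)
        if np.linalg.norm(Fx_new) <= tol:
            return x_new
        y = Fx_new - Fx                    # y_k = F(x_{k+1}) - F(x_k)
        Jinv_y = Jinv @ y
        # (1.174): J_{k+1}^{-1} = J_k^{-1}
        #   + (dx - J_k^{-1} y) (dx^T J_k^{-1}) / (dx^T J_k^{-1} y)
        Jinv += np.outer(dx - Jinv_y, dx @ Jinv) / (dx @ Jinv_y)
        x, Fx = x_new, Fx_new
    return x
```

A production implementation would additionally guard against a near-zero denominator $(\delta x_k)^T J_k^{-1} y_k$ and monitor the contraction of the increments, as discussed in Section 4.2.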
4.1.2 The Bad Broyden rank 1 update

Instead of (1.168), an alternative is to choose $J_{k+1} \in S_{k+1}$ such that there is a least change in the solution of the affine model, i.e.,

$$\|J_{k+1}^{-1} - J_k^{-1}\|_F = \min_{J \in S_{k+1}} \|J^{-1} - J_k^{-1}\|_F. \qquad (1.175)$$

Similar considerations as before lead us to Broyden's alternative rank 1 update (Bad Broyden)

$$J_{k+1}^{-1} = J_k^{-1} + \frac{\bigl[\delta x_k - J_k^{-1}\,(F(x_{k+1}) - F(x_k))\bigr]\,\bigl(F(x_{k+1}) - F(x_k)\bigr)^T}{\bigl(F(x_{k+1}) - F(x_k)\bigr)^T\,\bigl(F(x_{k+1}) - F(x_k)\bigr)}. \qquad (1.176)$$

4.2 Affine covariant Quasi-Newton method

4.2.1 Affine covariant Quasi-Newton convergence theory

Affine covariant Quasi-Newton methods require the secant condition (1.164) to be stated by means of affine covariant terms in the domain of definition of the nonlinear mapping $F$. Observing that we compute the Quasi-Newton increment $\delta x_k$ as the solution of

$$J_k\,\delta x_k = -F(x_k), \qquad (1.177)$$

we can rewrite (1.164) according to

$$(J - J_k)\,\delta x_k = F(x_{k+1}).$$

Multiplication by $J_k^{-1}$ yields the affine covariant secant condition

$$\overline{\delta x}_{k+1} := \underbrace{(I - J_k^{-1} J)}_{=:\, E_k(J)}\,\delta x_k = -J_k^{-1}\, F(x_{k+1}). \qquad (1.178)$$

We note that any rank 1 update of the form

$$J_{k+1} = J_k\,\Bigl(I - \frac{\overline{\delta x}_{k+1}\, v^T}{v^T \delta x_k}\Bigr), \qquad v \in \mathbb{R}^n \setminus \{0\}, \qquad (1.179)$$

satisfies the affine covariant secant condition (1.178). In particular, for $v = \delta x_k$ we recover the Good Broyden update.
Theorem 4.1 Properties of the affine covariant Quasi-Newton method

For Broyden's affine covariant rank 1 update (Good Broyden)

$$J_{k+1} = J_k\,\Bigl(I - \frac{\overline{\delta x}_{k+1}\,(\delta x_k)^T}{\|\delta x_k\|^2}\Bigr) \qquad (1.180)$$

assume that the local contraction condition

$$\Theta_k = \frac{\|\overline{\delta x}_{k+1}\|}{\|\delta x_k\|} < \frac{1}{2} \qquad (1.181)$$

is satisfied. Then, there holds:

(i) The update matrix $J_{k+1}$ is a least change update in the sense that

$$\|E_k(J_{k+1})\| \le \|E_k(J)\|, \qquad J \in S_{k+1}, \qquad (1.182)$$

$$\|E_k(J_{k+1})\| \le \Theta_k. \qquad (1.183)$$

(ii) If $J_k$ is regular, then $J_{k+1}$ is regular as well with the inverse given by

$$J_{k+1}^{-1} = \Bigl(I + \frac{\overline{\delta x}_{k+1}\,(\delta x_k)^T}{(1 - \alpha_{k+1})\,\|\delta x_k\|^2}\Bigr)\, J_k^{-1}, \qquad (1.184)$$

where

$$\alpha_{k+1} = \frac{(\delta x_k)^T\, \overline{\delta x}_{k+1}}{\|\delta x_k\|^2}, \qquad |\alpha_{k+1}| < \frac{1}{2}.$$

(iii) The Quasi-Newton increment $\delta x_{k+1}$ is given by

$$\delta x_{k+1} = -J_{k+1}^{-1}\, F(x_{k+1}) = \frac{\overline{\delta x}_{k+1}}{1 - \alpha_{k+1}}. \qquad (1.185)$$

(iv) The Quasi-Newton increments decrease according to

$$\frac{\|\delta x_{k+1}\|}{\|\delta x_k\|} \le \frac{\Theta_k}{1 - \alpha_{k+1}} < 1. \qquad (1.186)$$

Proof. In view of (1.178) we have

$$E_k(J_{k+1}) = \frac{\overline{\delta x}_{k+1}\,(\delta x_k)^T}{\|\delta x_k\|^2} = E_k(J)\,\frac{\delta x_k\,(\delta x_k)^T}{\|\delta x_k\|^2}, \qquad \text{hence} \quad \|E_k(J_{k+1})\| \le \|E_k(J)\|, \quad J \in S_{k+1},$$
which proves (1.182). Moreover, (1.183) follows readily from

$$\|E_k(J_{k+1})\| = \frac{\|\overline{\delta x}_{k+1}\,(\delta x_k)^T\|}{\|\delta x_k\|^2} \le \frac{\|\overline{\delta x}_{k+1}\|}{\|\delta x_k\|} = \Theta_k.$$

The same argument shows

$$|\alpha_{k+1}| \le \Theta_k < \frac{1}{2},$$

and hence, (1.186) follows from

$$\frac{\|\delta x_{k+1}\|}{\|\delta x_k\|} = \frac{\Theta_k}{1 - \alpha_{k+1}} \le \frac{\Theta_k}{1 - \Theta_k} < 1.$$

Finally, the proofs of (ii) and (iii) are direct consequences of the Sherman-Morrison-Woodbury formula (1.173).

Theorem 4.2 Convergence of the affine covariant Quasi-Newton method

Suppose that $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$, $D \subset \mathbb{R}^n$ convex, is continuously differentiable on $D$. Let $x^* \in D$ be the unique solution of $F(x) = 0$ in $D$ with invertible Jacobian $F'(x^*)$. Assume that the following affine covariant Lipschitz condition is satisfied

$$\|F'(x^*)^{-1}\,\bigl(F'(x + v) - F'(x)\bigr)\, v\| \le \omega\,\|v\|^2, \qquad (1.187)$$

where $x,\, x + v \in D$, $v \in \mathbb{R}^n$. For some $0 < \Theta < 1$ assume further that:

(a) The initial approximate Jacobian $J_0$ satisfies

$$\delta_0 := \|F'(x^*)^{-1}\,(J_0 - F'(x_0))\| < \frac{\Theta}{1 + \Theta}. \qquad (1.188)$$

(b) The initial guess $x_0 \in D$ satisfies

$$t_0 := \omega\,\|x_0 - x^*\| \le \frac{1 - \Theta}{2 - \Theta}\,\Bigl(\frac{\Theta}{1 + \Theta} - \delta_0\Bigr). \qquad (1.189)$$

Then, there holds:

(i) The Quasi-Newton iterates $x_k$, $k \in \mathbb{N}_0$, converge to $x^*$ according to

$$\|x_{k+1} - x^*\| < \Theta\,\|x_k - x^*\|, \qquad (1.190)$$
More informationLinear Algebra Massoud Malek
CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product
More informationSuppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.
Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of
More information1. Search Directions In this chapter we again focus on the unconstrained optimization problem. lim sup ν
1 Search Directions In this chapter we again focus on the unconstrained optimization problem P min f(x), x R n where f : R n R is assumed to be twice continuously differentiable, and consider the selection
More informationExercise Solutions to Functional Analysis
Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n
More informationStatic unconstrained optimization
Static unconstrained optimization 2 In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e. min f(x) x X ad (2.) with X ad R
More informationHYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION. Darin Griffin Mohr. An Abstract
HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION by Darin Griffin Mohr An Abstract Of a thesis submitted in partial fulfillment of the requirements for the Doctor of
More informationThe Conjugate Gradient Method
The Conjugate Gradient Method The minimization problem We are given a symmetric positive definite matrix R n n and a right hand side vector b R n We want to solve the linear system Find u R n such that
More informationIterative Methods for Linear Systems of Equations
Iterative Methods for Linear Systems of Equations Projection methods (3) ITMAN PhD-course DTU 20-10-08 till 24-10-08 Martin van Gijzen 1 Delft University of Technology Overview day 4 Bi-Lanczos method
More informationChapter 7. Iterative methods for large sparse linear systems. 7.1 Sparse matrix algebra. Large sparse matrices
Chapter 7 Iterative methods for large sparse linear systems In this chapter we revisit the problem of solving linear systems of equations, but now in the context of large sparse systems. The price to pay
More informationPreconditioned inverse iteration and shift-invert Arnoldi method
Preconditioned inverse iteration and shift-invert Arnoldi method Melina Freitag Department of Mathematical Sciences University of Bath CSC Seminar Max-Planck-Institute for Dynamics of Complex Technical
More informationOptimization and Optimal Control in Banach Spaces
Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,
More informationIterative Solution of a Matrix Riccati Equation Arising in Stochastic Control
Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control Chun-Hua Guo Dedicated to Peter Lancaster on the occasion of his 70th birthday We consider iterative methods for finding the
More informationInexact Newton Methods Applied to Under Determined Systems. Joseph P. Simonis. A Dissertation. Submitted to the Faculty
Inexact Newton Methods Applied to Under Determined Systems by Joseph P. Simonis A Dissertation Submitted to the Faculty of WORCESTER POLYTECHNIC INSTITUTE in Partial Fulfillment of the Requirements for
More informationSimple Iteration, cont d
Jim Lambers MAT 772 Fall Semester 2010-11 Lecture 2 Notes These notes correspond to Section 1.2 in the text. Simple Iteration, cont d In general, nonlinear equations cannot be solved in a finite sequence
More informationLecture 3: Inexact inverse iteration with preconditioning
Lecture 3: Department of Mathematical Sciences CLAPDE, Durham, July 2008 Joint work with M. Freitag (Bath), and M. Robbé & M. Sadkane (Brest) 1 Introduction 2 Preconditioned GMRES for Inverse Power Method
More informationSECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS
SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss
More informationLecture 11: CMSC 878R/AMSC698R. Iterative Methods An introduction. Outline. Inverse, LU decomposition, Cholesky, SVD, etc.
Lecture 11: CMSC 878R/AMSC698R Iterative Methods An introduction Outline Direct Solution of Linear Systems Inverse, LU decomposition, Cholesky, SVD, etc. Iterative methods for linear systems Why? Matrix
More informationComputational Linear Algebra
Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 3: Iterative Methods PD
More informationA derivative-free nonmonotone line search and its application to the spectral residual method
IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral
More informationLevenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation
Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation E. Bergou S. Gratton L. N. Vicente June 26, 204 Abstract The Levenberg-Marquardt
More informationNumerical Methods for Differential Equations Mathematical and Computational Tools
Numerical Methods for Differential Equations Mathematical and Computational Tools Gustaf Söderlind Numerical Analysis, Lund University Contents V4.16 Part 1. Vector norms, matrix norms and logarithmic
More informationA nonlinear equation is any equation of the form. f(x) = 0. A nonlinear equation can have any number of solutions (finite, countable, uncountable)
Nonlinear equations Definition A nonlinear equation is any equation of the form where f is a nonlinear function. Nonlinear equations x 2 + x + 1 = 0 (f : R R) f(x) = 0 (x cos y, 2y sin x) = (0, 0) (f :
More information1 Directional Derivatives and Differentiability
Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=
More information1 Conjugate gradients
Notes for 2016-11-18 1 Conjugate gradients We now turn to the method of conjugate gradients (CG), perhaps the best known of the Krylov subspace solvers. The CG iteration can be characterized as the iteration
More informationSOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA
1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization
More informationTHE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS
THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS RALPH HOWARD DEPARTMENT OF MATHEMATICS UNIVERSITY OF SOUTH CAROLINA COLUMBIA, S.C. 29208, USA HOWARD@MATH.SC.EDU Abstract. This is an edited version of a
More informationAn Iteratively Regularized Projection Method with Quadratic Convergence for Nonlinear Ill-posed Problems
Int. Journal of Math. Analysis, Vol. 4, 1, no. 45, 11-8 An Iteratively Regularized Projection Method with Quadratic Convergence for Nonlinear Ill-posed Problems Santhosh George Department of Mathematical
More informationAn Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods
An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This
More informationStability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games
Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,
More informationSTOP, a i+ 1 is the desired root. )f(a i) > 0. Else If f(a i+ 1. Set a i+1 = a i+ 1 and b i+1 = b Else Set a i+1 = a i and b i+1 = a i+ 1
53 17. Lecture 17 Nonlinear Equations Essentially, the only way that one can solve nonlinear equations is by iteration. The quadratic formula enables one to compute the roots of p(x) = 0 when p P. Formulas
More informationTrust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization
Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization Denis Ridzal Department of Computational and Applied Mathematics Rice University, Houston, Texas dridzal@caam.rice.edu
More informationNumerical Methods I Solving Nonlinear Equations
Numerical Methods I Solving Nonlinear Equations Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 16th, 2014 A. Donev (Courant Institute)
More informationQuasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS)
Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno (BFGS) Limited memory BFGS (L-BFGS) February 6, 2014 Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb
More informationApplied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic
Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give
More informationIterative methods for Linear System
Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline Basics: Matrices and their properties Eigenvalues, Condition Number Iterative Methods Direct and
More informationLevenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation
Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation E. Bergou S. Gratton L. N. Vicente May 24, 206 Abstract The Levenberg-Marquardt
More informationChapter 8 Gradient Methods
Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point
More informationSelf-Concordant Barrier Functions for Convex Optimization
Appendix F Self-Concordant Barrier Functions for Convex Optimization F.1 Introduction In this Appendix we present a framework for developing polynomial-time algorithms for the solution of convex optimization
More informationA full-newton step infeasible interior-point algorithm for linear programming based on a kernel function
A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function Zhongyi Liu, Wenyu Sun Abstract This paper proposes an infeasible interior-point algorithm with
More informationMathematics Department Stanford University Math 61CM/DM Inner products
Mathematics Department Stanford University Math 61CM/DM Inner products Recall the definition of an inner product space; see Appendix A.8 of the textbook. Definition 1 An inner product space V is a vector
More information