Nonlinear equations

Norms for R^n

Assume that $X$ is a vector space. A norm is a mapping $\|\cdot\|\colon X \to \mathbb{R}$ such that for all $x, y \in X$, $\alpha \in \mathbb{R}$:
- $\|x\| \ge 0$, and $\|x\| = 0 \iff x = 0$
- $\|\alpha x\| = |\alpha| \, \|x\|$
- $\|x + y\| \le \|x\| + \|y\|$

We define the following norms on the vector space $\mathbb{R}^n$:
\[ \|x\|_1 = |x_1| + \cdots + |x_n|, \qquad \|x\|_2 = \big( |x_1|^2 + \cdots + |x_n|^2 \big)^{1/2}, \qquad \|x\|_\infty = \max\{ |x_1|, \ldots, |x_n| \} \]

A matrix $A \in \mathbb{R}^{n \times n}$ corresponds to a linear mapping from $\mathbb{R}^n$ to $\mathbb{R}^n$. If $\|x\|$ denotes a vector norm for $x \in \mathbb{R}^n$ we can define the matrix norm $\|A\|$ as follows:
\[ \|A\| = \sup_{x \in \mathbb{R}^n,\ x \ne 0} \frac{\|Ax\|}{\|x\|} = \sup_{x \in \mathbb{R}^n,\ \|x\| = 1} \|Ax\| \]
Note that $S = \{x \in \mathbb{R}^n : \|x\| = 1\}$ is a compact set, hence there exists a point $\bar x \in S$ with $\|A \bar x\| = \sup_{x \in \mathbb{R}^n,\ \|x\|=1} \|Ax\|$, and we can write $\max$ instead of $\sup$.

Lemma 1. For $E \in \mathbb{R}^{n \times n}$ with $\|E\| < 1$ the matrix $I + E$ is invertible and
\[ \|(I + E)^{-1}\| \le \frac{1}{1 - \|E\|}. \]
Proof: $\|x\| = \|(I + E)x - Ex\| \le \|(I + E)x\| + \|E\| \, \|x\|$, hence $\|(I + E)x\| \ge (1 - \|E\|) \|x\|$. In particular $(I+E)x = 0$ forces $x = 0$, so $I + E$ is invertible, and setting $y = (I+E)x$ gives $\|(I+E)^{-1} y\| \le \|y\| / (1 - \|E\|)$.

Convergence orders for iterative methods

We want to approximate $x_* \in \mathbb{R}^n$. An iterative method gives a sequence $x_0, x_1, x_2, x_3, \ldots$. We say the method converges if $\lim_{k \to \infty} x_k = x_*$. We can specify more precisely how quickly it converges by looking at quotients
\[ \frac{\|x_{k+1} - x_*\|}{\|x_k - x_*\|^\alpha} \qquad \text{for } \alpha \ge 1. \]
This leads to so-called q-orders of convergence (where q stands for quotient):
- the method converges with q-order 1 or q-linearly if there exists $C < 1$ such that for all $k$: $\|x_{k+1} - x_*\| \le C \|x_k - x_*\|$
- the method converges with q-order 2 or q-quadratically if $\lim_{k\to\infty} x_k = x_*$ and there exists $C$ such that for all $k$: $\|x_{k+1} - x_*\| \le C \|x_k - x_*\|^2$
- the method converges q-superlinearly if
\[ \lim_{k \to \infty} \frac{\|x_{k+1} - x_*\|}{\|x_k - x_*\|} = 0 \]

Example: We consider a nonlinear equation $f(x) = 0$ where $x \in D \subset \mathbb{R}$. Assume
- $f(x_*) = 0$
- the derivatives $f'$ and $f''$ exist around $x_*$ and are continuous
- $f'(x_*) \ne 0$

Then there exists $\epsilon > 0$ such that
- for $|x_0 - x_*| < \epsilon$ the Newton method converges q-quadratically
- for $|x_0 - x_*| < \epsilon$ and $|x_1 - x_*| < \epsilon$ the secant method converges with q-order $\alpha = \frac{\sqrt 5 + 1}{2} \approx 1.618$. Hence it converges q-superlinearly.

In many cases we cannot prove that the error improves for every single step from $k$ to $k+1$, but we still have upper bounds $\|x_k - x_*\| \le E_k$ where $E_k$ converges with a certain q-order to zero. This leads to so-called r-orders of convergence:
- a method converges with r-order 1 or r-linearly if $\|x_k - x_*\| \le E_k$ where $E_k$ converges with q-order 1 to $0$
- a method converges with r-order 2 or r-quadratically if $\|x_k - x_*\| \le E_k$ where $E_k$ converges with q-order 2 to $0$
- a method converges r-superlinearly if $\|x_k - x_*\| \le E_k$ where $E_k$ converges q-superlinearly to $0$

Example: We consider a nonlinear equation $f(x) = 0$ where
- $f$ is continuous on the interval $[a_0, b_0]$
- $f(a_0) \, f(b_0) < 0$

Then the bisection method gives a sequence of intervals $[a_k, b_k]$. Let $c_k = (a_k + b_k)/2$. Then the sequence $c_k$ converges r-linearly to a point $x_*$ with $f(x_*) = 0$:
\[ |c_k - x_*| \le E_k = \frac{b_0 - a_0}{2^{k+1}} \]
Note that in general we do not have q-linear convergence of $c_k$ for the bisection method.

Derivatives

Let $X$ and $Y$ denote normed vector spaces. Consider a mapping $f$ from $D \subset X$ to $Y$. We say that $f$ is Frechet differentiable at a point $x_0 \in D$ if there exists a linear mapping $A\colon X \to Y$ such that
\[ \frac{\|f(x) - f(x_0) - A(x - x_0)\|}{\|x - x_0\|} \to 0 \qquad \text{as } x \to x_0. \]

Lipschitz conditions

We say a function $f$ satisfies a Lipschitz condition with constant $L$ on $D$ if
\[ \|f(x) - f(y)\| \le L \|x - y\| \qquad \text{for all } x, y \in D. \]

Lemma 2. Assume
- $D$ is convex
- $f'(x)$ exists and is continuous on $D$
- $\|f'(x)\| \le L$ for $x \in D$

Then $f$ satisfies a Lipschitz condition on $D$ with constant $L$.
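The bisection method described above is easy to make concrete. Here is a minimal sketch in Python (the helper name `bisect` and the example equation $x^2 - 2 = 0$ are ours, not from the notes); after $k$ steps the midpoint satisfies $|c_k - x_*| \le (b_0 - a_0)/2^{k+1}$, i.e. r-linear convergence with rate $1/2$.

```python
# Bisection sketch: find x* with f(x*) = 0, given f continuous on [a, b]
# and f(a) * f(b) < 0 (the assumptions of the example above).
def bisect(f, a, b, steps=50):
    assert f(a) * f(b) < 0
    for k in range(steps):
        c = 0.5 * (a + b)        # midpoint c_k
        if f(a) * f(c) <= 0:     # sign change in [a_k, c_k]
            b = c
        else:                    # sign change in [c_k, b_k]
            a = c
    return 0.5 * (a + b)

# Example: f(x) = x^2 - 2 on [1, 2]; the root is sqrt(2).
root = bisect(lambda x: x * x - 2.0, 1.0, 2.0)
print(root)  # close to 1.41421356...
```

Note that the interval halving guarantees only the bound $E_k$; the midpoints themselves need not improve at every step, which is exactly why the convergence is r-linear rather than q-linear.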
Taylor remainder terms

For a function $f\colon D \to \mathbb{R}$ with $D \subset \mathbb{R}$ we have the estimates
\[ |f(x+h) - f(x)| \le \Big( \max_{u \in \operatorname{conv}(x,\,x+h)} |f'(u)| \Big) |h| \qquad \text{if } f \in C^1(D), \]
\[ |f(x+h) - f(x) - f'(x)h| \le \Big( \max_{u \in \operatorname{conv}(x,\,x+h)} |f''(u)| \Big) \frac{|h|^2}{2} \qquad \text{if } f \in C^2(D). \]

We would like to have analogous estimates for a function $f\colon D \to \mathbb{R}^n$ with $D \subset \mathbb{R}^n$. Assume that $f \in C^1(D)$. Then the derivative exists in the Frechet sense:
\[ f(x+h) = f(x) + Df(x)\,h + R, \qquad \|R\| = o(\|h\|) \ \text{for } h \to 0 \]
where $Df(x)\colon \mathbb{R}^n \to \mathbb{R}^n$ is a linear mapping $h \mapsto Df(x)\,h$. Since linear mappings correspond to matrices in $\mathbb{R}^{n\times n}$ we use the notation
\[ f(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_n(x) \end{pmatrix}, \qquad Df(x) = f'(x) = \begin{pmatrix} \partial_1 f_1(x) & \cdots & \partial_n f_1(x) \\ \vdots & & \vdots \\ \partial_1 f_n(x) & \cdots & \partial_n f_n(x) \end{pmatrix} \]
with $\partial_i = \frac{\partial}{\partial x_i}$. For $y = Df(x)\,h$ we have
\[ y_i = \sum_{j=1}^n \partial_j f_i(x)\, h_j, \qquad |y_i| \le \Big( \sum_{j=1}^n |\partial_j f_i(x)| \Big) \|h\|_\infty, \qquad \|y\|_\infty \le \Big( \max_{i=1,\ldots,n} \sum_{j=1}^n |\partial_j f_i(x)| \Big) \|h\|_\infty, \]
hence
\[ \|Df(x)\|_\infty = \max_{i=1,\ldots,n} \sum_{j=1}^n |\partial_j f_i(x)|. \]

Assume that $D$ is convex and $x, y \in D$. Let $h := y - x$ and let $\operatorname{conv}(x,y) \subset D$ denote the line segment between $x$ and $y$. We have by the fundamental theorem of calculus and the chain rule for $F(t) := f(x + th)$, $F'(t) = Df(x+th)\,h$:
\[ f(y) - f(x) = F(1) - F(0) = \int_0^1 F'(t)\,dt = \int_0^1 f'(x+th)\,h\,dt \]
\[ \|f(y) - f(x)\| \le \int_0^1 \|f'(x+th)\| \, \|h\| \, dt \le \Big( \max_{u \in \operatorname{conv}(x,y)} \|f'(u)\| \Big) \|y - x\|. \]
This shows that the Lipschitz constant for $f$ in $D$ is bounded by $\max_{u \in D} \|f'(u)\|$. Note that we first choose a vector norm, e.g., $\|v\|_\infty = \max_{j=1,\ldots,n} |v_j|$. For $A \in \mathbb{R}^{n\times n}$ this induces a matrix norm:
\[ \|A\|_\infty = \max_{\|v\|_\infty = 1} \|Av\|_\infty = \max_i \sum_j |A_{ij}|, \qquad \|f'(x)\|_\infty = \max_{\|v\|_\infty = 1} \|f'(x)v\|_\infty = \max_i \sum_j |\partial_j f_i(x)|. \]

We can read this as an estimate for the Taylor remainder term of order 1: $f(y) = f(x) + R_1$ with $\|R_1\| \le \big( \max_{u \in \operatorname{conv}(x,y)} \|f'(u)\| \big) \|y - x\|$. Now we want to consider the next Taylor term and remainder $f(y) = f(x) + f'(x)(y-x) + R_2$:
\[ R_2 = f(y) - f(x) - f'(x)(y-x) = \int_0^1 \big[ f'(x + t(y-x)) - f'(x) \big] (y-x) \, dt. \]
Assume that $f'$ satisfies a Lipschitz condition for $x, y \in D$:
\[ \|f'(y) - f'(x)\| \le \gamma \|y - x\| \qquad \text{for } x, y \in D, \]
then we obtain
\[ \|R_2\| \le \int_0^1 \gamma t \, \|y-x\|^2 \, dt = \frac{\gamma}{2} \|y - x\|^2. \]

Now we assume that $f \in C^2(D)$: For a fixed $u \in \mathbb{R}^n$ let $G(x) := Df(x)\,u$. Then $DG(x)\,v = D^2 f(x)[u, v]$ where $y = D^2 f(x)[u,v]$ means
\[ y_i = \sum_{j,k=1}^n (\partial_k \partial_j f_i)(x)\, u_j v_k, \qquad |y_i| \le \Big( \sum_{j,k=1}^n |\partial_k \partial_j f_i(x)| \Big) \|u\|_\infty \|v\|_\infty, \qquad \|D^2 f(x)\|_\infty = \max_{i=1,\ldots,n} \sum_{j,k=1}^n |\partial_k \partial_j f_i(x)|. \]
Now we have for any $u \in \mathbb{R}^n$ and $h = y - x$, with $G(t) := Df(x+th)\,u$, $G'(t) = D^2 f(x+th)[u, h]$:
\[ [Df(y) - Df(x)]\,u = G(1) - G(0) = \int_0^1 G'(t)\,dt = \int_0^1 D^2 f(x+th)[u,h]\,dt \]
\[ \|Df(y) - Df(x)\| \le \Big( \max_{z \in D} \|D^2 f(z)\| \Big) \|y - x\|. \]

Local convergence results for fixed point iteration

Assume that $D$ is open, so that $x_* \in D$ implies that a ball $B_\delta(x_*) \subset D$.

Theorem 3. Assume $g\colon D \to \mathbb{R}^n$ with $D \subset \mathbb{R}^n$, and $g(x_*) = x_*$ for $x_* \in D$. If $g \in C^1(D)$ and $\|g'(x_*)\| < 1$ then the fixed point iteration $x_{k+1} := g(x_k)$ is locally convergent: There exists $\delta > 0$ so that for $\|x_0 - x_*\| < \delta$ there holds $\lim_{k\to\infty} x_k = x_*$.

Proof: Since $g'$ is continuous there exist $\delta > 0$, $q < 1$ so that $\|g'(x)\| \le q$ for $x \in B_\delta := \{x : \|x - x_*\| < \delta\}$. Assume that $x_k \in B_\delta$. Then
\[ \|x_{k+1} - x_*\| = \|g(x_k) - g(x_*)\| \le \Big( \max_{u \in B_\delta} \|g'(u)\| \Big) \|x_k - x_*\| \le q \, \|x_k - x_*\|. \]
Hence $x_{k+1} \in B_\delta$. Then $x_0 \in B_\delta$ implies $\|x_k - x_*\| \le q^k \|x_0 - x_*\|$ and therefore $\lim_{k\to\infty} x_k = x_*$.

Remark: For a chosen vector norm (e.g., $\|\cdot\|_\infty$) and the induced matrix norm one may have $\|g'(x_*)\| > 1$, but for a different vector norm and the induced matrix norm one could still have $\|g'(x_*)\| < 1$. The spectral radius $\rho(A) := \max_j |\lambda_j(A)|$ denotes the maximum of the absolute values of the eigenvalues of $A$ and satisfies [see e.g. Stoer-Bulirsch Theorem (6.9.2)]:
- $\rho(A) \le \|A\|$ for any choice of vector norm and induced matrix norm
- for any $\epsilon > 0$ there exists a vector norm so that its induced matrix norm satisfies $\|A\| \le \rho(A) + \epsilon$.

Therefore we can replace the condition $\|g'(x_*)\| < 1$ by $\rho(g'(x_*)) < 1$ and the conclusion of the theorem still holds.

Theorem: Assume $g\colon D \to \mathbb{R}^n$ with $D \subset \mathbb{R}^n$, and $g(x_*) = x_*$ for $x_* \in D$. If $g'(x_*) = 0$ and $g'$ is Lipschitz in $D$ then the fixed point iteration $x_{k+1} := g(x_k)$ is locally convergent of order 2: There exists $\delta > 0$ so that for $\|x_0 - x_*\| < \delta$ there holds $\lim_{k\to\infty} x_k = x_*$ and $\|x_{k+1} - x_*\| \le C \|x_k - x_*\|^2$.

Proof: Assume that we have $\|g'(y) - g'(x)\| \le \gamma \|y - x\|$ for $x, y \in D$. Since $g'(x_*) = 0$, the estimate for the Taylor remainder of order 2 gives
\[ \|x_{k+1} - x_*\| = \|g(x_k) - g(x_*) - g'(x_*)(x_k - x_*)\| \le \frac{\gamma}{2} \|x_k - x_*\|^2. \]
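The fixed point iteration of Theorem 3 can be sketched in a few lines. A 1-D illustration in Python (the example map $g(x) = \cos x$ is ours, not from the notes): $g$ has a fixed point $x_* \approx 0.739$ with $|g'(x_*)| = |\sin x_*| \approx 0.674 < 1$, so the iteration converges q-linearly for $x_0$ near $x_*$.

```python
import math

# Fixed point iteration x_{k+1} := g(x_k), as in Theorem 3.
def fixed_point(g, x0, steps=100):
    x = x0
    for _ in range(steps):
        x = g(x)               # one iteration step
    return x

# g(x) = cos(x): contraction near its fixed point since |g'(x*)| < 1.
xstar = fixed_point(math.cos, 1.0)
print(xstar)                   # ≈ 0.7390851332 (fixed point of cos)
assert abs(math.cos(xstar) - xstar) < 1e-12
```

With the rate $q \approx 0.674$, the error shrinks by roughly that factor each step; 100 iterations are far more than needed for machine precision. A map with $g'(x_*) = 0$ would instead show the quadratic behavior of the second theorem.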
Newton method

We have local convergence of order 2 as long as $f'(x_*)$ is a nonsingular matrix:

Theorem 4. Assume that
- $f(x_*) = 0$
- $f'$ exists and satisfies a Lipschitz condition in a neighborhood of $x_*$
- $f'(x_*)$ is nonsingular

Then there exists $\delta > 0$ such that for an initial guess $x_0$ with $\|x_0 - x_*\| \le \delta$:
- the Newton iterates $x_k$ exist for $k = 1, 2, 3, \ldots$ since $f'(x_{k-1})$ is nonsingular
- $\lim_{k\to\infty} x_k = x_*$
- the convergence is of q-order 2.

Note that we have, with $d_k = x_{k+1} - x_k = -f'(x_k)^{-1} f(x_k)$ and using $f'(x_k)\,d_k = -f(x_k)$, that
\[ f(x_{k+1}) = f(x_{k+1}) - f(x_k) - f'(x_k)\,d_k = \int_0^1 \big[ f'(x_k + t\,d_k) - f'(x_k) \big] d_k \, dt \]
implying
\[ \|f(x_{k+1})\| \le \int_0^1 \underbrace{\| f'(x_k + t\,d_k) - f'(x_k) \|}_{\le\, L t \|d_k\|} \, \|d_k\| \, dt \le \frac{L}{2} \|d_k\|^2. \]
Substituting $d_k = -f'(x_k)^{-1} f(x_k)$ we can also write
\[ f(x_{k+1}) = -\int_0^1 \big[ f'(x_k + t\,d_k)\, f'(x_k)^{-1} - I \big] f(x_k) \, dt. \]

Newton-Kantorovich theorem

The Newton-Kantorovich theorem does not assume that a solution $x_*$ with $f(x_*) = 0$ exists.

Theorem 5. Assume
- $\|f(x_0)\| \le C_0$, $\|f'(x_0)^{-1}\| \le C_1$
- $f'$ is Lipschitz with constant $L$ for $\|x - x_0\| \le \rho$

where
\[ h := C_0 C_1^2 L \le \tfrac12, \qquad \rho := \frac{1 - \sqrt{1 - 2h}}{C_1 L}. \]
Then
- there exists a unique solution $x_* \in \bar B_\rho(x_0)$ with $f(x_*) = 0$
- the Newton iterates $x_k$ are well-defined and converge to $x_*$
- the convergence is r-quadratic for $h < \tfrac12$, and r-linear for $h = \tfrac12$.
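The Newton iteration for a system is just $d_k = -f'(x_k)^{-1} f(x_k)$, $x_{k+1} = x_k + d_k$. A minimal sketch in Python for a small $2\times 2$ system (the example $f(x,y) = (x^2 + y^2 - 1,\ x - y)$ with root $(1/\sqrt2, 1/\sqrt2)$ is ours, not from the notes); the Jacobian is nonsingular near the root, so Theorem 4 applies.

```python
# Newton's method for f(x, y) = (x^2 + y^2 - 1, x - y).
def newton2(x, y, steps=8):
    for k in range(steps):
        # residual f(x_k) and Jacobian f'(x_k) = [[2x, 2y], [1, -1]]
        f1, f2 = x * x + y * y - 1.0, x - y
        a, b, c, d = 2.0 * x, 2.0 * y, 1.0, -1.0
        det = a * d - b * c                 # nonzero: f'(x_k) nonsingular
        # d_k = -f'(x_k)^{-1} f(x_k), solved here by Cramer's rule
        dx = (-f1 * d + f2 * b) / det
        dy = (-a * f2 + c * f1) / det
        x, y = x + dx, y + dy               # x_{k+1} = x_k + d_k
    return x, y

x, y = newton2(1.0, 0.5)
print(x, y)  # both ≈ 0.70710678... = 1/sqrt(2)
```

Once the iterate is close to the root, the error roughly squares at each step (q-order 2), so 8 iterations already reach machine precision from this starting point.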
In preparation for the proof we consider the Newton method for the quadratic equation $f(t) = t^2 - 2t + 2h_0 = 0$:

[Figure: the parabola $f(t) = t^2 - 2t + 2h_0$ with Newton iterates $t_0 < t_1 < t_2 < t_3 < \cdots$ converging to the smaller root $t_-$; the larger root is $t_+$.]

Lemma 6. Assume $\omega_0 > 0$ and $0 < h_0 \le \tfrac12$ and define $\omega_k$, $h_k$ for $k = 1, 2, \ldots$ by
\[ \omega_{k+1} := \frac{\omega_k}{1 - h_k}, \qquad h_{k+1} := \frac{h_k^2}{2(1 - h_k)^2}. \tag{1} \]
Then
\[ \sum_{k=0}^\infty \frac{h_k}{\omega_k} = \frac{1 - \sqrt{1 - 2h_0}}{\omega_0} \]
and the sum converges q-quadratically for $h_0 < \tfrac12$, and q-linearly for $h_0 = \tfrac12$.

Proof: It is sufficient to consider the case $\omega_0 = 1$. Consider the quadratic equation $f(t) = t^2 - 2t + 2h_0 = 0$ which has the two roots $t_\pm = 1 \pm \sqrt{1 - 2h_0}$, with $0 < t_- < 1 < t_+$ for $h_0 < \tfrac12$ and $0 < t_- = t_+ = 1$ for $h_0 = \tfrac12$. We define the sequence $t_k$ using the Newton method with $t_0 = 0$. We have $t_k < t_{k+1} < t_-$ for all $k$; this implies that $\lim_{k\to\infty} t_k = t_-$. In the case of $h_0 < \tfrac12$ we have quadratic convergence, in the case of $h_0 = \tfrac12$ we have linear convergence.

Note that for a quadratic function $f(t) = t^2 + a_1 t + a_0$ and its tangent line $p(t)$ at $t = c$ we have $f(t) - p(t) = (t - c)^2$ (since it must be a quadratic function with leading coefficient 1 and minimum at $t = c$). Hence for $c = t_k$ we have by the definition of the Newton method $p(t_{k+1}) = 0$ and $(t_{k+1} - t_k)^2 = f(t_{k+1}) - p(t_{k+1}) = f(t_{k+1})$ for all $k$, so that
\[ t_{k+1} - t_k = -\frac{f(t_k)}{f'(t_k)} = -\frac{(t_k - t_{k-1})^2}{2t_k - 2} = \frac{1}{2(1 - t_k)} (t_k - t_{k-1})^2. \tag{2} \]

We now want to show that for $k = 0, 1, 2, \ldots$ we have
\[ t_{k+1} - t_k = \omega_k^{-1} h_k, \qquad 1 - t_k = \omega_k^{-1}. \tag{3} \]
We use induction: For $k = 0$ we have $t_0 = 0$, hence $1 - t_0 = 1 = \omega_0^{-1}$ as $\omega_0 = 1$, and
\[ t_1 - t_0 = -f(0)/f'(0) = -2h_0/(-2) = h_0 = \omega_0^{-1} h_0. \]
Induction step: Assume that (3) holds for $k$. We have
\[ \omega_{k+1}^{-1} \overset{(1)}{=} \omega_k^{-1} (1 - h_k) \overset{(3)}{=} (1 - t_k) - (t_{k+1} - t_k) = 1 - t_{k+1} \]
which is the second equation in (3) with $k+1$ instead of $k$. We get from (2) that
\[ t_{k+2} - t_{k+1} = \frac{(t_{k+1} - t_k)^2}{2(1 - t_{k+1})} = \frac{\omega_{k+1}}{2} (h_k/\omega_k)^2 = \omega_{k+1}^{-1} \, \frac{h_k^2}{2} (\omega_{k+1}/\omega_k)^2 = \omega_{k+1}^{-1} \, \frac{h_k^2}{2(1 - h_k)^2} = \omega_{k+1}^{-1} h_{k+1}, \]
which is the first equation in (3) with $k+1$ instead of $k$. Note that
\[ \sum_{k=0}^N \frac{h_k}{\omega_k} = \sum_{k=0}^N (t_{k+1} - t_k) = t_{N+1} - t_0 \to t_- = 1 - \sqrt{1 - 2h_0} \qquad \text{as } N \to \infty. \]

We use the following notation for open and closed balls:
\[ B_r(x_0) := \{x : \|x - x_0\| < r\}, \qquad \bar B_r(x_0) := \{x : \|x - x_0\| \le r\} \]
The Newton method uses the iteration $x_{k+1} = x_k + d_k$ with $d_k := -f'(x_k)^{-1} f(x_k)$.

Assumption: There are constants $\alpha, \omega > 0$ such that $h_0 := \alpha\omega \le \tfrac12$ and
\[ \|f'(x_0)^{-1} f(x_0)\| \le \alpha \tag{4} \]
\[ \|f'(x_0)^{-1} (f'(y) - f'(x))\| \le \omega \|y - x\| \qquad \text{for } x, y \in D \tag{5} \]
Then, with $\rho := \frac{1 - \sqrt{1 - 2h_0}}{\omega}$ and $\bar B_\rho(x_0) \subset D$:
- there exists a unique solution $x_* \in \bar B_\rho(x_0)$ with $f(x_*) = 0$
- the Newton iterates $x_k$ are well-defined and converge to $x_*$
- the convergence is r-quadratic for $h_0 < \tfrac12$, and r-linear for $h_0 = \tfrac12$.

Induction: We claim that for $k = 0, 1, 2, \ldots$:
\[ (A_k): \qquad \omega_k \|d_k\| \le h_k \tag{6} \]
\[ (B_k): \qquad \|f'(x_k)^{-1} (f'(y) - f'(x))\| \le \omega_k \|y - x\| \quad \text{for } x, y \in D \tag{7} \]
\[ x_k \in \bar B_\rho(x_0) \tag{8} \]
where
\[ \omega_{k+1} := \frac{\omega_k}{1 - h_k}, \qquad h_{k+1} := \frac{h_k^2}{2(1 - h_k)^2}. \tag{9} \]

Proof: For $k = 0$ the statements (6), (7) follow from the assumptions (4), (5) with $\omega_0 = \omega$: indeed $\omega_0 \|d_0\| = \omega \|f'(x_0)^{-1} f(x_0)\| \le \omega\alpha = h_0$.

Induction step: Assume (6), (7), (8) hold up to $k$; we have to prove the corresponding statements for $k+1$. We first prove (8): From $(A_j)$ for $j \le k$ and the Lemma we obtain that
\[ \|x_{k+1} - x_0\| \le \sum_{j=0}^k \|d_j\| \le \sum_{j=0}^k \frac{h_j}{\omega_j} \le \sum_{j=0}^\infty \frac{h_j}{\omega_j} = \rho. \tag{10} \]
We then prove (7) for $k+1$: Let $M := f'(x_k)^{-1} f'(x_{k+1})$, then $f'(x_{k+1})^{-1} = M^{-1} f'(x_k)^{-1}$ and
\[ \|f'(x_{k+1})^{-1} (f'(y) - f'(x))\| \le \|M^{-1}\| \, \|f'(x_k)^{-1}(f'(y) - f'(x))\| \overset{(B_k)}{\le} \|M^{-1}\| \, \omega_k \|y - x\| \le \frac{\omega_k}{1 - h_k} \|y - x\| = \omega_{k+1} \|y - x\| \]
since
\[ M = I + E, \qquad E := f'(x_k)^{-1} \big( f'(x_{k+1}) - f'(x_k) \big), \qquad \|E\| \overset{(B_k)}{\le} \omega_k \|d_k\| \overset{(A_k)}{\le} h_k \]
and therefore by Lemma 1
\[ \|M^{-1}\| = \|(I + E)^{-1}\| \le \frac{1}{1 - \|E\|} \le \frac{1}{1 - h_k}. \]

We now prove (6) for $k+1$: With $x_{k+1} = x_k + d_k$ and $f(x_k) = -f'(x_k)\,d_k$ we have
\[ d_{k+1} = -f'(x_{k+1})^{-1} f(x_{k+1}) = -f'(x_{k+1})^{-1} \big[ f(x_{k+1}) - f(x_k) - f'(x_k)\,d_k \big] = -\int_0^1 f'(x_{k+1})^{-1} \big[ f'(x_k + t\,d_k) - f'(x_k) \big] d_k \, dt \]
\[ \|d_{k+1}\| \overset{(B_{k+1})}{\le} \int_0^1 \omega_{k+1} \, t \, \|d_k\| \, \|d_k\| \, dt = \frac{\omega_{k+1}}{2} \|d_k\|^2 \le \frac{\omega_{k+1}}{2} \frac{h_k^2}{\omega_k^2} \]
\[ \omega_{k+1} \|d_{k+1}\| \le \frac{h_k^2}{2} (\omega_{k+1}/\omega_k)^2 = \frac{h_k^2}{2(1 - h_k)^2} = h_{k+1}. \]
This concludes the induction proof. We now obtain from (10) that $(x_k)$ is a Cauchy sequence and therefore has a limit $x_* \in \bar B_\rho(x_0) \subset D$. By taking the limit in $f'(x_k)(x_{k+1} - x_k) = -f(x_k)$ we obtain by the continuity of $f$ and $f'$ that $f(x_*) = 0$. We skip the proof of uniqueness.
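The recursion for $\omega_k$, $h_k$ can be checked numerically against the model quadratic of Lemma 6. A small sketch in Python (using the recursion $\omega_{k+1} = \omega_k/(1-h_k)$, $h_{k+1} = h_k^2/(2(1-h_k)^2)$ as reconstructed above; the value $h_0 = 0.3$ is an arbitrary choice for illustration): each Newton step for $f(t) = t^2 - 2t + 2h_0$ with $t_0 = 0$ should equal $h_k/\omega_k$, and the steps sum to $t_- = 1 - \sqrt{1 - 2h_0}$.

```python
# Check claim (3) of Lemma 6: Newton steps for f(t) = t^2 - 2t + 2*h0
# coincide with h_k / omega_k, and their sum tends to 1 - sqrt(1 - 2*h0).
h0 = 0.3
omega, h = 1.0, h0          # omega_0 = 1, h_0 as in the lemma's proof
t, total = 0.0, 0.0         # t_0 = 0, running sum of the steps
for k in range(6):
    step = -(t * t - 2.0 * t + 2.0 * h0) / (2.0 * t - 2.0)  # -f(t_k)/f'(t_k)
    assert abs(step - h / omega) < 1e-12  # matches h_k / omega_k, claim (3)
    t += step
    total += h / omega
    # recursion (9) / (1) for the next omega_k, h_k
    omega, h = omega / (1.0 - h), h * h / (2.0 * (1.0 - h) ** 2)
print(total)  # approaches 1 - sqrt(1 - 2*h0) = 1 - sqrt(0.4) ≈ 0.36754
```

Since $h_0 < \tfrac12$ here, the steps $h_k/\omega_k$ shrink quadratically, so six iterations already exhaust the sum to within rounding error.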