Krylov Subspace Methods for the Evaluation of Matrix Functions. Applications and Algorithms


Krylov Subspace Methods for the Evaluation of Matrix Functions. Applications and Algorithms
4. Monotonicity of the Lanczos Method
Michael Eiermann
Institut für Numerische Mathematik und Optimierung, Technische Universität Bergakademie Freiberg, Germany
Wintersemester 2010/2011
Michael Eiermann (TU Freiberg) Matrix Functions WS 2010/2011 1 / 24

Outline

1. An observation
2. A first result
3. Strict monotonicity
4. M-matrices
5. Special functions, Stieltjes functions
6. The main theorem

An observation

We solve our model problem (the 1-D heat equation) whose semi-discrete version reads as

u′(t) = A u(t), t > 0, u(0) = b given,

where A = h⁻² tridiag(1, −2, 1) ∈ R^(n×n), h = 1/(n + 1). Its solution is u(t) = exp(tA) b.

First step: Hermitian Lanczos process. Given A ∈ C^(n×n) Hermitian, b ∈ C^n, and f such that f(A) is defined:

w = b, v_0 = 0
For m = 1, 2, ...
    β_m = ‖w‖ (β_1 := ‖b‖_2)
    v_m = w / β_m
    w = A v_m − β_m v_{m−1}
    α_m = v_m^H w
    w = w − α_m v_m
End
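The Lanczos loop above translates almost line for line into NumPy. The following is my own minimal sketch, not code from the slides (the function name `lanczos` and the choice of b are illustrative); it builds V_m and T_m for the model problem:

```python
import numpy as np

def lanczos(A, b, m):
    """Hermitian Lanczos process from the slide: returns V_m (ON columns),
    the tridiagonal T_m, and beta_1 = ||b||_2."""
    n = len(b)
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    w = b.astype(float).copy()
    v_prev = np.zeros(n)
    for j in range(m):
        beta[j] = np.linalg.norm(w)        # beta_m = ||w||
        V[:, j] = w / beta[j]              # v_m = w / beta_m
        w = A @ V[:, j] - beta[j] * v_prev # w = A v_m - beta_m v_{m-1}
        alpha[j] = V[:, j] @ w             # alpha_m = v_m^H w
        w = w - alpha[j] * V[:, j]
        v_prev = V[:, j]
    T = np.diag(alpha) + np.diag(beta[1:], 1) + np.diag(beta[1:], -1)
    return V, T, beta[0]

# Model problem: A = h^-2 tridiag(1, -2, 1), h = 1/(n+1)
n = 99
h = 1.0 / (n + 1)
A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2
b = np.ones(n)
V, T, beta1 = lanczos(A, b, 10)
```

After ten steps, V has orthonormal columns and T = V^H A V, as the slide states.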

The columns of V_m = [v_1 v_2 ... v_m] are an ON basis of K_m(A, b), and the tridiagonal matrix

        [ α_1  β_2                ]
        [ β_2  α_2  β_3           ]
T_m =   [      β_3  α_3   .       ]   ∈ R^(m×m)
        [            .    .   β_m ]
        [                β_m  α_m ]

represents the compression of A onto K_m(A, b), i.e., T_m = V_m^H A V_m. Note that T_m is real: α_m = v_m^H (A v_m − β_m v_{m−1}) = v_m^H A v_m ∈ [λ_min(A), λ_max(A)] because A is Hermitian.

Second step: Lanczos approximation to f(A) b,

f_m = β_1 V_m exp(T_m) e_1 = V_m exp(V_m^H A V_m) V_m^H b.

We use expm (scaling and squaring) to calculate exp(T_m).

The model problem

[Figure: semilogarithmic convergence plot, ‖exp(A)b − f_m‖ versus the iteration index m, decreasing from about 10^0 down to about 10^−8.]

n = 99, b = rand(n, 1): We observe monotone convergence.
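The experiment is easy to reproduce. This is my own sketch, not the original script behind the plot: it uses an eigendecomposition (`numpy.linalg.eigh`) in place of expm so that only NumPy is needed, and it adds full reorthogonalization (not in the slide's pseudocode) as numerical hygiene for the long run:

```python
import numpy as np

def expm_herm(T):
    """exp(T) for Hermitian T via eigendecomposition (stand-in for expm)."""
    lam, Q = np.linalg.eigh(T)
    return (Q * np.exp(lam)) @ Q.T

n = 99
h = 1.0 / (n + 1)
A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2          # A = h^-2 tridiag(1, -2, 1)
rng = np.random.default_rng(0)
b = rng.random(n)                                   # b = rand(n, 1)
exact = expm_herm(A) @ b                            # exp(A) b

L = n                                               # run until the Krylov space is exhausted
V = np.zeros((n, L)); alpha = np.zeros(L); beta = np.zeros(L)
w = b.copy(); v_prev = np.zeros(n)
norms, errs = [], []
for j in range(L):
    beta[j] = np.linalg.norm(w)
    V[:, j] = w / beta[j]
    w = A @ V[:, j] - beta[j] * v_prev
    alpha[j] = V[:, j] @ w
    w -= alpha[j] * V[:, j]
    w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)        # full reorthogonalization
    v_prev = V[:, j]
    T = np.diag(alpha[:j + 1]) + np.diag(beta[1:j + 1], 1) + np.diag(beta[1:j + 1], -1)
    f_m = beta[0] * V[:, :j + 1] @ expm_herm(T)[:, 0]   # f_m = beta_1 V_m exp(T_m) e_1
    norms.append(np.linalg.norm(f_m))
    errs.append(np.linalg.norm(exact - f_m))
```

The recorded sequences show exactly the behavior of the plot: ‖f_m‖ grows monotonically toward ‖exp(A)b‖ while the error norms decrease monotonically.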

A first result

Theorem [Druskin (2008)]. Let A ∈ C^(n×n) be Hermitian and b ∈ C^n. For the Lanczos approximants f_m, m = 1, 2, ..., L, to f = exp(A) b, there holds

‖f_1‖ ≤ ‖f_2‖ ≤ ... ≤ ‖f_L‖ = ‖f‖,
‖f − f_1‖ ≥ ‖f − f_2‖ ≥ ... ≥ ‖f − f_L‖ = 0.

Proof. First assume that A is positive definite. Then T_m ≥ O (entrywise). This implies

O ≤ T̃_m := [ T_{m−1}, 0 ; 0^T, 0 ] ≤ [ T_{m−1}, β_m e_{m−1} ; β_m e_{m−1}^T, α_m ] = T_m

(with e_{m−1} the last unit coordinate vector of R^(m−1)).

Thus O ≤ T̃_m^k = [ T_{m−1}^k, 0 ; 0^T, 0 ] ≤ T_m^k for all k = 1, 2, ... Since exp(T) = I + T + (1/2!) T² + ... + (1/k!) T^k + ...,

I_m ≤ exp(T̃_m) = [ exp(T_{m−1}), 0 ; 0^T, 1 ] ≤ exp(T_m).

In particular,

e_1 ≤ exp(T̃_m) e_1 = [ exp(T_{m−1}) e_1 ; 0 ] ≤ exp(T_m) e_1.

Finally, ‖exp(T_{m−1}) e_1‖ ≤ ‖exp(T_m) e_1‖ and, since V_m has orthonormal columns and β_1 > 0,

‖f_{m−1}‖ = ‖β_1 V_{m−1} exp(T_{m−1}) e_1‖ = β_1 ‖exp(T_{m−1}) e_1‖ ≤ β_1 ‖exp(T_m) e_1‖ = ‖β_1 V_m exp(T_m) e_1‖ = ‖f_m‖.
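The entrywise inequalities used in this step are easy to sanity-check numerically. The following is my own small experiment (a random tridiagonal T with positive entries stands in for T_m; a truncated Taylor series stands in for a proper expm):

```python
import numpy as np

def expm_series(T, terms=60):
    """exp(T) via the Taylor series (adequate for this small, modest-norm test matrix)."""
    E = np.eye(T.shape[0]); P = np.eye(T.shape[0])
    for k in range(1, terms):
        P = P @ T / k          # P = T^k / k!
        E = E + P
    return E

m = 5
rng = np.random.default_rng(1)
alpha = rng.random(m) + 0.5          # positive diagonal
beta = rng.random(m - 1) + 0.5       # positive off-diagonal, so T_m >= O entrywise
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

T_pad = np.zeros((m, m))             # the bordered matrix [T_{m-1}, 0; 0, 0]
T_pad[:m - 1, :m - 1] = T[:m - 1, :m - 1]

E, E_pad = expm_series(T), expm_series(T_pad)
```

Entrywise, I ≤ exp(T) and exp(T̃) ≤ exp(T), and hence the first columns satisfy the norm inequality used in the proof.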

The monotonicity of the errors follows immediately: We have

[ exp(T_{m−1}) e_1 ; 0 ] ≤ [ exp(T_m) e_1 ; 0 ] ≤ exp(T_L) e_1

(each vector padded with zeros to length L), which implies

O ≤ exp(T_L) e_1 − [ exp(T_m) e_1 ; 0 ] ≤ exp(T_L) e_1 − [ exp(T_{m−1}) e_1 ; 0 ]

and thus

‖exp(T_L) e_1 − [ exp(T_m) e_1 ; 0 ]‖ ≤ ‖exp(T_L) e_1 − [ exp(T_{m−1}) e_1 ; 0 ]‖.

The assertion now follows from the observation

‖f − f_m‖ = ‖β_1 V_L exp(T_L) e_1 − β_1 V_m exp(T_m) e_1‖ = β_1 ‖V_L ( exp(T_L) e_1 − [ exp(T_m) e_1 ; 0 ] )‖ = β_1 ‖exp(T_L) e_1 − [ exp(T_m) e_1 ; 0 ]‖.

If A is an arbitrary Hermitian matrix, we choose a shift µ such that B = A + µI is positive definite. The Lanczos approximations f_m^(B) to

exp(B) b = exp(µ) exp(A) b

are given by

f_m^(B) = β_1 V_m^(B) exp( T_m^(B) ) e_1 = β_1 V_m^(A) exp( T_m^(A) + µI ) e_1 = exp(µ) f_m^(A)

(easy exercise). This shows f_m^(A) = exp(−µ) f_m^(B) and

exp(A) b − f_m^(A) = exp(−µ) ( exp(B) b − f_m^(B) ),

which proves the theorem.

Note we showed more than we claimed: We have not only normwise but componentwise (with respect to the basis V_L) monotonicity.

Strict monotonicity

The monotonicity results described in the previous theorem can be sharpened (exercise!):

0 < ‖f_1‖ < ‖f_2‖ < ... < ‖f_L‖ = ‖f‖,
‖f − f_1‖ > ‖f − f_2‖ > ... > ‖f − f_L‖ = 0.

M-matrices

T = [t_{i,j}] ∈ R^(m×m) is a (nonsingular) M-matrix (Hermann Minkowski) if t_{i,j} ≤ 0 for all i ≠ j, T⁻¹ exists and T⁻¹ ≥ O.

We need the following properties of M-matrices:

(M1) Let A ∈ R^(n×n) have nonpositive off-diagonal entries. Then A is an M-matrix if and only if all eigenvalues of A have positive real parts.

(M2) If A, B ∈ R^(n×n) are two M-matrices, then A ≤ B implies O ≤ B⁻¹ ≤ A⁻¹.

(M3) For A, E ∈ R^(n×n), let A be an M-matrix and let A + E have nonpositive off-diagonal entries; then E ≥ O implies that A + E is an M-matrix.
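A quick numerical illustration of the definition and of (M2), using the classic M-matrix tridiag(−1, 2, −1) (my own example, not from the slides):

```python
import numpy as np

# A = tridiag(-1, 2, -1): nonpositive off-diagonal entries and positive eigenvalues
# 2 - 2 cos(k*pi/(n+1)), hence an M-matrix by (M1).
n = 6
A = np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
A_inv = np.linalg.inv(A)

# B >= A entrywise (off-diagonals made less negative); B is again an M-matrix,
# so (M2) predicts O <= B^-1 <= A^-1 entrywise.
B = A + np.diag(0.5 * np.ones(n - 1), 1) + np.diag(0.5 * np.ones(n - 1), -1)
B_inv = np.linalg.inv(B)
```

Both inverses come out entrywise nonnegative, and B⁻¹ ≤ A⁻¹ entrywise, exactly as (M2) states.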

Special functions

We consider functions f : (0, ∞) → R which can be represented as

f(z) = ∫₀^∞ 1/(t + z)^k dµ(t), z > 0.

Here k ∈ N and µ is a nonnegative measure for which ∫₀^∞ (1 + t)^(−k) dµ(t) is finite.

Example. Let δ_x denote the Dirac measure (i.e., δ_x(M) = 1 if x ∈ M and δ_x(M) = 0 otherwise). Then

z^(−k) = ∫₀^∞ 1/(t + z)^k dδ_0(t).

More generally, for x_j > 0, π_j > 0 (j = 1, 2, ..., m),

Σ_{j=1}^m π_j/(z + x_j)^k = ∫₀^∞ 1/(t + z)^k d( Σ_{j=1}^m π_j δ_{x_j} )(t).

Stieltjes integrals and Stieltjes transformation

Let [α, β] be a real finite closed interval and ψ : [α, β] → R. Let ∆ : α = τ_0 < τ_1 < ... < τ_m = β be a subdivision of [α, β] with norm ‖∆‖ := max_{1≤j≤m} (τ_j − τ_{j−1}). A set of pivotal elements Θ : τ′_1 ≤ τ′_2 ≤ ... ≤ τ′_m consistent with ∆ consists of numbers τ′_j with τ_{j−1} ≤ τ′_j ≤ τ_j (j = 1, 2, ..., m). For any (complex-valued) function f defined on [α, β], set

S(∆, Θ) := Σ_{j=1}^m f(τ′_j) ( ψ(τ_j) − ψ(τ_{j−1}) ).

If there is a complex number S such that, given any ε > 0, a number δ = δ(ε) exists such that |S(∆, Θ) − S| ≤ ε for all subdivisions ∆ with ‖∆‖ ≤ δ and all consistent Θ, then

S = ∫_α^β f(t) dψ(t)

is called the Stieltjes integral of f with respect to ψ on [α, β].

If ψ(t) = t + γ for some constant γ, the Stieltjes integral is the Riemann integral. If ψ is continuously differentiable on [α, β], then

∫_α^β f(t) dψ(t) = ∫_α^β f(t) ψ′(t) dt.

If ψ is a step function with finitely many jumps at ζ_1, ζ_2, ..., ζ_m, i.e.,

ψ(t) = 0 for α ≤ t ≤ ζ_1,
ψ(t) = Σ_{j=1}^k π_j for ζ_k < t ≤ ζ_{k+1},
ψ(t) = Σ_{j=1}^m π_j for ζ_m < t ≤ β,

then

∫_α^β f(t) dψ(t) = Σ_{j=1}^m π_j f(ζ_j).
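The step-function case can be checked by comparing a Riemann-Stieltjes sum S(∆, Θ) on a fine subdivision against the closed form Σ_j π_j f(ζ_j). A small sketch of mine, with arbitrary illustrative jump points and weights:

```python
import numpy as np

# Step function psi on [0, 1] with jumps pi_j at the points zeta_j
zeta = np.array([0.2, 0.5, 0.8])
pi_j = np.array([1.0, 2.0, 0.5])

def psi(t):
    """psi(t) = sum of the jumps pi_j with zeta_j < t (vectorized)."""
    t = np.atleast_1d(t)
    return (pi_j[None, :] * (zeta[None, :] < t[:, None])).sum(axis=1)

f = np.cos

# Riemann-Stieltjes sum on a fine subdivision, with the midpoints as pivots
tau = np.linspace(0.0, 1.0, 100001)
mid = 0.5 * (tau[1:] + tau[:-1])
S = np.sum(f(mid) * np.diff(psi(tau)))

# Closed form for a step function: sum_j pi_j f(zeta_j)
exact = np.sum(pi_j * f(zeta))
```

Only the subintervals containing a jump contribute to S, so the sum reduces to f evaluated near the jump points, weighted by the jump heights.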

If f is continuous and ψ is nondecreasing on [α, β], then ∫_α^β f(t) dψ(t) exists. If f is continuous and ψ is nondecreasing on [α, ∞), we set

∫_α^∞ f(t) dψ(t) = lim_{β→∞} ∫_α^β f(t) dψ(t)

provided the limit exists. If f is continuous and bounded on [α, ∞) and if ψ is nondecreasing and bounded on [α, ∞), then ∫_α^∞ f(t) dψ(t) exists.

Let ψ : [0, ∞) → R be nondecreasing and bounded. We call ζ ≥ 0 a point of increase of ψ if ψ is not constant on any interval [ζ − ε, ζ + ε], ε > 0.

Case 1. ψ has finitely many points of increase. Then ψ is a step function with finitely many jumps ζ_j, j = 1, 2, ..., m (namely at the points of increase). There holds

∫₀^∞ 1/(z + t) dψ(t) = Σ_{j=1}^m ( ψ(ζ_j+) − ψ(ζ_j−) ) / (z + ζ_j) =: r(z),

a rational function with simple poles on the negative real axis and positive residues.

Moreover,
(i) r is analytic in C \ (−∞, 0],
(ii) r(x) ≥ 0 for x ≥ 0,
(iii) r(U) ⊆ L and r(L) ⊆ U, where U and L denote the upper and lower half-plane, respectively.

Functions satisfying (i)-(iii) are called positive symmetric rational functions. Every positive symmetric rational function r of type (m − 1, m) or (m, m) can be written as

r(z) = α + ∫₀^∞ 1/(z + t) dψ(t)

with α ≥ 0 and a nondecreasing function ψ : [0, ∞) → R which has finitely many points of increase.

Case 2. ψ has infinitely many points of increase. Then

f(z) = ∫₀^∞ 1/(z + t) dψ(t)

exists for all z ∈ C \ (−∞, 0] and is an analytic function there. f is called the Stieltjes transform of ψ.

f(z) = log(1 + 1/z) when ψ(t) = t if 0 ≤ t ≤ 1 and ψ(t) = 1 for t ≥ 1.

f(z) = arctan(1/√z)/√z when ψ(t) = √t if 0 ≤ t ≤ 1 and ψ(t) = 1 for t ≥ 1.

f(z) = z^(−α), α ∈ (0, 1), when ψ(t) = ( sin((1 − α)π) / ((1 − α)π) ) t^(1−α).

f(z) = z^(−α) (1 + z)^(−β), 0 < α, α + β < 1.

If ψ is the distribution function of the measure µ, i.e., ψ(x) = µ([0, x]) = ∫₀^x dµ(t), and if w(t) is the associated density function, then (under suitable conditions)

∫₀^∞ 1/(z + t) dµ(t) = ∫₀^∞ 1/(z + t) dψ(t) = ∫₀^∞ w(t)/(z + t) dt.
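The first two examples are easy to verify by quadrature, since both ψ are constant beyond t = 1. A self-contained check of mine (the helper `trapezoid` is local so that nothing beyond NumPy is assumed):

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule (kept local so the sketch is self-contained)."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

z = 2.0
t = np.linspace(0.0, 1.0, 200001)

# psi(t) = t on [0, 1] (constant afterwards):  integral_0^1 dt/(z+t) = log(1 + 1/z)
f1 = trapezoid(1.0 / (z + t), t)

# psi(t) = sqrt(t) on [0, 1]; substituting t = s^2 turns the integral into
# integral_0^1 ds/(z + s^2) = arctan(1/sqrt(z)) / sqrt(z)
f2 = trapezoid(1.0 / (z + t * t), t)
```

Both quadrature values agree with the stated closed forms to high accuracy.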

The main theorem

Theorem [Frommer (2009)]. Let A ∈ C^(n×n) be Hermitian positive definite and b ∈ C^n. Assume that the function f : (0, ∞) → R can be written as

f(z) = ∫₀^∞ 1/(t + z)^k dµ(t), z > 0,

with a nonnegative measure µ and k ∈ N. For the Lanczos approximants f_m to f(A) b and the resulting errors d_m = f(A) b − f_m, there holds:

{ ‖f_m‖ }_{1≤m≤L} is monotonically increasing.
{ ‖d_m‖ }_{1≤m≤L} is monotonically decreasing.

Proof. Step. For the matrix S m = diag(,,..., ( ) m ) R m m, there holds: S T m = S m and S 2 m = I m, i.e., S m = S T m = S m, The columns of V m S m = [v v 2 ( ) m+ v m ] =: V ± m form an ON basis of K m (A, b). T ± m := S m T m S m = α β 2 β 2 α 2 β 3 β 3 α 3... α m β m β m has nonpositive off-diagonal entries. If A and therefore T m as well as T ± m are positive definite, then T ± m and T ± m + ti m, t, are M-matrices ((M ) and (M 3 )). Michael Eiermann (TU Freiberg) Matrix Functions WS 2/2 9 / 24 α m

Step 2. We can write the Lanczos approximants f_m in the form

f_m = β_1 V_m f(T_m) e_1 = β_1 V_m S_m f(S_m T_m S_m) S_m e_1 = β_1 V_m^± f(T_m^±) e_1.

Consequently, ‖f_m‖ = ‖y_m‖, where y_m := β_1 f(T_m^±) e_1. For the special functions f which we consider here, there holds

y_m = β_1 ∫₀^∞ (t I_m + T_m^±)^(−k) e_1 dµ(t).

Step 3. We define T̃_m^± := [ T_{m−1}^±, O ; O, α_m ]. Then

t I_m + T̃_m^± ≥ t I_m + T_m^± for all t ≥ 0.

By (M3), t I_m + T̃_m^± is an M-matrix for all t ≥ 0, and

O ≤ (t I_m + T̃_m^±)⁻¹ ≤ (t I_m + T_m^±)⁻¹ for all t ≥ 0

by (M2).

Thus, for every t ≥ 0,

O ≤ (t I_m + T̃_m^±)^(−k) ≤ (t I_m + T_m^±)^(−k)

and

O ≤ ∫₀^∞ (t I_m + T̃_m^±)^(−k) dµ(t) ≤ ∫₀^∞ (t I_m + T_m^±)^(−k) dµ(t)

as well as

O ≤ ∫₀^∞ (t I_m + T̃_m^±)^(−k) e_1 dµ(t) ≤ ∫₀^∞ (t I_m + T_m^±)^(−k) e_1 dµ(t).

But this is just

O ≤ [ y_{m−1} ; 0 ] ≤ y_m,

which is equivalent to ‖f_{m−1}‖ ≤ ‖f_m‖.

Step 4. The monotonicity of the errors

d_m = f(A) b − β_1 V_m f(T_m) e_1 = V_L^± y_L − V_m^± y_m

follows from d_m = V_L^± ( y_L − [ y_m ; 0 ] ), so that ‖d_m‖ = ‖y_L − [ y_m ; 0 ]‖.

Remark. For the Dirac measure µ = δ_0 there holds δ_0(M) = 1 if 0 ∈ M and δ_0(M) = 0 if 0 ∉ M, and

∫₀^∞ 1/(t + z)^k dµ(t) = z^(−k).

For k = 1 this means that the errors of the CG method decrease monotonically with respect to ‖·‖_2 (for a different proof, see [Steihaug (1983)]).
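For k = 1 and f(z) = 1/z the Lanczos approximant f_m is the FOM/CG iterate for A x = b, so the remark predicts monotone 2-norm error decrease for CG. A sketch of mine on a random SPD matrix (full reorthogonalization added as numerical hygiene; the spectrum and sizes are illustrative):

```python
import numpy as np

n = 60
rng = np.random.default_rng(3)
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
lam = np.linspace(1.0, 20.0, n)
A = (Q * lam) @ Q.T                        # SPD test matrix with spectrum in [1, 20]
b = rng.standard_normal(n)
exact = np.linalg.solve(A, b)              # f(A) b for f(z) = 1/z

L = 25
V = np.zeros((n, L)); alpha = np.zeros(L); beta = np.zeros(L)
w = b.copy(); v_prev = np.zeros(n)
errs = []
for j in range(L):
    beta[j] = np.linalg.norm(w)
    V[:, j] = w / beta[j]
    w = A @ V[:, j] - beta[j] * v_prev
    alpha[j] = V[:, j] @ w
    w -= alpha[j] * V[:, j]
    w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)   # full reorthogonalization
    v_prev = V[:, j]
    T = np.diag(alpha[:j + 1]) + np.diag(beta[1:j + 1], 1) + np.diag(beta[1:j + 1], -1)
    e1 = np.zeros(j + 1); e1[0] = 1.0
    f_m = beta[0] * V[:, :j + 1] @ np.linalg.solve(T, e1)   # the FOM/CG iterate
    errs.append(np.linalg.norm(exact - f_m))
```

The recorded 2-norm errors decrease monotonically, as the remark predicts.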

An extension. We can apply the monotonicity results to functions of the form g(z) = f(z) p(z), where f is as above and p is a polynomial (of low degree). We write g(A) b = f(A) b̃ with b̃ = p(A) b and apply the Lanczos method in the Krylov spaces K_m(A, b̃).

E.g., sign(A) b = (A²)^(−1/2) A b, which suggests approximating B^(−1/2) b̃ with B = A² (Hermitian positive definite if A is Hermitian and nonsingular) and b̃ = A b, i.e., we work in K_m(A², A b).
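A sketch of the suggested procedure for sign(A) b (my own illustration, not from the slides: a random Hermitian indefinite A, Lanczos on K_m(A², A b), and B^(−1/2) applied to the small matrix via an eigendecomposition):

```python
import numpy as np

def inv_sqrt_e1(T):
    """T^(-1/2) e_1 for symmetric positive definite T, via eigendecomposition."""
    lam_T, Q_T = np.linalg.eigh(T)
    e1 = np.zeros(T.shape[0]); e1[0] = 1.0
    return Q_T @ ((Q_T.T @ e1) / np.sqrt(lam_T))

n = 50
rng = np.random.default_rng(4)
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
lam = np.concatenate([np.linspace(-3.0, -1.0, n // 2), np.linspace(1.0, 3.0, n - n // 2)])
A = (Q * lam) @ Q.T                        # Hermitian, indefinite, nonsingular
b = rng.standard_normal(n)
sign_exact = (Q * np.sign(lam)) @ Q.T @ b  # sign(A) b, for reference

B = A @ A                                  # Hermitian positive definite
bt = A @ b                                 # b~ = A b
L = 30
V = np.zeros((n, L)); alpha = np.zeros(L); beta = np.zeros(L)
w = bt.copy(); v_prev = np.zeros(n)
for j in range(L):                         # Lanczos in K_m(A^2, A b)
    beta[j] = np.linalg.norm(w)
    V[:, j] = w / beta[j]
    w = B @ V[:, j] - beta[j] * v_prev
    alpha[j] = V[:, j] @ w
    w -= alpha[j] * V[:, j]
    w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)   # full reorthogonalization
    v_prev = V[:, j]
T = np.diag(alpha) + np.diag(beta[1:], 1) + np.diag(beta[1:], -1)
approx = beta[0] * V @ inv_sqrt_e1(T)      # Lanczos approximation to B^(-1/2) b~
```

Since B has a modest condition number here, thirty Lanczos steps suffice for the approximation to agree with sign(A) b to high relative accuracy.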

Hints to the literature

C. Berg. Quelques remarques sur le cône de Stieltjes. Lecture Notes in Mathematics 84, Springer, Berlin, Heidelberg 1984.

A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. Academic Press, New York 1979. Updated edition, Classics in Applied Mathematics Vol. 9, SIAM, Philadelphia 1994.

V. Druskin. On monotonicity of the Lanczos approximation to the matrix exponential. Linear Algebra Appl. 429, 679-683 (2008).

A. Frommer. Monotone convergence of the Lanczos approximations to matrix functions of Hermitian matrices. Electron. Trans. Numer. Anal. 35, 118-128 (2009).

T. Fujimoto and R. R. Ranade. Two characterizations of inverse-positive matrices: the Hawkins-Simon condition and the Le Chatelier-Braun principle. Electron. J. Linear Algebra 11, 59-65 (2004).

P. Henrici. Applied and Computational Complex Analysis. Vol. 2: Special Functions, Integral Transforms, Asymptotics, Continued Fractions. John Wiley & Sons, New York 1977.

T. Steihaug. The conjugate gradient method and trust regions in large scale optimization. SIAM J. Numer. Anal. 20, 626-637 (1983).