Krylov subspace projection methods

I.1.(a) Krylov subspace projection methods

Orthogonal projection technique : framework Let A be an n n complex matrix and K be an m-dimensional subspace of C n. An orthogonal projection technique seeks an approximate eigenpair ( λ,ũ) with λ C and ũ K. This approximate eigenpair is obtained by imposing the following Galerkin condition: Aũ λũ K, (1) or, equivalently, v H (Aũ λũ) = 0, v K. (2) In matrix form, assume that an orthonormal basis {v 1,v 2,...,v k } of K is available. Denote V = (v 1, v 2,...,v k ), and let ũ = V y. Then, the condition (2) becomes v H j (AV y λv y) = 0, j = 1,...,k.

Therefore, y and λ must satisfy where B k y = λy, (3) B k = V H AV. Each eigenvalue λ i of B k is called a Ritz value, and V y i is called Ritz vector, where y i is the eigenvector of B k associated with λ i.

Rayleigh-Ritz procedure - orthogonal projection 1. Compute an orthonormal basis {v i } i=1,...,k of the subspace K. Let V = (v 1, v 2,...,v k ). 2. Compute B k = V H AV. 3. Compute the eigenvalues of B k and select k 0 desired ones: λ i, i = 1, 2,...,k 0, where k 0 k. 4. Compute the eigenvectors y i,i = 1,...,k 0, of B k associated with λ i,i = 1,...,k 0, and the corresponding approximate eigenvectors of A, ũ i = V y i, i = 1,...,k 0.

Oblique projection technique : framework Select two subspaces L and K and then seek an approximate eigenpair ( λ, ũ) with λ C and ũ K that satisfies the Petrov-Galerkin condition: v H (Aũ λũ) = 0, v L. (4) In matrix form, let V denote the basis for the subspace K and W for L. Then, writing ũ = V y, the Petrov-Galerkin condition (4) yields the reduced eigenvalue problem B k y = λc k y, where B k = W H AV and C k = W H V.

If C k = V H V = I, then the two bases are called biorthonormal. In order for a biorthonormal pair V and W to exist the following additional assumption for L and K must hold. For any two bases V and W of K and L, respectively, det(w H V ) 0. (5)

Rayleigh-Ritz procedure - oblique projection 1. Compute an orthonormal bases {v i } i=1,...,k of the subspace K. and {w i } i=1,...,k of the subspace L. Let V = (v 1, v 2,...,v k ) and W = (w 1, w 2,...,w k ). 2. Compute B k = W H AV and C k = W H V. 3. Compute the eigenvalues of B k λc k and select k 0 desired ones: λ i, i = 1, 2,...,k 0, where k 0 k. 4. Compute the right and left eigenvectors y i and z i, i = 1,...,k 0, of B k λc k associated with λ i, i = 1,...,k 0, and the corresponding approximate right and left eigenvectors of A, ũ i = V y i, and ṽ i = Wz i, i = 1,...,k 0.

Optimality Let Q = (Q k, Q u ) be an n-by-n orthogonal matrix, where Q k is n-by-k, and Q u is n-by-(n k), and span(q k ) = K. Then [ Q T = Q T T AQ = k AQ k Q T k AQ ] [ ] u Tk T Q T uaq k Q T uk uaq u T ku T u The Ritz values and Ritz vectors are considered optimal approximations to the eigenvalues and eigenvectors of A from the selected subsapce K = span(q k ) as justified by the follows. Theorem. min S,k k AQ k Q k S 2 = AQ k Q k T k 2

Krylov subspace K k+1 (A, u 0 ) = span{u 0,Au 0, A 2 u 0,...,A k u 0 } = {q(a)u 0 q P k }, where P k is the set of all polynomial of degree less than k + 1. Properties of K k+1 (A,u 0 ): 1. K k (A, u 0 ) K k+1 (A, u 0 ). AK k (A,u 0 ) K k+1 (A, u 0 ). 2. If σ 0, K k (A,u 0 ) = K k (σa, u 0 ) = K k (A, σu 0 ). 3. For any scalar κ, K k (A,u 0 ) = K k (A κi,u 0 ). 4. If W is nonsingular, K k (W 1 AW, W 1 u 0 ) = W 1 K k (A, u 0 ).

Arnoldi decomposition An explicit Krylov basis {u 0,Au 0, A 2 u 0,...,A k u 0 } is not suitable for numerical computing. It is extremely ill-conditioned. Therefore, our first task is to replace a Krylov basis with a better conditioned basis, say an orthonormal basis. Theorem. Let the columns of K j+1 = ( u 0 Au 0... A j u 0 ) be linearly independent. Let K j+1 = U j+1 R j+1 (6) be the QR factorization of K j+1. Then there is a (j + 1) j unreduced upper Hessenberg matrix Ĥj such that AU j = U j+1 Ĥ j. (7) Conversely, if U j+1 is orthonormal and satisfies (7), then span(u j+1 ) = span{u 0,Au 0,...,A j u 0 }. (8)

Proof: Partitioning the QR decomposition (6), we have ( ) ( ) ( ) Kj A j R u 0 = Uj u j r j+1 j+1, 0 r j+1,j+1 where K j = U j R j is the QR decomposition of K j. Then or AK j = AU j R j ( ) 0 AU j = AK j Rj 1 = K j+1 Rj 1 It is easy to verify that ( ) 0 Ĥ j = R j+1 Rj 1 ( ) 0 = U j+1 R j+1 Rj 1. is a (j + 1) j unreduced upper Hessenberg matrix. Therefore we complete the proof of (7). Conversely, suppose that U j+1 satisfies (7), then by induction, we can prove the identity (8).

: Arnoldi decomposition: by partitioning, ( ) Hj Ĥ j = h j+1,j e T, j the decomposition (7) can be written as follows: AU j = U j H j + h j+1,j u j+1 e T j. (9) We call (9) an Arnoldi decomposition of order j. The decomposition (7) is a compact form.

Arnoldi procedure By the Arnoldi decomposition (9), we deduce the following process to generate an orthogonormal basis {v 1,v 2,...,v m } of the Krylov subspace K m (A, v): Arnoldi Process: 1. v 1 = v/ v 2 2. for j = 1, 2,...,k 3. compute w = Av j 4. for i = 1, 2,...,j 5. h ij = vi T w 6. w = w h ij v i 7. end for 8. h j+1,j = w 2 9. If h j+1,j = 0, stop 10. v j+1 = w j /h j+1,j 11. endfor

Remarks: 1. The matrix A is only referenced via the matrix-vector multiplication Av j. Therefore, it is ideal for large scale matrices. Any sparsity or structure of a matrix can be exploited. 2. The main storage requirement is (m + 1)n for storing Arnoldi vectors {v i } 3. the cost of arithmetic is m matrix-vector products plus 2m 2 n for the rest. It is common that the matrix-vector multiplication is the dominant cost. 4. The Arnoldi procedure breaks down when h j+1,j = 0 for some j. It is easy to see that if the Arnoldi procedure breaks down at step j (i.e. h j+1,j = 0), then K j = span(v j ) is invariant subspace of A. 5. Some care must be taken to insure that the vectors v j remain orthogonal to working accuracy in the presence of rounding error. The usual technique is reorthogonalization.

Arnoldi decomposition Denote and H k = V k = ( v 1 v 2... v k ) h 11 h 12 h 1,k 1 h 1k h 21 h 22 h 2,k 1 h 2k h 32... h 3,k 1 h 3k...... h k,k 1 h kk The Arnoldi process can be expressed in the following governing relations: and AV k = V k H k + h k+1,k v k+1 e T k (10) V H k V k = I and V H k v k+1 = 0.

The decomposition is uniquely determined by the starting vector v (the implicit Q-Theorem). Since V H k v k+1 = 0, we have H k = V T k AV k. Let µ be an eigenvalue of H k and y be a corresponding eigenvector y, i.e., H k y = µy, y 2 = 1. Then the corresponding Ritz pair is (µ, V k y). Applying y to the right hand side of (10), the residual vector for (µ, V k y) is given by A(V k y) µ(v k y) = h k+1,k v k+1 (e T ky).

Using the backward error interpretation, we know that (µ, V k y) is an exact eigenpair of A + E: where (A + E)(V k y) = µ(v k y), E 2 = h k+1,k e T ky. This gives us a criterion of whether to accept the Ritz pair (µ, V k y) as an accurate approximate eigenpair of A.

Arnoldi method = RR + Arnoldi 1. Choose a starting vector v; 2. Generate the Arnoldi decomposition of length k by the Arnoldi process; 3. Compute the Ritz pairs and decide which ones are acceptable; 4. If necessary, increase k and repeat.

An example A = sprandn(100,100,0.1) and v = (1, 1,...,1) T. + are the eigenvalues of matrix A are the eigenvalues of the upper Hessenberg matrix H 30

4 Eigenvalues of A in "+" and H 30 in "o" 3 2 1 Imaginary Part 0 1 2 3 4 4 3 2 1 0 1 2 3 4 Real Part Observation: exterior eigenvalues converge first, a typical convergence phenomenon.

The need of restarting The algorithm has two nice aspects: 1. H k is already in the Hessenberg form, so we can immediately apply the QR algorithm to find its eigenvalues. 2. After we increase k to, say k +p, we only have to orthogonalize p vectors to compute the (k + p)th Arnoldi decomposition. The work already completed previously is not thrown away. Unfortunately, the algorithm has its drawbacks, too: 1. If A is large, we cannot increase k indefinitely, since V k requires nk memory locations to store. 2. We have little control over which eigenpairs the algorithm finds.

Implicit restarting Goal: purge the unwanted eigenvalues µ from H k. 1. Exact arithmetic case: By one step of the QR algorithm with shift µ, we have R = U H (H µi) = upper triangular Note that H µi is singular, hence R must have a zero on its diagonal. Because H is unreduced, then r nn = 0. Furthermore, note that U = P 12 P 23 P n 1,n, where P i,i+1 is a rotation in the (i,i + 1)-plane. Consequently, U is Hessenberg: ( ) U U = u u k,k 1 e T k 1 u. k,k Hence H = RU + µi = ( Ĥ ĥ 0 µ ) = U H HU.

In other words, one step of the shifted QR has found the eigenvalue µ exactly and has deflated the problem. 2. Finite precision arithmetic case: In the presence of rounding error, after one step of the shifted QR, we have ( ) Ĥ = Û H Ĥ HÛ = ĥ ĥ k,k 1 e T k 1 µ. From the Arnoldi decomposition, we have Partition AV k Û = V k Û(ÛT H k Û) + h k+1,k v k+1 e T kû. V k = V k Û = ( Vk 1 v k ) Then A ( V k 1 v k ) = ( V k 1 v k ) ( Ĥ ĥ ĥ k,k 1 e T k 1 µ ) +h k+1,k v k+1 (ûk,k 1 e T k 1 û

From the first k 1 columns of this partition, we get A V k 1 = V k 1 Ĥ + fe T k 1, (11) where f = ĥk,k 1 v k + h k+1,k û k,k 1 v k+1. Note that Ĥ is Hessenberg. f is orthogonal to V k 1. Hence (11) is an Arnoldi decomposition of length k 1. The process may be repeated to remove other unwanted values from H.

The symmetric Lanczos procedure Observation: in the Arnoldi decomposition, if A is symmetric, then the upper Hessenberg matrix H j is symmetric tridiagonal. The following is a simplified process to compute an orthonormal basis of a Krylov subspace: Lanczos process: 1 q 1 = v/ v 2, β 0 = 0; q 0 = 0; 2 for j = 1 to k, do 3 w = Aq j ; 4 α j = q T j w; 5 w = w α j q j β j 1 q j 1 ; 6 β j = w 2 ; 7 if β j = 0, quit; 8 q j+1 = w/β j ; 9 endfor

The symmetric Lanczos algorithm : governing equation Denote and T k = Q k = ( q 1 q 2... q k ) α 1 β 1 β 1 α 2 β 2............ α k 1 β k 1 = tridiag(β j, α j, β j+1 ), β k 1 α k the k-step Lanczos process yields AQ k = Q k T k + f k e T k, f k = β k q k+1 (12) and Q T k Q k = I and Q T k q k+1 = 0.

Let T k y = µy, y 2 = 1. Then A(Q k y) = Q k T k y + f k (e T ky) = µ(q k y) + f k (e T ky). Here µ is a Ritz value, and Q k y is the corresponding Ritz vector.

Error bound Lemma. Let H be symmetric, and Hz µz = r and z 0. Then min λ µ r 2. λ λ(h) z 2 Proof: Let H = UΛU T be the eigen-decomposition of H. Then (H µi)z = r U(Λ µi)u T z = r (Λ µi)(u T z) = U T r. Notice that Λ µi is diagonal. Thus as expected. r 2 = U T r 2 = (Λ µi)(u T z) 2 min λ µ λ λ(h) UT z 2 = min λ µ z 2, λ λ(h)

Error bound If f k (e T ky) = 0 for some k, then the associated Ritz value µ is an eigenvalue of A with the corresponding eigenvector Q k y. Let r 2 = f k (e T k y) 2, then by the lemma, we know that for the Ritz pair (µ, Q k y), there is an eigenvalue λ of A, such that λ µ f k(e T k y) 2 Q k y 2

Lanczos method = RR + Lanczos Simple Lanczos Algorithm: 1. q 1 = v/ v 2, β 0 = 0, q 0 = 0; 2. for j = 1 to k do 3. w = Aq j ; 4. α j = qj Tw; 5. w = w α j q j β j 1 q j 1 ; 6. β j = w 2 ; 7. if β j = 0, quit; 8. q j+1 = w/β j ; 9. Compute eigenvalues and eigenvectors of T j 10. Test for convergence 11. endfor

Example A = a random diagonal matrix A of order n = 1000 v = (1, 1,...,1) T Convergence behavior: 30 steps of Lanczos (full reorthogonalization) applied to A 3 2 1 Eigenvalues 0 1 2 3 0 5 10 15 20 25 30 Lanczos step

We observe that 1. Extreme eigenvalues, i.e., the largest and smallest ones, converge first, and the interior eigenvalues converge last. 2. Convergence is monotonic, with the ith largest (smallest) eigenvalues of T k increasing (decreasing) to the ith laregst (smallest) eigenvalue of A. = Convergence analysis

Thick Restarting Selects two indices l and u to indicate those Ritz values to be kept at both ends of spectrum: keep discard keep θ 1 θ 2 θ l θ u θ m The corresponding kept Ritz vectors are denoted by Q k = [ q 1, q 2,..., q k ] = Q m Y k, (13) where k = l + (m u + 1), (14) Y k = [y 1,y 2,...,y l, y u, y u+1,...,y m ], (15) and y i is the eigenvector of T m corresponding to θ i.

Sets these Ritz vectors Q k as the first k basis vectors at the restart and q k+1 = q m+1. To compute the (k + 2)th basis vector q k+2, TRLan computes A q k+1 and orthonormalizes it against the previous k + 1 basis vectors. That is, β k+1 q k+2 = A q k+1 Q k ( Q H k A q k+1 ) q k+1 ( q H k+1a q k+1 ). Note that A Q k satisfies the relation: A Q k = Q k D k + β m q k+1 s H, where D k is the k k diagonal matrix whose diagonal elements are the kept Ritz values, and s = Y H k e m. Thus, the coefficients Q H k A q k+1 can be computed efficiently: Q H k A q k+1 = (A Q k ) H q k+1 = ( Q k D k + β m q k+1 s H ) H q k+1 = D k Yk H (Q H mq m+1 ) + β m s( q k+1 q H k+1 ) = β m s.

Then after the first iteration after the restart, we have where A Q k+1 = Q k+1 Tk+1 + β k+1 q k+2 e H k+i, T k+1 = [ Dk β m s β m s H α k+1 In general, at the ith iteration after the restart, the new basis vector q k+i+1 satisfies the relation: A Q k+i = Q k+i T k+i + β k+i q k+i+1 e H k+i, where T k+i = Q H k+i A Q k+i is of the form D k β m s β m s H α k+1 βk+1 β T k+i = k+1 α k+2 βk+2......... β k+i 2 α k+i 1 βk+i 1 β k+i 1 α k+i ].

Note that the three-term recurrence is not valid only for computing the (k + 2)th basis vector and is resumed afterward.