Solution of Eigenvalue Problems
Outline: Introduction and motivation; Projection methods for eigenvalue problems; Subspace iteration; The symmetric Lanczos algorithm; The nonsymmetric Lanczos procedure; Implicit restarts; Harmonic Ritz values; Jacobi-Davidson's method
Origins of Eigenvalue Problems
* Structural engineering [$Ku = \lambda M u$]
* Electronic structure calculations [Schrödinger equation, ...]
* Stability analysis [e.g., electrical networks, mechanical systems, ...]
* Bifurcation analysis [e.g., in fluid flow]
Large sparse eigenvalue problems are among the most demanding calculations (in terms of CPU time) in scientific computing.
A newer application in information technology: search engines (Google) rank web sites in order to improve searches. The Google toolbar on some browsers (http://toolbar.google.com) gives a measure of the relevance of a page. The problem can be formulated as a Markov chain: seek the dominant eigenvector. Algorithm used: the power method. For details see: http://www.iprcom.com/papers/pagerank/index.html
The Problem
We consider the eigenvalue problem $Ax = \lambda x$ or, in generalized form, $Ax = \lambda Bx$. Typically B is symmetric (semi-)positive definite, and A is symmetric or nonsymmetric. Requirements vary:
* Compute a few $\lambda_i$'s with smallest or largest real parts;
* Compute all $\lambda_i$'s in a certain region of $\mathbb{C}$;
* Compute a few of the dominant eigenvalues;
* Compute all $\lambda_i$'s.
Types of problems
* Standard Hermitian (or real symmetric): $Ax = \lambda x$, $A^H = A$
* Standard non-Hermitian: $Ax = \lambda x$, $A^H \ne A$
* Generalized: $Ax = \lambda Bx$. Several distinct sub-cases (B SPD, B symmetric semi-positive definite, B singular with a large null space, both A and B singular, etc.)
* Quadratic: $(A + \lambda B + \lambda^2 C)x = 0$
* Nonlinear: $A(\lambda)x = 0$
General Tools for Solving Large Eigen-Problems
* Projection techniques: Arnoldi, Lanczos, subspace iteration;
* Preconditionings: shift-and-invert, polynomials, ...
* Deflation and restarting techniques.
Good computational codes combine these three ingredients.
A Few Popular Solution Methods
* Subspace iteration [now less popular; sometimes used for validation]
* Arnoldi's method (or Lanczos) with polynomial acceleration [Stiefel '58, Rutishauser '62, YS '84, '85, Sorensen '89, ...]
* Shift-and-invert and other preconditioners [use Arnoldi or Lanczos for $(A - \sigma I)^{-1}$]
* Davidson's method and variants, generalized Davidson's method [Morgan and Scott '89], Jacobi-Davidson
Projection Methods for Eigenvalue Problems
General formulation: projection method onto K, orthogonally to L.
Given: two subspaces K and L of the same dimension.
Find: $\tilde\lambda, \tilde u$ such that $\tilde\lambda \in \mathbb{C}$, $\tilde u \in K$, and $(\tilde\lambda I - A)\tilde u \perp L$.
Two types of methods: orthogonal projection methods (when $L = K$) and oblique projection methods (when $L \ne K$).
Rayleigh-Ritz Projection
Given: a subspace X known to contain good approximations to eigenvectors of A.
Question: how to extract good approximations to eigenvalues/eigenvectors from this subspace?
Answer: the Rayleigh-Ritz process. Let $Q = [q_1, \ldots, q_m]$ be an orthonormal basis of X. Then write an approximation in the form $\tilde u = Qy$ and obtain y by writing
$$Q^H (A - \tilde\lambda I)\tilde u = 0 \quad\Longleftrightarrow\quad Q^H A Q\, y = \tilde\lambda y$$
Procedure:
1. Obtain an orthonormal basis Q of X
2. Compute $C = Q^H A Q$ (an $m \times m$ matrix)
3. Obtain the Schur factorization of C: $C = YRY^H$
4. Compute $\tilde U = QY$
Property: if X is (exactly) invariant, then the procedure yields exact eigenvalues and eigenvectors. Proof: since X is invariant, $(A - \tilde\lambda I)\tilde u = Qz$ for a certain z. The Galerkin condition gives $Q^H Q z = 0$, which implies $z = 0$ and therefore $(A - \tilde\lambda I)\tilde u = 0$.
This procedure can be used in conjunction with the subspace obtained from the subspace iteration algorithm.
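The four steps above map directly onto NumPy/SciPy. A minimal sketch (the function and variable names are illustrative, not from any particular package):

```python
import numpy as np
from scipy.linalg import schur

def rayleigh_ritz(A, X):
    """Extract Ritz values and approximate Schur vectors of A from span(X)."""
    Q, _ = np.linalg.qr(X)               # 1. orthonormal basis of span(X)
    C = Q.conj().T @ A @ Q               # 2. C = Q^H A Q  (m x m)
    R, Y = schur(C, output='complex')    # 3. Schur factorization C = Y R Y^H
    return np.diag(R), Q @ Y             # 4. Ritz values and U~ = Q Y
```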
Subspace Iteration
Original idea: projection technique onto a subspace of the form $Y = A^k X$.
In practice: replace $A^k$ by a suitable polynomial [Chebyshev].
Advantages: easy to implement (in the symmetric case); easy to analyze.
Disadvantage: slow.
Often used with polynomial acceleration: $A^k X$ replaced by $C_k(A)X$. Typically $C_k$ = Chebyshev polynomial.
Algorithm: Subspace Iteration with Projection
1. Start: choose an initial system of vectors $X = [x_1, \ldots, x_m]$ and an initial polynomial $C_k$.
2. Iterate: until convergence do:
   (a) Compute $\hat Z = C_k(A) X_{old}$.
   (b) Orthonormalize $\hat Z$ into Z.
   (c) Compute $B = Z^H A Z$ and use the QR algorithm to compute the Schur vectors $Y = [y_1, \ldots, y_m]$ of B.
   (d) Compute $X_{new} = ZY$.
   (e) Test for convergence. If satisfied stop. Else select a new polynomial $C_k$ and continue.
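A hedged NumPy sketch of this algorithm, with the simplest polynomial choice $C_k(t) = t^k$ (plain power steps) and a convergence test on the first Schur pair only; all names are illustrative. A production code would also sort and lock the Schur pairs, which scipy's unsorted Schur form does not do here.

```python
import numpy as np
from scipy.linalg import schur

def subspace_iteration(A, m, k=5, maxit=200, tol=1e-8):
    n = A.shape[0]
    X = np.linalg.qr(np.random.randn(n, m))[0]   # 1. initial system of m vectors
    for _ in range(maxit):
        Z = X
        for _ in range(k):                       # (a) Z^ = C_k(A) X_old = A^k X_old
            Z = A @ Z
        Z, _ = np.linalg.qr(Z)                   # (b) orthonormalize Z^ into Z
        B = Z.conj().T @ A @ Z                   # (c) B = Z^H A Z ...
        R, Y = schur(B, output='complex')        #     ... and its Schur vectors Y
        X = Z @ Y                                # (d) X_new = Z Y
        res = np.linalg.norm(A @ X[:, 0] - R[0, 0] * X[:, 0])
        if res < tol:                            # (e) convergence test
            break
    return np.diag(R), X
```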
THEOREM: Let $S_0 = \mathrm{span}\{x_1, x_2, \ldots, x_m\}$ and assume that $S_0$ is such that the vectors $\{Px_i\}_{i=1,\ldots,m}$ are linearly independent, where P is the spectral projector associated with $\lambda_1, \ldots, \lambda_m$. Let $P_k$ be the orthogonal projector onto the subspace $S_k = \mathrm{span}\{X_k\}$. Then for each eigenvector $u_i$ of A, $i = 1, \ldots, m$, there exists a unique vector $s_i$ in the subspace $S_0$ such that $Ps_i = u_i$. Moreover, the following inequality is satisfied:
$$\|(I - P_k)u_i\|_2 \le \|u_i - s_i\|_2 \left( \left|\frac{\lambda_{m+1}}{\lambda_i}\right|^k + \epsilon_k \right), \tag{1}$$
where $\epsilon_k$ tends to zero as k tends to infinity.
Krylov Subspace Methods
Principle: projection methods on Krylov subspaces, i.e., on
$$K_m(A, v_1) = \mathrm{span}\{v_1, Av_1, \ldots, A^{m-1}v_1\}$$
* Probably the most important class of projection methods [for linear systems and for eigenvalue problems]
* Many variants exist depending on the subspace L.
Properties of $K_m$. Let $\mu$ = degree of the minimal polynomial of $v_1$. Then:
* $K_m = \{p(A)v_1 \mid p = \text{polynomial of degree} \le m-1\}$
* $K_m = K_\mu$ for all $m \ge \mu$. Moreover, $K_\mu$ is invariant under A.
* $\dim(K_m) = m$ iff $\mu \ge m$.
Arnoldi's Algorithm
Goal: to compute an orthonormal basis of $K_m$.
Input: initial vector $v_1$ with $\|v_1\|_2 = 1$, and m.
ALGORITHM 1: Arnoldi's procedure
For $j = 1, \ldots, m$ do:
  Compute $w := Av_j$
  For $i = 1, \ldots, j$ do:
    $h_{i,j} := (w, v_i)$
    $w := w - h_{i,j} v_i$
  $h_{j+1,j} = \|w\|_2$; $v_{j+1} = w / h_{j+1,j}$
End
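A direct NumPy transcription of Algorithm 1 for a dense A (the name arnoldi is illustrative); it returns $V_{m+1}$ and the $(m+1) \times m$ Hessenberg matrix described on the next slide:

```python
import numpy as np

def arnoldi(A, v1, m):
    n = len(v1)
    V = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]                    # w := A v_j
        for i in range(j + 1):             # modified Gram-Schmidt loop
            H[i, j] = np.vdot(V[:, i], w)  # h_{i,j} := (w, v_i)
            w = w - H[i, j] * V[:, i]      # w := w - h_{i,j} v_i
        H[j + 1, j] = np.linalg.norm(w)    # h_{j+1,j} := ||w||_2
        V[:, j + 1] = w / H[j + 1, j]      # (breaks down if this norm is zero)
    return V, H
```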
Result of Arnoldi's algorithm: let $\bar H_m$ be the $(m+1) \times m$ upper Hessenberg matrix of the coefficients $h_{i,j}$ (so $h_{i,j} = 0$ for $i > j+1$), and $H_m = \bar H_m(1{:}m, 1{:}m)$. Then:
1. $V_m = [v_1, v_2, \ldots, v_m]$ is an orthonormal basis of $K_m$.
2. $AV_m = V_{m+1}\bar H_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T$
3. $V_m^T A V_m = H_m$ ($\bar H_m$ without its last row).
Application to eigenvalue problems: write an approximate eigenvector as $\tilde u = V_m y$, with the Galerkin condition
$$(A - \tilde\lambda I)V_m y \perp K_m \quad\Longleftrightarrow\quad V_m^H (A - \tilde\lambda I)V_m y = 0.$$
Approximate eigenvalues are the eigenvalues of $H_m$: $H_m y_j = \tilde\lambda_j y_j$. Associated approximate eigenvectors are $\tilde u_j = V_m y_j$. Typically a few of the outermost eigenvalues converge first.
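Continuing the arnoldi() sketch above, the Ritz pairs and their residual norms come almost for free, using the identity $\|(A - \tilde\lambda_j I)\tilde u_j\|_2 = h_{m+1,m}\,|e_m^T y_j|$ for $\|y_j\|_2 = 1$:

```python
import numpy as np

def ritz_pairs(V, H):
    m = H.shape[1]
    lam, Y = np.linalg.eig(H[:m, :m])                # H_m y_j = lam_j y_j
    U = V[:, :m] @ Y                                 # u~_j = V_m y_j
    res = np.abs(H[m, m - 1]) * np.abs(Y[m - 1, :])  # residual norms, no extra matvec
    return lam, U, res
```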
Restarted Arnoldi
In practice: the memory requirement of the algorithm implies that restarting is necessary. Restarted Arnoldi for computing the rightmost eigenpair:
ALGORITHM 2: Restarted Arnoldi
1. Start: choose an initial vector $v_1$ and a dimension m.
2. Iterate: perform m steps of Arnoldi's algorithm.
3. Restart: compute the approximate eigenvector $u_1^{(m)}$ associated with the rightmost eigenvalue $\tilde\lambda_1^{(m)}$.
4. If satisfied stop, else set $v_1 := u_1^{(m)}$ and go to 2.
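A minimal driver for Algorithm 2, reusing the arnoldi() and ritz_pairs() sketches above; the tolerance and restart limit are illustrative:

```python
import numpy as np

def restarted_arnoldi(A, v1, m=10, tol=1e-8, max_restarts=50):
    v = v1 / np.linalg.norm(v1)                 # 1. initial vector
    for _ in range(max_restarts):
        V, H = arnoldi(A, v, m)                 # 2. m Arnoldi steps
        lam, U, res = ritz_pairs(V, H)
        j = np.argmax(lam.real)                 # 3. rightmost Ritz pair
        if res[j] < tol:                        # 4. stop if satisfied...
            break
        v = U[:, j] / np.linalg.norm(U[:, j])   #    ...else v_1 <- u_1^(m)
    return lam[j], U[:, j]
```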
Example: small Markov chain matrix [Mark(10), dimension = 55]. Restarted Arnoldi procedure for computing the eigenvector associated with the eigenvalue with algebraically largest real part. We use m = 10.

 m    Re(λ)              Im(λ)   Res. norm
 10   0.9987435899D+00   0.0     0.246D-01
 20   0.9999523324D+00   0.0     0.144D-02
 30   0.1000000368D+01   0.0     0.221D-04
 40   0.1000000025D+01   0.0     0.508D-06
 50   0.9999999996D+00   0.0     0.138D-07
Restarted Arnoldi (cont.) Can be generalized to more than *one* eigenvector:
$$v_1^{(new)} = \sum_{i=1}^{p} \rho_i u_i^{(m)}$$
However, this often does not work well (it is hard to find good coefficients $\rho_i$). Alternative: compute the eigenvectors (actually Schur vectors) one at a time, with implicit deflation.
Deflation
Very useful in practice. Different forms: locking (subspace iteration), selective orthogonalization (Lanczos), Schur deflation, ...
A little background: consider the Schur canonical form $A = URU^H$, where U is unitary and R is (complex) upper triangular. The columns $u_1, \ldots, u_n$ of U are called Schur vectors. Note: the Schur vectors depend on each other, and on the order of the eigenvalues.
Wielandt Deflation: assume we have computed a right eigenpair $(\lambda_1, u_1)$. Wielandt deflation considers the eigenvalues of
$$A_1 = A - \sigma u_1 v^H.$$
Note: $\Lambda(A_1) = \{\lambda_1 - \sigma, \lambda_2, \ldots, \lambda_n\}$.
Wielandt deflation preserves $u_1$ as an eigenvector, as well as all the left eigenvectors not associated with $\lambda_1$. An interesting choice for v is to take simply $v = u_1$; in this case Wielandt deflation preserves Schur vectors as well. The above procedure can be applied successively.
ALGORITHM 3: Explicit Deflation
1. $A_0 = A$
2. For $j = 0, \ldots, \mu - 1$ do:
3.   Compute a dominant eigenvector $u_j$ of $A_j$
4.   Define $A_{j+1} = A_j - \sigma_j u_j u_j^H$
5. End
The computed $u_1, u_2, \ldots$ form a set of Schur vectors for A. Alternative: implicit deflation (within a procedure such as Arnoldi). A sketch of the explicit form follows.
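A hedged NumPy sketch of Algorithm 3 with the Schur-vector-preserving choice $v = u$ and a fixed shift $\sigma$; dominant_eigpair stands in for any routine that returns a dominant eigenpair, e.g. the restarted Arnoldi sketch earlier:

```python
import numpy as np

def explicit_deflation(A, mu, sigma, dominant_eigpair):
    n = A.shape[0]
    U = np.zeros((n, mu), dtype=complex)
    Aj = A.astype(complex)                       # A_0 = A
    for j in range(mu):
        lam, u = dominant_eigpair(Aj)            # dominant eigenvector of A_j
        u = u / np.linalg.norm(u)
        U[:, j] = u
        Aj = Aj - sigma * np.outer(u, u.conj())  # A_{j+1} = A_j - sigma u u^H
    return U                                     # columns: Schur vectors of A
```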
Deflated Arnoldi
When the first eigenvector converges, put it in the first column of $V_m = [v_1, v_2, \ldots, v_m]$. Arnoldi now starts at column 2, still orthogonalizing against $v_1, \ldots, v_j$ at step j. Accumulate each new converged eigenvector in columns 2, 3, ... [the "locked" set of eigenvectors]. Thus, for k = 2:
$$V_m = [\underbrace{v_1, v_2}_{\text{locked}}, \underbrace{v_3, \ldots, v_m}_{\text{active}}],$$
and $H_m$ acquires the corresponding structure: its first k columns are upper triangular (locked part), the rest upper Hessenberg.
Similar techniques exist in subspace iteration [G. Stewart's SRRIT].
Example: matrix Mark(10), the small Markov chain matrix (N = 55). First eigenpair by iterative Arnoldi with m = 10.

 m    Re(λ)              Im(λ)   Res. norm
 10   0.9987435899D+00   0.0     0.246D-01
 20   0.9999523324D+00   0.0     0.144D-02
 30   0.1000000368D+01   0.0     0.221D-04
 40   0.1000000025D+01   0.0     0.508D-06
 50   0.9999999996D+00   0.0     0.138D-07
Computing the next 2 eigenvalues of Mark(10):

 Eig.   Mat-vec's   Re(λ)          Im(λ)   Res. norm
 2      60          0.9370509474   0.0     0.870D-03
        69          0.9371549617   0.0     0.175D-04
        78          0.9371501442   0.0     0.313D-06
        87          0.9371501564   0.0     0.490D-08
 3      96          0.8112247133   0.0     0.210D-02
        104         0.8097553450   0.0     0.538D-03
        112         0.8096419483   0.0     0.874D-04
        ...         ...            ...     ...
        152         0.8095717167   0.0     0.444D-07
Hermitian case: The Lanczos Algorithm
The Hessenberg matrix becomes tridiagonal: when $A = A^H$, $H_m = V_m^H A V_m$ satisfies $H_m = H_m^H$, so we can write
$$H_m = \begin{pmatrix} \alpha_1 & \beta_2 & & & \\ \beta_2 & \alpha_2 & \beta_3 & & \\ & \beta_3 & \alpha_3 & \beta_4 & \\ & & \ddots & \ddots & \ddots \\ & & & \beta_m & \alpha_m \end{pmatrix} \tag{2}$$
Consequence: a three-term recurrence
$$\beta_{j+1} v_{j+1} = A v_j - \alpha_j v_j - \beta_j v_{j-1}$$
ALGORITHM 4: Lanczos
1. Choose $v_1$ of unit norm. Set $\beta_1 := 0$, $v_0 := 0$
2. For $j = 1, 2, \ldots, m$ do:
3.   $w_j := A v_j - \beta_j v_{j-1}$
4.   $\alpha_j := (w_j, v_j)$
5.   $w_j := w_j - \alpha_j v_j$
6.   $\beta_{j+1} := \|w_j\|_2$. If $\beta_{j+1} = 0$ then stop
7.   $v_{j+1} := w_j / \beta_{j+1}$
8. EndDo
Hermitian matrix + Arnoldi ⟹ Hermitian Lanczos. In theory the $v_i$'s defined by the 3-term recurrence are orthogonal. However, in practice there is severe loss of orthogonality.
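Algorithm 4 in NumPy for a real symmetric A, with an optional full-reorthogonalization switch anticipating the next slide (names are illustrative):

```python
import numpy as np

def lanczos(A, v1, m, reorth=False):
    n = len(v1)
    V = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m + 1)   # beta[0] plays the role of beta_1 = 0
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j] - beta[j] * V[:, j - 1]  # at j = 0 this subtracts zero
        alpha[j] = np.dot(w, V[:, j])
        w = w - alpha[j] * V[:, j]
        if reorth:                               # full reorthogonalization (next slide)
            w = w - V[:, :j + 1] @ (V[:, :j + 1].T @ w)
        beta[j + 1] = np.linalg.norm(w)
        if beta[j + 1] == 0:
            break
        V[:, j + 1] = w / beta[j + 1]
    return V, alpha, beta
```

The Ritz values are then the eigenvalues of the tridiagonal matrix (2), obtainable for instance via scipy.linalg.eigh_tridiagonal(alpha, beta[1:m]).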
Lanczos with Reorthogonalization
Observation [Paige, 1981]: loss of orthogonality starts suddenly, when the first eigenpair converges. It indicates loss of linear independence of the $v_i$'s. When orthogonality is lost, several copies of the same eigenvalue start appearing.
* Full reorthogonalization: reorthogonalize $v_{j+1}$ against all previous $v_i$'s every time.
* Partial reorthogonalization: reorthogonalize $v_{j+1}$ against all previous $v_i$'s only when needed [Parlett & Simon].
* Selective reorthogonalization: reorthogonalize $v_{j+1}$ against computed eigenvectors [Parlett & Scott].
* No reorthogonalization: do not reorthogonalize, but take measures to deal with spurious eigenvalues [Cullum & Willoughby].
Partial Reorthogonalization
Partial reorthogonalization: reorthogonalize only when deemed necessary. The main question is: when? It uses an inexpensive recurrence relation. Work done in the 1980s [Parlett, Simon, and co-workers] plus more recent work [Larsen '98]. Package: PROPACK [Larsen], v1 in 2001, most recent v2.1 (Apr. 2005). Often the need for reorthogonalization is not too strong.
The Lanczos Algorithm in the Hermitian Case
Assume the eigenvalues are sorted increasingly: $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. This is an orthogonal projection method onto $K_m$. To derive error bounds, use the Courant characterization:
$$\tilde\lambda_1 = \min_{u \in K,\, u \ne 0} \frac{(Au, u)}{(u, u)} = \frac{(A\tilde u_1, \tilde u_1)}{(\tilde u_1, \tilde u_1)}$$
$$\tilde\lambda_j = \min_{\substack{u \in K,\, u \ne 0 \\ u \perp \tilde u_1, \ldots, \tilde u_{j-1}}} \frac{(Au, u)}{(u, u)} = \frac{(A\tilde u_j, \tilde u_j)}{(\tilde u_j, \tilde u_j)}$$
Bounds for $\tilde\lambda_1$ are easy to find; the argument is similar to that for linear systems. The Ritz values approximate the eigenvalues of A "inside out":
$$\lambda_1 \le \tilde\lambda_1 \le \tilde\lambda_2 \le \cdots \le \tilde\lambda_{m-1} \le \tilde\lambda_m \le \lambda_n,$$
i.e., the Ritz values lie inside $[\lambda_1, \lambda_n]$ and the extremal eigenvalues are approximated first.
A-priori Error Bounds
Theorem [Kaniel, 1966]:
$$0 \le \lambda_1^{(m)} - \lambda_1 \le (\lambda_N - \lambda_1)\left[\frac{\tan \angle(v_1, u_1)}{T_{m-1}(1 + 2\gamma_1)}\right]^2$$
where $\gamma_1 = \frac{\lambda_2 - \lambda_1}{\lambda_N - \lambda_2}$ and $\angle(v_1, u_1)$ = angle between $v_1$ and $u_1$. Plus results for the other eigenvalues [Kaniel, Paige, YS]:
Theorem:
$$0 \le \lambda_i^{(m)} - \lambda_i \le (\lambda_N - \lambda_1)\left[\kappa_i^{(m)} \frac{\tan \angle(v_1, u_i)}{T_{m-i}(1 + 2\gamma_i)}\right]^2$$
where $\gamma_i = \frac{\lambda_{i+1} - \lambda_i}{\lambda_N - \lambda_{i+1}}$ and $\kappa_i^{(m)} = \prod_{j<i} \frac{\lambda_j^{(m)} - \lambda_N}{\lambda_j^{(m)} - \lambda_i}$.
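A small helper to evaluate the first bound numerically; $T_k$ at arguments $x > 1$ is computed stably as $\cosh(k \cdot \mathrm{arccosh}\,x)$. The inputs (sorted eigenvalues, angle) are hypothetical:

```python
import numpy as np

def kaniel_bound(lams, m, angle):      # lams: eigenvalues sorted increasingly
    l1, l2, lN = lams[0], lams[1], lams[-1]
    gamma1 = (l2 - l1) / (lN - l2)
    T = np.cosh((m - 1) * np.arccosh(1.0 + 2.0 * gamma1))  # T_{m-1}(1 + 2 gamma_1)
    return (lN - l1) * (np.tan(angle) / T) ** 2
```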
The Lanczos Biorthogonalization ($A^H \ne A$)
ALGORITHM 5: Lanczos bi-orthogonalization
1. Choose two vectors $v_1, w_1$ such that $(v_1, w_1) = 1$.
2. Set $\beta_1 = \delta_1 := 0$, $w_0 = v_0 := 0$
3. For $j = 1, 2, \ldots, m$ do:
4.   $\alpha_j = (A v_j, w_j)$
5.   $\hat v_{j+1} = A v_j - \alpha_j v_j - \beta_j v_{j-1}$
6.   $\hat w_{j+1} = A^T w_j - \alpha_j w_j - \delta_j w_{j-1}$
7.   $\delta_{j+1} = |(\hat v_{j+1}, \hat w_{j+1})|^{1/2}$. If $\delta_{j+1} = 0$, stop
8.   $\beta_{j+1} = (\hat v_{j+1}, \hat w_{j+1}) / \delta_{j+1}$
9.   $w_{j+1} = \hat w_{j+1} / \beta_{j+1}$
10.  $v_{j+1} = \hat v_{j+1} / \delta_{j+1}$
11. EndDo
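Algorithm 5 in NumPy, in real arithmetic (so $A^H = A^T$), with the scaling from lines 7-8 above; as the next slide notes, this is only one of the admissible choices:

```python
import numpy as np

def lanczos_biortho(A, v1, w1, m):
    n = len(v1)
    v1 = v1 / np.dot(v1, w1)                     # 1. enforce (v_1, w_1) = 1
    V, W = np.zeros((n, m + 1)), np.zeros((n, m + 1))
    alpha, beta, delta = np.zeros(m), np.zeros(m + 1), np.zeros(m + 1)
    V[:, 0], W[:, 0] = v1, w1
    for j in range(m):
        av = A @ V[:, j]
        alpha[j] = np.dot(av, W[:, j])                                    # 4.
        vh = av - alpha[j] * V[:, j] - beta[j] * V[:, j - 1]              # 5.
        wh = A.T @ W[:, j] - alpha[j] * W[:, j] - delta[j] * W[:, j - 1]  # 6.
        s = np.dot(vh, wh)
        if s == 0:                               # 7. breakdown
            break
        delta[j + 1] = np.sqrt(abs(s))           # 7.
        beta[j + 1] = s / delta[j + 1]           # 8.
        W[:, j + 1] = wh / beta[j + 1]           # 9.
        V[:, j + 1] = vh / delta[j + 1]          # 10.
    return V, W, alpha, beta, delta
```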
This builds a pair of biorthogonal bases for the two subspaces $K_m(A, v_1)$ and $K_m(A^H, w_1)$. There are many choices for $\delta_{j+1}, \beta_{j+1}$ in lines 7 and 8; the only constraint is $\delta_{j+1}\beta_{j+1} = (\hat v_{j+1}, \hat w_{j+1})$. Let
$$T_m = \begin{pmatrix} \alpha_1 & \beta_2 & & & \\ \delta_2 & \alpha_2 & \beta_3 & & \\ & \ddots & \ddots & \ddots & \\ & & \delta_{m-1} & \alpha_{m-1} & \beta_m \\ & & & \delta_m & \alpha_m \end{pmatrix}.$$
Then $v_i \in K_m(A, v_1)$ and $w_j \in K_m(A^T, w_1)$.
If the algorithm does not break down before step m, then the vectors $v_i$, $i = 1, \ldots, m$, and $w_j$, $j = 1, \ldots, m$, are biorthogonal, i.e., $(v_j, w_i) = \delta_{ij}$ for $1 \le i, j \le m$. Moreover, $\{v_i\}_{i=1,\ldots,m}$ is a basis of $K_m(A, v_1)$, $\{w_i\}_{i=1,\ldots,m}$ is a basis of $K_m(A^H, w_1)$, and
$$A V_m = V_m T_m + \delta_{m+1} v_{m+1} e_m^H,$$
$$A^H W_m = W_m T_m^H + \bar\beta_{m+1} w_{m+1} e_m^H,$$
$$W_m^H A V_m = T_m.$$
If $\theta_j$ is an eigenvalue of $T_m$, with associated right and left eigenvectors $y_j$ and $z_j$ respectively, then the corresponding approximations for A are:
Ritz value: $\theta_j$; right Ritz vector: $V_m y_j$; left Ritz vector: $W_m z_j$.
[Note: the terminology is abused slightly; Ritz values and vectors normally refer to the Hermitian case.]
Advantages and Disadvantages
Advantages: the nice three-term recurrence requires little storage in theory; computes left and right eigenvectors at the same time.
Disadvantages: the algorithm can break down, or nearly break down; convergence is not too well understood (erratic behavior); it is not easy to take advantage of the tridiagonal form of $T_m$.
Look-ahead Lanczos
The algorithm breaks down when $(\hat v_{j+1}, \hat w_{j+1}) = 0$. Three distinct situations:
* "Lucky" breakdown: when either $\hat v_{j+1}$ or $\hat w_{j+1}$ is zero. In this case, the eigenvalues of $T_m$ are eigenvalues of A.
* $(\hat v_{j+1}, \hat w_{j+1}) = 0$ but $\hat v_{j+1} \ne 0$, $\hat w_{j+1} \ne 0$: "serious" breakdown. It is often possible to bypass the step (plus a few more) and continue the algorithm. If this is not possible then we get an...
* ..."incurable" breakdown [very rare].
Look-ahead Lanczos algorithms deal with the second case. See Parlett '80, Freund and Nachtigal '90, ... Main idea: when a breakdown occurs, skip the computation of $v_{j+1}, w_{j+1}$ and define $v_{j+2}, w_{j+2}$ from $v_j, w_j$, for example by orthogonalizing $A^2 v_j$... One can define $v_{j+1}$ somewhat arbitrarily as $v_{j+1} = Av_j$, and similarly for $w_{j+1}$. Drawbacks: (1) the projected problem is no longer tridiagonal; (2) it is difficult to know what constitutes a near-breakdown.
Preconditioning Eigenvalue Problems
Goal: to extract good approximations to add to a subspace in a projection process. Result: faster convergence.
* Best-known technique: shift-and-invert; work with $B = (A - \sigma I)^{-1}$.
* Some success with polynomial preconditioning [Chebyshev iteration / least-squares polynomials]; work with $B = p(A)$.
* The preconditioners above preserve eigenvectors. Other methods (Davidson) use a more general preconditioner M.
Shift-and-Invert Preconditioning
Main idea: use Arnoldi, or Lanczos, or subspace iteration for the matrix $B = (A - \sigma I)^{-1}$. The matrix B need not be computed explicitly: each time we need to apply B to a vector, we solve a system with $A - \sigma I$. Factor $A - \sigma I = LU$; then each application $x = By$ requires solving $Lz = y$ and $Ux = z$ (see the sketch below). How to deal with complex shifts? If A is complex, we must work in complex arithmetic anyway. If A is real, then instead of $(A - \sigma I)^{-1}$ one can use
$$\mathrm{Re}\,(A - \sigma I)^{-1} = \frac{1}{2}\left[(A - \sigma I)^{-1} + (A - \bar\sigma I)^{-1}\right].$$
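A sketch of the factor-once, solve-many pattern with a sparse LU, wrapped as an operator that Arnoldi/Lanczos codes can call; scipy's splu and LinearOperator are real APIs, while the wrapper name is illustrative:

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def shift_invert_operator(A, sigma):
    """Return B = (A - sigma*I)^{-1} as an implicit operator (A sparse)."""
    M = (A - sigma * sp.identity(A.shape[0])).tocsc()
    lu = spla.splu(M)                    # factor A - sigma*I = LU once
    return spla.LinearOperator(A.shape, matvec=lu.solve, dtype=M.dtype)

# Each application of B is just two triangular solves. Eigenvalues lambda of A
# map to 1/(lambda - sigma), so those nearest sigma become dominant.
```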
Preconditioning by Polynomials
Main idea: iterate with p(A) instead of A in Arnoldi or Lanczos, etc. Used very early on in subspace iteration [Rutishauser, 1959]. Usually not as reliable as shift-and-invert techniques, but less demanding in terms of storage.
Question: how to find a good polynomial (dynamically)? Approaches:
1. Chebyshev polynomials over ellipses
2. Polynomials based on Leja points
3. Least-squares polynomials over polygons
4. Polynomials from previous Arnoldi decompositions
Polynomial Filters and Implicit Restart
Goal: exploit the Arnoldi procedure to apply a polynomial filter of the form
$$p(t) = (t - \theta_1)(t - \theta_2)\cdots(t - \theta_q).$$
Assume $AV_m = V_m H_m + \hat v_{m+1} e_m^T$ and consider the first factor, $(t - \theta_1)$:
$$(A - \theta_1 I)V_m = V_m(H_m - \theta_1 I) + \hat v_{m+1} e_m^T.$$
Let $H_m - \theta_1 I = Q_1 R_1$. Then:
$$(A - \theta_1 I)V_m = V_m Q_1 R_1 + \hat v_{m+1} e_m^T$$
$$(A - \theta_1 I)(V_m Q_1) = (V_m Q_1)(R_1 Q_1) + \hat v_{m+1} e_m^T Q_1$$
$$A(V_m Q_1) = (V_m Q_1)(R_1 Q_1 + \theta_1 I) + \hat v_{m+1} e_m^T Q_1$$
Notation: $R_1 Q_1 + \theta_1 I \equiv H_m^{(1)}$; $(b_{m+1}^{(1)})^T \equiv e_m^T Q_1$; $V_m Q_1 \equiv V_m^{(1)}$. Then
$$A V_m^{(1)} = V_m^{(1)} H_m^{(1)} + \hat v_{m+1} (b_{m+1}^{(1)})^T.$$
Note that $H_m^{(1)}$ is upper Hessenberg, so this is similar to an Arnoldi decomposition. Observe:
* $R_1 Q_1 + \theta_1 I$ is the matrix resulting from one step of the QR algorithm with shift $\theta_1$ applied to $H_m$.
* The first column of $V_m^{(1)}$ is a multiple of $(A - \theta_1 I)v_1$.
* The columns of $V_m^{(1)}$ are orthonormal.
We can now apply a second shift in the same way:
$$(A - \theta_2 I)V_m^{(1)} = V_m^{(1)}(H_m^{(1)} - \theta_2 I) + \hat v_{m+1}(b_{m+1}^{(1)})^T$$
By a similar process, factor $H_m^{(1)} - \theta_2 I = Q_2 R_2$ and multiply by $Q_2$ on the right:
$$(A - \theta_2 I)V_m^{(1)} Q_2 = (V_m^{(1)} Q_2)(R_2 Q_2) + \hat v_{m+1}(b_{m+1}^{(1)})^T Q_2$$
$$A V_m^{(2)} = V_m^{(2)} H_m^{(2)} + \hat v_{m+1}(b_{m+1}^{(2)})^T$$
Now the first column of $V_m^{(2)}$ is a scalar multiple of $(A - \theta_2 I)v_1^{(1)}$, i.e., a scalar multiple of $(A - \theta_2 I)(A - \theta_1 I)v_1$.
Note that $(b_{m+1}^{(2)})^T = e_m^T Q_1 Q_2 = [0, 0, \ldots, 0, \eta_1, \eta_2, \eta_3]$. Let $\hat V_{m-2} = [\hat v_1, \ldots, \hat v_{m-2}]$ consist of the first m−2 columns of $V_m^{(2)}$, and let $\hat H_{m-2} = H_m^{(2)}(1{:}m{-}2, 1{:}m{-}2)$. Then
$$A \hat V_{m-2} = \hat V_{m-2} \hat H_{m-2} + \hat\beta_{m-1} \hat v_{m-1} e_{m-2}^T$$
with $\hat\beta_{m-1}\hat v_{m-1} \equiv \eta_1 \hat v_{m+1} + h_{m-1,m-2}^{(2)} v_{m-1}^{(2)}$ and $\|\hat v_{m-1}\|_2 = 1$.
Result: an Arnoldi process of m−2 steps with the initial vector $p(A)v_1$. In other words: we know how to apply polynomial filtering via a form of the Arnoldi process, combined with the QR algorithm (see the sketch below).
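The whole derivation fits in a few lines of NumPy. A sketch (illustrative names) that takes the V, H pair from the arnoldi() sketch earlier and applies q shifts, returning the length m−q Arnoldi factorization whose starting vector is a multiple of $p(A)v_1$; here the residual row absorbs $h_{m+1,m}$, so it multiplies the normalized $v_{m+1}$:

```python
import numpy as np

def implicit_filter(V, H, shifts):       # requires len(shifts) < H.shape[1]
    m = H.shape[1]
    Vm = V[:, :m].astype(complex)
    Hm = H[:m, :m].astype(complex)
    b = np.zeros(m, dtype=complex)
    b[m - 1] = H[m, m - 1]               # residual row: h_{m+1,m} e_m^T
    for theta in shifts:                 # one shifted QR step per filter root
        Q, R = np.linalg.qr(Hm - theta * np.eye(m))
        Hm = R @ Q + theta * np.eye(m)   # H^(i) = R_i Q_i + theta_i I
        Vm = Vm @ Q                      # V^(i) = V^(i-1) Q_i
        b = b @ Q                        # residual row picks up Q_i
    k = m - len(shifts)                  # keep the leading k columns
    r = Hm[k, k - 1] * Vm[:, k] + b[k - 1] * V[:, m]  # new residual vector
    beta = np.linalg.norm(r)
    return Vm[:, :k], Hm[:k, :k], r / beta, beta      # truncated factorization
```

The returned pieces satisfy $A\hat V_k = \hat V_k \hat H_k + \hat\beta\, \hat v\, e_k^T$, so the Arnoldi process can simply be resumed from step k+1 after a restart.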