A short course on: Preconditioned Krylov subspace methods. Yousef Saad University of Minnesota Dept. of Computer Science and Engineering


A short course on: Preconditioned Krylov subspace methods
Yousef Saad, University of Minnesota, Dept. of Computer Science and Engineering
Université du Littoral, Jan 19-30, 2005

Outline
* Part 1: Introduction, discretization of PDEs; sparse matrices and sparsity; basic iterative methods (relaxation, ...)
* Part 2: Projection methods; Krylov subspace methods
* Part 3: Preconditioned iterations; preconditioning techniques; parallel implementations
* Part 4: Eigenvalue problems; applications

Calais February 7,

INTRODUCTION - MOTIVATION

Origins of Eigenvalue Problems
* Structural engineering [$Ku = \lambda M u$]
* Electronic structure calculations [Schrödinger equation, ...]
* Stability analysis [e.g., electrical networks, mechanical systems, ...]
* Bifurcation analysis [e.g., in fluid flow]
Large sparse eigenvalue problems are among the most demanding calculations (in terms of CPU time) in scientific computing.

New application in information technology
* Search engines (Google) rank web sites in order to improve searches.
* The Google toolbar on some browsers gives a measure of relevance of a page.
* The problem can be formulated as a Markov chain: seek the dominant eigenvector.
* Algorithm used: power method.
For details see:

The Problem
We consider the eigenvalue problem $Ax = \lambda x$ or $Ax = \lambda Bx$.
Typically: $B$ is symmetric (semi) positive definite, $A$ is symmetric or nonsymmetric.
Requirements vary:
* Compute a few $\lambda_i$'s with smallest or largest real parts;
* Compute all $\lambda_i$'s in a certain region of $\mathbb{C}$;
* Compute a few of the dominant eigenvalues;
* Compute all $\lambda_i$'s.

Types of problems
* Standard Hermitian (or symmetric real): $Ax = \lambda x$, $A^H = A$
* Standard non-Hermitian: $Ax = \lambda x$, $A^H \ne A$
* Generalized: $Ax = \lambda Bx$. Several distinct sub-cases ($B$ SPD, $B$ SSPD, $B$ singular with large null space, both $A$ and $B$ singular, etc.)
* Quadratic: $(A + \lambda B + \lambda^2 C)x = 0$
* Nonlinear: $A(\lambda)x = 0$

EIGENVALUE PROBLEMS BASICS - DENSE MATRIX CASE
* Background on eigenvalues / eigenvectors / Jordan form
* The Schur form
* Perturbation analysis, condition numbers, ...
* Power method, subspace iteration algorithms
* The QR algorithm
* Practical QR algorithms: use of Hessenberg form and shifts
* The symmetric eigenvalue problem

Basic definitions and properties
A complex scalar $\lambda$ is called an eigenvalue of a square matrix $A$ if there exists a nonzero vector $u$ in $\mathbb{C}^n$ such that $Au = \lambda u$. The vector $u$ is called an eigenvector of $A$ associated with $\lambda$. The set of all eigenvalues of $A$ is the spectrum of $A$. Notation: $\Lambda(A)$.
* $\lambda \in \Lambda(A)$ iff the columns of $A - \lambda I$ are linearly dependent,
* ... which is equivalent to saying that its rows are linearly dependent. So there is a nonzero vector $w$ such that $w^H (A - \lambda I) = 0$.
* $w^H$ is called a left eigenvector of $A$ ($u$ is a right eigenvector).
* $\lambda \in \Lambda(A)$ iff $\det(A - \lambda I) = 0$.

Basic definitions and properties (cont.)
* An eigenvalue is a root of the characteristic polynomial $p_A(\lambda) = \det(A - \lambda I)$.
* So there are $n$ eigenvalues (counted with their multiplicities).
* The multiplicities of these eigenvalues as roots of $p_A$ are called algebraic multiplicities.
* The geometric multiplicity of an eigenvalue $\lambda_i$ is the number of linearly independent eigenvectors associated with $\lambda_i$.
* Geometric multiplicity is $\le$ algebraic multiplicity.
* An eigenvalue is simple if its (algebraic) multiplicity is one.
* It is semi-simple if its geometric and algebraic multiplicities are equal.

Example: Consider $A$ = [matrix entries not recovered]. What are the eigenvalues of $A$? Their algebraic multiplicities? Their geometric multiplicities? Is one a semi-simple eigenvalue? Same questions if $a_{33}$ is replaced by one.

* Two matrices $A$ and $B$ are similar if there exists a nonsingular matrix $X$ such that $B = XAX^{-1}$.
* Definition: $A$ is diagonalizable if it is similar to a diagonal matrix.
* THEOREM: A matrix is diagonalizable iff it has $n$ linearly independent eigenvectors.
* THEOREM (Schur form): Any matrix is unitarily similar to a triangular matrix, i.e., for any $A$ there exists a unitary matrix $Q$ and an upper triangular matrix $R$ such that $A = QRQ^H$.
* Any Hermitian matrix is unitarily similar to a real diagonal matrix (i.e., its Schur form is real diagonal).

Special case: symmetric / Hermitian matrices
* Consider the Schur form of a real symmetric matrix $A$: $A = QRQ^H$. Since $A^H = A$, then $R = R^H$, so the eigenvalues of $A$ are real.
* In addition, $Q$ can be taken to be real when $A$ is real: $(A - \lambda I)(u + iv) = 0$ implies $(A - \lambda I)u = 0$ and $(A - \lambda I)v = 0$, so eigenvectors can be selected to be real.
* There is an orthonormal basis of eigenvectors of $A$.

The min-max theorem
Label eigenvalues decreasingly: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. The eigenvalues of a Hermitian matrix $A$ are characterized by the relation
$$\lambda_k = \max_{S,\ \dim(S)=k}\ \min_{x \in S,\ x \ne 0} \frac{(Ax, x)}{(x, x)}$$
Consequence:
$$\lambda_1 = \max_{x \ne 0} \frac{(Ax, x)}{(x, x)}, \qquad \lambda_n = \min_{x \ne 0} \frac{(Ax, x)}{(x, x)}$$

The Law of inertia
* A matrix $A$ with $m$ negative, $z$ zero, and $p$ positive eigenvalues has inertia $[m, z, p]$.
* Sylvester's Law of inertia: If $X$ is an $n \times n$ nonsingular matrix, then $A$ and $X^T A X$ have the same inertia.
Example: Suppose that $A = LDL^T$ where $L$ is unit lower triangular and $D$ is diagonal. How many negative eigenvalues does $A$ have?
Example: Assume that $A$ is tridiagonal. How many operations are required to determine the number of negative eigenvalues of $A$?
Example: Devise an algorithm based on the inertia theorem to compute the $i$-th eigenvalue of a tridiagonal matrix.

Perturbation analysis
General questions: If $A$ is perturbed, how does an eigenvalue change? How about an eigenvector? Also: sensitivity of an eigenvalue to perturbations.
THEOREM [Gerschgorin]:
$$\forall \lambda \in \Lambda(A),\ \exists\, i \ \text{such that}\ |\lambda - a_{ii}| \le \sum_{j=1,\ j \ne i}^{n} |a_{ij}|.$$
In words: an eigenvalue $\lambda$ of $A$ is located in one of the closed discs $D(a_{ii}, \rho_i)$ with $\rho_i = \sum_{j \ne i} |a_{ij}|$.
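The Gerschgorin theorem above is easy to check numerically. The following is a minimal sketch, assuming NumPy; the helper name `gershgorin_discs` and the test matrix are illustrative choices, not taken from the slides.

```python
import numpy as np

def gershgorin_discs(A):
    """Return (centers, radii) of the row Gerschgorin discs of a square matrix."""
    A = np.asarray(A, dtype=complex)
    centers = np.diag(A)
    # Radius = sum of moduli of the off-diagonal entries in each row.
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return centers, radii

# Hypothetical test matrix (not the one used in the course).
A = np.array([[4.0, 1.0, 0.0],
              [0.5, -2.0, 0.5],
              [0.0, 1.0, 1.0]])
centers, radii = gershgorin_discs(A)
eigs = np.linalg.eigvals(A)
# Each eigenvalue should lie in at least one closed disc D(a_ii, rho_i).
in_some_disc = [bool(np.any(np.abs(lam - centers) <= radii + 1e-12)) for lam in eigs]
```

Applying the same function to `A.conj().T` gives the column version of the theorem.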

Gerschgorin's theorem - example
Find a region of the complex plane where the eigenvalues of the following matrix are located: $A$ = [matrix entries not recovered].
* Refinement: if the discs are all disjoint, then each of them contains one eigenvalue.
* Refinement: one can combine the row and column versions of the theorem (the column version is obtained by applying the theorem to $A^H$).

Bauer-Fike theorem
THEOREM [Bauer-Fike]: Let $\tilde\lambda, \tilde u$ be an approximate eigenpair with $\|\tilde u\|_2 = 1$, and let $r = A\tilde u - \tilde\lambda \tilde u$ (the "residual vector"). Assume $A$ is diagonalizable: $A = XDX^{-1}$, with $D$ diagonal. Then
$$\exists\, \lambda \in \Lambda(A) \ \text{such that}\ |\lambda - \tilde\lambda| \le \mathrm{cond}_2(X)\, \|r\|_2.$$
Very restrictive result, and also not too sharp in general.
Alternative formulation: if $E$ is a perturbation to $A$, then for any eigenvalue $\tilde\lambda$ of $A + E$ there is an eigenvalue $\lambda$ of $A$ such that
$$|\lambda - \tilde\lambda| \le \mathrm{cond}_2(X)\, \|E\|_2.$$
Prove this result from the previous one.

Conditioning of Eigenvalues
Assume that $\lambda$ is a simple eigenvalue with right and left eigenvectors $u$ and $w^H$ respectively. Consider the matrices $A(t) = A + tE$, with eigenvalue $\lambda(t)$ and eigenvector $u(t)$.
The conditioning of $\lambda$ of $A$ relative to $E$ is $d\lambda(t)/dt$ at $t = 0$.
Write $A(t)u(t) = \lambda(t)u(t)$, then multiply both sides on the left by $w^H$:
$$w^H (A + tE) u(t) = \lambda(t)\, w^H u(t)$$
$$\lambda(t)\, w^H u(t) = w^H A u(t) + t\, w^H E u(t) = \lambda\, w^H u(t) + t\, w^H E u(t).$$
Hence
$$\frac{\lambda(t) - \lambda}{t}\, w^H u(t) = w^H E u(t).$$
Take the limit at $t = 0$:
$$\lambda'(0) = \frac{w^H E u}{w^H u}.$$
Note: the left and right eigenvectors associated with a simple eigenvalue cannot be orthogonal to each other.
The actual conditioning of an eigenvalue, given a perturbation in the direction of $E$, is the modulus of the above quantity. In practice, one only has an estimate of $\|E\|$ for some norm:
$$|\lambda'(0)| \le \frac{\|Eu\|_2 \|w\|_2}{|(u, w)|} \le \|E\|_2\, \frac{\|u\|_2 \|w\|_2}{|(u, w)|}.$$
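The derivative formula $\lambda'(0) = w^H E u / (w^H u)$ can be compared against a finite difference. This is a small sketch assuming NumPy; the matrix `A`, the perturbation direction `E`, and the step `t` are hypothetical choices made only for illustration.

```python
import numpy as np

# Hypothetical nonnormal matrix with simple eigenvalues 1, 2, 3.
A = np.array([[1.0, 10.0, 0.0],
              [0.0, 2.0, 10.0],
              [0.0, 0.0, 3.0]])
E = np.ones((3, 3))          # perturbation direction (an assumption)
lam = 1.0

# Right eigenvector u of A and left eigenvector w (eigenvector of A^H).
vals, U = np.linalg.eig(A)
u = U[:, np.argmin(np.abs(vals - lam))]
vals_left, W = np.linalg.eig(A.conj().T)
w = W[:, np.argmin(np.abs(vals_left - np.conj(lam)))]

wu = np.vdot(w, u)                       # w^H u (vdot conjugates its first argument)
dlam = np.vdot(w, E @ u) / wu            # lambda'(0) = w^H E u / (w^H u)
cond_lam = np.linalg.norm(u) * np.linalg.norm(w) / abs(wu)

# Finite-difference estimate of d(lambda)/dt at t = 0.
t = 1e-8
vals_t = np.linalg.eigvals(A + t * E)
lam_t = vals_t[np.argmin(np.abs(vals_t - lam))]
fd = (lam_t - lam) / t
```

For this matrix the left eigenvector is proportional to $(1, -10, 50)$, so `cond_lam` is 51 and `dlam` is 41; the finite difference agrees to a few digits.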

Definition. The condition number of a simple eigenvalue $\lambda$ of an arbitrary matrix $A$ is defined by
$$\mathrm{cond}(\lambda) = \frac{1}{\cos\theta(u, w)}$$
in which $u$ and $w^H$ are the right and left eigenvectors, respectively, associated with $\lambda$.
Example: Consider the matrix $A$ = [matrix entries not recovered], with $\Lambda(A) = \{1, 2, 3\}$. Right and left eigenvectors associated with $\lambda_1 = 1$: $u$ = [not recovered] and $w$ = [not recovered]. So: cond($\lambda_1$) [value not recovered]. Perturbing $a_{11}$ to [value not recovered] yields the spectrum {0.2287, , }, as expected.
For Hermitian (also normal) matrices every simple eigenvalue is well-conditioned, since cond($\lambda$) = 1.

The power method
Basic idea is to generate the sequence of vectors $A^k v_0$, where $v_0 \ne 0$, then normalize. Most commonly used normalization: ensure that the largest component of the approximation is equal to one.
ALGORITHM 1: The Power Method
1. Choose a nonzero initial vector $v^{(0)}$.
2. For $k = 1, 2, \ldots$, until convergence, Do:
3.    $v^{(k)} = \frac{1}{\alpha_k} A v^{(k-1)}$ where
4.    $\alpha_k = \mathrm{argmax}_{i=1,\ldots,n} |(A v^{(k-1)})_i|$
5. EndDo
Here $\mathrm{argmax}_{i=1,\ldots,n} |x_i|$ denotes the component $x_i$ with largest modulus.

Convergence of the power method
THEOREM: Assume that there is one and only one eigenvalue $\lambda_1$ of $A$ of largest modulus and that $\lambda_1$ is semi-simple. Then either the initial vector $v_0$ has no component in the invariant subspace associated with $\lambda_1$, or the sequence of vectors generated by the algorithm converges to an eigenvector associated with $\lambda_1$ and $\alpha_k$ converges to $\lambda_1$.
Proof in the diagonalizable case. $v_k$ is the vector $A^k v_0$ normalized by a certain scalar $\hat\alpha_k$ in such a way that its largest component is 1. Decompose the initial vector $v_0$ as
$$v_0 = \sum_{i=1}^{n} \gamma_i u_i$$
where the $u_i$'s are the eigenvectors associated with the $\lambda_i$'s, $i = 1, \ldots, n$.

Note that $A^k u_i = \lambda_i^k u_i$, so
$$v_k = \frac{1}{\text{scaling}} \sum_{i=1}^{n} \lambda_i^k \gamma_i u_i = \frac{1}{\text{scaling}} \left[ \lambda_1^k \gamma_1 u_1 + \sum_{i=2}^{n} \lambda_i^k \gamma_i u_i \right] = \frac{1}{\text{scaling}'} \left[ u_1 + \sum_{i=2}^{n} \left(\frac{\lambda_i}{\lambda_1}\right)^k \frac{\gamma_i}{\gamma_1} u_i \right]$$
The second term inside the bracket converges to zero. QED
The proof suggests that the convergence factor is given by
$$\rho_D = \left| \frac{\lambda_2}{\lambda_1} \right|$$
where $\lambda_2$ is the second largest eigenvalue in modulus.

Example: Consider a Markov chain matrix of size $n = 55$. The dominant eigenvalues are $\lambda = 1$ and $\lambda = -1$, so the power method applied directly to $A$ fails. (Why?)
We can consider instead the matrix $I + A$. The eigenvalue $\lambda = 1$ is then transformed into the (only) dominant eigenvalue $\lambda = 2$.
[Table with columns Iteration, Norm of diff., Res. norm, Eigenvalue; values not recovered]

The Shifted Power Method
In the previous example we shifted $A$ into $B = A + I$ before applying the power method. We could also iterate with $B(\sigma) = A + \sigma I$ for any positive $\sigma$.
Example: With $\sigma = 0.1$ we get the following improvement.
[Table with columns Iteration, Norm of diff., Res. norm, Eigenvalue; values not recovered]
Question: What is the best shift-of-origin $\sigma$ to use?
When all eigenvalues are real and such that $\lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n$, then the value of $\sigma$ which yields the best convergence factor for $A + \sigma I$ is
$$\sigma_{opt} = -\frac{\lambda_2 + \lambda_n}{2}$$
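The effect of the shift can be reproduced on a tiny periodic Markov chain. The sketch below assumes NumPy; the function name `power_method`, its parameters, and the 3-state matrix (eigenvalues 1, 0, -1) are illustrative stand-ins, not the 55-state example from the slides.

```python
import numpy as np

def power_method(A, v0, sigma=0.0, tol=1e-10, maxit=10000):
    """Power method on B = A + sigma*I, scaling the largest component to 1.

    Returns (eigenvalue estimate for A, eigenvector estimate, iterations used).
    """
    v = np.asarray(v0, dtype=float)
    alpha = 0.0
    for k in range(1, maxit + 1):
        w = A @ v + sigma * v
        alpha = w[np.argmax(np.abs(w))]      # component of largest modulus
        v_new = w / alpha
        if np.linalg.norm(v_new - v) < tol:
            return alpha - sigma, v_new, k
        v = v_new
    return alpha - sigma, v, maxit

# Hypothetical column-stochastic matrix with eigenvalues 1, 0, -1.
A = np.array([[0.0, 0.5, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
v0 = np.array([1.0, 0.3, 0.2])

lam_plain, _, it_plain = power_method(A, v0, sigma=0.0, maxit=200)  # oscillates
lam_one, v_one, it_one = power_method(A, v0, sigma=1.0)             # B = A + I
lam_opt, v_opt, it_opt = power_method(A, v0, sigma=0.5)             # -(0 + (-1))/2
```

With $\sigma = 1$ the convergence factor is $1/2$, while $\sigma = 0.5$ gives $1/3$, so the second run needs fewer iterations; the unshifted run never meets the tolerance because the iterates alternate between two vectors.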

Inverse Iteration
Observation: The eigenvectors of $A$ and $A^{-1}$ are identical.
* Idea: use the power method on $A^{-1}$. This will compute the eigenvalues closest to zero.
* Shift-and-invert: use the power method on $(A - \sigma I)^{-1}$. This will compute the eigenvalues closest to $\sigma$.
* Advantages: fast convergence in general.
* Drawbacks: need to factor $A$ (or $A - \sigma I$) into LU.

Subspace iteration
Generalizes the power method.
ALGORITHM 2: Orthogonal iteration
1. Start: $Q_0 = [q_1, \ldots, q_m]$
2. Iterate: Until convergence do,
3.    $X := A Q_{k-1}$
4.    $X = Q_k R$ (QR factorization)
5. EndDo
The normalization in step 4 is similar to the scaling used in the power method. Improvement: normalize only once in a while.

ALGORITHM 3: Subspace Iteration with Projection
1. Start: Choose $Q_0 = [q_1, \ldots, q_m]$
2. Iterate: For $k = 1, \ldots$, until convergence do:
3.    Compute $\hat Z = A Q_{k-1}$
4.    $\hat Z = Z R_Z$ (QR factorization)
5.    $B = Z^H A Z$
6.    Compute the Schur factorization $B = Y R Y^H$
7.    $Q_k = ZY$
8. EndDo
Again: no need to orthogonalize + project at each step.
Assume $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_m| > |\lambda_{m+1}| \ge \cdots \ge |\lambda_n|$; then the convergence rate for $\lambda_1$ is (generally) $|\lambda_{m+1} / \lambda_1|$.

The QR algorithm
The most common method for solving small (dense) eigenvalue problems. The basic algorithm:
ALGORITHM 4: QR without shifts
1. Until Convergence Do:
2.    Compute the QR factorization $A = QR$
3.    Set $A := RQ$
4. EndDo
"Until Convergence" means "until $A$ becomes close enough to an upper triangular matrix".
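Algorithm 3 (subspace iteration with projection) can be sketched in a few lines. This version assumes NumPy and a real symmetric matrix, so that the Schur factorization of $B$ reduces to a symmetric eigendecomposition; the function name, the fixed iteration count, and the test matrix are assumptions for illustration.

```python
import numpy as np

def subspace_iteration(A, m, iters=200, seed=0):
    """Subspace iteration with Rayleigh-Ritz projection, real symmetric A."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((A.shape[0], m)))
    theta = np.zeros(m)
    for _ in range(iters):
        Z, _ = np.linalg.qr(A @ Q)       # Z_hat = A Q_{k-1}, then Z_hat = Z R_Z
        B = Z.T @ A @ Z                  # projected m x m matrix
        theta, Y = np.linalg.eigh(B)     # symmetric case: Schur form is diagonal
        Q = Z @ Y
    return theta, Q

# Hypothetical symmetric test matrix with known eigenvalues.
rng = np.random.default_rng(1)
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = V @ np.diag([10.0, 8.0, 5.0, 1.0, 0.5, 0.1]) @ V.T
theta, Q = subspace_iteration(A, m=2)
residual = np.linalg.norm(A @ Q - Q * theta)   # columns scaled by Ritz values
```

Here the two dominant eigenvalues are recovered at the rate $(5/8)^k$ and $(5/10)^k$, consistent with the ratio $|\lambda_{m+1}/\lambda_i|$.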

Note: $A_{new} = RQ = Q^H (QR) Q = Q^H A Q$, so $A_{new}$ is similar to $A$ throughout the algorithm.
The above basic algorithm is never used in practice. Two variations: (1) use a shift of origin and (2) transform $A$ into Hessenberg form.

Practical QR: Shifts of origin
Observation (from theory): the last row converges fastest. Convergence is dictated by $|\lambda_n / \lambda_{n-1}|$.
We will now consider only the real symmetric case.
* Eigenvalues are real.
* $A^{(k)}$ remains symmetric throughout the process.
* As $k$ goes to infinity the last column and row (except $a_{nn}^{(k)}$) converge to zero quickly,
* and $a_{nn}^{(k)}$ converges to the lowest eigenvalue.
$$A^{(k)} = \begin{pmatrix} \cdot & \cdot & \cdot & \cdot & \cdot & a \\ \cdot & \cdot & \cdot & \cdot & \cdot & a \\ \cdot & \cdot & \cdot & \cdot & \cdot & a \\ \cdot & \cdot & \cdot & \cdot & \cdot & a \\ \cdot & \cdot & \cdot & \cdot & \cdot & a \\ a & a & a & a & a & a \end{pmatrix}$$
Idea: Apply the QR algorithm to $A^{(k)} - \mu I$ with $\mu = a_{nn}^{(k)}$. Note: the eigenvalues of $A^{(k)} - \mu I$ are shifted by $\mu$, and the eigenvectors are the same.
ALGORITHM 5: QR with shifts
1. Until the entries $a_{in}$, $1 \le i < n$, converge to zero DO:
2.    Obtain next shift (e.g. $\mu = a_{nn}$)
3.    $A - \mu I = QR$
4.    Set $A := RQ + \mu I$
5. EndDo
Convergence is cubic at the limit! [for the symmetric case]
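A direct transcription of the shifted QR loop, for the real symmetric case, might look as follows. It is a sketch assuming NumPy and the simple shift $\mu = a_{nn}$; the function name, tolerance, and test matrix are hypothetical, and no Hessenberg reduction or deflation is performed.

```python
import numpy as np

def shifted_qr_last_eig(A, tol=1e-12, maxit=500):
    """Shifted QR steps on a real symmetric matrix until the last row decouples.

    Returns (a_nn, final matrix, number of QR steps performed).
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    I = np.eye(n)
    steps = 0
    while steps < maxit and np.linalg.norm(A[n - 1, :n - 1]) >= tol:
        mu = A[n - 1, n - 1]              # shift mu = a_nn
        Q, R = np.linalg.qr(A - mu * I)   # A - mu I = QR
        A = R @ Q + mu * I                # A := RQ + mu I (similar to A)
        steps += 1
    return A[n - 1, n - 1], A, steps

# Hypothetical symmetric tridiagonal test matrix.
A0 = np.array([[4.0, 1.0, 0.0, 0.0],
               [1.0, 3.0, 1.0, 0.0],
               [0.0, 1.0, 2.0, 1.0],
               [0.0, 0.0, 1.0, 1.0]])
lam, Ak, steps = shifted_qr_last_eig(A0)
true_eigs = np.linalg.eigvalsh(A0)
```

With this shift the limit of $a_{nn}$ is an eigenvalue of $A$, though not necessarily the smallest one; production codes use more robust shifts and work on the tridiagonal form.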

Result of algorithm:
$$A^{(k)} = \begin{pmatrix} \cdot & \cdots & \cdot & 0 \\ \vdots & & \vdots & \vdots \\ \cdot & \cdots & \cdot & 0 \\ 0 & \cdots & 0 & \lambda_n \end{pmatrix}$$
Next step: deflate, i.e., apply the above algorithm to the leading $(n-1) \times (n-1)$ block.

Practical QR: Use of the Hessenberg Form
Recall: an upper Hessenberg matrix is such that $a_{ij} = 0$ for $j < i - 1$.
Observation: The QR algorithm preserves Hessenberg form (tridiagonal form in the symmetric case). This results in substantial savings.
1st step: reduce $A$ to Hessenberg form. Then (2nd step) apply the QR algorithm to the resulting matrix.
It is easy to adapt the Householder factorization to reduce a matrix to Hessenberg form [similarity transformation]. Consider the first step only, on a $6 \times 6$ matrix.
We want $H_1 A H_1^T = H_1 A H_1$ to have the form: [sparsity pattern not recovered]
* Choose a $w$ in $H_1 = I - 2ww^T$ so that $(H_1 A)[3:n, 1] = 0$.
* Apply to the left: $B = H_1 A$.
* Then apply to the right: $A_1 = B H_1$.
Observation: the Householder matrix $H_1$, which transforms the column $A(2:n, 1)$ into a multiple of $e_1$, works only on rows 2 to $n$. When applying $H_1^T$ to the right of $B = H_1 A$, only columns 2 to $n$ will be altered, so the 1st column retains the same pattern (zeros below row 2).

QR for Hessenberg matrices
Need the "implicit Q theorem": Suppose that $Q^T A Q$ is an unreduced upper Hessenberg matrix. Then columns 2 to $n$ of $Q$ are determined uniquely (up to signs) by the first column of $Q$.
Implication: In order to compute $A_{i+1} = Q_i^T A Q_i$ we can:
* Compute the first column of $Q_i$ [easy: = scalar $\times\, A(:, 1)$]
* Choose the other columns so that $Q_i$ is unitary and $A_{i+1}$ is Hessenberg.

Example: With $n = 6$: $A$ = [sparsity pattern not recovered]
1. Choose $G_1 = G(1, 2, \theta_1)$ so that $(G_1 A)_{21} = 0$; $A_1 = G_1^T A G_1$ = [pattern not recovered]
2. Choose $G_2 = G(2, 3, \theta_2)$ so that $(G_2 A_1)_{31} = 0$; $A_2 = G_2^T A_1 G_2$ = [pattern not recovered]
3. Choose $G_3 = G(3, 4, \theta_3)$ so that $(G_3 A_2)_{42} = 0$; $A_3 = G_3^T A_2 G_3$ = [pattern not recovered]
4. Choose $G_4 = G(4, 5, \theta_4)$ so that $(G_4 A_3)_{53} = 0$; $A_4 = G_4^T A_3 G_4$ = [pattern not recovered]
This process is known as "bulge chasing". A similar idea applies to the symmetric (tridiagonal) case.

The QR algorithm for symmetric matrices
* Most important method used: reduce to tridiagonal form and apply the QR algorithm with shifts.
* The Householder transformation to Hessenberg form yields a tridiagonal matrix, because $HAH^T = A_1$ is symmetric and also of Hessenberg form, so it is tridiagonal symmetric.
* Tridiagonal form is preserved by the QR similarity transformation.

Practical method
* How to implement the QR algorithm with shifts?
* It is best to use Givens rotations: one can do a shifted QR step without explicitly shifting the matrix.
* Two most popular shifts: $s = a_{nn}$ and $s$ = smallest eigenvalue of $A(n-1:n, n-1:n)$.

THE SINGULAR VALUE DECOMPOSITION
* The SVD: existence, properties
* Pseudo-inverses and the SVD
* Use of the SVD for least-squares problems
* Applications of the SVD

The Singular Value Decomposition (SVD)
For any real $n \times m$ matrix $A$ there exist orthogonal matrices $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{m \times m}$ such that
$$A = U \Sigma V^T$$
where $\Sigma$ is a diagonal matrix with nonnegative diagonal entries
$$\sigma_{11} \ge \sigma_{22} \ge \cdots \ge \sigma_{pp} \ge 0 \quad \text{with } p = \min(m, n).$$
The $\sigma_{ii}$ are called the singular values of $A$, denoted simply by $\sigma_i$.
Proof [one among many!]: Let $\sigma_1 = \|A\|_2 = \max_{x,\ \|x\|_2 = 1} \|Ax\|_2$. There exists a pair of unit vectors $v_1, u_1$ such that $A v_1 = \sigma_1 u_1$.
* Complete $v_1$ into an orthonormal basis of $\mathbb{R}^m$: $V \equiv [v_1, V_2]$, an $m \times m$ unitary matrix.
* Complete $u_1$ into an orthonormal basis of $\mathbb{R}^n$: $U \equiv [u_1, U_2]$, an $n \times n$ unitary matrix.
Then it is easy to show that
$$AV = U \begin{pmatrix} \sigma_1 & w^T \\ 0 & B \end{pmatrix} \quad \Longrightarrow \quad U^T A V = \begin{pmatrix} \sigma_1 & w^T \\ 0 & B \end{pmatrix} \equiv A_1.$$
Observe that
$$\left\| A_1 \begin{pmatrix} \sigma_1 \\ w \end{pmatrix} \right\|_2 \ge \sigma_1^2 + \|w\|^2 = \sqrt{\sigma_1^2 + \|w\|^2}\ \left\| \begin{pmatrix} \sigma_1 \\ w \end{pmatrix} \right\|_2.$$
This shows that $w$ must be zero [why?]. Complete the proof by an induction argument.

Case 1 and Case 2: [figures showing the shapes of the factors in $A = U \Sigma V^T$; not recovered]

The thin SVD
Consider Case 1. It can be rewritten as
$$A = [U_1\ U_2] \begin{pmatrix} \Sigma_1 \\ 0 \end{pmatrix} V^T$$
which gives
$$A = U_1 \Sigma_1 V^T$$
where $U_1$ is $n \times m$ (same shape as $A$), and $\Sigma_1$ and $V$ are $m \times m$. This is referred to as the thin SVD. Important in practice.
Show how to obtain the thin SVD from the QR factorization of $A$ and the SVD of an $m \times m$ matrix.

Some properties
Assume that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ and $\sigma_{r+1} = \cdots = \sigma_p = 0$. Then:
* rank($A$) = $r$ = number of nonzero singular values.
* Ran($A$) = span$\{u_1, u_2, \ldots, u_r\}$
* Null($A$) = span$\{v_{r+1}, v_{r+2}, \ldots, v_m\}$
* The matrix $A$ admits the SVD expansion $A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$.

Properties of the SVD (continued)
* $\|A\|_2 = \sigma_1$ = largest singular value
* $\|A\|_F = \left( \sum_{i=1}^{r} \sigma_i^2 \right)^{1/2}$
* When $A$ is an $n \times n$ nonsingular matrix, then $\|A^{-1}\|_2 = 1/\sigma_n$ = inverse of the smallest singular value.
* Let $k < r$ and $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$; then
$$\min_{\mathrm{rank}(B) = k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}.$$
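The thin SVD and the best rank-$k$ property can be checked directly. A minimal sketch assuming NumPy (`numpy.linalg.svd` with `full_matrices=False` returns the thin factors); the random $6 \times 4$ matrix and the choice $k = 2$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))            # hypothetical n x m matrix, n > m

# Thin SVD: U1 is 6 x 4, s holds sigma_1 >= ... >= sigma_4, Vt is V^T (4 x 4).
U1, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# A_k = sum_{i<=k} sigma_i u_i v_i^T (scale the first k columns of U1 by sigma_i).
A_k = (U1[:, :k] * s[:k]) @ Vt[:k, :]

err2 = np.linalg.norm(A - A_k, 2)          # should equal sigma_{k+1}
norm2 = np.linalg.norm(A, 2)               # should equal sigma_1
normF = np.linalg.norm(A, "fro")           # should equal sqrt(sum sigma_i^2)
```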

Define the $r \times r$ matrix $\Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$.
Let $A \in \mathbb{R}^{n \times m}$ and consider now $A^T A$ (which is of size $m \times m$):
$$A^T A = V \Sigma^T \Sigma V^T \quad \Longrightarrow \quad A^T A = V \underbrace{\Sigma^T \Sigma}_{m \times m} V^T.$$
This gives the spectral decomposition of $A^T A$. Similarly, $U$ gives the eigenvectors of $AA^T$:
$$AA^T = U \underbrace{\Sigma \Sigma^T}_{n \times n} U^T.$$
Important: $A^T A = V D_1 V^T$ and $AA^T = U D_2 U^T$ give the SVD factors $U$, $V$ up to signs!

Exercises
* Compute the singular value decomposition of the matrix $A$ = [matrix entries not recovered].
* Find the matrix $B$ of rank 1 which is the closest to the above matrix in the 2-norm sense.
* What is the pseudo-inverse of $A$? What is the pseudo-inverse of $B$?
* Find the vector $x$ of smallest norm which minimizes $\|b - Ax\|_2$ with $b = (1, 1)^T$.
* Find the vector $x$ of smallest norm which minimizes $\|b - Bx\|_2$ with $b = (1, 1)^T$.

Pseudo-inverse of an arbitrary matrix
The pseudo-inverse of $A$ is given by
$$A^\dagger = V \begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T.$$
Moore-Penrose conditions: the pseudo-inverse $X$ of a matrix $A$ is uniquely determined by these four conditions:
(1) $AXA = A$  (2) $XAX = X$  (3) $(AX)^H = AX$  (4) $(XA)^H = XA$
In the full-rank overdetermined case, $A^\dagger = (A^T A)^{-1} A^T$.

Least-squares problems and the SVD
The SVD can give much information about solving overdetermined and underdetermined linear systems.
Let $A$ be an $n \times m$ matrix and $A = U \Sigma V^T$ its SVD, with $r = \mathrm{rank}(A)$, $V = [v_1, \ldots, v_m]$, $U = [u_1, \ldots, u_n]$. Then
$$x_{LS} = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i$$
minimizes $\|b - Ax\|_2$ and has the smallest 2-norm among all possible minimizers. In addition,
$$\rho_{LS} \equiv \|b - A x_{LS}\|_2 = \|z\|_2 \quad \text{with } z = [u_{r+1}, \ldots, u_n]^T b.$$

Least-squares problems and pseudo-inverses
A restatement of the first part of the previous result: consider the general linear least-squares problem
$$\min_{x \in S} \|x\|_2, \qquad S = \{x \in \mathbb{R}^m \mid \|b - Ax\|_2 \ \text{min}\}.$$
This problem always has a unique solution given by $x = A^\dagger b$.

Ill-conditioned systems and the SVD
Let $A$ be $n \times n$ (square matrix) and $A = U \Sigma V^T$ its SVD. The solution of $Ax = b$ is
$$x = A^{-1} b = \sum_{i=1}^{n} \frac{u_i^T b}{\sigma_i} v_i.$$
When $A$ is very ill-conditioned, it may have many small singular values. The division by these small $\sigma_i$'s will amplify any noise in the data. Result: the solution may be meaningless.
Remedy: use regularization, i.e., truncate the SVD by only keeping the $\sigma_i$'s that are larger than a threshold $\tau$. This gives the truncated SVD solution (SVD regularization):
$$x_{TSVD} = \sum_{\sigma_i \ge \tau} \frac{u_i^T b}{\sigma_i} v_i.$$
Many applications [e.g., image processing, ...]

Numerical rank and the SVD
Assume that the original matrix $A$ is exactly of rank $k$. The computed SVD of $A$ will be the SVD of a nearby matrix $A + E$. It is easy to show that
$$|\hat\sigma_i - \sigma_i| \le \alpha\, \sigma_1\, \mathrm{eps}.$$
Result: zero singular values will yield small computed singular values.
Determining the "numerical rank": treat singular values below a certain threshold $\delta$ as zero. Practical problem: need to set $\delta$.

LARGE SPARSE EIGENVALUE PROBLEMS
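The truncated SVD solution above can be sketched as follows, assuming NumPy. The function name `tsvd_solve`, the threshold, and the deliberately ill-conditioned $2 \times 2$ system are illustrative assumptions.

```python
import numpy as np

def tsvd_solve(A, b, tau):
    """Truncated SVD solution: sum over sigma_i >= tau of (u_i^T b / sigma_i) v_i."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s >= tau                       # discard singular values below the threshold
    coeffs = (U[:, keep].T @ b) / s[keep]
    return Vt[keep].T @ coeffs

# Hypothetical ill-conditioned system: the second component of b is pure noise.
A = np.array([[2.0, 0.0],
              [0.0, 1e-12]])
b = np.array([2.0, 1e-6])

x_naive = np.linalg.solve(A, b)           # noise amplified by 1 / 1e-12
x_tsvd = tsvd_solve(A, b, tau=1e-8)       # small singular value dropped
```

Choosing $\tau$ is the practical difficulty: with `tau=0.0` the function reproduces the plain solve for a nonsingular matrix, amplified noise included.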

General Tools for Solving Large Eigen-Problems
* Projection techniques: Arnoldi, Lanczos, Subspace Iteration;
* Preconditionings: shift-and-invert, polynomials, ...
* Deflation and restarting techniques
Good computational codes combine these three ingredients.

A few popular solution methods
* Subspace Iteration [now less popular; sometimes used for validation]
* Arnoldi's method (or Lanczos) with polynomial acceleration [Stiefel 58, Rutishauser 62, YS 84, 85, Sorensen 89, ...]
* Shift-and-invert and other preconditioners. [Use Arnoldi or Lanczos for $(A - \sigma I)^{-1}$.]
* Davidson's method and variants, Generalized Davidson's method [Morgan and Scott, 89], Jacobi-Davidson
* Emerging method: Automatic Multilevel Substructuring (AMLS).

Projection Methods for Eigenvalue Problems
General formulation: projection method onto $K$ orthogonal to $L$.
* Given: two subspaces $K$ and $L$ of the same dimension.
* Find: $\tilde\lambda, \tilde u$ such that $\tilde\lambda \in \mathbb{C}$, $\tilde u \in K$; $(\tilde\lambda I - A)\tilde u \perp L$.
Two types of methods:
* Orthogonal projection methods: situation when $L = K$.
* Oblique projection methods: when $L \ne K$.

Rayleigh-Ritz projection
Given: a subspace $X$ known to contain good approximations to eigenvectors of $A$.
Question: How to extract good approximations to eigenvalues/eigenvectors from this subspace?
Answer: the Rayleigh-Ritz process. Let $Q = [q_1, \ldots, q_m]$ be an orthonormal basis of $X$. Then write an approximation in the form $\tilde u = Qy$ and obtain $y$ by writing
$$Q^H (A - \tilde\lambda I)\tilde u = 0 \quad \Longrightarrow \quad Q^H A Q y = \tilde\lambda y.$$

Procedure:
1. Obtain an orthonormal basis $Q$ of $X$
2. Compute $C = Q^H A Q$ (an $m \times m$ matrix)
3. Obtain the Schur factorization of $C$, $C = Y R Y^H$
4. Compute $\tilde U = QY$
Property: if $X$ is (exactly) invariant, then the procedure will yield exact eigenvalues and eigenvectors.
Proof: Since $X$ is invariant, $(A - \tilde\lambda I)\tilde u = Qz$ for a certain $z$. Then $Q^H Q z = 0$ implies $z = 0$ and therefore $(A - \tilde\lambda I)\tilde u = 0$.
This procedure can be used in conjunction with the subspace obtained from the subspace iteration algorithm.

Subspace Iteration
Original idea: projection technique onto a subspace of the form $Y = A^k X$.
In practice: replace $A^k$ by a suitable polynomial [Chebyshev].
* Advantages: easy to implement (in the symmetric case); easy to analyze.
* Disadvantage: slow.
Often used with polynomial acceleration: $A^k X$ replaced by $C_k(A) X$. Typically $C_k$ = Chebyshev polynomial.

Algorithm: Subspace Iteration with Projection
1. Start: Choose an initial system of vectors $X = [x_1, \ldots, x_m]$ and an initial polynomial $C_k$.
2. Iterate: Until convergence do:
   (a) Compute $\hat Z = C_k(A) X_{old}$.
   (b) Orthonormalize $\hat Z$ into $Z$.
   (c) Compute $B = Z^H A Z$ and use the QR algorithm to compute the Schur vectors $Y = [y_1, \ldots, y_m]$ of $B$.
   (d) Compute $X_{new} = ZY$.
   (e) Test for convergence. If satisfied stop. Else select a new polynomial $C_k$ and continue.

THEOREM: Let $S_0 = \mathrm{span}\{x_1, x_2, \ldots, x_m\}$ and assume that $S_0$ is such that the vectors $\{P x_i\}_{i=1,\ldots,m}$ are linearly independent, where $P$ is the spectral projector associated with $\lambda_1, \ldots, \lambda_m$. Let $P_k$ be the orthogonal projector onto the subspace $S_k = \mathrm{span}\{X_k\}$. Then for each eigenvector $u_i$ of $A$, $i = 1, \ldots, m$, there exists a unique vector $s_i$ in the subspace $S_0$ such that $P s_i = u_i$. Moreover, the following inequality is satisfied:
$$\|(I - P_k) u_i\|_2 \le \|u_i - s_i\|_2 \left( \left| \frac{\lambda_{m+1}}{\lambda_i} \right| + \epsilon_k \right)^k, \qquad (1)$$
where $\epsilon_k$ tends to zero as $k$ tends to infinity.

KRYLOV SUBSPACE METHODS
Principle: projection methods on Krylov subspaces, i.e., on
$$K_m(A, v_1) = \mathrm{span}\{v_1, Av_1, \ldots, A^{m-1} v_1\}$$
* probably the most important class of projection methods [for linear systems and for eigenvalue problems]
* many variants exist depending on the subspace $L$.
Properties of $K_m$. Let $\mu$ = degree of the minimal polynomial of $v$. Then:
* $K_m = \{p(A)v \mid p = \text{polynomial of degree} \le m - 1\}$
* $K_m = K_\mu$ for all $m \ge \mu$. Moreover, $K_\mu$ is invariant under $A$.
* $\dim(K_m) = m$ iff $\mu \ge m$.

ARNOLDI'S ALGORITHM
Goal: to compute an orthonormal basis of $K_m$.
Input: initial vector $v_1$, with $\|v_1\|_2 = 1$, and $m$.
ALGORITHM 6: Arnoldi's procedure
For $j = 1, \ldots, m$ do
   Compute $w := A v_j$
   For $i = 1, \ldots, j$, do
      $h_{i,j} := (w, v_i)$
      $w := w - h_{i,j} v_i$
   $h_{j+1,j} = \|w\|_2$; $v_{j+1} = w / h_{j+1,j}$
End

Result of Arnoldi's algorithm
Let $\bar H_m$ be the $(m+1) \times m$ Hessenberg matrix of the $h_{i,j}$, and $H_m$ = $\bar H_m$ without its last row. For $m = 5$:
$$\bar H_m = \begin{pmatrix} x & x & x & x & x \\ x & x & x & x & x \\ & x & x & x & x \\ & & x & x & x \\ & & & x & x \\ & & & & x \end{pmatrix}, \qquad H_m = \begin{pmatrix} x & x & x & x & x \\ x & x & x & x & x \\ & x & x & x & x \\ & & x & x & x \\ & & & x & x \end{pmatrix}$$
1. $V_m = [v_1, v_2, \ldots, v_m]$ is an orthonormal basis of $K_m$.
2. $A V_m = V_{m+1} \bar H_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T$
3. $V_m^T A V_m = H_m$
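Arnoldi's procedure translates almost line by line into code. The sketch below assumes NumPy and real arithmetic; the function name, the breakdown handling, and the random test matrix are illustrative choices.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Arnoldi with modified Gram-Schmidt.

    Returns V (n x (m+1)) and Hbar ((m+1) x m) with A V[:, :m] = V Hbar.
    If h_{j+1,j} = 0 at step j (invariant subspace found), returns the
    truncated square pair V (n x j) and H (j x j) instead.
    """
    n = v1.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = np.dot(w, V[:, i])      # h_ij = (w, v_i)
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:                # breakdown: K_{j+1} is invariant
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))               # hypothetical nonsymmetric matrix
V, Hbar = arnoldi(A, rng.standard_normal(8), m=4)
ritz_values = np.linalg.eigvals(Hbar[:4, :4]) # eigenvalues of H_m
```

The eigenvalues of `Hbar[:m, :m]` are the approximate eigenvalues used on the following slides.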

Application to eigenvalue problems
Write the approximate eigenvector as $\tilde u = V_m y$ and impose the Galerkin condition
$$(A - \tilde\lambda I) V_m y \perp K_m \quad \Longrightarrow \quad V_m^H (A - \tilde\lambda I) V_m y = 0.$$
The approximate eigenvalues are the eigenvalues of $H_m$:
$$H_m y_j = \tilde\lambda_j y_j.$$
The associated approximate eigenvectors are $\tilde u_j = V_m y_j$.
Typically a few of the outermost eigenvalues will converge first.

Restarted Arnoldi
In practice: the memory requirement of the algorithm implies that restarting is necessary.
ALGORITHM 7: Restarted Arnoldi (computes rightmost eigenpair)
1. Start: Choose an initial vector $v_1$ and a dimension $m$.
2. Iterate: Perform $m$ steps of Arnoldi's algorithm.
3. Restart: Compute the approximate eigenvector $u_1^{(m)}$ associated with the rightmost eigenvalue $\lambda_1^{(m)}$.
4. If satisfied stop, else set $v_1 \equiv u_1^{(m)}$ and goto 2.

Example: Small Markov chain matrix [Mark(10), dimension = 55]. Restarted Arnoldi procedure for computing the eigenvector associated with the eigenvalue with algebraically largest real part. We use $m = 10$.
[Table with columns m, Re($\lambda$), Im($\lambda$), Res. Norm; values not recovered]

Restarted Arnoldi (cont.)
Can be generalized to more than *one* eigenvector:
$$v_1^{(new)} = \sum_{i=1}^{p} \rho_i u_i^{(m)}$$
However: this often does not work well (hard to find good coefficients $\rho_i$).
Alternative: compute eigenvectors (actually Schur vectors) one at a time. Implicit deflation.

Hermitian case: The Lanczos Algorithm
The Hessenberg matrix becomes tridiagonal:
$$A = A^H \ \text{and}\ V_m^H A V_m = H_m \quad \Longrightarrow \quad H_m = H_m^H.$$
We can write
$$H_m = \begin{pmatrix} \alpha_1 & \beta_2 & & & \\ \beta_2 & \alpha_2 & \beta_3 & & \\ & \beta_3 & \alpha_3 & \beta_4 & \\ & & \ddots & \ddots & \ddots \\ & & & \beta_m & \alpha_m \end{pmatrix}$$
Consequence: three-term recurrence
$$\beta_{j+1} v_{j+1} = A v_j - \alpha_j v_j - \beta_j v_{j-1} \qquad (2)$$
ALGORITHM 8: Lanczos
1. Choose an initial vector $v_1$ of norm unity. Set $\beta_1 \equiv 0$, $v_0 \equiv 0$
2. For $j = 1, 2, \ldots, m$ Do:
3.    $w_j := A v_j - \beta_j v_{j-1}$
4.    $\alpha_j := (w_j, v_j)$
5.    $w_j := w_j - \alpha_j v_j$
6.    $\beta_{j+1} := \|w_j\|_2$. If $\beta_{j+1} = 0$ then Stop
7.    $v_{j+1} := w_j / \beta_{j+1}$
8. EndDo
Hermitian matrix + Arnoldi = Hermitian Lanczos.
In theory the $v_i$'s defined by the 3-term recurrence are orthogonal. However, in practice there is a severe loss of orthogonality.

Observation [Paige, 1981]: Loss of orthogonality starts suddenly, when the first eigenpair has converged. It is a sign of loss of linear independence of the computed eigenvectors. When orthogonality is lost, several copies of the same eigenvalue start appearing.

Reorthogonalization
* Full reorthogonalization: reorthogonalize $v_{j+1}$ against all previous $v_i$'s every time.
* Partial reorthogonalization: reorthogonalize $v_{j+1}$ against all previous $v_i$'s only when needed [Parlett & Simon]
* Selective reorthogonalization: reorthogonalize $v_{j+1}$ against computed eigenvectors [Parlett & Scott]
* No reorthogonalization: do not reorthogonalize, but take measures to deal with "spurious" eigenvalues. [Cullum & Willoughby]
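The Lanczos recurrence can be sketched as below, assuming NumPy, a real symmetric matrix, and no reorthogonalization; the function name and the random test matrix are illustrative assumptions. For the small number of steps used here the loss of orthogonality described above does not yet show up.

```python
import numpy as np

def lanczos(A, v1, m):
    """Plain Lanczos (no reorthogonalization) for a real symmetric A.

    Returns V (n x k) and the tridiagonal T (k x k), with k = m unless
    beta_{j+1} = 0 stops the process earlier.
    """
    v = v1 / np.linalg.norm(v1)
    v_prev = np.zeros_like(v)
    beta = 0.0                                # beta_1 = 0, v_0 = 0
    alphas, betas, basis = [], [], [v]
    for j in range(m):
        w = A @ v - beta * v_prev             # w_j := A v_j - beta_j v_{j-1}
        alpha = np.dot(w, v)                  # alpha_j := (w_j, v_j)
        w = w - alpha * v
        alphas.append(alpha)
        beta = np.linalg.norm(w)              # beta_{j+1}
        if j == m - 1 or beta == 0.0:
            break
        betas.append(beta)
        v_prev, v = v, w / beta
        basis.append(v)
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return np.column_stack(basis), T

rng = np.random.default_rng(0)
M = rng.standard_normal((10, 10))
A = (M + M.T) / 2                             # hypothetical symmetric matrix
V, T = lanczos(A, rng.standard_normal(10), m=6)
ritz = np.linalg.eigvalsh(T)
```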

LANCZOS BIORTHOGONALIZATION

The Lanczos biorthogonalization ($A^H \ne A$)
ALGORITHM 9: The Lanczos Bi-Orthogonalization Procedure
1. Choose $v_1, w_1$ such that $(v_1, w_1) = 1$. Set $\beta_1 = \delta_1 \equiv 0$, $w_0 = v_0 \equiv 0$
2. For $j = 1, 2, \ldots, m$ Do:
3.    $\alpha_j = (A v_j, w_j)$   [ $\alpha_j = (A v_j - \beta_j v_{j-1}, w_j)$ ]
4.    $\hat v_{j+1} = A v_j - \alpha_j v_j - \beta_j v_{j-1}$   [ $\hat v_{j+1} = (A v_j - \beta_j v_{j-1}) - \alpha_j v_j$ ]
5.    $\hat w_{j+1} = A^H w_j - \bar\alpha_j w_j - \delta_j w_{j-1}$   [ $\hat w_{j+1} = (A^H w_j - \delta_j w_{j-1}) - \bar\alpha_j w_j$ ]
6.    $\delta_{j+1} = |(\hat v_{j+1}, \hat w_{j+1})|^{1/2}$. If $\delta_{j+1} = 0$ Stop
7.    $\beta_{j+1} = (\hat v_{j+1}, \hat w_{j+1}) / \delta_{j+1}$
8.    $w_{j+1} = \hat w_{j+1} / \beta_{j+1}$
9.    $v_{j+1} = \hat v_{j+1} / \delta_{j+1}$
10. EndDo
* Builds a pair of biorthogonal bases for the two subspaces $K_m(A, v_1)$ and $K_m(A^H, w_1)$.
* Many choices for $\delta_{j+1}, \beta_{j+1}$ in lines 6 and 7. Only constraint: $\delta_{j+1} \beta_{j+1} = (\hat v_{j+1}, \hat w_{j+1})$.
Let
$$T_m = \begin{pmatrix} \alpha_1 & \beta_2 & & & \\ \delta_2 & \alpha_2 & \beta_3 & & \\ & \ddots & \ddots & \ddots & \\ & & \delta_{m-1} & \alpha_{m-1} & \beta_m \\ & & & \delta_m & \alpha_m \end{pmatrix}.$$
If the algorithm does not break down before step $m$, then the vectors $v_i$, $i = 1, \ldots, m$, and $w_j$, $j = 1, \ldots, m$, are biorthogonal, i.e.,
$$(v_j, w_i) = \delta_{ij}, \quad 1 \le i, j \le m.$$
Moreover, $\{v_i\}_{i=1,2,\ldots,m}$ is a basis of $K_m(A, v_1)$ and $\{w_i\}_{i=1,2,\ldots,m}$ is a basis of $K_m(A^H, w_1)$, and
$$A V_m = V_m T_m + \delta_{m+1} v_{m+1} e_m^H,$$
$$A^H W_m = W_m T_m^H + \beta_{m+1} w_{m+1} e_m^H,$$
$$W_m^H A V_m = T_m.$$

If $\theta_j$ is an eigenvalue of $T_m$, with associated right and left eigenvectors $y_j$ and $z_j$ respectively, then the corresponding approximations for $A$ are:
   Ritz value: $\theta_j$
   Right Ritz vector: $V_m y_j$
   Left Ritz vector: $W_m z_j$
[Note: terminology is abused slightly; Ritz values and vectors normally refer to Hermitian cases.]

Advantages and disadvantages
Advantages:
* Nice three-term recurrence, which requires little storage in theory.
* Computes left and right eigenvectors at the same time.
Disadvantages:
* Algorithm can break down or nearly break down.
* Convergence not too well understood. Erratic behavior.
* Not easy to take advantage of the tridiagonal form of $T_m$.

Look-ahead Lanczos
The algorithm breaks down when $(\hat v_{j+1}, \hat w_{j+1}) = 0$. Three distinct situations:
* "Lucky breakdown": when either $\hat v_{j+1}$ or $\hat w_{j+1}$ is zero. In this case, the eigenvalues of $T_m$ are eigenvalues of $A$.
* $(\hat v_{j+1}, \hat w_{j+1}) = 0$ but $\hat v_{j+1} \ne 0$, $\hat w_{j+1} \ne 0$: serious breakdown. It is often possible to bypass the step (+ a few more) and continue the algorithm.
* If this is not possible then we get an incurable breakdown. [very rare]
Look-ahead Lanczos algorithms deal with the second case. See Parlett 80, Freund and Nachtigal.
Main idea: when a breakdown occurs, skip the computation of $v_{j+1}, w_{j+1}$ and define $v_{j+2}, w_{j+2}$ from $v_j, w_j$, for example by orthogonalizing $A^2 v_j$. One can define $v_{j+1}$ somewhat arbitrarily as $v_{j+1} = A v_j$. Similarly for $w_{j+1}$.
Drawbacks: (1) the projected problem is no longer tridiagonal; (2) it is difficult to know what constitutes a near-breakdown.

DEFLATION

Deflation
* Very useful in practice.
* Different forms: locking (subspace iteration), selective orthogonalization (Lanczos), Schur deflation, ...

A little background
Consider the Schur canonical form $A = URU^H$, where $U$ is unitary and $R$ is a (complex) upper triangular matrix.
* The column vectors $u_1, \ldots, u_n$ are called Schur vectors.
* Note: Schur vectors depend on each other, and on the order of the eigenvalues.

Wielandt Deflation: Assume we have computed a right eigenpair $\lambda_1, u_1$. Wielandt deflation considers the eigenvalues of
$$A_1 = A - \sigma u_1 v^H, \quad \text{with } v^H u_1 = 1.$$
Note: $\Lambda(A_1) = \{\lambda_1 - \sigma, \lambda_2, \ldots, \lambda_n\}$.
Wielandt deflation preserves $u_1$ as an eigenvector, as well as all the left eigenvectors not associated with $\lambda_1$.
An interesting choice for $v$ is to take simply $v = u_1$. In this case Wielandt deflation preserves Schur vectors as well.
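The spectrum shift produced by Wielandt deflation can be verified on a small example. This is a sketch assuming NumPy; the matrix construction (known eigenvalues 5, 3, 2, 1), the choice $v = u_1$ with $\|u_1\|_2 = 1$, and $\sigma = \lambda_1$ are assumptions for illustration.

```python
import numpy as np

# Hypothetical diagonalizable matrix with known real eigenvalues 5, 3, 2, 1.
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4)) + 3.0 * np.eye(4)
A = X @ np.diag([5.0, 3.0, 2.0, 1.0]) @ np.linalg.inv(X)

# Dominant right eigenpair (lambda_1, u_1), normalized so that u_1^H u_1 = 1.
vals, vecs = np.linalg.eig(A)
i1 = np.argmax(np.abs(vals))
lam1 = vals[i1].real
u1 = vecs[:, i1].real
u1 = u1 / np.linalg.norm(u1)

sigma = lam1                 # shift the dominant eigenvalue to zero
v = u1                       # the choice v = u_1, so that v^H u_1 = 1
A1 = A - sigma * np.outer(u1, v)

spec_A1 = np.sort(np.linalg.eigvals(A1).real)
```

The dominant eigenvalue of `A1` is now 3, so a power-type method applied to `A1` would converge to the second eigenvalue of `A`.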

It is possible to apply this procedure successively:
ALGORITHM 10: Explicit Deflation
1. $A_0 = A$
2. For $j = 0, \ldots, \mu - 1$ Do:
3.    Compute a dominant eigenvector of $A_j$
4.    Define $A_{j+1} = A_j - \sigma_j u_j u_j^H$
5. End
The computed $u_1, u_2, \ldots$ form a set of Schur vectors for $A$.
Alternative: implicit deflation (within a procedure such as Arnoldi).

Deflated Arnoldi: When the first eigenvector converges, we freeze it as the first vector of $V_m = [v_1, v_2, \ldots, v_m]$. Arnoldi starts working at column $v_2$. Orthogonalization is still done against $v_1, \ldots, v_j$ at step $j$. Each new converged eigenvector will be added to the "locked" set of eigenvectors.

For $k = 1, \ldots, NEV$ do: /* Eigenvalue loop */
1. For $j = k, k+1, \ldots, m$ do: /* Arnoldi loop */
      Compute $w := A v_j$.
      Orthonormalize $w$ against $v_1, v_2, \ldots, v_j$ to get $v_{j+1}$
2. Compute the next approximate eigenpair $\tilde\lambda, \tilde u$.
3. Orthonormalize $\tilde u$ against $v_1, \ldots, v_j$. Result = $s$ = approximate Schur vector.
4. Define $v_k := s$.
5. If the approximation is not satisfactory go to 1.
6. Else define $h_{i,k} = (A v_k, v_i)$, $i = 1, \ldots, k$.

Thus, for $k = 2$:
$$V_m = [\ \underbrace{v_1, v_2}_{\text{Locked}},\ \underbrace{v_3, \ldots, v_m}_{\text{active}}\ ], \qquad H_m = \text{[pattern not recovered]}$$
Similar techniques are used in subspace iteration [G. Stewart's SRRIT].

Example: Matrix Mark(10), a small Markov chain matrix (N = 55).

First eigenpair by iterative Arnoldi with m = 10.
[Table with columns: m, Re(λ), Im(λ), Res. Norm. Numerical entries not recovered.]

Computing the next 2 eigenvalues of Mark(10).
[Table with columns: Eig., Mat-Vec's, Re(λ), Im(λ), Res. Norm. Numerical entries not recovered.]

PRECONDITIONING - DAVIDSON'S METHOD

Preconditioning eigenvalue problems

Goal: To extract good approximations to add to a subspace in a projection process. Result: faster convergence.

* Best known technique: Shift-and-invert. Work with $B = (A - \sigma I)^{-1}$.
* Some success with polynomial preconditioning [Chebyshev iteration / least-squares polynomials]. Work with $B = p(A)$.

The above preconditioners preserve eigenvectors. Other methods (Davidson) use a more general preconditioner $M$.

Shift-and-invert preconditioning

Main idea: use Arnoldi, or Lanczos, or subspace iteration for the matrix $B = (A - \sigma I)^{-1}$. The matrix $B$ need not be computed explicitly. Each time we need to apply $B$ to a vector we solve a system with $A - \sigma I$.

Factor $A - \sigma I = LU$. Then each product $x = By$ requires solving $Lz = y$ and $Ux = z$.

How to deal with complex shifts?

If $A$ is complex, we need to work in complex arithmetic.

If $A$ is real, it is desirable that the Arnoldi/Lanczos algorithms work with a real matrix.

Idea: Instead of using $B = (A - \sigma I)^{-1}$, use

$B_+ = \mathrm{Re}\,(A - \sigma I)^{-1} = \frac{1}{2}\left[(A - \sigma I)^{-1} + (A - \bar\sigma I)^{-1}\right]$

or

$B_- = \mathrm{Im}\,(A - \sigma I)^{-1} = \frac{1}{2i}\left[(A - \sigma I)^{-1} - (A - \bar\sigma I)^{-1}\right]$

Little difference between the two.

Result: $B_- = \theta\,(A - \sigma I)^{-1}(A - \bar\sigma I)^{-1}$ with $\theta = \mathrm{Im}(\sigma)$.

Preconditioning by polynomials

Main idea: Iterate with $p(A)$ instead of $A$ in Arnoldi or Lanczos, ...

Used very early on in subspace iteration [Rutishauser, 1959].

Usually not as reliable as shift-and-invert techniques but less demanding in terms of storage.

Question: How to find a good polynomial (dynamically)?

Approaches:
1. Use of Chebyshev polynomials over ellipses
2. Use polynomials based on Leja points
3. Least-squares polynomials over polygons
4. Polynomials from previous Arnoldi decompositions
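As an illustration of the shift-and-invert idea described earlier on this page, here is a minimal sketch that applies $B = (A - \sigma I)^{-1}$ only through linear solves and uses it inside the simplest possible iteration (a power method, i.e. inverse iteration) rather than Arnoldi or Lanczos. The function name, the real symmetric test matrix and the shift are assumptions made for the example. For brevity the sketch calls a dense solver at every step; a practical code would factor $A - \sigma I$ once and reuse the factors.

```python
import numpy as np

def shift_invert_power(A, sigma, iters=200, seed=0):
    """Power method on B = (A - sigma I)^{-1}, with B applied via solves only."""
    n = A.shape[0]
    shifted = A - sigma * np.eye(n)
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = np.linalg.solve(shifted, x)      # y = B x without forming B
        x = y / np.linalg.norm(y)
    theta = x @ np.linalg.solve(shifted, x)  # Rayleigh quotient of B (x has unit norm)
    return sigma + 1.0 / theta, x            # map back: lambda = sigma + 1/theta

# Assumed symmetric test matrix with eigenvalues close to 1, 2, 3, 4, 5.
A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) + 0.01 * np.ones((5, 5))
lam, x = shift_invert_power(A, sigma=2.9)
residual = np.linalg.norm(A @ x - lam * x)
```

The eigenvalue of $A$ closest to $\sigma$ becomes the dominant eigenvalue of $B$, which is why the iteration converges to the eigenvalue near 3 even though it is interior to the spectrum of $A$.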

Polynomial filters and implicit restart

Goal: to apply a polynomial filter of the form

$p(t) = (t - \theta_1)(t - \theta_2) \cdots (t - \theta_q)$

by exploiting the Arnoldi procedure.

Assume $A V_m = V_m H_m + \beta_m v_{m+1} e_m^T$ and consider the first factor, $(t - \theta_1)$:

$(A - \theta_1 I) V_m = V_m (H_m - \theta_1 I) + \beta_m v_{m+1} e_m^T$

Let $H_m - \theta_1 I = Q_1 R_1$. Then,

$(A - \theta_1 I) V_m = V_m Q_1 R_1 + \beta_m v_{m+1} e_m^T$
$(A - \theta_1 I)(V_m Q_1) = (V_m Q_1) R_1 Q_1 + \beta_m v_{m+1} e_m^T Q_1$
$A (V_m Q_1) = (V_m Q_1)(R_1 Q_1 + \theta_1 I) + \beta_m v_{m+1} e_m^T Q_1$

Notation: $R_1 Q_1 + \theta_1 I \equiv H_m^{(1)}$; $(b_{m+1}^{(1)})^T \equiv \beta_m e_m^T Q_1$; $V_m Q_1 \equiv V_m^{(1)}$. Then

$A V_m^{(1)} = V_m^{(1)} H_m^{(1)} + v_{m+1} (b_{m+1}^{(1)})^T$

Note that $H_m^{(1)}$ is upper Hessenberg. Similar to an Arnoldi decomposition.

Observe:
* $R_1 Q_1 + \theta_1 I$ is the matrix resulting from one step of the QR algorithm with shift $\theta_1$ applied to $H_m$.
* The first column of $V_m^{(1)}$ is a multiple of $(A - \theta_1 I) v_1$.
* The columns of $V_m^{(1)}$ are orthonormal.

Can now apply the second shift in the same way:

$(A - \theta_2 I) V_m^{(1)} = V_m^{(1)} (H_m^{(1)} - \theta_2 I) + v_{m+1} (b_{m+1}^{(1)})^T$

Similar process: factor $(H_m^{(1)} - \theta_2 I) = Q_2 R_2$, then multiply by $Q_2$ to the right:

$(A - \theta_2 I) V_m^{(1)} Q_2 = (V_m^{(1)} Q_2)(R_2 Q_2) + v_{m+1} (b_{m+1}^{(1)})^T Q_2$

$A V_m^{(2)} = V_m^{(2)} H_m^{(2)} + v_{m+1} (b_{m+1}^{(2)})^T$

Now: first column of $V_m^{(2)}$ = scalar $\times\ (A - \theta_2 I) v_1^{(1)}$ = scalar $\times\ (A - \theta_2 I)(A - \theta_1 I) v_1$.
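One filter step of the derivation above can be verified numerically. The sketch below is an illustration under assumptions (a random real test matrix, a real shift, and the hypothetical helper names `arnoldi` and `apply_shift`); it checks that the updated relation $A V_m^{(1)} = V_m^{(1)} H_m^{(1)} + v_{m+1} (b_{m+1}^{(1)})^T$ holds and that the first column of $V_m^{(1)}$ is a multiple of $(A - \theta_1 I) v_1$.

```python
import numpy as np

def arnoldi(A, v1, m):
    """m steps of Arnoldi with modified Gram-Schmidt; returns V (n x (m+1)), Hbar ((m+1) x m)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

def apply_shift(V, H, b, theta):
    """One filter step: A V = V H + v_{m+1} b^T  ->  A V1 = V1 H1 + v_{m+1} b1^T."""
    m = H.shape[0]
    Q, R = np.linalg.qr(H - theta * np.eye(m))
    return V @ Q, R @ Q + theta * np.eye(m), Q.T @ b

rng = np.random.default_rng(0)
n, m, theta = 30, 8, 0.3                       # assumed sizes and shift
A = rng.standard_normal((n, n))
Vfull, Hbar = arnoldi(A, rng.standard_normal(n), m)
V, H, v_next = Vfull[:, :m], Hbar[:m, :], Vfull[:, m]
b = np.zeros(m)
b[-1] = Hbar[m, m - 1]                         # b^T = beta_m e_m^T

V1, H1, b1 = apply_shift(V, H, b, theta)
relation_error = np.linalg.norm(A @ V1 - V1 @ H1 - np.outer(v_next, b1))

first = (A - theta * np.eye(n)) @ V[:, 0]
first /= np.linalg.norm(first)
alignment = abs(first @ V1[:, 0])              # 1 when the two vectors are parallel
```

Repeating `apply_shift` with further shifts and then truncating the last columns gives the restarted decomposition described on the next page.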

Note that

$(b_{m+1}^{(2)})^T = \beta_m e_m^T Q_1 Q_2 = [0, 0, \ldots, 0, q_1, q_2, q_3]$

Let $\hat V_{m-2} = [\hat v_1, \ldots, \hat v_{m-2}]$ consist of the first $m-2$ columns of $V_m^{(2)}$ and $\hat H_{m-2}$ = leading principal submatrix of $H_m^{(2)}$. Then

$A \hat V_{m-2} = \hat V_{m-2} \hat H_{m-2} + \hat\beta_{m-1} \hat v_{m-1} e_{m-2}^T$

with

$\hat\beta_{m-1} \hat v_{m-1} \equiv q_1 v_{m+1} + h_{m-1,m-2}^{(2)} v_{m-1}^{(2)}$, $\|\hat v_{m-1}\|_2 = 1$

Result: An Arnoldi process of $m-2$ steps with the initial vector $p(A) v_1$.

In other words: We know how to apply polynomial filtering via a form of the Arnoldi process, combined with the QR algorithm.

The Davidson approach

Goal: to use a more general preconditioner to introduce good new components to the subspace.

* Ideal new vector would be the eigenvector itself!
* Next best thing: an approximation to $(A - \mu I)^{-1} r$ where $r = (A - \mu I) z$ is the current residual.
* Approximation written in the form $M^{-1} r$. Note that $M$ can vary at every step if needed.

ALGORITHM 11: Davidson's method (Real symmetric case)
1. Choose an initial unit vector $v_1$. Set $V_1 = [v_1]$.
2. Until convergence Do:
3. For $j = 1, \ldots, m$ Do:
4. $w := A v_j$.
5. Update $H_j \equiv V_j^T A V_j$
6. Compute the smallest eigenpair $\mu, y$ of $H_j$.
7. $z := V_j y$, $r := Az - \mu z$
8. Test for convergence. If satisfied Return
9. If $j < m$ compute $t := M_j^{-1} r$
10. compute $V_{j+1} := ORTHN([V_j, t])$
11. EndIf
12. Set $v_1 := z$ and go to 3
13. EndDo
14. EndDo

Note: Traditional Davidson uses diagonal preconditioning: $M_j = D - \sigma_j I$. Will work only for some matrices.

Other options:
* Shift-and-invert using ILU [negatives: expensive + hard to parallelize.]
* Filtering (by averaging)
* Filtering by using smoothers (multigrid style)
* Iterative solves [See Jacobi-Davidson]
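Algorithm 11 with the traditional diagonal preconditioner can be sketched as below. This is an illustrative implementation under assumptions, not the course's code: the shift $\sigma_j$ is taken to be the current Ritz value $\mu$, the projected matrix is recomputed from scratch instead of updated, near-zero entries of $D - \mu I$ are guarded, and the test matrix is chosen to be diagonally dominant (the kind of matrix for which diagonal preconditioning tends to work).

```python
import numpy as np

def davidson(A, m=20, tol=1e-8, max_restarts=50):
    """Davidson's method for the smallest eigenpair of a real symmetric A."""
    n = A.shape[0]
    d = np.diag(A)
    v = np.zeros(n)
    v[np.argmin(d)] = 1.0                          # initial unit vector
    for _ in range(max_restarts):
        V = v.reshape(n, 1)
        for j in range(1, m + 1):
            H = V.T @ (A @ V)                      # H_j = V_j^T A V_j (recomputed for brevity)
            evals, evecs = np.linalg.eigh(H)
            mu, y = evals[0], evecs[:, 0]          # smallest eigenpair of H_j
            z = V @ y
            r = A @ z - mu * z
            if np.linalg.norm(r) < tol:
                return mu, z
            if j < m:
                denom = d - mu                     # M_j = D - sigma_j I, sigma_j = mu (assumed)
                denom[np.abs(denom) < 1e-8] = 1e-8 # guard against division by zero
                t = r / denom                      # t = M_j^{-1} r
                t -= V @ (V.T @ t)                 # orthogonalize against V_j ...
                t -= V @ (V.T @ t)                 # ... twice, for numerical safety
                V = np.column_stack([V, t / np.linalg.norm(t)])
        v = z                                      # restart with the current Ritz vector
    return mu, z

# Assumed test problem: strongly diagonally dominant symmetric matrix.
rng = np.random.default_rng(0)
n = 50
E = rng.standard_normal((n, n))
A = np.diag(np.arange(1.0, n + 1.0)) + 0.01 * (E + E.T)
mu, z = davidson(A)
```

When the off-diagonal part is not small relative to the diagonal gaps, this preconditioner loses its effectiveness, which is the motivation for the other options listed above.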

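CONVERGENCE THEORY

The Lanczos Algorithm in the Hermitian Case

Assume eigenvalues sorted increasingly:

$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$

Orthogonal projection method onto $K_m$.

To derive error bounds, use the Courant characterization:

$\tilde\lambda_1 = \min_{u \in K,\ u \ne 0} \frac{(Au, u)}{(u, u)} = \frac{(A\tilde u_1, \tilde u_1)}{(\tilde u_1, \tilde u_1)}$

$\tilde\lambda_j = \min_{u \in K,\ u \ne 0,\ u \perp \tilde u_1, \ldots, \tilde u_{j-1}} \frac{(Au, u)}{(u, u)} = \frac{(A\tilde u_j, \tilde u_j)}{(\tilde u_j, \tilde u_j)}$

Bounds for $\lambda_1$ are easy to find, similar to linear systems.

Ritz values approximate eigenvalues of $A$ inside out:

$\lambda_1 \le \tilde\lambda_1,\ \lambda_2 \le \tilde\lambda_2,\ \ldots,\ \tilde\lambda_{n-1} \le \lambda_{n-1},\ \tilde\lambda_n \le \lambda_n$

A-priori error bounds

Theorem [Kaniel, 1966]:

$0 \le \lambda_1^{(m)} - \lambda_1 \le (\lambda_N - \lambda_1) \left[\frac{\tan \angle(v_1, u_1)}{T_{m-1}(1 + 2\gamma_1)}\right]^2$

where $\gamma_1 = \frac{\lambda_2 - \lambda_1}{\lambda_N - \lambda_2}$, and $\angle(v_1, u_1)$ = acute angle between $v_1$ and $u_1$.

+ results for other eigenvalues. [Kaniel, Paige, YS]

Theorem [YS, 1980]:

$0 \le \lambda_i^{(m)} - \lambda_i \le (\lambda_N - \lambda_1) \left[\kappa_i^{(m)} \frac{\tan \angle(v_1, u_i)}{T_{m-i}(1 + 2\gamma_i)}\right]^2$

where $\gamma_i = \frac{\lambda_{i+1} - \lambda_i}{\lambda_N - \lambda_{i+1}}$, $\kappa_i^{(m)} = \prod_{j<i} \frac{\lambda_j^{(m)} - \lambda_N}{\lambda_j^{(m)} - \lambda_i}$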

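Theory for nonhermitian case

More difficult. No convincing results on global convergence.

Can get a general a-priori / a-posteriori error bound.

Let $P$ be the orthogonal projector onto $K$ and $Q$ be the (oblique) projector onto $K$ and orthogonally to $L$:

$Px \in K,\ x - Px \perp K$
$Qx \in K,\ x - Qx \perp L$

[Figure: a vector $x$ with its orthogonal projection $Px$ onto $K$ and its oblique projection $Qx$ onto $K$, orthogonal to $L$.]

Analysis

The approximate problem amounts to solving

$Q(A\tilde x - \tilde\lambda \tilde x) = 0,\ \tilde x \in K$

or, in operator form, $Q A P \tilde x = \tilde\lambda \tilde x$.

Set $A_m \equiv QAP$.

THEOREM. Let $\gamma = \|Q(A - \lambda I)(I - P)\|_2$. Then the residual norms of the pairs $\lambda, Pu$ and $\lambda, u$ for the linear operator $A_m$ satisfy, respectively,

$\|(A_m - \lambda I) P u\|_2 \le \gamma \|(I - P) u\|_2$

$\|(A_m - \lambda I) u\|_2 \le \sqrt{|\lambda|^2 + \gamma^2}\ \|(I - P) u\|_2$

How to estimate $\|(I - P) u_i\|_2$?

Assume that $A$ is diagonalizable and expand $v_1$ in the eigenbasis:

$v_1 = \sum_{j=1}^{N} \alpha_j u_j$

Assume $\alpha_i \ne 0$, $\|u_j\|_2 = 1$ for all $j$. Then:

$\|(I - P) u_i\|_2 \le \xi_i\, \epsilon_i^{(m)}$

where $\xi_i = \sum_{j \ne i} \frac{|\alpha_j|}{|\alpha_i|}$ and

$\epsilon_i^{(m)} = \min_{p \in P_{m-1},\ p(\lambda_i) = 1}\ \max_{j \ne i} |p(\lambda_j)|$

Particular case $i = 1$

Assume: $\Lambda(A) \setminus \{\lambda_1\}$ is contained in an ellipse $E(c, e, a)$.

[Figure: ellipse in the complex plane (axes Re(z), Im(z)) with center $c$, foci $c - e$ and $c + e$, and major-axis endpoints $c - a$ and $c + a$.]

Then

$\epsilon_1^{(m)} \le \frac{C_{m-1}\left(\frac{a}{e}\right)}{\left|C_{m-1}\left(\frac{\lambda_1 - c}{e}\right)\right|}$

where $C_{m-1}$ = Chebyshev polynomial of degree $m-1$ of the first kind.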

HARMONIC RITZ VALUES

Harmonic Ritz values: Literature

* Morgan 91 [Hermitian case]
* Freund 93 [Non-Hermitian case, Starting point: GMRES]
* Morgan 93 [Nonhermitian case]
* Paige, Parlett, Van der Vorst 95.
* Chapman & Y.S. 95. [use in Deflated GMRES]
* Many publications in the 40s and 50s (Intermediate eigenvalue problems, Lehmann intervals, etc.)

Harmonic Ritz values (continued)

Main idea: take $L = AK$ in the projection process.

In the context of Arnoldi's method. Write $\tilde u = V_m y$; then:

$(A - \tilde\lambda I) V_m y \perp \{A V_m\}$

Using $A V_m = V_{m+1} \bar H_m$:

$\bar H_m^H V_{m+1}^H \left[ V_{m+1} \bar H_m y - \tilde\lambda V_m y \right] = 0$

Notation: $H_m$ = $\bar H_m$ without its last row. Then

$\bar H_m^H \bar H_m y - \tilde\lambda H_m^H y = 0$

or

$\left( H_m^H H_m + h_{m+1,m}^2 e_m e_m^H \right) y = \tilde\lambda H_m^H y$

Remark: Assume $H_m$ is nonsingular and multiply both sides by $H_m^{-H}$. Then the problem is equivalent to

$\left( H_m + z_m e_m^H \right) y = \tilde\lambda y$

with $z_m = h_{m+1,m}^2 H_m^{-H} e_m$.

Modified from $H_m$ only in the last column.

Implementation within Davidson framework

Slight variation to standard Davidson: Introduce $z_i = M_i^{-1} r_i$ to the subspace.

Proceed as in FGMRES: $v_{j+1} = Orthn(A z_j, V_j)$.

From the Gram-Schmidt process:

$A z_j = \sum_{i=1}^{j+1} h_{ij} v_i$

Hence the relation $A Z_m = V_{m+1} \bar H_m$.

Approximation: $\tilde\lambda$, $\tilde u = Z_m y$

Galerkin condition: $r \perp A Z_m$ gives the generalized problem

$\bar H_m^H \bar H_m y = \tilde\lambda\, \bar H_m^H V_{m+1}^H Z_m y$

Davidson's algorithm and two variants

DAVIDSON'S ALGORITHM. 1
Start: select $v_1$.
For $j = 1, \ldots, m$ Do:
  Update $V_j^H A V_j$.
  Compute Ritz pair $\tilde u, \tilde\lambda$
  Compute $r = A\tilde u - \tilde\lambda \tilde u$
  $z = M^{-1} r$
  $v_{j+1} = ORTHN(z, V_j)$
EndDo

DAVIDSON'S ALGORITHM. 2
Start: select $r$.
For $j = 1, \ldots, m$ Do:
  $z = M^{-1} r$
  $v_j = ORTHN(z, V_{j-1})$
  Compute $w = A v_j$ and Update $V_j^H A V_j$.
  Compute Ritz pair $\tilde u, \tilde\lambda$
  Compute $r = A\tilde u - \tilde\lambda \tilde u$
EndDo

Difference: start with a preconditioning operation instead of a matvec. In general minor differences.
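The harmonic Ritz values in the Arnoldi context, $(H_m + z_m e_m^H) y = \tilde\lambda y$ with $z_m = h_{m+1,m}^2 H_m^{-H} e_m$, can be computed in a few lines. The sketch below is illustrative only: the helper names, the shifted random test matrix and the sizes are assumptions. It then checks the defining Petrov-Galerkin condition $(A - \tilde\lambda I) V_m y \perp A V_m$ directly.

```python
import numpy as np

def arnoldi(A, v1, m):
    """m steps of Arnoldi with modified Gram-Schmidt; returns V (n x (m+1)), Hbar ((m+1) x m)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

def harmonic_ritz(Hbar):
    """Eigenpairs of H_m + z_m e_m^H, with z_m = h_{m+1,m}^2 H_m^{-H} e_m."""
    m = Hbar.shape[1]
    H = Hbar[:m, :]                                  # Hbar without its last row
    h = Hbar[m, m - 1]
    e_m = np.zeros(m)
    e_m[-1] = 1.0
    z = h**2 * np.linalg.solve(H.conj().T, e_m)
    G = H + np.outer(z, e_m)                         # differs from H_m in the last column only
    return np.linalg.eig(G)

rng = np.random.default_rng(0)
n, m = 40, 10                                        # assumed sizes
A = rng.standard_normal((n, n)) + 8.0 * np.eye(n)    # assumed test matrix
Vfull, Hbar = arnoldi(A, rng.standard_normal(n), m)
V = Vfull[:, :m]
lams, Y = harmonic_ritz(Hbar)

AV = A @ V
petrov_galerkin = max(
    np.linalg.norm(AV.conj().T @ (AV @ Y[:, i] - lams[i] * (V @ Y[:, i])))
    for i in range(m)
)
```

Forming $H_m^{-H} e_m$ requires $H_m$ to be nonsingular, as assumed in the remark above; when $H_m$ is close to singular, solving the generalized problem directly is the safer route.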

HARMONIC DAVIDSON
Start: select $r$. Set $v_1 = r / \|r\|_2$.
For $j = 1, \ldots, m$ Do:
  $z_j = M^{-1} r$
  Compute $w = A z_j$ and $v_{j+1} = ORTHN(w, V_j, h_{:,j})$;
  Update $G = \bar H_j^H \bar H_j$, and $S = \bar H_j^H V_{j+1}^H Z_j$;
  Compute Ritz pair $\tilde u, \tilde\lambda$: $G y = \tilde\lambda S y$, $\tilde u = Z_j y$
  Compute $r = A\tilde u - \tilde\lambda \tilde u$
EndDo

Arnoldi part identical with that of FGMRES.

Relation with GMRES (Freund 91)

The harmonic Ritz values are the roots of the GMRES polynomial:

$\psi_m = \arg\min_{\psi \in P_m,\ \psi(0) = 1} \|\psi(A) r_0\|_2$

Proof. The GMRES condition is:

$\beta v_1 - A V_m y \perp \{A V_m\}\ \Rightarrow\ \psi_m(A) v_1 \perp \{A V_m\}\ \Rightarrow\ (A - \tilde\lambda_i I) V_m y_i \perp \{A V_m\}$

Same condition as that of the harmonic Ritz projection. $\tilde\lambda_i$ = harmonic Ritz value, $\tilde u_i = V_m y_i$ = harmonic Ritz vector.

Harmonic values and interior eigenvalues

Let $z = A\tilde u$ and rewrite the condition $[A - \tilde\lambda I]\tilde u \perp AK$ as:

$\left[\tilde\lambda^{-1} I - A^{-1}\right] z \perp L,\quad z \in L$

Orthogonal projection method for $A^{-1}$.

Note: This is NOT shift-and-invert in disguise. The space of approximants is the same as for standard projection.

Interesting consequence for the Hermitian case.

Harmonic Ritz projection in the Hermitian Case

Order eigenvalues increasingly:

$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$

Recall: Ritz values approximate eigenvalues of $A$ inside out:

$\lambda_1 \le \tilde\lambda_1,\ \lambda_2 \le \tilde\lambda_2,\ \ldots,\ \tilde\lambda_{n-1} \le \lambda_{n-1},\ \tilde\lambda_n \le \lambda_n$

Define: $K^{(i)} = \{x \in K \mid x \perp \tilde u_1, \tilde u_2, \ldots, \tilde u_{i-1}\}$ ($K^{(1)} \equiv K$). Then

$\tilde\lambda_i = \min_{x \in K^{(i)}} \frac{(Ax, x)}{(x, x)}$

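Apply principle to Harmonic Ritz values

Assume for simplicity that $A$ is SPD.

Define: $K^{(i)} = \{x \in K \mid Ax \perp A\tilde u_1, A\tilde u_2, \ldots, A\tilde u_{i-1}\}$ ($K^{(1)} \equiv K$). Then

$\tilde\lambda_i^{-1} = \max_{x \in K^{(i)}} \frac{(Ax, x)}{(Ax, Ax)}$

$\tilde\lambda_1^{-1} \le \lambda_1^{-1}$; $\tilde\lambda_n^{-1} \ge \lambda_n^{-1}$ $\ \Rightarrow\ $ $\tilde\lambda_1 \ge \lambda_1$; $\tilde\lambda_n \le \lambda_n$

Careful: treat positive and negative eigenvalues separately.

Result: [Paige, Parlett, Van der Vorst 95]

[Figure: number line around 0 showing the negative eigenvalues $\lambda_1^-, \lambda_2^-$ and positive eigenvalues $\lambda_1^+, \lambda_2^+$ together with the corresponding harmonic Ritz values.]

Alternative Projections

Eigenvalue problems are really non-linear systems of equations.

Idea: find $\mu$ such that $(A - \mu I)V$ is nearly rank-deficient.

Leads to $\det\left[V^H (A - \mu I)^H (A - \mu I) V\right] = 0$

Assume $\mu$ is real. Using $A V_m = V_{m+1} \bar H_m$ gives the quadratic problem

$\left( \bar H_m^T \bar H_m - \mu (H_m + H_m^T) + \mu^2 I \right) y = 0$

Alternative formulations

* $\det\left(V^H (A - \mu I) V\right) = 0$: orthogonal projection
* $\det\left((AV)^H (A - \mu I) V\right) = 0$: harmonic projection
* $\sigma_{\min}\left((A - \mu I) V\right) = 0$: SVD projection

[Figure: Markov chain problem, N = 1024, m = 15. log10 of residual versus number of iterations for Restarted Arnoldi, Restarted Harmonic Krylov, and Restarted Krylov SVD.]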

[Figures: log10 of residual versus number of iterations for Restarted Arnoldi, Restarted Harmonic Krylov, and Restarted Krylov SVD. Panel titles: "CONVD(32,32),0.05,.1) m =" and "CONVD(32,32),0.05,0) m = 10".]

JACOBI DAVIDSON

Introduction via Newton's method

Assumptions: $M = A + E$ and $Az \approx \mu z$

Goal: to find an improved eigenpair $(\mu + \eta, z + v)$.

Write $A(z + v) = (\mu + \eta)(z + v)$, neglect second order terms, and rearrange:

$(M - \mu I) v - \eta z = -r$ with $r \equiv (A - \mu I) z$

Unknowns: $\eta$ and $v$.

Underdetermined system. Need one constraint.

Add the condition: $w^H v = 0$ for some vector $w$.

In matrix form:

$\begin{pmatrix} M - \mu I & -z \\ w^H & 0 \end{pmatrix} \begin{pmatrix} v \\ \eta \end{pmatrix} = \begin{pmatrix} -r \\ 0 \end{pmatrix}$

Eliminate $v$ from the second equation:

$\begin{pmatrix} M - \mu I & -z \\ 0 & w^H (M - \mu I)^{-1} z \end{pmatrix} \begin{pmatrix} v \\ \eta \end{pmatrix} = \begin{pmatrix} -r \\ w^H (M - \mu I)^{-1} r \end{pmatrix}$

Solution: [Olsen's method]

$\eta = \frac{w^H (M - \mu I)^{-1} r}{w^H (M - \mu I)^{-1} z}$, $\quad v = -(M - \mu I)^{-1} (r - \eta z)$

When $M = A$, this corresponds to Newton's method for solving

$(A - \lambda I) u = 0$, $w^T u = \text{Constant}$

Note: Another way to characterize the solution is:

$v = -(M - \mu I)^{-1} r + \eta (M - \mu I)^{-1} z$, with $\eta$ such that $w^H v = 0$

Involves the inverse of $(M - \mu I)$. Jacobi-Davidson rewrites the solution using projectors.

Let $P_z$ be a projector in the direction of $z$ which leaves $r$ invariant. It is of the form

$P_z = I - \frac{z s^H}{s^H z}$ where $s \perp r$.

Similarly, let $P_w$ be any projector which leaves $v$ unchanged. Then Olsen's solution can be written as

$[P_z (M - \mu I) P_w] v = -r$, $\quad w^H v = 0$

The two solutions are mathematically equivalent.

The Jacobi-Davidson approach

In orthogonal projection methods (e.g. Arnoldi) we have $r \perp z$.

Also it is natural to take $w \equiv z$. Assume $\|z\|_2 = 1$.

With the above assumptions, Olsen's correction equation is mathematically equivalent to finding $v$ such that:

$(I - z z^H)(M - \mu I)(I - z z^H) v = -r$, $\quad v \perp z$

Main attraction: can use an iterative method for the solution of the correction equation. ($M$-solves not explicitly required.)

Automatic Multi-Level Substructuring

Origin: Extension of substructuring for eigenvalue problems. Background: Domain decomposition.

Let $A \in \mathbb{C}^{n \times n}$, Hermitian:

$A = \begin{pmatrix} B & E \\ E^* & C \end{pmatrix}$, $\quad B \in \mathbb{C}^{(n-p) \times (n-p)}$

[Figure: domain split into subdomains $\Omega_1$ and $\Omega_2$ separated by the interface $\Gamma$.]

Note: $B$ is block-diagonal and represents local matrices; $E$ represents coupling; $C$ operates on interface variables.
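Olsen's correction, $\eta = w^H (M - \mu I)^{-1} r / w^H (M - \mu I)^{-1} z$ and $v = -(M - \mu I)^{-1}(r - \eta z)$, can be sketched with two linear solves. The example below is a minimal illustration under assumptions: $M = A$ (the Newton case), $w = z$, a small real symmetric test matrix, and $\mu$ taken as the Rayleigh quotient so that $r \perp z$. The function name `olsen_correction` is hypothetical.

```python
import numpy as np

def olsen_correction(M, mu, z, r, w):
    """Solve (M - mu I) v - eta z = -r together with w^H v = 0."""
    n = M.shape[0]
    shifted = M - mu * np.eye(n)
    x_r = np.linalg.solve(shifted, r)            # (M - mu I)^{-1} r
    x_z = np.linalg.solve(shifted, z)            # (M - mu I)^{-1} z
    eta = np.vdot(w, x_r) / np.vdot(w, x_z)
    v = -(x_r - eta * x_z)
    return eta, v

# Assumed test problem: perturb an exact eigenvector of a symmetric matrix.
rng = np.random.default_rng(0)
n = 6
E = rng.standard_normal((n, n))
A = np.diag(np.arange(1.0, n + 1.0)) + 0.1 * (E + E.T)
_, X = np.linalg.eigh(A)
z = X[:, 0] + 0.01 * rng.standard_normal(n)
z /= np.linalg.norm(z)
mu = z @ A @ z                                   # Rayleigh quotient, so r is orthogonal to z
r = A @ z - mu * z

eta, v = olsen_correction(A, mu, z, r, w=z)

# Jacobi-Davidson form of the same correction equation.
P = np.eye(n) - np.outer(z, z)
jd_residual = np.linalg.norm(P @ (A - mu * np.eye(n)) @ P @ v + r)

# Residual of the corrected vector, using its own Rayleigh quotient.
z_new = (z + v) / np.linalg.norm(z + v)
res_old = np.linalg.norm(r)
res_new = np.linalg.norm(A @ z_new - (z_new @ A @ z_new) * z_new)
```

Here the correction is obtained with dense solves against $M - \mu I$; the point of the Jacobi-Davidson form is that the projected equation can instead be handed to an iterative solver.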

The problem $A\tilde u = \lambda \tilde u$ can be written as:

$\begin{pmatrix} B & E \\ E^* & C \end{pmatrix} \begin{pmatrix} \tilde u_B \\ \tilde u_S \end{pmatrix} = \lambda \begin{pmatrix} \tilde u_B \\ \tilde u_S \end{pmatrix}$

Basic idea of the method for two levels

First step: eliminate the blocks $E$, $E^*$.

$U = \begin{pmatrix} I & -B^{-1} E \\ 0 & I \end{pmatrix}$; $\quad U^* A U = \begin{pmatrix} B & 0 \\ 0 & S \end{pmatrix}$; $\quad S = C - E^* B^{-1} E$.

The original problem is equivalent to

$U^* A U u = \lambda U^* U u \ \Leftrightarrow\ \begin{pmatrix} B & 0 \\ 0 & S \end{pmatrix} u = \lambda \begin{pmatrix} I & -B^{-1} E \\ -E^* B^{-1} & M_S \end{pmatrix} u$

with $M_S = I + E^* B^{-2} E$.

Second step: neglect the coupling in the right-hand side matrix:

$\begin{pmatrix} B & 0 \\ 0 & S \end{pmatrix} u = \lambda \begin{pmatrix} I & 0 \\ 0 & M_S \end{pmatrix} u \ \Rightarrow\ Bv = \mu v,\quad Sw = \eta M_S w$

Compute a few of the smallest eigenvalues of the above problems.

Third step: Build a good subspace to approximate the eigenfunctions of the original problem. For the projection, use a basis of the form

$\hat v_i = \begin{pmatrix} v_i \\ 0 \end{pmatrix},\ i = 1, \ldots, m_B$; $\quad \hat w_j = \begin{pmatrix} 0 \\ w_j \end{pmatrix},\ j = 1, \ldots, m_S$,

where $m_B < (n - p)$ and $m_S < p$.

Then use this subspace for a Rayleigh-Ritz projection applied to

$\begin{pmatrix} B & 0 \\ 0 & S \end{pmatrix} \begin{pmatrix} u_B \\ u_S \end{pmatrix} = \lambda \begin{pmatrix} I & -B^{-1} E \\ -E^* B^{-1} & M_S \end{pmatrix} \begin{pmatrix} u_B \\ u_S \end{pmatrix}$

(Note: not the original problem.)

Final step: exploit recursion.

NOTE: the algorithm does only one shot of descent - ascent (no iterative improvement).

References:
[1] J. K. Bennighof and R. B. Lehoucq, An automated multilevel substructuring method for eigenspace computation in linear elastodynamics, to appear in SIAM J. Sci. Comput. (2003).
[2] R. R. Craig, Jr. and M. C. C. Bampton, Coupling of substructures for dynamic analysis, AIAA Journal, 6 (1968).
[3] W. C. Hurty, Vibrations of structural systems by component-mode synthesis, Journal of the Engineering Mechanics Division, ASCE, 86 (1960).
[4] K. Bekas and Y. Saad, Computation of Smallest Eigenvalues using Spectral Schur Complements, MSI technical report, to appear.

Spectral Schur complements

Can interpret AMLS in terms of Schur complements. Start with

$\begin{pmatrix} B & E \\ E^* & C \end{pmatrix} \begin{pmatrix} u_B \\ u_S \end{pmatrix} = \lambda \begin{pmatrix} u_B \\ u_S \end{pmatrix}$

For $\lambda \notin \Lambda(B)$ define

$S(\lambda) = C - E^* (B - \lambda I)^{-1} E$

When $\lambda \notin \Lambda(B)$, then $\lambda \in \Lambda(A) \Leftrightarrow \lambda \in \Lambda(S(\lambda))$, i.e., iff $S(\lambda) u_S = \lambda u_S$.

Observation: The Schur complement problem solved by AMLS can be viewed as the problem resulting from a first order approximation of $S(\lambda)$ around $\lambda = 0$.

The standard expansion of the resolvent

$(B - \lambda I)^{-1} = B^{-1} \sum_{k=0}^{\infty} (\lambda B^{-1})^k = \sum_{k=0}^{\infty} \lambda^k B^{-k-1}$

around $\lambda = 0$ leads to the series

$S(\lambda) = C - E^* \left( B^{-1} + \lambda B^{-2} + \lambda^2 B^{-3} + \cdots \right) E = S - \sum_{k=1}^{\infty} \lambda^k E^* B^{-k-1} E$

* Zeroth order approximation [shift-and-invert with zero shift]: $S u_S = \lambda u_S$
* First order approximation [AMLS]: $S u_S = \lambda (I + E^* B^{-2} E) u_S$
* Second order approximation [See Bekas and YS 04]: $S u_S = \lambda (I + E^* B^{-2} E + \lambda E^* B^{-3} E) u_S$

Approximating the eigenvectors

Let $\lambda, u_S$ be an eigenpair of the nonlinear eigenvalue problem, i.e., such that $S(\lambda) u_S = \lambda u_S$.

Then $\lambda$ is an eigenvalue of $A$ with associated eigenvector:

$\begin{pmatrix} -(B - \lambda I)^{-1} E u_S \\ u_S \end{pmatrix} = \underbrace{\begin{pmatrix} I & -(B - \lambda I)^{-1} E \\ 0 & I \end{pmatrix}}_{U(\lambda)} \begin{pmatrix} 0 \\ u_S \end{pmatrix}$

When $\lambda$ is small, $U(\lambda) \approx U(0)$.

AMLS approximates the exact prolongator $U(\lambda)$ by $U(0) \equiv U$. It then adds approximate eigenvectors from $B$ to construct a subspace of approximants to perform a projection process.

The space of approximants is spanned by the family of vectors:

$\begin{pmatrix} v_i^B \\ 0 \end{pmatrix}$, $\quad \begin{pmatrix} -B^{-1} E u_j^S \\ u_j^S \end{pmatrix} = U(0) \begin{pmatrix} 0 \\ u_j^S \end{pmatrix}$,

in which the $v_i^B$ are eigenvectors of $B$ associated with the smallest eigenvalues.

Some simple bounds can be obtained for the distance between this space of approximants and the exact eigenvectors of $A$.
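The equivalence $\lambda \in \Lambda(A) \Leftrightarrow S(\lambda) u_S = \lambda u_S$, the prolongation formula, and the first order (AMLS) approximation can all be checked on a small dense example. The sketch below is illustrative only: the block sizes, the diagonal stand-in for the block-diagonal $B$, and the function name `spectral_schur` are assumptions, and the spectrum of $B$ is placed well above the wanted eigenvalue so that the expansion around $\lambda = 0$ is meaningful.

```python
import numpy as np

def spectral_schur(B, E, C, lam):
    """S(lam) = C - E^* (B - lam I)^{-1} E, for lam outside the spectrum of B."""
    nB = B.shape[0]
    return C - E.conj().T @ np.linalg.solve(B - lam * np.eye(nB), E)

# Assumed test data: B stands in for the block-diagonal local matrices.
rng = np.random.default_rng(0)
nB, p = 8, 3
B = np.diag(np.arange(10.0, 10.0 + nB))
E = 0.3 * rng.standard_normal((nB, p))
C = np.diag([1.0, 2.0, 3.0])
A = np.block([[B, E], [E.T, C]])

# Exact smallest eigenpair of A, split into interior and interface parts.
lam_all, X = np.linalg.eigh(A)
lam1, u = lam_all[0], X[:, 0]
u_S = u[nB:]

# lam1 is an eigenvalue of the nonlinear problem S(lam) u_S = lam u_S.
nonlinear_residual = np.linalg.norm(spectral_schur(B, E, C, lam1) @ u_S - lam1 * u_S)

# The interior part is recovered by the prolongator U(lam1).
u_B = -np.linalg.solve(B - lam1 * np.eye(nB), E @ u_S)
prolongation_error = np.linalg.norm(u_B - u[:nB])

# First order (AMLS) approximation: S w = eta (I + E^* B^{-2} E) w.
S0 = spectral_schur(B, E, C, 0.0)
M_S = np.eye(p) + E.T @ np.linalg.solve(B @ B, E)
eta_first = np.min(np.linalg.eigvals(np.linalg.solve(M_S, S0)).real)
first_order_error = abs(eta_first - lam1)
```

The first two quantities vanish up to rounding because they are exact identities, whereas `first_order_error` is only small, reflecting the truncation of the series for $S(\lambda)$ after the linear term.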
