Krylov Space Methods
"Nonstationary" sounds good

Radu Trîmbiţaş
Babeş-Bolyai University
Introduction

- These methods are used both to solve Ax = b and to find eigenvalues of A.
- They assume A is accessible only via a black-box subroutine that returns y = Az (and perhaps y = A^T z if A is nonsymmetric); no direct access to or manipulation of matrix entries is used. Reasons:
  - matrix-vector multiplication is the cheapest operation on sparse matrices;
  - A may not be represented as a matrix at all, but as a subroutine for computing Ax.
- There exists a variety of such methods, depending on the properties of A and the availability of A^T.
Extracting information via matrix-vector multiplication

- Given a vector b and a subroutine for computing Ax, what can we deduce about A?
- Compute y_1 = b, y_2 = Ay_1, ..., y_m = Ay_{m-1} = A^{m-1} y_1, and let K = [y_1, ..., y_m]. Then

  AK = [Ay_1, ..., Ay_{m-1}, Ay_m] = [y_2, ..., y_m, A^m y_1].   (1)

- Assume K is nonsingular and let c = K^{-1} A^m y_1; then AK = K [e_2, e_3, ..., e_m, c] ≡ KC, or

  K^{-1} A K = C = \begin{bmatrix}
  0 & 0 & \cdots & 0 & c_1 \\
  1 & 0 & \cdots & 0 & c_2 \\
  0 & 1 & \ddots & \vdots & \vdots \\
  \vdots & & \ddots & 0 & \vdots \\
  0 & \cdots & 0 & 1 & c_m
  \end{bmatrix}
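The relation C = K^{-1}AK is easy to check numerically. A minimal sketch in Python/NumPy (my own small example, not part of the slides; all names are mine):

```python
import numpy as np

# Build the Krylov matrix K = [b, Ab, ..., A^{m-1} b] for a small
# random A and recover the companion matrix C = K^{-1} A K.
m = 6
rng = np.random.default_rng(0)
A = rng.standard_normal((m, m))
b = rng.standard_normal(m)

K = np.empty((m, m))
K[:, 0] = b
for j in range(1, m):
    K[:, j] = A @ K[:, j - 1]        # y_{j+1} = A y_j

C = np.linalg.solve(K, A @ K)        # C = K^{-1} A K
# C has ones on the subdiagonal, zeros elsewhere except the last
# column, and the same eigenvalues as A (up to roundoff).
print(np.round(C, 3))
```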
Extracting information via matrix-vector multiplication

- C is upper Hessenberg; it is a companion matrix, and its characteristic polynomial is p(x) = x^m − Σ_{i=1}^m c_i x^{i−1}.
- We can in principle find the eigenvalues of A by finding the zeros of p(x).
- This simple form is not very useful:
  - finding c requires m − 1 matrix-vector multiplications by A and the solution of a linear system with K;
  - K is ill-conditioned: the construction is equivalent to the power method, the y_i converge to a dominant eigenvector of A, so the columns of K tend to become more and more parallel.
- We fix this by replacing K with an orthogonal matrix Q such that the leading k columns of K and Q span the same space, called a Krylov subspace.
- We will compute only as many leading columns of Q as needed (for the solution of Ax = b or Ax = λx), very few compared to the matrix size.
Extracting information via matrix-vector multiplication

- Let K = QR be the QR decomposition of K. Then K^{-1}AK = (QR)^{-1} A (QR) = C implies Q^T A Q = R C R^{-1} ≡ H.
- C and H are both upper Hessenberg.
- If A is symmetric, then H = Q^T AQ is tridiagonal.
- We compute the columns of Q one at a time, rather than all of them, using the modified Gram-Schmidt algorithm (instead of Householder reflections); a numerical check of these structure claims follows below.
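A quick check of the Hessenberg/tridiagonal structure (again my own example, nothing here is prescribed by the slides):

```python
import numpy as np

# With K = QR, H = Q^T A Q is upper Hessenberg, and tridiagonal
# when A is symmetric.
m = 6
rng = np.random.default_rng(1)
A = rng.standard_normal((m, m))
A = A + A.T                          # make A symmetric
b = rng.standard_normal(m)
K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(m)])

Q, R = np.linalg.qr(K)
H = Q.T @ A @ Q
print(np.max(np.abs(np.tril(H, -2))))   # ~0: upper Hessenberg (up to roundoff)
print(np.max(np.abs(np.triu(H, 2))))    # ~0: tridiagonal, since A = A^T
```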
Extracting information via matrix-vector multiplication

- Let Q_n be the m × n matrix with the first n columns of Q. From AQ = QH we get AQ_n = Q_{n+1} H̃_n, where H̃_n is the (n+1) × n upper-left block of H:

  A [q_1 ... q_n] = [q_1 ... q_{n+1}] \begin{bmatrix}
  h_{11} & \cdots & h_{1n} \\
  h_{21} & \ddots & \vdots \\
  & \ddots & \vdots \\
  & & h_{n+1,n}
  \end{bmatrix}

- The nth column of AQ_n = Q_{n+1} H̃_n gives Aq_n = h_{1n} q_1 + ... + h_{nn} q_n + h_{n+1,n} q_{n+1}.
- Since the q_i are orthonormal, multiplying both sides by q_j^T gives

  q_j^T A q_n = Σ_{i=1}^{n+1} h_{in} q_j^T q_i = h_{jn},   hence   h_{n+1,n} q_{n+1} = A q_n − Σ_{i=1}^n h_{in} q_i.
The Arnoldi algorithm

Algorithm: Arnoldi iteration for partial reduction to Hessenberg form

  b := arbitrary; q_1 := b/‖b‖
  for n := 1, 2, 3, ... do
      v := A q_n
      for j := 1 to n do
          h_{jn} := q_j^T v
          v := v − h_{jn} q_j
      h_{n+1,n} := ‖v‖
      if h_{n+1,n} = 0 then return
      q_{n+1} := v/h_{n+1,n}

Complexity: n matrix-vector multiplications by A plus O(n^2 m) other work.
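A direct translation of the pseudocode into Python/NumPy (a teaching sketch; the function name and interface are my own):

```python
import numpy as np

def arnoldi(A, b, n):
    """Arnoldi iteration: returns Q (m x (n+1)) with orthonormal columns
    spanning K_{n+1}(A, b) and the (n+1) x n upper Hessenberg H
    with A @ Q[:, :n] == Q @ H up to roundoff."""
    m = b.shape[0]
    Q = np.zeros((m, n + 1))
    H = np.zeros((n + 1, n))
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(n):
        v = A @ Q[:, k]
        for j in range(k + 1):               # modified Gram-Schmidt
            H[j, k] = Q[:, j] @ v
            v = v - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(v)
        if H[k + 1, k] == 0:                 # invariant subspace found
            return Q[:, :k + 1], H[:k + 1, :k]
        Q[:, k + 1] = v / H[k + 1, k]
    return Q, H
```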
The Arnoldi algorithm

- If we stop the algorithm here, what have we learned about A?
- Let Q = [Q_n, Q_u], where Q_n = [q_1, ..., q_n] and Q_u = [q_{n+1}, ..., q_m]; the columns of Q_u except q_{n+1} are unknown. Then

  H = Q^T A Q = [Q_n, Q_u]^T A [Q_n, Q_u]
    = \begin{bmatrix} Q_n^T A Q_n & Q_n^T A Q_u \\ Q_u^T A Q_n & Q_u^T A Q_u \end{bmatrix}
    = \begin{bmatrix} H_n & H_{nu} \\ H_{un} & H_u \end{bmatrix}   (blocks of sizes n and m − n)

- H_n is upper Hessenberg; H_{un} has a single nonzero entry, h_{n+1,n}, in its upper right corner; H_{nu} and H_u are unknown.
Symmetric Matrices and the Lanczos Iteration

- For symmetric A, H_n reduces to a tridiagonal matrix T_n, and q_{n+1} can be computed by a three-term recurrence:

  A q_n = β_{n−1} q_{n−1} + α_n q_n + β_n q_{n+1}

Algorithm: Lanczos iteration

  β_0 := 0; q_0 := 0; b := arbitrary; q_1 := b/‖b‖
  for n := 1, 2, 3, ... do
      v := A q_n   {or A q_n − β_{n−1} q_{n−1} for greater stability}
      α_n := q_n^T v
      v := v − β_{n−1} q_{n−1} − α_n q_n
      β_n := ‖v‖
      if β_n = 0 then return
      q_{n+1} := v/β_n
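The corresponding NumPy sketch (same caveats as for the arnoldi() sketch above; it uses the stabilized variant from the comment in the pseudocode):

```python
import numpy as np

def lanczos(A, b, n):
    """Lanczos iteration for symmetric A: returns Q (m x (n+1)) and the
    diagonals alpha, beta of the tridiagonal T_n."""
    m = b.shape[0]
    Q = np.zeros((m, n + 1))
    alpha = np.zeros(n)
    beta = np.zeros(n)
    Q[:, 0] = b / np.linalg.norm(b)
    q_prev, beta_prev = np.zeros(m), 0.0
    for k in range(n):
        v = A @ Q[:, k] - beta_prev * q_prev   # stabilized variant
        alpha[k] = Q[:, k] @ v
        v = v - alpha[k] * Q[:, k]
        beta[k] = np.linalg.norm(v)
        if beta[k] == 0:                       # invariant subspace found
            return Q[:, :k + 1], alpha[:k + 1], beta[:k]
        Q[:, k + 1] = v / beta[k]
        q_prev, beta_prev = Q[:, k], beta[k]
    return Q, alpha, beta
```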
Symmetric Matrices and the Lanczos Iteration

- If we stop the algorithm here, what have we learned about A? We have

  T = Q^T A Q = [Q_n, Q_u]^T A [Q_n, Q_u]
    = \begin{bmatrix} Q_n^T A Q_n & Q_n^T A Q_u \\ Q_u^T A Q_n & Q_u^T A Q_u \end{bmatrix}
    = \begin{bmatrix} T_n & T_{un}^T \\ T_{un} & T_u \end{bmatrix}   (blocks of sizes n and m − n)

- T_n is tridiagonal; by symmetry the upper right block is T_{un}^T, and T_{un} has a single nonzero entry, β_n.
Krylov subspace

Definition. The Krylov subspace K_n(A, b) is span{b, Ab, A^2 b, ..., A^{n−1} b}. We write K_n instead of K_n(A, b) when A and b are clear from the context.

- H_n (or T_n) is the projection of A onto the Krylov subspace K_n.
- One can show that K_n has dimension n if and only if the Arnoldi or Lanczos algorithm can compute q_n without quitting first.
- We shall solve Ax = b and Ax = λx using only the information computed by n steps of the Arnoldi or Lanczos algorithm.
- We hope n ≪ m, so that the algorithms are efficient.
Krylov subspace

- For eigenvalues: if h_{n+1,n} = 0, then H (or T) is block upper triangular, so the eigenvalues of H_n are eigenvalues of H, and therefore of A.
- To obtain eigenvectors of A we multiply eigenvectors of H_n by Q_n.
- If h_{n+1,n} is merely small, the eigenvalues and eigenvectors of H_n are good approximations to eigenvalues and eigenvectors of A (see the sketch below).
- Roundoff errors cause these algorithms to behave entirely differently from how they would in exact arithmetic. In particular, the Lanczos vectors can lose their orthogonality and become almost linearly dependent. Researchers have learned how to stabilize the algorithm, or have shown that convergence occurs despite the instability.
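A small experiment with Ritz values, the eigenvalues of T_n (my own example, reusing the lanczos() sketch above):

```python
import numpy as np

# The extreme eigenvalues of the tridiagonal T_n approximate the
# extreme eigenvalues of A long before n reaches m.
m = 200
rng = np.random.default_rng(2)
A = rng.standard_normal((m, m))
A = (A + A.T) / 2                     # symmetric test matrix
b = rng.standard_normal(m)

Q, alpha, beta = lanczos(A, b, 30)    # 30 Lanczos steps, m = 200
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
ritz = np.sort(np.linalg.eigvalsh(T))
exact = np.sort(np.linalg.eigvalsh(A))
print(ritz[-1], exact[-1])            # largest Ritz value vs. largest eigenvalue
```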
Solving a system using Krylov subspaces

- We express the solution as the "best" approximation in the Krylov subspace, of the form

  x_n = Σ_{k=1}^n z_k q_k = Q_n z,   where z = [z_1, ..., z_n]^T.

- We have to define "best". There are several natural but different definitions, leading to different algorithms. Let x = A^{-1} b be the exact solution and r_n = b − A x_n the residual. (Library counterparts of these choices are sketched after the list.)
  1. x_n minimizes ‖x_n − x‖_2: we do not have enough information to compute this.
  2. x_n minimizes ‖r_n‖_2: MINRES (minimum residual) for symmetric A, GMRES (generalized minimum residual) for nonsymmetric A.
  3. x_n makes r_n ⊥ K_n, i.e. Q_n^T r_n = 0: SYMMLQ and GMRES.
  4. When A is SPD, it defines a norm ‖r‖_{A^{-1}} = (r^T A^{-1} r)^{1/2}, and the best x_n minimizes ‖r_n‖_{A^{-1}}. This norm is the same as ‖x_n − x‖_A: the conjugate gradient algorithm.
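These choices correspond roughly to standard library solvers. A sketch using SciPy (the solver names are SciPy's; the test matrix is my own):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg, gmres, minres

n = 1000
A = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")  # SPD model problem
b = np.ones(n)

x_cg, _ = cg(A, b)        # choice 4: minimizes ||r_n||_{A^{-1}} = ||x_n - x||_A
x_mr, _ = minres(A, b)    # choice 2, symmetric A: minimizes ||r_n||_2
x_gm, _ = gmres(A, b)     # choice 2, general A: minimizes ||r_n||_2 over K_n
print(np.linalg.norm(A @ x_cg - b))
```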
Minimizing Residuals

- GMRES (Generalized Minimal RESiduals): an iterative method for solving Ax = b.
- Find x_n ∈ K_n that minimizes ‖r_n‖ = ‖b − A x_n‖.
- This is a least squares problem: find a vector c such that ‖A K_n c − b‖ = minimum, where K_n is the m × n Krylov matrix.
- A QR factorization could be used to solve for c, and then x_n = K_n c.
- In practice the columns of K_n are ill-conditioned, so an orthogonal basis, produced by the Arnoldi iteration, is used instead.
Minimal Residual with Orthogonal Basis

- Instead of x_n = K_n c, set x_n = Q_n y, where the orthonormal columns of Q_n span K_n, and solve ‖A Q_n y − b‖ = minimum.
- For the Arnoldi iteration we showed that AQ_n = Q_{n+1} H̃_n: ‖Q_{n+1} H̃_n y − b‖ = minimum.
- Left multiplication by Q_{n+1}^T does not change the norm (since both vectors are in the column space of Q_{n+1}): ‖H̃_n y − Q_{n+1}^T b‖ = minimum.
- Finally, since q_1 = b/‖b‖, it is clear that Q_{n+1}^T b = ‖b‖ e_1: ‖H̃_n y − ‖b‖ e_1‖ = minimum.
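The reduced problem is only (n+1) × n. A sketch of this step (helper name is mine; a production code would instead update a QR factorization of H̃_n with Givens rotations, as noted on the next slide):

```python
import numpy as np

def minimal_residual_step(Q, H, b):
    """Given Q, H from the arnoldi() sketch (H is (n+1) x n), minimize
    || H y - ||b|| e_1 ||_2 and map back to x_n = Q_n y."""
    n = H.shape[1]
    rhs = np.zeros(H.shape[0])
    rhs[0] = np.linalg.norm(b)                 # ||b|| e_1
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return Q[:, :n] @ y                        # x_n = Q_n y
```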
The GMRES Algorithm

High-level description of the algorithm:

Algorithm: GMRES

  q_1 := b/‖b‖
  for n := 1, 2, 3, ... do
      <step n of the Arnoldi iteration>
      find y to minimize ‖H̃_n y − ‖b‖ e_1‖ (= ‖r_n‖)
      x_n := Q_n y

- The residual ‖r_n‖ does not need to be computed explicitly from x_n.
- The least squares problem has Hessenberg structure; solve it with a QR factorization of H̃_n, computed by updating the factorization of H̃_{n−1}.
- Memory and cost grow with n: restart the algorithm by clearing the accumulated data (which might cause the method to stagnate).
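Putting the pieces together, a complete unrestarted sketch (my own code; it re-solves the small least squares problem from scratch at every step rather than updating a QR factorization, so it is for teaching only):

```python
import numpy as np

def gmres_sketch(A, b, nmax, tol=1e-10):
    """Minimal GMRES following the high-level description above."""
    m = b.shape[0]
    Q = np.zeros((m, nmax + 1))
    H = np.zeros((nmax + 1, nmax))
    beta = np.linalg.norm(b)
    Q[:, 0] = b / beta
    for n in range(nmax):
        v = A @ Q[:, n]                       # one Arnoldi step
        for j in range(n + 1):
            H[j, n] = Q[:, j] @ v
            v = v - H[j, n] * Q[:, j]
        H[n + 1, n] = np.linalg.norm(v)
        if H[n + 1, n] > 0:
            Q[:, n + 1] = v / H[n + 1, n]
        rhs = np.zeros(n + 2)
        rhs[0] = beta                         # ||b|| e_1
        y, *_ = np.linalg.lstsq(H[:n + 2, :n + 1], rhs, rcond=None)
        x = Q[:, :n + 1] @ y                  # x_n = Q_n y
        if H[n + 1, n] == 0 or np.linalg.norm(b - A @ x) < tol:
            return x, n + 1
    return x, nmax
```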
Convergence of GMRES

- Two obvious observations, based on the minimization over K_n: GMRES converges monotonically, and it converges after at most m steps:

  ‖r_{n+1}‖ ≤ ‖r_n‖   and   ‖r_m‖ = 0.

- The residual is r_n = p_n(A) b, where p_n ∈ P_n is a polynomial of degree n with p_n(0) = 1, so GMRES also finds a minimizing polynomial: ‖p_n(A) b‖ = minimum.
- Based on this, for diagonalizable A = V Λ V^{-1} GMRES converges as

  ‖r_n‖ / ‖b‖ ≤ κ(V) inf_{p_n ∈ P_n} ‖p_n‖_{Λ(A)},   where ‖p_n‖_{Λ(A)} = max_{λ ∈ Λ(A)} |p_n(λ)|.

- In words: if A has well-conditioned eigenvectors, convergence depends on how small a polynomial p_n with p_n(0) = 1 can be on the spectrum of A.
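This suggests that clustered eigenvalues give fast convergence: a low-degree p_n with p_n(0) = 1 can be tiny on a small cluster away from the origin. A quick illustration with SciPy's gmres (my own example; the callback_type keyword assumes SciPy ≥ 1.4):

```python
import numpy as np
from scipy.sparse import identity, random as sparse_random
from scipy.sparse.linalg import gmres

# Eigenvalues clustered near 2, well away from 0, so GMRES
# should converge in very few steps.
n = 500
A = 2 * identity(n) + 0.1 * sparse_random(n, n, density=0.01, random_state=0)
b = np.ones(n)

res = []
gmres(A, b, callback=lambda rk: res.append(rk), callback_type="pr_norm")
print(res[:8])   # rapid monotone decrease of the residual norm
```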