AMS526: Numerical Analysis I (Numerical Linear Algebra)
Lecture 19: More on Arnoldi Iteration; Lanczos Iteration
Xiangmin Jiao, Stony Brook University
Outline
1. More on Arnoldi Iteration
2. Lanczos Iterations
Review: Arnoldi Iteration
- Arnoldi iteration reduces a general, nonsymmetric matrix $A$ to Hessenberg form by a similarity transformation $A = Q H Q^*$
- Start with a random $q_1$. Let $Q_k = [q_1 \,|\, q_2 \,|\, \cdots \,|\, q_k]$ be the $m \times k$ matrix containing the first $k$ columns of $Q$, and let $\tilde{H}_k$ be the $(k+1) \times k$ upper-left section of $H$, i.e., $\tilde{H}_k = H_{1:k+1,\,1:k}$
- The $k$th column of $A Q_k = Q_{k+1} \tilde{H}_k$ can be written as
  $$A q_k = h_{1k} q_1 + \cdots + h_{kk} q_k + h_{k+1,k} q_{k+1},$$
  where $h_{jk} = q_j^* A q_k$
Review: Arnoldi Algorithm

Algorithm: Arnoldi Iteration
  given random nonzero $b$, let $q_1 = b / \|b\|$
  for $k = 1, 2, 3, \ldots$
      $v = A q_k$
      for $j = 1$ to $k$
          $h_{jk} = q_j^* v$
          $v = v - h_{jk} q_j$
      $h_{k+1,k} = \|v\|$
      $q_{k+1} = v / h_{k+1,k}$
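The algorithm above can be sketched in NumPy as follows. This is a minimal illustration, not a production implementation: the function name and dense-matrix setup are my own, and no breakdown ($h_{k+1,k} = 0$) handling is included.

```python
import numpy as np

def arnoldi(A, b, k):
    """Run k steps of the Arnoldi iteration.

    Returns Q (m x (k+1), orthonormal columns) and the (k+1) x k
    Hessenberg matrix H with A @ Q[:, :k] = Q @ H.
    Assumes no breakdown, i.e., h_{k+1,k} != 0 at every step.
    """
    m = A.shape[0]
    Q = np.zeros((m, k + 1), dtype=A.dtype)
    H = np.zeros((k + 1, k), dtype=A.dtype)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        for i in range(j + 1):              # modified Gram-Schmidt against q_1..q_{j+1}
            H[i, j] = np.vdot(Q[:, i], v)
            v = v - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H
```

After `Q, H = arnoldi(A, b, k)`, the identity $A Q_k = Q_{k+1} \tilde{H}_k$ can be verified as `np.allclose(A @ Q[:, :k], Q @ H)`.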
QR Factorization of Krylov Matrix
- The vectors $q_j$ from the Arnoldi iteration form orthonormal bases of the successive Krylov subspaces
  $$\mathcal{K}_k = \langle b, Ab, \ldots, A^{k-1} b \rangle = \langle q_1, q_2, \ldots, q_k \rangle \subseteq \mathbb{C}^m,$$
  assuming $h_{k+1,k} \neq 0$
- $Q_k$ gives the reduced QR factorization $K_k = Q_k R_k$ of the Krylov matrix $K_k = [b \,|\, Ab \,|\, \cdots \,|\, A^{k-1} b]$
- However, $K_k$ and $R_k$ are not formed explicitly; forming them explicitly would be unstable and could suffer from overflow and underflow
Comparison Against QR Algorithm
- The QR factorization of the Krylov matrix gives an intuitive explanation of the Arnoldi iteration
- This is analogous to using simultaneous iteration to explain the QR algorithm for computing eigenvalues

                            quasi-direct             iterative
  intuitive but unstable    simultaneous iteration   QR of Krylov subspace
  subtle but stable         QR algorithm             Arnoldi
Projection onto Krylov Subspaces
- The Arnoldi process computes projections of $A$ onto successive Krylov subspaces: $H_k = Q_k^* A Q_k$, because $A Q_k = Q_{k+1} \tilde{H}_k$ gives $Q_k^* A Q_k = Q_k^* Q_{k+1} \tilde{H}_k = \tilde{H}_{1:k,\,1:k} = H_k$
- $H_k$ can be interpreted as the orthogonal projection of $A$ onto $\mathcal{K}_k$ expressed in the basis $\{q_1, q_2, \ldots, q_k\}$, restricting the mapping $A: \mathbb{C}^m \to \mathbb{C}^m$ to $H_k: \mathcal{K}_k \to \mathcal{K}_k$. This kind of projection is known as the Rayleigh-Ritz procedure
- Arnoldi iteration is useful as
  1. a basis for iterative algorithms (such as GMRES, to be discussed later)
  2. a technique for estimating eigenvalues of non-Hermitian matrices
- Caution: eigenvalues of nonnormal matrices may have little to do with the underlying physical system, since such eigenvalue problems are ill-conditioned. When such problems arise, the original problem is most likely posed improperly
Estimating Eigenvalues by Arnoldi Iteration
- The diagonal entries of $H_k$ are Rayleigh quotients of $A$ with respect to the vectors $q_i$
- $H_k$ is a generalized Rayleigh quotient with respect to $Q_k$; its eigenvalues $\{\theta_j\}$ are called Arnoldi eigenvalue estimates or Ritz values of $A$ with respect to $\mathcal{K}_k$
- The Ritz vector corresponding to $\theta_j$ is $Q_k y_j$, where $H_k y_j = \theta_j y_j$
- To use the Arnoldi iteration to estimate eigenvalues, compute the eigenvalues of $H_k$ at the $k$th step
- When $k = m$, the Ritz values are eigenvalues of $A$
- In general $k \ll m$, so we can estimate only a few eigenvalues
- Which eigenvalues? Typically, the iteration finds extreme eigenvalues first
- In many applications, the extreme eigenvalues are of main interest:
  - Stability analysis typically requires estimating the spectral radius
  - Principal component analysis requires estimating the largest eigenvalues and corresponding eigenvectors of $A^T A$
Invariance Properties of Arnoldi Iteration

Theorem. Let the Arnoldi iteration be applied to a matrix $A \in \mathbb{C}^{m \times m}$ as described above.
- Translation invariance: If $A$ is changed to $A + \sigma I$ for some $\sigma \in \mathbb{C}$ and $b$ is unchanged, then the Ritz values $\{\theta_j\}$ change to $\{\theta_j + \sigma\}$.
- Scale invariance: If $A$ is changed to $\sigma A$ for some $\sigma \in \mathbb{C}$ and $b$ is unchanged, then $\{\theta_j\}$ change to $\{\sigma \theta_j\}$.
- Invariance under unitary similarity transformation: If $A$ is changed to $U A U^*$ for some unitary matrix $U$ and $b$ is changed to $U b$, then $\{\theta_j\}$ do not change.
In all three cases, the Ritz vectors $Q_k y_j$ corresponding to the eigenvectors $y_j$ of $H_k$ do not change under the indicated transformation.
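Translation invariance, for instance, is easy to check numerically. The sketch below is my own illustration: the helper `ritz_values` simply runs the Arnoldi iteration from earlier and returns the eigenvalues of $H_k$.

```python
import numpy as np

def ritz_values(A, b, k):
    """Run k Arnoldi steps and return the sorted Ritz values, i.e., eig(H_k)."""
    m = A.shape[0]
    Q = np.zeros((m, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ v
            v = v - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        Q[:, j + 1] = v / H[j + 1, j]
    return np.sort_complex(np.linalg.eigvals(H[:k, :k]))

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 10))
b = rng.standard_normal(10)
sigma = 2.5

theta = ritz_values(A, b, 4)
theta_shifted = ritz_values(A + sigma * np.eye(10), b, 4)
# Shifting A by sigma*I shifts every Ritz value by sigma (same b, same Krylov subspace)
print(np.allclose(theta + sigma, theta_shifted))
```

The shifted run generates the same Arnoldi vectors $q_j$, since $\mathcal{K}_k(A + \sigma I, b) = \mathcal{K}_k(A, b)$, so $H_k$ simply becomes $H_k + \sigma I$.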
Convergence of Arnoldi Iteration
- If $A$ has only $n$ distinct eigenvalues, the Arnoldi iteration finds them all in at most $n$ steps
- Under certain circumstances, the convergence of some Arnoldi eigenvalue estimates is geometric (i.e., linear), and it accelerates in later iterations
- However, these matters are not yet fully understood
[Figure: example convergence of an extreme Arnoldi eigenvalue estimate; the error falls from $10^0$ to about $10^{-12}$ within roughly 50 iterations]
Outline
1. More on Arnoldi Iteration
2. Lanczos Iterations
Lanczos Iteration for Symmetric Matrices
- For symmetric $A$, the matrices $H_k$ and $\tilde{H}_k$ in the Arnoldi iteration are tridiagonal; we denote them by $T_k$ and $\tilde{T}_k$, respectively
- Let $\alpha_k = h_{kk}$ and $\beta_k = h_{k+1,k} = h_{k,k+1}$. The relation $A Q_k = Q_{k+1} \tilde{T}_k$ can then be written as the three-term recurrence
  $$A q_k = \beta_{k-1} q_{k-1} + \alpha_k q_k + \beta_k q_{k+1},$$
  where the $\alpha_i$ are the diagonal entries and the $\beta_i$ the subdiagonal entries of $T_k$:
  $$T_n = \begin{pmatrix} \alpha_1 & \beta_1 & & & \\ \beta_1 & \alpha_2 & \beta_2 & & \\ & \beta_2 & \alpha_3 & \ddots & \\ & & \ddots & \ddots & \beta_{n-1} \\ & & & \beta_{n-1} & \alpha_n \end{pmatrix}$$
- The Arnoldi iteration specialized to symmetric matrices is known as the Lanczos iteration
Algorithm of Lanczos Iteration

Algorithm: Lanczos Iteration
  $\beta_0 = 0$, $q_0 = 0$
  given random $b$, let $q_1 = b / \|b\|$
  for $k = 1, 2, 3, \ldots$
      $v = A q_k$
      $\alpha_k = q_k^T v$
      $v = v - \beta_{k-1} q_{k-1} - \alpha_k q_k$
      $\beta_k = \|v\|$
      $q_{k+1} = v / \beta_k$

- Each step consists of a matrix-vector multiplication, an inner product, and a couple of vector operations; this is particularly efficient for sparse matrices
- In practice, the Lanczos iteration is used to compute eigenvalues of large symmetric matrices
- Like the Arnoldi iteration, the Lanczos iteration is useful as
  1. a basis for other iterative algorithms (such as conjugate gradients)
  2. a technique for estimating eigenvalues of Hermitian matrices
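A minimal NumPy sketch of the Lanczos recurrence follows. The function name is my own; like the idealized algorithm above, it performs no re-orthogonalization and no breakdown handling, and it assumes $A$ is exactly symmetric.

```python
import numpy as np

def lanczos(A, b, k):
    """Run k steps of the Lanczos iteration for symmetric A.

    Returns Q (m x (k+1), orthonormal columns), alpha (diagonal of T_k),
    and beta (subdiagonal of T_k; beta[k-1] is the extra entry of
    tilde-T_k). Assumes no breakdown (beta_k != 0 at every step).
    """
    m = A.shape[0]
    Q = np.zeros((m, k + 1))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    Q[:, 0] = b / np.linalg.norm(b)
    q_prev, beta_prev = np.zeros(m), 0.0
    for j in range(k):
        v = A @ Q[:, j]
        alpha[j] = Q[:, j] @ v
        v = v - beta_prev * q_prev - alpha[j] * Q[:, j]   # three-term recurrence
        beta[j] = np.linalg.norm(v)
        Q[:, j + 1] = v / beta[j]
        q_prev, beta_prev = Q[:, j], beta[j]
    return Q, alpha, beta
```

The tridiagonal $T_k$ can then be assembled as `np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)`, and the identity $A Q_k = Q_k T_k + \beta_k q_{k+1} e_k^T$ holds up to rounding.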
Estimating Eigenvalues by Lanczos Iterations
- For symmetric matrices with evenly spaced eigenvalues, the Ritz values tend to converge to the extreme eigenvalues first
[Figure: Ritz values for the first 20 steps of the Lanczos iteration applied to an example 203 × 203 matrix; convergence of the extreme eigenvalues is geometric]
Effect of Rounding Errors
- Rounding errors have complex effects on the Lanczos iteration and on all iterations based on three-term recurrences
- Rounding errors cause loss of orthogonality among $q_1, q_2, \ldots, q_k$
- In the Arnoldi iteration, the vectors $q_1, q_2, \ldots, q_k$ are kept orthogonal by explicit modified Gram-Schmidt orthogonalization, which suffers some, but not as much, loss of orthogonality
- In the Lanczos iteration, orthogonality of $q_j$ to $q_{j-1}$ and $q_{j-2}$ is enforced explicitly, while orthogonality of $q_j$ to $q_{j-3}, \ldots, q_1$ is automatic, based on mathematical identities
- In practice, such mathematical identities are not accurately preserved in the presence of rounding errors
- Therefore, periodic re-orthogonalization of $Q_k$ is sometimes used to alleviate the effect of rounding errors
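One common remedy is full (or periodic) re-orthogonalization: after each three-term update, the new vector is explicitly orthogonalized against all previous Lanczos vectors. A sketch, with names of my own choosing, assuming full re-orthogonalization at every step:

```python
import numpy as np

def lanczos_reorth(A, b, k):
    """Lanczos iteration with full re-orthogonalization: each new vector is
    explicitly orthogonalized against all previous Lanczos vectors.
    Costlier than the plain recurrence, but keeps Q numerically orthonormal.
    """
    m = A.shape[0]
    Q = np.zeros((m, k + 1))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        alpha[j] = Q[:, j] @ v
        v = v - (beta[j - 1] * Q[:, j - 1] if j > 0 else 0.0) - alpha[j] * Q[:, j]
        # re-orthogonalize against q_1, ..., q_{j+1} to scrub rounding drift
        v = v - Q[:, : j + 1] @ (Q[:, : j + 1].T @ v)
        beta[j] = np.linalg.norm(v)
        Q[:, j + 1] = v / beta[j]
    return Q, alpha, beta
```

In exact arithmetic the extra projection subtracts nothing; in floating point it removes the accumulated components along earlier $q_i$ that the three-term recurrence leaves behind.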
Rounding Errors and Ghost Eigenvalues
- With rounding errors, the Lanczos iteration can suffer from loss of orthogonality, which in turn can lead to spurious "ghost" eigenvalues
[Figure: continuation to 120 steps of the Lanczos iteration; numbers indicate multiplicities of the Ritz values; four ghost copies of 3.0 and two ghost copies of 2.5 appear]
Explanation of Ghost Eigenvalues
- Intuitive explanation of ghost eigenvalues:
  - Convergence of a Ritz value annihilates the corresponding eigenvector components in the vector being operated upon
  - With rounding errors, random noise re-introduces and excites those components again
- Consequently, we cannot trust the multiplicities of Ritz values as multiplicities of eigenvalues
- Nevertheless, the Lanczos iteration can still be very useful in practice
- E.g., in PCA for dimension reduction in data analysis, one needs the leading singular values and corresponding singular vectors of $A$. One standard approach is to apply the Lanczos iteration to $A^T A$ or $A A^T$ without forming the product explicitly, and then use the Ritz vectors to approximate the singular vectors
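The last point can be sketched as follows: run Lanczos on the operator $x \mapsto A^T (A x)$, so that $A^T A$ is never formed, and take square roots of the Ritz values as singular-value estimates. The function name and setup are my own illustration, with no breakdown handling or re-orthogonalization.

```python
import numpy as np

def lanczos_singvals(A, k, seed=0):
    """Estimate singular values of A by running k Lanczos steps on the
    operator x -> A.T @ (A @ x), without forming A.T @ A explicitly."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    Q = np.zeros((n, k + 1))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    b = rng.standard_normal(n)
    Q[:, 0] = b / np.linalg.norm(b)
    q_prev, beta_prev = np.zeros(n), 0.0
    for j in range(k):
        v = A.T @ (A @ Q[:, j])          # matvec with A^T A; product never formed
        alpha[j] = Q[:, j] @ v
        v = v - beta_prev * q_prev - alpha[j] * Q[:, j]
        beta[j] = np.linalg.norm(v)
        Q[:, j + 1] = v / beta[j]
        q_prev, beta_prev = Q[:, j], beta[j]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    theta = np.linalg.eigvalsh(T)        # Ritz values approximate eig(A^T A)
    return np.sqrt(np.maximum(theta[::-1], 0.0))  # singular-value estimates, descending
```

The Ritz vectors $Q_k y_j$ of $A^T A$ approximate right singular vectors of $A$; the corresponding left singular vectors can then be recovered as $A v / \sigma$.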