Numerical Analysis Lecture Notes
Peter J. Olver

8. Numerical Computation of Eigenvalues

In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices. Needless to say, we completely avoid trying to solve (or even write down) the characteristic polynomial equation. The very basic power method and its variants, which are based on linear iteration, are used to effectively approximate selected eigenvalues. To determine the complete system of eigenvalues and eigenvectors, the remarkable QR algorithm, which relies on the Gram-Schmidt orthogonalization procedure, is the method of choice, and we shall close with a new proof of its convergence.

8.1. The Power Method

We have already noted the role played by the eigenvalues and eigenvectors in the solution to linear iterative systems. Now we are going to turn the tables, and use the iterative system as a mechanism for approximating the eigenvalues, or, more correctly, selected eigenvalues of the coefficient matrix. The simplest of the resulting computational procedures is known as the power method.

We assume, for simplicity, that A is a complete n x n matrix.(†) Let v_1, ..., v_n denote its eigenvector basis, and λ_1, ..., λ_n the corresponding eigenvalues. As we have learned, the solution to the linear iterative system

    v^(k+1) = A v^(k),    v^(0) = v,        (8.1)

is obtained by multiplying the initial vector v by the successive powers of the coefficient matrix: v^(k) = A^k v. If we write the initial vector in terms of the eigenvector basis

    v = c_1 v_1 + ... + c_n v_n,        (8.2)

then the solution takes the explicit form given in Theorem 7.2, namely

    v^(k) = A^k v = c_1 λ_1^k v_1 + ... + c_n λ_n^k v_n.        (8.3)

(†) This is not a very severe restriction. Most matrices are complete. Moreover, perturbations caused by round-off and/or numerical inaccuracies will almost inevitably make an incomplete matrix complete.

5/18/08    131    © 2008    Peter J. Olver
Suppose further that A has a single dominant real eigenvalue λ_1, that is larger than all the others in magnitude, so

    | λ_1 | > | λ_j |    for all    j > 1.        (8.4)

As its name implies, this eigenvalue will eventually dominate the iteration (8.3). Indeed, since | λ_1 |^k >> | λ_j |^k for all j > 1 and all k >> 0, the first term in the iterative formula (8.3) will eventually be much larger than the rest, and so, provided c_1 ≠ 0,

    v^(k) ≈ c_1 λ_1^k v_1    for    k >> 0.

Therefore, the solution to the iterative system (8.1) will, almost always, end up being a multiple of the dominant eigenvector of the coefficient matrix. To compute the corresponding eigenvalue, we note that the i-th entry of the iterate v^(k) is approximated by v_i^(k) ≈ c_1 λ_1^k v_{1,i}, where v_{1,i} is the i-th entry of the eigenvector v_1. Thus, as long as v_{1,i} ≠ 0, we can recover the dominant eigenvalue by taking a ratio between selected components of successive iterates:

    λ_1 ≈ v_i^(k) / v_i^(k-1),    provided that    v_i^(k-1) ≠ 0.        (8.5)

Example 8.1. Consider the matrix

    A = [ -1   2   2
          -1  -4  -2
          -3   9   7 ].

As you can check, its eigenvalues and eigenvectors are

    λ_1 = 3,  v_1 = ( 1, -1, 3 )^T;    λ_2 = -2,  v_2 = ( 0, -1, 1 )^T;    λ_3 = 1,  v_3 = ( -1, 1, -2 )^T.

Repeatedly multiplying an initial vector v = ( 1, 0, 0 )^T, say, by A results in the iterates v^(k) = A^k v listed in the accompanying table. The last column indicates the ratio λ^(k) = v_1^(k) / v_1^(k-1) between the first components of successive iterates. (One could equally well use the second or third components.) The ratios are converging to the dominant eigenvalue λ_1 = 3, while the vectors v^(k) are converging to a very large multiple of the corresponding eigenvector v_1 = ( 1, -1, 3 )^T.

The success of the power method lies in the assumption that A has a unique dominant eigenvalue of maximal modulus, which, by definition, equals its spectral radius: | λ_1 | = ρ(A). The rate of convergence of the method is governed by the ratio | λ_2 / λ_1 | between the subdominant and dominant eigenvalues. Thus, the farther the dominant eigenvalue lies away from the rest, the faster the power method converges.
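As a quick numerical illustration, the iteration can be sketched in a few lines of Python with numpy (the function name and step count here are our own choices, not from the text); applied to the matrix of Example 8.1, it reproduces the ratios tabulated below:

```python
import numpy as np

# Matrix of Example 8.1; its dominant eigenvalue is 3.
A = np.array([[-1.0,  2.0,  2.0],
              [-1.0, -4.0, -2.0],
              [-3.0,  9.0,  7.0]])

def power_method(A, v, steps):
    """Iterate v <- A v and estimate the dominant eigenvalue from the
    ratio (8.5) of the first components of successive iterates."""
    lam = None
    for _ in range(steps):
        w = A @ v
        lam = w[0] / v[0]   # assumes the first components are nonzero
        v = w
    return lam, v

lam, v = power_method(A, np.array([1.0, 0.0, 0.0]), 12)
print(round(lam, 4))   # the ratio after 12 steps, close to 3
```

Note that the raw iterates grow like 3^k, which is exactly the overflow issue addressed by the normalized scheme discussed next.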
We also assumed that the initial vector v^(0) includes a nonzero multiple of the dominant eigenvector, i.e., c_1 ≠ 0. As we do not know the eigenvectors, it is not so easy to guarantee this in advance, although one must be quite unlucky to make such a poor choice of initial vector. (Of course, the stupid choice v^(0) = 0 is not counted.) Moreover, even if c_1 happens to be 0 initially,
    k    v^(k)                          λ^(k)
    1    ( -1, -1, -3 )                 -1.0000
    2    ( -7, 11, -27 )                 7.0000
    3    ( -25, 17, -69 )                3.5714
    4    ( -79, 95, -255 )               3.1600
    5    ( -241, 209, -693 )             3.0506
    6    ( -727, 791, -2247 )            3.0166
    7    ( -2185, 2057, -6429 )          3.0055
    8    ( -6559, 6815, -19935 )         3.0018
    9    ( -19681, 19169, -58533 )       3.0006
    10   ( -59047, 60071, -178167 )      3.0002
    11   ( -177145, 175097, -529389 )    3.0001
    12   ( -531439, 535535, -1598415 )   3.0000

numerical round-off error will typically come to one's rescue, since it will almost inevitably introduce a tiny component of the eigenvector v_1 into some iterate, and this component will eventually dominate the computation. The trick is to wait long enough for it to show up!

Since the iterates of A are, typically, getting either very large (when ρ(A) > 1) or very small (when ρ(A) < 1), the iterated vectors will be increasingly subject to numerical over- or under-flow, and the method may break down before a reasonable approximation is achieved. One way to avoid this outcome is to restrict our attention to unit vectors relative to a given norm, e.g., the Euclidean norm or the ∞ norm, since their entries cannot be too large, and so are less likely to cause numerical errors in the computations. As usual, the unit vector u^(k) = v^(k) / || v^(k) || is obtained by dividing the iterate by its norm; it can be computed directly by the modified iterative scheme

    u^(0) = v^(0) / || v^(0) ||,    u^(k+1) = A u^(k) / || A u^(k) ||.        (8.6)

If the dominant eigenvalue λ_1 > 0 is positive, then u^(k) → u_1 will converge to one of the two dominant unit eigenvectors (the other is -u_1). If λ_1 < 0, then the iterates will switch back and forth between the two eigenvectors, so u^(k) ≈ ± u_1. In either case, the dominant eigenvalue λ_1 is obtained as a limiting ratio between nonzero entries of A u^(k) and u^(k). If some other sort of behavior is observed, it means that one of our assumptions is not valid: either A has more than one dominant eigenvalue of maximum modulus, e.g., it has a complex conjugate pair of eigenvalues of largest modulus, or it is not complete.

Example 8.2. For the matrix considered in Example 8.1, starting the iterative scheme (8.6) with u^(0) = ( 1, 0, 0 )^T, the resulting unit
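The normalized scheme (8.6) requires only one extra division per step; a minimal sketch, again using the matrix of Example 8.1 (the function name is our own):

```python
import numpy as np

A = np.array([[-1.0,  2.0,  2.0],
              [-1.0, -4.0, -2.0],
              [-3.0,  9.0,  7.0]])

def normalized_power(A, v, steps):
    """Scheme (8.6): renormalize at every step so the iterates stay
    unit vectors and never over- or under-flow."""
    u = v / np.linalg.norm(v)
    lam = None
    for _ in range(steps):
        w = A @ u
        lam = w[0] / u[0]   # ratio of entries of A u^(k) and u^(k)
        u = w / np.linalg.norm(w)
    return lam, u

lam, u = normalized_power(A, np.array([1.0, 0.0, 0.0]), 12)
# u is now, up to sign, close to the unit eigenvector (1, -1, 3)/sqrt(11)
```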
vectors are tabulated below:
    k    u^(k)                           λ^(k)
    1    ( -.3015, -.3015, -.9045 )     -1.0000
    2    ( -.2335,  .3669, -.9005 )      7.0000
    3    ( -.3319,  .2257, -.9159 )      3.5714
    4    ( -.2788,  .3353, -.8999 )      3.1600
    5    ( -.3159,  .2740, -.9084 )      3.0506
    6    ( -.2919,  .3176, -.9022 )      3.0166
    7    ( -.3080,  .2899, -.9061 )      3.0055
    8    ( -.2973,  .3089, -.9035 )      3.0018
    9    ( -.3044,  .2965, -.9052 )      3.0006
    10   ( -.2996,  .3048, -.9041 )      3.0002
    11   ( -.3028,  .2993, -.9048 )      3.0001
    12   ( -.3007,  .3030, -.9043 )      3.0000

The last column, being the ratio between the first components of A u^(k-1) and u^(k-1), again converges to the dominant eigenvalue λ_1 = 3. Variants of the power method for computing the other eigenvalues of the matrix are explored in the exercises.

8.2. The QR Algorithm

The most popular scheme for simultaneously approximating all the eigenvalues of a matrix A is the remarkable QR algorithm, first proposed in 1961 by Francis, [18], and Kublanovskaya, [32]. The underlying idea is simple, but surprising. The first step is to factor the matrix

    A = A_0 = Q_0 R_0

into a product of an orthogonal matrix Q_0 and a positive (i.e., with all positive entries along the diagonal) upper triangular matrix R_0 by using the Gram-Schmidt orthogonalization procedure. Next, multiply the two factors together in the wrong order! The result is the new matrix

    A_1 = R_0 Q_0.

We then repeat these two steps. Thus, we next factor A_1 = Q_1 R_1 using the Gram-Schmidt process, and then multiply the factors in the reverse order to produce A_2 = R_1 Q_1.
The complete algorithm can be written as

    A = Q_0 R_0,    A_{k+1} = R_k Q_k = Q_{k+1} R_{k+1},    k = 0, 1, 2, ...,        (8.7)

where Q_k, R_k come from the previous step, and the subsequent orthogonal matrix Q_{k+1} and positive upper triangular matrix R_{k+1} are computed by using the numerically stable form of the Gram-Schmidt algorithm.

The astonishing fact is that, for many matrices A, the iterates A_k → V converge to an upper triangular matrix V whose diagonal entries are the eigenvalues of A. Thus, after a sufficient number of iterations, say k, the matrix A_k will have very small entries below the diagonal, and one can read off a complete system of (approximate) eigenvalues along its diagonal. For each eigenvalue, the computation of the corresponding eigenvector can be done by solving the appropriate homogeneous linear system, or by applying the shifted inverse power method.

Example 8.3. Consider the matrix A = [ 2  1 ; 2  3 ]. The initial Gram-Schmidt factorization A = Q_0 R_0 yields

    Q_0 = [ .7071  -.7071        R_0 = [ 2.8284  2.8284
            .7071   .7071 ],             0       1.4142 ].

These are multiplied in the reverse order to give

    A_1 = R_0 Q_0 = [ 4  0
                      1  1 ].

We refactor A_1 = Q_1 R_1 via Gram-Schmidt, and then reverse multiply to produce

    Q_1 = [ .9701  -.2425        R_1 = [ 4.1231  .2425
            .2425   .9701 ],             0       .9701 ],

    A_2 = R_1 Q_1 = [ 4.0588  -.7647
                       .2353   .9412 ].

The next iteration yields

    Q_2 = [ .9983  -.0579        R_2 = [ 4.0656  -.7090
            .0579   .9983 ],             0        .9839 ],

    A_3 = R_2 Q_2 = [ 4.0178  -.9431
                       .0569   .9822 ].

Continuing in this manner, after 9 iterations we find, to four decimal places,

    Q_9 = [ 1  0        R_9 = [ 4  -1        A_10 = R_9 Q_9 = [ 4  -1
            0  1 ],             0   1 ],                        0   1 ].

The eigenvalues of A, namely 4 and 1, appear along the diagonal of A_10. Additional iterations produce very little further change, although they can be used for increasing the accuracy of the computed eigenvalues.
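The iteration (8.7) can be sketched directly with numpy (a sketch, not the text's own implementation; note that numpy's Householder-based `qr` may return an R with negative diagonal entries, so we flip signs to recover the positive upper triangular factor used here):

```python
import numpy as np

def qr_step(Ak):
    """One step of (8.7): factor A_k = Q_k R_k, then return R_k Q_k."""
    Q, R = np.linalg.qr(Ak)
    s = np.sign(np.diag(R))
    Q, R = Q * s, (R.T * s).T   # enforce a positive diagonal on R
    return R @ Q

A = np.array([[2.0, 1.0],
              [2.0, 3.0]])
Ak = A.copy()
for _ in range(10):
    Ak = qr_step(Ak)
# The diagonal of Ak now approximates the eigenvalues 4 and 1 of
# Example 8.3, and the subdiagonal entry is nearly zero.
```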
If the original matrix A happens to be symmetric and positive definite, then the limiting matrix A_k → V = Λ is, in fact, the diagonal matrix containing the eigenvalues of A. Moreover, if, in this case, we recursively define

    S_k = S_{k-1} Q_k = Q_0 Q_1 ... Q_{k-1} Q_k,        (8.8)

then the matrices S_k → S have, as their limit, an orthogonal matrix whose columns are the orthonormal eigenvector basis of A.

Example 8.4. Consider the symmetric matrix

    A = [ 2   1   0
          1   3  -1
          0  -1   6 ].

The initial A = Q_0 R_0 factorization produces

    S_0 = Q_0 = [ .8944  -.4082  -.1826        R_0 = [ 2.2361  2.2361  -.4472
                  .4472   .8165   .3651                0       2.4495  -3.2660
                  0      -.4082   .9129 ],             0       0        5.1121 ],

and so

    A_1 = R_0 Q_0 = [ 3       1.0954   0
                      1.0954  3.3333  -2.0870
                      0      -2.0870   4.6667 ].

We refactor A_1 = Q_1 R_1 and reverse multiply to produce

    Q_1 = [ .9393  -.2734  -.2071        S_1 = S_0 Q_1 = [ .7001  -.4400  -.5623
            .3430   .7488   .5672                          .7001   .2686   .6615
            0      -.6038   .7972 ],                      -.1400  -.8569   .4962 ],

    R_1 = [ 3.1937  2.1723  -.7158        A_2 = R_1 Q_1 = [ 3.7451   1.1856   0
            0       3.4565  -4.3804                         1.1856   5.2330  -1.5314
            0       0        2.5364 ],                      0       -1.5314   2.0219 ].

Continuing in this manner, after 10 iterations we find

    Q_10 = [ 1      -.0067   0           S_10 = [ .0753   .5667   .8205
             .0067   1      -.0001                .3128   .7679  -.5591
             0       .0001   1 ],                -.9468   .2987  -.1194 ],

    R_10 = [ 6.3229  .0647   0           A_11 = [ 6.3232  .0224   0
             0       3.3582  .0006                .0224   3.3581  .0002
             0       0       1.3187 ],            0       .0002   1.3187 ].

After 20 iterations, the process has completely settled down, and

    Q_20 = [ 1  0  0        S_20 = [ .0710   .5672   .8205
             0  1  0                 .3069   .7702  -.5590
             0  0  1 ],             -.9491   .2915  -.1194 ],

    R_20 = [ 6.3234  .0001  0           A_21 = [ 6.3234  0       0
             0       3.3579 0                    0       3.3579  0
             0       0      1.3187 ],            0       0       1.3187 ].

The eigenvalues of A appear along the diagonal of A_21, while the columns of S_20 are the corresponding orthonormal eigenvector basis, listed in the same order as the eigenvalues, both correct to 4 decimal places.
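For the symmetric case, accumulating the product (8.8) alongside the iteration delivers eigenvalues and eigenvectors together; a sketch using the matrix of Example 8.4 (the iteration count is our own choice):

```python
import numpy as np

A = np.array([[2.0,  1.0,  0.0],
              [1.0,  3.0, -1.0],
              [0.0, -1.0,  6.0]])

Ak = A.copy()
S = np.eye(3)                      # S_k = Q_0 Q_1 ... Q_k, as in (8.8)
for _ in range(25):
    Q, R = np.linalg.qr(Ak)
    s = np.sign(np.diag(R))
    Q, R = Q * s, (R.T * s).T      # positive upper triangular convention
    Ak = R @ Q
    S = S @ Q

# diag(Ak) approximates the eigenvalues 6.3234, 3.3579, 1.3187 of A,
# and the columns of S the corresponding orthonormal eigenvectors.
```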
[Figure 8.1. Elementary Reflection Matrix: the vector v, the unit vector u, and the reflected image H v.]

Tridiagonalization

In practical implementations, the direct QR algorithm often takes too long to provide reasonable approximations to the eigenvalues of large matrices. Fortunately, the algorithm can be made much more efficient by a simple preprocessing step. The key observation is that the QR algorithm preserves the class of symmetric tridiagonal matrices, and, moreover, like Gaussian Elimination, is much faster when applied to this class of matrices.

Consider the Householder or elementary reflection matrix

    H = I - 2 u u^T        (8.9)

in which u is a unit vector (in the Euclidean norm). The matrix H represents a reflection of vectors through the orthogonal complement to u, as illustrated in Figure 8.1. It is easy to show that H is a symmetric orthogonal matrix, and so

    H^T = H,    H^2 = I,    H^{-1} = H.        (8.10)

The proof is straightforward: symmetry is immediate, while

    H H^T = H^2 = ( I - 2 u u^T )( I - 2 u u^T ) = I - 4 u u^T + 4 u (u^T u) u^T = I,

since, by assumption, u^T u = || u ||^2 = 1. In Householder's approach to the QR factorization, we were able to convert the matrix A to upper triangular form R by a sequence of elementary reflection matrices. Unfortunately, this procedure does not preserve the eigenvalues of the matrix (the diagonal entries of R are not the eigenvalues), and so we need to be a bit more clever here.

Lemma 8.5. If H = I - 2 u u^T is an elementary reflection matrix, with u a unit vector, then A and B = H A H are similar matrices, and hence have the same eigenvalues.

Proof: According to (8.10), H^{-1} = H, and hence B = H^{-1} A H is similar to A. Q.E.D.
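The defining properties (8.9)-(8.10) are easy to verify numerically (a sketch; the helper name `householder` is our own):

```python
import numpy as np

def householder(u):
    """Elementary reflection matrix (8.9): H = I - 2 u u^T, u a unit vector."""
    u = u / np.linalg.norm(u)            # normalize, so H is a true reflection
    return np.eye(len(u)) - 2.0 * np.outer(u, u)

H = householder(np.array([1.0, 2.0, -2.0]))
# H is symmetric and its own inverse, as stated in (8.10): H^T = H, H^2 = I.
assert np.allclose(H, H.T)
assert np.allclose(H @ H, np.eye(3))
```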
Given a symmetric n x n matrix A, our goal is to devise a similar tridiagonal matrix by applying a sequence of Householder reflections. We begin by setting

    x_1 = ( 0, a_21, a_31, ..., a_n1 )^T,    y_1 = ( 0, ± r_1, 0, ..., 0 )^T,    where    r_1 = || x_1 || = || y_1 ||,

so that x_1 contains all the off-diagonal entries of the first column of A. Let

    H_1 = I - 2 u_1 u_1^T,    where    u_1 = (x_1 - y_1) / || x_1 - y_1 ||,

be the corresponding elementary reflection matrix that maps x_1 to y_1. Either ± sign in the formula for y_1 works in the algorithm; a good choice is to set it to be the opposite of the sign of the entry a_21, which helps minimize the possible effects of round-off error when computing the unit vector u_1. By direct computation,

    A_2 = H_1 A H_1 = [ a_11  r_1   0     ...   0
                        r_1   ã_22  ã_23  ...   ã_2n
                        0     ã_32  ã_33  ...   ã_3n
                        ...   ...   ...   ...   ...
                        0     ã_n2  ã_n3  ...   ã_nn ]        (8.11)

for certain ã_ij; the explicit formulae are not needed. Thus, by a single Householder transformation, we convert A into a similar matrix A_2 whose first row and column are in tridiagonal form. We repeat the process on the lower right (n-1) x (n-1) submatrix of A_2. We set

    x_2 = ( 0, 0, ã_32, ã_42, ..., ã_n2 )^T,    y_2 = ( 0, 0, ± r_2, 0, ..., 0 )^T,    where    r_2 = || x_2 || = || y_2 ||,

and the ± sign is chosen to be the opposite of that of ã_32. Setting

    H_2 = I - 2 u_2 u_2^T,    where    u_2 = (x_2 - y_2) / || x_2 - y_2 ||,

we construct the similar matrix

    A_3 = H_2 A_2 H_2 = [ a_11  r_1   0     0     ...   0
                          r_1   ã_22  r_2   0     ...   0
                          0     r_2   â_33  â_34  ...   â_3n
                          0     0     â_43  â_44  ...   â_4n
                          ...   ...   ...   ...   ...   ...
                          0     0     â_n3  â_n4  ...   â_nn ],
whose first two rows and columns are now in tridiagonal form. The remaining steps in the algorithm should now be clear. Thus, the final result is a tridiagonal matrix T = A_{n-1} that has the same eigenvalues as the original symmetric matrix A. Let us illustrate the method by an example.

Example 8.6. To tridiagonalize

    A = [ 4   1  -1   2
          1   4   1  -1
         -1   1   4   1
          2  -1   1   4 ],

we begin with its first column. We set

    x_1 = ( 0, 1, -1, 2 )^T,    so that    y_1 = ( 0, -2.4495, 0, 0 )^T.

Therefore, the unit vector is

    u_1 = (x_1 - y_1) / || x_1 - y_1 || = ( 0, .8391, -.2433, .4865 )^T,

with corresponding Householder matrix

    H_1 = I - 2 u_1 u_1^T = [ 1   0       0       0
                              0  -.4082   .4082  -.8165
                              0   .4082   .8816   .2367
                              0  -.8165   .2367   .5266 ].

Thus,

    A_2 = H_1 A H_1 = [ 4       -2.4495   0       0
                       -2.4495   2.3333  -.3865  -.8599
                        0       -.3865    4.9440 -.1246
                        0       -.8599   -.1246   4.7227 ].

In the next phase,

    x_2 = ( 0, 0, -.3865, -.8599 )^T,    y_2 = ( 0, 0, .9428, 0 )^T,    so    u_2 = ( 0, 0, -.8396, -.5431 )^T,

and

    H_2 = I - 2 u_2 u_2^T = [ 1   0   0       0
                              0   1   0       0
                              0   0  -.4100  -.9121
                              0   0  -.9121   .4100 ].

The resulting matrix

    T = A_3 = H_2 A_2 H_2 = [ 4       -2.4495   0       0
                             -2.4495   2.3333   .9428   0
                              0        .9428    4.6667  0
                              0        0        0       5 ]

is now in tridiagonal form.

Since the final tridiagonal matrix T has the same eigenvalues as A, we can apply the QR algorithm to T to approximate the common eigenvalues. (The eigenvectors must then be computed separately, e.g., by the shifted inverse power method.) If A = A_1 is
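The two-step reduction above generalizes directly to an n x n sweep; a sketch (the loop organization is our own), which reproduces the matrix T of Example 8.6 when applied to A:

```python
import numpy as np

def tridiagonalize(A):
    """Reduce a symmetric matrix to tridiagonal form by the sequence of
    Householder reflections H_1, H_2, ... described in the text."""
    T = np.array(A, dtype=float)
    n = T.shape[0]
    for j in range(n - 2):
        x = np.zeros(n)
        x[j + 1:] = T[j + 1:, j]          # off-diagonal part of column j
        r = np.linalg.norm(x)
        if r == 0.0:
            continue                      # column already in the right form
        y = np.zeros(n)
        # choose the sign opposite to the leading entry, limiting round-off
        y[j + 1] = -r if x[j + 1] >= 0.0 else r
        u = (x - y) / np.linalg.norm(x - y)
        H = np.eye(n) - 2.0 * np.outer(u, u)
        T = H @ T @ H
    return T

A = np.array([[ 4.0,  1.0, -1.0,  2.0],
              [ 1.0,  4.0,  1.0, -1.0],
              [-1.0,  1.0,  4.0,  1.0],
              [ 2.0, -1.0,  1.0,  4.0]])
T = tridiagonalize(A)
print(np.round(T, 4))   # tridiagonal, with the same eigenvalues as A
```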
tridiagonal, so are all the iterates A_2, A_3, .... Moreover, far fewer arithmetic operations are required. For instance, in the preceding example, after we apply 20 iterations of the QR algorithm directly to T, the upper triangular factor has become

    R_20 = [ 6.0000   .0065   0        0
             0        4.5616  0        0
             0        0       .4384    0
             0        0       0        5.0000 ].

The eigenvalues of T, and hence also of A, appear along the diagonal, and are correct to 4 decimal places.

Finally, even if A is not symmetric, one can still apply the same sequence of Householder transformations to simplify it. The final result is no longer tridiagonal, but rather a similar upper Hessenberg matrix, which means that all entries below the subdiagonal are zero, but those above the superdiagonal are not necessarily zero. For instance, a 5 x 5 upper Hessenberg matrix looks like

    [ *  *  *  *  *
      *  *  *  *  *
      0  *  *  *  *
      0  0  *  *  *
      0  0  0  *  * ],

where the starred entries can be anything. It can be proved that the QR algorithm maintains the upper Hessenberg form, and, while not as efficient as in the tridiagonal case, still yields a significant savings in the computational effort required to find the eigenvalues. Further details and analysis can be found in [13, 47, 54].
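The claim that a QR step preserves upper Hessenberg form is easy to test numerically (a sketch on a random 5 x 5 example of our own construction):

```python
import numpy as np

# Build a random 5 x 5 upper Hessenberg matrix: zeros below the subdiagonal.
rng = np.random.default_rng(1)
A = np.triu(rng.standard_normal((5, 5)), -1)

# One QR step A -> RQ: since Q = A R^{-1} is itself upper Hessenberg and
# R is upper triangular, the product R Q is again upper Hessenberg.
Q, R = np.linalg.qr(A)
A1 = R @ Q
assert np.allclose(np.tril(A1, -2), 0.0, atol=1e-12)
```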