THE MATRIX EIGENVALUE PROBLEM

Find scalars $\lambda$ and vectors $x \neq 0$ for which
$$Ax = \lambda x$$
The form of the matrix affects the way in which we solve this problem, and we also have variety as to what is to be found.

• $A$ symmetric and real (or Hermitian and complex). This is the most common case. In some cases we want only the eigenvalues (and perhaps only some of them); and in other cases, we also want the eigenvectors. There are special classes of such $A$, e.g. banded, positive definite, sparse, and others.

• $A$ non-symmetric, but with a diagonal Jordan canonical form. This means there is a nonsingular matrix $P$ for which
$$P^{-1}AP = D = \operatorname{diag}[\lambda_1, \dots, \lambda_n]$$
Then $AP = PD$ and the columns of $P$ are the eigenvectors of $A$. As we see later, these matrix eigenvalue problems may be ill-conditioned. There are special subclasses of problems, as with the symmetric case. Note that when $A$ is real, the complex eigenvalues (if any) must occur in conjugate pairs.

• $A$ non-symmetric and the Jordan canonical form is not diagonal. These are very difficult problems, especially when calculating the eigenvectors.

GENERAL APPROACH. Begin by finding the eigenvalues. Then find the eigenvectors, if they are needed.

Finding the eigenvalues. Proceed in two steps:
(1) Reduce $A$ to a simpler form $T$, usually using orthogonal similarity transformations.
(2) Apply some method to find the eigenvalues of $T$.
GERSCHGORIN'S CIRCLE THEOREM

Where are the eigenvalues of $A$ located? We know that for any matrix norm,
$$\max_{\lambda \in \sigma(A)} |\lambda| \leq \|A\|$$
How can this be improved? Let
$$r_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{i,j}|, \qquad i = 1, \dots, n$$
$$Z_i = \{\, z \in \mathbb{C} : |z - a_{i,i}| \leq r_i \,\}, \qquad i = 1, \dots, n$$
The set $Z_i$ is a circle with center $a_{i,i}$ and radius $r_i$. Then the eigenvalues of $A$ are located in the union of the circles $Z_i$:
$$\lambda \in \sigma(A) \implies \lambda \in \bigcup_{i=1}^{n} Z_i$$
Moreover, break this union into disjoint components, say $C_1, \dots, C_m$. Then each such component contains exactly as many eigenvalues as circles $Z_i$.
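As a quick numerical check (not part of the text), the disks can be computed with NumPy and the containment verified on a small sample matrix; the matrix and the helper name `gerschgorin_disks` are illustrative choices, not from the text:

```python
import numpy as np

def gerschgorin_disks(A):
    """Return the centers a_ii and radii r_i of the Gerschgorin disks."""
    A = np.asarray(A)
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return centers, radii

# Sample matrix, chosen only for illustration.
A = np.array([[ 4.0, 1.0, 0.0],
              [ 1.0, 3.0, 0.5],
              [ 0.0, 0.5, -2.0]])
centers, radii = gerschgorin_disks(A)

# Every eigenvalue must lie in at least one disk |z - a_ii| <= r_i.
in_some_disk = [np.any(np.abs(lam - centers) <= radii + 1e-12)
                for lam in np.linalg.eigvals(A)]
print(all(in_some_disk))   # True
```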
PROOF. Let $Ax = \lambda x$, $x \neq 0$. Let $k$ be an index for which
$$|x_k| = \|x\|_\infty > 0$$
From $Ax = \lambda x$,
$$\sum_{j=1}^{n} a_{i,j} x_j = \lambda x_i, \qquad i = 1, \dots, n$$
Solve equation $k$ for $x_k$:
$$(\lambda - a_{k,k})\, x_k = \sum_{\substack{j=1 \\ j \neq k}}^{n} a_{k,j} x_j$$
Taking absolute values,
$$|\lambda - a_{k,k}|\, |x_k| \leq \sum_{\substack{j=1 \\ j \neq k}}^{n} |a_{k,j}|\, |x_j| \leq \sum_{\substack{j=1 \\ j \neq k}}^{n} |a_{k,j}|\, |x_k|$$
Cancel $|x_k|$ to get
$$|\lambda - a_{k,k}| \leq \sum_{\substack{j=1 \\ j \neq k}}^{n} |a_{k,j}| = r_k$$
EXAMPLE. Recall the matrix
$$A = \begin{bmatrix} c & 1 & 0 & 0 & 0 \\ 1 & c & 1 & 0 & 0 \\ 0 & 1 & c & 1 & 0 \\ 0 & 0 & 1 & c & 1 \\ 0 & 0 & 0 & 1 & c \end{bmatrix}$$
which is used as an example at the end of Chapter 7. In this case,
$$r_1 = r_5 = 1; \qquad r_i = 2, \quad i = 2, 3, 4$$
The centers of the circles are all $a_{i,i} = c$. Then the union of the circles $Z_i$ is the circle
$$\{\, z : |z - c| \leq 2 \,\}$$
The matrix $A$ is real and symmetric, and thus all eigenvalues are real. Thus the eigenvalues $\lambda$ must be located in the interval
$$c - 2 \leq \lambda \leq c + 2$$
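A short numerical sketch of this example, not from the text: build the $5 \times 5$ tridiagonal matrix for an arbitrarily chosen value of $c$ and confirm the eigenvalues fall in $[c-2,\, c+2]$:

```python
import numpy as np

c = 4.0   # arbitrary choice of c for the check
n = 5
# Tridiagonal matrix with c on the diagonal, 1 on the off-diagonals.
A = c * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)

eigs = np.linalg.eigvalsh(A)   # A is symmetric, so eigenvalues are real
print(np.all(eigs >= c - 2) and np.all(eigs <= c + 2))   # True
```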
BAUER-FIKE THEOREM

Assume $A$ has a diagonal Jordan canonical form, meaning there is a nonsingular matrix $P$ for which
$$P^{-1}AP = D = \operatorname{diag}[\lambda_1, \dots, \lambda_n]$$
Assume we are using a matrix norm for which
$$\|D\| = \max_{1 \leq i \leq n} |\lambda_i|$$
Then consider the eigenvalues $\lambda$ of the perturbed matrix $A + E$. For such $\lambda$, we have
$$\min_{1 \leq i \leq n} |\lambda - \lambda_i| \leq \|P\|\, \|P^{-1}\|\, \|E\|$$
PROOF. Write
$$(A + E)\, x = \lambda x, \qquad x \neq 0$$
$$(\lambda I - A)\, x = Ex$$
Assume $\lambda \neq \lambda_1, \dots, \lambda_n$, as otherwise the theorem is easily true. Substitute $A = PDP^{-1}$:
$$\left( \lambda I - PDP^{-1} \right) x = Ex$$
$$(\lambda I - D)\left( P^{-1}x \right) = \left( P^{-1}EP \right)\left( P^{-1}x \right)$$
$$P^{-1}x = (\lambda I - D)^{-1}\left( P^{-1}EP \right)\left( P^{-1}x \right)$$
Take norms of both sides,
$$\|P^{-1}x\| \leq \|(\lambda I - D)^{-1}\|\, \|P^{-1}EP\|\, \|P^{-1}x\|$$
Cancel $\|P^{-1}x\|$,
$$1 \leq \|(\lambda I - D)^{-1}\|\, \|P^{-1}EP\|$$
Also note that by our assumption on the matrix norm,
$$\|(\lambda I - D)^{-1}\| = \max_i \frac{1}{|\lambda - \lambda_i|} = \frac{1}{\min_i |\lambda - \lambda_i|}$$
Then
$$\min_i |\lambda - \lambda_i| \leq \|P^{-1}EP\| \leq \|P^{-1}\|\, \|P\|\, \|E\|$$
This completes the proof.
Consider the case in which $A$ is symmetric and real. Then the matrix $P$ can be chosen to be orthogonal, and $P^{-1} = P^T$. If we use the matrix norm $\|\cdot\|_2$ induced by the Euclidean vector norm $\|\cdot\|_2$, then from Problem 13 of Chapter 7, $\|P\|_2 = 1$. Thus for this particular matrix norm,
$$\min_i |\lambda - \lambda_i| \leq \|P^{-1}\|_2\, \|E\|_2\, \|P\|_2 = \|E\|_2 = \sqrt{r_\sigma\!\left( E^T E \right)}$$
Thus small changes in the matrix lead to small changes in the eigenvalues.

We can also use the inequality
$$\min_i |\lambda - \lambda_i| \leq \|P\|\, \|P^{-1}\|\, \|E\|$$
to define a condition number for the eigenvalue problem. For it, we would use
$$\operatorname{cond}(A) = \inf_{P^{-1}AP = D} \|P\|\, \|P^{-1}\|$$
Then
$$\min_i |\lambda - \lambda_i| \leq \operatorname{cond}(A)\, \|E\|$$
This says the changes in the eigenvalues are small. But there may still be a large relative change. From the book, consider the $3 \times 3$ Hilbert matrix and its version rounded to four decimal digits:
$$H_3 = \begin{bmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{bmatrix}, \qquad \widehat{H}_3 = \begin{bmatrix} 1.000 & .5000 & .3333 \\ .5000 & .3333 & .2500 \\ .3333 & .2500 & .2000 \end{bmatrix}$$
In this case, $A = H_3$ and $A + E = \widehat{H}_3$, so $E = \widehat{H}_3 - H_3$. Using the matrix norm $\|\cdot\|_2$, the Bauer-Fike result says that for each eigenvalue $\widehat{\lambda}$ of $\widehat{H}_3$,
$$\min_i |\widehat{\lambda} - \lambda_i| \leq \|E\|_2 \doteq 3.3 \times 10^{-5}$$
In fact the true eigenvalues of $H_3$ are
$$\lambda_1 = 1.408319, \qquad \lambda_2 = .1223271, \qquad \lambda_3 = .002687340$$
and the true eigenvalues of $\widehat{H}_3$ are
$$\widehat{\lambda}_1 = 1.408294, \qquad \widehat{\lambda}_2 = .1223415, \qquad \widehat{\lambda}_3 = .002664489$$
For the errors,
$$\lambda_1 - \widehat{\lambda}_1 = 2.49 \times 10^{-5}$$
$$\lambda_2 - \widehat{\lambda}_2 = -1.44 \times 10^{-5}$$
$$\lambda_3 - \widehat{\lambda}_3 = 2.29 \times 10^{-5}$$
which is in agreement with
$$\min_i |\widehat{\lambda} - \lambda_i| \leq \|E\|_2 \doteq 3.3 \times 10^{-5}$$
For the relative errors,
$$\operatorname{Rel}\left( \widehat{\lambda}_1 \right) = 1.77 \times 10^{-5}, \qquad \operatorname{Rel}\left( \widehat{\lambda}_2 \right) = -1.18 \times 10^{-4}, \qquad \operatorname{Rel}\left( \widehat{\lambda}_3 \right) = 8.5 \times 10^{-3}$$
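The Hilbert-matrix computation above is easy to reproduce; this NumPy sketch (not from the text) checks that every eigenvalue of the rounded matrix lies within $\|E\|_2$ of a true eigenvalue:

```python
import numpy as np

H3 = np.array([[1,   1/2, 1/3],
               [1/2, 1/3, 1/4],
               [1/3, 1/4, 1/5]])
# Entries rounded to four decimal digits, as in the text.
H3_rounded = np.array([[1.000, .5000, .3333],
                       [.5000, .3333, .2500],
                       [.3333, .2500, .2000]])
E = H3_rounded - H3

lam = np.linalg.eigvalsh(H3)
lam_hat = np.linalg.eigvalsh(H3_rounded)
bound = np.linalg.norm(E, 2)   # ||E||_2, about 3.3e-5

# Each perturbed eigenvalue is within ||E||_2 of some true eigenvalue.
errs = [np.min(np.abs(lh - lam)) for lh in lam_hat]
print(max(errs) <= bound)   # True
```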
EXAMPLE
$$A = \begin{bmatrix} 101 & -90 \\ 110 & -98 \end{bmatrix}, \qquad A + E = \begin{bmatrix} 100.999 & -90.001 \\ 110 & -98 \end{bmatrix}$$
For $A$, the eigenvalues are $1, 2$. For $A + E$, the eigenvalues are
$$\lambda \doteq 1.298, \; 1.701$$
This is a very significant change in eigenvalues for a very small change in the matrix. It is illustrative of what can happen with the non-symmetric eigenvalue problem.
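This sensitivity is easy to observe directly; a small NumPy check (not from the text) reproduces both sets of eigenvalues:

```python
import numpy as np

A = np.array([[101.0, -90.0],
              [110.0, -98.0]])
# The perturbation: A + E matches the matrix in the example.
E = np.array([[-0.001, -0.001],
              [ 0.0,    0.0 ]])

eigs_A = np.sort(np.linalg.eigvals(A).real)
eigs_AE = np.sort(np.linalg.eigvals(A + E).real)
print(eigs_A)                  # [1. 2.]
print(np.round(eigs_AE, 3))   # roughly [1.298 1.701]
```

A perturbation of size about $10^{-3}$ moves the eigenvalues by about $0.3$.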
WIELANDT-HOFFMAN THEOREM

Let $A$ and $E$ be real and symmetric, and let $\widehat{A} = A + E$. Let the eigenvalues of $A$ be
$$\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$$
and let those of $\widehat{A}$ be
$$\widehat{\lambda}_1 \leq \widehat{\lambda}_2 \leq \cdots \leq \widehat{\lambda}_n$$
Then
$$\left[ \sum_{j=1}^{n} \left( \widehat{\lambda}_j - \lambda_j \right)^2 \right]^{1/2} \leq F(E) \equiv \left[ \sum_{i=1}^{n} \sum_{j=1}^{n} E_{i,j}^2 \right]^{1/2}$$
where $F(E)$ is the Frobenius norm of $E$.
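A numerical illustration of the theorem, not from the text, on a randomly generated symmetric matrix with a small symmetric perturbation (sizes and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                 # random real symmetric matrix
N = rng.standard_normal((5, 5))
E = 1e-3 * (N + N.T) / 2          # small symmetric perturbation

lam = np.linalg.eigvalsh(A)        # returned in ascending order
lam_hat = np.linalg.eigvalsh(A + E)

lhs = np.sqrt(np.sum((lam_hat - lam) ** 2))
F_E = np.linalg.norm(E, 'fro')     # Frobenius norm F(E)
print(lhs <= F_E)   # True
```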
EXAMPLE - NONSYMMETRIC

Consider the $n \times n$ matrix
$$A = \begin{bmatrix} 1 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ & & & 1 & 1 \\ 0 & \cdots & & 0 & 1 \end{bmatrix}$$
Its characteristic polynomial is
$$f(\lambda) = (1 - \lambda)^n$$
Its only eigenvalue is $\lambda = 1$; and there is only a one-dimensional family of eigenvectors, all multiples of
$$x = [1, 0, \dots, 0]^T$$
Now perturb the matrix to
$$A(\epsilon) = \begin{bmatrix} 1 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & & & 1 & 1 \\ \epsilon & 0 & \cdots & 0 & 1 \end{bmatrix}$$
Its characteristic polynomial is
$$f_\epsilon(\lambda) = (1 - \lambda)^n - (-1)^n \epsilon$$
Its roots, and the eigenvalues of $A(\epsilon)$, are
$$\lambda_k(\epsilon) = 1 + \omega_k\, \epsilon^{1/n}, \qquad k = 1, \dots, n$$
with $\{\omega_k\}$ the $n$th roots of unity,
$$\omega_k = e^{2\pi k i/n}, \qquad k = 1, \dots, n$$
Thus
$$|\lambda_k - \lambda_k(\epsilon)| = \epsilon^{1/n}$$
For $n = 10$ and $\epsilon = 10^{-10}$, $|\lambda_k - \lambda_k(\epsilon)| = 0.1$.
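The $\epsilon^{1/n}$ behavior can be seen numerically; this sketch (not from the text) builds $A(\epsilon)$ for $n = 10$, $\epsilon = 10^{-10}$ and checks that all computed eigenvalues sit near the circle of radius $0.1$ about $\lambda = 1$:

```python
import numpy as np

n, eps = 10, 1e-10
A_eps = np.eye(n) + np.eye(n, k=1)   # Jordan block with eigenvalue 1
A_eps[n - 1, 0] = eps                # perturbation in the (n,1) position

eigs = np.linalg.eigvals(A_eps)
# All eigenvalues lie (up to rounding) on the circle |lambda - 1| = eps^(1/n) = 0.1.
radii = np.abs(eigs - 1.0)
print(np.allclose(radii, eps ** (1 / n), atol=1e-4))   # True
```

A perturbation of size $10^{-10}$ in a single entry moves every eigenvalue by $0.1$.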
STABILITY FOR NONSYMMETRIC MATRICES

Assume the matrix $A$ has a diagonal Jordan canonical form:
$$P^{-1}AP = D = \operatorname{diag}[\lambda_1, \dots, \lambda_n]$$
Let $P = [u_1, \dots, u_n]$. Then $AP = PD$ implies
$$Au_i = \lambda_i u_i, \qquad i = 1, \dots, n$$
and the vectors $\{u_1, \dots, u_n\}$ form a basis of $\mathbb{C}^n$. To see some of the nonuniqueness in the choice of $P$, let $F$ be an arbitrary nonsingular diagonal matrix,
$$F = \operatorname{diag}[f_1, \dots, f_n]$$
Then, since diagonal matrices commute, $F^{-1}DF = D$, and
$$F^{-1}P^{-1}APF = F^{-1}DF = D$$
$$(PF)^{-1} A\, (PF) = D$$
The matrix $PF$ is another nonsingular matrix; and since $F$ is diagonal,
$$PF = [u_1, \dots, u_n]\, F = [f_1 u_1, \dots, f_n u_n]$$
The vectors $f_i u_i$ are again eigenvectors of $A$. Therefore, we assume that $P$ has been so chosen that the vectors $u_i$ all have Euclidean length 1:
$$u_i^* u_i = 1, \qquad i = 1, \dots, n$$
Note that because the eigenvalues can be complex, we must now work in $\mathbb{C}^n$; and we also allow $A$ to be complex.

Form the complex conjugate transpose of $P^{-1}AP = D$:
$$P^* A^* (P^*)^{-1} = D^* = \operatorname{diag}[\bar{\lambda}_1, \dots, \bar{\lambda}_n]$$
Write
$$(P^*)^{-1} = [w_1, \dots, w_n]$$
Then as before with $A$, we have
$$A^* w_i = \bar{\lambda}_i w_i, \qquad i = 1, \dots, n$$
$$w_i^* A = \lambda_i w_i^*$$
The vectors $w_i^*$ are sometimes called left eigenvectors of $A$. Taking the conjugate transpose of $(P^*)^{-1} = [w_1, \dots, w_n]$,
$$P^{-1} = \begin{bmatrix} w_1^* \\ \vdots \\ w_n^* \end{bmatrix}$$
Write out $P^{-1}P = I$ to get
$$\begin{bmatrix} w_1^* \\ \vdots \\ w_n^* \end{bmatrix} [u_1, \dots, u_n] = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix}$$
$$w_i^* u_j = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$$
Normalize the eigenvectors $\{w_i\}$ by
$$v_i = \frac{w_i}{\|w_i\|_2}, \qquad i = 1, \dots, n$$
giving eigenvectors of $A^*$ of length 1. Define
$$s_i = v_i^* u_i = \frac{1}{\|w_i\|_2}, \qquad i = 1, \dots, n$$
We can write
$$(P^*)^{-1} = \left[ \frac{v_1}{s_1}, \dots, \frac{v_n}{s_n} \right]$$
and also
$$A^* v_i = \bar{\lambda}_i v_i, \qquad \|v_i\|_2 = 1, \qquad i = 1, \dots, n$$
With these tools, we can now do a stability analysis for isolated eigenvalues of $A$. In particular, assume the eigenvalue $\lambda_k$ is a simple eigenvalue of $A$. Consider what happens to it with a perturbation of the matrix $A$, namely
$$A(\epsilon) = A + \epsilon B, \qquad \epsilon > 0$$
Let $\lambda_1(\epsilon), \dots, \lambda_n(\epsilon)$ denote the perturbed eigenvalues for $A(\epsilon)$. We want to estimate $\lambda_k(\epsilon) - \lambda_k$.
Using the matrix $P$,
$$P^{-1} A(\epsilon) P = P^{-1} (A + \epsilon B)\, P = D + \epsilon C$$
with
$$C = P^{-1} B P = \begin{bmatrix} v_1^*/s_1 \\ \vdots \\ v_n^*/s_n \end{bmatrix} B\, [u_1, \dots, u_n]$$
$$c_{i,j} = \frac{1}{s_i}\, v_i^* B u_j, \qquad 1 \leq i, j \leq n$$
We want to prove that
$$\lambda_k(\epsilon) = \lambda_k + \frac{\epsilon}{s_k}\, v_k^* B u_k + O\!\left( \epsilon^2 \right)$$
The argument for this is given on page 598 of the text, which I omit here. Using the vector and matrix 2-norms,
$$|\lambda_k(\epsilon) - \lambda_k| \leq \frac{\epsilon \|B\|_2}{|s_k|} + O\!\left( \epsilon^2 \right)$$
since $\|u_k\|_2 = \|v_k\|_2 = 1$. Thus the size of $s_k$ is of crucial importance in determining the stability of $\lambda_k$.
EXAMPLE. Consider
$$A = \begin{bmatrix} 101 & -90 \\ 110 & -98 \end{bmatrix}, \qquad P^{-1}AP = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$$
$$P = \begin{bmatrix} \dfrac{9}{\sqrt{181}} & \dfrac{10}{\sqrt{221}} \\[2mm] \dfrac{10}{\sqrt{181}} & \dfrac{11}{\sqrt{221}} \end{bmatrix}, \qquad P^{-1} = \begin{bmatrix} -11\sqrt{181} & 10\sqrt{181} \\ 10\sqrt{221} & -9\sqrt{221} \end{bmatrix}$$
This defines the vectors $\{u_1, u_2\}$ and $\{w_1, w_2\}$, and thus
$$s_1 = v_1^* u_1 = \frac{1}{\|w_1\|_2} = \frac{1}{\sqrt{221 \cdot 181}} \doteq .005$$
$$|\lambda_1(\epsilon) - \lambda_1| \leq \frac{\epsilon \|B\|_2}{|s_1|} + O\!\left( \epsilon^2 \right) \doteq 200\, \epsilon \|B\|_2 + O\!\left( \epsilon^2 \right)$$
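The numbers $s_i$ for this matrix can be computed directly; this NumPy sketch (not from the text) relies on the fact that `np.linalg.eig` returns unit right eigenvectors, so the rows of $P^{-1}$ are the left eigenvectors $w_i^*$:

```python
import numpy as np

A = np.array([[101.0, -90.0],
              [110.0, -98.0]])

# Columns of P are unit right eigenvectors; rows of P^{-1} are w_i^*.
lam, P = np.linalg.eig(A)
Pinv = np.linalg.inv(P)

s = 1.0 / np.linalg.norm(Pinv, axis=1)   # s_i = 1 / ||w_i||_2
print(s)   # both entries approximately 1/sqrt(40001), about 0.005
```

Both eigenvalues of this matrix have $s_i \doteq .005$, matching the factor of $200$ in the bound.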
ORTHOGONAL TRANSFORMATIONS

Suppose we transform $A$ using an orthogonal similarity transformation,
$$\widehat{A} = U^* A U, \qquad U^* U = I$$
What are the vectors $\widehat{u}_i$, $\widehat{v}_i$, $\widehat{w}_i$ for this new matrix? Transform
$$P^{-1}AP = D = \operatorname{diag}[\lambda_1, \dots, \lambda_n]$$
to
$$P^{-1} U\, (U^* A U)\, U^* P = D$$
$$(U^* P)^{-1}\, \widehat{A}\, (U^* P) = D$$
This says that the columns $\{\widehat{u}_i\}$ are obtained from
$$U^* P = U^* [u_1, \dots, u_n] = [U^* u_1, \dots, U^* u_n]$$
$$\widehat{u}_i = U^* u_i, \qquad i = 1, \dots, n$$
Similarly,
$$\widehat{v}_i = U^* v_i, \qquad i = 1, \dots, n$$
Now consider the numbers $s_i$ which measure the sensitivity of the eigenvalues (provided they are simple). For the new matrix, call these numbers $\widehat{s}_i$. Then
$$\widehat{s}_i = \widehat{v}_i^*\, \widehat{u}_i = (U^* v_i)^* (U^* u_i) = v_i^* U U^* u_i = v_i^* u_i = s_i$$
Thus an orthogonal similarity transformation of $A$ does not change the numbers $\{s_i\}$, and thus the conditioning of the eigenvalue problem is not changed. This is a major reason for using orthogonal transformations.
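This invariance can be checked numerically; the sketch below (not from the text, with a hypothetical helper `sensitivities`) reuses the earlier $2 \times 2$ example and a random orthogonal $U$ obtained from a QR factorization:

```python
import numpy as np

def sensitivities(A):
    """s_i = 1/||w_i||_2, with unit right eigenvectors (columns of P)
    and left eigenvectors w_i^* (rows of P^{-1}); sorted by eigenvalue."""
    lam, P = np.linalg.eig(A)
    order = np.argsort(lam)   # fixed ordering so the two runs match up
    s = 1.0 / np.linalg.norm(np.linalg.inv(P), axis=1)
    return s[order]

A = np.array([[101.0, -90.0],
              [110.0, -98.0]])

# A random orthogonal U for the similarity transformation.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((2, 2)))
A_hat = U.T @ A @ U

print(np.allclose(sensitivities(A), sensitivities(A_hat)))   # True
```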