Computing Eigenvalues and/or Eigenvectors; Part 1: Generalities and Symmetric Matrices
Tom Lyche, Centre of Mathematics for Applications, Department of Informatics, University of Oslo
November 8, 2009
Today
Given a matrix $A \in \mathbb{C}^{n,n}$:
- Finding the eigenvalues using the characteristic polynomial?
- Perturbation theory
- Reduction to Hessenberg form
- Sylvester's inertia theorem
- Find one or more selected eigenvalues of a symmetric, tridiagonal matrix
- Find one or more selected eigenvectors (next time)
- Find all eigenvalues and eigenvectors (next time)
Eigenvalues and Characteristic Polynomial
The eigenvalues of $A \in \mathbb{C}^{n,n}$ are the $n$ roots of the characteristic polynomial $\pi_A(\lambda) := \det(A - \lambda I)$.
- $\pi_A(\lambda)$ is of exact degree $n$.
- Except for some special matrices, the eigenvalues must be found numerically.
- Recall the notation $\sigma(A)$ for the set of eigenvalues of $A$.
Characteristic Polynomial
Possible method: compute the characteristic polynomial $\pi_A(\lambda)$ and apply a numerical method like Newton's method to find one or more of its roots.
This is not suitable as an all-purpose method. Reason: a small change in one of the coefficients of $\pi_A(\lambda)$ can lead to a large change in the roots of the polynomial.
Example: $\pi_A(\lambda) = \lambda^{16}$, $q(\lambda) = \lambda^{16} - 10^{-16}$.
- The roots of $\pi_A$ are all equal to zero.
- The roots of $q$ are $\lambda_j = 10^{-1} e^{2\pi i j/16}$, $j = 1, \ldots, 16$; they all have absolute value $0.1$.
Computed roots can be very inaccurate. It is better to work directly with the matrix.
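As a quick sanity check of this sensitivity (my illustration, not from the slides), the following MATLAB lines compare the roots of $\lambda^{16}$ with those of the perturbed polynomial using the built-in roots function:

p = [1 zeros(1,16)];            % coefficients of pi_A(lambda) = lambda^16
q = [1 zeros(1,15) -1e-16];     % coefficients of q(lambda) = lambda^16 - 1e-16
abs(roots(p))                   % all roots are 0
abs(roots(q))                   % all roots have modulus 0.1

A perturbation of size $10^{-16}$ in a single coefficient moves every root from $0$ to modulus $0.1$.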
Gerschgorin's circle theorem
Where are the eigenvalues?
Theorem. Suppose $A \in \mathbb{C}^{n,n}$. Define for $i, j = 1, 2, \ldots, n$
$$R_i = \{z \in \mathbb{C} : |z - a_{ii}| \le r_i\}, \quad r_i := \sum_{j \ne i} |a_{ij}|,$$
$$C_j = \{z \in \mathbb{C} : |z - a_{jj}| \le c_j\}, \quad c_j := \sum_{i \ne j} |a_{ij}|.$$
Then any eigenvalue of $A$ lies in $R \cap C$, where $R = R_1 \cup R_2 \cup \cdots \cup R_n$ and $C = C_1 \cup C_2 \cup \cdots \cup C_n$.
If $A^H = A$ then the eigenvalues are real and $C_i = R_i = [a_{ii} - r_i, a_{ii} + r_i]$.
Examples
Locate the eigenvalues of $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$. $A$ is symmetric, with $\lambda_1 = 3$ and $\lambda_2 = 1$. Here $R_1 = R_2 = [2-1, 2+1] = [1,3] = R$, so $\lambda \in [1,3]$; not bad.
$T = \mathrm{tridiag}(-1, 2, -1) \in \mathbb{R}^{m,m}$: $R_1 = R_m = [1,3]$ and $R_i = [0,4]$ for $i = 2, 3, \ldots, m-1$, so $R = [0,4]$. The exact eigenvalues are $\lambda_j = 4\sin^2\!\big(\frac{j\pi}{2(m+1)}\big) \in (0,4)$, $j = 1, 2, \ldots, m$.
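A small MATLAB sketch (my illustration) computes the row-disk centers and radii for a given matrix and compares them with eig:

A = [2 1; 1 2];
centers = diag(A);
radii = sum(abs(A), 2) - abs(diag(A));
[centers - radii, centers + radii]   % row-disk intervals [a_ii - r_i, a_ii + r_i]
eig(A)                               % 1 and 3, both inside [1,3]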
Proof of Gerschgorin
Suppose $(\lambda, x)$ is an eigenpair for $A$. We claim that $\lambda \in R_i$, where $i$ is such that $|x_i| = \|x\|_\infty$. Indeed, $Ax = \lambda x$ implies $\sum_j a_{ij} x_j = \lambda x_i$, or $(\lambda - a_{ii}) x_i = \sum_{j \ne i} a_{ij} x_j$. Dividing by $x_i$ and taking absolute values we find
$$|\lambda - a_{ii}| = \Big|\sum_{j \ne i} a_{ij} x_j / x_i\Big| \le \sum_{j \ne i} |a_{ij}| \, |x_j| / |x_i| \le \sum_{j \ne i} |a_{ij}| = r_i.$$
Thus $\lambda \in R_i$. Since $\lambda$ is also an eigenvalue of $A^T$, it must lie in one of the row disks of $A^T$. But these are the column disks $C_j$ of $A$. Hence $\lambda \in C_j$ for some $j$.
Disjoint disks
Sometimes some of the Gerschgorin disks are disjoint from the others, and we have:
Corollary. If $p$ of the Gerschgorin row disks are disjoint from the remaining ones, the union of these $p$ disks contains precisely $p$ eigenvalues. The same result holds for the column disks.
Example:
$$A = \begin{bmatrix} 1 & \epsilon_1 & \epsilon_2 \\ \epsilon_3 & 2 & \epsilon_4 \\ \epsilon_5 & \epsilon_6 & 3 \end{bmatrix}, \quad |\epsilon_j| \le 10^{-10},$$
implies $|\lambda_j - j| \le 2 \cdot 10^{-10}$ for $j = 1, 2, 3$.
Perturbation of eigenvalues; Example 1
$A_0 := 0$. Then $\lambda \in \sigma(A_0 + E) = \sigma(E)$, so $|\lambda| \le \|E\|$.
Any eigenvalue of $A_0$ is perturbed by at most $\|E\|$ (why?).
Perturbation of eigenvalues; Example 2
$A_1 := \mathrm{diag}(\mathrm{ones}(n-1,1), 1)$, $E := \epsilon e_n e_1^T$. For $n = 4$:
$$A_1 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad E = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \epsilon & 0 & 0 & 0 \end{bmatrix}.$$
The characteristic polynomial of $A_1 + E$ is $\pi(\lambda) := \det(A_1 + E - \lambda I) = (-1)^n (\lambda^n - \epsilon)$ (show this).
The eigenvalues of $A_1$ (all zero) are perturbed by the amount $|\lambda| = \epsilon^{1/n} = \|E\|^{1/n}$. Thus, for $n = 16$, a perturbation of say $\epsilon = 10^{-16}$ gives a change in eigenvalue of $0.1$.
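This is easy to reproduce numerically; a short MATLAB sketch (my demo, using only the built-in eig):

n = 16; eps_ = 1e-16;
A1 = diag(ones(n-1,1), 1);      % superdiagonal of ones, all eigenvalues 0
E = zeros(n); E(n,1) = eps_;    % single entry in the bottom-left corner
abs(eig(A1 + E))                % all close to eps_^(1/n) = 0.1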
The factor 1/n
Theorem (Elsner's Theorem). Suppose $A, E \in \mathbb{C}^{n,n}$. To every $\mu \in \sigma(A + E)$ there is a $\lambda \in \sigma(A)$ such that
$$|\mu - \lambda| \le \big(\|A\|_2 + \|A + E\|_2\big)^{1 - 1/n} \|E\|_2^{1/n}. \quad (1)$$
Proof
Suppose $A$ has eigenvalues $\lambda_1, \ldots, \lambda_n$. Use:
- $\det(A) = \lambda_1 \cdots \lambda_n$,
- $U^H U = I \Rightarrow |\det(U)| = 1$,
- $|\det(A)| \le \prod_{j=1}^n \|a_j\|_2$, where $a_j$ are the columns of $A$ (Hadamard's inequality).
Details of the proof will be given on the blackboard. If you did not come to the lecture you should study the notes.
Can we improve on $\|E\|_2^{1/n}$?
Recall perturbation theory for linear systems: if $Ax = b$ and $Ay = b + e$, then
$$\frac{\|y - x\|_p}{\|x\|_p} \le K_p(A) \frac{\|e\|_p}{\|b\|_p}, \quad \text{where } K_p(A) := \|A\|_p \|A^{-1}\|_p.$$
The relative error $\|y - x\|_p / \|x\|_p$ in $y$ as an approximation to $x$ can possibly be $K_p(A)$ times as large as the relative error $\|e\|_p / \|b\|_p$ in the right-hand side $b$. But the perturbation term $\|e\|_p$ is only raised to the first power.
The Eigenvector Matrix
Theorem. Suppose $A \in \mathbb{C}^{n,n}$ has linearly independent eigenvectors $\{x_1, \ldots, x_n\}$ and let $X = [x_1, \ldots, x_n]$ be the eigenvector matrix. If $(\mu, x)$ is an eigenpair for $A + E$, then we can find an eigenvalue $\lambda$ of $A$ such that
$$|\lambda - \mu| \le K_p(X) \|E\|_p, \quad 1 \le p \le \infty. \quad (2)$$
If $A$ is symmetric then
$$|\lambda - \mu| \le \|E\|_2. \quad (3)$$
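A short MATLAB experiment (my illustration) contrasts the two situations: for a symmetric matrix the eigenvalues move by at most $\|E\|_2$, while for the Jordan-block example above the movement is of order $\|E\|_2^{1/n}$:

n = 16; eps_ = 1e-16;
E = zeros(n); E(n,1) = eps_;
S = diag(1:n);                        % symmetric, eigenvalues 1,...,n
max(abs(sort(eig(S + E)) - (1:n)'))   % tiny: bound (3) gives 1e-16 in exact
                                      % arithmetic; eig adds some rounding
J = diag(ones(n-1,1), 1);             % defective, all eigenvalues 0
max(abs(eig(J + E)))                  % about 0.1 = eps_^(1/n)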
Two observations
- The eigenvalue problem for symmetric matrices is well conditioned.
- It is difficult, or sometimes impossible, to compute accurate eigenvalues and eigenvectors of a matrix whose eigenvectors are linearly dependent or close to linearly dependent.
Reduction to Upper Hessenberg using orthonormal similarity transformations
$$A = \begin{bmatrix} x & x & x & x & x & x \\ x & x & x & x & x & x \\ x & x & x & x & x & x \\ x & x & x & x & x & x \\ x & x & x & x & x & x \\ x & x & x & x & x & x \end{bmatrix} \;\xrightarrow{\;Q^T A Q\;}\; B = \begin{bmatrix} x & x & x & x & x & x \\ x & x & x & x & x & x \\ 0 & x & x & x & x & x \\ 0 & 0 & x & x & x & x \\ 0 & 0 & 0 & x & x & x \\ 0 & 0 & 0 & 0 & x & x \end{bmatrix}$$
Before attempting to find eigenvalues and eigenvectors of a matrix (exceptions are made for certain sparse matrices), it is often advantageous to reduce it by similarity transformations to a simpler form. Orthonormal similarity transformations are particularly important since they are insensitive to noise in the elements of the matrix. We use Householder transformations.
The Householder reduction
For $n = 4$:
$$A_1 = \begin{bmatrix} b_{11} & x & x & x \\ x & x & x & x \\ x & x & x & x \\ x & x & x & x \end{bmatrix} \xrightarrow{H_1 A_1 H_1} A_2 = \begin{bmatrix} b_{11} & b_{12} & x & x \\ b_{21} & b_{22} & x & x \\ 0 & x & x & x \\ 0 & x & x & x \end{bmatrix} \xrightarrow{H_2 A_2 H_2} A_3 = \begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \\ 0 & b_{32} & b_{33} & b_{34} \\ 0 & 0 & b_{43} & b_{44} \end{bmatrix}$$
Partition
$$A_1 = \begin{bmatrix} B_1 & C_1 \\ D_1 & E_1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} B_2 & C_2 \\ D_2 & E_2 \end{bmatrix}, \quad H_1 = \begin{bmatrix} 1 & 0 \\ 0 & V_1 \end{bmatrix}, \quad H_2 = \begin{bmatrix} I & 0 \\ 0 & V_2 \end{bmatrix}.$$
In general $B_k \in \mathbb{R}^{k,k}$ is upper Hessenberg and $D_k = [0, 0, \ldots, 0, d_k] \in \mathbb{R}^{n-k,k}$.
Let $H_k = \begin{bmatrix} I & 0 \\ 0 & V_k \end{bmatrix}$, where $V_k = I - v_k v_k^T \in \mathbb{R}^{n-k,n-k}$ is a Householder transformation such that $V_k d_k = \alpha_k e_1$, where $\alpha_k^2 = d_k^T d_k$.
Reduction
$$A_{k+1} = H_k A_k H_k = \begin{bmatrix} I_k & 0 \\ 0 & V_k \end{bmatrix} \begin{bmatrix} B_k & C_k \\ D_k & E_k \end{bmatrix} \begin{bmatrix} I_k & 0 \\ 0 & V_k \end{bmatrix} = \begin{bmatrix} B_k & C_k V_k \\ V_k D_k & V_k E_k V_k \end{bmatrix}.$$
Since $V_k = I - v_k v_k^T$ we have $V_k C = C - v_k (v_k^T C)$ and $C V_k = C - (C v_k) v_k^T$. In the algorithm:
- premultiplication by $V_k$: $C := A_k(k{+}1{:}n, k{+}1{:}n)$, then $A_{k+1}(k{+}1{:}n, k{+}1{:}n) = C - v_k (v_k^T C)$;
- postmultiplication by $V_k$: $C := A_k(1{:}n, k{+}1{:}n)$, then $A_{k+1}(1{:}n, k{+}1{:}n) = C - (C v_k) v_k^T$.
Algorithm
function [L,B] = hesshousegen(A)
% Reduce A to upper Hessenberg form B by Householder similarity
% transformations. The Householder vector v_k is stored below the
% kth diagonal in L.
n = length(A); L = zeros(n,n); B = A;
for k = 1:n-2
    [v, B(k+1,k)] = housegen(B(k+1:n,k));  % zero column k below subdiagonal
    L(k+1:n,k) = v;
    B(k+2:n,k) = zeros(n-k-1,1);
    C = B(k+1:n,k+1:n);                    % premultiply by V_k
    B(k+1:n,k+1:n) = C - v*(v'*C);
    C = B(1:n,k+1:n);                      % postmultiply by V_k
    B(1:n,k+1:n) = C - (C*v)*v';
end
Here [u,a] = housegen(x) computes $a = \alpha$ and the vector $u$ so that $(I - uu^T)x = \alpha e_1$.
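The routine housegen is used but not listed on these slides. A minimal sketch consistent with the specification above (one standard construction; the lecture's actual version may differ, e.g. in the sign convention):

function [u,a] = housegen(x)
% Compute u with u'*u = 2 and scalar a such that (I - u*u')*x = a*e_1.
a = norm(x); u = x;
if a == 0, u(1) = sqrt(2); return; end
u = x/a;
if u(1) >= 0
    u(1) = u(1) + 1;
    a = -a;                 % sign chosen to avoid cancellation
else
    u(1) = u(1) - 1;
end
u = u/sqrt(abs(u(1)));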
Complexity: 5 Gaussian eliminations
Suppose $A \in \mathbb{R}^{m,n}$, $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$. The computations $A - u(u^T A)$ and $A - (Av)v^T$ both cost $O(4mn)$ flops.
In step $k$ the update $C - v_k(v_k^T C)$ acts on $C \in \mathbb{R}^{n-k,n-k}$ and the update $C - (Cv_k)v_k^T$ on $C \in \mathbb{R}^{n,n-k}$, so the total work is approximately
$$\int_0^n \big(4(n-k)^2 + 4n(n-k)\big)\,dk = \frac{10 n^3}{3} = 5\Big(\frac{2n^3}{3}\Big),$$
i.e., about 5 times the cost of a Gaussian elimination.
The symmetric case
If $A_1 = A$ is symmetric, the matrix $A_{n-1}$ will also be symmetric, since $A_k^T = A_k$ implies
$$A_{k+1}^T = (H_k A_k H_k)^T = H_k A_k^T H_k = A_{k+1}.$$
Since $A_{n-1}$ is upper Hessenberg and symmetric, it must be tridiagonal. Thus the algorithm above reduces $A$ to symmetric tridiagonal form if $A$ is symmetric.
Symmetric tridiagonal
Let $A^T = A \in \mathbb{R}^{n,n}$ have eigenvalues $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Using Householder similarity transformations we can assume that $A$ is symmetric and tridiagonal:
$$A = \begin{bmatrix} d_1 & c_1 & & & \\ c_1 & d_2 & c_2 & & \\ & \ddots & \ddots & \ddots & \\ & & c_{n-2} & d_{n-1} & c_{n-1} \\ & & & c_{n-1} & d_n \end{bmatrix}. \quad (4)$$
Split tridiagonal A into irreducible components
$A$ is reducible if $c_i = 0$ for at least one $i$. Example: suppose $n = 4$ and $c_2 = 0$:
$$A = \begin{bmatrix} d_1 & c_1 & 0 & 0 \\ c_1 & d_2 & 0 & 0 \\ 0 & 0 & d_3 & c_3 \\ 0 & 0 & c_3 & d_4 \end{bmatrix} = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}.$$
The eigenvalues of $A$ are the union of the eigenvalues of $A_1$ and $A_2$. Thus if $A$ is reducible the eigenvalue problem can be split into smaller irreducible problems. So assume that $A$ is irreducible.
Lemma
An irreducible, tridiagonal and symmetric matrix $A \in \mathbb{R}^{n,n}$ has $n$ real and distinct eigenvalues.
Idea of proof: the eigenvalues are real since $A$ is symmetric. Define for $x \in \mathbb{R}$ the polynomials $p_k(x) := \det(xI - A(1{:}k, 1{:}k))$ for $k = 1, \ldots, n$. One shows by induction on $k$ that the roots $z_1, \ldots, z_{k-1}$ of $p_{k-1}$ separate the roots $y_1, \ldots, y_k$ of $p_k$:
$$y_1 < z_1 < y_2 < z_2 < \cdots < z_{k-1} < y_k.$$
The inertia theorem
We say that two matrices $A, B \in \mathbb{C}^{n,n}$ are congruent if $A = E^H B E$ for some nonsingular matrix $E \in \mathbb{C}^{n,n}$. Let $\pi(A)$, $\zeta(A)$ and $\upsilon(A)$ denote the number of positive, zero and negative eigenvalues of $A$. If $A$ is Hermitian then $\pi(A) + \zeta(A) + \upsilon(A) = n$.
Theorem (Sylvester's Inertia Theorem). If $A, B \in \mathbb{C}^{n,n}$ are Hermitian and congruent, then $\pi(A) = \pi(B)$, $\zeta(A) = \zeta(B)$ and $\upsilon(A) = \upsilon(B)$.
Idea of Proof
Suppose $A = E^H B E$.
- We can find a factorization $D_1 = U_1^H A U_1$, where $D_1$ is a diagonal matrix (why?).
- We can find a factorization $B = U_2 D_2 U_2^H$, where $D_2$ is a diagonal matrix (why?).
- Then $D_1 = U_1^H A U_1 = U_1^H E^H B E U_1 = U_1^H E^H U_2 D_2 U_2^H E U_1 = F^H D_2 F$, where $F = U_2^H E U_1$ is nonsingular (why?).
- Thus it is enough to show the theorem for diagonal matrices (why?).
Symmetric LU-factorization
If $A = LDL^T$ is a symmetric LU-factorization of $A$, then $A$ and $D$ are congruent, so $\pi(A) = \pi(D)$, $\zeta(A) = \zeta(D)$ and $\upsilon(A) = \upsilon(D)$. Example:
$$\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix},$$
so $\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix}$ has one positive and one negative eigenvalue (why?).
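This is easy to check numerically (my sketch, not part of the slides):

L = [1 0; 3 1]; D = diag([1 -5]);
L*D*L'                  % reproduces A = [1 3; 3 4]
sign(eig([1 3; 3 4]))   % one negative, one positive, matching the signs in D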
Corollary
Suppose for some $x \in \mathbb{R}$ that $A - xI$ has a symmetric LU-factorization $A - xI = LDL^T$. Then the number of eigenvalues of $A$ strictly less than $x$ equals the number of negative diagonal entries in $D$. Explain why. (Hint: the eigenvalues of $A - xI$ are $\lambda_i - x$, and by Sylvester's inertia theorem $A - xI$ and $D$ have the same number of negative eigenvalues.)
Counting eigenvalues in an interval
Suppose $A^T = A \in \mathbb{R}^{n,n}$.
- Using for example Gerschgorin's theorem we can find an interval $[a, b)$ containing the eigenvalues of $A$.
- For $x \in [a, b)$ let $\rho(x)$ be the number of negative diagonal entries of $D$ in a symmetric LU-factorization of $A - xI$.
- $\rho(x)$ is the number of eigenvalues of $A$ which are strictly less than $x$.
- $\rho(a) = 0$, $\rho(b) = n$.
- $\rho(e) - \rho(d)$ is the number of eigenvalues in $[d, e)$.
Approximating $\lambda_m$
Let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ and suppose $1 \le m \le n$. We find $\lambda_m$ using interval bisection. Let $c = (a+b)/2$ and $k := \rho(c)$. If $k \ge m$ then $\lambda_m < c$ and $\lambda_m \in [a, c]$, while if $k < m$ then $\lambda_m \ge c$ and $\lambda_m \in [c, b]$. Continuing with the interval containing $\lambda_m$ we generate a sequence $\{[a_j, b_j]\}$ of intervals, each containing $\lambda_m$, with $b_j - a_j = 2^{-j}(b - a)$.
function k=count(c,d,x)
Suppose $A = \mathrm{tridiag}(c, d, c)$ is symmetric and tridiagonal with entries $d_1, \ldots, d_n$ on the diagonal and $c_1, \ldots, c_{n-1}$ on the neighboring diagonals. For given $x$ this function counts the number of eigenvalues of $A$ strictly less than $x$; a sketch is given below.
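The slides do not list the body of count; a minimal sketch (my reconstruction) computes the diagonal $u_j$ of $D$ in the $LDL^T$-factorization of $A - xI$ by the recursion $u_1 = d_1 - x$, $u_j = d_j - x - c_{j-1}^2/u_{j-1}$, and counts the negative entries. The zero-pivot guard is my own addition:

function k = count(c,d,x)
% Count the eigenvalues of A = tridiag(c,d,c) strictly less than x by
% counting negative pivots in the LDL^T-factorization of A - x*I.
n = length(d); k = 0;
u = d(1) - x;
if u < 0, k = 1; end
for j = 2:n
    if u == 0, u = eps; end          % simplistic zero-pivot guard (assumption)
    u = d(j) - x - c(j-1)^2/u;
    if u < 0, k = k + 1; end
end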
function lambda=findeigv(c,d,m)
Suppose $A = \mathrm{tridiag}(c, d, c)$ is symmetric and tridiagonal with entries $d_1, \ldots, d_n$ on the diagonal and $c_1, \ldots, c_{n-1}$ on the neighboring diagonals. We first estimate an interval $[a, b]$ containing all eigenvalues of $A$ and then generate a sequence $\{[a_k, b_k]\}$ of intervals, each containing $\lambda_m$. We iterate until $b_k - a_k \le (b - a)\epsilon_M$, where $\epsilon_M$ is MATLAB's machine epsilon eps; typically $\epsilon_M \approx 2.22 \times 10^{-16}$. A sketch is given below.
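A minimal sketch of findeigv (my reconstruction, combining the Gerschgorin interval with the bisection step from the previous slides):

function lambda = findeigv(c,d,m)
% Approximate the m-th smallest eigenvalue of A = tridiag(c,d,c) by
% interval bisection on the count function.
c = c(:); d = d(:); n = length(d);
r = zeros(n,1);                       % Gerschgorin radii
r(1) = abs(c(1)); r(n) = abs(c(n-1));
r(2:n-1) = abs(c(1:n-2)) + abs(c(2:n-1));
a = min(d - r); b = max(d + r);       % all eigenvalues lie in [a,b]
tol = (b - a)*eps;
while b - a > tol
    x = (a + b)/2;
    if count(c,d,x) >= m
        b = x;                        % lambda_m < x
    else
        a = x;                        % lambda_m >= x
    end
end
lambda = (a + b)/2;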
Example
Given $T := \mathrm{tridiag}(-1, 2, -1)$ of size $100$, estimate $l_5 \approx \lambda_5$.
- Using findeigv we find $l_5 = 0.024139120518486$.
- Using MATLAB's eig we find $\mu_5 = 0.024139120518486$.
- Which is most accurate? The exact value is $\lambda_5 = 4\sin^2(5\pi/202) = 0.024139120518487$.
- $|\mu_5 - \lambda_5| = 8.6 \times 10^{-16}$: not bad!
- $|l_5 - \lambda_5| = 3.4 \times 10^{-16}$: better!
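The experiment can be reproduced with a few lines of MATLAB (my sketch, using the findeigv routine and the exact eigenvalue formula above):

n = 100; m = 5;
c = -ones(n-1,1); d = 2*ones(n,1);
l5 = findeigv(c,d,m);                    % bisection estimate
T = diag(d) + diag(c,1) + diag(c,-1);
ev = sort(eig(T)); mu5 = ev(m);          % MATLAB's eig
lam5 = 4*sin(m*pi/(2*(n+1)))^2;          % exact eigenvalue
[abs(l5 - lam5), abs(mu5 - lam5)]        % errors of the two estimates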