Computing Eigenvalues and/or Eigenvectors; Part 1, Generalities and symmetric matrices


Tom Lyche, Centre of Mathematics for Applications, Department of Informatics, University of Oslo. November 9, 2008.

Today

Given a matrix $A \in \mathbb{C}^{n\times n}$:
- Finding the eigenvalues using the characteristic polynomial?
- Perturbation theory
- Reduction to Hessenberg form
- Sylvester's inertia theorem
- Find one or more selected eigenvalues of a symmetric, tridiagonal matrix
- Find one or more selected eigenvectors (next time)
- Find all eigenvalues and eigenvectors (next time)

Eigenvalues and Characteristic Polynomial

The eigenvalues of $A \in \mathbb{C}^{n\times n}$ are the $n$ roots of the characteristic polynomial $\pi_A(\lambda) := \det(A - \lambda I)$. $\pi_A(\lambda)$ is of exact degree $n$. Except for some special matrices, the eigenvalues must be found numerically.

Characteristic Polynomial

Possible method: compute the characteristic polynomial $\pi_A(\lambda)$ and apply a numerical method like Newton's method to find one or more of its roots. This is not suitable as an all-purpose method. Reason: a small change in one of the coefficients of $\pi_A(\lambda)$ can lead to a large change in the roots of the polynomial.

Example: $\pi_A(\lambda) = \lambda^{16}$ and $q(\lambda) = \lambda^{16} - 10^{-16}$. The roots of $\pi_A$ are all equal to zero, while the roots of $q$ are $\lambda_j = 10^{-1}e^{2\pi i j/16}$, $j = 1, \dots, 16$, which all have absolute value $0.1$. Computed roots can therefore be very inaccurate. We need to work directly with the matrix.
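
As a quick illustration (a minimal Matlab sketch, added here; it is not part of the original slides), we can form the coefficient vector of $q$ and ask Matlab's built-in roots for its zeros. The tiny constant term $-10^{-16}$ moves all sixteen roots out to modulus $0.1$:

    % Sensitivity of polynomial roots to a coefficient perturbation:
    % pi_A(lambda) = lambda^16 has all roots at 0, while
    % q(lambda) = lambda^16 - 1e-16 has roots of modulus 0.1.
    p = [1, zeros(1,16)];        % coefficients of lambda^16
    q = p; q(end) = -1e-16;      % perturb the constant term
    abs(roots(q))                % all sixteen entries are (close to) 0.1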

Gerschgorin's circle theorem

Where are the eigenvalues?

Theorem. Suppose $A \in \mathbb{C}^{n\times n}$. Define for $i, j = 1, 2, \dots, n$
$$R_i = \{z \in \mathbb{C} : |z - a_{ii}| \le r_i\}, \quad r_i := \sum_{j \ne i} |a_{ij}|,$$
$$C_j = \{z \in \mathbb{C} : |z - a_{jj}| \le c_j\}, \quad c_j := \sum_{i \ne j} |a_{ij}|.$$
Then any eigenvalue of $A$ lies in $\mathcal{R} \cap \mathcal{C}$, where $\mathcal{R} = R_1 \cup R_2 \cup \dots \cup R_n$ and $\mathcal{C} = C_1 \cup C_2 \cup \dots \cup C_n$. If $A^H = A$ then $C_i = R_i = [a_{ii} - r_i, a_{ii} + r_i]$.

Examples

Locate the eigenvalues $\lambda$ of the symmetric matrix
$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$
Here $R_1 = R_2 = [2-1, 2+1] = [1, 3] = \mathcal{R}$, so $\lambda \in [1, 3]$. In fact $\lambda_1 = 3$ and $\lambda_2 = 1$, so in this case $[1,3]$ is the smallest interval possible.

Let $T = \operatorname{tridiag}(-1, 2, -1) \in \mathbb{R}^{m\times m}$ be the second derivative matrix. Then $R_1 = R_m = [1, 3]$ and $R_i = [0, 4]$ for $i = 2, 3, \dots, m-1$, so $\mathcal{R} = [0, 4]$. The exact eigenvalues are
$$\lambda_j = 4\sin^2\Big(\frac{j\pi}{2(m+1)}\Big), \quad j = 1, 2, \dots, m,$$
so $\lambda_j \in [\delta, 4-\delta]$ with $\delta = 4\sin^2\big(\frac{\pi}{2(m+1)}\big) > 0$. Gerschgorin's theorem gives a remarkably good estimate.
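
A short Matlab sketch (an addition, not from the slides) that computes the Gerschgorin row intervals $[a_{ii} - r_i, a_{ii} + r_i]$ for a real symmetric matrix; applied to $\operatorname{tridiag}(-1,2,-1)$ it reproduces the bound $[0,4]$:

    % Gerschgorin row intervals for a real symmetric matrix A
    m = 6;
    A = diag(2*ones(m,1)) + diag(-ones(m-1,1),1) + diag(-ones(m-1,1),-1);
    r = sum(abs(A),2) - abs(diag(A));    % radii r_i = sum_{j~=i} |a_ij|
    lo = diag(A) - r; hi = diag(A) + r;  % interval endpoints per row
    [min(lo), max(hi)]                   % union of the intervals: [0, 4]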

Proof of Gerschgorin's theorem

Proof. Suppose $(\lambda, x)$ is an eigenpair for $A$. We claim that $\lambda \in R_i$, where $i$ is such that $|x_i| = \|x\|_\infty$. Indeed, $Ax = \lambda x$ implies $\sum_j a_{ij}x_j = \lambda x_i$, or $(\lambda - a_{ii})x_i = \sum_{j\ne i} a_{ij}x_j$. Dividing by $x_i$ and taking absolute values, we find
$$|\lambda - a_{ii}| = \Big|\sum_{j\ne i} a_{ij}x_j/x_i\Big| \le \sum_{j\ne i} |a_{ij}|\,|x_j|/|x_i| \le r_i,$$
since $|x_j| \le |x_i|$ for all $j$. Thus $\lambda \in R_i$. Since $\lambda$ is also an eigenvalue of $A^T$, it must lie in one of the row disks of $A^T$. But these are the column disks $C_j$ of $A$. Hence $\lambda \in C_j$ for some $j$.

Disjoint disks

Sometimes some of the Gerschgorin disks are disjoint from the others, and we have:

Corollary. If $p$ of the Gerschgorin row disks are disjoint from the others, the union of these $p$ disks contains precisely $p$ eigenvalues. The same result holds for the column disks.

Example:
$$A = \begin{bmatrix} 1 & \epsilon_1 & \epsilon_2 \\ \epsilon_3 & 2 & \epsilon_4 \\ \epsilon_5 & \epsilon_6 & 3 \end{bmatrix}, \quad |\epsilon_j| \le 10^{-10},$$
gives $|\lambda_j - j| \le 2 \cdot 10^{-10}$ for $j = 1, 2, 3$.
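
A quick Matlab check of this example (an added illustration; the $\epsilon_j$ are taken equal for simplicity):

    % Three disjoint Gerschgorin disks centered at 1, 2, 3
    e = 1e-10;
    A = [1 e e; e 2 e; e e 3];
    sort(eig(A)) - [1; 2; 3]     % each entry is of size at most 2e-10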

Perturbation Analysis

Recall for linear systems: if $Ax = b$ and $Ay = b + e$, then
$$\frac{\|y - x\|_p}{\|x\|_p} \le K_p(A)\,\frac{\|e\|_p}{\|b\|_p}, \quad \text{where } K_p(A) := \|A\|_p\|A^{-1}\|_p.$$
This means that the relative error $\|y - x\|_p/\|x\|_p$ in $y$ as an approximation to $x$ can possibly be $K_p(A)$ times as large as the relative error $\|e\|_p/\|b\|_p$ in the right-hand side $b$.

Consider now the eigenvalue problem. We restrict the discussion to nondefective matrices.

Absolute errors

Theorem. Suppose $A \in \mathbb{C}^{n\times n}$ has linearly independent eigenvectors $\{x_1, \dots, x_n\}$ and let $X = [x_1, \dots, x_n]$ be the eigenvector matrix. If $(\mu, x)$ is an eigenpair for $A + E$, then we can find an eigenvalue $\lambda$ of $A$ such that
$$|\lambda - \mu| \le K_p(X)\|E\|_p, \quad 1 \le p \le \infty. \tag{1}$$
If $A$ is symmetric then
$$|\lambda - \mu| \le \|E\|_2. \tag{2}$$
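
For a symmetric matrix, bound (2) is easy to observe numerically. A small Matlab sketch (an added illustration): perturb a random symmetric $A$ by a small symmetric $E$ and compare sorted eigenvalues; each eigenvalue moves by at most $\|E\|_2$:

    % Eigenvalues of a symmetric matrix are perfectly conditioned:
    n = 50; rng(1);                  % fix the seed for reproducibility
    A = randn(n); A = (A + A')/2;    % random symmetric matrix
    E = 1e-8*randn(n); E = (E + E')/2;
    err = max(abs(sort(eig(A+E)) - sort(eig(A))));
    [err, norm(E)]                   % err <= norm(E) = ||E||_2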

Two observations

It is difficult, or sometimes impossible, to compute accurate eigenvalues and eigenvectors of a matrix whose eigenvectors are linearly dependent or almost linearly dependent.

The eigenvalue problem for symmetric matrices is well conditioned.

Upper Hessenberg form

Before attempting to find eigenvalues and eigenvectors of a matrix (exceptions are made for certain sparse matrices), it is often advantageous to reduce it by similarity transformations to a simpler form. Orthogonal similarity transformations are particularly important since they are insensitive to noise in the elements of the matrix. Recall that a matrix $A \in \mathbb{R}^{n\times n}$ is upper Hessenberg if $a_{ij} = 0$ for $j = 1, 2, \dots, i-2$, $i = 3, 4, \dots, n$. For $n = 6$ the nonzero pattern is
$$\begin{bmatrix} x & x & x & x & x & x \\ x & x & x & x & x & x \\ 0 & x & x & x & x & x \\ 0 & 0 & x & x & x & x \\ 0 & 0 & 0 & x & x & x \\ 0 & 0 & 0 & 0 & x & x \end{bmatrix}.$$

A matrix $H \in \mathbb{R}^{n\times n}$ of the form $H := I - uu^T$, where $u \in \mathbb{R}^n$ and $u^Tu = 2$, is called a Householder transformation.
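
A Householder transformation is both symmetric and orthogonal (a one-line check, added here for completeness):
$$H^TH = (I - uu^T)^2 = I - 2uu^T + u(u^Tu)u^T = I - 2uu^T + 2uu^T = I,$$
using $u^Tu = 2$. Hence $H^{-1} = H^T = H$.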

Zero out entries in a vector x

Find a Householder transformation $H := I - uu^T$ such that $Hx = \alpha e_1$. Choose
$$\alpha := \begin{cases} -\|x\|_2 & \text{if } x_1 > 0, \\ +\|x\|_2 & \text{otherwise,} \end{cases} \qquad u := \begin{cases} \dfrac{x/\alpha - e_1}{\sqrt{1 - x_1/\alpha}} & \text{if } x \ne 0, \\ \sqrt{2}\,e_1 & \text{otherwise,} \end{cases}$$
with $H = \operatorname{diag}(-1, 1, \dots, 1)$ if $\alpha = 0$. Now assume $\alpha \ne 0$. Since $x^Tx = \alpha^2$,
$$u^Tu = \frac{(x/\alpha - e_1)^T(x/\alpha - e_1)}{1 - x_1/\alpha} = \frac{\alpha^2/\alpha^2 - 2x_1/\alpha + 1}{1 - x_1/\alpha} = 2,$$
$$u^Tx = \frac{(x/\alpha - e_1)^Tx}{\sqrt{1 - x_1/\alpha}} = \frac{x^Tx/\alpha - x_1}{\sqrt{1 - x_1/\alpha}} = \frac{\alpha - x_1}{\sqrt{1 - x_1/\alpha}} = \alpha\sqrt{1 - x_1/\alpha},$$
$$Hx = x - (u^Tx)u = x - \alpha(x/\alpha - e_1) = \alpha e_1.$$

Computing u

$$\alpha = \pm\|x\|_2 \ (\text{sign opposite to } x_1), \qquad u := \begin{cases} \dfrac{x/\alpha - e_1}{\sqrt{1 - x_1/\alpha}} & \text{if } x \ne 0, \\ \sqrt{2}\,e_1 & \text{otherwise.} \end{cases}$$
In practice: if $\alpha = 0$ then $u = \sqrt{2}\,e_1$; exit. Otherwise set $v = x/\alpha - e_1$ and $u = v/\sqrt{-v(1)}$, since $-v(1) = 1 - x_1/\alpha$.

Recall Algorithm housegen

Given $x \in \mathbb{R}^n$, the following algorithm computes $a = \alpha$ and the vector $u$ so that $(I - uu^T)x = \alpha e_1$.

    function [u,a]=housegen(x)
    a=norm(x); u=x;
    if a==0
        u(1)=sqrt(2); return;   % x = 0: take u = sqrt(2)*e_1
    end
    if u(1)>0
        a=-a;                   % choose the sign of alpha opposite to x_1
    end
    u=u/a;                      % u = x/alpha
    u(1)=u(1)-1;                % u = v = x/alpha - e_1
    u=u/sqrt(-u(1));            % normalize so that u'*u = 2
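
A quick usage check (added; not from the slides):

    % Verify that housegen zeros out all but the first entry of x
    x = [3; 4; 0; 12];
    [u,a] = housegen(x);
    (eye(4) - u*u')*x        % equals a*e_1 = [-13; 0; 0; 0]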

Reduction to upper Hessenberg form

We define $A_1 = A$. Suppose for $k \ge 1$ that $A_k$ is upper Hessenberg in its first $k-1$ columns, and partition
$$A_k = \begin{bmatrix} B_k & C_k \\ D_k & E_k \end{bmatrix},$$
where $B_k \in \mathbb{R}^{k\times k}$ is upper Hessenberg and $D_k = [0, 0, \dots, 0, d_k] \in \mathbb{R}^{(n-k)\times k}$. Let
$$H_k = \begin{bmatrix} I & 0 \\ 0 & V_k \end{bmatrix},$$
where $V_k = I - v_kv_k^T \in \mathbb{R}^{(n-k)\times(n-k)}$ is a Householder transformation such that $V_kd_k = \alpha_ke_1$, with $\alpha_k^2 = d_k^Td_k$.

Reduction 2

$$A_{k+1} = H_kA_kH_k = \begin{bmatrix} I_k & 0 \\ 0 & V_k \end{bmatrix}\begin{bmatrix} B_k & C_k \\ D_k & E_k \end{bmatrix}\begin{bmatrix} I_k & 0 \\ 0 & V_k \end{bmatrix} = \begin{bmatrix} B_k & C_kV_k \\ V_kD_k & V_kE_kV_k \end{bmatrix}.$$
Now $V_kD_k = [V_k0, \dots, V_k0, V_kd_k] = [0, \dots, 0, \alpha_ke_1]$, and the matrix $B_k$ is not affected by the $H_k$ transformation. Therefore $A_{k+1}$ is upper Hessenberg in its first $k$ columns, and the reduction is carried one step further. The reduction stops with $A_{n-1}$, which is upper Hessenberg.

An algorithm using about $10n^3/3$ flops

    function [L,B] = hesshousegen(A)
    n=length(A); L=zeros(n,n); B=A;
    for k=1:n-2
        % Householder vector zeroing out B(k+2:n,k)
        [v,B(k+1,k)]=housegen(B(k+1:n,k));
        L(k+1:n,k)=v;                       % store v under the diagonal
        B(k+2:n,k)=zeros(n-k-1,1);
        % apply V_k = I - v*v' from the left and from the right
        C=B(k+1:n,k+1:n); B(k+1:n,k+1:n)=C-v*(v'*C);
        C=B(1:n,k+1:n);   B(1:n,k+1:n)=C-(C*v)*v';
    end

This algorithm uses Householder similarity transformations to reduce a matrix $A \in \mathbb{R}^{n\times n}$ to upper Hessenberg form $B$. Details of the transformations are stored under the diagonal in a matrix $L$. The entries of $L$ can be used to assemble an orthogonal matrix $Q$ such that $B = Q^TAQ$. Algorithm housegen is used in each step of the reduction.
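
A sketch of how $L$ can be used to accumulate $Q$ (an added illustration; the loop applies the Householder factors $H_1H_2\cdots H_{n-2}$ to the identity by right-multiplication):

    n = 8; A = randn(n);
    [L,B] = hesshousegen(A);
    Q = eye(n);
    for k = 1:n-2
        v = L(k+1:n,k);                                % stored Householder vector
        Q(:,k+1:n) = Q(:,k+1:n) - (Q(:,k+1:n)*v)*v';   % Q <- Q*H_k
    end
    norm(Q'*A*Q - B)              % small: B = Q'*A*Q up to roundoff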

The symmetric case

If $A_1 = A$ is symmetric, the matrix $A_{n-1}$ will also be symmetric, since $A_k^T = A_k$ implies
$$A_{k+1}^T = (H_kA_kH_k)^T = H_kA_k^TH_k = A_{k+1}.$$
Since $A_{n-1}$ is upper Hessenberg and symmetric, it must be tridiagonal. Thus the algorithm above reduces $A$ to symmetric tridiagonal form if $A$ is symmetric.

Symmetric tridiagonal matrices

Let $A = A^T \in \mathbb{R}^{n\times n}$ have eigenvalues $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$. Using Householder similarity transformations we can assume that $A$ is symmetric and tridiagonal:
$$A = \begin{bmatrix} d_1 & c_1 & & & \\ c_1 & d_2 & c_2 & & \\ & \ddots & \ddots & \ddots & \\ & & c_{n-2} & d_{n-1} & c_{n-1} \\ & & & c_{n-1} & d_n \end{bmatrix}. \tag{3}$$

Split tridiagonal A into irreducible components

Recall that $A$ is reducible if $c_i = 0$ for at least one $i$. Example: suppose $n = 4$ and $c_2 = 0$. Then
$$A = \begin{bmatrix} d_1 & c_1 & 0 & 0 \\ c_1 & d_2 & 0 & 0 \\ 0 & 0 & d_3 & c_3 \\ 0 & 0 & c_3 & d_4 \end{bmatrix} = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}.$$
The eigenvalues of $A$ are the union of the eigenvalues of $A_1$ and $A_2$. Thus if $A$ is reducible, the eigenvalue problem can be split into smaller irreducible problems, as sketched below. So assume that $A$ is irreducible.

Theorem: An irreducible, symmetric, tridiagonal matrix has distinct eigenvalues.
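
A minimal Matlab sketch of the splitting (added; the index bookkeeping is an assumption of this sketch):

    % Split tridiag(c,d,c) at the zero off-diagonal entries.
    % Each block tridiag(c(i:j-1), d(i:j), c(i:j-1)) is irreducible.
    splits = [0; find(c(:) == 0); length(d)];
    for b = 1:length(splits)-1
        i = splits(b)+1; j = splits(b+1);
        % ... solve the eigenvalue problem for the block d(i:j), c(i:j-1)
    end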

The inertia theorem

We say that two matrices $A, B \in \mathbb{C}^{n\times n}$ are congruent if $A = E^HBE$ for some nonsingular matrix $E \in \mathbb{C}^{n\times n}$. Let $\pi(A)$, $\zeta(A)$ and $\nu(A)$ denote the number of positive, zero and negative eigenvalues of $A$. If $A$ is Hermitian then $\pi(A) + \zeta(A) + \nu(A) = n$.

Theorem (Sylvester's Inertia Theorem). If $A, B \in \mathbb{C}^{n\times n}$ are Hermitian and congruent, then $\pi(A) = \pi(B)$, $\zeta(A) = \zeta(B)$ and $\nu(A) = \nu(B)$.

LDL^T factorization

If $A = LDL^T$ is an LDL^T factorization of $A$ ($L$ unit lower triangular, $D$ diagonal), then $A$ and $D$ are congruent, so $\pi(A) = \pi(D)$, $\zeta(A) = \zeta(D)$ and $\nu(A) = \nu(D)$. Example:
$$\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix},$$
so $A$ has one positive and one negative eigenvalue.
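
The 2x2 example can be checked in a couple of Matlab lines (an added check; the elimination is written out directly rather than calling a library routine):

    % Inertia from LDL^T: the 2x2 example above, without pivoting
    A = [1 3; 3 4];
    d2 = A(2,2) - A(2,1)^2/A(1,1);    % Schur complement: 4 - 9 = -5
    D = [A(1,1), d2]                  % diagonal of D: [1, -5]
    [sum(D < 0), sum(eig(A) < 0)]     % both 1: one negative eigenvalue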

Corollary

Suppose for some $x \in \mathbb{R}$ that $A - xI$ has an LDL^T factorization $A - xI = LDL^T$. Then the number of eigenvalues of $A$ strictly less than $x$ equals the number of negative diagonal entries in $D$.

Indeed, $\nu(A - xI) = \nu(D)$ by Sylvester's theorem, and if $Az = \lambda z$ then $(A - xI)z = (\lambda - x)z$, so $\nu(A - xI)$ equals the number of eigenvalues of $A$ which are less than $x$.

Counting eigenvalues in an interval

Suppose $A = A^T \in \mathbb{R}^{n\times n}$. Using, for example, Gerschgorin's theorem, we can find an interval $[a, b)$ containing the eigenvalues of $A$. For $x \in [a, b)$, let $\rho(x)$ be the number of negative diagonal entries in $D$ in an LDL^T factorization of $A - xI$. Then $\rho(x)$ is the number of eigenvalues of $A$ which are strictly less than $x$. In particular $\rho(a) = 0$, $\rho(b) = n$, and $\rho(e) - \rho(d)$ is the number of eigenvalues in $[d, e)$.

Approximating λ_m

Let $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ and suppose $1 \le m \le n$. We find $\lambda_m$ using interval bisection. Let $c = (a + b)/2$ and $k := \rho(c)$. If $k \ge m$ then $\lambda_m < c$ and $\lambda_m \in [a, c]$, while if $k < m$ then $\lambda_m \ge c$ and $\lambda_m \in [c, b]$. Continuing with the interval containing $\lambda_m$, we generate a sequence $\{[a_j, b_j]\}$ of intervals, each containing $\lambda_m$, with $b_j - a_j = 2^{-j}(b - a)$.

Fixing a possible failure

The method will fail if one of the diagonal entries $d_k(x)$ of $D$ in the LDL^T factorization of $A - xI$ is zero or very close to zero. We replace such an entry by a suitable small number, say $\delta_k = \pm|c_k|\epsilon_M$, where the negative sign is used if $c_k < 0$, and $\epsilon_M$ is the machine epsilon, typically $\epsilon_M \approx 2 \cdot 10^{-16}$ in Matlab. This replacement is done whenever $|d_k(x)| < |\delta_k|$.

function k=count(c,d,x)

Suppose $A = \operatorname{tridiag}(c, d, c)$ is symmetric and tridiagonal with entries $d_1, \dots, d_n$ on the diagonal and $c_1, \dots, c_{n-1}$ on the neighboring diagonals. For given $x$, this function counts the number $k$ of eigenvalues of $A$ strictly less than $x$. We use the replacement described above if one of the $d_j(x)$ is close to zero.
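
The slides do not include the body of count; here is a minimal Matlab sketch based on the LDL^T recursion for a tridiagonal matrix, $d_1(x) = d_1 - x$ and $d_j(x) = d_j - x - c_{j-1}^2/d_{j-1}(x)$, with the replacement rule above (the exact sign convention for $\delta$ is an assumption of this sketch):

    function k = count(c,d,x)
    % Number of eigenvalues of tridiag(c,d,c) strictly less than x.
    n = length(d); k = 0;
    u = d(1) - x;                     % d_1(x)
    for j = 2:n
        delta = abs(c(j-1))*eps;      % small threshold, as described above
        if abs(u) < delta             % pivot (nearly) zero: replace it
            u = -delta;               % assumed sign choice
        end
        if u < 0, k = k + 1; end
        u = d(j) - x - c(j-1)^2/u;    % d_j(x)
    end
    if u < 0, k = k + 1; end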

function lambda=findeigv(c,d,m)

Suppose $A = \operatorname{tridiag}(c, d, c)$ is symmetric and tridiagonal with entries $d_1, \dots, d_n$ on the diagonal and $c_1, \dots, c_{n-1}$ on the neighboring diagonals. We first estimate an interval $[a, b]$ containing all eigenvalues of $A$ and then generate a sequence $\{[a_k, b_k]\}$ of intervals, each containing $\lambda_m$. We iterate until $b_k - a_k \le (b - a)\epsilon_M$, where $\epsilon_M$ is Matlab's machine epsilon eps, typically $\epsilon_M \approx 2.22 \cdot 10^{-16}$.
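
Again the body is not in the slides; below is a minimal sketch combining Gerschgorin's theorem with bisection on the count function above (added; the interval handling is an assumption of this sketch):

    function lambda = findeigv(c,d,m)
    % Approximate the m-th smallest eigenvalue of tridiag(c,d,c).
    c = c(:); d = d(:);
    r = [abs(c); 0] + [0; abs(c)];    % Gerschgorin radii for a tridiagonal
    a = min(d - r); b = max(d + r);   % [a,b] contains all eigenvalues
    tol = (b - a)*eps;
    while b - a > tol
        x = (a + b)/2;
        if count(c,d,x) >= m
            b = x;                    % lambda_m < x: keep the left half
        else
            a = x;                    % lambda_m >= x: keep the right half
        end
    end
    lambda = (a + b)/2;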

Example

Given $T := \operatorname{tridiag}(-1, 2, -1)$ of size $100$, estimate $l_5 \approx \lambda_5$. Using findeigv we find $l_5 = 0.024139120518486$. Using Matlab's eig we find $\mu_5 = 0.024139120518486$. Which is more accurate? The exact value is $\lambda_5 = 4\sin^2(5\pi/202) = 0.024139120518487$, and
$$|\mu_5 - \lambda_5| = 8.6 \cdot 10^{-16} \ \text{(not bad!)}, \qquad |l_5 - \lambda_5| = 3.4 \cdot 10^{-16} \ \text{(better!)}.$$
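
The example can be reproduced with the sketches above (added; the exact digits will depend on the implementation details assumed in count and findeigv):

    n = 100;
    c = -ones(n-1,1); d = 2*ones(n,1);
    l5 = findeigv(c,d,5);
    exact = 4*sin(5*pi/(2*(n+1)))^2;
    abs(l5 - exact)              % of the order of 1e-16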