The Eigenvalue Problem: Perturbation Theory

Jim Lambers
MAT 610 Summer Session 2009-10
Lecture 13 Notes

These notes correspond to Sections 7.2 and 8.1 in the text.

The Unsymmetric Eigenvalue Problem

Just as the problem of solving a system of linear equations $Ax = b$ can be sensitive to perturbations in the data, the problem of computing the eigenvalues of a matrix can also be sensitive to perturbations in the matrix. We will now obtain some results concerning the extent of this sensitivity.

Suppose that $A$ is obtained by perturbing a diagonal matrix $D$ by a matrix $F$ whose diagonal entries are zero; that is, $A = D + F$. If $\lambda$ is an eigenvalue of $A$ with corresponding eigenvector $x$, then we have

$$(D - \lambda I)x + Fx = 0.$$

If $\lambda$ is not equal to any of the diagonal entries of $A$, then $D - \lambda I$ is nonsingular, and we have

$$x = -(D - \lambda I)^{-1}Fx.$$

Taking $\infty$-norms of both sides, we obtain

$$\|x\|_\infty = \|(D - \lambda I)^{-1}Fx\|_\infty \le \|(D - \lambda I)^{-1}F\|_\infty \|x\|_\infty,$$

which yields

$$\|(D - \lambda I)^{-1}F\|_\infty = \max_{1 \le i \le n} \sum_{j=1, j \ne i}^n \frac{|f_{ij}|}{|d_{ii} - \lambda|} \ge 1.$$

It follows that for some $i$, $1 \le i \le n$, $\lambda$ satisfies

$$|d_{ii} - \lambda| \le \sum_{j=1, j \ne i}^n |f_{ij}|.$$

That is, $\lambda$ lies within one of the Gerschgorin circles in the complex plane, which has center $a_{ii}$ and radius

$$r_i = \sum_{j=1, j \ne i}^n |a_{ij}|.$$

This result is known as the Gerschgorin Circle Theorem.

Example The eigenvalues of the matrix

$$A = \begin{bmatrix} -5 & 1 & -1 \\ 2 & 2 & 1 \\ -1 & -3 & 7 \end{bmatrix}$$

are

$$\lambda(A) = \{6.4971, 2.7930, -5.2902\}.$$

The Gerschgorin disks are

$$D_1 = \{z \in \mathbb{C} : |z - 7| \le 4\}, \quad D_2 = \{z \in \mathbb{C} : |z - 2| \le 3\}, \quad D_3 = \{z \in \mathbb{C} : |z + 5| \le 2\}.$$

We see that each disk contains one eigenvalue. It is important to note that while there are $n$ eigenvalues and $n$ Gerschgorin disks, it is not necessarily true that each disk contains an eigenvalue. The Gerschgorin Circle Theorem only states that all of the eigenvalues are contained within the union of the disks.
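As a quick numerical check, the following sketch (a minimal NumPy illustration; the helper name gerschgorin_disks is ours, not from the text) computes the disk centers and radii for the matrix in the example and verifies that each eigenvalue lies in at least one disk.

```python
import numpy as np

def gerschgorin_disks(A):
    """Return the centers and radii of the Gerschgorin disks of A."""
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)  # off-diagonal row sums
    return centers, radii

# The matrix from the example above.
A = np.array([[-5.,  1., -1.],
              [ 2.,  2.,  1.],
              [-1., -3.,  7.]])

centers, radii = gerschgorin_disks(A)

# Every eigenvalue must lie in the union of the disks |z - c_i| <= r_i.
for lam in np.linalg.eigvals(A):
    disks = np.flatnonzero(np.abs(lam - centers) <= radii)
    print(f"lambda = {lam: .4f} lies in disk(s) {disks}")
```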

Another useful sensitivity result that applies to diagonalizable matrices is the Bauer-Fike Theorem, which states that if

$$X^{-1}AX = \operatorname{diag}(\lambda_1, \ldots, \lambda_n),$$

and $\mu$ is an eigenvalue of a perturbed matrix $A + E$, then

$$\min_{\lambda \in \lambda(A)} |\lambda - \mu| \le \kappa_p(X)\|E\|_p.$$

That is, $\mu$ is within $\kappa_p(X)\|E\|_p$ of an eigenvalue of $A$. It follows that if $A$ is nearly non-diagonalizable, which can be the case if its eigenvectors are nearly linearly dependent, then a small perturbation in $A$ could still cause a large change in the eigenvalues.
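The bound is easy to test numerically. The following sketch (a minimal NumPy illustration; the random test matrix and perturbation size are our own choices) checks the theorem for $p = 2$, where $\kappa_2(X)$ is given by numpy.linalg.cond.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# A random (almost surely diagonalizable) matrix and a small perturbation.
A = rng.standard_normal((n, n))
E = 1e-6 * rng.standard_normal((n, n))

# Eigendecomposition A = X diag(lambda_1, ..., lambda_n) X^{-1}.
lam, X = np.linalg.eig(A)

# Bauer-Fike: every eigenvalue of A + E lies within kappa_2(X) * ||E||_2
# of some eigenvalue of A.
bound = np.linalg.cond(X) * np.linalg.norm(E, 2)
for mu in np.linalg.eigvals(A + E):
    dist = np.min(np.abs(lam - mu))
    print(f"min |lambda - mu| = {dist:.3e} <= bound = {bound:.3e}")
```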

It would be desirable to have a concrete measure of the sensitivity of an eigenvalue, just as we have the condition number that measures the sensitivity of a system of linear equations. To that end, we assume that $\lambda$ is a simple eigenvalue of an $n \times n$ matrix $A$ that has Jordan canonical form $J = X^{-1}AX$. Then, $\lambda = J_{ii}$ for some $i$, and $x_i$, the $i$th column of $X$, is a corresponding right eigenvector. If we define $Y = X^{-H} = (X^{-1})^H$, then $y_i$, the $i$th column of $Y$, is a left eigenvector of $A$ corresponding to $\lambda$. From $Y^HX = I$, it follows that $y^Hx = 1$.

We now let $A$, $\lambda$, and $x$ be functions of a parameter $\varepsilon$ that satisfy

$$A(\varepsilon)x(\varepsilon) = \lambda(\varepsilon)x(\varepsilon), \quad A(\varepsilon) = A + \varepsilon F, \quad \|F\|_2 = 1.$$

Differentiating with respect to $\varepsilon$, and evaluating at $\varepsilon = 0$, yields

$$Fx + Ax'(0) = \lambda x'(0) + \lambda'(0)x.$$

Taking the inner product of both sides with $y$ yields

$$y^HFx + y^HAx'(0) = \lambda y^Hx'(0) + \lambda'(0)y^Hx.$$

Because $y$ is a left eigenvector corresponding to $\lambda$, and $y^Hx = 1$, we have

$$y^HFx + \lambda y^Hx'(0) = \lambda y^Hx'(0) + \lambda'(0).$$

We conclude that

$$|\lambda'(0)| = |y^HFx| \le \|y\|_2\|F\|_2\|x\|_2 = \|y\|_2\|x\|_2.$$

However, because $\theta$, the angle between $x$ and $y$, is given by

$$\cos\theta = \frac{y^Hx}{\|y\|_2\|x\|_2} = \frac{1}{\|y\|_2\|x\|_2},$$

it follows that

$$|\lambda'(0)| \le \frac{1}{|\cos\theta|}.$$

We define $1/|\cos\theta|$ to be the condition number of the simple eigenvalue $\lambda$. We require $\lambda$ to be simple because otherwise, the angle between the left and right eigenvectors is not unique, because the eigenvectors themselves are not unique.

It should be noted that the condition number can also be defined by $1/|y^Hx|$, where $x$ and $y$ are normalized so that $\|x\|_2 = \|y\|_2 = 1$; either way, the condition number is equal to $1/|\cos\theta|$. The interpretation of the condition number is that an $O(\varepsilon)$ perturbation in $A$ can cause an $O(\varepsilon/|\cos\theta|)$ perturbation in the eigenvalue $\lambda$. Therefore, if $x$ and $y$ are nearly orthogonal, a large change in the eigenvalue can occur. Furthermore, if the condition number is large, then $A$ is close to a matrix with a multiple eigenvalue.

Example The matrix

$$A = \begin{bmatrix} 3.1482 & -0.2017 & -0.5363 \\ 0.4196 & 0.5171 & 1.0888 \\ 0.3658 & -1.7169 & 3.3361 \end{bmatrix}$$

has a simple eigenvalue $\lambda = 1.9833$ with right and left eigenvectors

$$x = \begin{bmatrix} 0.4150 & 0.6160 & 0.6696 \end{bmatrix}^T, \quad y = \begin{bmatrix} -7.9435 & 83.0701 & -70.0066 \end{bmatrix}^T,$$

such that $y^Hx = 1$. It follows that the condition number of this eigenvalue is $\|x\|_2\|y\|_2 = 108.925$. In fact, the nearby matrix

$$B = \begin{bmatrix} 3.1477 & -0.2023 & -0.5366 \\ 0.4187 & 0.5169 & 1.0883 \\ 0.3654 & -1.7176 & 3.3354 \end{bmatrix}$$

has a double eigenvalue that is equal to 2.
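This computation is easy to reproduce. The sketch below (a minimal illustration using SciPy, whose eig routine can also return left eigenvectors; since it normalizes each eigenvector to unit 2-norm, the condition number is simply $1/|y^Hx|$) should report a condition number of roughly 108.925 for the eigenvalue near 1.9833.

```python
import numpy as np
from scipy.linalg import eig

# The matrix from the example above.
A = np.array([[3.1482, -0.2017, -0.5363],
              [0.4196,  0.5171,  1.0888],
              [0.3658, -1.7169,  3.3361]])

# w: eigenvalues; vl, vr: left and right eigenvectors (unit 2-norm columns).
w, vl, vr = eig(A, left=True, right=True)

for i, lam in enumerate(w):
    s = np.abs(vl[:, i].conj() @ vr[:, i])  # |cos(theta)| for this eigenvalue
    print(f"lambda = {lam: .4f}, condition number = {1.0 / s:.3f}")
```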

We now consider the sensitivity of repeated eigenvalues. First, it is important to note that while the eigenvalues of a matrix $A$ are continuous functions of the entries of $A$, they are not necessarily differentiable functions of the entries. To see this, we consider the matrix

$$A = \begin{bmatrix} 1 & a \\ \varepsilon & 1 \end{bmatrix},$$

where $a > 0$. Computing its characteristic polynomial

$$\det(A - \lambda I) = \lambda^2 - 2\lambda + 1 - a\varepsilon$$

and computing its roots yields the eigenvalues

$$\lambda = 1 \pm \sqrt{a\varepsilon}.$$

Differentiating these eigenvalues with respect to $\varepsilon$ yields

$$\frac{d\lambda}{d\varepsilon} = \pm\frac{1}{2}\sqrt{\frac{a}{\varepsilon}},$$

which is undefined at $\varepsilon = 0$. In general, an $O(\varepsilon)$ perturbation in $A$ causes an $O(\varepsilon^{1/p})$ perturbation in an eigenvalue associated with a $p \times p$ Jordan block, meaning that the more defective an eigenvalue is, the more sensitive it is.
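A minimal numerical experiment (NumPy; the parameter values are our own) illustrates this square-root behavior for the $2 \times 2$ matrix above: each time $\varepsilon$ shrinks by a factor of 100, the perturbation of the double eigenvalue 1 shrinks only by a factor of 10.

```python
import numpy as np

a = 1.0
for eps in [1e-2, 1e-4, 1e-6, 1e-8]:
    A = np.array([[1.0,   a],
                  [eps, 1.0]])
    lam = np.linalg.eigvals(A)
    spread = np.max(lam.real) - 1.0  # distance of the split eigenvalue from 1
    print(f"eps = {eps:.0e}: lambda - 1 = {spread:.3e}, "
          f"sqrt(a*eps) = {np.sqrt(a * eps):.3e}")
```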

We now consider the sensitivity of eigenvectors, or, more generally, invariant subspaces of a matrix $A$, such as the subspace spanned by the first $k$ Schur vectors, which are the first $k$ columns in a matrix $Q$ such that $Q^HAQ$ is upper triangular. Suppose that an $n \times n$ matrix $A$ has the Schur decomposition

$$A = QTQ^H, \quad Q = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}, \quad T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix},$$

where $Q_1$ is $n \times r$ and $T_{11}$ is $r \times r$. We define the separation between the matrices $T_{11}$ and $T_{22}$ by

$$\operatorname{sep}(T_{11}, T_{22}) = \min_{X \ne 0} \frac{\|T_{11}X - XT_{22}\|_F}{\|X\|_F}.$$

It can be shown that an $O(\varepsilon)$ perturbation in $A$ causes an $O(\varepsilon/\operatorname{sep}(T_{11}, T_{22}))$ perturbation in the invariant subspace $\operatorname{range}(Q_1)$.

We now consider the case where $r = 1$, meaning that $Q_1$ is actually a vector $q_1$, which is also an eigenvector, and $T_{11}$ is the corresponding eigenvalue, $\lambda$. Then, we have

$$\begin{aligned}
\operatorname{sep}(\lambda, T_{22}) &= \min_{X \ne 0} \frac{\|\lambda X - XT_{22}\|_F}{\|X\|_F} \\
&= \min_{\|y\|_2 = 1} \|y^H(T_{22} - \lambda I)\|_2 \\
&= \min_{\|y\|_2 = 1} \|(T_{22} - \lambda I)^Hy\|_2 \\
&= \sigma_{\min}((T_{22} - \lambda I)^H) \\
&= \sigma_{\min}(T_{22} - \lambda I),
\end{aligned}$$

since the Frobenius norm of a vector is equivalent to the vector 2-norm. Because the smallest singular value indicates the distance to a singular matrix, $\operatorname{sep}(\lambda, T_{22})$ provides a measure of the separation of $\lambda$ from the other eigenvalues of $A$. It follows that eigenvectors are more sensitive to perturbation if the corresponding eigenvalues are clustered near one another. That is, eigenvectors associated with nearby eigenvalues are "wobbly".

It should be emphasized that there is no direct relationship between the sensitivity of an eigenvalue and the sensitivity of its corresponding invariant subspace. The sensitivity of a simple eigenvalue depends on the angle between its left and right eigenvectors, while the sensitivity of an invariant subspace depends on the clustering of the eigenvalues. Therefore, a sensitive eigenvalue, one that is nearly defective, can be associated with an insensitive invariant subspace if it is distant from other eigenvalues, while an insensitive eigenvalue can have a sensitive invariant subspace if it is very close to other eigenvalues.
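The quantity $\operatorname{sep}$ is straightforward to evaluate from a Schur decomposition. The following sketch (a minimal illustration using SciPy's schur; the random test matrix is our own choice) computes $\sigma_{\min}(T_{22} - \lambda I)$ for the eigenvalue in the leading position of $T$.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))

# Complex Schur form A = Q T Q^H, with T upper triangular.
T, _ = schur(A, output='complex')

lam = T[0, 0]      # the eigenvalue corresponding to q_1 (the case r = 1)
T22 = T[1:, 1:]

# sep(lambda, T22) is the smallest singular value of T22 - lambda*I.
sep = np.linalg.svd(T22 - lam * np.eye(T22.shape[0]), compute_uv=False)[-1]
print(f"lambda = {lam:.4f}, sep(lambda, T22) = {sep:.4e}")
# A small sep signals a sensitive eigenvector: lambda is close to the
# remaining eigenvalues of A.
```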

The Symmetric Eigenvalue Problem

In the symmetric case, the Gerschgorin circles become Gerschgorin intervals, because the eigenvalues of a symmetric matrix are real.

Example The eigenvalues of the $3 \times 3$ symmetric matrix

$$A = \begin{bmatrix} -10 & -3 & 2 \\ -3 & 4 & -2 \\ 2 & -2 & 14 \end{bmatrix}$$

are

$$\lambda(A) = \{14.6515, 4.0638, -10.7153\}.$$

The Gerschgorin intervals are

$$D_1 = \{x \in \mathbb{R} : |x - 14| \le 4\}, \quad D_2 = \{x \in \mathbb{R} : |x - 4| \le 5\}, \quad D_3 = \{x \in \mathbb{R} : |x + 10| \le 5\}.$$

We see that each interval contains one eigenvalue.

The characterization of the eigenvalues of a symmetric matrix as constrained maxima of the Rayleigh quotient leads to the following results about the eigenvalues of a perturbed symmetric matrix. As the eigenvalues are real, and therefore can be ordered, we denote by $\lambda_i(A)$ the $i$th largest eigenvalue of $A$.

Theorem (Wielandt-Hoffman) If $A$ and $A + E$ are $n \times n$ symmetric matrices, then

$$\sum_{i=1}^n (\lambda_i(A + E) - \lambda_i(A))^2 \le \|E\|_F^2.$$

It is also possible to bound the distance between individual eigenvalues of $A$ and $A + E$.

Theorem If $A$ and $A + E$ are $n \times n$ symmetric matrices, then

$$\lambda_n(E) \le \lambda_k(A + E) - \lambda_k(A) \le \lambda_1(E).$$

Furthermore,

$$|\lambda_k(A + E) - \lambda_k(A)| \le \|E\|_2.$$

The second inequality in the above theorem follows directly from the first, as the 2-norm of the symmetric matrix $E$, being equal to its spectral radius, must be equal to the larger of the absolute values of $\lambda_1(E)$ and $\lambda_n(E)$.

Theorem (Interlacing Property) If $A$ is an $n \times n$ symmetric matrix, and $A_r$ is the $r \times r$ leading principal submatrix of $A$, then, for $r = 1, 2, \ldots, n - 1$,

$$\lambda_{r+1}(A_{r+1}) \le \lambda_r(A_r) \le \lambda_r(A_{r+1}) \le \cdots \le \lambda_2(A_{r+1}) \le \lambda_1(A_r) \le \lambda_1(A_{r+1}).$$
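All three bounds are easy to check numerically. The sketch below (a minimal NumPy illustration; the random symmetric test matrices are our own choices) verifies the Wielandt-Hoffman bound, the 2-norm bound, and the interlacing property; note that eigvalsh returns eigenvalues in ascending order, so its output is reversed to match the convention that $\lambda_1$ is the largest.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

# Random symmetric A and small symmetric perturbation E.
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
E = 1e-3 * rng.standard_normal((n, n)); E = (E + E.T) / 2

lam  = np.linalg.eigvalsh(A)[::-1]      # lambda_1 >= ... >= lambda_n
lamE = np.linalg.eigvalsh(A + E)[::-1]
diff = lamE - lam

# Wielandt-Hoffman: sum of squared differences bounded by ||E||_F^2.
print(np.sum(diff**2) <= np.linalg.norm(E, 'fro')**2)

# Individual bound: |lambda_k(A+E) - lambda_k(A)| <= ||E||_2.
print(np.max(np.abs(diff)) <= np.linalg.norm(E, 2))

# Interlacing with the (n-1) x (n-1) leading principal submatrix.
lam_r = np.linalg.eigvalsh(A[:-1, :-1])[::-1]
print(np.all((lam[1:] <= lam_r) & (lam_r <= lam[:-1])))
```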

For a symmetric matrix, or even a more general normal matrix, the left eigenvectors and right eigenvectors are the same, from which it follows that every simple eigenvalue is "perfectly conditioned"; that is, the condition number $1/|\cos\theta|$ is equal to 1 because $\theta = 0$ in this case. However, the same results concerning the sensitivity of invariant subspaces from the nonsymmetric case apply in the symmetric case as well: such sensitivity increases as the eigenvalues become more clustered, even though there is no chance of a defective eigenvalue. This is because for a nondefective, repeated eigenvalue, there are infinitely many possible bases of the corresponding invariant subspace. Therefore, as the eigenvalues approach one another, the eigenvectors become more sensitive to small perturbations, for any matrix.

Let $Q_1$ be an $n \times r$ matrix with orthonormal columns, meaning that $Q_1^TQ_1 = I_r$. If it spans an invariant subspace of an $n \times n$ symmetric matrix $A$, then $AQ_1 = Q_1S$, where $S = Q_1^TAQ_1$. On the other hand, if $\operatorname{range}(Q_1)$ is not an invariant subspace, but the matrix

$$E_1 = AQ_1 - Q_1S$$

is small for a given $r \times r$ symmetric matrix $S$, then the columns of $Q_1$ define an approximate invariant subspace.

It turns out that $\|E_1\|_F$ is minimized by choosing $S = Q_1^TAQ_1$. Furthermore, with this choice we have

$$\|AQ_1 - Q_1S\|_F = \|P_1AQ_1\|_F,$$

where $P_1 = I - Q_1Q_1^T$ is the orthogonal projection onto $(\operatorname{range}(Q_1))^\perp$, and there exist eigenvalues $\mu_1, \ldots, \mu_r \in \lambda(A)$ such that

$$|\mu_k - \lambda_k(S)| \le \sqrt{2}\,\|E_1\|_2, \quad k = 1, \ldots, r.$$

That is, $r$ eigenvalues of $A$ are close to the eigenvalues of $S$, which are known as Ritz values, while the corresponding eigenvectors are called Ritz vectors. If $(\theta_k, y_k)$ is an eigenvalue-eigenvector pair, or eigenpair, of $S$, then, because $S$ is defined by $S = Q_1^TAQ_1$, $(\theta_k, Q_1y_k)$ is also known as a Ritz pair. Furthermore, as $\theta_k$ is an approximate eigenvalue of $A$, $Q_1y_k$ is an approximate corresponding eigenvector.

To see this, let $\sigma_k$ (not to be confused with a singular value) be an eigenvalue of $S$, with eigenvector $y_k$. We multiply both sides of the equation $Sy_k = \sigma_ky_k$ by $Q_1$:

$$Q_1Sy_k = \sigma_kQ_1y_k.$$

Then, we use the relation $AQ_1 - Q_1S = E_1$ to obtain

$$(AQ_1 - E_1)y_k = \sigma_kQ_1y_k.$$

Rearranging yields

$$A(Q_1y_k) = \sigma_k(Q_1y_k) + E_1y_k.$$

If we let $x_k = Q_1y_k$, then we conclude that

$$Ax_k = \sigma_kx_k + E_1y_k.$$

Therefore, if $E_1$ is small in some norm, $Q_1y_k$ is nearly an eigenvector.
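The following sketch (a minimal NumPy illustration; the random matrix and trial subspace are our own choices) carries out this Rayleigh-Ritz procedure and confirms that the residual $Ax_k - \theta_kx_k$ equals $E_1y_k$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 8, 3

# Symmetric test matrix and an n x r matrix Q1 with orthonormal columns.
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
Q1, _ = np.linalg.qr(rng.standard_normal((n, r)))

S  = Q1.T @ A @ Q1      # the choice of S that minimizes ||E1||_F
E1 = A @ Q1 - Q1 @ S    # residual of the approximate invariant subspace

# Ritz pairs: eigenpairs (theta_k, y_k) of S give Ritz vectors x_k = Q1 y_k.
theta, Y = np.linalg.eigh(S)
X = Q1 @ Y

for k in range(r):
    resid = np.linalg.norm(A @ X[:, k] - theta[k] * X[:, k])
    # A x_k = theta_k x_k + E1 y_k, so the residual equals ||E1 y_k||_2.
    print(f"theta = {theta[k]: .4f}, residual = {resid:.3e}, "
          f"||E1 y_k||_2 = {np.linalg.norm(E1 @ Y[:, k]):.3e}")
```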