AM 205: lecture 22

Today: eigenvalue sensitivity, eigenvalue algorithms

Reminder: midterm starts today
- Posted online at 5 PM on Thursday 13th
- Deadline at 5 PM on Friday 14th
- Covers material up to and including lecture 16
- Open book. No collaboration allowed.
- Send questions as private messages on Piazza

Gershgorin's Theorem (from last time)

Theorem: All eigenvalues of $A \in \mathbb{C}^{n \times n}$ are contained within the union of the $n$ Gershgorin disks of $A$.
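As a brief illustration (not part of the original slides), here is a minimal NumPy sketch of computing the Gershgorin disks; the test matrix is an arbitrary example of my own.

```python
import numpy as np

def gershgorin_disks(A):
    """Return (center, radius) for each Gershgorin disk of a square matrix A."""
    A = np.asarray(A)
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)   # off-diagonal row sums
    return list(zip(centers, radii))

A = np.array([[ 4.0,  1.0, 0.5],
              [ 1.0, -2.0, 0.3],
              [ 0.2,  0.1, 7.0]])
for c, r in gershgorin_disks(A):
    print(f"disk: center {c:+.2f}, radius {r:.2f}")
print("eigenvalues:", np.linalg.eigvals(A))   # all lie in the union of the disks
```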

Sensitivity of Eigenvalue Problems

We shall now consider the sensitivity of the eigenvalues to perturbations in the matrix $A$.

Suppose $A$ is nondefective, and hence $A = V D V^{-1}$.

Let $\delta A$ denote a perturbation of $A$, and let $E \equiv V^{-1} \delta A\, V$; then
$$V^{-1}(A + \delta A)V = V^{-1} A V + V^{-1} \delta A\, V = D + E$$

Sensitivity of Eigenvalue Problems

For a nonsingular matrix $X$, the map $A \mapsto X^{-1} A X$ is called a similarity transformation of $A$.

Theorem: A similarity transformation preserves eigenvalues.

Proof: We can equate the characteristic polynomials of $A$ and $X^{-1} A X$ (denoted $p_A(z)$ and $p_{X^{-1}AX}(z)$, respectively) as follows:
$$p_{X^{-1}AX}(z) = \det(zI - X^{-1}AX) = \det\big(X^{-1}(zI - A)X\big) = \det(X^{-1})\det(zI - A)\det(X) = \det(zI - A) = p_A(z),$$
where we have used the identities $\det(AB) = \det(A)\det(B)$ and $\det(X^{-1}) = 1/\det(X)$.

Sensitivity of Eigenvalue Problems

The identity $V^{-1}(A + \delta A)V = D + E$ is a similarity transformation.

Therefore $A + \delta A$ and $D + E$ have the same eigenvalues.

Let $\lambda_k$, $k = 1, 2, \ldots, n$, denote the eigenvalues of $A$, and let $\tilde\lambda$ denote an eigenvalue of $A + \delta A$.

Then for some $w \in \mathbb{C}^n$, $(\tilde\lambda, w)$ is an eigenpair of $D + E$, i.e.
$$(D + E)w = \tilde\lambda w$$

Sensitivity of Eigenvalue Problems

This can be rewritten as
$$w = (\tilde\lambda I - D)^{-1} E w$$

This is a promising start because:
- we want to bound $|\tilde\lambda - \lambda_k|$ for some $k$
- $(\tilde\lambda I - D)^{-1}$ is a diagonal matrix with entries $1/(\tilde\lambda - \lambda_k)$ on the diagonal

Sensitivity of Eigenvalue Problems

Taking norms yields
$$\|w\|_2 \le \|(\tilde\lambda I - D)^{-1}\|_2 \, \|E\|_2 \, \|w\|_2, \quad \text{or} \quad \|(\tilde\lambda I - D)^{-1}\|_2^{-1} \le \|E\|_2$$

Note that the norm of a diagonal matrix is given by its largest entry (in absolute value):[1]
$$\max_{v \ne 0} \frac{\|Dv\|}{\|v\|} = \max_{v \ne 0} \frac{\|(D_{11} v_1, D_{22} v_2, \ldots, D_{nn} v_n)\|}{\|v\|} \le \left( \max_{i=1,2,\ldots,n} |D_{ii}| \right) \max_{v \ne 0} \frac{\|v\|}{\|v\|} = \max_{i=1,2,\ldots,n} |D_{ii}|$$

[1] This holds for any induced matrix norm, not just the 2-norm.

Sensitivity of Eigenvalue Problems

Hence $\|(\tilde\lambda I - D)^{-1}\|_2 = 1/|\tilde\lambda - \lambda_k|$, where $\lambda_k$ is the eigenvalue of $A$ closest to $\tilde\lambda$.

Therefore it follows from $\|(\tilde\lambda I - D)^{-1}\|_2^{-1} \le \|E\|_2$ that
$$|\tilde\lambda - \lambda_k| = \|(\tilde\lambda I - D)^{-1}\|_2^{-1} \le \|E\|_2 = \|V^{-1} \delta A\, V\|_2 \le \|V^{-1}\|_2 \|\delta A\|_2 \|V\|_2 = \mathrm{cond}(V)\, \|\delta A\|_2$$

This result is known as the Bauer–Fike Theorem.

Sensitivity of Eigenvalue Problems

Hence suppose we compute the eigenvalues $\tilde\lambda_i$ of the perturbed matrix $A + \delta A$.

Then Bauer–Fike tells us that each $\tilde\lambda_i$ must reside in a disk of radius $\mathrm{cond}(V)\, \|\delta A\|_2$ centered on some eigenvalue of $A$.

If $V$ is poorly conditioned, then even for small perturbations $\delta A$ the disks can be large: sensitivity to perturbations.

If $A$ is normal then $\mathrm{cond}(V) = 1$, in which case the Bauer–Fike disk radius is just $\|\delta A\|_2$.
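A small numerical sketch (my own, not from the lecture) of checking the Bauer–Fike bound on a random matrix; a generic random matrix is almost surely nondefective, so the diagonalization assumption holds.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
lam, V = np.linalg.eig(A)                # A = V D V^{-1}

dA = 1e-6 * rng.standard_normal((n, n))  # small perturbation delta A
lam_pert = np.linalg.eigvals(A + dA)

radius = np.linalg.cond(V) * np.linalg.norm(dA, 2)   # Bauer-Fike disk radius
dist = max(np.min(np.abs(lam - mu)) for mu in lam_pert)
print("max distance from a perturbed eigenvalue to the spectrum of A:", dist)
print("Bauer-Fike radius:", radius)      # dist should not exceed radius
```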

Sensitivity of Eigenvalue Problems

Note that a limitation of Bauer–Fike is that it does not tell us which disk $\tilde\lambda_i$ will reside in.

Therefore, this doesn't rule out the possibility of, say, all $\tilde\lambda_i$ clustering in just one Bauer–Fike disk.

In the case that $A$ and $A + \delta A$ are hermitian, we have a stronger result.

Sensitivity of Eigenvalue Problems

Weyl's Theorem: Let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ and $\tilde\lambda_1 \le \tilde\lambda_2 \le \cdots \le \tilde\lambda_n$ be the eigenvalues of hermitian matrices $A$ and $A + \delta A$, respectively. Then
$$\max_{i=1,\ldots,n} |\tilde\lambda_i - \lambda_i| \le \|\delta A\|_2.$$

Hence in the hermitian case, each perturbed eigenvalue must be in the disk[2] of its corresponding unperturbed eigenvalue!

[2] In fact, eigenvalues of a hermitian matrix are real, so the "disk" here is actually an interval in $\mathbb{R}$.

Sensitivity of Eigenvalue Problems

The Bauer–Fike Theorem relates to perturbations of the whole spectrum.

We can also consider perturbations of individual eigenvalues.

Suppose, for simplicity, that $A \in \mathbb{C}^{n \times n}$ is hermitian (so that $A^* = A$ and its eigenvalues are real), and consider the perturbed eigenvalue problem
$$(A + E)(v + \Delta v) = (\lambda + \Delta\lambda)(v + \Delta v)$$

Expanding this equation, dropping second-order terms, and using $Av = \lambda v$ gives
$$A\,\Delta v + E v \approx \lambda\,\Delta v + \Delta\lambda\, v$$

Sensitivity of Eigenvalue Problems

Premultiply $A\,\Delta v + E v \approx \lambda\,\Delta v + \Delta\lambda\, v$ by $v^*$ to obtain
$$v^* A\,\Delta v + v^* E v \approx \lambda\, v^* \Delta v + \Delta\lambda\, v^* v$$

Noting that
$$v^* A\,\Delta v = (\Delta v^* A^* v)^* = (\Delta v^* A v)^* = \lambda\,(\Delta v^* v)^* = \lambda\, v^* \Delta v$$
(using $A^* = A$ and $\lambda \in \mathbb{R}$) leads to $v^* E v \approx \Delta\lambda\, v^* v$, or
$$\Delta\lambda \approx \frac{v^* E v}{v^* v}$$

Sensitivity of Eigenvalue Problems

Finally, we obtain
$$|\Delta\lambda| \approx \frac{|v^* E v|}{\|v\|_2^2} \le \frac{\|v\|_2 \|E v\|_2}{\|v\|_2^2} \le \|E\|_2, \quad \text{so that} \quad |\Delta\lambda| \lesssim \|E\|_2$$

We observe that
- this perturbation bound does not depend on $\mathrm{cond}(V)$ when we consider only an individual eigenvalue
- this individual eigenvalue perturbation bound is asymptotic; it is rigorous only in the limit that the perturbations $\to 0$

Algorithms for Eigenvalue Problems

Power Method

Power Method

The power method is perhaps the simplest eigenvalue algorithm.

It finds the eigenvalue of $A \in \mathbb{C}^{n \times n}$ with largest modulus:

1: choose $x_0 \in \mathbb{C}^n$ arbitrarily
2: for $k = 1, 2, \ldots$ do
3:   $x_k = A x_{k-1}$
4: end for

Question: How does this algorithm work?

Power Method

Assuming $A$ is nondefective, the eigenvectors $v_1, v_2, \ldots, v_n$ provide a basis for $\mathbb{C}^n$.

Therefore there exist coefficients $\alpha_j$ such that $x_0 = \sum_{j=1}^n \alpha_j v_j$.

Then, we have
$$x_k = A x_{k-1} = A^2 x_{k-2} = \cdots = A^k x_0 = A^k \left( \sum_{j=1}^n \alpha_j v_j \right) = \sum_{j=1}^n \alpha_j A^k v_j = \sum_{j=1}^n \alpha_j \lambda_j^k v_j = \lambda_n^k \left[ \alpha_n v_n + \sum_{j=1}^{n-1} \alpha_j \left( \frac{\lambda_j}{\lambda_n} \right)^k v_j \right]$$

Power Method

Then if $|\lambda_n| > |\lambda_j|$, $1 \le j < n$, we see that $x_k \to \lambda_n^k \alpha_n v_n$ as $k \to \infty$.

This algorithm converges linearly: the error terms are scaled by a factor of at most $|\lambda_{n-1}| / |\lambda_n|$ at each iteration.

Also, we see that the method converges faster if $\lambda_n$ is well separated from the rest of the spectrum.

Power Method

However, in practice the exponential factor $\lambda_n^k$ could cause overflow or underflow after relatively few iterations.

Therefore the standard form of the power method is actually the normalized power method:

1: choose $x_0 \in \mathbb{C}^n$ arbitrarily
2: for $k = 1, 2, \ldots$ do
3:   $y_k = A x_{k-1}$
4:   $x_k = y_k / \|y_k\|$
5: end for

Power Method

Convergence analysis of the normalized power method is essentially the same as for the un-normalized case.

The only difference is that we now get an extra scaling factor, $c_k \in \mathbb{R}$, due to the normalization at each step:
$$x_k = c_k \lambda_n^k \left[ \alpha_n v_n + \sum_{j=1}^{n-1} \alpha_j \left( \frac{\lambda_j}{\lambda_n} \right)^k v_j \right]$$

Power Method

This algorithm directly produces the eigenvector $v_n$.

One way to recover $\lambda_n$ is to note that $y_k = A x_{k-1} \approx \lambda_n x_{k-1}$.

Hence we can compare an entry of $y_k$ and $x_{k-1}$ to approximate $\lambda_n$.

We also note two potential issues:
1. We require $x_0$ to have a nonzero component of $v_n$
2. There may be more than one eigenvalue with maximum modulus
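A minimal NumPy sketch of the normalized power method, with the eigenvalue recovered by comparing an entry of $y_k$ and $x_{k-1}$ as suggested above; the function name and stopping test are my own choices, not from the lecture.

```python
import numpy as np

def power_method(A, x0, maxiter=500, tol=1e-12):
    """Normalized power method: estimates the largest-modulus eigenvalue and its eigenvector."""
    x = x0 / np.linalg.norm(x0)
    lam_old = np.inf
    for _ in range(maxiter):
        y = A @ x                      # y_k = A x_{k-1}
        i = np.argmax(np.abs(x))
        lam = y[i] / x[i]              # y_k ~ lambda_n x_{k-1}, so compare one entry
        if abs(lam - lam_old) < tol * abs(lam):
            break
        lam_old = lam
        x = y / np.linalg.norm(y)      # x_k = y_k / ||y_k||
    return lam, x
```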

Power Method

Issue 1: In practice, it is very unlikely that $x_0$ will be orthogonal to $v_n$.

Even if $x_0^* v_n = 0$, rounding error will introduce a component of $v_n$ during the power iterations.

Issue 2: We cannot ignore the possibility that there is more than one maximum-modulus eigenvalue.

In this case $x_k$ would converge to a member of the corresponding eigenspace.

Power Method

An important idea in eigenvalue computations is to consider the shifted matrix $A - \sigma I$, for $\sigma \in \mathbb{R}$.

We see that $(A - \sigma I) v_i = (\lambda_i - \sigma) v_i$, and hence the spectrum of $A - \sigma I$ is shifted by $-\sigma$ while the eigenvectors are the same.

For example, if all the eigenvalues are real, a shift can be used with the power method to converge to $\lambda_1$ instead of $\lambda_n$.

Power Method

Python example: Consider the power method and the shifted power method for
$$A = \begin{bmatrix} 4 & 1 \\ 1 & -2 \end{bmatrix},$$
which has eigenvalues $\lambda_1 = -2.1623$, $\lambda_2 = 4.1623$.
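The original demo code isn't reproduced in this transcript; the calls below sketch how it might look using the power_method function defined earlier. The shift $\sigma = 4$ is my own choice (any shift close to $\lambda_2$ makes $\lambda_1 - \sigma$ the dominant eigenvalue of $A - \sigma I$).

```python
A = np.array([[4.0,  1.0],
              [1.0, -2.0]])
x0 = np.array([1.0, 1.0])

lam, _ = power_method(A, x0)
print("power method:        ", lam)                  # -> lambda_2 ~  4.1623

sigma = 4.0
lam_shifted, _ = power_method(A - sigma * np.eye(2), x0)
print("shifted power method:", lam_shifted + sigma)  # -> lambda_1 ~ -2.1623
```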

Inverse Iteration

Inverse Iteration

The eigenvalues of $A^{-1}$ are the reciprocals of the eigenvalues of $A$, since
$$A v = \lambda v \iff A^{-1} v = \frac{1}{\lambda} v$$

Question: What happens if we apply the power method to $A^{-1}$?

Inverse Iteration

Answer: We converge to the largest (in modulus) eigenvalue of $A^{-1}$, which is $1/\lambda_1$ (recall that $\lambda_1$ is the smallest eigenvalue of $A$).

This is called inverse iteration:

1: choose $x_0 \in \mathbb{C}^n$ arbitrarily
2: for $k = 1, 2, \ldots$ do
3:   solve $A y_k = x_{k-1}$ for $y_k$
4:   $x_k = y_k / \|y_k\|$
5: end for
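A sketch of inverse iteration (not the lecture's own code). The shift from the following slides is folded in as a sigma parameter (sigma = 0 reproduces plain inverse iteration), and the LU factorization is computed once and reused for every solve via SciPy's lu_factor/lu_solve.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, x0, sigma=0.0, maxiter=500, tol=1e-12):
    """Inverse iteration on (A - sigma*I): estimates the eigenvalue of A closest to sigma."""
    n = A.shape[0]
    lu_piv = lu_factor(A - sigma * np.eye(n))   # factorize once, reuse each iteration
    x = x0 / np.linalg.norm(x0)
    mu_old = np.inf
    for _ in range(maxiter):
        y = lu_solve(lu_piv, x)                 # solve (A - sigma*I) y_k = x_{k-1}
        i = np.argmax(np.abs(x))
        mu = y[i] / x[i]                        # estimate of 1/(lambda - sigma)
        if abs(mu - mu_old) < tol * abs(mu):
            break
        mu_old = mu
        x = y / np.linalg.norm(y)
    return sigma + 1.0 / mu, x
```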

Inverse Iteration

Hence inverse iteration gives $\lambda_1$ without requiring a shift.

This is helpful since it may be difficult to determine what shift is required to get $\lambda_1$ with the power method.

Question: What happens if we apply inverse iteration to the shifted matrix $A - \sigma I$?

Inverse Iteration

The smallest-magnitude eigenvalue of $A - \sigma I$ is $\lambda_{i^*} - \sigma$, where
$$i^* = \underset{i=1,2,\ldots,n}{\arg\min}\, |\lambda_i - \sigma|,$$
and hence...

Answer: We converge to $\tilde\lambda = 1/(\lambda_{i^*} - \sigma)$, then recover $\lambda_{i^*}$ via
$$\lambda_{i^*} = \frac{1}{\tilde\lambda} + \sigma$$

Inverse iteration with a shift allows us to find the eigenvalue closest to $\sigma$.
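For example, a call along these lines (illustrative, not from the notes) would target the eigenvalue of the earlier 2×2 matrix closest to $\sigma = -2$:

```python
A = np.array([[4.0,  1.0],
              [1.0, -2.0]])
lam, _ = inverse_iteration(A, np.array([1.0, 1.0]), sigma=-2.0)
print(lam)   # -> -2.1623, the eigenvalue closest to sigma
```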

Rayleigh Quotient Iteration

Rayleigh Quotient

For the remainder of this section (Rayleigh Quotient Iteration, QR Algorithm) we will assume that $A \in \mathbb{R}^{n \times n}$ is real and symmetric.[3]

The Rayleigh quotient is defined as
$$r(x) \equiv \frac{x^T A x}{x^T x}$$

If $(\lambda, v) \in \mathbb{R} \times \mathbb{R}^n$ is an eigenpair, then
$$r(v) = \frac{v^T A v}{v^T v} = \frac{\lambda v^T v}{v^T v} = \lambda$$

[3] Much of the material generalizes to complex non-hermitian matrices, but the symmetric case is simpler.

Rayleigh Quotient

Theorem: Suppose $A \in \mathbb{R}^{n \times n}$ is a symmetric matrix; then for any $x \in \mathbb{R}^n$ we have $\lambda_1 \le r(x) \le \lambda_n$.

Proof: We write $x$ as a linear combination of (orthogonal) eigenvectors, $x = \sum_{j=1}^n \alpha_j v_j$, and the lower bound follows from
$$r(x) = \frac{x^T A x}{x^T x} = \frac{\sum_{j=1}^n \lambda_j \alpha_j^2}{\sum_{j=1}^n \alpha_j^2} \ge \frac{\lambda_1 \sum_{j=1}^n \alpha_j^2}{\sum_{j=1}^n \alpha_j^2} = \lambda_1$$

The proof of the upper bound $r(x) \le \lambda_n$ is analogous.

Rayleigh Quotient

Corollary: A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive definite if and only if all of its eigenvalues are positive.

Proof: ($\Rightarrow$) Suppose $A$ is symmetric positive definite (SPD); then for any nonzero $x \in \mathbb{R}^n$ we have $x^T A x > 0$, and hence
$$\lambda_1 = r(v_1) = \frac{v_1^T A v_1}{v_1^T v_1} > 0$$

($\Leftarrow$) Suppose $A$ has positive eigenvalues; then for any nonzero $x \in \mathbb{R}^n$,
$$x^T A x = r(x)\,(x^T x) \ge \lambda_1 \|x\|_2^2 > 0$$

Rayleigh Quotient

But also, if $x$ is an approximate eigenvector, then $r(x)$ gives us a good approximation to the corresponding eigenvalue.

This is because estimation of an eigenvalue from an approximate eigenvector is an $n \times 1$ linear least-squares problem: $x \lambda \approx A x$.

Here $x \in \mathbb{R}^n$ is our "tall thin matrix" and $A x \in \mathbb{R}^n$ is our right-hand side.

Hence the normal equation for $x\lambda \approx Ax$ yields the Rayleigh quotient, i.e.
$$x^T x\, \lambda = x^T A x$$

Rayleigh Quotient

Question: How accurate is the Rayleigh quotient approximation to an eigenvalue?

Let's consider $r$ as a function of $x$, so $r : \mathbb{R}^n \to \mathbb{R}$. Then
$$\frac{\partial r(x)}{\partial x_j} = \frac{\frac{\partial}{\partial x_j}(x^T A x)\, x^T x - (x^T A x)\, \frac{\partial}{\partial x_j}(x^T x)}{(x^T x)^2} = \frac{2(Ax)_j\, x^T x - (x^T A x)\, 2 x_j}{(x^T x)^2} = \frac{2}{x^T x}\big(Ax - r(x)\, x\big)_j$$

(Note that the second equality relies on the symmetry of $A$.)

Rayleigh Quotient

Therefore
$$\nabla r(x) = \frac{2}{x^T x}\big(Ax - r(x)\, x\big)$$

For an eigenpair $(\lambda, v)$ we have $r(v) = \lambda$ and hence
$$\nabla r(v) = \frac{2}{v^T v}\big(Av - \lambda v\big) = 0$$

This shows that eigenvectors of $A$ are stationary points of $r$.

Rayleigh Quotient

Suppose $(\lambda, v)$ is an eigenpair of $A$, and let us consider a Taylor expansion of $r(x)$ about $v$:
$$r(x) = r(v) + \nabla r(v)^T (x - v) + \tfrac{1}{2}(x - v)^T H_r(v)(x - v) + \text{H.O.T.} = r(v) + \tfrac{1}{2}(x - v)^T H_r(v)(x - v) + \text{H.O.T.}$$

Hence as $x \to v$ the error in a Rayleigh quotient approximation is
$$r(x) - \lambda = O(\|x - v\|_2^2)$$

That is, the Rayleigh quotient approximation to an eigenvalue squares the error in the corresponding eigenvector approximation.
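A quick numerical sketch (my own) of this quadratic accuracy: perturb an eigenvector of a random symmetric matrix by a size-$\epsilon$ direction and watch the Rayleigh quotient error scale like $\epsilon^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n))
A = B + B.T                         # random symmetric matrix
lam, V = np.linalg.eigh(A)
v = V[:, -1]                        # unit eigenvector for the largest eigenvalue

d = rng.standard_normal(n)
d -= (d @ v) * v                    # perturbation direction orthogonal to v
d /= np.linalg.norm(d)

for eps in [1e-2, 1e-3, 1e-4]:
    x = v + eps * d
    r = (x @ A @ x) / (x @ x)       # Rayleigh quotient
    print(f"eps = {eps:.0e}:  |r(x) - lambda| = {abs(r - lam[-1]):.2e}")
# the error should drop by roughly a factor of 100 per line, i.e. O(eps^2)
```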

Rayleigh Quotient Iteration

The Rayleigh quotient gives us an eigenvalue estimate from an eigenvector estimate.

Inverse iteration gives us an eigenvector estimate from an eigenvalue estimate.

It is natural to combine the two; this yields the Rayleigh quotient iteration:

1: choose $x_0 \in \mathbb{R}^n$ arbitrarily
2: for $k = 1, 2, \ldots$ do
3:   $\sigma_k = x_{k-1}^T A x_{k-1} / x_{k-1}^T x_{k-1}$
4:   solve $(A - \sigma_k I) y_k = x_{k-1}$ for $y_k$
5:   $x_k = y_k / \|y_k\|$
6: end for
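A direct NumPy transcription of the iteration above (a sketch: the stopping criterion is my own addition, and no safeguard is included for the nearly singular solves close to convergence).

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, maxiter=50, tol=1e-12):
    """Rayleigh quotient iteration for a real symmetric matrix A."""
    n = A.shape[0]
    x = x0 / np.linalg.norm(x0)
    sigma = x @ A @ x                                   # Rayleigh quotient (||x|| = 1)
    for _ in range(maxiter):
        y = np.linalg.solve(A - sigma * np.eye(n), x)   # inverse iteration step with shift sigma
        x = y / np.linalg.norm(y)
        sigma_new = x @ A @ x
        if abs(sigma_new - sigma) < tol * max(1.0, abs(sigma_new)):
            sigma = sigma_new
            break
        sigma = sigma_new
    return sigma, x
```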

Rayleigh Quotient Iteration

Suppose, at step $k$, we have $\|x_{k-1} - v\| \le \epsilon$.

Then, from the Rayleigh quotient in line 3 of the algorithm, we have $|\sigma_k - \lambda| = O(\epsilon^2)$.

In lines 4 and 5 of the algorithm, we then perform an inverse iteration with shift $\sigma_k$ to get $x_k$.

Recall that the eigenvector error in one inverse iteration step is scaled by the ratio of the second largest to largest eigenvalue magnitudes of $(A - \sigma_k I)^{-1}$.

Rayleigh Quotient Iteration

Let $\lambda$ be the closest eigenvalue of $A$ to $\sigma_k$; then the magnitude of the largest eigenvalue of $(A - \sigma_k I)^{-1}$ is $1/|\sigma_k - \lambda|$.

The second largest eigenvalue magnitude is $1/|\sigma_k - \hat\lambda|$, where $\hat\lambda$ is the eigenvalue of $A$ second closest to $\sigma_k$.

Hence at each inverse iteration step, the error is reduced by a factor
$$\frac{|\sigma_k - \lambda|}{|\sigma_k - \hat\lambda|} = \frac{|\sigma_k - \lambda|}{|(\sigma_k - \lambda) + (\lambda - \hat\lambda)|} \approx \text{const.} \times |\sigma_k - \lambda| \quad \text{as } \sigma_k \to \lambda$$

Therefore, we obtain cubic convergence as $k \to \infty$:
$$\|x_k - v\| \approx (\text{const.} \times |\sigma_k - \lambda|)\, \|x_{k-1} - v\| = O(\epsilon^3)$$

Rayleigh Quotient Iteration

A drawback of Rayleigh quotient iteration: we can't just LU factorize $A - \sigma_k I$ once, since the shift changes at each step.

Also, it's harder to pick out specific parts of the spectrum with Rayleigh quotient iteration, since $\sigma_k$ can change unpredictably.

Python demo: Rayleigh quotient iteration to compute an eigenpair of
$$A = \begin{bmatrix} 5 & 1 & 1 \\ 1 & 6 & 1 \\ 1 & 1 & 7 \end{bmatrix}$$
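The demo itself isn't included in the transcript; a call of the following form (using the sketch above, with an arbitrary starting vector) reproduces the idea:

```python
A = np.array([[5.0, 1.0, 1.0],
              [1.0, 6.0, 1.0],
              [1.0, 1.0, 7.0]])
lam, v = rayleigh_quotient_iteration(A, np.array([1.0, 0.0, 0.0]))
print("computed eigenpair:", lam, v)
print("all eigenvalues:   ", np.linalg.eigvalsh(A))   # for comparison
```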

QR Algorithm

The QR Algorithm

The QR algorithm for computing eigenvalues is one of the best-known algorithms in numerical analysis.[4]

It was developed independently in the late 1950s by John G. F. Francis (England) and Vera N. Kublanovskaya (USSR).

The QR algorithm efficiently provides approximations for all eigenvalues/eigenvectors of a matrix.

We will consider what happens when we apply the power method to a set of vectors; this will then motivate the QR algorithm.

[4] Recall that here we focus on the case in which $A \in \mathbb{R}^{n \times n}$ is symmetric.

The QR Algorithm

Let $x_1^{(0)}, \ldots, x_p^{(0)}$ denote $p$ linearly independent starting vectors, and suppose we store these vectors in the columns of $X_0$.

We can apply the power method to these vectors to obtain the following algorithm:

1: choose an $n \times p$ matrix $X_0$ arbitrarily
2: for $k = 1, 2, \ldots$ do
3:   $X_k = A X_{k-1}$
4: end for

The QR Algorithm

From our analysis of the power method, we see that for each $i = 1, 2, \ldots, p$:
$$x_i^{(k)} = \lambda_n^k \alpha_{i,n} v_n + \lambda_{n-1}^k \alpha_{i,n-1} v_{n-1} + \cdots + \lambda_1^k \alpha_{i,1} v_1 = \lambda_{n-p}^k \left[ \sum_{j=n-p+1}^{n} \left( \frac{\lambda_j}{\lambda_{n-p}} \right)^k \alpha_{i,j} v_j + \sum_{j=1}^{n-p} \left( \frac{\lambda_j}{\lambda_{n-p}} \right)^k \alpha_{i,j} v_j \right]$$

Then, if $|\lambda_{n-p+1}| > |\lambda_{n-p}|$, the second sum will decay compared to the first sum as $k \to \infty$.

Hence the columns of $X_k$ will converge to a basis for $\mathrm{span}\{v_{n-p+1}, \ldots, v_n\}$.

The QR Algorithm

However, this method doesn't provide a good basis: each column of $X_k$ will be very close to $v_n$.

Therefore the columns of $X_k$ become very close to being linearly dependent.

We can resolve this issue by enforcing linear independence at each step.

The QR Algorithm

We orthonormalize the vectors after each iteration via a (reduced) QR factorization, to obtain the simultaneous iteration:

1: choose an $n \times p$ matrix $\hat Q_0$ with orthonormal columns
2: for $k = 1, 2, \ldots$ do
3:   $X_k = A \hat Q_{k-1}$
4:   $\hat Q_k \hat R_k = X_k$
5: end for

The column spaces of $\hat Q_k$ and $X_k$ in line 4 are the same.

Hence the columns of $\hat Q_k$ converge to an orthonormal basis for $\mathrm{span}\{v_{n-p+1}, \ldots, v_n\}$.
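A sketch of the simultaneous iteration using NumPy's reduced QR factorization; the starting block, iteration count, and Rayleigh quotient eigenvalue extraction (anticipating the next slide) are illustrative choices of mine.

```python
import numpy as np

def simultaneous_iteration(A, p, iters=200, seed=0):
    """Simultaneous (orthogonal) iteration: approximate the p dominant eigenpairs of symmetric A."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal starting columns
    for _ in range(iters):
        X = A @ Q                                      # X_k = A Q_{k-1}
        Q, R = np.linalg.qr(X)                         # reduced QR: Q_k R_k = X_k
    lam = np.diag(Q.T @ A @ Q)                         # Rayleigh quotient eigenvalue estimates
    return lam, Q
```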

The QR Algorithm

In fact, we don't just get a basis for $\mathrm{span}\{v_{n-p+1}, \ldots, v_n\}$; we get the eigenvectors themselves!

Theorem: The columns of $\hat Q_k$ converge to the $p$ dominant eigenvectors of $A$.

We will not discuss the full proof, but we note that this result is not surprising since:
- the eigenvectors of a symmetric matrix are orthogonal
- the columns of $\hat Q_k$ converge to an orthogonal basis for $\mathrm{span}\{v_{n-p+1}, \ldots, v_n\}$

Simultaneous iteration approximates eigenvectors; we obtain eigenvalues from the Rayleigh quotient
$$\hat Q^T A \hat Q \to \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$$

The QR Algorithm

With $p = n$, the simultaneous iteration will approximate all eigenpairs of $A$.

We now show a more convenient reorganization of the simultaneous iteration algorithm.

We shall require some extra notation: the $Q$ and $R$ matrices arising in the simultaneous iteration will be underlined, $\underline{Q}_k$, $\underline{R}_k$.

(As we will see shortly, this is to distinguish between the matrices arising in the two different formulations...)