Convergence Theory for Iterative Eigensolvers

Mark Embree, Virginia Tech
with Chris Beattie, Russell Carden, John Rossi, Dan Sorensen

RandNLA Workshop, Simons Institute, September 28
setting for the talk

Let A ∈ C^{n×n} be a large square matrix, potentially non-Hermitian (A ≠ A*).

Computing all eigenvalues of A is too expensive (and usually not needed). Thus we seek m ≪ n distinguished eigenvalues relevant to our application (largest, smallest, rightmost, etc.)

Projection Methods

V ⊂ C^n = k-dimensional subspace of C^n, the projection subspace for A.

The columns of V ∈ C^{n×k} form an orthonormal basis for V: V*V = I, V*AV ∈ C^{k×k}.

We hope some eigenvalues of V*AV, σ(V*AV) = {θ_1, ..., θ_k}, approximate some eigenvalues of A, σ(A) = {λ_1, ..., λ_n}. For example, θ_1 ≈ λ_1, ..., θ_m ≈ λ_m for some m ≤ k.

This talk mainly describes established results for the deterministic case, with some thoughts from a RandNLA perspective along the way.
why consider A ≠ A*?

While Hermitian problems are common (SVD, quantum mechanics, etc.), many important applications lead to non-Hermitian problems and subtler issues of spectral perturbation theory. Many examples: [Trefethen, E. 2005].

atmospheric science: Farrell, ...
fluid flow stability: Trefethen; Schmid & Henningson; ...
damped mechanical systems: Cox & Zuazua, ...
control theory: Hinrichsen & Pritchard, ...
data-driven modeling: Antoulas, Beattie, Gugercin, ...
lasers: Landau, Siegman, ...
ecology: May, Caswell, ...
Markov chains: Diaconis, ...

(figure: computed rightmost eigenvalues for stability of 2d flow over a backward facing step, n = 38,539 [E., Keeler 2017].)
An Overview of Projection-Based Eigensolvers
projection-based eigensolvers

V ⊂ C^n = k-dimensional subspace of C^n, the projection subspace for A.

The columns of V ∈ C^{n×k} form an orthonormal basis for V: V*V = I, V*AV ∈ C^{k×k}.

We hope some eigenvalues of V*AV, σ(V*AV) = {θ_1, ..., θ_k}, approximate some eigenvalues of A, σ(A) = {λ_1, ..., λ_n}. For example, θ_1 ≈ λ_1, ..., θ_m ≈ λ_m for some m ≤ k.

Power method (minimal storage, easy to implement, can be slow)
V = span{A^p x}

Subspace iteration (more storage, subtler to implement, computes repeated eigs)
V = Range(A^p X) for X ∈ C^{n×k} [Halko, Martinsson, Tropp 2011] et al.
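As a concrete sketch of subspace iteration followed by Rayleigh–Ritz projection. The diagonal test matrix, block size, and power p below are assumed choices for illustration, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed diagonal test matrix with eigenvalues 2^{-1}, ..., 2^{-n},
# so the dominant eigenvalues are known exactly.
n, k, p = 100, 5, 50
A = np.diag(2.0 ** -np.arange(1, n + 1))

# Subspace iteration: V spans Range(A^p X) for a random X in R^{n x k};
# re-orthonormalize at every step to keep the basis well conditioned.
X = rng.standard_normal((n, k))
for _ in range(p):
    X, _ = np.linalg.qr(A @ X)
V = X

# Rayleigh-Ritz: the eigenvalues of V*AV approximate the k largest of A.
theta = np.sort(np.linalg.eigvalsh(V.T @ A @ V))[::-1]
print(theta)   # close to 1/2, 1/4, 1/8, 1/16, 1/32
```

With well-separated dominant eigenvalues the Ritz values match the top of the spectrum to high accuracy after modest p.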
projection-based eigensolvers: krylov methods

Power method (minimal storage, easy to implement, can be slow)
V = span{A^p x}

Subspace iteration (more storage, subtler to implement, computes multiple eigs)
V = Range(A^p X) for X ∈ C^{n×b}

Krylov subspace methods (growing subspace dimension; higher powers of A)
V = span{x, Ax, A²x, ..., A^{k−1}x}

Block Krylov methods (subspace dimension grows quickly: dim(V) ≤ kb)
V = Range([X AX A²X ··· A^{k−1}X]) for X ∈ C^{n×b}
SVD: [Musco & Musco 2015], [Drineas et al. 2018]

Must balance the benefit of large k with block size b, storage.
projection-based eigensolvers: krylov methods (extensions)

Krylov subspace methods (growing subspace dimension; higher powers of A)
V = span{x, Ax, A²x, ..., A^{k−1}x}

Restarted Krylov (used in eigs: filter ψ improves starting vector)
V = span{ψ(A)x, Aψ(A)x, A²ψ(A)x, ..., A^{k−1}ψ(A)x}

Polynomial Preconditioned Krylov (very high degree polys, care needed)
V = span{x, π(A)x, π(A)²x, ..., π(A)^{k−1}x}

Shift-Invert Krylov (used in eigs: ideal for eigenvalues near µ)
V = span{x, (A − µI)^{−1}x, (A − µI)^{−2}x, ..., (A − µI)^{−(k−1)}x}

Rational Krylov (helps for finding eigenvalues in a region)
V = span{x, (A − µ_1 I)^{−1}x, (A − µ_2 I)^{−1}x, ..., (A − µ_{k−1} I)^{−1}x}
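A minimal sketch of the basic (unrestarted) Krylov method: build an orthonormal basis for K_k(A, x) by Gram–Schmidt and take Ritz values. The test matrix is an assumed example with known spectrum:

```python
import numpy as np

def arnoldi_basis(A, x, k):
    """Orthonormal basis for K_k(A, x) = span{x, Ax, ..., A^{k-1}x}."""
    n = A.shape[0]
    V = np.zeros((n, k))
    V[:, 0] = x / np.linalg.norm(x)
    for j in range(1, k):
        w = A @ V[:, j - 1]
        w -= V[:, :j] @ (V[:, :j].T @ w)    # Gram-Schmidt
        w -= V[:, :j] @ (V[:, :j].T @ w)    # reorthogonalize for stability
        V[:, j] = w / np.linalg.norm(w)
    return V

# Assumed example with known spectrum 1, 2, ..., 100.
rng = np.random.default_rng(1)
A = np.diag(np.arange(1.0, 101.0))
V = arnoldi_basis(A, rng.standard_normal(100), 30)
ritz = np.sort(np.linalg.eigvalsh(V.T @ A @ V))
print(ritz[-1])   # close to 100, the largest eigenvalue
```

The extreme Ritz value converges quickly; interior eigenvalues take longer, which motivates the shift-invert and rational variants above.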
preliminaries: spectral structure of A

Distinct eigenvalues of A: λ_1, λ_2, ..., λ_n.

Spectral projectors P_j and invariant subspaces U_j:

P_j := (1/(2πi)) ∫_{Γ_j} (zI − A)^{−1} dz,   U_j := Range(P_j),

where Γ_j is a contour in C containing λ_j but no other distinct eigenvalues.

If A = A* and λ_j is simple with unit eigenvector u_j, then P_j = u_j u_j*.

P_j is a projector onto the invariant subspace U_j, but P_j need not be an orthogonal projector when A ≠ A*.

The spectral projectors give a resolution of the identity: Σ_{j=1}^{n} P_j = I.

P_g := P_1 + ··· + P_m,   U_g := Range(P_g),   m = dim(U_g).
P_b := I − P_g,   U_b := Range(P_b),   dim(U_b) = n − m.
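The contour-integral definition of P_j can be checked numerically with the trapezoid rule on a circle enclosing only λ_j. The 3×3 triangular matrix here is an assumed example:

```python
import numpy as np

# Assumed nonnormal example with eigenvalues 2, 5, 9.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 5.0, 1.0],
              [0.0, 0.0, 9.0]])
I3 = np.eye(3)

# P_1 = (1/(2 pi i)) int_Gamma (zI - A)^{-1} dz, with Gamma a circle of
# radius 1 about lambda_1 = 2, discretized by the trapezoid rule.
lam1, radius, npts = 2.0, 1.0, 64
P1 = np.zeros((3, 3), dtype=complex)
for t in 2.0 * np.pi * np.arange(npts) / npts:
    z = lam1 + radius * np.exp(1j * t)
    dz = 1j * radius * np.exp(1j * t) * (2.0 * np.pi / npts)
    P1 += np.linalg.solve(z * I3 - A, I3) * dz
P1 /= 2j * np.pi

print(np.linalg.norm(P1 @ P1 - P1))      # ~0: P1 is a projector
print(np.linalg.norm(P1 - P1.conj().T))  # nonzero: not an orthogonal projector
```

The trapezoid rule converges geometrically for this analytic periodic integrand, and the computed P1 is visibly non-Hermitian, illustrating that spectral projectors of nonnormal matrices need not be orthogonal.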
preliminaries: angles between subspaces

V = approximating subspace. For our problems, V = K_k(A, x) := span{x, Ax, A²x, ..., A^{k−1}x}.

U_g = desired invariant subspace.

Measure convergence via the containment gap:

δ(U_g, V) = max_{u∈U_g} sin∠(u, V) = max_{u∈U_g} min_{v∈V} ‖u − v‖ / ‖u‖.

(figure: the angle ∠(u, V) between u and its nearest vector v ∈ V.)

We will monitor how δ(U_g, K_k(A, x)) develops as k increases.
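The containment gap can be computed from principal angles via an SVD; a small sketch (the subspaces below are assumed examples):

```python
import numpy as np

def containment_gap(U, V):
    """delta(range(U), range(V)): sine of the largest principal angle,
    assuming U, V have orthonormal columns and rank(U) <= rank(V)."""
    s = np.linalg.svd(V.conj().T @ U, compute_uv=False)
    c = min(s.min(), 1.0)          # cosine of the largest principal angle
    return np.sqrt(1.0 - c * c)

# span{e1} versus span{e1 + e2, e3} in R^4: the angle is 45 degrees.
U = np.array([[1.0], [0.0], [0.0], [0.0]])
V = np.linalg.qr(np.array([[1.0, 0.0],
                           [1.0, 0.0],
                           [0.0, 1.0],
                           [0.0, 0.0]]))[0]
g = containment_gap(U, V)
print(g)   # sin(45 deg) = 0.7071...
```

When dim(U) ≤ dim(V), the smallest singular value of V*U is the cosine of the largest principal angle, so the gap is its complementary sine.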
example convergence behavior, A ≠ A*

(figure: δ(U_g, K_k(A, x)) versus subspace dimension k, showing three phases: a k < m warm-up; sublinear convergence, caused by starting vector bias and nonorthogonal eigenvectors; then steady convergence, governed by eigenvalue location.)

cf. GMRES convergence model of [Nevanlinna 1993]; example adapted from [Beattie, E., Rossi 2004]
basic convergence model

Building on [Saad 1980, 1983], [Jia 1995], [Sorensen 2002], [Beattie, E., Rossi 2004], and others...

Theorem [Beattie, E., Sorensen 2005]. Suppose U_g is reachable from the Krylov space K_k(A, x). Then for k ≥ 2m,

δ(U_g, K_k(A, x)) ≤ C_1 C_2 min_{φ∈P_{k−2m}} max_{z∈Ω_b} |α(z)φ(z)|.

C_1 = C_1(A, x) = measure of starting vector bias.
C_2 = C_2(A, Ω_b) = measure of eigenvector departure from orthogonality.
P_{k−2m} = set of polynomials of degree k − 2m or less.
Ω_b ⊂ C contains the undesired eigenvalues.
α(z) = (z − λ_1) ··· (z − λ_m).
Invariant Subspaces reachable by Krylov Subspaces
reachable invariant subspaces

If x ∈ C^n lacks a component in the desired eigenvector, e.g., P_1 x = 0, the desired eigenvalue/eigenvector is invisible to Krylov methods (in exact arithmetic). For example, in the power method

A^p x = Σ_{j=1}^{n} λ_j^p P_j x = 0 + Σ_{j=2}^{n} λ_j^p P_j x,

the eigenvalue λ_1 has no influence. (We will address this more later.)

A different problem arises when A has repeated eigenvalues with linearly independent eigenvectors (derogatory eigenvalues). A simple example: A = I (identity matrix).

K_k(A, x) = span{x, Ax, ..., A^{k−1}x} = span{x}.

The Krylov method converges in one step (happy breakdown), exactly finding one copy of the eigenvalue λ = 1 and eigenvector x.
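The derogatory case is easy to see numerically; with A = I the Krylov matrix never gains rank:

```python
import numpy as np

# With A = I, every Krylov vector A^j x equals x, so K_k(I, x) = span{x}:
# the method breaks down happily after one step, finding one copy of
# the eigenvalue 1 (with eigenvector x) and no others.
n = 6
A = np.eye(n)
x = np.arange(1.0, n + 1.0)
K = np.column_stack([np.linalg.matrix_power(A, j) @ x for j in range(4)])
r = np.linalg.matrix_rank(K)
print(r)   # 1
```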
reachable invariant subspaces

A more perplexing example from Chris Beattie [Beattie, E., Rossi 2004]: A is a 5×5 matrix built from a 3×3 and a 2×2 Jordan block for a single eigenvalue, and the starting vector x contains a free parameter c. By taking c large, we bias x as strongly as we like toward one particular eigenvector of A.

Yet for any c the Krylov method breaks down (happily) at iteration k = 3, discovering the 3×3 Jordan block and its invariant subspace: the eigenvector it finds is not the one toward which x was biased. Only a set of measure zero of starting vectors x can discover the other eigenvector.
reachable invariant subspaces

Unlucky choices of x ∈ C^n can (in principle) prevent the Krylov method from seeing a desired (simple) eigenvector. This behavior is fragile in numerical computations, since infinitesimal perturbations to x will add a small component in the desired eigenvector.

Single-vector Krylov methods can (in principle) find only one Jordan block associated with each eigenvalue. This behavior is fragile in numerical computations, since infinitesimal perturbations split multiple eigenvalues.

Block Krylov methods (with block size b, X ∈ C^{n×b}),

K_k(A, X) = Range([X AX A²X ··· A^{k−1}X]),

can find b linearly independent eigenvectors for a single eigenvalue.
How does the starting vector x affect convergence?
effect of starting vector on convergence

Henceforth assume the desired invariant subspace U_g is reachable from x: U_g ⊆ K_n(A, x). How does x influence convergence?

For a single eigenpair (λ_1, u_1) with spectral projector P_1, Saad [1980] gives

sin∠(u_1, K_k(A, x)) ≤ (‖x‖/‖P_1 x‖) min_{φ∈P_{k−1}, φ(λ_1)=1} ‖(I − P_1)φ(A)‖.

The leading constant grows as the orientation of x toward u_1 diminishes.

For m-dimensional invariant subspaces U_g, our bounds replace ‖x‖/‖P_1 x‖ with

C_1 := max_{ψ∈P_{m−1}} ‖ψ(A)P_b x‖ / ‖ψ(A)P_g x‖ = max_{v∈K_m(A,x)} ‖P_b v‖ / ‖P_g v‖,

the ratio of the bad to the good component in the worst approximation to U_g from the m-dimensional Krylov space.
effect of starting vector on convergence

A = symmetric matrix (n = 128) with 8 desired eigenvalues in [1, 2] and 120 undesired eigenvalues in an interval to their left.

θ = ∠(x, U_g), the angle between x and its best approximation in U_g.

C_1 = max_{v∈K_m(A,x)} ‖P_b v‖ / ‖P_g v‖

(figure: δ(U_g, K_k(A, x)) versus iteration k for starting vectors at several angles θ, from nearly aligned with U_g to nearly orthogonal to it; larger θ delays convergence, and the [BER 2004] bound tracks each curve.)
How do the eigenvalues of A affect convergence?
asymptotic convergence rate determined by eigenvalues

min_{φ∈P_{k−2m}} max_{z∈Ω_b} |α(z)φ(z)|

P_{k−2m} = set of polynomials of degree k − 2m or less.
Ω_b ⊂ C contains the undesired eigenvalues.
α(z) = (z − λ_1) ··· (z − λ_m).

The polynomial approximation problem gives convergence like C γ^k for some constant C and rate γ.

When A = A*, Ω_b = [λ_{m+1}, λ_n], and we use Chebyshev polynomials to compute the convergence rate γ.

When Ω_b is a simply connected open subset of C, use conformal mapping to approach the approximation problem.
potential theoretic determination of the convergence rate

Step 1: Begin by identifying the undesired and desired eigenvalues.
potential theoretic determination of the convergence rate

Step 2: Bound the bad eigenvalues with Ω_b.
potential theoretic determination of the convergence rate

Step 3: Conformally map C \ Ω_b to the exterior of the unit disk.
potential theoretic determination of the convergence rate

Step 4: Find the lowest level curve of the Green's function intersecting a good eigenvalue.
potential theoretic determination of the convergence rate

Step 5: Invert the map to obtain curves of constant convergence rate in the original domain.

Black circles on the final figure are Fejér points, asymptotically optimal interpolation points for φ.
convergence rate and granularity of the spectrum

If convergence is very slow, perhaps you are solving the wrong problem.
convergence rate and granularity of the spectrum

If convergence is very slow, perhaps you are solving the wrong problem.

Consider m = 1, where we can use the elementary bound [Saad 1980]

sin∠(u_1, K_k(A, x)) ≤ (‖x‖/‖P_1 x‖) min_{φ∈P_{k−1}, φ(λ_1)=1} ‖(I − P_1)φ(A)‖.

Suppose A = A* and we seek the leftmost eigenvalue λ_1, where λ_1 < λ_2 ≤ ··· ≤ λ_n. The error bound suggests the progress made at each iteration is like

γ := (√κ − 1)/(√κ + 1),   where κ := (λ_n − λ_1)/(λ_2 − λ_1).

When A is a discretization of an unbounded operator, we expect λ_n = ‖A‖ → ∞ as n → ∞. The convergence rate goes to one as n → ∞. Thus Krylov subspace methods often perform poorly for PDE eigenvalue problems unless the set-up is modified.
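A quick numerical illustration of this degradation, using the known eigenvalues λ_j = 4 sin²(jπh/2)/h², h = 1/(n+1), of the standard second-difference discretization of the 1D Laplacian (the values of n are assumed choices):

```python
import numpy as np

gammas = []
for n in [256, 1024, 4096]:
    h = 1.0 / (n + 1)
    j = np.arange(1, n + 1)
    lam = 4.0 * np.sin(j * np.pi * h / 2.0) ** 2 / h**2   # ascending eigenvalues
    kappa = (lam[-1] - lam[0]) / (lam[1] - lam[0])
    gammas.append((np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0))
print(gammas)   # the rate gamma creeps toward 1 as n grows
```

Since λ_n grows like 4/h² while λ_1 and λ_2 stay near π² and 4π², κ grows like n² and γ approaches 1: refining the mesh stalls the iteration.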
convergence slows as the discretization improves

Apply the Krylov method to discretizations of the Laplacian in one dimension. How does the convergence rate change as the discretization improves?

(figure: convergence curves for n = 256, 512, 1024, 2048, 4096; the finer the discretization, the slower the convergence.)
Convergence of Krylov Subspace Projection

The problem becomes immediately apparent if we attempt to run Krylov subspace projection on the operator itself,

K_k(L, f) = span{f, Lf, ..., L^{k−1}f}.

For Lu = −u″ with Dirichlet boundary conditions, u(0) = u(1) = 0, take some starting vector f ∈ Dom(L), i.e., f(0) = f(1) = 0. In general Lf ∉ Dom(L), so we cannot build the next Krylov direction L²f = L(Lf). The Krylov algorithm breaks down at the third step.

The operator setting suggests that we instead apply Krylov to L^{−1}:

K_k(L^{−1}, f) = span{f, L^{−1}f, ..., L^{−(k−1)}f}.

In this case, L^{−1} is a beautiful compact operator:

(L^{−1}f)(x) = −∫₀ˣ ∫₀ˢ f(t) dt ds + C_0 + C_1 x,

where we choose C_0 and C_1 so that (L^{−1}f)(0) = (L^{−1}f)(1) = 0.
krylov projection applied to the operator

We run the Krylov method on L^{−1} exactly in Mathematica. Denote the eigenvalue estimates at the kth iteration as θ_1^{(k)} ≤ θ_2^{(k)} ≤ ··· ≤ θ_k^{(k)}.

(figure: errors in the leading eigenvalue estimates versus k = 1, ..., 16, decaying ever faster as k grows.)

Observe superlinear convergence as k increases.

For CG and GMRES applied to operators, see [Winther 1980], [Nevanlinna 1993], [Moret 1997], [Olver 2009], [Kirby 2010]. For superlinear convergence in finite dimensional settings, see [van der Sluis, van der Vorst 1986], [van der Vorst, Vuik 1992], [Beattie, E., Rossi 2004], [Simoncini, Szyld 2005].

This mode of computation is preferred for discretization matrices as well: the shift-invert Arnoldi method uses K_k((A − µI)^{−1}, x).
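A dense-matrix sketch of shift-invert Krylov (in practice one factors A − µI once and reuses the factorization; the matrix and sizes below are assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, mu = 300, 15, 0.0
A = np.diag(np.arange(1.0, n + 1.0))        # assumed example; lambda_min = 1

# Build an orthonormal basis for K_k((A - mu I)^{-1}, x) by repeated solves.
# (Here a dense inverse stands in for a reusable sparse factorization.)
Ainv = np.linalg.inv(A - mu * np.eye(n))
x = rng.standard_normal(n)
V = np.zeros((n, k))
V[:, 0] = x / np.linalg.norm(x)
for j in range(1, k):
    w = Ainv @ V[:, j - 1]
    w -= V[:, :j] @ (V[:, :j].T @ w)        # Gram-Schmidt
    w -= V[:, :j] @ (V[:, :j].T @ w)        # reorthogonalize
    V[:, j] = w / np.linalg.norm(w)

# Rayleigh-Ritz with A itself recovers the eigenvalue nearest mu quickly.
theta = np.sort(np.linalg.eigvalsh(V.T @ A @ V))
print(theta[0])   # close to 1.0, the eigenvalue nearest mu = 0
```

The shift-invert transformation turns the eigenvalue nearest µ into a dominant, well-separated eigenvalue of (A − µI)^{−1}, so very few iterations suffice.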
polynomial preconditioning: a cheap spectral transformation

Replace the conventional Krylov space

K_k(A, x) = span{x, Ax, A²x, ..., A^{k−1}x}

with the polynomial preconditioned transformation

K_k(π(A), x) = span{x, π(A)x, π(A)²x, ..., π(A)^{k−1}x}.

[Thornquist 2006], [E., Loe, Morgan arXiv:86.82]

Use the polynomial π to separate the interesting eigenvalues. Often increases matvecs, but decreases iterations (hence orthogonalization). Very high degree polynomials are possible, but care is needed.

For example, with Hermitian A, take π to be the degree-d MINRES residual polynomial [Paige & Saunders 1975], which attains

min_{p∈P_d, p(0)=1} ‖p(A)b‖.

This polynomial tends to separate the smallest-magnitude eigenvalues.
polynomial preconditioning: a cheap spectral transformation

Polynomial preconditioning: Hermitian A, MINRES polynomial.

(figure: four panels with horizontal axis σ(A) and vertical axis σ(π(A)), showing how the MINRES residual polynomial π maps the spectrum of A so that the smallest-magnitude eigenvalues become well separated in σ(π(A)).)
How do the eigenvectors of A affect convergence?
bounding functions of a matrix

If A is normal (A*A = AA*), eigenvectors are orthogonal. For f analytic on σ(A),

‖f(A)‖ = max_{λ∈σ(A)} |f(λ)|.

For nonnormal A, the situation is considerably more complicated.

If A is diagonalizable, A = UΛU^{−1}, then

‖f(A)‖ ≤ ‖U‖ ‖U^{−1}‖ max_{λ∈σ(A)} |f(λ)|.

For the numerical range (field of values) W(A) = {v*Av : ‖v‖ = 1},

‖f(A)‖ ≤ (1 + √2) max_{z∈W(A)} |f(z)|.

For the ε-pseudospectrum σ_ε(A) = {z ∈ C : z ∈ σ(A + E) for some ‖E‖ < ε},

‖f(A)‖ ≤ (L_ε / (2πε)) max_{z∈σ_ε(A)} |f(z)|,

where L_ε is the boundary length of σ_ε(A).
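These bounds are easy to probe numerically. Here is an assumed 2×2 nonnormal example with f(z) = z^10, comparing ‖f(A)‖ to the spectral quantity and to the κ(U)-weighted bound:

```python
import numpy as np

a = 10.0                                  # assumed off-diagonal coupling
A = np.array([[0.9, a],
              [0.0, 0.8]])                # eigenvalues 0.9 and 0.8; nonnormal
fA = np.linalg.matrix_power(A, 10)        # f(A) for f(z) = z^10
nfA = np.linalg.norm(fA, 2)

lam, U = np.linalg.eig(A)
kappaU = np.linalg.cond(U)                # ||U|| ||U^{-1}||
spec = np.max(np.abs(lam)) ** 10          # would equal ||f(A)|| if A were normal

print(nfA, spec, kappaU * spec)
# ||f(A)|| far exceeds max |f(lambda)|, yet obeys kappa(U) * max |f(lambda)|
```

The nearly parallel eigenvectors make κ(U) large, and the transient growth of ‖A^k‖ sits between the spectral prediction and the diagonalization bound.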
constant to account for nonnormality

C_2 = C_2(A, Ω_b) comes from bounding ‖f(A|_{U_b})‖, f(z) = α(z)φ(z).

Theorem [Beattie, E., Sorensen 2005]. Suppose U_g is reachable from the Krylov space K_k(A, x). Then for k ≥ 2m,

δ(U_g, K_k(A, x)) ≤ C_1 C_2 min_{φ∈P_{k−2m}} max_{z∈Ω_b} |α(z)φ(z)|.

If Ω_b = σ(A|_{U_b}) (no defective eigenvalues), then C_2 = ‖U_b‖ ‖U_b^+‖, where the columns of U_b ∈ C^{n×(n−m)} are eigenvectors of A|_{U_b}.

If Ω_b = W(A|_{U_b}), then C_2 = 1 + √2.

If Ω_b = σ_ε(A|_{U_b}), then C_2 = L_ε/(2πε).

Tension: balance C_2 versus the size of Ω_b.
transient behavior of the power method

Large coefficients in the expansion of x in the eigenvector basis can lead to cancellation effects in x_k = A^k x.

Example: an upper triangular 3×3 matrix whose off-diagonal parameters α and β control the eigenvalue conditioning, with eigenvectors u_1, u_2, u_3 depending on α and β.

(figure: for three choices of α and β, the iterates x, x_1, x_2, ... wander far from u_1, lingering near other eigenvector directions before eventually converging to u_1.)

[Trefethen & E. 2005]
restarting krylov subspace methods — an essential tool for controlling subspace dimension
restarting krylov methods

restarted Arnoldi algorithm (eigs)

To compute m < k eigenvalues of A ∈ C^{n×n}, Arnoldi methods restrict A to act on the k-dimensional Krylov subspace

Ran(V) = span{x, Ax, ..., A^{k−1}x}.

Compute the eigenvalues of V*AV (Ritz values), and order them by relevance; e.g., if seeking the rightmost eigenvalue of A, let Re θ_1 ≥ Re θ_2 ≥ ··· ≥ Re θ_k.

Exact shifts [Sorensen 1992] restart the method, attempting to improve x with a polynomial filter having the unwanted Ritz values as roots:

x_+ = (A − θ_{m+1}I) ··· (A − θ_k I)x.

To understand convergence, one must understand how the Ritz values are distributed over σ(A).

(figure: undesired and desired eigenvalues of A in the complex plane.)
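The restart cycle above can be sketched directly (ARPACK applies the filter implicitly and far more stably; the matrix, sizes, and cycle count below are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 120, 12, 3
wanted = [0.8, 0.9, 1.0]                              # assumed rightmost eigenvalues
D = np.concatenate([np.linspace(-1.0, 0.5, n - m), wanted])
A = np.diag(D) + 0.0005 * rng.standard_normal((n, n))   # mildly nonnormal

def ritz_values(A, x, k):
    """Ritz values of A from the Krylov space K_k(A, x)."""
    V = np.zeros((n, k), dtype=complex)
    V[:, 0] = x / np.linalg.norm(x)
    for j in range(1, k):
        w = A @ V[:, j - 1]
        w -= V[:, :j] @ (V[:, :j].conj().T @ w)
        w -= V[:, :j] @ (V[:, :j].conj().T @ w)       # reorthogonalize
        V[:, j] = w / np.linalg.norm(w)
    return np.linalg.eigvals(V.conj().T @ A @ V)

x = rng.standard_normal(n).astype(complex)
for cycle in range(20):
    theta = sorted(ritz_values(A, x, k), key=lambda t: t.real)
    for shift in theta[: k - m]:                      # unwanted Ritz values
        x = A @ x - shift * x                         # x+ = (A - theta I) ... x
        x /= np.linalg.norm(x)

top3 = sorted(t.real for t in theta)[-m:]
print(top3)   # approaches the rightmost eigenvalues near 0.8, 0.9, 1.0
```

Each cycle damps the components of x associated with the leftward spectrum, so the fixed-dimension Krylov space progressively concentrates on the wanted invariant subspace.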
restarting krylov methods

(figure builds: the Ritz values overlaid on the spectrum; then the filter roots, with the unwanted Ritz values as roots and the wanted ones marked as approximate eigenvalues; then shading showing log10 of the filter polynomial magnitude over the complex plane.)
restarting krylov methods

Pushing the language Haim Avron used in the previous talk, a standard Krylov method (fixed k) is a sketch-and-solve method, while restarted Krylov methods sketch-to-precondition.
restarting krylov methods

[Sorensen 1992] proved convergence for A = A*. The process fails for some A ≠ A* [E. 2009], [Duintjer Tebbens, Meurant 2012]. Stringent sufficient conditions are known [Carden 2011].
Do nonsymmetric matrices enjoy any kind of interlacing?
interlacing is a key to convergence theory for A = A*

Cauchy's interlacing theorem assures us that Ritz values cannot bunch up at the ends of the spectrum.

(figure: eigenvalues (red lines) and Ritz values (black dots) at each iteration k.)
interlacing does not hold for A ≠ A*

The absence of interlacing for non-Hermitian problems is the major impediment to a full convergence theory and is the mechanism that allows the method to fail (in theory).

(figure: eigenvalues (red lines) and Ritz values (black dots = real parts) at each iteration k.)
a pathologically terrible example

A monster, built using the construction of [Duintjer Tebbens, Meurant 2012]: a small matrix A with carefully chosen entries of wildly varying magnitude, paired with a special starting vector x. arXiv:8.234

(figure: the eigenvalues of A and the Ritz values produced at each iteration, which behave as badly as possible.)
ritz value localization for non-hermitian matrices

Do non-Hermitian matrices obey any kind of interlacing theorem?

Ritz values must be contained within the numerical range

W(A) = {v*Av : ‖v‖ = 1},

a closed, convex subset of C that contains σ(A).

Consider an extreme example: a Jordan block A with eigenvalue 0, for which W(A) is a disk centered at the origin. Repeat the following experiment many times:

Generate a random two-dimensional subspace, V = Ran(V), where V*V = I.
Form V*AV ∈ C^{2×2} and compute the Ritz values {θ_1, θ_2} = σ(V*AV).
Identify the leftmost and rightmost Ritz values.

Since σ(A) = {0}, interlacing is meaningless here...
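This experiment is easy to reproduce. Every Ritz value must land in W(A), which for the n×n Jordan block with eigenvalue 0 is the disk of radius cos(π/(n+1)):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
A = np.diag(np.ones(n - 1), 1)            # Jordan block, sigma(A) = {0}
r = np.cos(np.pi / (n + 1))               # radius of the disk W(A)

inside = True
for _ in range(2000):
    Z = rng.standard_normal((n, 2)) + 1j * rng.standard_normal((n, 2))
    V = np.linalg.qr(Z)[0]                # random two-dimensional subspace
    theta = np.linalg.eigvals(V.conj().T @ A @ V)
    inside = inside and bool(np.all(np.abs(theta) <= r + 1e-12))
print(inside)   # True: every Ritz value lies in W(A), though sigma(A) = {0}
```

The containment follows because each θ lies in W(V*AV) ⊆ W(A) whenever V has orthonormal columns.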
ritz values of a jordan block

(figure: leftmost Ritz values (left panel) and rightmost Ritz values (right panel) scattered over the disk W(A), for many random (complex) two-dimensional subspaces.)
three matrices with identical W(A)

Compute k = 4 Ritz values for these 8×8 matrices (the parameters γ_1 and γ_3 set to give the same W(A) for all examples; ϱ = 1/8).

(figure: smallest magnitude of the k = 4 Ritz values, over many random complex subspaces.)
ritz value localization, sorted by real part

Using Schur's eigenvalue majorization theorem for Hermitian matrices, we can establish an interlacing-type result.

Theorem (Carden & E. 2012). Let θ_1, ..., θ_k denote the Ritz values of A ∈ C^{n×n} drawn from a k < n dimensional subspace, labeled by decreasing real part: Re θ_1 ≥ ··· ≥ Re θ_k. Then for j = 1, ..., k,

(µ_{n−k+j} + ··· + µ_n)/(k − j + 1) ≤ Re θ_j ≤ (µ_1 + ··· + µ_j)/j,

where µ_1 ≥ ··· ≥ µ_n are the eigenvalues of (A + A*)/2.

The fact that θ_j ∈ W(A) gives the well-known bound µ_n ≤ Re θ_j ≤ µ_1, j = 1, ..., k. The theorem provides sharper bounds for interior Ritz values.

The interior eigenvalues of (A + A*)/2 give additional insight; cf. eigenvalue inclusion regions of [Psarrakos & Tsatsomeros 2012].

The theorem applies to any subspace Range(V): Krylov, block Krylov, etc.
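Since the theorem applies to any subspace, it can be spot-checked on random data (the sizes below are assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 12, 5
A = rng.standard_normal((n, n))
mu = np.sort(np.linalg.eigvalsh((A + A.T) / 2.0))[::-1]   # mu_1 >= ... >= mu_n

V = np.linalg.qr(rng.standard_normal((n, k)))[0]          # random subspace
theta = np.sort(np.linalg.eigvals(V.T @ A @ V).real)[::-1]  # decreasing Re

ok = True
for j in range(1, k + 1):
    lower = mu[n - k + j - 1:].sum() / (k - j + 1)        # average of trailing mu's
    upper = mu[:j].sum() / j                              # average of leading mu's
    ok = ok and (lower - 1e-10 <= theta[j - 1] <= upper + 1e-10)
print(ok)   # True: each Re(theta_j) obeys the averaged bounds
```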
three matrices with identical W(A)

Three matrices with the same W(A), different interior structure.

(figure: k = 4 Ritz values over many random subspaces; the numbers on the right indicate the maximum number of Ritz values observed in each region, which differs among the three matrices.)
ritz value localization, sorted by magnitude

The log-majorization of products of eigenvalues by products of singular values [Marshall, Olkin, Arnold 2011] leads to a limit on Ritz value magnitudes.

Theorem (Carden & E. 2012). Let θ_1, ..., θ_k denote the Ritz values of A ∈ C^{n×n} drawn from a k < n dimensional subspace, labeled by decreasing magnitude: |θ_1| ≥ ··· ≥ |θ_k|. Then for j = 1, ..., k,

|θ_j| ≤ (s_1 ··· s_j)^{1/j},

where s_1 ≥ ··· ≥ s_n are the singular values of A.

Related results: Zvonimir Bujanović studies Ritz values of normal matrices from Krylov subspaces in his Ph.D. thesis (Zagreb). Jakeniah Christiansen [2012] studies real Ritz values for n = 3 (SIURO).
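The magnitude bound can be spot-checked the same way (assumed random example):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 12, 5
A = rng.standard_normal((n, n))
s = np.linalg.svd(A, compute_uv=False)                    # s_1 >= ... >= s_n

V = np.linalg.qr(rng.standard_normal((n, k)))[0]
theta = np.sort(np.abs(np.linalg.eigvals(V.T @ A @ V)))[::-1]  # decreasing |theta|

geo = np.cumprod(s[:k]) ** (1.0 / np.arange(1, k + 1))    # (s_1 ... s_j)^(1/j)
bounded = bool(np.all(theta <= geo + 1e-10))
print(bounded)   # True: |theta_j| <= (s_1 ... s_j)^(1/j)
```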
some closing thoughts

Krylov methods can further develop as a prominent tool for RandNLA.

Polynomials are better than powers! Krylov methods have great advantages over power/subspace iteration.

Block methods hold promise, but bring additional subtleties. Large subspaces can be built rapidly; one must maintain linear independence.

Restarting is crucial in engineering computations, but the analysis is tricky. Restarting controls the subspace dimension and refines the starting vector.

Spectral transformations (shift-invert) can vastly accelerate convergence. You are not entirely constrained by the eigenvalue distribution of A.

Non-Hermitian problems are solved every day. The theory is incomplete and monsters are easy to construct, but the Krylov method (as implemented in eigs/ARPACK) works well.

Can Random Matrix Theory shed light on Ritz value locations? What is the probability that A is stable if V*AV is stable?