Convergence Theory for Iterative Eigensolvers


Mark Embree (Virginia Tech), with Chris Beattie, Russell Carden, John Rossi, Dan Sorensen. RandNLA Workshop, Simons Institute, September 2018.

setting for the talk

Let A ∈ C^{n×n} be a large square matrix, potentially non-Hermitian (A ≠ A*). Computing all eigenvalues of A is too expensive (and usually not needed). Thus we seek m ≪ n distinguished eigenvalues relevant to our application (largest, smallest, rightmost, etc.).

Projection methods: let the projection subspace for A be a k-dimensional subspace of C^n, and let the columns of V ∈ C^{n×k} form an orthonormal basis for it:
$$V^*V = I, \qquad V^*AV \in \mathbb{C}^{k\times k}.$$
We hope some eigenvalues of V*AV, σ(V*AV) = {θ_1, ..., θ_k}, approximate some eigenvalues of A, σ(A) = {λ_1, ..., λ_n}; for example, θ_1 ≈ λ_1, ..., θ_m ≈ λ_m for some m ≤ k.

This talk mainly describes established results for the deterministic case, with some thoughts from a RandNLA perspective along the way.
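To make the projection step concrete, here is a minimal NumPy sketch (not from the talk; the test matrix and subspace are arbitrary illustrations): build an orthonormal basis V for a chosen subspace, form V*AV, and take its eigenvalues as Ritz values.

```python
# Minimal Rayleigh-Ritz projection sketch (illustrative test matrix and subspace).
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 10
A = rng.standard_normal((n, n))      # a generic non-Hermitian test matrix

X = rng.standard_normal((n, k))      # some k-dimensional subspace (here real)
V, _ = np.linalg.qr(X)               # orthonormal basis: V* V = I

H = V.conj().T @ A @ V               # the k x k projected matrix V* A V
ritz_values = np.linalg.eigvals(H)   # Ritz values: approximations to eigenvalues of A
print(np.sort_complex(ritz_values))
```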

why consider A ≠ A*?

While Hermitian problems are common (SVD, quantum mechanics, etc.), many important applications lead to non-Hermitian problems and subtler issues of spectral perturbation theory. Many examples [Trefethen, E. 2005]: atmospheric science (Farrell, ...), fluid flow stability (Trefethen; Schmid & Henningson; ...), damped mechanical systems (Cox & Zuazua, ...), control theory (Hinrichsen & Pritchard, ...), data-driven modeling (Antoulas, Beattie, Gugercin, ...), lasers (Landau, Siegman, ...), ecology (May, Caswell, ...), Markov chains (Diaconis, ...).

[Figure: computed rightmost eigenvalues for stability of 2d flow over a backward facing step, n = 38,539; E., Keeler 2017.]

An Overview of Projection-Based Eigensolvers

projection-based eigensolvers

With a k-dimensional projection subspace and an orthonormal basis matrix V as above (V*V = I, Ritz values from V*AV), the methods differ in how they build the subspace:

- Power method: span{A^p x} (minimal storage, easy to implement, can be slow).
- Subspace iteration: Range(A^p X) for X ∈ C^{n×b} (more storage, subtler to implement, computes multiple eigenvalues) [Halko, Martinsson, Tropp 2011] et al.
- Krylov subspace methods: span{x, Ax, A^2 x, ..., A^{k−1} x} (growing subspace dimension; higher powers of A); an Arnoldi-style construction of this basis is sketched below.
- Block Krylov methods: Range([X AX A^2 X ⋯ A^{k−1} X]) for X ∈ C^{n×b} (subspace dimension grows quickly: dim ≤ kb); for the SVD see [Musco & Musco 2015], [Drineas et al. 2018]. Must balance the benefit of large k against the block size b and storage.
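The sketch below builds the Krylov case with a bare-bones Arnoldi process (an assumption-laden illustration: no restarting, only a simple breakdown test), returning an orthonormal basis V for K_k(A, x) together with H = V*AV.

```python
# Bare-bones Arnoldi sketch: orthonormal basis for K_k(A, x) and H = V* A V.
import numpy as np

def arnoldi(A, x, k):
    n = len(x)
    V = np.zeros((n, k), dtype=complex)
    H = np.zeros((k, k), dtype=complex)
    V[:, 0] = x / np.linalg.norm(x)
    for j in range(k - 1):
        w = A @ V[:, j]
        for i in range(j + 1):              # modified Gram-Schmidt orthogonalization
            H[i, j] = np.vdot(V[:, i], w)
            w = w - H[i, j] * V[:, i]
        beta = np.linalg.norm(w)
        if beta < 1e-14:                    # happy breakdown: invariant subspace found
            return V[:, :j + 1], H[:j + 1, :j + 1]
        H[j + 1, j] = beta
        V[:, j + 1] = w / beta
    H[:, k - 1] = V.conj().T @ (A @ V[:, k - 1])   # complete the last column of V* A V
    return V, H

rng = np.random.default_rng(1)
A = rng.standard_normal((300, 300))
V, H = arnoldi(A, rng.standard_normal(300), 20)
print(np.sort_complex(np.linalg.eigvals(H))[-3:])  # a few Ritz values
```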

projection-based eigensolvers: krylov methods (extensions)

- Krylov subspace methods (growing subspace dimension; higher powers of A): span{x, Ax, A^2 x, ..., A^{k−1} x}.
- Restarted Krylov (used in eigs; the filter ψ improves the starting vector): span{ψ(A)x, Aψ(A)x, A^2 ψ(A)x, ..., A^{k−1} ψ(A)x}.
- Polynomial preconditioned Krylov (very high degree polynomials; care needed): span{x, π(A)x, π(A)^2 x, ..., π(A)^{k−1} x}.
- Shift-invert Krylov (used in eigs; ideal for eigenvalues near µ): span{x, (A − µI)^{−1} x, (A − µI)^{−2} x, ..., (A − µI)^{−(k−1)} x} (see the sketch below).
- Rational Krylov (helps for finding eigenvalues in a region): span{x, (A − µ_1 I)^{−1} x, (A − µ_2 I)^{−1} x, ..., (A − µ_{k−1} I)^{−1} x}.
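As an illustration of the shift-invert variant (the shift µ = 0 and the tridiagonal test matrix are my own choices), one can factor A − µI once and build the Krylov directions with triangular solves:

```python
# Shift-invert Krylov sketch: basis for span{x, (A - mu I)^{-1} x, ...} via sparse LU.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")  # test matrix

mu = 0.0                                   # target eigenvalues nearest mu
lu = spla.splu(A - mu * sp.identity(n, format="csc"))

k = 15
V = np.zeros((n, k))
v = np.random.default_rng(2).standard_normal(n)
V[:, 0] = v / np.linalg.norm(v)
for j in range(1, k):
    w = lu.solve(V[:, j - 1])              # one application of (A - mu I)^{-1}
    w -= V[:, :j] @ (V[:, :j].T @ w)       # orthogonalize against earlier directions
    V[:, j] = w / np.linalg.norm(w)

theta = np.linalg.eigvalsh(V.T @ (A @ V))  # Ritz values of A from this subspace
print(theta[:4])                           # approximations to eigenvalues nearest mu = 0
```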

preliminaries: spectral structure of A

Distinct eigenvalues of A: λ_1, λ_2, ..., λ_n. Spectral projectors P_j and invariant subspaces U_j:
$$P_j := \frac{1}{2\pi i}\int_{\Gamma_j}(zI - A)^{-1}\,dz, \qquad \mathcal{U}_j := \mathrm{Range}(P_j),$$
where Γ_j is a contour in C containing λ_j but no other distinct eigenvalues.

- If A = A* and λ_j is simple with unit eigenvector u_j, then P_j = u_j u_j^*.
- P_j is a projector onto the invariant subspace U_j, but P_j need not be an orthogonal projector when A ≠ A*.
- The spectral projectors give a resolution of the identity: Σ_j P_j = I.

Split the spectrum into good (desired) and bad (undesired) parts:
$$P_g := P_1 + \cdots + P_m, \qquad \mathcal{U}_g := \mathrm{Range}(P_g), \qquad m = \dim(\mathcal{U}_g),$$
$$P_b := I - P_g, \qquad \mathcal{U}_b := \mathrm{Range}(P_b), \qquad \dim(\mathcal{U}_b) = n - m.$$

preliminaries: angles between subspaces

The approximating subspace for our problems is K_k(A, x) := span{x, Ax, A^2 x, ..., A^{k−1} x}; U_g is the desired invariant subspace. Measure convergence via the containment gap:
$$\delta(\mathcal{U}_g, \mathcal{V}) = \max_{u\in\mathcal{U}_g} \sin\angle(u, \mathcal{V}) = \max_{u\in\mathcal{U}_g}\ \min_{v\in\mathcal{V}} \frac{\|u - v\|}{\|u\|}.$$
We will monitor how δ(U_g, K_k(A, x)) develops as k increases.
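The containment gap is easy to compute numerically: with orthonormal bases for U_g and the approximating subspace, δ(U_g, V) = ||(I − VV*)U_g||_2, the sine of the largest principal angle. A sketch with an arbitrary symmetric test matrix:

```python
# Containment gap sketch: delta(Ug, V) = ||(I - V V*) Ug||_2 = sin(largest principal angle).
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(3)
n, m, k = 120, 3, 25
A = rng.standard_normal((n, n)); A = (A + A.T) / 2     # symmetric test matrix

lam, U = np.linalg.eigh(A)
Ug = U[:, -m:]                      # desired invariant subspace: top m eigenvectors

x = rng.standard_normal(n)          # orthonormal basis for the Krylov space K_k(A, x)
V = np.zeros((n, k))
V[:, 0] = x / np.linalg.norm(x)
for j in range(1, k):
    w = A @ V[:, j - 1]
    w -= V[:, :j] @ (V[:, :j].T @ w)
    V[:, j] = w / np.linalg.norm(w)

gap = np.linalg.norm(Ug - V @ (V.T @ Ug), 2)           # ||(I - V V^T) Ug||_2
print(gap, np.sin(subspace_angles(Ug, V)).max())       # the two quantities agree
```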

example convergence behavior, A ≠ A*

[Figure: δ(U_g, K_k(A, x)) versus subspace dimension k, showing a warm-up phase for k < m, sublinear convergence (starting vector bias, nonorthogonal eigenvectors), and then steady convergence (eigenvalue location). Cf. the GMRES convergence model of [Nevanlinna 1993]; example adapted from [Beattie, E., Rossi 2004].]

basic convergence model

Building on [Saad 1980, 1983], [Jia 1995], [Sorensen 2002], [Beattie, E., Rossi 2004], and others...

Theorem [Beattie, E., Sorensen 2005]. Suppose U_g is reachable from the Krylov space K_k(A, x). Then for k ≥ 2m,
$$\delta(\mathcal{U}_g, \mathcal{K}_k(A,x)) \;\le\; C_1\, C_2 \min_{\phi\in P_{k-2m}}\ \max_{z\in\Omega_b} |\alpha(z)\phi(z)|.$$
- C_1 = C_1(A, x) = measure of starting vector bias.
- C_2 = C_2(A, Ω_b) = measure of eigenvector departure from orthogonality.
- P_{k−2m} = set of polynomials of degree k − 2m or less.
- Ω_b ⊂ C contains the undesired eigenvalues.
- α(z) = (z − λ_1)⋯(z − λ_m).

Invariant Subspaces reachable by Krylov Subspaces

reachable invariant subspaces

If x ∈ C^n lacks a component in the desired eigenvector, e.g., P_1 x = 0, the desired eigenvalue/eigenvector is invisible to Krylov methods (in exact arithmetic). For example, in the power method
$$A^p x = \sum_{j=1}^n \lambda_j^p P_j x = 0 + \sum_{j=2}^n \lambda_j^p P_j x,$$
the eigenvalue λ_1 has no influence. (We will address this more later.)

A different problem arises when A has repeated eigenvalues with linearly independent eigenvectors (derogatory eigenvalues). A simple example: A = I (the identity matrix), for which K_k(A, x) = span{x, Ax, ..., A^{k−1} x} = span{x}. The Krylov method converges in one step (happy breakdown), exactly finding one copy of the eigenvalue λ = 1 and the eigenvector x.

reachable invariant subspaces

A more perplexing example from Chris Beattie [Beattie, E., Rossi 2004]: a 5 × 5 matrix A with more than one Jordan block for the same eigenvalue, and a starting vector x containing a free parameter c. By taking c large, we bias x toward one particular eigenvector of A. Yet for any c the Krylov method breaks down (happily) at iteration k = 3, discovering a 3 × 3 Jordan block and its associated invariant subspace rather than the eigenvector x was biased toward. Only a set of starting vectors x of measure zero can discover that remaining eigenvector.

reachable invariant subspaces

- Unlucky choices of x ∈ C^n can (in principle) prevent the Krylov method from seeing a desired (simple) eigenvector. This behavior is fragile in numerical computations, since infinitesimal perturbations to x will add a small component in the desired eigenvector.
- Single-vector Krylov methods can (in principle) find only one Jordan block associated with each eigenvalue. This behavior is fragile in numerical computations, since infinitesimal perturbations split multiple eigenvalues.
- Block Krylov methods (with block size b, X ∈ C^{n×b}),
$$\mathcal{K}_k(A, X) = \mathrm{Range}([X\ \ AX\ \ A^2X\ \cdots\ A^{k-1}X]),$$
can find b linearly independent eigenvectors for a single eigenvalue (see the sketch below).
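A small sketch of this last point (the matrix is a contrived example with a double eigenvalue; block size b = 2): a single-vector Krylov space captures only one Ritz value near the repeated eigenvalue, while the block space captures two.

```python
# Block Krylov sketch: capturing both copies of a repeated eigenvalue (block size b = 2).
import numpy as np

rng = np.random.default_rng(4)
n = 60
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
d = np.linspace(0.1, 1.0, n); d[-2:] = 2.0       # plant a double eigenvalue at 2
A = Q @ np.diag(d) @ Q.T                         # symmetric test matrix

def krylov_ritz(A, X, steps):
    blocks = [X]
    for _ in range(steps - 1):
        blocks.append(A @ blocks[-1])
    V, _ = np.linalg.qr(np.column_stack(blocks)) # orthonormal basis of the (block) Krylov space
    return np.linalg.eigvalsh(V.T @ A @ V)

x = rng.standard_normal((n, 1))                  # single starting vector
X = rng.standard_normal((n, 2))                  # starting block, b = 2
print(krylov_ritz(A, x, 10)[-3:])  # expect only one Ritz value near 2
print(krylov_ritz(A, X, 10)[-3:])  # expect two Ritz values near 2
```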

How does the starting vector x affect convergence?

effect of starting vector on convergence

Henceforth assume the desired invariant subspace U_g is reachable from x: U_g ⊆ K_n(A, x). How does x influence convergence?

For a single eigenpair (λ_1, u_1) with spectral projector P_1, Saad [1980] gives
$$\sin\angle(u_1, \mathcal{K}_k(A,x)) \;\le\; \frac{1}{\|P_1 x\|} \min_{\substack{\phi\in P_{k-1}\\ \phi(\lambda_1)=1}} \|(I - P_1)\,\phi(A)\,x\|.$$
The leading constant grows as the orientation of x toward u_1 diminishes.

For m-dimensional invariant subspaces U_g, our bounds replace 1/||P_1 x|| with
$$C_1 := \max_{\psi\in P_{m-1}} \frac{\|\psi(A)P_b x\|}{\|\psi(A)P_g x\|} = \max_{v\in\mathcal{K}_m(A,x)} \frac{\|P_b v\|}{\|P_g v\|},$$
the ratio of the bad to the good component in the worst approximation to U_g from the m-dimensional Krylov space.

effect of starting vector on convergence

A = symmetric matrix (n = 128) with 8 desired eigenvalues in [1, 2] and 120 undesired eigenvalues in [0, 1]. Let θ = ∠(x, U_g), the angle between x and its best approximation in U_g.

[Figure: δ(U_g, K_k(A, x)) versus iteration k for a range of angles θ, together with the [BER 2004] bound built from C_1 = max_{v ∈ K_m(A,x)} ||P_b v|| / ||P_g v||. A starting vector tilted farther from U_g delays convergence, but the eventual rate is the same.]

How do the eigenvalues of A affect convergence?

asymptotic convergence rate determined by eigenvalues

$$\min_{\phi\in P_{k-2m}}\ \max_{z\in\Omega_b} |\alpha(z)\phi(z)|$$
- P_{k−2m} = set of polynomials of degree k − 2m or less.
- Ω_b ⊂ C contains the undesired eigenvalues.
- α(z) = (z − λ_1)⋯(z − λ_m).

The polynomial approximation problem gives convergence like C γ^k for some constant C and rate γ.
- When A = A*, take Ω_b = [λ_{m+1}, λ_n] and use Chebyshev polynomials to compute the convergence rate γ.
- When Ω_b is a simply connected open subset of C, use conformal mapping to approach the approximation problem.

potential theoretic determination of the convergence rate

- Step 1: Begin by identifying the undesired and desired eigenvalues.
- Step 2: Enclose the bad (undesired) eigenvalues in a region Ω_b.
- Step 3: Conformally map C \ Ω_b to the exterior of the unit disk.
- Step 4: Find the lowest level curve of the Green's function that intersects a good eigenvalue.
- Step 5: Invert the map to obtain curves of constant convergence rate in the original domain.

The black circles on the final figure are Fejér points, asymptotically optimal interpolation points for φ.

convergence rate and the granularity of the spectrum

If convergence is very slow, perhaps you are solving the wrong problem.

Consider m = 1, where we can use the elementary bound [Saad 1980]
$$\sin\angle(u_1, \mathcal{K}_k(A,x)) \;\le\; \frac{1}{\|P_1 x\|} \min_{\substack{\phi\in P_{k-1}\\ \phi(\lambda_1)=1}} \|(I - P_1)\,\phi(A)\,x\|.$$
Suppose A = A* and we seek the leftmost eigenvalue λ_1, where λ_1 < λ_2 ≤ ⋯ ≤ λ_n. The error bound suggests the progress made at each iteration is like
$$\gamma := \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}, \qquad \text{where } \kappa := \frac{\lambda_n - \lambda_1}{\lambda_2 - \lambda_1}.$$
When A is a discretization of an unbounded operator, we expect λ_n ≈ ||A|| → ∞ as n → ∞, so the convergence rate γ tends to one as n → ∞. Thus Krylov subspace methods often perform poorly for PDE eigenvalue problems unless the set-up is modified. (The sketch below makes this concrete for the one-dimensional Laplacian.)
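A small illustration of my own, using the known eigenvalues of the standard second-order finite-difference 1D Laplacian, shows γ creeping toward 1 as the mesh is refined:

```python
# gamma = (sqrt(kappa) - 1) / (sqrt(kappa) + 1), kappa = (lam_n - lam_1) / (lam_2 - lam_1),
# for the standard finite-difference 1D Laplacian: the rate tends to 1 as n grows.
import numpy as np

for n in [256, 512, 1024, 2048, 4096]:
    h = 1.0 / (n + 1)
    j = np.arange(1, n + 1)
    lam = (4.0 / h**2) * np.sin(j * np.pi / (2 * (n + 1)))**2   # exact eigenvalues
    kappa = (lam[-1] - lam[0]) / (lam[1] - lam[0])
    gamma = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    print(f"n = {n:5d}   gamma = {gamma:.6f}")
```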

convergence slows as the discretization improves

Apply the Krylov method to discretizations of the Laplacian in one dimension. How does the convergence rate change as the discretization improves?

[Figure: convergence histories for n = 256, 512, 1024, 2048, 4096; the convergence rate degrades as n grows.]

Convergence of Krylov Subspace Projection

The problem becomes immediately apparent if we attempt to run Krylov subspace projection on the operator itself,
$$\mathcal{K}_k(L, f) = \mathrm{span}\{f, Lf, \ldots, L^{k-1}f\}.$$
For Lu = −u″ with Dirichlet boundary conditions u(0) = u(1) = 0, take some starting vector f ∈ Dom(L), i.e., f(0) = f(1) = 0. In general Lf ∉ Dom(L), so we cannot build the next Krylov direction L²f = L(Lf). The Krylov algorithm breaks down at the third step.

The operator setting suggests that we instead apply Krylov to L^{−1}:
$$\mathcal{K}_k(L^{-1}, f) = \mathrm{span}\{f, L^{-1}f, \ldots, L^{-(k-1)}f\}.$$
In this case, L^{−1} is a beautiful compact operator:
$$(L^{-1}f)(x) = -\int_0^x\!\!\int_0^s f(t)\,dt\,ds + C_0 + C_1 x,$$
where we choose C_0 and C_1 so that (L^{−1}f)(0) = (L^{−1}f)(1) = 0.

krylov projection applied to the operator

We run the Krylov method on L^{−1} exactly in Mathematica. Denote the eigenvalue estimates at the kth iteration as θ_1^{(k)} ≤ θ_2^{(k)} ≤ ⋯ ≤ θ_k^{(k)}.

[Figure: errors in the leading eigenvalue estimates versus k.]

Observe superlinear convergence as k increases. For CG and GMRES applied to operators, see [Winther 1980], [Nevanlinna 1993], [Moret 1997], [Olver 2009], [Kirby 2010]. For superlinear convergence in finite dimensional settings, see [van der Sluis, van der Vorst 1986], [van der Vorst, Vuik 1992], [Beattie, E., Rossi 2004], [Simoncini, Szyld 2005].

This mode of computation is preferred for discretization matrices as well: the shift-invert Arnoldi method uses K_k((A − µI)^{−1}, x) (see the sketch below).
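For the discretized 1D Laplacian this is exactly what SciPy's ARPACK interface does when given a shift; a sketch (the shift σ = 0 and mesh size are my choices):

```python
# Shift-invert Lanczos/Arnoldi via ARPACK for the smallest eigenvalues of the 1D Laplacian.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 2000
h = 1.0 / (n + 1)
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") / h**2

# Krylov space built with (A - sigma I)^{-1}, sigma = 0: targets the smallest eigenvalues.
vals = eigsh(A, k=3, sigma=0.0, which="LM", return_eigenvectors=False)

exact = (4.0 / h**2) * np.sin(np.arange(1, 4) * np.pi / (2 * (n + 1)))**2
print(np.sort(vals))
print(exact)        # shift-inverted iteration matches these in very few steps
```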

polynomial preconditioning: a cheap spectral transformation

Replace the conventional Krylov space
$$\mathcal{K}_k(A, x) = \mathrm{span}\{x, Ax, A^2x, \ldots, A^{k-1}x\}$$
with the polynomial preconditioned transformation
$$\mathcal{K}_k(\pi(A), x) = \mathrm{span}\{x, \pi(A)x, \pi(A)^2x, \ldots, \pi(A)^{k-1}x\}.$$
[Thornquist 2006], [E., Loe, Morgan, arXiv:1806.08020]

- Use the polynomial π to separate the interesting eigenvalues.
- Often increases matvecs, but decreases iterations (hence orthogonalization).

For example, with Hermitian A, take π to be the degree-d MINRES residual polynomial [Paige & Saunders 1975], which attains
$$\min_{\substack{p\in P_d\\ p(0)=1}} \|p(A)b\|.$$
This polynomial tends to separate the smallest-magnitude eigenvalues.

polynomial preconditioning: a cheap spectral transformation

[Figure: the MINRES polynomial spectral transformation for Hermitian A; horizontal axis: σ(A), vertical axis: σ(π(A)). The transformation spreads out the eigenvalues of interest.] A simplified polynomial-preconditioning sketch follows.
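The slide uses the MINRES residual polynomial; as a simpler stand-in with the same flavor, the sketch below uses a Chebyshev polynomial π that is small on an unwanted interval [a, b] and large on the (small) wanted eigenvalues, then builds a Krylov space with π(A). The matrix, interval, and degree are illustrative choices, not the construction from the talk.

```python
# Polynomial-preconditioned Krylov sketch with a Chebyshev stand-in for pi.
import numpy as np

rng = np.random.default_rng(5)
n = 400
evals = np.concatenate([np.linspace(10.0, 100.0, n - 4), [0.5, 1.0, 1.5, 2.0]])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(evals) @ Q.T                 # symmetric: four small "wanted" eigenvalues

a, b, d = 10.0, 100.0, 8                     # damp [a, b] with a degree-d Chebyshev polynomial
def apply_pi(v):                             # v -> pi(A) v via the three-term recurrence
    B = lambda w: (2.0 * (A @ w) - (b + a) * w) / (b - a)    # map [a, b] to [-1, 1]
    t_prev, t_curr = v, B(v)
    for _ in range(d - 1):
        t_prev, t_curr = t_curr, 2.0 * B(t_curr) - t_prev
    return t_curr                            # = T_d((2A - (a + b)I) / (b - a)) v

k = 12
V = np.zeros((n, k))
x = rng.standard_normal(n)
V[:, 0] = x / np.linalg.norm(x)
for j in range(1, k):                        # Krylov space of pi(A), not of A
    w = apply_pi(V[:, j - 1])
    w -= V[:, :j] @ (V[:, :j].T @ w)
    V[:, j] = w / np.linalg.norm(w)

print(np.linalg.eigvalsh(V.T @ A @ V)[:4])   # Ritz values of A near 0.5, 1.0, 1.5, 2.0
```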

How do the eigenvectors of A affect convergence?

bounding functions of a matrix

If A is normal (A*A = AA*), the eigenvectors are orthogonal, and for f analytic on σ(A),
$$\|f(A)\| = \max_{\lambda\in\sigma(A)} |f(\lambda)|.$$
For nonnormal A, the situation is considerably more complicated.
- If A is diagonalizable, A = UΛU^{−1}, then
$$\|f(A)\| \le \|U\|\,\|U^{-1}\| \max_{\lambda\in\sigma(A)} |f(\lambda)|.$$
- For the numerical range (field of values) W(A) = {v*Av : ||v|| = 1},
$$\|f(A)\| \le (1 + \sqrt{2}) \max_{z\in W(A)} |f(z)|.$$
- For the ε-pseudospectrum σ_ε(A) = {z ∈ σ(A + E) for some ||E|| < ε},
$$\|f(A)\| \le \frac{L_\varepsilon}{2\pi\varepsilon} \max_{z\in\sigma_\varepsilon(A)} |f(z)|,$$
where L_ε is the boundary length of σ_ε(A).

constant to account for nonnormality

C_2 = C_2(A, Ω_b) comes from bounding ||f(A|_{U_b})|| for f(z) = α(z)φ(z).

Theorem [Beattie, E., Sorensen 2005]. Suppose U_g is reachable from the Krylov space K_k(A, x). Then for k ≥ 2m,
$$\delta(\mathcal{U}_g, \mathcal{K}_k(A,x)) \;\le\; C_1\, C_2 \min_{\phi\in P_{k-2m}}\ \max_{z\in\Omega_b} |\alpha(z)\phi(z)|.$$
- If Ω_b = σ(A|_{U_b}) (no defective eigenvalues), then C_2 = ||U_b|| ||U_b^+||, where the columns of U_b ∈ C^{n×(n−m)} are eigenvectors of A|_{U_b}.
- If Ω_b = W(A|_{U_b}), then C_2 = 1 + √2.
- If Ω_b = σ_ε(A|_{U_b}), then C_2 = L_ε/(2πε).

Tension: balance C_2 versus the size of Ω_b.

transient behavior of the power method

Large coefficients in the expansion of x in the eigenvector basis can lead to cancellation effects in x_k = A^k x. Example: a 3 × 3 matrix A whose entries depend on parameters α and β, so that different choices of α and β affect the eigenvalue conditioning.

[Figure: power method iterates x_0, x_1, x_2, ... shown relative to the eigenvectors u_1, u_2, u_3 for three choices of (α, β). From Trefethen & E. 2005.]

restarting krylov subspace methods: an essential tool for controlling subspace dimension

restarting krylov methods: the restarted Arnoldi algorithm (eigs)

To compute m < k eigenvalues of A ∈ C^{n×n}, Arnoldi methods restrict A to act on the k-dimensional Krylov subspace
$$\mathrm{Ran}(V) = \mathrm{span}\{x, Ax, \ldots, A^{k-1}x\}.$$
Compute the eigenvalues of V*AV (the Ritz values) and order them by relevance; e.g., if seeking the rightmost eigenvalue of A, let Re θ_1 ≥ Re θ_2 ≥ ⋯ ≥ Re θ_k. Exact shifts [Sorensen 1992] restart the method, attempting to improve the starting vector with a polynomial filter having the unwanted Ritz values as roots:
$$x_+ = (A - \theta_{m+1}I)\cdots(A - \theta_k I)\,x.$$
To understand convergence, one must understand how the Ritz values are distributed over σ(A).

[Figures: undesired and desired eigenvalues, the Ritz values, the filter roots and approximate eigenvalues, and the magnitude of the filter polynomial on a log_10 scale: the filter vanishes at the unwanted Ritz values and is comparatively large near the desired eigenvalues.]

Pushing the language Haim Avron used in the previous talk, a standard Krylov method (fixed k) is a sketch-and-solve method, while restarted Krylov methods sketch-to-precondition.

[Sorensen 1992] proved convergence for A = A*. The process fails for some A ≠ A* [E. 2009], [Duintjer Tebbens, Meurant 2012]. Stringent sufficient conditions are known [Carden 2011]. (An explicit-restart sketch of the exact-shift idea appears below.)
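For intuition, here is an explicitly restarted sketch of the exact-shift idea (ARPACK/eigs applies these shifts implicitly and far more stably; the test matrix, with three planted rightmost eigenvalues, and all dimensions are my own choices).

```python
# Explicitly restarted Arnoldi sketch with exact shifts (illustration only).
import numpy as np

def krylov_basis(A, x, k):
    n = len(x)
    V = np.zeros((n, k), dtype=complex)
    V[:, 0] = x / np.linalg.norm(x)
    for j in range(1, k):
        w = A @ V[:, j - 1]
        w -= V[:, :j] @ (V[:, :j].conj().T @ w)
        V[:, j] = w / np.linalg.norm(w)
    return V

rng = np.random.default_rng(7)
n, k, m = 500, 12, 3
A = rng.standard_normal((n, n)) / np.sqrt(n)     # bulk spectrum roughly in the unit disk
A[:3, :3] += np.diag([2.0, 1.8, 1.6])            # plant three rightmost eigenvalues

x = rng.standard_normal(n).astype(complex)
for restart in range(20):
    V = krylov_basis(A, x, k)
    theta = np.linalg.eigvals(V.conj().T @ A @ V)
    theta = theta[np.argsort(-theta.real)]       # order Ritz values by decreasing real part
    x = V[:, 0]
    for shift in theta[m:]:                      # exact shifts: the unwanted Ritz values
        x = A @ x - shift * x                    # apply the restart filter polynomial
    x = x / np.linalg.norm(x)

print(np.sort(theta[:m].real))   # wanted Ritz values: real parts near 1.6, 1.8, 2.0
```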

Do nonsymmetric matrices enjoy any kind of interlacing?

interlacing is a key to convergence theory for A = A*

Cauchy's interlacing theorem assures us that Ritz values cannot bunch up at the ends of the spectrum.

[Figure: eigenvalues (red lines) and Ritz values (black dots) at each iteration k.]

interlacing does not hold for A ≠ A*

The absence of interlacing for non-Hermitian problems is the major impediment to a full convergence theory, and it is the mechanism that allows the method to fail (in theory).

[Figure: eigenvalues (red lines) and Ritz values (black dots = real parts) at each iteration k.]

a pathologically terrible example

A monster, built using the construction of [Duintjer Tebbens, Meurant 2012]: a small matrix A with enormous, carefully chosen entries and a matching starting vector x for which exact-shift restarting behaves as badly as possible. (arXiv:8.234)

[Figure: eigenvalues and Ritz value behavior for this example.]

ritz value localization for non-hermitian matrices

Do non-Hermitian matrices obey any kind of interlacing theorem? Ritz values must be contained within the numerical range
$$W(A) = \{v^*Av : \|v\| = 1\},$$
a closed, convex subset of C that contains σ(A).

Consider an extreme example: A an n × n Jordan block with eigenvalue 0, for which W(A) is a disk centered at the origin. Repeat the following experiment many times:
- Generate a random two-dimensional subspace V = Ran(V), where V*V = I.
- Form V*AV ∈ C^{2×2} and compute the Ritz values {θ_1, θ_2} = σ(V*AV).
- Identify the leftmost and rightmost Ritz values.
Since σ(A) = {0}, interlacing is meaningless here... (a sketch of this experiment follows).
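This experiment is easy to reproduce; a sketch (the dimension n and number of trials are my choices; the stated radius cos(π/(n+1)) is the known numerical radius of the nilpotent Jordan block):

```python
# Ritz values of an n x n Jordan block from random two-dimensional subspaces.
import numpy as np

n, trials = 8, 2000
A = np.diag(np.ones(n - 1), 1)            # Jordan block with eigenvalue 0

rng = np.random.default_rng(8)
leftmost, rightmost = [], []
for _ in range(trials):
    Z = rng.standard_normal((n, 2)) + 1j * rng.standard_normal((n, 2))
    V, _ = np.linalg.qr(Z)                # random two-dimensional subspace, V* V = I
    theta = np.linalg.eigvals(V.conj().T @ A @ V)
    theta = theta[np.argsort(theta.real)]
    leftmost.append(theta[0])
    rightmost.append(theta[-1])

# Every Ritz value lies in W(A), the disk of radius cos(pi/(n+1)) about the origin.
radius = np.cos(np.pi / (n + 1))
print(max(abs(t) for t in leftmost + rightmost), "<=", radius)
```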

ritz values of a jordan block

[Figure: scatter plots of the leftmost (left) and rightmost (right) Ritz values over many random (complex) two-dimensional subspaces; all Ritz values remain inside the disk W(A).]

three matrices with identical W(A)

Compute k = 4 Ritz values for three 8 × 8 matrices that share the same numerical range W(A) but have different interior structure (the parameters γ_1 and γ_3 are set to give the same W(A) for all examples).

[Figure: smallest magnitude among the k = 4 Ritz values over many random complex subspaces, for each of the three matrices.]

ritz value localization, sorted by real part

Using Schur's eigenvalue majorization theorem for Hermitian matrices, we can establish an interlacing-type result.

Theorem (Carden & E. 2012). Let θ_1, ..., θ_k denote the Ritz values of A ∈ C^{n×n} drawn from a k < n dimensional subspace, labeled by decreasing real part: Re θ_1 ≥ ⋯ ≥ Re θ_k. Then for j = 1, ..., k,
$$\frac{\mu_{n-k+j} + \cdots + \mu_n}{k - j + 1} \;\le\; \mathrm{Re}\,\theta_j \;\le\; \frac{\mu_1 + \cdots + \mu_j}{j},$$
where μ_1 ≥ ⋯ ≥ μ_n are the eigenvalues of (A + A*)/2.

- The fact that θ_j ∈ W(A) gives the well-known bound μ_n ≤ Re θ_j ≤ μ_1, j = 1, ..., k. The theorem provides sharper bounds for the interior Ritz values.
- The interior eigenvalues of (A + A*)/2 give additional insight; cf. the eigenvalue inclusion regions of [Psarrakos & Tsatsomeros 2012].
- The theorem applies to any subspace Range(V): Krylov, block Krylov, etc.

(A quick numerical check of these bounds appears below.)
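A numerical check on a random example (the test matrix, subspace, and dimensions are arbitrary choices):

```python
# Numerical check of the real-part localization bounds of Carden & E. (2012).
import numpy as np

rng = np.random.default_rng(9)
n, k = 40, 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

V, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))
theta = np.linalg.eigvals(V.conj().T @ A @ V)
theta = theta[np.argsort(-theta.real)]                        # Re theta_1 >= ... >= Re theta_k

mu = np.sort(np.linalg.eigvalsh((A + A.conj().T) / 2))[::-1]  # mu_1 >= ... >= mu_n

for j in range(1, k + 1):
    lower = mu[n - k + j - 1:].mean()     # (mu_{n-k+j} + ... + mu_n) / (k - j + 1)
    upper = mu[:j].mean()                 # (mu_1 + ... + mu_j) / j
    assert lower - 1e-10 <= theta[j - 1].real <= upper + 1e-10
print("real-part bounds hold for j = 1, ..., k")
```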

three matrices with identical W(A)

Three matrices with the same W(A) but different interior structure, over many random trials. [Figure: for k = 4, the regions determined by the real-part bounds; the numbers on the right indicate the maximum number of Ritz values allowed in each region for each matrix.]

ritz value localization, sorted by magnitude

The log-majorization of products of eigenvalues by products of singular values [Marshall, Olkin, Arnold 2011] leads to a limit on Ritz value magnitudes.

Theorem (Carden & E. 2012). Let θ_1, ..., θ_k denote the Ritz values of A ∈ C^{n×n} drawn from a k < n dimensional subspace, labeled by decreasing magnitude: |θ_1| ≥ ⋯ ≥ |θ_k|. Then for j = 1, ..., k,
$$|\theta_j| \le (s_1 \cdots s_j)^{1/j},$$
where s_1 ≥ ⋯ ≥ s_n are the singular values of A.

Related results: Zvonimir Bujanovic [2011] studies Ritz values of normal matrices from Krylov subspaces in his Ph.D. thesis (Zagreb); Jakeniah Christiansen [2012] studies real Ritz values for n = 3 (SIURO). (A numerical check of the magnitude bound appears below.)
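And a matching check of the magnitude bound (same illustrative setup as before):

```python
# Numerical check of the Ritz value magnitude bound |theta_j| <= (s_1 ... s_j)^(1/j).
import numpy as np

rng = np.random.default_rng(10)
n, k = 40, 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

V, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))
theta = np.linalg.eigvals(V.conj().T @ A @ V)
theta = theta[np.argsort(-np.abs(theta))]         # |theta_1| >= ... >= |theta_k|

s = np.linalg.svd(A, compute_uv=False)            # s_1 >= ... >= s_n

for j in range(1, k + 1):
    geo_mean = np.exp(np.log(s[:j]).mean())       # (s_1 ... s_j)^(1/j)
    assert abs(theta[j - 1]) <= geo_mean + 1e-10
print("magnitude bounds hold for j = 1, ..., k")
```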

some closing thoughts

Krylov methods can further develop as a prominent tool for RandNLA.
- Polynomials are better than powers! Krylov methods have great advantages over power/subspace iteration.
- Block methods hold promise, but bring additional subtleties. Large subspaces can be built rapidly; one must maintain linear independence.
- Restarting is crucial in engineering computations, but the analysis is tricky. Restarting controls the subspace dimension and refines the starting vector.
- Spectral transformations (shift-invert) can vastly accelerate convergence. You are not entirely constrained by the eigenvalue distribution of A.
- Non-Hermitian problems are solved every day. The theory is incomplete and monsters are easy to construct, but the Krylov method (as implemented in eigs/ARPACK) works well.
- Can Random Matrix Theory shed light on Ritz value locations? What is the probability that A is stable if V*AV is stable?