Hessenberg eigenvalue-eigenvector relations and their application to the error analysis of finite precision Krylov subspace methods


Hessenberg eigenvalue-eigenvector relations and their application to the error analysis of finite precision Krylov subspace methods

Jens-Peter M. Zemke

Minisymposium on Numerical Linear Algebra
Technical University Hamburg-Harburg
10.07.2002

Table of Contents

The Menagerie of Krylov Methods
A Unified Matrix Description of Krylov Methods
Perturbed Krylov Decompositions
A Short Excursion on Matrix Structure
Error Analysis Revisited
Open Questions

The Menagerie of Krylov Methods

o Lanczos based methods (short term methods)
o Arnoldi based methods (long term methods)
o eigensolvers: Av = vλ
o linear system solvers: Ax = b
o (quasi-) orthogonal residual approaches: (Q)OR
o (quasi-) minimal residual approaches: (Q)MR

Extensions:
o Lanczos based methods:
  o look-ahead
  o product-type (LTPMs)
  o applied to the normal equations (CGN)
o Arnoldi based methods:
  o restart (thin/thick, explicit/implicit)
  o truncation (standard/optimal)

A Unified Matrix Description of Krylov Methods

Krylov methods as projection onto simpler matrices:
o Q^H Q = I, Q^H A Q = H Hessenberg (Arnoldi),
o \hat Q^H Q = I, \hat Q^H A Q = T tridiagonal (Lanczos).

Introduce the computed (condensed) matrix C = T, H:

    Q^{-1} A Q = C    <=>    A Q = Q C.

The iteration is implied by the unreduced Hessenberg structure:

    A Q_k = Q_{k+1} \underline{C}_k,    Q_k = [q_1, ..., q_k],    \underline{C}_k \in K^{(k+1) \times k}.

Stewart: Krylov Decomposition.

The iteration spans the Krylov subspace (q = q_1):

    span{Q_k} = K_k = span{q, Aq, ..., A^{k-1} q}.
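To fix ideas, the Arnoldi instance of this iteration can be sketched in a few lines of NumPy (the function name arnoldi and the modified Gram-Schmidt orthogonalisation are illustrative choices, not taken from the slides): it returns Q_{k+1} and the (k+1) x k Hessenberg matrix \underline{H}_k = \underline{C}_k with A Q_k ≈ Q_{k+1} \underline{H}_k up to rounding.

    import numpy as np

    def arnoldi(A, q, k):
        """Arnoldi iteration with modified Gram-Schmidt orthogonalisation.

        Returns Q with k+1 orthonormal columns spanning K_{k+1}(A, q) and the
        (k+1) x k unreduced Hessenberg matrix H such that A @ Q[:, :k] equals
        Q @ H up to rounding errors."""
        n = A.shape[0]
        Q = np.zeros((n, k + 1))
        H = np.zeros((k + 1, k))
        Q[:, 0] = q / np.linalg.norm(q)
        for j in range(k):
            w = A @ Q[:, j]
            for i in range(j + 1):           # modified Gram-Schmidt sweep
                H[i, j] = Q[:, i] @ w
                w = w - H[i, j] * Q[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] == 0.0:           # breakdown: invariant subspace found
                return Q[:, :j + 1], H[:j + 1, :j]
            Q[:, j + 1] = w / H[j + 1, j]
        return Q, H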

Perturbed Krylov Decompositions

A Krylov decomposition analogue holds true in finite precision:

    A Q_k = Q_{k+1} \underline{C}_k - F_k = Q_k C_k + q_{k+1} c_{k+1,k} e_k^T - F_k = Q_k C_k + M_k - F_k.

We have to investigate the impact of the method on
o the structure of the basis Q_k (local orthogonality/duality),
o the structure of the computed C_k, \underline{C}_k,
o the size/structure of the error term F_k.

Convergence theory:
o is usually based on inductively proven properties: orthogonality, bi-orthogonality, A-conjugacy, ...
What can be said about these properties?

Standard error analysis:
o splits into forward and backward error analysis.
Does this analysis apply to Krylov methods?
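In floating point the perturbation F_k is directly observable as the residual of the computed decomposition. A small check, assuming the sign convention F_k = Q_{k+1} \underline{C}_k - A Q_k used above and reusing the illustrative arnoldi sketch from the previous slide:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 100, 30
    A = rng.standard_normal((n, n))
    q = rng.standard_normal(n)

    Q, H = arnoldi(A, q, k)              # illustrative sketch from above
    F = Q @ H - A @ Q[:, :k]             # columns f_l of the perturbation F_k
    # F_k is of the order of machine precision relative to ||A||
    print(np.linalg.norm(F, 2) / np.linalg.norm(A, 2), np.finfo(float).eps)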

All methods fit pictorially into the block picture

    A Q_k - Q_{k+1} \underline{C}_k = -F_k.

This is a perturbed Krylov decomposition, viewed as a subspace equation.

Examination of the methods has to be done according to
o methods directly based on the Krylov decomposition,
o methods based on a split Krylov decomposition,
o LTPMs.

The matrix C_k plays a crucial role:
o C_k is Hessenberg or even tridiagonal (basics),
o C_k may be blocked or banded (block Krylov methods),
o C_k may have humps, spikes, ... (more sophisticated).

The error analysis and convergence theory splits up further into
o knowledge on Hessenberg (tridiagonal) matrices,
o knowledge on orthogonality, duality, conjugacy, ...

We start with results on Hessenberg matrices.

A Short Excursion on Matrix Structure

Notation:

    J_Λ                      Jordan matrix of A
    V                        right eigenvector matrix, A V = V J_Λ
    \hat V := V^{-H}         left eigenvector matrix, \hat V^H A = J_Λ \hat V^H
    \check V := V^{-T}       alternate eigenvector matrix, \check V^T A = J_Λ \check V^T
    χ_A(λ) := det(λI - A)    characteristic polynomial of A
    R(λ) := (λI - A)^{-1}    resolvent
    C_k(A)                   k-th compound matrix of A
    adj(A)                   classical adjoint, adjugate of A
    A_{ij}                   A with row i and column j deleted
    S, S_i                   sign matrices

The adjoint of λI - A fulfils

    adj(λI - A)(λI - A) = det(λI - A) I.

Suppose that λ is not contained in the spectrum of A.

We form the resolvent at λ and obtain

    adj(λI - A) = det(λI - A) R(λ) = V ( χ_A(λ) J_{λΛ}^{-1} ) \hat V^H.

The shifted and inverted Jordan blocks look like

    J_{λ λ_i}^{-1} = S_i E_i S_i = S_i \begin{pmatrix}
        (λ - λ_i)^{-1} & (λ - λ_i)^{-2} & \cdots & (λ - λ_i)^{-k} \\
                       & (λ - λ_i)^{-1} & \ddots & \vdots         \\
                       &                & \ddots & (λ - λ_i)^{-2} \\
                       &                &        & (λ - λ_i)^{-1}
    \end{pmatrix} S_i.

Multiplication with the characteristic polynomial cancels the terms with negative exponents. The resulting expression is a source of eigenvalue-eigenvector relations.

We express the adjugate with the aid of compound matrices,

    adj A = S C_{n-1}(A^T) S.

Then we have the equality

    P := C_{n-1}(λI - A^T) = (S V S) G (S \hat V^H S) = (S V S) χ_A(λ) E (S \hat V^H S).

The elements of the compound matrix P are polynomials in λ of the form

    p_{ij} = p_{ij}(λ; A) := det L_{ji},    where    L := λI - A.

The elements of G are obviously rational functions in λ, since G = χ_A(λ) (⊕_i E_i). Many terms cancel, so the elements of G are in fact polynomials. We divide by the maximal factor and compute the limit λ → λ_i.

The choice of eigenvectors is based on the non-zero position (i, j) in the matrix (the sign matrices are left out): the row index i selects v_i from V, the column index j selects \hat v_j^H from \hat V^H.

Amongst others, the well-known result on eigenvalue-eigenvector relations by Thompson and McEnteggert is included. This is one of the basic results used in Paige's analysis of the finite precision symmetric Lanczos method.

We consider here only the special case of non-derogatory eigenvalues.

Theorem: Let A ∈ K^{n×n}. Let λ_l = λ_{l+1} = ... = λ_{l+k} be a geometrically simple eigenvalue of A and let k+1 be its algebraic multiplicity. Let \hat v_l^H and v_{l+k} be the corresponding left and right eigenvectors with appropriate normalization. Then

    v_{j,l} \check v_{i,l+k} = (-1)^{j+i+k} \frac{p_{ji}(λ_l; A)}{\prod_{λ_s ≠ λ_l} (λ_l - λ_s)}

holds true.

The minus one stems from the sign matrices, the polynomial from the definition of the adjoint as the matrix of cofactors, and the denominator from the division by the maximal factor.

This setting matches every eigenvalue of a non-derogatory A.

Unreduced Hessenberg matrices are non-derogatory matrices. This is easily seen by a simple rank argument: in the following let H = H_m be unreduced Hessenberg of size m × m; because of the non-zero subdiagonal the first m-1 columns of H - θI are linearly independent for every θ, so

    rank(H - θI) ≥ m - 1.

Many polynomials can be evaluated explicitly in the case of Hessenberg matrices:

Theorem: The polynomial p_{ji}, i ≤ j, has degree (i-1) + (m-j) and can be evaluated as follows:

    p_{ji}(θ; H) = det \begin{pmatrix} θI - H_{1:i-1} & * & * \\ 0 & R_{i:j} & * \\ 0 & 0 & θI - H_{j+1:m} \end{pmatrix}
                 = (-1)^{i+j} χ_{H_{1:i-1}}(θ) \Big( \prod_{l=i}^{j-1} h_{l+1,l} \Big) χ_{H_{j+1:m}}(θ),

where R_{i:j} is upper triangular with the subdiagonal entries -h_{i+1,i}, ..., -h_{j,j-1} on its diagonal.
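This evaluation is easy to check numerically; a small illustrative sketch (scipy.linalg.hessenberg reduces a random matrix to unreduced Hessenberg form with probability one, and θ may be any scalar, not necessarily an eigenvalue):

    import numpy as np
    from scipy.linalg import hessenberg

    def chi(M, theta):
        """Characteristic polynomial det(theta*I - M); an empty matrix gives 1."""
        return 1.0 if M.shape[0] == 0 else np.linalg.det(theta * np.eye(M.shape[0]) - M)

    m, i, j = 7, 3, 5                      # 1-based indices with i <= j
    rng = np.random.default_rng(1)
    H = hessenberg(rng.standard_normal((m, m)))
    theta = 0.7

    L = theta * np.eye(m) - H
    minor = np.delete(np.delete(L, i - 1, axis=0), j - 1, axis=1)   # delete row i, column j
    p_ji = np.linalg.det(minor)

    rhs = (-1) ** (i + j) * chi(H[:i - 1, :i - 1], theta) \
          * np.prod(np.diag(H, -1)[i - 1:j - 1]) * chi(H[j:, j:], theta)
    print(p_ji, rhs)                       # the two values agree up to rounding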

Denote by H(m) the set of unreduced Hessenberg matrices of size m × m. The general result on eigenvalue-eigenvector relations can be simplified to read:

Theorem: Let H ∈ H(m), let i ≤ j, and let θ be an eigenvalue of H with multiplicity k+1. Let \check s^T be the unique left eigenvector and s the unique right eigenvector to the eigenvalue θ. Then

    (-1)^k \check s(i) s(j) = \frac{χ_{H_{1:i-1}}(θ) χ_{H_{j+1:m}}(θ)}{χ_{H_{1:m}}^{(k+1)}(θ)} \prod_{l=i}^{j-1} h_{l+1,l}    (1)

holds true.

Remark: We ignored the implicit scaling of the eigenvectors imposed by the choice of the eigenvector matrices, i.e. by \check S^T S = I.

Among these relations, of special interest are the index pairs (i, m), (1, m) and (1, j):

    (-1)^k \check s(i) s(m) = \frac{χ_{H_{1:i-1}}(θ)}{χ_{H_{1:m}}^{(k+1)}(θ)} \prod_{l=i}^{m-1} h_{l+1,l},

    (-1)^k \check s(1) s(m) = \frac{1}{χ_{H_{1:m}}^{(k+1)}(θ)} \prod_{l=1}^{m-1} h_{l+1,l},

    (-1)^k \check s(1) s(j) = \frac{χ_{H_{j+1:m}}(θ)}{χ_{H_{1:m}}^{(k+1)}(θ)} \prod_{l=1}^{j-1} h_{l+1,l}.

These relations are used to derive relations between the eigenvalues and a single eigenvector. They are also of interest for the understanding of the convergence of Krylov methods, at least in the context of Krylov eigensolvers.

Theorem: Let H ∈ H(m) and let θ be an eigenvalue of H. Then the vector \check s defined by a non-zero \check s(1) and the relations

    \frac{\check s(i)}{\check s(1)} = \frac{χ_{H_{1:i-1}}(θ)}{\prod_{l=1}^{i-1} h_{l+1,l}},    i ≤ m,

is (up to scaling) the unique left eigenvector of H to the eigenvalue θ.

Theorem: Let H ∈ H(m) and let θ be an eigenvalue of H. Then the vector s defined by a non-zero s(m) and the relations

    \frac{s(j)}{s(m)} = \frac{χ_{H_{j+1:m}}(θ)}{\prod_{l=j+1}^{m} h_{l,l-1}},    j ≤ m,

is (up to scaling) the unique right eigenvector of H to the eigenvalue θ.

Since the polynomials remain unchanged and merely the eigenvalue moves, this helps to explain convergence behaviour (even in finite precision).
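Both component formulas can be verified numerically for a random unreduced Hessenberg matrix; a sketch with illustrative names, assuming simple eigenvalues:

    import numpy as np
    from scipy.linalg import hessenberg

    def chi(M, theta):
        """Characteristic polynomial det(theta*I - M); an empty matrix gives 1."""
        return 1.0 if M.shape[0] == 0 else np.linalg.det(theta * np.eye(M.shape[0]) - M)

    m = 8
    rng = np.random.default_rng(2)
    H = hessenberg(rng.standard_normal((m, m)))     # unreduced with probability one
    theta = np.linalg.eigvals(H)[0]                 # pick one (simple) eigenvalue
    sub = np.diag(H, -1)                            # subdiagonal entries h_{l+1,l}

    # left eigenvector from the leading characteristic polynomials, normalised to s_check(1) = 1
    s_check = np.array([chi(H[:i, :i], theta) / np.prod(sub[:i]) for i in range(m)])
    print(np.linalg.norm(s_check @ H - theta * s_check))   # ~ 0

    # right eigenvector from the trailing characteristic polynomials, normalised to s(m) = 1
    s = np.array([chi(H[j + 1:, j + 1:], theta) / np.prod(sub[j:]) for j in range(m)])
    print(np.linalg.norm(H @ s - theta * s))                # ~ 0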

Error Analysis Revisited

For simplicity we assume that the perturbed Krylov decomposition

    M_k = A Q_k - Q_k C_k + F_k

is diagonalisable, i.e. that A and C_k are diagonalisable.

Theorem: The recurrence of the basis vectors in eigenparts is given by

    \hat v_i^H q_{k+1} = \frac{(λ_i - θ_j) \hat v_i^H y_j + \hat v_i^H F_k s_j}{c_{k+1,k} s_{kj}}    for all i, j (and k).

This local error amplification formula consists of:
o the left eigenpart of q_{k+1}: \hat v_i^H q_{k+1},
o a measure of convergence: (λ_i - θ_j) \hat v_i^H y_j,
o an error term: \hat v_i^H F_k s_j,
o an amplification factor: c_{k+1,k} s_{kj}.
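The theorem is an exact algebraic identity: multiplying the perturbed decomposition from the right by a Ritz vector s_j of C_k and from the left by \hat v_i^H gives it directly, with y_j = Q_k s_j the corresponding Ritz vector in K^n. It can therefore be verified in floating point; a sketch reusing the illustrative arnoldi helper from above:

    import numpy as np

    rng = np.random.default_rng(3)
    n, k = 50, 10
    A = rng.standard_normal((n, n))
    q = rng.standard_normal(n)

    Q, H = arnoldi(A, q, k)                 # illustrative sketch from above
    F = Q @ H - A @ Q[:, :k]                # observed perturbation F_k
    lam, W = np.linalg.eig(A.T)             # W[:, i].T @ A = lam[i] * W[:, i].T
    theta, S = np.linalg.eig(H[:k, :k])     # Ritz values/vectors of C_k

    i, j = 0, 0                             # any eigenpair / Ritz pair
    vi, sj = W[:, i], S[:, j]
    yj = Q[:, :k] @ sj                      # Ritz vector y_j = Q_k s_j
    lhs = vi @ Q[:, k]                      # left eigenpart of q_{k+1}
    rhs = ((lam[i] - theta[j]) * (vi @ yj) + vi @ F @ sj) / (H[k, k - 1] * S[k - 1, j])
    print(abs(lhs - rhs))                   # ~ 0: the identity holds up to rounding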

A ∈ R^{100×100} normal, eigenvalues equidistant in [0, 1].

[Figure: floating point Arnoldi, Example (4a), normal matrix; absolute values on a logarithmic scale versus the Arnoldi step number; curves: real convergence, estimated residual, actual eigenpart, 1, sqrt(eps), eps.]

Behaviour of CGS-Arnoldi, MGS-Arnoldi, DO-Arnoldi; convergence to the largest eigenvalue.

A ∈ R^{100×100} non-normal, eigenvalues equidistant in [0, 1].

[Figure: floating point Arnoldi, Example (4b), non-normal matrix; absolute values on a logarithmic scale versus the Arnoldi step number; curves: real convergence, estimated residual, actual eigenpart, 1, sqrt(eps), eps.]

Behaviour of CGS-Arnoldi, MGS-Arnoldi, DO-Arnoldi; convergence to the largest eigenvalue.

A = A^T ∈ R^{100×100}, random entries in [0, 1]; the Perron root is well separated.

[Figure: floating point symmetric Lanczos, Example (5a), non-negative matrix; absolute values on a logarithmic scale versus the Lanczos step number; curves: real convergence, estimated residual, actual eigenpart, 1, sqrt(eps), eps.]

Behaviour of symmetric Lanczos; convergence to the eigenvalue of largest modulus.

A = A^T ∈ R^{100×100}, random entries in [0, 1]; the Perron root is well separated.

[Figure: floating point symmetric Lanczos, Example (5b), non-negative matrix; absolute values on a logarithmic scale versus the Lanczos step number; curves: real convergence, estimated residual, actual eigenpart, 1, sqrt(eps), eps.]

Behaviour of symmetric Lanczos; convergence to the eigenvalues of largest and second largest modulus.

The formula depends on the Ritz pair of the actual step. Using the eigenvector basis we can get rid of the Ritz vector:

    I = S S^{-1} = S \check S^T    =>    e_l = S \check S^T e_l = \sum_{j=1}^{k} \check s_{lj} s_j.

Theorem: The recurrence between the vectors q_l and q_{k+1} is given by

    \Big( \sum_{j=1}^{k} \frac{c_{k+1,k} s_{kj} \check s_{lj}}{λ_i - θ_j} \Big) \hat v_i^H q_{k+1}
        = \hat v_i^H q_l + \hat v_i^H F_k \sum_{j=1}^{k} \Big( \frac{\check s_{lj}}{λ_i - θ_j} \Big) s_j.

For l = 1 we obtain a formula that reveals how the errors affect the recurrence from the beginning:

    \Big( \sum_{j=1}^{k} \frac{c_{k+1,k} s_{kj} \check s_{1j}}{λ_i - θ_j} \Big) \hat v_i^H q_{k+1}
        = \hat v_i^H q_1 + \hat v_i^H F_k \sum_{j=1}^{k} \Big( \frac{\check s_{1j}}{λ_i - θ_j} \Big) s_j.

Interpretation: The size of the deviation depends on the size of the first components \check s_{1j} of the left eigenvectors of C_k and on the shape and size of the right eigenvectors s_j.

Next step: application of the eigenvalue-eigenvector relation

    (-1)^k \check s(i) s(j) = \frac{χ_{H_{1:i-1}}(θ) χ_{H_{j+1:m}}(θ)}{χ_{H_{1:m}}^{(k+1)}(θ)} \prod_{l=i}^{j-1} h_{l+1,l}.

Theorem: The recurrence between the basis vectors q_1 and q_{k+1} can be described by

    \Big( \sum_{j=1}^{k} \frac{\prod_{p=1}^{k} c_{p+1,p}}{\prod_{s ≠ j} (θ_j - θ_s) (λ_i - θ_j)} \Big) \hat v_i^H q_{k+1}
        = \hat v_i^H q_1 + \hat v_i^H F_k \sum_{j=1}^{k} \Big( \frac{\check s_{1j}}{λ_i - θ_j} \Big) s_j.

A result from polynomial interpolation (Lagrange):

    \sum_{j=1}^{k} \frac{1}{\prod_{l ≠ j} (θ_j - θ_l) (λ_i - θ_j)} = \frac{1}{\prod_{l=1}^{k} (λ_i - θ_l)} = \frac{1}{χ_{C_k}(λ_i)}.

The following theorem holds true:

Theorem: The recurrence between the basis vectors q_1 and q_{k+1} can be described by

    \hat v_i^H q_{k+1} = \frac{χ_{C_k}(λ_i)}{\prod_{p=1}^{k} c_{p+1,p}} \Big( \hat v_i^H q_1 + \hat v_i^H F_k \sum_{j=1}^{k} \Big( \frac{\check s_{1j}}{λ_i - θ_j} \Big) s_j \Big).

Similarly we can get rid of the eigenvectors s_j in the error term:

    e_l^T \sum_{j=1}^{k} \Big( \frac{\check s_{1j}}{λ_i - θ_j} \Big) s_j
        = \sum_{j=1}^{k} \frac{\check s_{1j} s_{lj}}{λ_i - θ_j}
        = \frac{\prod_{p=1}^{l-1} c_{p+1,p} \; χ_{C_{l+1:k}}(λ_i)}{χ_{C_k}(λ_i)}.

This results in the following theorem:

Theorem: The recurrence between the basis vectors q_1 and q_{k+1} can be described by

    \hat v_i^H q_{k+1} = \frac{χ_{C_k}(λ_i)}{\prod_{p=1}^{k} c_{p+1,p}} \Big( \hat v_i^H q_1 + \sum_{l=1}^{k} \frac{\prod_{p=1}^{l-1} c_{p+1,p} \; χ_{C_{l+1:k}}(λ_i)}{χ_{C_k}(λ_i)} \hat v_i^H f_l \Big)
                       = \frac{χ_{C_k}(λ_i)}{\prod_{p=1}^{k} c_{p+1,p}} \hat v_i^H q_1 + \sum_{l=1}^{k} \frac{χ_{C_{l+1:k}}(λ_i)}{\prod_{p=l}^{k} c_{p+1,p}} \hat v_i^H f_l.

Multiplication by the right eigenvectors v_i and summation gives the familiar result:

Theorem: The recurrence of the basis vectors of a finite precision Krylov method can be described by

    q_{k+1} = \frac{χ_{C_k}(A)}{\prod_{p=1}^{k} c_{p+1,p}} q_1 + \sum_{l=1}^{k} \frac{χ_{C_{l+1:k}}(A)}{\prod_{p=l}^{k} c_{p+1,p}} f_l.

This result holds true even for non-diagonalisable matrices A, C_k.

The method can be interpreted as an additive mixture of several instances of the same method with several starting vectors. A severe deviation occurs when one of the characteristic polynomials χ_{C_{l+1:k}}(A) becomes large compared to χ_{C_k}(A).
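For F_k = 0 the theorem reduces to the familiar statement q_{k+1} = χ_{C_k}(A) q_1 / \prod_{p=1}^{k} c_{p+1,p}. In floating point the f_l are of the order of machine precision, so for modest k the leading term can be checked directly; a sketch reusing the illustrative arnoldi helper, with χ_{C_k}(A) q_1 applied as a product of shifted factors over the Ritz values θ_j:

    import numpy as np

    rng = np.random.default_rng(4)
    n, k = 60, 6
    A = rng.standard_normal((n, n))
    q1 = rng.standard_normal(n)
    q1 /= np.linalg.norm(q1)

    Q, H = arnoldi(A, q1, k)                # illustrative sketch from above
    ritz = np.linalg.eigvals(H[:k, :k])     # eigenvalues theta_j of C_k

    w = q1.astype(complex)
    for theta in ritz:                      # chi_{C_k}(A) q_1 = prod_j (A - theta_j I) q_1
        w = A @ w - theta * w
    w /= np.prod(np.diag(H, -1))            # divide by prod_{p=1}^{k} c_{p+1,p}

    print(np.linalg.norm(w.real - Q[:, k])) # small for modest k: w approximates q_{k+1}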

Open Questions

o Can Krylov methods be forward or backward stable?
o If so, which ones can?
o Are there any matrices A for which Krylov methods are stable?
o Does the stability depend on the starting vector?
o Are there any a priori results on the behaviour to be expected and on the rate of convergence?