
Gram-Schmidt orthogonalisation

Gerard Sleijpen
Mathematical Institute, Utrecht University, P.O. Box 80.010, 3508 TA Utrecht, the Netherlands.
E-mail: sleijpen@math.uu.nl. Version: December 1998.

December 7, 2001

Contents

1 Repeated Gram-Schmidt
  1.1 Local errors
  1.2 Propagation of the errors

1 Repeated Gram-Schmidt

A sequence $x_1, x_2, \ldots$ of vectors of dimension $n$ is orthogonalized by the Gram-Schmidt process into a sequence $v_1, v_2, \ldots$ of orthonormal vectors such that, for each $k$, the vectors $v_1, \ldots, v_k$ span the same space as the first $k$ vectors $x_i$. The construction of the vectors $v_k$ is recursive. If $V_k$ is the matrix with columns $v_1, \ldots, v_k$, then $v_{k+1}$ is constructed by the Gram-Schmidt process as follows:
$$ \tilde x = x_{k+1} - V_k(V_k^* x_{k+1}), \qquad v_{k+1} = \tilde x/\|\tilde x\|_2. $$
In exact arithmetic, the operator $I - V_kV_k^*$ projects any $n$-vector onto the space orthogonal to $\mathrm{span}(V_k)$. This will not be the case in rounded arithmetic, for two reasons:

1. Local errors. The application of $I - V_kV_k^*$ to $x_{k+1}$ introduces rounding errors. In particular, the computed $v_{k+1}$ will not be orthogonal to $V_k$.

2. Propagation of the errors. The operator $I - V_kV_k^*$ is not an exact orthogonal projector (see item 1). Therefore, even an exact application of this operator does not lead to a vector $v_{k+1}$ that is orthogonal to $V_k$.

The negative effects of both aspects can be diminished by repeating the Gram-Schmidt orthogonalization.

1.1 Local errors

In rounded arithmetic we have, neglecting $O(u^2)$ terms,
$$ x^{(1)} \equiv \tilde x + \Delta\tilde x = x - VV^*x - V\Delta_1 + \Delta_2 $$
with $\|\Delta_1\|_2 \le n\,u\,\|V^*x\|_2$ and $\|\Delta_2\|_2 \le k\,u\,\|VV^*x\|_2 + u\,\|x\|_2$. The error $\Delta v$ in $v \equiv \tilde x/\|\tilde x\|_2$ can be bounded by
$$ \|\Delta v\|_2 \lesssim \Big( n\sqrt{k}\,\frac{\|x\|_2}{\|\tilde x\|_2} + k\sqrt{k}\,\frac{\|V^*x\|_2}{\|\tilde x\|_2} + 1 \Big)\,u \le (n+k)\sqrt{k}\,\frac{\|x\|_2}{\|\tilde x\|_2}\,u + u. \qquad (1) $$
If $V$ is nearly orthogonal, then $\|x\|_2/\|\tilde x\|_2$ is the reciprocal of the sine of the angle $\varphi$ between $x$ and the space spanned by $V$.
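The effect of the factor $1/\sin(\varphi)$ is easy to observe numerically. The following Matlab fragment is only an illustrative sketch (not part of the note; sizes, the angle phi, and all variable names are chosen arbitrarily). It constructs a vector x that makes a small angle phi with span(V) and compares the loss of orthogonality norm(V'*v) after one and after two sweeps of classical Gram-Schmidt; the computable ratio norm(x)/norm(xt) estimates 1/sin(phi).

n = 1000; k = 20;
[V,~] = qr(randn(n,k),0);                          % orthonormal basis of a random subspace
w = V*randn(k,1); w = w/norm(w);                   % unit vector in span(V)
q = randn(n,1); q = q - V*(V'*q); q = q/norm(q);   % unit vector orthogonal to span(V)
phi = 1e-8;
x = cos(phi)*w + sin(phi)*q;                       % angle between x and span(V) is phi

xt  = x - V*(V'*x);   v1 = xt/norm(xt);            % one sweep of classical Gram-Schmidt
xt2 = xt - V*(V'*xt); v2 = xt2/norm(xt2);          % second sweep applied to the result

disp(norm(x)/norm(xt))                             % computable estimate of 1/sin(phi)
disp([norm(V'*v1), norm(V'*v2)])                   % loss of orthogonality after 1 and 2 sweeps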

The component of the error $\Delta v$ in $\mathrm{span}(V)$ is essentially equal to $V\Delta_1/\|\tilde x\|_2$ and can be bounded in norm by $n\sqrt{k}\,u/\sin(\varphi)$ if $\|I - V^*V\|_2 \ll 1$. Therefore, to keep the loss of orthogonality due to local rounding errors less than $\delta$, with $0 < \delta \ll 1$, the computable quantity $1/\sin(\varphi) = \|x\|_2/\|\tilde x\|_2$ should be less than $\delta/(n\sqrt{k}\,u)$.

If this requirement does not hold, then another Gram-Schmidt step can be applied. This produces the vector
$$ x^{(2)} \approx \tilde x + (I - VV^*)\Delta_2 + V\Delta_1' + \Delta_2' $$
with $\|\Delta_1'\|_2 \le n\,u\,\|V^*x^{(1)}\|_2$ and $\|\Delta_2'\|_2 \le k\,u\,\|VV^*x^{(1)}\|_2 + u\,\|x^{(1)}\|_2$. In the estimate of the perturbation, we assumed that $\|x^{(1)}\|_2 \approx \|\tilde x\|_2$, which is acceptable if, say, $\|x\|_2/\|\tilde x\|_2 \le 0.1/(n\sqrt{k}\,u)$. Note that the vector $x$ is, up to machine precision, in the span of $V_k$ if $\|x\|_2/\|\tilde x\|_2 > 0.1/(n\sqrt{k}\,u)$. The error terms $V\Delta_1' + \Delta_2'$ are of the order of machine precision, relative to $\|\tilde x\|_2$. The term $(I - VV^*)\Delta_2$ can be a factor $1/\sin(\varphi)$ larger, but it is orthogonal to $V$ and does not contribute to a loss of orthogonality. Therefore, the vector $x^{(2)}$ will be, up to machine precision, orthogonal to $V$.

Note that the accumulated coefficient vector satisfies $h \approx V^*x$, $h = h_1 + V^*x^{(1)}$, where $h_1 \equiv V^*x + \Delta_1$, and that $x = x^{(2)} + Vh - (I - VV^*)\Delta_2 + O(u\,\|x\|_2)$.

Notes 1

1. The estimate for $\Delta_2$ can be refined. Since
$$ \mathrm{fl}(\gamma + \eta\alpha) = (\gamma + \eta\alpha(1 + \xi'))(1 + \xi) = \gamma + \eta\alpha + \eta\alpha\xi' + (\gamma + \eta\alpha)\xi, $$
we see that, with $V_j \equiv [v_1, \ldots, v_j]$ and $h^{(j)} \equiv V_j^*x$,
$$ \Delta_2 \approx u\,Vh + u\sum_{j=2}^{k} V_jh^{(j)}. $$
Hence
$$ \|\Delta_2\|_2 \lesssim u\,\|Vh\|_2 + u\sum_{j=2}^{k}\|V_jh^{(j)}\|_2 \lesssim u\big(\sqrt{k} + k - 1\big)\|h\|_2 \le 1.25\,k\,u\,\|h\|_2. $$

2. The error vector $\Delta_2$ will have some randomness. Therefore an estimate $\|V^*\Delta_2\|_2 \approx \|\Delta_2\|_2$ is rather pessimistic, and $\|V^*\Delta_2\|_2 \approx \sqrt{k/n}\,\|\Delta_2\|_2$ would be more realistic.

3. With modified Gram-Schmidt, errors are also subject to subsequent orthogonalization. Therefore, with the modified approach, the component of the error $\Delta_2$ in the space spanned by $V$ can be significantly smaller. However, the major advantage of the modified approach is in the $\Delta_1$ term. For the error in the intermediate terms, the sine of the angle between the intermediate vectors $x - V_jh^{(j)}$ and the space spanned by $V$ is of importance, rather than the sine between $x$ and this space. If, for instance, the angle between $x$ and $v_1$ is small, while the angle with the space spanned by the other vectors $v_i$ is non-small, then only the error in the inner product $v_1^*x$ is of significance and the $\sqrt{k}$ term in the estimate (1) for $\|\Delta_1\|_2$ can be skipped. But also in this approach the error is proportional to $1/\sin(\varphi)$. If $x$ has a small angle with $V$, but large angles with all $v_i$ (as, for instance, for $x = \sum_{i=1}^{k} v_i + \epsilon v_{k+1}$), then, also with the modified approach, the $\sqrt{k}$ will show up.

4. If we have a good estimate $w \approx V\hat h$ for the vector in the space spanned by $V$ that is close to $x$ (close to $Vh$, where $h \equiv V^*x$), then orthogonalization of $x$ against $w$, followed by the Gram-Schmidt procedure,
$$ \beta \equiv \frac{w^*x}{w^*w},\quad \tilde x = x - w\beta,\quad \tilde h = V^*\tilde x,\quad \tilde x \leftarrow \tilde x - V\tilde h,\quad h = \hat h\beta + \tilde h, $$
is stable (i.e., the rounding errors will be of the order of machine precision): the vector $\tilde x$ will be nearly orthogonal to $V$. The rounding errors in the computation of $x - w\beta$ are largely diminished by the subsequent orthogonalization against $V$. The solution of the Jacobi-Davidson correction equation is orthogonal to the Ritz vector. Therefore, the angle between this correction vector and the search subspace is large and there is usually no need to repeat the Gram-Schmidt orthogonalization.
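Note 4 above can be summarized in a few lines of Matlab. The sketch below is only an illustration under the assumption that w is available and approximately equals V*hhat; the function name and variable names are made up and are not from the note.

function [xt,h] = OrthToVviaW(V,x,w,hhat)
% First orthogonalize x against w (assumed to satisfy w ~ V*hhat), then
% against the columns of V; finally accumulate the coordinates h, so that
% x ~ V*h + xt with xt nearly orthogonal to V.
beta = (w'*x)/(w'*w);     % coefficient of x along w
xt   = x - w*beta;        % remove the dominant component in span(V)
ht   = V'*xt;             % remaining (small) coordinates
xt   = xt - V*ht;         % ordinary Gram-Schmidt step, now well conditioned
h    = hhat*beta + ht;    % total coordinate vector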

function [V,H]=RepGramSchmidt(X,kappa,delta,i0)
% Repeated (classical) Gram-Schmidt orthogonalization of the columns of X.
[n,k]=size(X);
v=X(:,1); gamma=norm(v);
H=gamma; V=v/gamma;
for j=2:k
  v=X(:,j);
  h=V'*v; v=v-V*h;                 % first Gram-Schmidt sweep
  gamma=norm(v); beta=norm(h);
  for i=2:i0
    if gamma<delta*beta | gamma>kappa*beta, break, end
    hc=V'*v; h=h+hc; v=v-V*hc;     % repeated sweep
    gamma=norm(v); beta=norm(hc);
  end
  H=[H,h;zeros(1,j)];
  if gamma>delta*beta              % otherwise x_j is (numerically) in span(V)
    H(j,j)=gamma; V=[V,v/gamma];
  end
end
return

Figure 1: Matlab code for repeated Gram-Schmidt. For i0=1, we have classical Gram-Schmidt. The parameter delta determines when a vector x is considered to be in the span of V.

1.2 Propagation of the errors

Consider an $n$ by $k$ matrix $V$ and the interaction matrix $M \equiv V^*V$.

Lemma 2 If $M$ is non-singular, then $VM^{-1}V^*$ is the orthogonal projection onto the subspace $\mathrm{span}(V)$, and $I - VM^{-1}V^*$ projects onto the orthogonal complement of this subspace.

We assume that $\varepsilon \equiv \|E\|_2 < 1$, where $E \equiv I - M$. Then $M$ is non-singular. With $i$-times repeated Gram-Schmidt applied to a vector $x$ we intend to approximate the component
$$ \tilde x \equiv (I - VM^{-1}V^*)x $$

of $x$ that is orthogonal to $\mathrm{span}(V)$. Therefore, with the result $x^{(i)} \equiv (I - VV^*)^i x$ of $i$ sweeps of (classical) Gram-Schmidt applied to $x$, we are interested in the error $x^{(i)} - \tilde x$.

Lemma 3 For each $i = 0, 1, 2, \ldots$ we have
$$ (I - VV^*)^i - (I - VM^{-1}V^*) = VM^{-1}E^iV^* \qquad (2) $$
and
$$ (I - VV^*)^i - (I - VV^*)^{i+1} = VE^iV^*. \qquad (3) $$

Proof. A simple induction argument using $(I - VV^*)V = VE$ leads to (3). With (3), we find that
$$ (I - VV^*)^i - I = -V(I + E + E^2 + \cdots + E^{i-1})V^*, $$
and, with a Neumann expansion $M^{-1} = (I - E)^{-1} = I + E + E^2 + \cdots$, we obtain (2).

Hence
$$ \|x^{(i)} - \tilde x\|_2 = \|VM^{-1}E^iV^*x\|_2 = \|M^{-\frac12}E^iV^*x\|_2 \le \varepsilon^i\,\|M^{-\frac12}V^*x\|_2 \le \frac{\varepsilon^i}{\sqrt{1-\varepsilon}}\,\|V^*x\|_2. $$
Here we used the fact that commutativity of $M = V^*V$ and $E = I - M$ implies that
$$ (VM^{-1}E^iV^*)^*(VM^{-1}E^iV^*) = (M^{-\frac12}E^iV^*)^*(M^{-\frac12}E^iV^*). $$
The computable quantity $\tau_1 \equiv \|V^*x\|_2/\|x^{(1)}\|_2$ is close to the cotangent $\tau$ of the angle between $x$ and $\mathrm{span}(V)$:
$$ \tau \equiv \frac{\|VM^{-1}V^*x\|_2}{\|\tilde x\|_2} = \frac{\|M^{-\frac12}V^*x\|_2}{\|\tilde x\|_2}. $$
The relative error in $x^{(i)}$ can be bounded in terms of $\tau$:

Theorem 4
$$ \frac{\|x^{(i)} - \tilde x\|_2}{\|\tilde x\|_2} \le \varepsilon^i\,\tau. $$

Now we can relate $\tau$ to $\tau_1$:
$$ \tau_1 = \frac{\|V^*x\|_2}{\|x^{(1)}\|_2} \le \frac{1}{\sqrt{1-\varepsilon}}\,\frac{\|M^{-\frac12}V^*x\|_2}{\|\tilde x\|_2}\,\frac{\|\tilde x\|_2}{\|x^{(1)}\|_2} \le \tau\,\frac{1}{\sqrt{1-\varepsilon}\,(1 - \varepsilon\tau)}. $$
Similarly,
$$ \tau_1 = \frac{\|V^*x\|_2}{\|x^{(1)}\|_2} \ge \tau\,\frac{\sqrt{1-\varepsilon}}{1 + \varepsilon\tau}. $$
Therefore:

Corollary 5 If $\varepsilon(\tau + 1) \ll 1$ then $\tau \approx \tau_1$.
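As a hypothetical illustration (not part of the note), the bound of Theorem 4 can be checked numerically for a perturbed orthonormal basis; the sizes and the perturbation level below are chosen arbitrarily.

n = 500; k = 15;
[V,~] = qr(randn(n,k),0);
V = V + 1e-4*randn(n,k);                 % V is no longer exactly orthonormal
M = V'*V; E = eye(k) - M;
epsilon = norm(E);                       % epsilon = ||E||_2
x = randn(n,1);
xtilde = x - V*(M\(V'*x));               % (I - V*inv(M)*V')*x
tau = norm(sqrtm(inv(M))*(V'*x))/norm(xtilde);   % cotangent of the angle between x and span(V)
xi = x;
for i = 1:3
  xi = xi - V*(V'*xi);                   % one sweep of classical Gram-Schmidt
  fprintf('i=%d: error %8.2e, bound %8.2e\n', i, norm(xi-xtilde)/norm(xtilde), epsilon^i*tau);
end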

To estimate how much the computable cotangent $\tau_i \equiv \|V^*x^{(i-1)}\|_2/\|x^{(i)}\|_2$ for $x^{(i)}$ reduces in a Gram-Schmidt sweep, note that
$$ \frac{\|x^{(i)} - x^{(i+1)}\|_2}{\|x^{(i)}\|_2} = \frac{\|VE^iV^*x\|_2}{\|x^{(i)}\|_2} \le \varepsilon^i\,\frac{\|M^{\frac12}V^*x\|_2}{\|\tilde x\|_2}\,\frac{\|\tilde x\|_2}{\|x^{(i)}\|_2} \le \frac{\varepsilon^i\tau}{(1-\varepsilon)(1 - \varepsilon^i\tau)} \approx \varepsilon^i\tau. $$
Hence
$$ \tau_{i+1} \equiv \frac{\|V^*x^{(i)}\|_2}{\|x^{(i+1)}\|_2} = \frac{\|EV^*x^{(i-1)}\|_2}{\|x^{(i+1)}\|_2} \le \varepsilon\,\frac{\|V^*x^{(i-1)}\|_2}{\|x^{(i)}\|_2}\,\frac{\|x^{(i)}\|_2}{\|x^{(i+1)}\|_2} \le \frac{\varepsilon}{1 - \varepsilon^i\tau}\,\tau_i \approx \varepsilon^i\tau. $$
The following result tells us what the effect is of expanding a basis of a subspace with a vector that is not exactly orthogonal to this space.

Theorem 6 Consider a vector $v$ such that $\|v\|_2 = 1$. Put $V_+ \equiv [V, v]$. Then, with $\varepsilon \equiv \|V^*V - I\|_2$ and $\delta \equiv \|V^*v\|_2$, we have that
$$ \|V_+^*V_+ - I\|_2 \le \tfrac12\big(\varepsilon + \sqrt{\varepsilon^2 + 4\delta^2}\big) \le \min\Big(\varepsilon + \delta,\ \varepsilon + \frac{\delta^2}{\varepsilon}\Big). \qquad (4) $$

Proof. If $\mu_i$ are the eigenvalues of $E = I - V^*V$ and $\nu_i$ are the components of $V^*v$ in the direction of the associated eigenvectors of $E$, then the eigenvalues $\lambda_j$ of $E_+ \equiv I - V_+^*V_+$ satisfy
$$ \lambda_j = \sum_i \frac{\nu_i^2}{\lambda_j - \mu_i}. \qquad (5) $$
Since $\max_i \mu_i \le \varepsilon$ we have that
$$ \sum_i \frac{\nu_i^2}{\lambda - \mu_i} \le \sum_i \frac{\nu_i^2}{\lambda - \varepsilon} = \frac{\delta^2}{\lambda - \varepsilon} \quad\text{for } \lambda > \varepsilon. \qquad (6) $$
Then $\lambda_+ \equiv \tfrac12(\varepsilon + \sqrt{\varepsilon^2 + 4\delta^2}) \ge \varepsilon$ satisfies $\lambda_+ = \delta^2/(\lambda_+ - \varepsilon)$. From (5) and (6) we can conclude that $\lambda_j \le \lambda_+$ for all eigenvalues $\lambda_j$ of $E_+$, which proves the theorem.

Notes 7 The estimate in (4) is based on a worst case situation: all eigenvalues $\mu_i$ of $E$ are allowed to be equal to $\varepsilon$. In practice, the eigenvalues will be more or less equally distributed over negative and positive values and the factor 4 in (4) can be replaced by a smaller value. In numerical experiments 1.5 appeared to be appropriate.

Corollary 8 If we expand $V$ with $v \equiv x^{(i)}/\|x^{(i)}\|_2$, $V_+ \equiv [V, v]$, then we have that
$$ \|V_+^*V_+ - I\|_2 \le \varepsilon\big(1 + \min(\tau_i, \tau_i^2)\big). $$
Proof. Note that $\delta \equiv \|V^*v\|_2 \le \varepsilon\tau_i$, since
$$ \delta = \|V^*v\|_2 = \frac{\|V^*x^{(i)}\|_2}{\|x^{(i)}\|_2} = \frac{\|EV^*x^{(i-1)}\|_2}{\|x^{(i)}\|_2} \le \varepsilon\tau_i. $$
Combining this with (4) gives the bound.
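The bound (4) is also easy to verify experimentally. The following fragment is only a sketch (not from the note; sizes and perturbations are chosen arbitrarily): it expands a nearly orthonormal V by a unit vector v and compares the observed loss of orthogonality with the bound of Theorem 6.

n = 200; k = 10;
[V,~] = qr(randn(n,k),0);
V = V + 1e-3*randn(n,k);                  % make V slightly non-orthonormal
v = randn(n,1); v = v/norm(v);            % unit expansion vector
Vp = [V, v];
epsilon = norm(V'*V - eye(k));
delta   = norm(V'*v);
lhs = norm(Vp'*Vp - eye(k+1));            % loss of orthogonality of the expanded basis
rhs = (epsilon + sqrt(epsilon^2 + 4*delta^2))/2;   % bound (4)
fprintf('%8.2e <= %8.2e\n', lhs, rhs);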

Discussion 9

0. Apparently $\delta \lesssim \varepsilon^i\tau$. If $i$ is such that $\varepsilon^i\tau \le u$, we have orthogonality up to machine precision.

1. If, say, $\tau_i \approx \varepsilon^{i-1}\tau \le 0.1$, then the loss of orthogonality is hardly affected by expansion with $x^{(i)}/\|x^{(i)}\|_2$.

2. If $\tau \ge 10^{10}$ then $x$ may be considered to be in the span of $V$: a lucky breakdown. Therefore, we may assume that $\tau < 10^{10}$ and we take $\mathrm{tol} \equiv 10^{-11}$.

3. If $\varepsilon \le \mathrm{tol}$ then $\varepsilon\tau \le 10^{-1} \ll 1$. In particular $\tau_2 \lesssim 10^{-1}$ and we may assume that expansion with $x^{(2)}/\|x^{(2)}\|_2$ leads to negligible non-orthogonality (see 1; "twice is enough").

4. If $\tau > 10^{8}$ we have to repeat Gram-Schmidt in order to avoid pollution by local rounding errors (see 1.1).

5. Suppose we expand $V_k$ by $v_{k+1} \equiv x^{(i)}/\|x^{(i)}\|_2$. Since the $\tau_i$ are computable, we may estimate the loss of orthogonality $\|V_k^*V_k - I_k\|_2$ by $\varepsilon_k$, which can recursively be computed as
$$ \varepsilon_{k+1} = \tfrac12\,\varepsilon_k\Big(1 + \sqrt{1 + 4\tau_i^2}\Big). $$
We may add a modest multiple of $u$ to accommodate the local rounding errors (see 4).

6. If, for each $k$, we select $i$ such that $\tau_i \le \mathrm{tol}/\varepsilon_k$, then $\delta \le \mathrm{tol}$ and we know a priori that $\|V_m^*V_m - I_m\|_2 \le m\,\mathrm{tol}$. The recursively computed upper bound $\varepsilon_m$ may be much smaller than $m\,\mathrm{tol}$. The criterion $\tau_i \le \mathrm{tol}/\varepsilon_k$ is a dynamical one.

7. If $v_{k+1}, \ldots, v_{k+j}$ have been formed by two sweeps of Gram-Schmidt, then these vectors do not form an orthogonality problem (see 1 and 3). Therefore, if the next expansion vector requires two sweeps of Gram-Schmidt, the second sweep can be restricted to the vectors $v_1, \ldots, v_k$. Unfortunately, the second sweep can not be restricted to the vectors created by unrepeated Gram-Schmidt, since a vector that is sufficiently orthogonal to its predecessors need not be orthogonal enough to the subsequent ones.

8. In the standard strategy, Gram-Schmidt is repeated if the sine of the angle is less than $\kappa$, with $\kappa = 0.5$ as a popular choice. This is equivalent to the criterion "repeat if $\tau_i \ge \sqrt{1 - \kappa^2}/\kappa$". For the popular choice $\kappa = 0.5$, $\tau_i$ will be less than 1.733 and we allow $\varepsilon_k$ to grow by a factor $\tfrac12\big(1 + \sqrt{1 + 4\tau_i^2}\big) = 2.303$ in each expansion step. In $k = 45$ steps the orthogonality may be completely lost, i.e., $\|V_k^*V_k - I\|_2 \approx 1$.

Remark 10 Rather than orthogonalizing to full precision, as is the aim of repeated Gram-Schmidt, one can also orthogonalize with the operator $I - VM^{-1}V^*$, or with some accurate and convenient approximation of this operator:
$$ \tilde x = x - V\tilde h \quad\text{where}\quad \tilde h \equiv h + Eh \ \text{ with } \ h \equiv V^*x. \qquad (7) $$
Here, we assumed that $\|E\|_2^2 \lesssim u$, which justifies the approximation of $M^{-1} = (I - E)^{-1}$ by $I + E$. The approach in (7) avoids large errors due to loss of orthogonality in the columns of $V$. It can not diminish local rounding errors. However, the criterion for keeping local rounding errors small is much less strict than the criterion for keeping small those errors that are due to a non-orthogonal $V$.
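A minimal sketch of the variant in (7), assuming that E = I - V'*V (or the stored part of it) is available; the function name is made up for illustration.

function [xt,ht] = OrthWithCorrection(V,E,x)
% Approximate (I - V*inv(M)*V')*x, with M = V'*V = I - E, by replacing
% inv(M) with I + E; this is justified when ||E||_2^2 is of the order of u.
h  = V'*x;
ht = h + E*h;     % ht approximates inv(M)*(V'*x)
xt = x - V*ht;    % xt approximates the component of x orthogonal to span(V)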

To compute $E$ we have to evaluate the inner products that form the coordinates of $V^*x^{(1)}$, as in the second sweep of Gram-Schmidt. However, in the variant in (7), we do not have to update the vector $x^{(1)}$ to form $x^{(2)}$. Instead, we have to update the low dimensional vector $h$ in all subsequent orthogonalization steps. As a more efficient variant, we can store in $E$ only those vectors $V^*x^{(1)}$ for which the cotangent $\|V^*x\|_2/\|x^{(1)}\|_2$ is non-small, e.g., as in the criterion for repeating Gram-Schmidt.

With $L$ the strictly lower triangular part of $E$ and $U$ the strictly upper triangular part (the diagonal of $E$ is of order $u$), we have that $(I + U)(I + L) = I + L + U + UL$ and $\|UL\|_2 \lesssim \|E\|_2^2 \le u$ if $\|E\|_2 \le \sqrt{u}$. Therefore, if we neglect errors of order $u$, then we have that
$$ VM^{-1}V^* \approx V(I + E)V^* \approx \big(V(I + U)\big)\big(V(I + U)\big)^*. $$
Note that $V(I + U) = V + VU$, where $VU$ is precisely the update of the orthogonalization that has been skipped. If, for some reason (such as a restart), there is some need to form the more accurate orthogonal basis, then this can easily be done. If $V$ has to be updated by some $k$ by $l$ matrix $S$, then $\widetilde V = V(S + US)$ efficiently incorporates the postponed orthogonalization. In the Arnoldi process, the upper Hessenberg matrix $H$ should also be updated. Since $U$ is upper triangular, the updated matrix $(I - U)H(I + U)$ is also upper Hessenberg.
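If the more accurate basis is needed, for instance at a restart, the postponed orthogonalization can be applied as in the hypothetical fragment below (it assumes a square k-by-k Hessenberg matrix H from an Arnoldi process and E = I - V'*V as above; the function name is made up).

function [Vnew,Hnew] = ApplyPostponedUpdate(V,H,E)
% V*(I+U) incorporates the skipped orthogonalization updates; since U is
% strictly upper triangular, (I-U)*H*(I+U) remains upper Hessenberg.
U    = triu(E,1);
k    = size(U,1);
Vnew = V + V*U;
Hnew = (eye(k) - U)*H*(eye(k) + U);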

function [V,H,E]=RepGramSchmidt(X,kappa,kappa0,delta)
% Gram-Schmidt with modified projections: skipped orthogonalization
% updates are stored in the matrix E (cf. Remark 10).
[n,k]=size(X);
v=X(:,1); gamma=norm(v);
H=gamma; V=v/gamma; E=0;
for j=2:k
  v=X(:,j);
  h=V'*v; h=h-E*h; v=v-V*h;          % h approximates inv(V'*V)*(V'*v); E holds stored cross products
  gamma=norm(v); beta=norm(h);
  hcs=zeros(j-1,1);
  if gamma>delta*beta
    if gamma<kappa0*beta             % local errors too large: second sweep
      hc=V'*v; h=h+hc; v=v-V*hc;
      gamma=norm(v); beta=norm(hc);
    elseif gamma<kappa*beta          % store the skipped update in E
      hcs=(V'*v)/gamma;
    end
  end
  H=[H,h;zeros(1,j)];
  if gamma>delta*beta                % otherwise x_j is (numerically) in span(V)
    H(j,j)=gamma; V=[V,v/gamma];
    E=[E,hcs;hcs',0];
  end
end
return

Figure 2: Matlab code for Gram-Schmidt with modified projections. The parameter delta determines when a vector x is considered to be in the span of V. kappa0 determines the size of the local errors (kappa0 = 1.e-3 means that we accept an error of size 10^3 u). kappa determines when to modify the projections.
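A possible way to exercise the routine of Figure 1 is sketched below (a hypothetical test; the matrix, the nearly dependent column, and the parameter values are chosen arbitrarily). When no column is rejected, V and H form a QR-like factorization of X.

X = randn(100,20); X(:,2) = X(:,1) + 1e-10*randn(100,1);   % one nearly dependent column
[V,H] = RepGramSchmidt(X,0.5,1e-12,2);    % Figure 1 version: kappa=0.5, delta=1e-12, i0=2
disp(norm(V'*V - eye(size(V,2))))         % loss of orthogonality
disp(norm(X - V*H)/norm(X))               % X = V*H up to rounding errors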