Contents

1 Repeated Gram-Schmidt
  1.1 Local errors
  1.2 Propagation of the errors
Gram-Schmidt orthogonalisation

Gerard Sleijpen

December 7, 2001

1 Repeated Gram-Schmidt

A sequence $x_1, x_2, \ldots$ of vectors of dimension $n$ is orthogonalized by the Gram-Schmidt process into a sequence $v_1, v_2, \ldots$ of orthonormal vectors such that, for each $k$, the vectors $v_1, \ldots, v_k$ span the same space as the first $k$ vectors $x_i$. The construction of the vectors $v_k$ is recursive. If $V_k$ is the matrix with columns $v_1, \ldots, v_k$, then $v_{k+1}$ is constructed by the Gram-Schmidt process as follows:

  $\tilde{x} = x_{k+1} - V_k (V_k^* x_{k+1}), \qquad v_{k+1} = \tilde{x} / \|\tilde{x}\|_2.$

In exact arithmetic, the operator $I - V_k V_k^*$ projects any $n$-vector onto the space orthogonal to $\mathrm{span}(V_k)$. This will not be the case in rounded arithmetic, for two reasons:

1. Local errors. The application of $I - V_k V_k^*$ to $x_{k+1}$ introduces rounding errors. In particular, the computed $v_{k+1}$ will not be orthogonal to $V_k$.

2. Propagation of the errors. The operator $I - V_k V_k^*$ is not an exact orthogonal projector (see 1). Therefore, even an exact application of this operator does not lead to a vector $v_{k+1}$ that is orthogonal to $V_k$.

The negative effects of both aspects can be diminished by repeating the Gram-Schmidt orthogonalization.

1.1 Local errors

In rounded arithmetic, we have (neglecting $O(u^2)$ terms)

  $x^{(1)} \equiv \tilde{x} + \Delta\tilde{x} = x - VV^*x - V\Delta_1 + \Delta_2$

with $\|\Delta_1\|_2 \le n\sqrt{k}\,u\,\|x\|_2$ and $\|\Delta_2\|_2 \le k\,u\,\|V\|_2\,\|V^*x\|_2 + u\,\|\tilde{x}\|_2$. The error $\Delta v$ in $v \equiv \tilde{x}/\|\tilde{x}\|_2$ can be bounded by

  $\|\Delta v\|_2 \le \Big( n\sqrt{k}\,\frac{\|x\|_2}{\|\tilde{x}\|_2} + k\,\frac{\|V^*x\|_2}{\|\tilde{x}\|_2} + 1 \Big)\,u \le (n + k)\sqrt{k}\,\frac{\|x\|_2}{\|\tilde{x}\|_2}\,u + u. \qquad (1)$

If $V$ is nearly orthogonal, then $\|x\|_2/\|\tilde{x}\|_2$ is the reciprocal of the sine of the angle $\varphi$ between $x$ and the space spanned by $V$.

Mathematical Institute, Utrecht University, P.O. Box 80.010, 3508 TA Utrecht, the Netherlands. E-mail: sleijpen@math.uu.nl. Version: December, 1998.
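One expansion step and the computable quantity $\|x\|_2/\|\tilde{x}\|_2 = 1/\sin(\varphi)$ can be sketched as follows in numpy (an illustrative Python translation; the paper's own codes, in the figures, are Matlab, and the name `cgs_step` is ours):

```python
import numpy as np

def cgs_step(V, x):
    """One classical Gram-Schmidt step against the columns of V.

    Returns the new unit vector and the computable quantity
    ||x||_2 / ||xtilde||_2 = 1/sin(phi), with phi the angle
    between x and span(V).
    """
    h = V.T @ x                       # h = V' x
    xt = x - V @ h                    # xtilde = x - V (V' x)
    inv_sin = np.linalg.norm(x) / np.linalg.norm(xt)
    return xt / np.linalg.norm(xt), inv_sin
```

A large value of `inv_sin` signals that local rounding errors may have spoiled the orthogonality, so that the sweep should be repeated.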
The component of the error $\Delta v$ in $\mathrm{span}(V)$ is essentially equal to $-V\Delta_1/\|\tilde{x}\|_2$ and can be bounded in norm by $n\sqrt{k}\,u/\sin(\varphi)$ if $\|I - V^*V\|_2 \ll 1$. Therefore, to keep the loss of orthogonality due to local rounding errors less than $\delta$, with $0 < \delta \ll 1$, the computable quantity $1/\sin(\varphi) = \|x\|_2/\|\tilde{x}\|_2$ should be less than $\delta/(n\sqrt{k}\,u)$. If this requirement does not hold, then another Gram-Schmidt sweep can be applied. This produces the vector

  $x^{(2)} \equiv \tilde{x} + (I - VV^*)\Delta_2 - V\Delta_1' + \Delta_2'$

with $\|\Delta_1'\|_2 \le n\sqrt{k}\,u\,\|x\|_2$ and $\|\Delta_2'\|_2 \le k\,u\,\|V\|_2\,\|V^*x\|_2 + u\,\|\tilde{x}\|_2$. In the estimate of the perturbation, we assumed that $\|x^{(1)}\|_2 \approx \|\tilde{x}\|_2$, which is acceptable if, say, $\|x\|_2/\|\tilde{x}\|_2 \le 0.1/(n\sqrt{k}\,u)$. Note that the vector $x$ is, up to machine precision, in the span of $V_k$ if $\|x\|_2/\|\tilde{x}\|_2 > 0.1/(n\sqrt{k}\,u)$.

The error terms $-V\Delta_1' + \Delta_2'$ are of the order of machine precision, relative to $\|x\|_2$. The term $(I - VV^*)\Delta_2$ can be a factor $1/\sin(\varphi)$ larger, but it is orthogonal to $V$ and does not contribute to a loss of orthogonality. Therefore, the vector $x^{(2)}$ will be, up to machine precision, orthogonal to $V$. Note that $h \equiv h_1 + V^*x^{(1)} \approx V^*x$, where $h_1 \equiv V^*x + \Delta_1$, and $x = x^{(2)} + Vh - (I - VV^*)\Delta_2 + O(u)$.

Notes 1

1. The estimate for $\Delta_2$ can be refined. Since

  $\mathrm{fl}(\gamma + \eta\alpha) = (\gamma + \eta\alpha(1 + \xi'))(1 + \xi) = \gamma + \eta\alpha + \eta\alpha\xi' + (\gamma + \eta\alpha)\xi,$

we see that $\Delta_2$ consists of terms $\xi\,Vh$ and $\xi_j\,V_j h_j$ ($j = 2, \ldots, k$) with $|\xi|, |\xi_j| \le u$, where $h_j \equiv V_j^* x$ and $V_j \equiv [v_1, \ldots, v_j]$. Hence

  $\|\Delta_2\|_2 \le u\,\|V\|_2\,\|h\|_2 + u \sum_{j=2}^{k} \|V_j h_j\|_2 \le u\,(\sqrt{k} + k - 1)\,\|h\|_2 \le 1.25\,k\,u\,\|h\|_2.$

2. The error vector $\Delta_2$ will have some randomness. Therefore, an estimate $\|V^*\Delta_2\|_2 \le \|\Delta_2\|_2$ is rather pessimistic, and $\|V^*\Delta_2\|_2 \approx \sqrt{k/n}\,\|\Delta_2\|_2$ would be more realistic.

3. With modified Gram-Schmidt, errors are also subject to subsequent orthogonalization. Therefore, with the modified approach, the component of the error $\Delta_2$ in the space spanned by $V$ can be significantly smaller. However, the major advantage of the modified approach is in the $\Delta_1$ term. For the error in the intermediate terms, the sine of the angle between the intermediate vectors $x - V_j h_j$ and the space spanned by $V$ is of importance, rather than the sine between $x$ and this space.
If, for instance, the angle between $x$ and $v_1$ is small, while the angle with the space spanned by the other vectors $v_i$ is non-small, then only the error in the inner product $v_1^* x$ is of significance, and the $\sqrt{k}$ term in the estimate (1) for $\|\Delta_1\|_2$ can be skipped. But also in this approach the error is proportional to $1/\sin(\varphi)$. If $x$ has a small angle with $V$, but large angles with all $v_i$ (as, for instance, for $x = \sum_{i=1}^{k} v_i + \epsilon\,v_{k+1}$), then, also with the modified approach, the $\sqrt{k}$ will show up.

4. If we have a good estimate $w \approx Vh$ for the vector in the space spanned by $V$ that is close to $x$ (close to $Vh$, where $h \equiv V^*x$), then orthogonalization of $x$ to $w$, followed by the Gram-Schmidt procedure,

  $\beta \equiv \frac{w^*x}{w^*w}, \quad x' = x - w\beta, \quad h' = V^*x', \quad \tilde{x} = x' - Vh', \quad h = \hat{h}\beta + h'$

(with $w = V\hat{h}$), is stable (i.e., the rounding errors will be of the order of machine precision): the vector $\tilde{x}$ will be nearly orthogonal to $V$. Therefore, the orthogonalization of $x$ to $V$ will be stable. The rounding errors in the computation of $x'$ will be largely diminished by the subsequent orthogonalization. The Jacobi-Davidson correction vector, the solution of the correction equation, is orthogonal to the Ritz vector. Therefore, the angle between this correction vector and the search subspace is large, and there is usually no need to repeat the Gram-Schmidt orthogonalization.
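The effect discussed above can be observed numerically: one sweep applied to a vector at a tiny angle to $\mathrm{span}(V)$ loses orthogonality at the level $u/\sin(\varphi)$, while a second sweep restores it to the machine-precision level ("twice is enough"). A numpy sketch, with problem sizes and the perturbation level chosen by us for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 50
V, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal columns

# a vector at a tiny angle to span(V): sin(phi) is roughly 1e-9
c = rng.standard_normal(k)
w = rng.standard_normal(n)
x = V @ c + 1e-9 * np.linalg.norm(V @ c) * (w / np.linalg.norm(w))

x1 = x - V @ (V.T @ x)     # first sweep: orthogonality lost up to ~ u/sin(phi)
x2 = x1 - V @ (V.T @ x1)   # second sweep: orthogonal up to machine precision

loss1 = np.linalg.norm(V.T @ x1) / np.linalg.norm(x1)
loss2 = np.linalg.norm(V.T @ x2) / np.linalg.norm(x2)
```

Here `loss1` is many orders of magnitude above the unit roundoff, while `loss2` is at the machine-precision level.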
function [V,H]=RepGramSchmidt(X,kappa,delta,i0)
  [n,k]=size(X);
  v=X(:,1); gamma=norm(v);
  H=gamma; V=v/gamma;
  for j=2:k
    v=X(:,j);
    h=V'*v; v=v-V*h;
    gamma=norm(v); beta=norm(h);
    for i=2:i0
      if gamma<delta*beta | gamma>kappa*beta, break, end
      hc=V'*v; h=h+hc; v=v-V*hc;
      gamma=norm(v); beta=norm(hc);
    end
    H=[H,h;zeros(1,j)];
    if gamma>delta*beta
      H(j,j)=gamma; V=[V,v/gamma];
    end
  end
return

Figure 1: Matlab code for repeated Gram-Schmidt. For i0=1, we have classical Gram-Schmidt. The parameter delta determines when a vector x is considered to be in the span of V.

1.2 Propagation of the errors

Consider an $n$ by $k$ matrix $V$ and the interaction matrix $M \equiv V^*V$.

Lemma 2 If $M$ is non-singular, then $VM^{-1}V^*$ is the orthogonal projection onto the subspace $\mathrm{span}(V)$, and $I - VM^{-1}V^*$ projects onto the orthogonal complement of this subspace.

We assume that $\epsilon \equiv \|E\|_2 < 1$, where $E \equiv I - M$. Then $M$ is non-singular. With $i$ times repeated Gram-Schmidt applied to a vector $x$, we intend to approximate the component

  $\tilde{x} \equiv (I - VM^{-1}V^*)\,x$
of $x$ that is orthogonal to $\mathrm{span}(V)$. Therefore, with the result $x^{(i)} \equiv (I - VV^*)^i x$ of $i$ sweeps of (classical) Gram-Schmidt applied to $x$, we are interested in the error $x^{(i)} - \tilde{x}$.

Lemma 3 For each $i = 0, 1, 2, \ldots$ we have

  $(I - VV^*)^i - (I - VM^{-1}V^*) = VM^{-1}E^i V^* \qquad (2)$

and

  $(I - VV^*)^i - (I - VV^*)^{i+1} = VE^i V^*. \qquad (3)$

Proof. A simple induction argument using $(I - VV^*)V = VE$ leads to (3). With (3), we find that $(I - VV^*)^i - I = -V(I + E + E^2 + \ldots + E^{i-1})V^*$, and, with a Neumann expansion $M^{-1} = (I - E)^{-1} = I + E + E^2 + \ldots$, we obtain (2).

Hence

  $\|x^{(i)} - \tilde{x}\|_2 = \|VM^{-1}E^iV^*x\|_2 = \|M^{-1/2}E^iV^*x\|_2 \le \epsilon^i\,\|M^{-1/2}V^*x\|_2 \le \frac{\epsilon^i}{\sqrt{1 - \epsilon}}\,\|V^*x\|_2.$

Here we used the fact that the commutativity of $M = V^*V$ and $E = I - M$ implies that $(VM^{-1}E^iV^*)^*(VM^{-1}E^iV^*) = (M^{-1/2}E^iV^*)^*(M^{-1/2}E^iV^*)$.

The computable quantity $\tau_1 \equiv \|V^*x\|_2/\|x^{(1)}\|_2$ is close to the cotangent $\tau$ of the angle between $x$ and $\mathrm{span}(V)$:

  $\tau = \frac{\|VM^{-1}V^*x\|_2}{\|\tilde{x}\|_2} = \frac{\|M^{-1/2}V^*x\|_2}{\|\tilde{x}\|_2}.$

The relative error in $x^{(i)}$ can be bounded in terms of $\tau$:

Theorem 4

  $\frac{\|x^{(i)} - \tilde{x}\|_2}{\|\tilde{x}\|_2} \le \epsilon^i\,\tau.$

Now we can relate $\tau$ to $\tau_1$:

  $\tau_1 = \frac{\|V^*x\|_2}{\|x^{(1)}\|_2} \ge \sqrt{1 - \epsilon}\;\frac{\|M^{-1/2}V^*x\|_2}{\|\tilde{x}\|_2 + \|\tilde{x} - x^{(1)}\|_2} \ge \tau\,\frac{\sqrt{1 - \epsilon}}{1 + \epsilon\tau}.$

Similarly,

  $\tau_1 = \frac{\|V^*x\|_2}{\|x^{(1)}\|_2} \le \tau\,\frac{\sqrt{1 + \epsilon}}{1 - \epsilon\tau}.$

Therefore:

Corollary 5 If $\epsilon(\tau + 1) \ll 1$ then $\tau \approx \tau_1$.
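The identity (2) of Lemma 3 and the bound of Theorem 4 can be checked numerically. The numpy sketch below (problem sizes and the perturbation level are our assumptions) builds a nearly orthonormal $V$, forms the exact projector $I - VM^{-1}V^*$, and verifies both statements for a few sweeps:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 6
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
V = Q + 1e-4 * rng.standard_normal((n, k))    # nearly orthonormal columns

M = V.T @ V
E = np.eye(k) - M
eps = np.linalg.norm(E, 2)

G = np.eye(n) - V @ V.T                       # one classical Gram-Schmidt sweep
P = np.eye(n) - V @ np.linalg.solve(M, V.T)   # exact projector I - V M^{-1} V'

x = rng.standard_normal(n)
xt = P @ x

# tau = ||M^{-1/2} V' x||_2 / ||xt||_2, via the eigendecomposition of M
w, U = np.linalg.eigh(M)
tau = np.linalg.norm((U / np.sqrt(w)) @ (U.T @ (V.T @ x))) / np.linalg.norm(xt)

for i in range(1, 4):
    xi = np.linalg.matrix_power(G, i) @ x
    # identity (2): x^(i) - xt = V M^{-1} E^i V' x
    rhs = V @ np.linalg.solve(M, np.linalg.matrix_power(E, i) @ (V.T @ x))
    assert np.linalg.norm(xi - xt - rhs) <= 1e-10 * np.linalg.norm(x)
    # Theorem 4: ||x^(i) - xt||_2 <= eps^i * tau * ||xt||_2
    assert np.linalg.norm(xi - xt) <= eps**i * tau * np.linalg.norm(xt) * (1 + 1e-8)
```

The assertions pass because both statements are exact algebraic facts; only rounding noise separates the two sides of (2).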
To estimate how much the computable cotangent $\tau_i \equiv \|V^*x^{(i-1)}\|_2/\|x^{(i)}\|_2$ for $x^{(i)}$ reduces in a Gram-Schmidt sweep, note that

  $\frac{\|x^{(i)} - x^{(i+1)}\|_2}{\|x^{(i)}\|_2} = \frac{\|VE^iV^*x\|_2}{\|x^{(i)}\|_2} \le \frac{\epsilon^i\,\|M^{1/2}V^*x\|_2}{\|\tilde{x}\|_2 - \|\tilde{x} - x^{(i)}\|_2} \le \epsilon^i\tau\,\frac{1 + \epsilon}{1 - \epsilon^i\tau} \approx \epsilon^i\,\tau.$

Hence

  $\tau_{i+1} \equiv \frac{\|V^*x^{(i)}\|_2}{\|x^{(i+1)}\|_2} = \frac{\|EV^*x^{(i-1)}\|_2}{\|x^{(i+1)}\|_2} \le \epsilon\,\frac{\|V^*x^{(i-1)}\|_2}{\|x^{(i)}\|_2 - \|x^{(i)} - x^{(i+1)}\|_2} \le \frac{\epsilon}{1 - \epsilon^i\tau}\,\tau_i \approx \epsilon\,\tau_i,$

so that, inductively, $\tau_{i+1} \approx \epsilon^i\,\tau$.

The following result tells us what the effect is of expanding a basis of a subspace with a vector that is not exactly orthogonal to this space.

Theorem 6 Consider a vector $v$ such that $\|v\|_2 = 1$. Put $V_+ \equiv [V, v]$. Then, with $\epsilon \equiv \|V^*V - I\|_2$ and $\delta \equiv \|V^*v\|_2$, we have that

  $\|V_+^*V_+ - I\|_2 \le \tfrac{1}{2}\Big(\epsilon + \sqrt{\epsilon^2 + 4\delta^2}\Big) \le \min\Big(\epsilon + \delta,\; \epsilon + \frac{\delta^2}{\epsilon}\Big). \qquad (4)$

Proof. If $\mu_i$ are the eigenvalues of $E = I - V^*V$ and $\nu_i$ are the components of $V^*v$ in the direction of the associated eigenvectors of $E$, then the eigenvalues $\lambda_j$ of $E_+ \equiv I - V_+^*V_+$ satisfy

  $\lambda_j = \sum_i \frac{\nu_i^2}{\lambda_j - \mu_i}. \qquad (5)$

Since $\max_i \mu_i \le \epsilon$ we have that

  $\sum_i \frac{\nu_i^2}{\lambda - \mu_i} \le \sum_i \frac{\nu_i^2}{\lambda - \epsilon} = \frac{\delta^2}{\lambda - \epsilon} \quad \text{for } \lambda > \epsilon. \qquad (6)$

Then $\lambda_+ \equiv \tfrac{1}{2}(\epsilon + \sqrt{\epsilon^2 + 4\delta^2}) \ge \epsilon$ satisfies $\lambda_+ = \delta^2/(\lambda_+ - \epsilon)$. From (5) and (6) we can conclude that $\lambda_j \le \lambda_+$ for all eigenvalues $\lambda_j$ of $E_+$, which proves the theorem.

Notes 7 The estimate in (4) is based on a worst-case situation: all eigenvalues $\mu_i$ of $E$ are allowed to be equal to $\epsilon$. In practice, the eigenvalues will be more or less equally distributed over negative and positive values, and the factor 4 in (4) can be replaced by a smaller value. In numerical experiments, 1.5 appeared to be appropriate.

Corollary 8 If we expand $V$ with $v \equiv x^{(i)}/\|x^{(i)}\|_2$, $V_+ \equiv [V, v]$, then we have that

  $\|V_+^*V_+ - I\|_2 \le \epsilon\,(1 + \min(\tau_i, \tau_i^2)).$

Proof. Note that $\delta \equiv \|V^*v\|_2 \le \epsilon\tau_i$:

  $\delta = \|V^*v\|_2 = \frac{\|V^*x^{(i)}\|_2}{\|x^{(i)}\|_2} = \frac{\|EV^*x^{(i-1)}\|_2}{\|x^{(i)}\|_2} \le \epsilon\,\tau_i.$
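The estimate (4) of Theorem 6 can be checked numerically. In the numpy sketch below (the sizes and the perturbation level are our assumptions), a mildly non-orthonormal basis is expanded with a once-orthogonalized unit vector, and the observed loss of orthogonality is compared with the bound; the last line evaluates the per-step growth factor $\frac{1}{2}(1 + \sqrt{1 + 4\tau_i^2})$ for $\tau_i = \sqrt{3}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 60, 8
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
V = Q + 1e-5 * rng.standard_normal((n, k))   # mildly non-orthonormal basis

v = rng.standard_normal(n)
v = v - V @ (V.T @ v)                        # one Gram-Schmidt sweep
v = v / np.linalg.norm(v)

eps = np.linalg.norm(V.T @ V - np.eye(k), 2)
delta = np.linalg.norm(V.T @ v)

Vp = np.column_stack([V, v])
loss = np.linalg.norm(Vp.T @ Vp - np.eye(k + 1), 2)

# Theorem 6, estimate (4): loss <= (eps + sqrt(eps^2 + 4 delta^2)) / 2
bound = 0.5 * (eps + np.sqrt(eps**2 + 4 * delta**2))
assert loss <= bound * (1 + 1e-8)

# growth factor per expansion step for tau_i = sqrt(3): (1 + sqrt(13)) / 2
factor = 0.5 * (1 + np.sqrt(1 + 4 * 3.0))
```

The value of `factor` is approximately 2.30, which is the growth factor that reappears below for the popular choice $\kappa = 0.5$.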
Discussion 9

0. Apparently $\delta \lesssim \epsilon^i\tau$. If $i$ is such that $\epsilon^i\tau \le u$, we have orthogonality up to machine precision.

1. If, say, $\tau_i \approx \epsilon^{i-1}\tau \le 0.1$, then the loss of orthogonality is hardly affected by expansion with $x^{(i)}/\|x^{(i)}\|_2$.

2. If $\tau \ge 10^{10}$, then $x$ may be considered to be in the span of $V$: lucky breakdown. Therefore, we may assume that $\tau < 10^{10}$, and we take tol $\equiv 10^{-11}$.

3. If $\epsilon \le$ tol, then $\epsilon\tau \le 10^{-1} \ll 1$. In particular, $\tau_2 \le 10^{-1}$ and we may assume that expansion with $x^{(2)}/\|x^{(2)}\|_2$ leads to negligible non-orthogonality (see 1; twice is enough).

4. If $\tau > 10^{8}$, we have to repeat Gram-Schmidt in order to avoid pollution by local rounding errors (see 1.1).

5. Suppose we expand $V_k$ by $v_{k+1} \equiv x^{(i)}/\|x^{(i)}\|_2$. Since the $\tau_i$ are computable, we may estimate the loss of orthogonality $\|V_k^*V_k - I_k\|_2$ by $\epsilon_k$, which can recursively be computed as

  $\epsilon_{k+1} = \tfrac{1}{2}\,\epsilon_k\,\Big(1 + \sqrt{1 + 4\tau_i^2}\Big).$

We may add a modest multiple of $u$ to accommodate the local rounding errors (see 4).

6. If, for each $k$, we select $i$ such that $\tau_i \le$ tol$/\epsilon_k$, then $\delta \le$ tol and we know a priori that $\|V_m^*V_m - I_m\|_2 \le m\,$tol. The recursively computed upper bound $\epsilon_m$ may be much smaller than $m\,$tol. The criterion $\tau_i \le$ tol$/\epsilon_k$ is a dynamical one.

7. If $v_{k+1}, \ldots, v_{k+j}$ have been formed by two sweeps of Gram-Schmidt, then these vectors do not form an orthogonality problem (see 1 and 3). Therefore, if the next expansion vector requires two sweeps of Gram-Schmidt, the second sweep can be restricted to the vectors $v_1, \ldots, v_k$. Unfortunately, the second sweep can not be restricted to the vectors created by unrepeated Gram-Schmidt, since a vector that is sufficiently orthogonal to its predecessors need not be orthogonal enough to the subsequent ones.

8. In the standard strategy, Gram-Schmidt is repeated if the sine of the angle is less than $\kappa$, with $\kappa = 0.5$ as a popular choice. This is equivalent to the criterion: repeat if $\tau_i \ge \sqrt{1 - \kappa^2}/\kappa$.
For the popular choice $\kappa = 0.5$, $\tau_i$ will be less than $\sqrt{3} \approx 1.733$, and we allow $\epsilon_k$ to grow by a factor $\tfrac{1}{2}(1 + \sqrt{1 + 4\tau_i^2}) \approx 2.303$ in each expansion step. In $k = 45$ steps, the orthogonality may be completely lost (i.e., $\|V_k^*V_k - I\|_2 \approx 1$).

Remark 10 Rather than orthogonalizing to full precision, as is the aim of repeated Gram-Schmidt, one can also orthogonalize with the operator $I - VM^{-1}V^*$, or with some accurate and convenient approximation of this operator:

  $\tilde{x} = x - V\tilde{h} \quad \text{where } \tilde{h} \equiv h + Eh \text{ with } h \equiv V^*x. \qquad (7)$

Here, we assumed that $\|E\|_2 \le \sqrt{u}$, which justifies the approximation of $M^{-1} = (I - E)^{-1}$ by $I + E$. The approach in (7) avoids large errors due to loss of orthogonality in the columns of $V$. It can not diminish local rounding errors. However, the criterion for keeping local rounding errors small is much less strict than the criterion for keeping small the errors that are due to a non-orthogonal $V$.
To compute $E$, we have to evaluate the inner products that form the coordinates of $V^*x^{(1)}$, as in the second sweep of Gram-Schmidt. However, in the variant in (7), we do not have to update the vector $x^{(1)}$ to form $x^{(2)}$. Instead, we have to update the low-dimensional vector $h$ in all subsequent orthogonalization steps. As a more efficient variant, we can store in $E$ only those vectors $V^*x^{(1)}$ for which the cotangent $\|V^*x\|_2/\|x^{(1)}\|_2$ is non-small, e.g., as in the criterion for repeating Gram-Schmidt.

With $L$ the lower triangular part of $E$ and $U$ the upper triangular part (note that $L = U^*$), we have that $(I + U)(I + L) = I + L + U + UL$ and $\|UL\|_2 \le \|E\|_2^2 \le u$ if $\|E\|_2 \le \sqrt{u}$. Therefore, if we neglect errors of order $u$, then we have that

  $VM^{-1}V^* = V(I + E)V^* = (V(I + U))(V(I + U))^*.$

Note that $V + VU$ is precisely the result of the orthogonalization update that has been skipped. If, for some reason (such as a restart), there is a need to form the more accurate orthogonal basis, then this can easily be done. If $V$ has to be updated by some $k$ by $\ell$ matrix $S$, then $\tilde{V} = V(S + US)$ efficiently incorporates the postponed orthogonalization. In the Arnoldi process, the upper Hessenberg matrix $H$ should also be updated. Since $U$ is upper triangular, the updated matrix $(I - U)H(I + U)$ is also upper Hessenberg.
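The orthogonalization (7) with the approximate inverse $I + E$ can be sketched in numpy (an illustration with assumed sizes; the point is that the residual component in $\mathrm{span}(V)$ drops from order $\|E\|_2$ for a plain sweep to order $\|E\|_2^2$, which stays below $u$ when $\|E\|_2 \le \sqrt{u}$):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 500, 30
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
V = Q + 1e-9 * rng.standard_normal((n, k))   # ||E||_2 well below sqrt(u)

M = V.T @ V
E = np.eye(k) - M                            # E = I - M, as in the text

x = rng.standard_normal(n)
h = V.T @ x

x_plain = x - V @ h                          # plain sweep: residual ~ ||E|| ||h||
ht = h + E @ h                               # h~ = (I + E) h approximates M^{-1} h
x_mod = x - V @ ht                           # variant (7): residual ~ ||E||^2 ||h||

res_plain = np.linalg.norm(V.T @ x_plain)
res_mod = np.linalg.norm(V.T @ x_mod)
```

Here `res_mod` sits at the machine-precision level, while `res_plain` is of the order of $\|E\|_2\,\|h\|_2$; the gain grows as the columns of $V$ become less orthogonal.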
function [V,H,E]=RepGramSchmidt(X,kappa,kappa0,delta)
  [n,k]=size(X);
  v=X(:,1); gamma=norm(v);
  H=gamma; V=v/gamma; E=0;
  for j=2:k
    v=X(:,j);
    h=V'*v; h=h-E*h; v=v-V*h;
    gamma=norm(v); beta=norm(h);
    hcs=zeros(j-1,1);
    if gamma>delta*beta
      if gamma<kappa0*beta
        hc=V'*v; h=h+hc; v=v-V*hc;
        gamma=norm(v); beta=norm(hc);
      elseif gamma<kappa*beta
        hcs=(V'*v)/gamma;
      end
    end
    H=[H,h;zeros(1,j)];
    if gamma>delta*beta
      H(j,j)=gamma; V=[V,v/gamma];
      E=[E,hcs;hcs',0];
    end
  end
return

Figure 2: Matlab code for Gram-Schmidt with modified projections. The parameter delta determines when a vector x is considered to be in the span of V. kappa0 determines the size of the local errors (kappa0 = 1.e-3 means that we accept an error of size $10^{3}u$). kappa determines when to modify the projections.