Nested Krylov Methods Based on GCR

E. de Sturler^1
Faculty of Technical Mathematics and Informatics
Delft University of Technology
Mekelweg 4, Delft, The Netherlands

Abstract

Recently the GMRESR method for the solution of linear systems of equations has been introduced by Vuik and Van der Vorst. Similar methods have been proposed by Axelsson and Vassilevski [1] and Saad (FGMRES^2) [13]. GMRESR involves an outer and an inner method. The outer method is GCR, which is used to compute the optimal approximation over a given set of direction vectors, in the sense that the residual is minimized. The inner method is GMRES, which at each step computes a new direction vector by approximately solving the residual equation. This direction vector is then used by the outer algorithm to compute a new approximation. However, the optimality of the approximation over the space of search directions is ignored in the inner GMRES algorithm. This leads to suboptimal corrections to the solution in the outer algorithm. Therefore we propose to preserve the orthogonality relations of GCR in the inner GMRES algorithm. This gives optimal corrections to the solution and also leads to solving the residual equation in a smaller subspace and with an 'improved' operator, which should also lead to faster convergence. However, this involves using Krylov methods with a singular, non-symmetric operator. We will discuss some important properties of this. We will show by experiments that, in terms of matrix-vector multiplications, this modification (almost) always leads to better convergence. Because we do more orthogonalizations, it does not always give an improved performance in time. This depends on the cost of the matrix-vector multiplication relative to the cost of the orthogonalizations. Of course we can also use methods other than GMRES as the inner method; especially methods with short recurrences, like BICGSTAB, seem interesting. The experimental results indicate that especially for such methods it is advantageous to preserve the orthogonality in the inner method.

Key words. non-symmetric linear systems, iterative solvers, Krylov subspace, GCR, GMRES, GMRESR, BICGSTAB.

AMS(MOS) subject classification. 65F10.

1 Introduction

For the solution of systems of linear equations the so-called Krylov subspace methods are very popular. However, for general matrices no Krylov method can satisfy a global optimality requirement and have short recurrences [8]. Therefore one uses either restarted or truncated versions of optimal methods, like GMRES [14], or methods with short recurrences, which do not satisfy a global optimality requirement, like BiCG [9], BICGSTAB [17], BICGSTAB(l) [15], CGS [16] or QMR [11], which satisfies a quasi-minimization requirement.

^1 The author wishes to acknowledge Shell Research B.V. and STIPT for the financial support of his research.
^2 Since FGMRES and GMRESR are very similar, the ideas presented will be relevant for FGMRES as well.

Recently Vuik and Van der Vorst introduced a new type of method, GMRESR [18], which is a nested GMRES method. Among various other nice properties, GMRESR is flexible in the sense that a global minimization over some specific part of the Krylov space is done, and the dimension of this space and the vectors that span it are controlled by the algorithm, see figure 1.

GMRESR:
  1. Select x_0, m, tol; r_0 = b - A x_0, k = 0
  2. while ‖r_k‖_2 > tol do
       k = k + 1
       u_k = P_k^m(A) r_{k-1}
       c_k = A u_k
       for i = 1, ..., k-1 do
          α_i = c_i^T c_k;  c_k = c_k - α_i c_i;  u_k = u_k - α_i u_i
       c_k = c_k / ‖c_k‖_2;  u_k = u_k / ‖c_k‖_2
       x_k = x_{k-1} + u_k (c_k^T r_{k-1})
       r_k = r_{k-1} - c_k (c_k^T r_{k-1})

Figure 1: the GMRESR algorithm. P_k^m(A) r_{k-1} indicates the GMRES polynomial that is implicitly constructed in m steps of GMRES.

GCR:
  1. Select x_0, tol; r_0 = b - A x_0, k = 0
  2. while ‖r_k‖_2 > tol do
       k = k + 1
       u_k = r_{k-1}
       c_k = A u_k
       for i = 1, ..., k-1 do
          α_i = c_i^T c_k;  c_k = c_k - α_i c_i;  u_k = u_k - α_i u_i
       c_k = c_k / ‖c_k‖_2;  u_k = u_k / ‖c_k‖_2
       x_k = x_{k-1} + (c_k^T r_{k-1}) u_k
       r_k = r_{k-1} - (c_k^T r_{k-1}) c_k

Figure 2: the GCR algorithm.

The GMRESR algorithm consists of GCR [7], figure 2, as the outer algorithm and m steps of GMRES as inner method: the computation of P_k^m(A) r_{k-1} in figure 1. The inner GMRES method computes a new search direction by approximately solving the residual equation, and the outer GCR algorithm then minimizes the residual over the new search direction and all previously kept search directions u_i. The algorithm can be explained as follows. Suppose we are given the system of equations Ax = b, where A is a real, nonsingular, linear operator and b is a given right-hand side. Furthermore we have the two matrices

  U_k = (u_1 u_2 ... u_k)  and  C_k = A U_k   (1)

with the property that C_k^T C_k = I_k, and we want to solve the following minimization problem:

  min_{x ∈ range(U_k)} ‖b - Ax‖_2.   (2)

The solution of this minimization problem is given by

  x = U_k C_k^T b   (3)

and r = b - Ax satisfies

  r = b - C_k C_k^T b,  r ⟂ range(C_k).   (4)
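To make the structure of figure 2 concrete, the following sketch implements the GCR outer loop in NumPy, with the choice of the next direction vector left pluggable, so that plain GCR and GMRESR-like variants differ only in the inner argument. This is our own minimal illustration, not code from the paper; the function and argument names are ours, and no breakdown handling or preconditioning is included.

    import numpy as np

    def gcr(A, b, x0, tol=1e-8, max_outer=100, inner=None):
        # Minimal GCR outer loop (cf. figure 2). inner(A, r) returns the next
        # direction vector u_{k+1}; the default u = r gives plain GCR, while an
        # approximate solve of A u = r (e.g. a few GMRES steps) gives a
        # GMRESR-like method.
        x = x0.copy()
        r = b - A @ x
        U, C = [], []                        # kept directions u_i and c_i = A u_i
        for _ in range(max_outer):
            if np.linalg.norm(r) <= tol:
                break
            u = r.copy() if inner is None else inner(A, r)
            c = A @ u
            for ci, ui in zip(C, U):         # orthogonalize c_k against previous c_i
                alpha = ci @ c
                c = c - alpha * ci
                u = u - alpha * ui
            nc = np.linalg.norm(c)
            c = c / nc
            u = u / nc
            beta = c @ r                     # minimize ||r - beta c||_2
            x = x + beta * u
            r = r - beta * c
            U.append(u)
            C.append(c)
        return x, r

For a quick test one can take, for instance, A = np.diag(np.arange(1.0, 101.0)), b = np.ones(100) and x0 = np.zeros(100).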

In fact we have constructed a part of the inverse of A, more precisely the inverse of A over the subspace range(C_k). This inverse is given by

  A^{-1} C_k C_k^T = U_k C_k^T.   (5)

This principle underlies the GCR method, see figure 2, where for the preconditioned case A denotes the explicitly preconditioned operator. In GCR the matrices U_k and C_k are constructed such that range(U_k) equals the Krylov space K_k(A, r_0) = span{r_0, A r_0, ..., A^{k-1} r_0}, where r_0 = b - A x_0 and x_0 is an initial approximation to the solution. Provided GCR does not break down, it is a finite method and at step k it solves the minimization problem (2), with range(U_k) = K_k(A, r_0). It is obvious from equations (1)-(2) that in the step u_{k+1} = r_k (in GCR) we can replace r_k by any other vector and the algorithm still works in the sense that it solves the minimization problem (2); only range(U_k) = K_k(A, r_0) no longer holds. In general, the better the choice of u_{k+1}, the faster the convergence will be. The optimal choice, of course, is u_{k+1} = e_k, where e_k is the error in x_k. Therefore the key idea is to choose a suitable approximation to the error e_k. In order to find approximations to e_k, we use the relation

  A e_k = r_k   (6)

and any method which gives an approximate solution to this equation can be used to find good choices for u_{k+1}. We may also vary such methods from one step to the next, and we can regard them as preconditioners. These observations were made in [18] and lead to the formulation of the GMRESR method, figure 1, or rather methods, since many variants are possible. However, since we already have an optimal x ∈ range(U_k), (b - Ax) ⟂ range(A U_k), we need an approximation u_{k+1} to e_k such that c_{k+1} = A u_{k+1} is also orthogonal to range(C_k). This is done explicitly by the orthogonalization loop in the outer (GCR) algorithm. Because this is not taken into account in the inner method, the 'wrong' minimization problem is solved, since it also considers spurious corrections. Consider step k+1 in GMRESR: we have computed the matrices U_k, C_k and the vectors x_k and r_k. Then, in the inner GMRES loop (after m iterations), we solve the minimization problem

  min_{y ∈ R^m} ‖r_k - A V_m y‖_2   (7)

and in the outer loop we set

  u_{k+1} = (V_m y - U_k C_k^T A V_m y) / ‖(I - C_k C_k^T) A V_m y‖_2,
  c_{k+1} = (I - C_k C_k^T) A V_m y / ‖(I - C_k C_k^T) A V_m y‖_2,
  r_{k+1} = r_k - (r_k^T c_{k+1}) c_{k+1}.   (8)

It is obvious that for optimality we should have solved the minimization problem

  min_{y ∈ R^m} ‖r_k - (I - C_k C_k^T) A V_m y‖_2,   (9)

which is equivalent to

  min_{y ∈ R^m} ‖(I - C_k C_k^T)[r_k - A V_m y]‖_2,   (10)

since r_k ⟂ range(C_k). The results of the minimization problems (7) and (10) are equal only if range(A V_m) ⟂ range(C_k), which is generally not the case.
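The statements (2)-(5) are easy to verify numerically. The following small check uses randomly generated stand-ins for A, U_k and b (names and sizes are ours, chosen only for illustration) and confirms that x = U_k C_k^T b solves the least-squares problem over range(U_k) and that the resulting residual is orthogonal to range(C_k).

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 20, 3
    A = rng.standard_normal((n, n)) + 4 * np.eye(n)        # stand-in for A
    C, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))   # orthonormal C_k
    U = np.linalg.solve(A, C)                               # so that C_k = A U_k
    b = rng.standard_normal(n)

    x = U @ (C.T @ b)                                       # (3)
    r = b - A @ x
    print(np.allclose(r, b - C @ (C.T @ b)))                # (4): r = (I - C_k C_k^T) b
    print(np.allclose(C.T @ r, 0))                          # r is orthogonal to range(C_k)

    coef, *_ = np.linalg.lstsq(A @ U, b, rcond=None)        # solve (2) directly
    print(np.allclose(U @ coef, x))                         # same minimizer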

Also, in GMRESR we search in principle the entire Krylov space K(A, r_k) = span{r_k, A r_k, A^2 r_k, ...} for an approximation u_{k+1} ≈ e_k, whereas

  e_k ∈ K(A, r_k) ∩ A^{-1} range(C_k)^⊥   (11)

and therefore

  r_k ∈ K(A, A r_k) ∩ range(C_k)^⊥.   (12)

Another disadvantage of GMRESR is that the inner loop is essentially a restarted GMRES with the same operator A. It therefore also displays some of the problems of restarted GMRES; most notably, it can have the tendency to stagnate in the inner method (see Numerical Experiments). From this we infer that it is favourable to preserve the GCR orthogonality relations also in the inner (GMRES) method, at least for convergence speed (not necessarily for overall performance), see theorem 1 in the next section. From equations (8), (9), (11) and (12) we come to the idea of using (I - C_k C_k^T) A as the operator in the Krylov method in the inner loop. This would generate corrections to the residual c_{k+1} ∈ K(A, b) ∩ range(C_k)^⊥. But it will search for updates to the approximation u_{k+1} ∈ span{r_k, (I - C_k C_k^T) A r_k, ...} ⊂ range(C_k)^⊥, whereas in general e_k ∉ range(C_k)^⊥. We should note, however, that corrections to the residual c_{k+1} correspond to corrections to the approximation u_{k+1} = A^{-1} c_{k+1} ∈ K(A, r_k) ∩ A^{-1} range(C_k)^⊥, which corresponds to (11). This may seem like a problem, since we have introduced the operator A^{-1}. But over the space span{(I - C_k C_k^T) A r_k, [(I - C_k C_k^T) A]^2 r_k, ...} the inverse of A is known, because we know the inverse of A over range(C_k), see (5):

  A^{-1} (I - C_k C_k^T) A = A^{-1} A - A^{-1} C_k C_k^T A = I - U_k C_k^T A.   (13)

In Krylov methods the solution itself is not needed to continue the iteration, only the residual. Therefore we can use a Krylov-method-like iteration for the residual and find the corresponding updates to the solution vector using (13). This leads to an extension of the GMRESR method, which has an improved performance for many problems. In fact this leads to a nested method, where the 'inner' Krylov method computes an approximation to the error in the 'outer' Krylov method, and the information acquired in the 'outer' method is used to speed up the convergence in the 'inner' method. In this article we will consider GMRES and BICGSTAB as inner methods. BICGSTAB has the advantage that its cost per iteration is low, so that we have a relatively cheap method to compute approximations to the error. Using GMRES in the inner method has the following attractive features. First, at the end of the inner iteration the error is minimized over the space spanned by both the search vectors of the outer method and those of the inner method, see theorem 1. Second, using a few parameters and truncation we can control the sizes of the Krylov subspaces in both the inner and the outer iteration and hence optimize the length of recurrences and memory requirements versus convergence speed, see [4] or [5] and [10]. In the next section we will discuss the implications of the orthogonalization in the inner method more formally. It will be proved that this leads to an optimal approximation over the space spanned by both the outer and the inner iteration vectors. It also introduces a potential problem: the possibility of breakdown in the generation of the Krylov space in the inner iteration, since we iterate with a singular operator. We will show, however, that such a breakdown is rare, because it can only happen after a specific (generally large) number of iterations. Furthermore, we will also show how to remedy such a breakdown.
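As an illustration of the operator (I - C_k C_k^T) A and of relation (13), the following sketch wraps both maps as functions. It assumes only that C has orthonormal columns and that C = A U; the names are ours, and the code is not an implementation of the full method.

    import numpy as np

    def make_projected_operator(A, U, C):
        # A_{C_k} = (I - C C^T) A, acting on residual corrections, and the map
        # (13): A^{-1} A_{C_k} = I - U C^T A, which translates a residual
        # correction v back into the corresponding solution correction
        # without applying A^{-1} (it only uses C = A U).
        def apply_ACk(v):
            w = A @ v
            return w - C @ (C.T @ w)
        def solution_correction(v):
            return v - U @ (C.T @ (A @ v))
        return apply_ACk, solution_correction

In exact arithmetic, A applied to solution_correction(v) reproduces apply_ACk(v), which is precisely what (13) expresses; this is what later allows the inner iteration to work with residuals only.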

In most of the following discussion we will consider m (not fixed) steps of GMRES to be the inner method, because the optimality of GMRES, and the fact that GMRES does not break down for a nonsingular operator, permit some statements about optimality and isolate the breakdown of the Krylov space generated by (I - C_k C_k^T) A from the other possibilities of breakdown in various Krylov methods.

2 Consequences of inner orthogonalization

We will now formally define the elements of our discourse, repeating some of the previous section for the sake of clarity.

Definition 1 The matrix A is a nonsingular, linear operator and the vector b is the given right-hand side of the system of equations Ax = b. We define the Krylov space K(B, x) = span{x, Bx, B^2 x, ...} and the Krylov subspace K_s(B, x) = span{x, Bx, B^2 x, ..., B^{s-1} x} for any linear operator B and any vector x. Furthermore the matrices U_k and C_k satisfy the relations

  U_k = (u_1 u_2 ... u_k),   (14)
  C_k = A U_k,   (15)

with the property

  C_k^T C_k = I_k,   (16)

and u_i, c_i ∈ K(A, b). We also have the vectors x_k and r_k satisfying

  x_k : min_{x ∈ range(U_k)} ‖b - Ax‖_2,   (17)
  x_k = U_k C_k^T b,   (18)
  r_k = b - C_k C_k^T b,   (19)
  r_k ⟂ range(C_k).   (20)

Next, we define the following operators:

  P_{C_k} = C_k C_k^T   (21)

and

  A_{C_k} = (I - P_{C_k}) A.   (22)

Because range(C_k) ⊂ K(A, b), the space K(A, b) ∩ range(C_k)^⊥ is an invariant subspace of A_{C_k}. We use m (not fixed) steps of the GMRES algorithm to compute the optimal approximation to r_k in the space span{A_{C_k} r_k, A_{C_k}^2 r_k, ..., A_{C_k}^m r_k} = K_m(A_{C_k}, A_{C_k} r_k). This leads to the optimal approximation to the solution over the (global) space

  range(U_k) + A^{-1} K_m(A_{C_k}, A_{C_k} r_k),

as follows.

Theorem 1 Let A, U_k, C_k, r_k, x_k, A_{C_k} and P_{C_k} be as in definition 1. Let {r_k, A_{C_k} r_k, A_{C_k}^2 r_k, ..., A_{C_k}^m r_k} be independent and let {v_1, ..., v_{m+1}} be an orthonormal basis for K_{m+1}(A_{C_k}, r_k), with v_1 = r_k/‖r_k‖_2, generated by m steps of GMRES. This defines the relation A_{C_k} V_m = V_{m+1} H̄_m. Let y be defined by

  y : min_{ỹ ∈ R^m} ‖r_k - A_{C_k} V_m ỹ‖_2 = min_{ỹ ∈ R^m} ‖r_k - V_{m+1} H̄_m ỹ‖_2.   (23)

Then the minimal residual solution of the (inner) GMRES method, A^{-1} A_{C_k} V_m y, gives the outer approximation

  x_{k+1} = x_k + A^{-1} A_{C_k} V_m y,   (24)

which is also the solution to the global minimization problem

  x_opt : min_{x̃ ∈ range(U_k) + range(V_m)} ‖b - A x̃‖_2.   (25)

Proof: The solution x_opt to the global minimization problem satisfies

  x_opt : min_{x̃ ∈ range(U_k) + range(V_m)} ‖b - A x̃‖_2
  ⇔ x_opt = U_k z + V_m y : min_{z ∈ R^k, y ∈ R^m} ‖b - A U_k z - A V_m y‖_2.   (26)

Furthermore, we have by construction of GMRES A_{C_k} V_m = V_{m+1} H̄_m, so that A V_m = P_{C_k} A V_m + V_{m+1} H̄_m and the minimization problem in (26) is equivalent to

  min_{z ∈ R^k, y ∈ R^m} ‖b - C_k [z + C_k^T A V_m y] - V_{m+1} H̄_m y‖_2.   (27)

Because range(C_k) ⟂ range(V_{m+1}), (27) is equivalent to

  z + C_k^T A V_m y = C_k^T b ⇔ z = C_k^T (b - A V_m y)   (28)

and, by (19),

  y : min_{ỹ ∈ R^m} ‖(I - C_k C_k^T) b - V_{m+1} H̄_m ỹ‖_2 ⇔ y : min_{ỹ ∈ R^m} ‖r_k - V_{m+1} H̄_m ỹ‖_2.   (29)

This results in

  x_opt = U_k z + V_m y,   (30)

where y is given by (29), which is equivalent to (23), and z is given by (28). For x_{k+1} we have, using (18), (23) and (24),

  x_{k+1} = x_k + A^{-1}(I - C_k C_k^T) A V_m y        (by (23), (24))
          = U_k C_k^T b + V_m y - U_k C_k^T A V_m y    (by (18) and (13))
          = U_k C_k^T (b - A V_m y) + V_m y
          = U_k z + V_m y.   □   (31)

It also follows from this theorem that the GCR optimization (in the outer iteration) is given by (24), so that the residual computed in the inner GMRES iteration equals the residual of the outer GCR iteration:

  r_{k+1} = b - A x_{k+1} = b - A x_k - A_{C_k} V_m y = r_k - A_{C_k} V_m y = r_k - V_{m+1} H̄_m y.   (32)

So the outer method only needs to compute the new u_{k+1} and c_{k+1} by

  c_{k+1} = (A_{C_k} V_m y) / ‖A_{C_k} V_m y‖_2,   (33)
  u_{k+1} = (A^{-1} A_{C_k} V_m y) / ‖A_{C_k} V_m y‖_2,   (34)

where A_{C_k} V_m y and A^{-1} A_{C_k} V_m y = (I - U_k C_k^T A) V_m y have already been computed as the residual update and the solution in the inner iteration, respectively, see (23) and (24). The outer GCR method consequently becomes very simple. We will now consider the possibility of breakdown when generating a Krylov space with a singular operator. The following theorem may not be a surprising one, but it is given because it captures the essence of the following discussion.

Theorem 2 Let B be a linear operator, let {y, By, B^2 y, ..., B^m y} be independent and let K_{m+1}(B, y) be an invariant subspace of B, so that {y, By, B^2 y, ..., B^{m+1} y} is dependent. Then:

1. B is nonsingular over K_{m+1}(B, y), and therefore has an inverse over this space, if and only if {By, B^2 y, ..., B^{m+1} y} is independent.

2. Let B^{m+1} y be given by

  B^{m+1} y = Σ_{i=0}^{m} α_i B^i y,   (35)

where B^0 = I. Then B is nonsingular over span{B^j y, B^{j+1} y, ..., B^m y}, and therefore has an inverse over this space, if and only if in (35) α_j ≠ 0 and α_i = 0 for 0 ≤ i ≤ j-1.

3. Let B^{m+1} y be given by (35) with α_j ≠ 0 and α_i = 0 for 0 ≤ i ≤ j-1. Then B is singular over span{B^{j-1} y, B^j y, ..., B^m y}. More specifically,

  ∃ z ∈ span{B^j y, ..., B^m y} : Bz = B^j y,

and therefore (B^{j-1} y - z) ∈ null(B).

Proof:

1. Assume that B is nonsingular over K_{m+1}(B, y); then dim B K_{m+1}(B, y) = dim K_{m+1}(B, y) and hence {By, B^2 y, ..., B^{m+1} y} is independent. Conversely, assume that {By, B^2 y, ..., B^{m+1} y} is independent. Then for every x ∈ B K_{m+1}(B, y) we have x = Σ_{i=0}^{m} α_i B^{i+1} y, where the α_i are uniquely determined, and we can construct x̃ = Σ_{i=0}^{m} α_i B^i y, so that B x̃ = x. Because the α_i are unique and both {y, By, ..., B^m y} and {By, B^2 y, ..., B^{m+1} y} are independent, this defines a one-to-one correspondence and hence B must be nonsingular over K_{m+1}(B, y).

2. First assume that in (35) α_j ≠ 0 and α_i = 0 for 0 ≤ i ≤ j-1. Then B^{m+1} y ∉ span{B^{j+1} y, ..., B^m y}, since {B^j y, ..., B^m y} is independent. Let ȳ = B^j y; then {ȳ, Bȳ, ..., B^{m-j} ȳ} is independent, K_{m+1-j}(B, ȳ) is an invariant subspace of B and {Bȳ, B^2 ȳ, ..., B^{m+1-j} ȳ} is independent. So by part 1 of this theorem we can conclude that B is nonsingular over K_{m+1-j}(B, ȳ) = span{B^j y, B^{j+1} y, ..., B^m y} and therefore has an inverse over this space. Second, assume that span{B^j y, B^{j+1} y, ..., B^m y} is an invariant subspace of B and that B is nonsingular over this space. Then obviously B^{m+1} y ∈ span{B^j y, B^{j+1} y, ..., B^m y}, so that in (35) α_i = 0 for 0 ≤ i ≤ j-1. Again let ȳ = B^j y, so that {ȳ, Bȳ, ..., B^{m-j} ȳ} is independent and K_{m+1-j}(B, ȳ) is an invariant subspace of B. Then by part 1 of this theorem {Bȳ, B^2 ȳ, ..., B^{m+1-j} ȳ} = {B^{j+1} y, B^{j+2} y, ..., B^{m+1} y} is also independent. Therefore B^{m+1} y ∉ span{B^{j+1} y, B^{j+2} y, ..., B^m y}, so that α_j ≠ 0.

3. Since B is nonsingular over span{B^j y, B^{j+1} y, ..., B^m y}, B has an inverse over span{B^j y, B^{j+1} y, ..., B^m y}, so that part 3 of the theorem follows trivially. □

Now, although GMRES is still optimal in the sense that at each iteration it computes the minimal residual solution over the generated Krylov space, the generation of the Krylov space itself, from a singular operator, may break down. The following simple example shows that this may happen before the solution is found, even though both the solution and the right-hand side are in the range of the given (singular) operator. Define the matrix A by

  A = (e_2 e_3 e_4 0),

where e_i denotes the i-th Cartesian basis vector. Note that A = (I - e_1 e_1^T)(e_2 e_3 e_4 e_1), which is the same type of operator as A_{C_k}: an orthogonal projection times a nonsingular operator. Now consider the system of equations Ax = e_3. Then GMRES (or any other Krylov method) will search for a solution in the space span{e_3, A e_3, A^2 e_3, ...} = span{e_3, e_4}, so we have a breakdown of the Krylov space and the solution is not contained in it, even though the solution exists and is an element of the range of A: A e_2 = e_3, and the right-hand side is in the orthogonal complement of null(A). So the singular unsymmetric case is quite different from the symmetric one.
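The 4x4 example above is small enough to check directly. The short script below (ours, for illustration only) builds A = (e_2 e_3 e_4 0), generates the Krylov vectors for the right-hand side e_3 and confirms that the space stalls at span{e_3, e_4} although the solution e_2 exists.

    import numpy as np

    e = np.eye(4)
    A = np.column_stack([e[:, 1], e[:, 2], e[:, 3], np.zeros(4)])   # A = (e_2 e_3 e_4 0)
    b = e[:, 2]                                                     # right-hand side e_3

    krylov = [b, A @ b, A @ A @ b]           # e_3, e_4, 0: the Krylov space breaks down
    print([list(np.round(v, 0)) for v in krylov])
    print(np.allclose(A @ e[:, 1], b))       # True: the solution e_2 exists in range(A)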

We will now define breakdown of the Krylov space for the inner GMRES iteration.

Definition 2 Let A, U_k, C_k, A_{C_k} and r_k be as in definition 1. Let {r_k, A_{C_k} r_k, ..., A_{C_k}^{m-1} r_k} be independent and let {v_1, v_2, ..., v_m} be an orthonormal basis for K_m(A_{C_k}, r_k), with v_1 = r_k/‖r_k‖_2, generated by m-1 steps of GMRES. We say we have a breakdown of the Krylov subspace if A_{C_k} v_m ∈ range(V_m), since this implies we can no longer expand the Krylov subspace. We call it a lucky breakdown if v_1 ∈ range(A_{C_k} V_m), because we have then found the solution (the inverse of A is known over the space range(A_{C_k} V_m)). We call it a true breakdown if v_1 ∉ range(A_{C_k} V_m), because then the solution is not contained in the Krylov subspace.

The following theorem relates true breakdown to the invariance of the sequence of subspaces in the inner method for the operator A_{C_k}. Part 3 indicates that it is always known whether a breakdown is true or lucky.

Theorem 3 Let A, U_k, C_k, A_{C_k} and r_k be as in definition 1. Let {r_k, A_{C_k} r_k, ..., A_{C_k}^{m-1} r_k} be independent and let {v_1, v_2, ..., v_i} be an orthonormal basis for K_i(A_{C_k}, r_k), for i = 1, ..., m, with v_1 = r_k/‖r_k‖_2, generated by m-1 steps of GMRES. Then at step m:

1. A true breakdown occurs if and only if range(A_{C_k} V_{m-1}) is an invariant subspace of A_{C_k}.
2. A true breakdown occurs if and only if A_{C_k} v_m ∈ range(A_{C_k} V_{m-1}).
3. A breakdown occurs if and only if we can define H_m by A_{C_k} V_m = V_m H_m; furthermore, it is a true breakdown if and only if H_m is singular.

Proof:

1. Assume that a true breakdown occurs. By assumption {r_k, ..., A_{C_k}^{m-1} r_k} is independent, and breakdown implies that K_m(A_{C_k}, r_k) is an invariant subspace of A_{C_k}. So v_1 ∉ range(A_{C_k} V_m) implies that A_{C_k} is singular over K_m(A_{C_k}, r_k). So by theorem 2 {A_{C_k} r_k, ..., A_{C_k}^m r_k} is dependent and therefore A_{C_k}^m r_k ∈ span{A_{C_k} r_k, ..., A_{C_k}^{m-1} r_k} = range(A_{C_k} V_{m-1}), and range(A_{C_k} V_{m-1}) is an invariant subspace of A_{C_k}. Conversely, assume that range(A_{C_k} V_{m-1}) = span{A_{C_k} r_k, ..., A_{C_k}^{m-1} r_k} is an invariant subspace of A_{C_k}. Obviously then span{r_k, ..., A_{C_k}^{m-1} r_k} is an invariant subspace of A_{C_k}, so that a breakdown will occur.

Furthermore, range(A_{C_k} V_m) = span{A_{C_k} r_k, ..., A_{C_k}^m r_k} and, since A_{C_k}^m r_k ∈ span{A_{C_k} r_k, ..., A_{C_k}^{m-1} r_k} by assumption, we have range(A_{C_k} V_m) = range(A_{C_k} V_{m-1}). Because {r_k, ..., A_{C_k}^{m-1} r_k} is independent by assumption, r_k ∉ range(A_{C_k} V_m), so that a true breakdown occurs.

2. Assume a true breakdown occurs. Then by part 1 of this proof range(A_{C_k} V_{m-1}) is an invariant subspace and hence range(A_{C_k} V_m) = range(A_{C_k} V_{m-1}), so that A_{C_k} v_m ∈ range(A_{C_k} V_{m-1}). Assume, on the other hand, that A_{C_k} v_m ∈ range(A_{C_k} V_{m-1}). Then A_{C_k} v_m ∈ span{A_{C_k} r_k, ..., A_{C_k}^{m-1} r_k} ⊂ range(V_m), so that a breakdown occurs. Furthermore, A_{C_k} v_m ∈ range(A_{C_k} V_{m-1}) implies range(A_{C_k} V_m) = range(A_{C_k} V_{m-1}). Because {r_k, ..., A_{C_k}^{m-1} r_k} is independent by assumption, r_k ∉ range(A_{C_k} V_{m-1}) = range(A_{C_k} V_m); consequently a true breakdown occurs.

3. Assume first that a breakdown occurs. Then by definition 2, A_{C_k} v_m ∈ range(V_m); GMRES defines the relation A_{C_k} V_{m-1} = V_m H̄_{m-1}, and therefore we can define an H_m ∈ R^{m×m} such that A_{C_k} V_m = V_m H_m. Second, assume that an H_m exists which satisfies A_{C_k} V_m = V_m H_m. Then A_{C_k} v_m ∈ range(V_m), so that a breakdown occurs. Now assume that a true breakdown occurs. Let H_m be nonsingular; then ∃x : H_m x = e_1, where e_1 denotes the first Cartesian basis vector, so that v_1 = V_m H_m x = A_{C_k} V_m x ⇒ v_1 ∈ range(A_{C_k} V_m). This contradicts our assumptions, which implies that H_m must be singular. Finally, assume that H_m is singular. Then dim V_m H_m = dim A_{C_k} V_m < m, which implies that {A_{C_k} r_k, ..., A_{C_k}^m r_k} is dependent. This, in turn, leads to range(A_{C_k} V_m) = span{A_{C_k} r_k, ..., A_{C_k}^{m-1} r_k} = range(A_{C_k} V_{m-1}), so that a true breakdown occurs (see above or theorem 2). □

From the previous theorem and its proof one can already conclude that a true breakdown occurs if and only if A_{C_k} is singular over K_m(A_{C_k}, r_k). From definition 1 we know null(A_{C_k}) = range(U_k). We will make this more explicit in the following theorem, which relates true breakdown to the intersection of the inner search space and the outer search space.

Theorem 4 Let A, U_k, C_k, A_{C_k} and r_k be as in definition 1. Let {r_k, A_{C_k} r_k, ..., A_{C_k}^{m-1} r_k} be independent and let {v_1, v_2, ..., v_i} be an orthonormal basis for K_i(A_{C_k}, r_k), for i = 1, ..., m, with v_1 = r_k/‖r_k‖_2, generated by m-1 steps of GMRES. A true breakdown occurs at step m if and only if

  ∃ u ≠ 0, u ∈ range(V_m) : u ∈ range(U_k).

Proof: Let u ≠ 0, u ∈ range(U_k) and u ∈ range(V_m). By definition 1, A_{C_k} u = 0; consequently dim(range(A_{C_k} V_m)) < m, so that {A_{C_k} r_k, ..., A_{C_k}^m r_k} is dependent. This implies (see the proof of theorem 3, or theorem 2) that range(A_{C_k} V_{m-1}) is an invariant subspace of A_{C_k}, so that a true breakdown occurs. Conversely, assume that a true breakdown occurs. Then by theorem 3 range(A_{C_k} V_{m-1}) is an invariant subspace of A_{C_k}, which implies that {A_{C_k} r_k, ..., A_{C_k}^m r_k} is dependent. Therefore

  ∃ u ≠ 0, u ∈ span{r_k, ..., A_{C_k}^{m-1} r_k} = range(V_m) : u ∈ null(A_{C_k}) ⇒ u ∈ range(U_k). □

The following theorem indicates that breakdown cannot occur in the inner GMRES method before the total number of iterations exceeds the dimension of the Krylov space K(A, b). This means that, in practice, a breakdown will be rare.

Theorem 5 Let A, U_k, C_k, A_{C_k} and r_k be as in definition 1. Let m = dim(K(A, b)), that is, m = max{s : {b, Ab, ..., A^{s-1} b} is independent}. Define P_s(A) b = Σ_{i=0}^{s} α_i A^i b with α_s ≠ 0, let u_i = P_{l_i - 1}(A) b for i = 1, ..., k, and l = max_{i=1,...,k} l_i < m; then r_k = (I - P_{C_k}) b = P_l(A) b. We call l the total number of iterations. Then, if j + l < m (j < m - l),

  {r_k, A_{C_k} r_k, ..., A_{C_k}^j r_k} is independent,

and therefore no breakdown occurs in the first j steps of GMRES.

Proof: By definition c_i = A u_i ⇒ c_i ∈ span{Ab, ..., A^l b}. Further, r_k = P_l(A) b ⇒ A_{C_k} r_k = (I - P_{C_k}) A P_l(A) b = P_{l+1}(A) b, and analogously we find A_{C_k}^i r_k = P_{l+i}(A) b for i = 1, ..., j. So we have that {r_k, A_{C_k} r_k, ..., A_{C_k}^j r_k} = {P_l(A) b, P_{l+1}(A) b, ..., P_{l+j}(A) b} is independent by the definition of m. Therefore no breakdown can occur. □

We will now show how a true breakdown can be overcome. There are basically two ways to continue:

- in the inner iteration, by finding a suitable vector to expand the Krylov space;
- in the outer iteration, by computing the solution of the inner iteration just before the breakdown and continuing with one LSQR-step (see below) in the outer iteration.

Continuation in the inner iteration. The following theorem indicates how we can find a suitable vector to expand the Krylov subspace.

Theorem 6 Let A, U_k, C_k, A_{C_k} and r_k be as in definition 1, let {r_k, A_{C_k} r_k, ..., A_{C_k}^{m-1} r_k} be independent and let {v_1, v_2, ..., v_i} be an orthonormal basis for K_i(A_{C_k}, r_k), for i = 1, ..., m, with v_1 = r_k/‖r_k‖_2, generated by m-1 steps of GMRES. If a true breakdown occurs, then

  ∃ c ∈ range(C_k) : A_{C_k} c ∉ range(A_{C_k} V_{m-1}),

which implies

  ∃ c_i ∈ {c_1, ..., c_k} : A_{C_k} c_i ∉ range(A_{C_k} V_{m-1}).

Proof: By theorem 3 we have A_{C_k} v_m ∈ range(A_{C_k} V_{m-1}). Now assume that we stop after the last regular GMRES step, (m-1), and compute r_{k+1} using (32) and c_{k+1} using (33). Then from r_{k+1} = r_k - A_{C_k} V_{m-1} y = ‖r_k‖_2 v_1 - V_m H̄_{m-1} y we have r_{k+1} ∈ range(V_m) and therefore A_{C_k} r_{k+1} ∈ range(A_{C_k} V_{m-1}). Because A is a nonsingular operator, r_{k+1} ∈ span{A r_{k+1}, ..., A^p r_{k+1}} for some p, so that r_{k+1} = Σ_{j=1}^{p} γ_j A^j r_{k+1} ⇒ r_{k+1} = (I - P_{C_k}) r_{k+1} = Σ_{j=1}^{p} γ_j (I - P_{C_k}) A^j r_{k+1}, and since r_{k+1} ≠ 0, r_{k+1} ⟂ range(C_k) and r_{k+1} ⟂ range(A_{C_k} V_{m-1}):

  ∃ j : (I - P_{C_k}) A^j r_{k+1} ≠ 0 and (I - P_{C_k}) A^j r_{k+1} ∉ range(A_{C_k} V_{m-1}).

Since A_{C_k} r_{k+1} ∈ range(A_{C_k} V_{m-1}), we can now define s by

  s : (I - P_{C_k}) A^s r_{k+1} ∉ range(A_{C_k} V_{m-1}) and (I - P_{C_k}) A^j r_{k+1} ∈ range(A_{C_k} V_{m-1}) for 1 ≤ j < s,

and then

  (I - P_{C_k}) A^s r_{k+1} = (I - P_{C_k}) A (I - P_{C_k}) A^{s-1} r_{k+1} + (I - P_{C_k}) A P_{C_k} A^{s-1} r_{k+1},

where by assumption (I - P_{C_k}) A^{s-1} r_{k+1} ∈ range(A_{C_k} V_{m-1}) ⇒ (I - P_{C_k}) A (I - P_{C_k}) A^{s-1} r_{k+1} ∈ range(A_{C_k} V_{m-1}). This implies that (I - P_{C_k}) A P_{C_k} A^{s-1} r_{k+1} ∉ range(A_{C_k} V_{m-1}), and since P_{C_k} A^{s-1} r_{k+1} ∈ range(C_k) we have

  ∃ c ∈ range(C_k) : A_{C_k} c ∉ range(A_{C_k} V_{m-1}),

which in turn implies

  ∃ c_i ∈ {c_1, ..., c_k} : A_{C_k} c_i ∉ range(A_{C_k} V_{m-1}). □

Not just any c ∈ range(C_k) will do, as is shown by the following example. Let u_1 = b; then A_{C_k} r_k = A_{C_k}(b - C_k C_k^T b) = -A_{C_k} C_k C_k^T b. Therefore we simply must try the c_i until one of them works.

Continuation in the outer iteration. Another way to continue after a true breakdown in the inner GMRES iteration is to compute the inner iteration solution just before the breakdown and to apply an LSQR-switch (see below) in the outer GCR iteration. The following theorem states the reason why the LSQR-switch must be applied.

Theorem 7 Let A, U_k, C_k, A_{C_k} and r_k be as in definition 1, let {r_k, A_{C_k} r_k, ..., A_{C_k}^{m-1} r_k} be independent and let {v_1, v_2, ..., v_i} be an orthonormal basis for K_i(A_{C_k}, r_k), for i = 1, ..., m, with v_1 = r_k/‖r_k‖_2, generated by m-1 steps of GMRES. Suppose a true breakdown occurs at step m and we compute the solution at the previous GMRES step, switch to the outer method and make one GCR step (24), (32), (33) and (34), so we compute r_{k+1} = r_k - V_m H̄_{m-1} y and c_{k+1} = A_{C_k} V_{m-1} y / ‖A_{C_k} V_{m-1} y‖_2. Then we will have stagnation in the next inner GMRES iteration, that is,

  r_{k+1} ⟂ span{A_{C_{k+1}} r_{k+1}, A_{C_{k+1}}^2 r_{k+1}, ...}.

Proof: We have c_{k+1} ∈ range(A_{C_k} V_{m-1}); furthermore r_{k+1} ∈ range(V_m) and A_{C_k} r_{k+1} ∈ range(A_{C_k} V_{m-1}) by theorem 3. A_{C_{k+1}} = (I - c_{k+1} c_{k+1}^T) A_{C_k}, and from this and theorem 3 it follows that range(A_{C_k} V_{m-1}) is also an invariant subspace of A_{C_{k+1}} and A_{C_{k+1}} r_{k+1} ∈ range(A_{C_k} V_{m-1}). □

The reason for the stagnation is that the new residual r_{k+1} remains in the same Krylov space K(A_{C_k}, r_k), which contains a u ≠ 0 in range(U_k). So we have to 'leave' this Krylov space. We can do this using the so-called LSQR-switch, which was introduced in [18] to remedy stagnation in the inner method. Just as in the GMRESR method, stagnation in the inner method would result in a breakdown in the outer GCR method, because the residual cannot be updated. The following theorem indicates that the LSQR-switch always works.

Theorem 8 If stagnation occurs in the inner GMRES method, that is,

  min_{ỹ ∈ R^{m-1}} ‖r_{k+1} - A_{C_{k+1}} V_{m-1} ỹ‖_2 = ‖r_{k+1}‖_2,

then we can continue by setting (LSQR-switch)

  c_{k+2} = β (I - C_{k+1} C_{k+1}^T) A A^T r_{k+1}  and  u_{k+2} = β A^{-1} (I - C_{k+1} C_{k+1}^T) A A^T r_{k+1},

where β is a normalization constant. This leads to

  r_{k+2} = r_{k+1} - (r_{k+1}^T c_{k+2}) c_{k+2}  and  x_{k+2} = x_{k+1} + (r_{k+1}^T c_{k+2}) u_{k+2},

which always gives an improved approximation. Therefore these vectors can also be used as the start vectors for a new Krylov subspace in the inner GMRES method instead, if desired.

Proof: (following the proof in [18]) Because r_{k+1} ⟂ range(C_{k+1}),

  c_{k+2}^T r_{k+1} = β r_{k+1}^T A A^T r_{k+1} = β ‖A^T r_{k+1}‖_2^2 ≠ 0.

So this step will always give an improvement to the residual. This also proves that (I - C_{k+1} C_{k+1}^T) A A^T r_{k+1} ∉ range(A_{C_k} V_{m-1}), because r_{k+1} ⟂ range(A_{C_k} V_{m-1}), so that we may also use this vector to build a Krylov space in the inner iteration after a true breakdown in the previous inner iteration. □

3 Implementation

A straightforward implementation of the GCR method and an orthogonalizing inner method will obviously result in a large number of vector updates, inner products and vector scalings. We will therefore describe an implementation which strongly reduces this work. This implementation is mainly due to [10]. First we will consider the outer GCR method, next the inner GMRES method and finally the implementation of BICGSTAB as the inner algorithm. Instead of the matrices U_k and C_k (see definition 1) we will use in the actual implementation the matrices Û_k, C̄_k, N_k, Z_k and the vector d_k, which are defined below.

Definition 3 Let A, U_k, C_k, b and r_k be as in definition 1; then Û_k, C̄_k, N_k, Z_k and d_k are defined as follows:

  C_k = C̄_k N_k, where   (36)
  N_k = diag(‖c̄_1‖_2^{-1}, ‖c̄_2‖_2^{-1}, ..., ‖c̄_k‖_2^{-1}),   (37)
  A Û_k = C̄_k Z_k,   (38)

where Z_k is assumed to be upper triangular. Finally, d_k is defined by the relation

  r_k = b - C̄_k d_k.   (39)

From this, the approximate solution x_k corresponding to r_k is implicitly represented as

  x_k = Û_k Z_k^{-1} d_k.   (40)

Using this relation, x_k can be computed at the end of the complete iteration or before truncation (see the end of this section). The implicit representation of U_k saves all the intermediate updates of previous u_i to a new u_{k+1}, which saves about 50% of the computational costs in the outer GCR iteration.

Initialization and restart. If the method is started with an initial guess x_0, we can compute r_0 = b - A x_0 and then continue analogously as before, changing the updates to r_k = r_0 - C̄_k d_k and x_k = x_0 + Û_k Z_k^{-1} d_k. Also after a restart (see Convergence check at the end at the end of this section) we compute a new result by adding a correction to the previously computed x_k.

GMRES as inner iteration. Assume we have made k outer iterations, so that Û_k, C̄_k and r_k are given. Then, in the inner GMRES iteration, the orthogonal matrix V_{m+1} is constructed such that C̄_k^T V_{m+1} = O and

  A V_m = C̄_k B_m + V_{m+1} H̄_m,   (41)
  B_m = N_k^2 C̄_k^T A V_m.   (42)

This algorithm is equivalent to the usual GMRES algorithm, except that the vectors A v_i are first orthogonalized against C̄_k. From (41) and (42) it is obvious that A V_m - C̄_k B_m = A_{C_k} V_m = V_{m+1} H̄_m, cf. theorem 1. Next we compute y according to (23) and we set (cf. (33), without normalization)

  c̄_{k+1} = V_{m+1} H̄_m y,   (43)
  û_{k+1} = V_m y,   (44)

and this leads to A û_{k+1} = A V_m y = C̄_k B_m y + V_{m+1} H̄_m y = C̄_k B_m y + c̄_{k+1}, so that if we set z_{k+1} = B_m y (with Z_{k+1,k+1} = 1) the relation A Û_{k+1} = C̄_{k+1} Z_{k+1} is again satisfied. It follows from theorem 1 that the new residual of the outer iteration equals the final residual of the inner iteration, r_{k+1} = r_m^inner, and is given by

  r_{k+1} = r_k - c̄_{k+1},   (45)

so that d_{k+1} = 1. Obviously the residual norm only needs to be computed once. From (45), if we replace the new residual of the outer iteration r_{k+1} by the residual of the inner iteration, we see an important relation that holds more generally:

  c̄_{k+1} = r_k - r_m^inner.   (46)

This relation is important, since in general (when other Krylov methods are used for the inner iteration) c_{k+1} or c̄_{k+1} cannot be computed from u_{k+1}, because u_{k+1} is not computed explicitly, nor does a relation like (43) exist. However, the (inner) residual is always known, because it is needed for the inner iteration itself; see also the part on BICGSTAB. Therefore, in our current implementation (45) is replaced by

  r_{k+1} = r_m^inner = r_k - V_{m+1} H̄_m y   (47)

and (43) by (46).
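Relation (41) is easy to check numerically for the simplified case N_k = I (so that B_m = C̄_k^T A V_m): each Arnoldi vector is first orthogonalized against the columns of C̄_k and then against the previous v_i. The script below uses random stand-ins for A, C̄_k and r_k; these are our own choices, for illustration only.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k, m = 30, 4, 6
    A = rng.standard_normal((n, n)) + 5 * np.eye(n)
    C, _ = np.linalg.qr(rng.standard_normal((n, k)))    # stand-in for C_k (orthonormal)
    r = rng.standard_normal(n)
    r = r - C @ (C.T @ r)                               # r_k orthogonal to range(C_k)

    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    B = np.zeros((k, m))
    V[:, 0] = r / np.linalg.norm(r)
    for j in range(m):
        w = A @ V[:, j]
        B[:, j] = C.T @ w                               # coefficients, cf. (42)
        w = w - C @ B[:, j]                             # orthogonalize against C_k
        for i in range(j + 1):                          # usual Arnoldi step
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]

    print(np.allclose(A @ V[:, :m], C @ B + V @ H))     # relation (41) holds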

GMRES with A_{C_k} (m steps), after k outer iterations:
  1. Select tol; v_1 = r_k / ‖r_k‖_2
  2. for j = 1, ..., m do
       v_{j+1} = A v_j
       for i = 1, ..., k do
          b_{i,j} = c_i^T v_{j+1};  v_{j+1} = v_{j+1} - b_{i,j} c_i
       for i = 1, ..., j do
          h_{i,j} = v_i^T v_{j+1};  v_{j+1} = v_{j+1} - h_{i,j} v_i
       h_{j+1,j} = ‖v_{j+1}‖_2;  v_{j+1} = h_{j+1,j}^{-1} v_{j+1}
     Solve min_{y_m} ‖ ‖r_k‖_2 e_1 - H̄_m y_m ‖_2
     û_{k+1} = V_m y_m
     r^inner = r_k - V_{m+1} H̄_m y_m
     Z_{1:k,k+1} = B_m y_m

Figure 3: the inner GMRES algorithm with A_{C_k}.

GCRO, generic version:
  1. Select x_0, tol; r_0 = b - A x_0, k = 0
  2. while ‖r_k‖_2 > tol do
       call inner(→ û_{k+1}, r^inner, Z_{1:k,k+1})
       c̄_{k+1} = r_k - r^inner
       (N_{k+1})_{k+1,k+1} = ‖c̄_{k+1}‖_2^{-1}
       Z_{k+1,k+1} = 1
       d_{k+1} = (N_{k+1})_{k+1,k+1}^2 (c̄_{k+1}^T r_k)
       r_{k+1} = r_k - d_{k+1} c̄_{k+1}
       k = k + 1
     x_k = x_0 + Û_k Z_k^{-1} d_k

GCRO, with inner GMRES:
  1. Select x_0, tol; r_0 = b - A x_0, k = 0
  2. while ‖r_k‖_2 > tol do
       call gmres(→ û_{k+1}, r^inner, ‖r^inner‖_2, Z_{1:k,k+1})
       c̄_{k+1} = r_k - r^inner
       (N_{k+1})_{k+1,k+1} = (‖r_k‖_2^2 - ‖r^inner‖_2^2)^{-1/2}
       Z_{k+1,k+1} = 1
       d_{k+1} = 1
       r_{k+1} = r^inner
       k = k + 1
     x_k = x_0 + Û_k Z_k^{-1} d_k

Figure 4: GCRO, a generic version and a special version for inner GMRES with A_{C_k}.

Finally, we need to compute the new coefficient of N_{k+1}, ‖c̄_{k+1}‖_2^{-1}, in order to satisfy the relations in definition 3. The outer iteration then consists only of (45) or (47), d_{k+1} = 1, the computation of ‖c̄_{k+1}‖_2^{-1} and (40) at the end.

Algorithms for nested GMRES with inner orthogonalization. In figures 3 and 4 we give an outline of the inner GMRES algorithm for A_{C_k} and two versions of the outer GCR algorithm, which we call GCRO (because of the implicit inner orthogonalization). One is a generic version with an arbitrary inner method, with the only requirement that on output the relation A û_{k+1} = r_k - r^inner + C̄_k Z_{1:k,k+1} holds. The other version explicitly uses the relations which hold after an inner iteration of (m steps of) GMRES to make a few more optimizations. However, for some problems (with GMRES for A_{C_k} as the inner method) the generic version turns out to be slightly more stable and is then to be preferred. For the experiments discussed later we used the generic version, even with an inner GMRES for A_{C_k}.
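The following sketch puts the pieces together in a deliberately naive way: an outer GCR loop that stores u_i and c_i explicitly, with m inner GMRES steps on the projected operator A_{C_k} and the solution correction recovered through (13)/(34). It omits the Û_k, Z_k, N_k bookkeeping of definition 3, truncation, restarts and breakdown handling, so it is only meant to show the structure of GCRO(m); all names are ours.

    import numpy as np

    def gmres_minres_coeffs(apply_op, r, m):
        # m Arnoldi steps with apply_op, started from r (no breakdown handling);
        # returns V_{m+1}, Hbar_m and y minimizing ||r - V_{m+1} Hbar_m y||_2.
        n = r.shape[0]
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        beta = np.linalg.norm(r)
        V[:, 0] = r / beta
        for j in range(m):
            w = apply_op(V[:, j])
            for i in range(j + 1):
                H[i, j] = V[:, i] @ w
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(m + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H, e1, rcond=None)
        return V, H, y

    def gcro(A, b, x0, m=5, tol=1e-8, max_outer=50):
        x = x0.copy()
        r = b - A @ x
        n = b.size
        C = np.zeros((n, 0))                       # orthonormal c_i, with C = A U
        U = np.zeros((n, 0))
        for _ in range(max_outer):
            if np.linalg.norm(r) <= tol:
                break
            def A_Ck(v, C=C):                      # A_{C_k} = (I - C C^T) A
                w = A @ v
                return w - C @ (C.T @ w)
            V, H, y = gmres_minres_coeffs(A_Ck, r, m)
            c = V @ (H @ y)                        # c_{k+1} before normalization, cf. (33)
            Vy = V[:, :m] @ y
            u = Vy - U @ (C.T @ (A @ Vy))          # solution correction via (13)/(34)
            nc = np.linalg.norm(c)
            c = c / nc
            u = u / nc
            beta = c @ r
            x = x + beta * u                       # outer GCR update
            r = r - beta * c
            C = np.column_stack([C, c])
            U = np.column_stack([U, u])
        return x, r

In exact arithmetic the residual produced by the outer update equals the final residual of the inner iteration, as stated in theorem 1 and (32).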

BICGSTAB as inner iteration. For the sake of simplicity we will only consider non-preconditioned BICGSTAB here (or explicitly preconditioned). The algorithm as given in [17] is listed in figure 5. The algorithm contains two matrix-vector multiplications with A, which must be replaced by A_{C_k}.

BICGSTAB:
  1. Select x_0, tol; r_0 = b - A x_0, r̂_0 = r_0, ρ_0 = α = ω_0 = 1, v_0 = p_0 = 0, i = 0
  2. while ‖r_i‖_2 > tol do
       i = i + 1
       ρ_i = (r̂_0, r_{i-1});  β = (ρ_i / ρ_{i-1})(α / ω_{i-1})
       p_i = r_{i-1} + β (p_{i-1} - ω_{i-1} v_{i-1})
       v_i = A p_i
       α = ρ_i / (r̂_0, v_i)
       s = r_{i-1} - α v_i
       t = A s
       ω_i = (t, s)/(t, t)
       x_i = x_{i-1} + α p_i + ω_i s
       r_i = s - ω_i t

Figure 5: the BICGSTAB algorithm.

BICGSTAB in GCR with orthogonalization (after k outer iterations):
  1. Select x_0, maxit, tol, rtol; r_0 = b - A x_0, r̂_0 = r_0, ρ_0 = α = ω_0 = 1, v_0 = p_0 = 0, i = 0
  2. while ‖r_i‖_2 > max(rtol ‖r_0‖_2, tol) and i < maxit do
       i = i + 1
       ρ_i = (r̂_0, r_{i-1});  β = (ρ_i / ρ_{i-1})(α / ω_{i-1})
       p_i = r_{i-1} + β (p_{i-1} - ω_{i-1} v_{i-1})
       v_i = A p_i
       y_1 = N_k^2 C̄_k^T v_i;  v_i = v_i - C̄_k y_1
       α = ρ_i / (r̂_0, v_i)
       s = r_{i-1} - α v_i
       t = A s
       y_2 = N_k^2 C̄_k^T t;  t = t - C̄_k y_2
       ω_i = (t, s)/(t, t)
       x_i = x_{i-1} + α p_i + ω_i s
       r_i = s - ω_i t
       z_{k+1} = z_{k+1} + α y_1 + ω_i y_2

Figure 6: the BICGSTAB algorithm in GCR with orthogonalization.

The coefficients of the orthogonalizations must be saved to satisfy relation (38) at the end of the iteration. This avoids the explicit correction of x_i, which will become û_{k+1} in the outer iteration. The implementation is as follows. On initialization we set z_{k+1} = 0, x_0 = 0 and hence r_0 = r_k^outer. After the statement v_i = A p_i we add

  y_1 = N_k^2 C̄_k^T v_i;  v_i = v_i - C̄_k y_1,

and after t = A s:

  y_2 = N_k^2 C̄_k^T t;  t = t - C̄_k y_2.

Further, at the end of each iteration, we set z_{k+1} = z_{k+1} + α y_1 + ω_i y_2. From this it is easily verified that the relation A x_i = r_0 - r_i + C̄_k z_{k+1} is satisfied at the end of each iteration. If some stopping criterion is satisfied after step i we set

  û_{k+1} = x_i,   (48)
  c̄_{k+1} = r_0 - r_i = r_k^outer - r_i^inner, cf. (46),   (49)
  Z_{k+1,k+1} = 1,   (50)

and the (outer iteration) relation (38) is again satisfied. In this case we have to compute the residual update explicitly.
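The only change to the inner BICGSTAB loop is thus the projection after each multiplication by A, together with the bookkeeping of its coefficients. A small helper in the spirit of figure 6 could look as follows; this is our own sketch, where Cbar holds the unnormalized c̄_i and nsq the diagonal of N_k^2, i.e. the values ‖c̄_i‖_2^{-2}.

    import numpy as np

    def project_out(Cbar, nsq, w):
        # Orthogonalize w against range(C_k) = range(Cbar N_k), cf. figure 6.
        # The coefficient vector y must be accumulated into z_{k+1} so that
        # A x_i = r_0 - r_i + Cbar z_{k+1} keeps holding (relation (38)).
        y = nsq * (Cbar.T @ w)
        return w - Cbar @ y, y

    # Inside the BICGSTAB loop of figure 6 (alpha and omega as in figure 5):
    #     v = A @ p
    #     v, y1 = project_out(Cbar, nsq, v)
    #     ...
    #     t = A @ s
    #     t, y2 = project_out(Cbar, nsq, t)
    #     ...
    #     z = z + alpha * y1 + omega * y2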

By construction the columns of C̄_{k+1} are orthogonal, so we only have to compute ‖c̄_{k+1}‖_2^{-1} and set

  N_{k+1} = diag(‖c̄_1‖_2^{-1}, ..., ‖c̄_{k+1}‖_2^{-1}),   (51)
  d_{k+1} = ‖c̄_{k+1}‖_2^{-2} c̄_{k+1}^T r_k,   (52)
  r_{k+1} = r_k - d_{k+1} c̄_{k+1}.   (53)

After this the relations of definition 3 are again satisfied. Experience with BICGSTAB as inner iteration indicates that limiting the number of inner iterations and using a large relative tolerance (with respect to the outer residual norm) gives the best interplay between the outer and inner iteration. An often employed strategy is to allow a maximum of 25 inner iterations and a relative tolerance of 10^{-2}. The inner algorithm is given in figure 6.

Truncation in general. When the number of outer iterations becomes large, the multiplication by A_{C_k} will be expensive and we may run out of memory. Therefore we consider the truncation of the outer iteration. We will give only a general description here; for a more detailed discussion see [4] or [5] and [10]. We choose a matrix W_l ∈ R^{k×l}, where l < k, such that

  W_l^T W_l = I_l.   (54)

There are now two ways to implement the truncation:

1. First we compute the current approximation according to (40): x_k = Û_k Z_k^{-1} d_k. Then we compute the new matrices U_l and C_l as follows:

  C_l = C_k W_l = C̄_k N_k W_l = C̄_k W̄_l, where W̄_l = N_k W_l,   (55)
  U_l = U_k W_l = Û_k Z_k^{-1} N_k W_l = Û_k Z_k^{-1} W̄_l.   (56)

In order to satisfy the relations of definition 3 we set

  x_0 = x_k, r_0 = r_k, C̄_l = C_l, Û_l = U_l, Z_l = I_l, d_l = 0, N_l = diag(1, ..., 1),   (57)

and we replace relation (39) by r_l = r_0 - C̄_l d_l; after this we have to add corrections to x_0, so that (40) must be replaced by x_l = x_0 + Û_l Z_l^{-1} d_l.

2. The second method does not require the computation of x_k and is therefore more in line with the overall algorithm. Instead, we use w_1 to compute an implicit representation of x_k:

  w_1 = η^{-1} N_k^{-1} d_k, where η = ‖N_k^{-1} d_k‖_2.   (58)

The other vectors of W_l can then be chosen freely, as long as they satisfy (54). Furthermore, we compute C̄_l, Û_l, N_l, Z_l and r_l according to (55)-(57). Finally, we set d_l = η e_1, so that the relation r_l = b - C̄_l d_l is satisfied:

  r_l = b - C̄_l d_l = b - η C̄_k N_k w_1 = b - C̄_k N_k N_k^{-1} d_k = b - C̄_k d_k = r_k.

Then x_l is implicitly represented by x_l = Û_l Z_l^{-1} d_l (= U_l d_l).

Parallel implementation. Both the GMRESR method and the method just described (GCRO) will run efficiently on (large) distributed memory computers if they use a variant of GMRES that reduces the cost of global communication in the inner products, as described in [3] and [2]. Furthermore, GCRO with inner GMRES has the advantage that it converges in almost the same number of iterations as (full) GMRES (see Numerical Experiments), while it requires far fewer inner products, which involve expensive global communication. GCRO with inner GMRES also uses much less memory than GMRES for an equal number of iterations (matrix-vector products). Because the cost of global communication is often the bottleneck for fast parallel implementations of GMRES on large distributed memory computers, and because on these computers memory restrictions can be a severe constraint, GCRO with inner GMRES will perform much better than GMRES for many problems on large distributed memory computers.

Convergence check at the end. Since the residual is never explicitly computed from the approximate solution, it is advisable to check the true residual at the end, when the solution is finally computed. If the norm of the true residual turns out to be larger than the prescribed tolerance, we first reorthogonalize the true residual on C_k and compute the corresponding approximate solution. Generally this is sufficient to get the norm of the true residual below the prescribed tolerance. If it is not, then the process can simply be restarted, while keeping the old C_k and Û_k in order to preserve optimality in these directions. It turns out that (in our experience) only a few inner iterations then suffice.

4 Numerical Experiments

We will discuss the results of three numerical experiments, which concern the solution of two-dimensional convection-diffusion type problems on regular grids, discretized by a finite volume technique, resulting in a pentadiagonal matrix. The system is preconditioned with ILU applied to the scaled system, see [6], [12]. These problems are solved by the following iterative methods:

- (full) GMRES
- BICGSTAB
- GMRESR(m): m indicates the number of inner GMRES iterations for each outer iteration.
- GCRO(m), which is GCR with m GMRES iterations for A_{C_k} as inner method.
- GMRESRSTAB: GMRESR with BICGSTAB as inner method.
- GCROSTAB: GCR with BICGSTAB for A_{C_k} as inner method.

We will compare the convergence of these methods both with respect to time (on one processor of the Convex C3840) and with respect to the number of matrix-vector multiplications. This makes sense since the main trade-off between (full) GMRES, the GCRO variants and the GMRESR variants is fewer iterations against less work per iteration. Which one converges faster in time then depends highly on the relative cost of the matrix-vector multiplication and preconditioning.

Problem 1. The first problem solves the equation

  -(u_xx + u_yy) + b (u_x + u_y) = f

on the unit square, where b is given by

  b(x, y) = 1 for (x, y) ∈ [0.5, 0.6] × [0.5, 0.6], and 50 elsewhere,

and homogeneous Dirichlet boundary conditions. The right-hand side is chosen such that the solution is given by u(x, y) = sin(πx) sin(πy). The step size in both the x- and y-direction is 1/...

Figure 7: convergence vs. number of iterations (#matvec) for problem 1; log(‖r‖) for (full) GMRES, GCRO(m) and GMRESR(m).

The convergence history for problem 1 is given in figures 7 and 8 for GMRES, GCRO(m) and GMRESR(m), for m = 5 and m = 10. Figure 7 shows that GMRES converges fastest (in iterations), which is of course to be expected, followed by GCRO(5), GMRESR(5), GCRO(10) and GMRESR(10). From figure 7 we can see that GCRO(m) converges much more smoothly and faster than GMRESR(m) and follows the convergence of GMRES relatively well. The vertical 'steps' of GMRESR(m) are caused by the optimization in the outer iteration, which does not involve a matrix-vector multiplication. This figure also shows another important difference: the GMRESR(m) variants tend to lose their superlinear convergence behaviour, at least during certain stages of the convergence history. This seems to be caused by stagnation or slow convergence in the inner iteration, which (of course) essentially behaves like a restarted GMRES. Figure 8 gives the convergence with respect to time. GCRO(5) is the fastest, which is not surprising in view of the fact that it converges almost as fast as GMRES, but at a much lower cost per iteration. Also, we see that GCRO(10), while slower than GMRESR(5), is still faster than GMRESR(10). This shows that for this example the improvement in convergence outweighs the extra work in orthogonalizations. Another feature which can be seen from this last figure is that (at least in time) relatively short inner iterations are best for GCRO(m) and GMRESR(m). We will also see this in the other examples. Apparently the interplay between outer and inner method and a sufficiently large number of outer iterations is important.

Problem 2. The second problem is defined by the discretization of

  -(u_xx + u_yy) + b u_x + c u_y = 0

on [0, 1] × [0, 4], where

  b(x, y) = 100 for 0 ≤ y ≤ 1,  -100 for 1 < y ≤ 2,  100 for 2 < y ≤ 3,  -100 for 3 < y ≤ 4,

and c = 100. The boundary conditions are u = 1 on y = 0, u = 0 on y = 4, u' = 0 on x = 0 and u' = 0 on x = 1, where u' denotes the (outward) normal derivative, see figure 9. The step size in the x-direction is 1/100 and in the y-direction 1/50.

Figure 8: convergence vs. time (s) for problem 1; log(‖r‖) for (full) GMRES, GCRO(m), GMRESR(m) and BICGSTAB.

Figure 9: Problem 2 (domain, convection coefficients and boundary conditions). Figure 10: Problem 3 (domain, coefficient a, right-hand side f and boundary conditions).

The results of problem 2 are given in figures 11 and 12. In this example, the difference between (full) GMRES and GCRO(5) in convergence against the number of matrix-vector multiplications is even smaller than in the previous one, and GCRO(5) converges (almost) exactly like GMRES. Figure 11 shows very well the difference in convergence behaviour between GMRESR(m) and GCRO(m). GMRESR(m) shows a tendency to stagnate in the inner iteration (especially for m = 10) and therefore converges much more slowly (loss of superlinear convergence) than GCRO(m). GCRO(m) preserves the superlinear convergence of GMRES quite well. This is explained by the 'global' optimization in GCRO(m) over both the inner and the outer search vectors (the latter form a 'best' sample of the entire, previously searched Krylov subspace). So we may view this as a semi-full GMRES. Figure 12 shows that, again, GCRO(5) is the fastest in CPU-time, but that GMRESR(5) is faster than GCRO(10), although they take about the same number of iterations. Also observe that GMRES is the slowest method in CPU-time, due to the large number of orthogonalizations. This illustrates the trade-offs between extra work (orthogonalizations) and faster convergence.

Problem 3. The third problem is taken from [17]. It solves the equation

  -(a u_x)_x - (a u_y)_y + b u_x = f

on the unit square, where b is given by b = 2 e^{2(x^2 + y^2)}. Along the boundaries we have Dirichlet boundary conditions, see figure 10. The functions a and f are also given in figure 10; f = 0 everywhere, except for the small square in the centre, where f = 100. The step size in both the x- and y-direction is 1/128.

Figure 11: convergence vs. number of iterations (#matvec) for problem 2; log(‖r‖) for (full) GMRES, GCRO(m) and GMRESR(m).

The results of problem 3 are given in figures 13, 14 and 15. Figure 13 gives the convergence plot for GMRES, GCRO(m) and GMRESR(m). Here we used m = 10 (optimal) and m = 50 to highlight the difference in convergence behaviour in the inner (GMRES) iteration of GMRESR(m) and GCRO(m). GMRESR(50) stagnates in the inner GMRES iteration, whereas GCRO(50) displays almost the same convergence behaviour as GCRO(10) and full GMRES. In figure 14 the convergence plot is given for GMRES, BICGSTAB, and BICGSTAB as the inner method in GMRESR (GMRESRSTAB) and GCRO (GCROSTAB). The following strategy gave the best results for the BICGSTAB variants: for GMRESRSTAB the inner iteration was ended after either 20 steps or a relative reduction of the residual norm by a factor 0.01; for GCROSTAB the inner iteration was ended after 25 steps or a relative reduction of the residual norm by a factor 0.01. The convergence behaviour of GMRESRSTAB for this example is somewhat typical for GMRESRSTAB in general (albeit very bad in this particular case). A reason for this erratic behaviour may be that the convergence of BICGSTAB depends on the (implicit) (bi-)orthogonality to some basis. This relation is destroyed after each outer iteration. Now, if the inner iteration does not yield a good approximation, the outer iteration will not give much improvement either, and the method becomes 'trapped'. At the start the same seems to hold for GCROSTAB; however, after a few outer GCR iterations the 'improved' operator A_{C_k} somehow yields a better convergence than BICGSTAB by itself. We have also observed this in other tests, although it may also happen that GCROSTAB converges worse than BICGSTAB. In figure 15 the convergence versus CPU-time is given for GCROSTAB, BICGSTAB, GCRO(10) and GMRESR(10). GCROSTAB gives the best convergence in time; it is approximately 20% faster than BICGSTAB, notwithstanding the extra work in orthogonalizations. Although GCRO(10) converges in fewer iterations than GMRESR(10), in time GMRESR(10) is faster.

So in this case the decrease in iterations does not outweigh the extra work in orthogonalizations. For completeness we mention that GMRESRSTAB took almost 15 seconds to converge, whereas GMRES took about 20 seconds.

Figure 12: convergence vs. time (s) for problem 2; log(‖r‖) for (full) GMRES, GCRO(m) and GMRESR(m).

5 Conclusions

From the GMRESR methods we have derived a modified set of methods, which preserve the optimality of the outer method in the inner iteration. This optimality is lost in the inner iteration of GMRESR, since it essentially uses 'restarted' (inner) iterations, which cannot take advantage of the 'convergence history'. Therefore GMRESR(m) may lose the superlinear convergence behaviour of (full) GMRES, due to the stagnation or slow convergence of the inner, restarted GMRES(m), which may occur at each inner iteration. In contrast, the GCRO variants exploit the 'convergence history' to generate a search space in the inner iteration that does not interfere with the previous minimization of the error over the space of outer search vectors. If GMRES(m) is then used as the inner method (giving GCRO(m)), we do a global minimization of the error over both the inner search space and the outer search space, where the set of outer search vectors is a sample of the entire, previously searched Krylov subspace. From this point of view we may say that GCRO(m) is a semi-full GMRES. This probably accounts for the smooth convergence, the preservation of superlinear convergence (if GMRES displays it) and the absence of the stagnation which may occur in the inner method of GMRESR. In many practical problems the convergence behaviour in iterations of GCRO(m) is almost the same as that of GMRES. Apparently the subset of Krylov subspace vectors that is maintained approximates the entire Krylov subspace that has been generated sufficiently well. Because GCRO(m) requires far fewer inner products and less storage for the same number of iterations than GMRES, it is better suited for implementation on large, distributed memory, parallel computers. Although there is the possibility of breakdown in the inner method for GCRO, this seems to occur rarely, as is indicated by theorem 5; indeed, it has never happened in any of our experiments. Moreover, in case the generation of the Krylov subspace does break down, we have suggested two ways to continue.


More information

Iterative Methods and Multigrid

Iterative Methods and Multigrid Iterative Methods and Multigrid Part 3: Preconditioning 2 Eric de Sturler Preconditioning The general idea behind preconditioning is that convergence of some method for the linear system Ax = b can be

More information

-.- Bi-CG... GMRES(25) --- Bi-CGSTAB BiCGstab(2)

-.- Bi-CG... GMRES(25) --- Bi-CGSTAB BiCGstab(2) .... Advection dominated problem -.- Bi-CG... GMRES(25) --- Bi-CGSTAB BiCGstab(2) * Universiteit Utrecht -2 log1 of residual norm -4-6 -8 Department of Mathematics - GMRES(25) 2 4 6 8 1 Hybrid Bi-Conjugate

More information

M.A. Botchev. September 5, 2014

M.A. Botchev. September 5, 2014 Rome-Moscow school of Matrix Methods and Applied Linear Algebra 2014 A short introduction to Krylov subspaces for linear systems, matrix functions and inexact Newton methods. Plan and exercises. M.A. Botchev

More information

SOR as a Preconditioner. A Dissertation. Presented to. University of Virginia. In Partial Fulllment. of the Requirements for the Degree

SOR as a Preconditioner. A Dissertation. Presented to. University of Virginia. In Partial Fulllment. of the Requirements for the Degree SOR as a Preconditioner A Dissertation Presented to The Faculty of the School of Engineering and Applied Science University of Virginia In Partial Fulllment of the Reuirements for the Degree Doctor of

More information

IDR(s) A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations

IDR(s) A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations IDR(s) A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations Harrachov 2007 Peter Sonneveld en Martin van Gijzen August 24, 2007 1 Delft University of Technology

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

Algorithms that use the Arnoldi Basis

Algorithms that use the Arnoldi Basis AMSC 600 /CMSC 760 Advanced Linear Numerical Analysis Fall 2007 Arnoldi Methods Dianne P. O Leary c 2006, 2007 Algorithms that use the Arnoldi Basis Reference: Chapter 6 of Saad The Arnoldi Basis How to

More information

On the influence of eigenvalues on Bi-CG residual norms

On the influence of eigenvalues on Bi-CG residual norms On the influence of eigenvalues on Bi-CG residual norms Jurjen Duintjer Tebbens Institute of Computer Science Academy of Sciences of the Czech Republic duintjertebbens@cs.cas.cz Gérard Meurant 30, rue

More information

OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative methods ffl Krylov subspace methods ffl Preconditioning techniques: Iterative methods ILU

OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative methods ffl Krylov subspace methods ffl Preconditioning techniques: Iterative methods ILU Preconditioning Techniques for Solving Large Sparse Linear Systems Arnold Reusken Institut für Geometrie und Praktische Mathematik RWTH-Aachen OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative

More information

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES ITERATIVE METHODS BASED ON KRYLOV SUBSPACES LONG CHEN We shall present iterative methods for solving linear algebraic equation Au = b based on Krylov subspaces We derive conjugate gradient (CG) method

More information

6.4 Krylov Subspaces and Conjugate Gradients

6.4 Krylov Subspaces and Conjugate Gradients 6.4 Krylov Subspaces and Conjugate Gradients Our original equation is Ax = b. The preconditioned equation is P Ax = P b. When we write P, we never intend that an inverse will be explicitly computed. P

More information

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) Contents 1 Vector Spaces 1 1.1 The Formal Denition of a Vector Space.................................. 1 1.2 Subspaces...................................................

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 06-05 Solution of the incompressible Navier Stokes equations with preconditioned Krylov subspace methods M. ur Rehman, C. Vuik G. Segal ISSN 1389-6520 Reports of the

More information

Relaxation strategies for nested Krylov methods

Relaxation strategies for nested Krylov methods Journal of Computational and Applied Mathematics 177 (2005) 347 365 www.elsevier.com/locate/cam Relaxation strategies for nested Krylov methods Jasper van den Eshof a, Gerard L.G. Sleijpen b,, Martin B.

More information

Domain decomposition on different levels of the Jacobi-Davidson method

Domain decomposition on different levels of the Jacobi-Davidson method hapter 5 Domain decomposition on different levels of the Jacobi-Davidson method Abstract Most computational work of Jacobi-Davidson [46], an iterative method suitable for computing solutions of large dimensional

More information

Recycling Bi-Lanczos Algorithms: BiCG, CGS, and BiCGSTAB

Recycling Bi-Lanczos Algorithms: BiCG, CGS, and BiCGSTAB Recycling Bi-Lanczos Algorithms: BiCG, CGS, and BiCGSTAB Kapil Ahuja Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf-Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2018/19 Part 4: Iterative Methods PD

More information

Finite-choice algorithm optimization in Conjugate Gradients

Finite-choice algorithm optimization in Conjugate Gradients Finite-choice algorithm optimization in Conjugate Gradients Jack Dongarra and Victor Eijkhout January 2003 Abstract We present computational aspects of mathematically equivalent implementations of the

More information

Fast iterative solvers

Fast iterative solvers Utrecht, 15 november 2017 Fast iterative solvers Gerard Sleijpen Department of Mathematics http://www.staff.science.uu.nl/ sleij101/ Review. To simplify, take x 0 = 0, assume b 2 = 1. Solving Ax = b for

More information

Solving Large Nonlinear Sparse Systems

Solving Large Nonlinear Sparse Systems Solving Large Nonlinear Sparse Systems Fred W. Wubs and Jonas Thies Computational Mechanics & Numerical Mathematics University of Groningen, the Netherlands f.w.wubs@rug.nl Centre for Interdisciplinary

More information

A MULTIGRID ALGORITHM FOR. Richard E. Ewing and Jian Shen. Institute for Scientic Computation. Texas A&M University. College Station, Texas SUMMARY

A MULTIGRID ALGORITHM FOR. Richard E. Ewing and Jian Shen. Institute for Scientic Computation. Texas A&M University. College Station, Texas SUMMARY A MULTIGRID ALGORITHM FOR THE CELL-CENTERED FINITE DIFFERENCE SCHEME Richard E. Ewing and Jian Shen Institute for Scientic Computation Texas A&M University College Station, Texas SUMMARY In this article,

More information

Stabilization and Acceleration of Algebraic Multigrid Method

Stabilization and Acceleration of Algebraic Multigrid Method Stabilization and Acceleration of Algebraic Multigrid Method Recursive Projection Algorithm A. Jemcov J.P. Maruszewski Fluent Inc. October 24, 2006 Outline 1 Need for Algorithm Stabilization and Acceleration

More information

Chapter 7. Iterative methods for large sparse linear systems. 7.1 Sparse matrix algebra. Large sparse matrices

Chapter 7. Iterative methods for large sparse linear systems. 7.1 Sparse matrix algebra. Large sparse matrices Chapter 7 Iterative methods for large sparse linear systems In this chapter we revisit the problem of solving linear systems of equations, but now in the context of large sparse systems. The price to pay

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 3: Iterative Methods PD

More information

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition)

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition) Vector Space Basics (Remark: these notes are highly formal and may be a useful reference to some students however I am also posting Ray Heitmann's notes to Canvas for students interested in a direct computational

More information

A DISSERTATION. Extensions of the Conjugate Residual Method. by Tomohiro Sogabe. Presented to

A DISSERTATION. Extensions of the Conjugate Residual Method. by Tomohiro Sogabe. Presented to A DISSERTATION Extensions of the Conjugate Residual Method ( ) by Tomohiro Sogabe Presented to Department of Applied Physics, The University of Tokyo Contents 1 Introduction 1 2 Krylov subspace methods

More information

IDR(s) as a projection method

IDR(s) as a projection method Delft University of Technology Faculty of Electrical Engineering, Mathematics and Computer Science Delft Institute of Applied Mathematics IDR(s) as a projection method A thesis submitted to the Delft Institute

More information

An advanced ILU preconditioner for the incompressible Navier-Stokes equations

An advanced ILU preconditioner for the incompressible Navier-Stokes equations An advanced ILU preconditioner for the incompressible Navier-Stokes equations M. ur Rehman C. Vuik A. Segal Delft Institute of Applied Mathematics, TU delft The Netherlands Computational Methods with Applications,

More information

Solving Sparse Linear Systems: Iterative methods

Solving Sparse Linear Systems: Iterative methods Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccs Lecture Notes for Unit VII Sparse Matrix Computations Part 2: Iterative Methods Dianne P. O Leary c 2008,2010

More information

Solving Sparse Linear Systems: Iterative methods

Solving Sparse Linear Systems: Iterative methods Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 2: Iterative Methods Dianne P. O Leary

More information

University of Maryland at College Park. limited amount of computer memory, thereby allowing problems with a very large number

University of Maryland at College Park. limited amount of computer memory, thereby allowing problems with a very large number Limited-Memory Matrix Methods with Applications 1 Tamara Gibson Kolda 2 Applied Mathematics Program University of Maryland at College Park Abstract. The focus of this dissertation is on matrix decompositions

More information

CG Type Algorithm for Indefinite Linear Systems 1. Conjugate Gradient Type Methods for Solving Symmetric, Indenite Linear Systems.

CG Type Algorithm for Indefinite Linear Systems 1. Conjugate Gradient Type Methods for Solving Symmetric, Indenite Linear Systems. CG Type Algorithm for Indefinite Linear Systems 1 Conjugate Gradient Type Methods for Solving Symmetric, Indenite Linear Systems Jan Modersitzki ) Institute of Applied Mathematics Bundesstrae 55 20146

More information

Preconditioned inverse iteration and shift-invert Arnoldi method

Preconditioned inverse iteration and shift-invert Arnoldi method Preconditioned inverse iteration and shift-invert Arnoldi method Melina Freitag Department of Mathematical Sciences University of Bath CSC Seminar Max-Planck-Institute for Dynamics of Complex Technical

More information

Nested Krylov methods for shifted linear systems

Nested Krylov methods for shifted linear systems Nested Krylov methods for shifted linear systems M. Baumann, and M. B. van Gizen Email: M.M.Baumann@tudelft.nl Delft Institute of Applied Mathematics Delft University of Technology Delft, The Netherlands

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Classical Iterations We have a problem, We assume that the matrix comes from a discretization of a PDE. The best and most popular model problem is, The matrix will be as large

More information

Lab 1: Iterative Methods for Solving Linear Systems

Lab 1: Iterative Methods for Solving Linear Systems Lab 1: Iterative Methods for Solving Linear Systems January 22, 2017 Introduction Many real world applications require the solution to very large and sparse linear systems where direct methods such as

More information

Solving Ax = b, an overview. Program

Solving Ax = b, an overview. Program Numerical Linear Algebra Improving iterative solvers: preconditioning, deflation, numerical software and parallelisation Gerard Sleijpen and Martin van Gijzen November 29, 27 Solving Ax = b, an overview

More information

Chapter 7 Iterative Techniques in Matrix Algebra

Chapter 7 Iterative Techniques in Matrix Algebra Chapter 7 Iterative Techniques in Matrix Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematics University of California, Berkeley Math 128B Numerical Analysis Vector Norms Definition

More information

1e N

1e N Spectral schemes on triangular elements by Wilhelm Heinrichs and Birgit I. Loch Abstract The Poisson problem with homogeneous Dirichlet boundary conditions is considered on a triangle. The mapping between

More information

Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners

Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners Eugene Vecharynski 1 Andrew Knyazev 2 1 Department of Computer Science and Engineering University of Minnesota 2 Department

More information

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give

More information

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294)

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294) Conjugate gradient method Descent method Hestenes, Stiefel 1952 For A N N SPD In exact arithmetic, solves in N steps In real arithmetic No guaranteed stopping Often converges in many fewer than N steps

More information

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems Topics The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems What about non-spd systems? Methods requiring small history Methods requiring large history Summary of solvers 1 / 52 Conjugate

More information

1. Introduction. Krylov subspace methods are used extensively for the iterative solution of linear systems of equations Ax = b.

1. Introduction. Krylov subspace methods are used extensively for the iterative solution of linear systems of equations Ax = b. SIAM J. SCI. COMPUT. Vol. 31, No. 2, pp. 1035 1062 c 2008 Society for Industrial and Applied Mathematics IDR(s): A FAMILY OF SIMPLE AND FAST ALGORITHMS FOR SOLVING LARGE NONSYMMETRIC SYSTEMS OF LINEAR

More information

t x 0.25

t x 0.25 Journal of ELECTRICAL ENGINEERING, VOL. 52, NO. /s, 2, 48{52 COMPARISON OF BROYDEN AND NEWTON METHODS FOR SOLVING NONLINEAR PARABOLIC EQUATIONS Ivan Cimrak If the time discretization of a nonlinear parabolic

More information

AMS Mathematics Subject Classification : 65F10,65F50. Key words and phrases: ILUS factorization, preconditioning, Schur complement, 1.

AMS Mathematics Subject Classification : 65F10,65F50. Key words and phrases: ILUS factorization, preconditioning, Schur complement, 1. J. Appl. Math. & Computing Vol. 15(2004), No. 1, pp. 299-312 BILUS: A BLOCK VERSION OF ILUS FACTORIZATION DAVOD KHOJASTEH SALKUYEH AND FAEZEH TOUTOUNIAN Abstract. ILUS factorization has many desirable

More information

problem Au = u by constructing an orthonormal basis V k = [v 1 ; : : : ; v k ], at each k th iteration step, and then nding an approximation for the e

problem Au = u by constructing an orthonormal basis V k = [v 1 ; : : : ; v k ], at each k th iteration step, and then nding an approximation for the e A Parallel Solver for Extreme Eigenpairs 1 Leonardo Borges and Suely Oliveira 2 Computer Science Department, Texas A&M University, College Station, TX 77843-3112, USA. Abstract. In this paper a parallel

More information

Peter Deuhard. for Symmetric Indenite Linear Systems

Peter Deuhard. for Symmetric Indenite Linear Systems Peter Deuhard A Study of Lanczos{Type Iterations for Symmetric Indenite Linear Systems Preprint SC 93{6 (March 993) Contents 0. Introduction. Basic Recursive Structure 2. Algorithm Design Principles 7

More information

Gradient Method Based on Roots of A

Gradient Method Based on Roots of A Journal of Scientific Computing, Vol. 15, No. 4, 2000 Solving Ax Using a Modified Conjugate Gradient Method Based on Roots of A Paul F. Fischer 1 and Sigal Gottlieb 2 Received January 23, 2001; accepted

More information

In order to solve the linear system KL M N when K is nonsymmetric, we can solve the equivalent system

In order to solve the linear system KL M N when K is nonsymmetric, we can solve the equivalent system !"#$% "&!#' (%)!#" *# %)%(! #! %)!#" +, %"!"#$ %*&%! $#&*! *# %)%! -. -/ 0 -. 12 "**3! * $!#%+,!2!#% 44" #% &#33 # 4"!#" "%! "5"#!!#6 -. - #% " 7% "3#!#3! - + 87&2! * $!#% 44" ) 3( $! # % %#!!#%+ 9332!

More information

Accelerated Solvers for CFD

Accelerated Solvers for CFD Accelerated Solvers for CFD Co-Design of Hardware/Software for Predicting MAV Aerodynamics Eric de Sturler, Virginia Tech Mathematics Email: sturler@vt.edu Web: http://www.math.vt.edu/people/sturler Co-design

More information

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1 Parallel Numerics, WT 2016/2017 5 Iterative Methods for Sparse Linear Systems of Equations page 1 of 1 Contents 1 Introduction 1.1 Computer Science Aspects 1.2 Numerical Problems 1.3 Graphs 1.4 Loop Manipulations

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 9 Minimizing Residual CG

More information

Lanczos Bidiagonalization with Subspace Augmentation for Discrete Inverse Problems

Lanczos Bidiagonalization with Subspace Augmentation for Discrete Inverse Problems Lanczos Bidiagonalization with Subspace Augmentation for Discrete Inverse Problems Per Christian Hansen Technical University of Denmark Ongoing work with Kuniyoshi Abe, Gifu Dedicated to Dianne P. O Leary

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 18-05 Efficient and robust Schur complement approximations in the augmented Lagrangian preconditioner for high Reynolds number laminar flows X. He and C. Vuik ISSN

More information

Reduced Synchronization Overhead on. December 3, Abstract. The standard formulation of the conjugate gradient algorithm involves

Reduced Synchronization Overhead on. December 3, Abstract. The standard formulation of the conjugate gradient algorithm involves Lapack Working Note 56 Conjugate Gradient Algorithms with Reduced Synchronization Overhead on Distributed Memory Multiprocessors E. F. D'Azevedo y, V.L. Eijkhout z, C. H. Romine y December 3, 1999 Abstract

More information

CONVERGENCE BOUNDS FOR PRECONDITIONED GMRES USING ELEMENT-BY-ELEMENT ESTIMATES OF THE FIELD OF VALUES

CONVERGENCE BOUNDS FOR PRECONDITIONED GMRES USING ELEMENT-BY-ELEMENT ESTIMATES OF THE FIELD OF VALUES European Conference on Computational Fluid Dynamics ECCOMAS CFD 2006 P. Wesseling, E. Oñate and J. Périaux (Eds) c TU Delft, The Netherlands, 2006 CONVERGENCE BOUNDS FOR PRECONDITIONED GMRES USING ELEMENT-BY-ELEMENT

More information

A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation

A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation Tao Zhao 1, Feng-Nan Hwang 2 and Xiao-Chuan Cai 3 Abstract In this paper, we develop an overlapping domain decomposition

More information

Preface to the Second Edition. Preface to the First Edition

Preface to the Second Edition. Preface to the First Edition n page v Preface to the Second Edition Preface to the First Edition xiii xvii 1 Background in Linear Algebra 1 1.1 Matrices................................. 1 1.2 Square Matrices and Eigenvalues....................

More information

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization

More information

The parallel computation of the smallest eigenpair of an. acoustic problem with damping. Martin B. van Gijzen and Femke A. Raeven.

The parallel computation of the smallest eigenpair of an. acoustic problem with damping. Martin B. van Gijzen and Femke A. Raeven. The parallel computation of the smallest eigenpair of an acoustic problem with damping. Martin B. van Gijzen and Femke A. Raeven Abstract Acoustic problems with damping may give rise to large quadratic

More information

Krylov Space Solvers

Krylov Space Solvers Seminar for Applied Mathematics ETH Zurich International Symposium on Frontiers of Computational Science Nagoya, 12/13 Dec. 2005 Sparse Matrices Large sparse linear systems of equations or large sparse

More information

Preconditioners for the incompressible Navier Stokes equations

Preconditioners for the incompressible Navier Stokes equations Preconditioners for the incompressible Navier Stokes equations C. Vuik M. ur Rehman A. Segal Delft Institute of Applied Mathematics, TU Delft, The Netherlands SIAM Conference on Computational Science and

More information

STABILITY OF INVARIANT SUBSPACES OF COMMUTING MATRICES We obtain some further results for pairs of commuting matrices. We show that a pair of commutin

STABILITY OF INVARIANT SUBSPACES OF COMMUTING MATRICES We obtain some further results for pairs of commuting matrices. We show that a pair of commutin On the stability of invariant subspaces of commuting matrices Tomaz Kosir and Bor Plestenjak September 18, 001 Abstract We study the stability of (joint) invariant subspaces of a nite set of commuting

More information

The model reduction algorithm proposed is based on an iterative two-step LMI scheme. The convergence of the algorithm is not analyzed but examples sho

The model reduction algorithm proposed is based on an iterative two-step LMI scheme. The convergence of the algorithm is not analyzed but examples sho Model Reduction from an H 1 /LMI perspective A. Helmersson Department of Electrical Engineering Linkoping University S-581 8 Linkoping, Sweden tel: +6 1 816 fax: +6 1 86 email: andersh@isy.liu.se September

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 11-14 On the convergence of inexact Newton methods R. Idema, D.J.P. Lahaye, and C. Vuik ISSN 1389-6520 Reports of the Department of Applied Mathematical Analysis Delft

More information

The iterative convex minorant algorithm for nonparametric estimation

The iterative convex minorant algorithm for nonparametric estimation The iterative convex minorant algorithm for nonparametric estimation Report 95-05 Geurt Jongbloed Technische Universiteit Delft Delft University of Technology Faculteit der Technische Wiskunde en Informatica

More information

Residual iterative schemes for largescale linear systems

Residual iterative schemes for largescale linear systems Universidad Central de Venezuela Facultad de Ciencias Escuela de Computación Lecturas en Ciencias de la Computación ISSN 1316-6239 Residual iterative schemes for largescale linear systems William La Cruz

More information

9.1 Preconditioned Krylov Subspace Methods

9.1 Preconditioned Krylov Subspace Methods Chapter 9 PRECONDITIONING 9.1 Preconditioned Krylov Subspace Methods 9.2 Preconditioned Conjugate Gradient 9.3 Preconditioned Generalized Minimal Residual 9.4 Relaxation Method Preconditioners 9.5 Incomplete

More information

Krylov Space Methods. Nonstationary sounds good. Radu Trîmbiţaş ( Babeş-Bolyai University) Krylov Space Methods 1 / 17

Krylov Space Methods. Nonstationary sounds good. Radu Trîmbiţaş ( Babeş-Bolyai University) Krylov Space Methods 1 / 17 Krylov Space Methods Nonstationary sounds good Radu Trîmbiţaş Babeş-Bolyai University Radu Trîmbiţaş ( Babeş-Bolyai University) Krylov Space Methods 1 / 17 Introduction These methods are used both to solve

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method The minimization problem We are given a symmetric positive definite matrix R n n and a right hand side vector b R n We want to solve the linear system Find u R n such that

More information

Multigrid absolute value preconditioning

Multigrid absolute value preconditioning Multigrid absolute value preconditioning Eugene Vecharynski 1 Andrew Knyazev 2 (speaker) 1 Department of Computer Science and Engineering University of Minnesota 2 Department of Mathematical and Statistical

More information

Mathematics Research Report No. MRR 003{96, HIGH RESOLUTION POTENTIAL FLOW METHODS IN OIL EXPLORATION Stephen Roberts 1 and Stephan Matthai 2 3rd Febr

Mathematics Research Report No. MRR 003{96, HIGH RESOLUTION POTENTIAL FLOW METHODS IN OIL EXPLORATION Stephen Roberts 1 and Stephan Matthai 2 3rd Febr HIGH RESOLUTION POTENTIAL FLOW METHODS IN OIL EXPLORATION Stephen Roberts and Stephan Matthai Mathematics Research Report No. MRR 003{96, Mathematics Research Report No. MRR 003{96, HIGH RESOLUTION POTENTIAL

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 10-16 An elegant IDR(s) variant that efficiently exploits bi-orthogonality properties Martin B. van Gijzen and Peter Sonneveld ISSN 1389-6520 Reports of the Department

More information

Universiteit-Utrecht. Department. of Mathematics. Jacobi-Davidson algorithms for various. eigenproblems. - A working document -

Universiteit-Utrecht. Department. of Mathematics. Jacobi-Davidson algorithms for various. eigenproblems. - A working document - Universiteit-Utrecht * Department of Mathematics Jacobi-Davidson algorithms for various eigenproblems - A working document - by Gerard L.G. Sleipen, Henk A. Van der Vorst, and Zhaoun Bai Preprint nr. 1114

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information