Preconditioned Conjugate Gradient-Like Methods for Nonsymmetric Linear Systems (1)

Ulrike Meier Yang (2)

July 19, 1994

(1) This research was supported by the U.S. Department of Energy under Grant No. DE-FG02-85ER25001.
(2) Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign.

Abstract

Linear systems with large sparse nonsymmetric matrices arise in many applications. It is therefore important to be able to solve them rapidly and efficiently. Whereas the solution of symmetric positive definite systems has been extensively explored, and many fast and stable solvers have been developed, there is still a large need for reliable iterative solvers for nonsymmetric indefinite systems. The solvers developed to date either converge slowly or converge only for very special cases. Often it is hard or impossible to predict, when attempting to solve a specific problem, whether the solver will converge at all and whether it delivers correct results in case of convergence. This paper investigates some preconditioned conjugate gradient-like algorithms which can be implemented easily and in some cases show superior convergence; however, they lack robustness. The algorithms have been implemented and applied to a variety of test problems. The results are presented here.

TABLE OF CONTENTS

1 INTRODUCTION
  1.1 Background and Motivation
  1.2 Outline
2 CONJUGATE GRADIENT-LIKE METHODS
  2.1 The Classical Conjugate Gradient Algorithm
  2.2 The Bi-Conjugate Gradient Algorithm
  2.3 The Conjugate Gradient Squared (CGS) Algorithm
  2.4 The Conjugate Gradient Stabilized (BICGSTAB) Algorithm
  2.5 Conjugate Gradient on the Normal Equations
  2.6 The Generalized Minimal Residual Algorithm (GMRES)
3 PRECONDITIONERS
  3.1 Incomplete LU Factorization with Regard to Structure (ILU(k))
  3.2 Modified Incomplete LU Factorization (MILU(k))
  3.3 Incomplete LU Factorization with Regard to Element Size (ILUT(k))
4 NUMERICAL EXPERIMENTS
  4.1 Implementation Details
    4.1.1 Preconditioning
    4.1.2 Stopping Criterion
  4.2 Three-Dimensional Elliptic Problems
    4.2.1 Description of the Problems
    4.2.2 Numerical Results
  4.3 Harwell/Boeing Collection (RUA)
    4.3.1 Description of the Matrices
    4.3.2 Numerical Results and Observations
5 CONCLUSIONS
A ELLIPTIC PROBLEMS: FIGURES AND TABLES
B HARWELL/BOEING COLLECTION: FIGURES AND TABLES
REFERENCES

CHAPTER 1

INTRODUCTION

Linear systems with large sparse nonsymmetric matrices arise in many applications. It is therefore important to be able to solve them rapidly and efficiently. Whereas the solution of symmetric positive definite systems has been extensively explored, and many fast and stable solvers have been developed, there is still a large need for reliable iterative solvers for nonsymmetric indefinite systems. The solvers developed to date either converge slowly or converge only for very special cases. Often it is hard or impossible to predict, when attempting to solve a specific problem, whether the solver will converge at all and whether it delivers correct results in case of convergence. This report investigates some preconditioned conjugate gradient-like algorithms which can be implemented easily and in some cases show superior convergence; however, they lack robustness. The algorithms have been implemented and applied to a variety of test problems. The results are presented here.

1.1 Background and Motivation

The conjugate gradient algorithm is a very powerful method for solving symmetric positive definite sparse linear systems, especially when it is used with a preconditioner. However, this algorithm fails in general for nonsymmetric or indefinite linear systems. Several attempts have been made to generalize the method to the nonsymmetric case. We consider here some of these attempts and describe their characteristics, the motivation for using them, and their advantages and drawbacks.

In the CG algorithm, the residual vector is minimized in each iteration step with respect to a suitable norm. During the process, the residual vectors are constructed in such a way that they are orthogonal to each other with respect to the Euclidean inner product. Additionally, because of the symmetry of the matrix, the residual vectors fulfill a three-term recursion, which is another powerful characteristic of the algorithm. It is not possible to maintain both properties for a nonsymmetric linear system. There are several ways to deal with this situation.

One can maintain both properties by considering the normal equations instead of the original system. The new matrices A^T A (CGNR) or A A^T (CGNE) are symmetric positive definite, and the CG algorithm can therefore be applied to them. This approach is considered in [6]. The disadvantage of using the normal equations is that the condition number, which plays a role in

the convergence of CG, increases significantly. These algorithms would therefore be expected to converge very slowly, and it is important to precondition the system in order to improve the condition number. This has been investigated to some degree in [6], where an incomplete LU factorization as well as its modified form are used. Other preconditioned forms of CGNE are the row projection methods [3], where a block Jacobi or block SSOR preconditioning is used. These methods turn out to be very robust in many cases; however, they often converge slowly.

One can maintain the minimization property by choosing the direction vector as a linear combination of the residual vector and k previous direction vectors. This approach has been used in methods like Orthomin [23], Orthodir, and other generalized conjugate gradient schemes [] [4] [6] [24]. We will not consider these methods here. We will, however, consider another method which is theoretically equivalent to Orthodir and the Generalized Conjugate Residual method, but more robust: the generalized minimal residual algorithm (GMRES) [7]. The disadvantage of these methods is that they require much storage for the previous direction vectors. In order to reduce the necessary storage, restarted versions of the above algorithms are generally used, which involve a fixed number k of previous direction vectors. In general, the convergence rate improves as k increases.

One can maintain the three-term recursion property. This is done in the bi-conjugate gradient algorithm (BICG), conjugate gradient squared (CGS), and the stabilized version of CGS (BICGSTAB). In BICG [7], the approximations are constructed in such a way that the residuals are orthogonal to some "pseudo"-residuals involving A^T. This leads to two three-term recursions, one for the residuals and one for the "pseudo"-residuals. In CGS [2], both residuals and "pseudo"-residuals of BICG are used to generate new residuals, in order to take advantage of the convergence of the "pseudo"-residuals, which is ignored in BICG. The corresponding residual polynomials of CGS are the squared residual polynomials of BICG. BICGSTAB [22] is an attempt to stabilize CGS by replacing the squared residual polynomial with the product of two polynomials, one of which can be selected suitably. With a good preconditioner, these methods can converge very quickly; often, however, they fail completely.

There clearly is a need to explore further the behavior of the CG-like methods and the effects of different preconditioners on those methods. Although these methods are very efficient and fast when they converge, they are not robust.

1.2 Outline

Chapter 2 describes the methods that are examined. It starts with a presentation of the classical conjugate gradient algorithm, from which the considered algorithms originate: the bi-conjugate gradient algorithm (BICG), conjugate gradient squared (CGS), its stabilized version (BICGSTAB), conjugate gradient on the normal equations (CGNE, CGNR), and the generalized minimal residual algorithm (GMRES). Descriptions of these methods follow. Chapter 3 describes several preconditioners: the incomplete LU factorization with positional dropping (ILU(k)), its modified version (MILU(k)), and an incomplete LU factorization with numerical dropping (ILUT(k)). Chapter 4 describes the test problems considered; the numerical results are presented and analyzed. Finally, Chapter 5 summarizes the results and presents the conclusions.

CHAPTER 2

CONJUGATE GRADIENT-LIKE METHODS

The conjugate gradient algorithm is a very powerful method for symmetric positive definite linear systems, but as soon as either one of those characteristics is absent, it will in general fail. Several attempts have been made to solve this problem for the case of nonsymmetric indefinite matrices. We describe some of the algorithms below.

2.1 The Classical Conjugate Gradient Algorithm

The conjugate gradient algorithm is a very powerful method for symmetric positive definite sparse linear systems, especially when used with a preconditioner. We consider here the linear system of order n

    Ax = b,    (2.1)

where A is symmetric positive definite. There are many ways to derive the algorithm. One possibility is to minimize the function

    Q(x) = \frac{1}{2} x^T A x - b^T x.

The gradient of Q(x), which is the negative residual -(b - Ax), vanishes at the solution of the linear system. If the matrix is symmetric positive definite, Q(x) possesses a unique minimum, which is the desired solution. Moreover, x_k, the approximation to the solution x computed in the k-th iteration step, minimizes Q(x) over the k-dimensional affine subspace x_0 + span{p_0, p_1, ..., p_{k-1}}. The p_i, i = 0, ..., n-1, are direction vectors which are constructed successively within the CG algorithm in such a way that they are A-conjugate to each other. Consequently, the algorithm theoretically terminates within at most n iterations. The complete algorithm in its vectorial form is given below.

Algorithm 2.1. CG

Initialization:

    r_0 = b - A x_0,    p_0 = r_0.    (2.2)

Repeat for k = 1, 2, ...:

    \alpha_{k-1} = \frac{r_{k-1}^T r_{k-1}}{p_{k-1}^T A p_{k-1}}    (2.3)
    x_k = x_{k-1} + \alpha_{k-1} p_{k-1}    (2.4)
    r_k = r_{k-1} - \alpha_{k-1} A p_{k-1}    (2.5)
    \beta_k = \frac{r_k^T r_k}{r_{k-1}^T r_{k-1}}    (2.6)
    p_k = r_k + \beta_k p_{k-1}    (2.7)

We can summarize some of its important properties in the following theorem.

Theorem 2.1. Let A be symmetric positive definite, let r_i be the residual vectors, and let p_i be the direction vectors of the i-th conjugate gradient iteration. Then
(i) r_i^T r_j = 0 for i > j (orthogonality),
(ii) p_i^T A p_j = 0 for i > j (A-conjugacy).

The proof, which is performed by induction, can be found in [7].
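As an illustration of Algorithm 2.1, the following sketch implements the iteration (2.2)-(2.7) with NumPy for a dense symmetric positive definite matrix. The tolerance, iteration limit, and variable names are illustrative choices and not taken from the experiments described later.

    import numpy as np

    def cg(A, b, x0, tol=1e-9, maxiter=1000):
        # Classical conjugate gradient (Algorithm 2.1) for symmetric positive definite A.
        x = x0.copy()
        r = b - A @ x                      # initial residual (2.2)
        p = r.copy()
        rho = r @ r
        for _ in range(maxiter):
            Ap = A @ p
            alpha = rho / (p @ Ap)         # (2.3)
            x = x + alpha * p              # (2.4)
            r = r - alpha * Ap             # (2.5)
            rho_new = r @ r
            if np.sqrt(rho_new) <= tol:
                break
            beta = rho_new / rho           # (2.6)
            p = r + beta * p               # (2.7)
            rho = rho_new
        return x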

Rewriting the above algorithm in terms of some matrix equations gives further insight into the algorithm. For this approach, which can also be found in [7], we define the following matrices:

    R_k = \left[ \frac{r_0}{\sqrt{r_0^T r_0}}, \frac{r_1}{\sqrt{r_1^T r_1}}, \ldots, \frac{r_k}{\sqrt{r_k^T r_k}} \right],    (2.8)
    P_k = \left[ \frac{p_0}{\sqrt{r_0^T r_0}}, \frac{p_1}{\sqrt{r_1^T r_1}}, \ldots, \frac{p_k}{\sqrt{r_k^T r_k}} \right],    (2.9)
    L_k = \begin{pmatrix} 1 & & & \\ -\sqrt{\beta_1} & 1 & & \\ & \ddots & \ddots & \\ & & -\sqrt{\beta_k} & 1 \end{pmatrix},    (2.10)
    D_k = \mathrm{diag}(\alpha_0^{-1}, \alpha_1^{-1}, \ldots, \alpha_k^{-1}).    (2.11)

Note that the columns of R_k are orthonormal to each other and the columns of P_k are A-conjugate to each other. Let m be the number of iterations the algorithm theoretically takes until termination, m <= n. Then we can express the CG algorithm in terms of the following matrix equations, taking into account that r_{m+1} = 0:

    A P_m D_m^{-1} = R_m L_m,    (2.12)
    P_m L_m^T = R_m.    (2.13)

Eliminating P_m in (2.12) and using the fact that R_m has orthonormal columns gives

    R_m^T A R_m = L_m D_m L_m^T,    (2.14)

where L_m D_m L_m^T is a tridiagonal symmetric matrix. We thus see that during the conjugate gradient process A is transformed into a tridiagonal matrix.

Now, the residual vector r_i can be expressed as a linear combination of the vectors r_0, A r_0, A^2 r_0, ..., A^{i-1} r_0; see (2.5) and (2.7). These vectors span the Krylov subspace K_i(A) = span{r_0, A r_0, A^2 r_0, ..., A^{i-1} r_0}. (Note also that K_i(A) = span{r_0, r_1, ..., r_{i-1}} = span{p_0, p_1, ..., p_{i-1}}.) Consequently, the residual vector can be expressed in terms of a polynomial in A, r_i = \varphi_i(A) r_0, where \varphi_i is a polynomial of degree i. Using the expression of the residuals in terms of polynomials, and taking into account that there is a similar expression for the direction vectors, p_i = \psi_i(A) r_0, we can rewrite the complete algorithm in its polynomial form. In order to do that we define the symmetric bilinear form (.,.) on \Pi_N, the space of polynomials of degree at most N:

    (\varphi, \psi) := r_0^T \varphi(A)^T \psi(A) r_0.

Clearly, this form is non-negative, that is, (\varphi, \varphi) >= 0 for all \varphi. It is associative, that is, (\varphi\psi, \chi) = (\varphi, \psi\chi) for symmetric A. It is not positive definite, as we can find nonzero polynomials \varphi for which \varphi(A) r_0 vanishes; for example, set \varphi(x) = x - \lambda, where \lambda is an eigenvalue of A, and choose the corresponding eigenvector for r_0. But due to its non-negativity, it is positive semi-definite. So, except for positive definiteness, this bilinear form fulfills all requirements for an inner product on \Pi_N. With this, we can reformulate the conjugate gradient algorithm in its polynomial form.

Algorithm 2.2. CG (polynomial version)

Initialization:

    \varphi_0 = \psi_0 = 1.    (2.15)

Repeat for k = 1, 2, ...:

    \alpha_{k-1} = \frac{(\varphi_{k-1}, \varphi_{k-1})}{(\psi_{k-1}, \vartheta \psi_{k-1})}    (2.16)
    \varphi_k = \varphi_{k-1} - \alpha_{k-1} \vartheta \psi_{k-1}    (2.17)
    \beta_k = \frac{(\varphi_k, \varphi_k)}{(\varphi_{k-1}, \varphi_{k-1})}    (2.18)
    \psi_k = \varphi_k + \beta_k \psi_{k-1},    (2.19)

where \vartheta(x) \equiv x. We can now state a theorem similar to Theorem 2.1 which will also be important for the algorithms described in the subsequent sections.

Theorem 2.2. Let [.,.] be a symmetric bilinear form defined on \Pi_N satisfying the associativity property above. Let \varphi_i and \psi_i be the polynomials constructed in the polynomial conjugate gradient algorithm above using [.,.] instead of (.,.). Then, as long as the algorithm does not break down due to division by zero, the polynomials \varphi_i and \psi_i satisfy
(i) [\varphi_i, \varphi_j] = 0 for i != j,
(ii) [\psi_i, \vartheta \psi_j] = 0 for i != j.

The proof for this theorem can be found in [2].

The convergence behavior of the conjugate gradient algorithm has been investigated, among others, in [] [2]. These investigations show the dependence of the rate of convergence on the spectrum of A. We state one important theorem here, which takes into account the condition number \kappa(A). More detailed theoretical convergence results that also consider the shape of the spectrum can be found in [2].

Theorem 2.3. Let A be symmetric positive definite, let x_k be the k-th iterate of the CG process (Algorithm 2.1), and let x be the solution of the linear system Ax = b. Then

    \|x - x_k\|_A \le 2 \|x - x_0\|_A \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^k.    (2.20)
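The bound (2.20) is easy to evaluate. The following lines compute the reduction factor 2((sqrt(kappa)-1)/(sqrt(kappa)+1))^k for an assumed condition number; the chosen values are purely illustrative.

    import numpy as np

    def cg_error_bound(kappa, k):
        # Upper bound on ||x - x_k||_A / ||x - x_0||_A from Theorem 2.3.
        q = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
        return 2.0 * q**k

    print(cg_error_bound(kappa=100.0, k=50))   # roughly 8.8e-5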

2.2 The Bi-Conjugate Gradient Algorithm

The bi-conjugate gradient algorithm (BICG) was suggested by Fletcher [7]. It was originally used by Lanczos [] to compute the eigenvalues of an unsymmetric matrix. In the classical conjugate gradient algorithm, applied to a symmetric positive definite matrix, the residuals obtained in different iteration steps are orthogonal to each other, as stated in Theorem 2.1, and they also fulfill a three-term recursion. In BICG, the three-term recursion is maintained, which leads to a loss of orthogonality of the residuals and of A-conjugacy of the directions in the nonsymmetric case. In order to guarantee the theoretical finite termination property, "pseudo residuals" \tilde r_i = b - A^T x_i are introduced, as well as "pseudo directions" \tilde p_i. They are constructed in such a way that the "pseudo residuals" are orthogonal to the residuals and the "pseudo directions" are A-conjugate to the directions. The new algorithm, which in the symmetric case is fully equivalent to CG, is given below.

Algorithm 2.3. BICG

Initialization:

    r_0 = b - A x_0,    \tilde r_0 = r_0,    p_0 = r_0,    \tilde p_0 = r_0.    (2.21)

Repeat for k = 1, 2, ...:

    \alpha_{k-1} = \frac{\tilde r_{k-1}^T r_{k-1}}{\tilde p_{k-1}^T A p_{k-1}}    (2.22)
    x_k = x_{k-1} + \alpha_{k-1} p_{k-1}    (2.23)
    r_k = r_{k-1} - \alpha_{k-1} A p_{k-1}    (2.24)
    \tilde r_k = \tilde r_{k-1} - \alpha_{k-1} A^T \tilde p_{k-1}    (2.25)
    \beta_k = \frac{\tilde r_k^T r_k}{\tilde r_{k-1}^T r_{k-1}}    (2.26)
    p_k = r_k + \beta_k p_{k-1}    (2.27)
    \tilde p_k = \tilde r_k + \beta_k \tilde p_{k-1}    (2.28)

For the derivation of further CG-like methods, we first consider the polynomial variant of the BICG algorithm, which is the polynomial variant of the CG algorithm, Algorithm 2.2, with a different bilinear form. The bilinear form used here is

    [\varphi, \psi] = \tilde r_0^T \varphi(A) \psi(A) r_0.

Clearly, in the general case of nonsymmetric indefinite A, this bilinear form is no longer positive semi-definite, but Theorem 2.2 still applies. The possibility of a breakdown due to a zero division in the computation of \alpha_i or \beta_i is, however, a drawback, and it is often encountered in practice (see the numerical experiments below).

Using the matrix definitions of the preceding section, BICG can also be written in terms of matrix equations. Defining the matrices

    R_k = \left[ \frac{r_0}{\sqrt{\tilde r_0^T r_0}}, \ldots, \frac{r_k}{\sqrt{\tilde r_k^T r_k}} \right],    (2.29)
    \tilde R_k = \left[ \frac{\tilde r_0}{\sqrt{\tilde r_0^T r_0}}, \ldots, \frac{\tilde r_k}{\sqrt{\tilde r_k^T r_k}} \right],    (2.30)
    P_k = \left[ \frac{p_0}{\sqrt{\tilde r_0^T r_0}}, \ldots, \frac{p_k}{\sqrt{\tilde r_k^T r_k}} \right],    (2.31)
    \tilde P_k = \left[ \frac{\tilde p_0}{\sqrt{\tilde r_0^T r_0}}, \ldots, \frac{\tilde p_k}{\sqrt{\tilde r_k^T r_k}} \right],    (2.32)

we obtain the following matrix equations:

    A P_m D_m^{-1} = R_m L_m    (2.33)
    A^T \tilde P_m D_m^{-1} = \tilde R_m L_m    (2.34)
    P_m L_m^T = R_m    (2.35)
    \tilde P_m L_m^T = \tilde R_m,    (2.36)

where D_m and L_m are defined as in the previous section, with \alpha_k and \beta_k now as defined for BICG.
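A minimal sketch of Algorithm 2.3 follows; the stopping test, tolerance, and iteration limit are assumptions added only for illustration, and the method may break down if a denominator vanishes.

    import numpy as np

    def bicg(A, b, x0, tol=1e-9, maxiter=1000):
        # Bi-conjugate gradient (Algorithm 2.3); needs products with A and A^T.
        x = x0.copy()
        r = b - A @ x
        rt = r.copy()                      # "pseudo" residual of the shadow system
        p, pt = r.copy(), r.copy()
        rho = rt @ r
        for _ in range(maxiter):
            Ap = A @ p
            alpha = rho / (pt @ Ap)        # (2.22)
            x = x + alpha * p              # (2.23)
            r = r - alpha * Ap             # (2.24)
            rt = rt - alpha * (A.T @ pt)   # (2.25)
            if np.linalg.norm(r) <= tol:
                break
            rho_new = rt @ r
            beta = rho_new / rho           # (2.26)
            p = r + beta * p               # (2.27)
            pt = rt + beta * pt            # (2.28)
            rho = rho_new
        return x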

Eliminating P_m and \tilde P_m and using that \tilde R_m^T R_m = I, one obtains

    \tilde R_m^T A R_m = L_m D_m L_m^T,    (2.37)
    R_m^T A^T \tilde R_m = L_m D_m L_m^T.    (2.38)

Note that here neither R_m nor \tilde R_m has orthogonal columns for general nonsymmetric A. The two equations above are completely equivalent, which shows that the amount of work in BICG is doubled and that many of the operations performed could possibly be avoided or put to better use. This was also realized by Sonneveld [2], who observed that while the residuals r_i converge toward zero, the \tilde r_i converge about equally fast toward zero, but the algorithm does not exploit this. He made use of this fact in the conjugate gradient squared algorithm, which is described in the next section.

2.3 The Conjugate Gradient Squared (CGS) Algorithm

Sonneveld [2] derived the CGS algorithm from BICG. Considering the bilinear form defined for the polynomial version of BICG, and using its associativity, one obtains

    [\varphi_i, \varphi_i] = [1, \varphi_i^2]    (2.39)
    [\psi_i, \vartheta \psi_i] = [1, \vartheta \psi_i^2]    (2.40)

Consequently, we can define an algorithm that uses the squared polynomials \varphi_i^2 and \psi_i^2 instead of \varphi_i and \psi_i, respectively. If we consider the \varphi_i^2 as the residual polynomials instead of the original \varphi_i, the convergence rate might be doubled, and we can also avoid computing \tilde r_i and \tilde p_i. To obtain \varphi_i^2 and \psi_i^2 we need to square the recursion formulae for \varphi_i and \psi_i in the CG algorithm. This leads to

    \varphi_{k+1}^2 = \varphi_k^2 - 2 \alpha_k \vartheta \varphi_k \psi_k + \alpha_k^2 \vartheta^2 \psi_k^2    (2.41)
    \psi_{k+1}^2 = \varphi_{k+1}^2 + 2 \beta_{k+1} \varphi_{k+1} \psi_k + \beta_{k+1}^2 \psi_k^2.    (2.42)

This introduces two further polynomials, \varphi_k \psi_k and \varphi_{k+1} \psi_k, which can be evaluated by

    \varphi_k \psi_k = \varphi_k^2 + \beta_k \varphi_k \psi_{k-1}    (2.43)
    \varphi_{k+1} \psi_k = \varphi_k \psi_k - \alpha_k \vartheta \psi_k^2.    (2.44)

Now defining the polynomials

    \Phi_k := \varphi_k^2    (2.45)
    \Psi_k := \psi_k^2    (2.46)
    \Theta_k := \varphi_k \psi_{k-1}    (2.47)
    \Sigma_k := \varphi_k \psi_k    (2.48)

and replacing them in the equations above, we obtain the following polynomial version of conjugate gradient squared.

Algorithm 2.4. CGS (polynomial version)

Initialization:

    \Phi_0 = \Psi_0 = \Sigma_0 = 1.    (2.49)

Repeat for k = 1, 2, ...:

    \alpha_{k-1} = \frac{[1, \Phi_{k-1}]}{[1, \vartheta \Psi_{k-1}]}    (2.50)
    \Theta_k = \Sigma_{k-1} - \alpha_{k-1} \vartheta \Psi_{k-1}    (2.51)
    \Phi_k = \Phi_{k-1} - \alpha_{k-1} \vartheta (\Sigma_{k-1} + \Theta_k)    (2.52)
    \beta_k = \frac{[1, \Phi_k]}{[1, \Phi_{k-1}]}    (2.53)
    \Sigma_k = \Phi_k + \beta_k \Theta_k    (2.54)
    \Psi_k = \Sigma_k + \beta_k (\Theta_k + \beta_k \Psi_{k-1})    (2.55)

This can be transformed into the vectorial variant of CGS by substituting

    r_k := \Phi_k(A) r_0    (2.56)
    p_k := \Psi_k(A) r_0    (2.57)
    t_k := \Theta_k(A) r_0    (2.58)
    s_k := \Sigma_k(A) r_0    (2.59)

and replacing the bilinear form by the corresponding vector product. One obtains

Algorithm 2.5. CGS

Initialization:

    r_0 = b - A x_0,    \tilde r_0 = r_0,    p_0 = r_0,    s_0 = r_0.    (2.60)

Repeat for k = 1, 2, ...:

    \alpha_{k-1} = \frac{\tilde r_0^T r_{k-1}}{\tilde r_0^T A p_{k-1}}    (2.61)
    t_k = s_{k-1} - \alpha_{k-1} A p_{k-1}    (2.62)
    x_k = x_{k-1} + \alpha_{k-1} (s_{k-1} + t_k)    (2.63)

    r_k = r_{k-1} - \alpha_{k-1} A (s_{k-1} + t_k)    (2.64)
    \beta_k = \frac{\tilde r_0^T r_k}{\tilde r_0^T r_{k-1}}    (2.65)
    s_k = r_k + \beta_k t_k    (2.66)
    p_k = s_k + \beta_k (t_k + \beta_k p_{k-1})    (2.67)

The advantages of this algorithm over BICG are that no evaluation of \tilde r_k and \tilde p_k is needed and that A^T does not occur in the algorithm; the computations are therefore easier to carry out.
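The vector recursions (2.60)-(2.67) translate directly into code. The sketch below assumes a dense matrix and an illustrative stopping test; only products with A, never with A^T, are required.

    import numpy as np

    def cgs(A, b, x0, tol=1e-9, maxiter=1000):
        # Conjugate gradient squared (Algorithm 2.5).
        x = x0.copy()
        r = b - A @ x
        rt = r.copy()                      # fixed shadow vector \tilde r_0
        p, s = r.copy(), r.copy()
        rho = rt @ r
        for _ in range(maxiter):
            Ap = A @ p
            alpha = rho / (rt @ Ap)        # (2.61)
            t = s - alpha * Ap             # (2.62)
            x = x + alpha * (s + t)        # (2.63)
            r = r - alpha * (A @ (s + t))  # (2.64)
            if np.linalg.norm(r) <= tol:
                break
            rho_new = rt @ r
            beta = rho_new / rho           # (2.65)
            s = r + beta * t               # (2.66)
            p = s + beta * (t + beta * p)  # (2.67)
            rho = rho_new
        return x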

We can present this algorithm also in the form of matrix equations, as was done for CG and BICG. For this we need to define matrices as in the previous sections. The matrices R_k and P_k have to be defined slightly differently: in order to "normalize" their columns r_k and p_k we divide by the full vector products and not by their square roots, because we now deal with the "squares" of the original residuals and direction vectors. Thus

    R_k = \left[ \frac{r_0}{\tilde r_0^T r_0}, \frac{r_1}{\tilde r_1^T r_1}, \ldots, \frac{r_k}{\tilde r_k^T r_k} \right],    (2.68)
    P_k = \left[ \frac{p_0}{\tilde r_0^T r_0}, \frac{p_1}{\tilde r_1^T r_1}, \ldots, \frac{p_k}{\tilde r_k^T r_k} \right],    (2.69)
    T_k = \left[ \frac{t_0}{\tilde r_0^T r_0}, \frac{t_1}{\tilde r_1^T r_1}, \ldots, \frac{t_k}{\tilde r_k^T r_k} \right],    (2.70)

and L_k changes in a similar way: L_k and the auxiliary matrix \tilde L are lower bidiagonal with unit diagonal, but with subdiagonal entries built from the \beta_i themselves rather than from \sqrt{\beta_i} ((2.71), (2.72)); D_k is again the diagonal matrix of the \alpha_i^{-1} (2.73). Using the above definitions, assuming the algorithm terminates properly after m iteration steps, which means r_{m+1} = 0, and replacing every occurrence of s_k by its definition, we obtain the following equations:

    T_m \tilde L^T = R_m - A P_m D_m^{-1}    (2.74)
    R_m L_m = A (2 T_m + A P_m D_m^{-1}) D_m^{-1}    (2.75)
    P_m L_m^T = R_m + 2 T_m    (2.76)

The second equation reflects the fact that the residual polynomial is squared and can also be written as

    R_m L_m = A^2 P_m D_m^{-2} + 2 A T_m D_m^{-1}.    (2.77)

2.4 The Conjugate Gradient Stabilized (BICGSTAB) Algorithm

The CGS algorithm still has many flaws. Van der Vorst observed that one big drawback of CGS is that the reduction operator \varphi_k(A) (where \varphi_k is the unsquared original residual polynomial) depends too strongly on the starting vector r_0. Hence, even though \varphi_k(A) might reduce r_0 effectively, it might not reduce \varphi_k(A) r_0. Consequently, in practice one can often observe a very irregular convergence behavior, that is, large local peaks in the convergence curve. This behavior might not affect the convergence speed, but it can greatly influence the final results and make them inaccurate or even wrong (see also the numerical experiments). In order to stabilize this behavior, which is mainly caused by squaring the residual polynomials, Van der Vorst suggested a new algorithm called BICGSTAB (CGS stabilized) [22], in which the squared residual polynomial of CGS is replaced by the product of the residual polynomial and a second polynomial that can be selected suitably.

This algorithm is based on the fact that in BICG the residual r_k is orthogonal to the k-th Krylov subspace of A^T, K_k(A^T). So, instead of choosing \varphi_{k-1}(A^T) \tilde r_0 as the "pseudo-residual" \tilde r_{k-1}, we can choose an arbitrary vector in K_k(A^T). Therefore, instead of considering the squares of the residual polynomials as done in CGS, we now consider the product of the residual polynomial and an arbitrary polynomial, \eta_k \varphi_k, as our new residual polynomial. The question is how to choose \eta_k. Selecting the Chebyshev polynomials, which would be the first to come to mind because of their optimality properties, turns out to be inconvenient, as it is complicated to obtain their parameters. Van der Vorst suggests selecting \eta_k as a polynomial of the form

    \eta_k(x) = (1 - \omega_1 x)(1 - \omega_2 x) \cdots (1 - \omega_k x),    (2.78)

where \omega_k is determined in the k-th iteration step of BICGSTAB by minimizing the residual norm. Defining

    \Phi_k := \eta_k \varphi_k    (2.79)
    \Psi_k := \eta_k \psi_k    (2.80)
    \Sigma_k := \eta_{k-1} \varphi_k    (2.81)

and using the recursions for \varphi_k and \psi_k in the classical conjugate gradient algorithm, one obtains the polynomial version of BICGSTAB:

Algorithm 2.6. BICGSTAB (polynomial version)

Initialization:

    \Phi_0 = \Psi_0 = 1.    (2.82)

Repeat for k = 1, 2, ...:

    \alpha_{k-1} = \frac{[1, \Phi_{k-1}]}{[1, \vartheta \Psi_{k-1}]}    (2.83)
    \Sigma_k = \Phi_{k-1} - \alpha_{k-1} \vartheta \Psi_{k-1}    (2.84)
    \omega_k = \frac{[\Sigma_k, \vartheta \Sigma_k]}{[\vartheta \Sigma_k, \vartheta \Sigma_k]}    (2.85)
    \Phi_k = \Sigma_k - \omega_k \vartheta \Sigma_k    (2.86)
    \beta_k = \frac{[1, \Phi_k]}{[1, \Phi_{k-1}]} \cdot \frac{\alpha_{k-1}}{\omega_k}    (2.87)
    \Psi_k = \Phi_k + \beta_k (\Psi_{k-1} - \omega_k \vartheta \Psi_{k-1})    (2.88)

Now replacing

    s_k := \Sigma_k(A) r_0    (2.89)
    r_k := \Phi_k(A) r_0    (2.90)
    p_k := \Psi_k(A) r_0    (2.91)

and substituting the corresponding vector products leads to the vector variant of BICGSTAB.

Algorithm 2.7. BICGSTAB

Initialization:

    r_0 = b - A x_0,    \tilde r_0 = r_0,    p_0 = r_0.    (2.92)

Repeat for k = 1, 2, ...:

    \alpha_{k-1} = \frac{\tilde r_0^T r_{k-1}}{\tilde r_0^T A p_{k-1}}    (2.93)
    s_k = r_{k-1} - \alpha_{k-1} A p_{k-1}    (2.94)
    \omega_k = \frac{s_k^T A s_k}{s_k^T A^T A s_k}    (2.95)
    x_k = x_{k-1} + \alpha_{k-1} p_{k-1} + \omega_k s_k    (2.96)
    r_k = s_k - \omega_k A s_k    (2.97)
    \beta_k = \frac{\tilde r_0^T r_k}{\tilde r_0^T r_{k-1}}    (2.98)
    p_k = r_k + \beta_k \frac{\alpha_{k-1}}{\omega_k} (p_{k-1} - \omega_k A p_{k-1})    (2.99)
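The following sketch implements Algorithm 2.7. The factor alpha_{k-1}/omega_k of (2.99) is folded into the variable beta, as is common in implementations, and the stopping test and iteration limit are illustrative assumptions.

    import numpy as np

    def bicgstab(A, b, x0, tol=1e-9, maxiter=1000):
        # BICGSTAB (Algorithm 2.7); omega_k minimizes the residual norm locally.
        x = x0.copy()
        r = b - A @ x
        rt = r.copy()
        p = r.copy()
        rho = rt @ r
        for _ in range(maxiter):
            Ap = A @ p
            alpha = rho / (rt @ Ap)                    # (2.93)
            s = r - alpha * Ap                         # (2.94)
            As = A @ s
            omega = (s @ As) / (As @ As)               # (2.95)
            x = x + alpha * p + omega * s              # (2.96)
            r = s - omega * As                         # (2.97)
            if np.linalg.norm(r) <= tol:
                break
            rho_new = rt @ r
            beta = (rho_new / rho) * (alpha / omega)   # (2.98) with the factor of (2.99)
            p = r + beta * (p - omega * Ap)            # (2.99)
            rho = rho_new
        return x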

As this algorithm is based on sustaining the orthogonality properties, it also theoretically terminates within at most n iterations.

In order to represent BICGSTAB in terms of matrix equations, we define R_k, P_k, D_k, and L_k as in the preceding section. Additionally, we define two further matrices ((2.100), (2.101)): M_k, a lower bidiagonal matrix with unit diagonal whose subdiagonal entries are built from the \beta_k, and C_k, a bidiagonal matrix whose entries are built from the \omega_k. Replacing s_k by its definition, BICGSTAB can be expressed in terms of the following matrix equations:

    R_m L_m = A P_m D_m^{-1} + A R_m C_m^{-1}    (2.102)
    P_m M_m^T = R_m - A P_m D_m^{-1}.    (2.103)

Inserting the second equation into the first one by replacing R_m in the second term on the right-hand side, we obtain

    R_m L_m = A^2 P_m D_m^{-1} C_m^{-1} + A P_m (D_m^{-1} + M_m^T C_m^{-1}).    (2.104)

Comparing this with the corresponding CGS equation, we see the effect of the choice of a different polynomial, as well as the formal similarity. For CGS, we have

    \hat R_m \hat L_m = A^2 \hat P_m \hat D_m^{-2} + A \hat P_m \hat L_m^T \hat D_m^{-1} - A \hat R_m \hat D_m^{-1}.    (2.105)

For BICGSTAB, this changes to

    R_m L_m = A^2 P_m D_m^{-1} C_m^{-1} + A P_m M_m^T C_m^{-1} + A P_m D_m^{-1}.    (2.106)

Note, however, that the matrices involved have different values (indicated here by the different notation), as their column vectors are generated recursively by the two algorithms, and therefore each column depends on the columns to its left.

2.5 Conjugate Gradient on the Normal Equations

We consider here two types of normal equations:

    A^T A x = A^T b    (2.107)

and

    A A^T y = b,    x = A^T y,    (2.108)

where we denote CG applied to A^T A x = A^T b by CGNR and CG applied to A A^T y = b by CGNE [6] [8]. The matrices A^T A and A A^T of the above systems are symmetric positive definite, and therefore the classical conjugate gradient method can be applied to them. Theoretically, no division by zero will occur, and the iteration terminates within at most n steps. Note also that A^T A and A A^T have the same spectrum and should therefore show the same convergence behavior. We give the exact algorithms CGNR and CGNE below.

Algorithm 2.8. CGNR

Initialization:

    r_0 = A^T (b - A x_0),    p_0 = r_0.    (2.109)

Repeat for k = 1, 2, ...:

    \alpha_{k-1} = \frac{r_{k-1}^T r_{k-1}}{p_{k-1}^T A^T A p_{k-1}}    (2.110)
    x_k = x_{k-1} + \alpha_{k-1} p_{k-1}    (2.111)
    r_k = r_{k-1} - \alpha_{k-1} A^T A p_{k-1}    (2.112)
    \beta_k = \frac{r_k^T r_k}{r_{k-1}^T r_{k-1}}    (2.113)
    p_k = r_k + \beta_k p_{k-1}    (2.114)

Algorithm 2.9. CGNE

Initialization:

    r_0 = b - A x_0,    \tilde p_0 = A^T r_0.    (2.115)

Repeat for k = 1, 2, ...:

    \alpha_{k-1} = \frac{r_{k-1}^T r_{k-1}}{\tilde p_{k-1}^T \tilde p_{k-1}}    (2.116)
    x_k = x_{k-1} + \alpha_{k-1} \tilde p_{k-1}    (2.117)

    r_k = r_{k-1} - \alpha_{k-1} A \tilde p_{k-1}    (2.118)
    \beta_k = \frac{r_k^T r_k}{r_{k-1}^T r_{k-1}}    (2.119)
    \tilde p_k = A^T r_k + \beta_k \tilde p_{k-1}    (2.120)

From a theoretical point of view, both of these algorithms are guaranteed to converge. However, we encounter the following problem: the condition numbers of the new matrices A^T A and A A^T increase significantly compared to that of the original matrix A; in fact, they equal the square of the condition number of A. This greatly influences the convergence behavior. As a consequence of the increase of the condition number, we would expect an increase in the number of iterations. This is supported by the experimental results we obtained, on which we comment in Chapter 4.
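Both normal-equation variants can be implemented without ever forming A^T A or A A^T explicitly. The sketch below shows CGNR (Algorithm 2.8), using the identity p^T A^T A p = ||Ap||^2; CGNE (Algorithm 2.9) is analogous with \tilde p = A^T r. The tolerance and iteration limit are illustrative assumptions.

    import numpy as np

    def cgnr(A, b, x0, tol=1e-9, maxiter=1000):
        # CG applied to A^T A x = A^T b (Algorithm 2.8); A^T A is never formed.
        x = x0.copy()
        r = A.T @ (b - A @ x)              # residual of the normal equations (2.109)
        p = r.copy()
        rho = r @ r
        for _ in range(maxiter):
            Ap = A @ p
            alpha = rho / (Ap @ Ap)        # (2.110): p^T A^T A p = ||Ap||^2
            x = x + alpha * p              # (2.111)
            r = r - alpha * (A.T @ Ap)     # (2.112)
            rho_new = r @ r
            if np.sqrt(rho_new) <= tol:
                break
            beta = rho_new / rho           # (2.113)
            p = r + beta * p               # (2.114)
            rho = rho_new
        return x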

2.6 The Generalized Minimal Residual Algorithm (GMRES)

GMRES is a fairly well known and often successful method in the context of nonsymmetric sparse linear systems [7]. Because we consider it in our experiments for purposes of comparison, we give a short description here; our focus, however, is on the preceding methods, and for more detailed information the reader is referred to [7]. GMRES is based on maintaining the orthogonality property instead of the three-term recurrence. Consequently, we need an orthonormal basis {v_1, v_2, ..., v_k} of the Krylov subspace K_k, which can be generated using Arnoldi's method. We then determine the z in K_k which minimizes ||b - A(x_0 + z)|| = ||r_0 - A z||. If we express z in terms of the orthonormal basis of K_k, that is, z = V_k y, we need to minimize || \, ||r_0|| v_1 - A V_k y \, ||, where V_k is the matrix with columns v_1, ..., v_k. The actual algorithm is given below.

Algorithm 2.10. GMRES(m)

Initialization:

    r_0 = b - A x_0,    v_1 = \frac{r_0}{\|r_0\|}.    (2.121)

Repeat for k = 1, 2, ..., m:

    h_{i,k} = (A v_k, v_i),    i = 1, ..., k    (2.122)
    \tilde v_{k+1} = A v_k - \sum_{i=1}^{k} h_{i,k} v_i    (2.123)
    h_{k+1,k} = \|\tilde v_{k+1}\|    (2.124)
    v_{k+1} = \frac{\tilde v_{k+1}}{h_{k+1,k}}    (2.125)

Form the approximate solution with the minimizing y_m:

    x_m = x_0 + V_m y_m.    (2.126)

If \|r_m\| = \|b - A x_m\| > \epsilon, restart:

    x_0 := x_m,    v_1 := \frac{r_m}{\|r_m\|}.    (2.127)
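A compact sketch of GMRES(m) follows, with the Arnoldi steps (2.122)-(2.125) done by modified Gram-Schmidt and the minimization behind (2.126) done by a small least-squares solve on the Hessenberg matrix; a production code would update that least-squares problem with Givens rotations instead. The restart length, tolerance, and the handling of a lucky breakdown are illustrative assumptions.

    import numpy as np

    def gmres_restarted(A, b, x0, m=10, tol=1e-9, max_restarts=100):
        # Restarted GMRES (Algorithm 2.10): Arnoldi basis of K_m, then x_m = x_0 + V_m y_m.
        x = x0.copy()
        n = b.shape[0]
        for _ in range(max_restarts):
            r = b - A @ x
            beta = np.linalg.norm(r)
            if beta <= tol:
                return x
            V = np.zeros((n, m + 1))
            H = np.zeros((m + 1, m))
            V[:, 0] = r / beta                         # (2.121)
            for k in range(m):
                w = A @ V[:, k]
                for i in range(k + 1):                 # (2.122)-(2.123), modified Gram-Schmidt
                    H[i, k] = w @ V[:, i]
                    w = w - H[i, k] * V[:, i]
                H[k + 1, k] = np.linalg.norm(w)        # (2.124)
                if H[k + 1, k] > 0.0:
                    V[:, k + 1] = w / H[k + 1, k]      # (2.125)
            e1 = np.zeros(m + 1)
            e1[0] = beta
            y, *_ = np.linalg.lstsq(H, e1, rcond=None) # minimize ||beta e_1 - H y||
            x = x + V[:, :m] @ y                       # (2.126); loop continues as in (2.127)
        return x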

CHAPTER 3

PRECONDITIONERS

The iterative methods considered in the previous chapter can be improved by preconditioning the system with a preconditioning matrix M. M ought to be an approximation of A that can easily be inverted. The idea is to achieve a better conditioned system by multiplying A with the inverted preconditioner. There are different types of preconditioning:

left preconditioning, where we solve the system

    M^{-1} A x = M^{-1} b;    (3.1)

right preconditioning, where the system

    A M^{-1} y = b,    y = M x    (3.2)

is considered; and split preconditioning, where M = M_1 M_2 and

    M_1^{-1} A M_2^{-1} y = M_1^{-1} b,    y = M_2 x.    (3.3)

In this chapter, several incomplete LU factorization preconditioners are described, with positional and with numerical dropping.
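In all three variants the preconditioner is applied through triangular solves with the incomplete factors rather than by forming M^{-1} explicitly. A minimal sketch of applying a left preconditioner M = LU to a residual, assuming dense factors L and U for simplicity, is:

    import numpy as np
    from scipy.linalg import solve_triangular

    def apply_left_preconditioner(L, U, r):
        # Compute z = M^{-1} r with M = LU: forward solve L y = r, back solve U z = y.
        y = solve_triangular(L, r, lower=True)
        z = solve_triangular(U, y, lower=False)
        return z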

3.1 Incomplete LU Factorization with Regard to Structure (ILU(k))

Incomplete LU factorization has been suggested by many researchers [4] [8] [9] [2]. The drawback of a complete LU factorization of a sparse matrix is that, in general, the sparsity of the original matrix is lost while the factorization is performed, which leads to a large storage requirement and possibly a large number of operations. It is therefore desirable to maintain the sparsity of large matrices. The original idea of ILU in its simplest form, ILU(0), is to preserve the structure of the original matrix A and to drop any fill-in elements that would be generated in L and U in positions where the elements of A vanish. In general, we can prescribe any structure and determine the incomplete LU factorization in such a way that both L and U contain only elements in positions (i, j) that are contained in a set Z that defines the structure. For example, if we want to maintain the structure of the original matrix A, we define

    Z_0 := { (i, j) | 1 <= i, j <= n, A_{i,j} != 0 }.

Using the set Z_0, we obtain the ILU(0) factorization. Let us now define recursively

    Z_k := { (i, j) | 1 <= i, j <= n, (i, j) in Z_{k-1}, or fill-in at (i, j) is generated by elements in Z_{k-1} }.

Note that Z_k does not contain fill-in generated by level-k elements. ILU(k) is then given by the following algorithm.

Algorithm 3.1. ILU(k)

    for i = 1, ..., n
      for j = 1, ..., n
        if (i, j) in Z_k then
          s_{i,j} = A_{i,j} - sum_{l=1}^{min(i,j)-1} L_{i,l} U_{l,j}
          if (i >= j) then L_{i,j} = s_{i,j}
          if (i <= j) then U_{i,j} = s_{i,j} / L_{i,i}

If ILU(k) turns out to be a poor approximation of the complete factorization, increasing the number of levels of the factorization often improves the convergence. The disadvantage of ILU(k), k > 0, is that it is hard to determine in advance how large the generated fill-in will be and to estimate accurately the amount of storage necessary. But in many cases this factorization leads to an effective preconditioner.
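A sketch of the level-0 case of Algorithm 3.1 is given below. It stores the multipliers in a unit lower triangular L, so the diagonal ends up in U; this differs from the convention of Algorithm 3.1 only by a diagonal scaling of the two factors. Dense arrays are used purely to keep the sketch short; a real code works on the sparse storage directly.

    import numpy as np

    def ilu0(A):
        # ILU(0): Gaussian elimination in which every update outside the
        # sparsity pattern of A (the set Z_0) is dropped.
        n = A.shape[0]
        LU = A.astype(float).copy()
        pattern = (A != 0)
        for i in range(1, n):
            for k in range(i):
                if pattern[i, k] and LU[k, k] != 0.0:
                    LU[i, k] = LU[i, k] / LU[k, k]          # multiplier, stored in the L part
                    for j in range(k + 1, n):
                        if pattern[i, j]:                   # update only kept positions
                            LU[i, j] -= LU[i, k] * LU[k, j]
        L = np.tril(LU, -1) + np.eye(n)
        U = np.triu(LU)
        return L, U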

3.2 Modified Incomplete LU Factorization (MILU(k))

In order to improve the ILU factorization, Gustafsson [8] [9] proposed a modification of the diagonal elements of L so that the row sums of the error matrix LU - A vanish. Let Z_k be defined as in the preceding section. Then MILU(k) can be formulated as follows.

Algorithm 3.2. MILU(k)

    for i = 1, ..., n
      L_{i,i} = 0
      for j = 1, ..., n
        s_{i,j} = A_{i,j} - sum_{l=1}^{min(i,j)-1} L_{i,l} U_{l,j}
        if (i, j) in Z_k then
          if (i > j) then L_{i,j} = s_{i,j}
          if (i = j) then L_{i,i} = L_{i,i} + s_{i,i}
          if (i < j) then \tilde U_{i,j} = s_{i,j}
        else
          L_{i,i} = L_{i,i} + s_{i,j}
      for j = i+1, ..., n
        U_{i,j} = \tilde U_{i,j} / L_{i,i}

This algorithm can also be used with k levels and is carried out in the same way as ILU(k). The factorization has been shown to be very successful for certain elliptic partial differential equations; in many cases the number of iterations decreases significantly. In the case of general sparse matrices, however, it often turns out to be worse than ILU or even fails completely.

3.3 Incomplete LU Factorization with Regard to Element Size (ILUT(k))

This factorization, which was suggested by Saad [5], uses a different criterion for dropping elements. Here only the largest n^L_i + k elements are kept in row i of L and the largest n^U_i + k elements in row i of U, where n^L_i is the number of nonzero elements in the lower part of the i-th row of A, including the diagonal element, and n^U_i is the number of nonzero elements in the strict upper part of the i-th row of A. The advantage of ILUT(k) over ILU(k) is that the amount of additional storage needed is predictable. The algorithm ILUT(k) is given below.

Algorithm 3.3. ILUT(k)

    for i = 1, ..., n
      for j = 1, ..., n
        s_{i,j} = A_{i,j} - sum_{l=1}^{min(i,j)-1} L_{i,l} U_{l,j}
        if (i >= j) then L_{i,j} = s_{i,j}
        if (i <= j) then U_{i,j} = s_{i,j} / L_{i,i}
      end
      determine Z^L_i = the n^L_i + k largest elements in row i of L
      determine Z^U_i = the n^U_i + k largest elements in row i of U
      for j = 1, ..., n
        if L_{i,j} not in Z^L_i then L_{i,j} = 0
        if U_{i,j} not in Z^U_i then U_{i,j} = 0
      end
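In practice a threshold-based incomplete factorization of this kind is available, for example, through SciPy's spilu (a SuperLU-based incomplete LU with a drop tolerance and a fill limit), which can serve as a stand-in for ILUT when experimenting with the solvers of Chapter 2. The test matrix, drop tolerance, and fill factor below are illustrative assumptions, not values from this report.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spilu, LinearOperator, bicgstab

    # Threshold-based incomplete LU in the spirit of ILUT, used as a left preconditioner.
    A = sp.random(500, 500, density=0.02, format="csc", random_state=0) + 10 * sp.eye(500, format="csc")
    b = np.ones(500)

    ilu = spilu(A, drop_tol=1e-4, fill_factor=10)      # keeps only sufficiently large entries
    M = LinearOperator(A.shape, matvec=ilu.solve)      # action of (LU)^{-1}
    x, info = bicgstab(A, b, M=M, maxiter=500)
    print(info)                                        # 0 means the iteration converged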

CHAPTER 4

NUMERICAL EXPERIMENTS

4.1 Implementation Details

The experiments were performed on an Alliant FX/8 at CSRD. The codes were written in standard ANSI Fortran 77 and taken from the CSRD library SPLIB [3] and from SPARSKIT [5]. This section summarizes the types of preconditioning and the stopping criterion used for the methods considered, as all of these influence the convergence.

4.1.1 Preconditioning

For most solvers, that is, for BICG, CGS, BICGSTAB, and GMRES, left preconditioning was used; that is, we considered the system

    U^{-1} L^{-1} A x = U^{-1} L^{-1} b,    (4.1)

where LU is the incomplete factorization generated by either ILU, MILU, or ILUT. A different approach was chosen for CG on the normal equations: for CGNE and CGNR, a split preconditioning was used. For CGNE, CG was applied to the system

    U^{-1} L^{-1} A A^T L^{-T} U^{-T} y = U^{-1} L^{-1} b,    x = A^T L^{-T} U^{-T} y.    (4.2)

Here A is preconditioned from the left by the incomplete factorization. For CGNR, we considered the system

    L^{-T} U^{-T} A^T A U^{-1} L^{-1} y = L^{-T} U^{-T} A^T b,    x = U^{-1} L^{-1} y.    (4.3)

Note that A is here multiplied from the right by (LU)^{-1}. If we want to multiply A from the left, as was done for all other preconditioned solvers, we obtain the system

    A^T L^{-T} U^{-T} U^{-1} L^{-1} A x = A^T L^{-T} U^{-T} U^{-1} L^{-1} b.    (4.4)

The spectrum of this system is the same as that of (4.2), so one would expect the same convergence behavior for CGNR with this "left" preconditioning of A as for CGNE with the preconditioning in (4.2), provided one chooses the same stopping criteria and disregards possible roundoff. Experiments showed this to be true. However, we will not consider this further here.

4.1.2 Stopping Criterion

The stopping criterion chosen for our experiments was

    \|r_i\|_2 <= 10^{-9},    (4.5)

where r_i is the residual vector of the i-th iteration step, as defined in the previously given algorithms. Note that, since the preconditioned methods are just the former algorithms applied to the preconditioned systems, this is the residual of the preconditioned system, not of the original system.
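In code the test (4.5) is simply applied to the recursively updated residual of the preconditioned system, for example:

    import numpy as np

    def converged(r_preconditioned, tol=1e-9):
        # Stopping test (4.5): 2-norm of the residual of the *preconditioned* system.
        return np.linalg.norm(r_preconditioned) <= tol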

4.2 Three-Dimensional Elliptic Problems

4.2.1 Description of the Problems

In this section, we use a set of test problems which can also be found in [2]. Consider the three-dimensional elliptic partial differential equation

    a u_{xx} + b u_{yy} + c u_{zz} + d u_x + e u_y + f u_z + g u = F    (4.6)

on the unit cube [0,1] x [0,1] x [0,1], where a, b, c, d, e, f are functions of (x, y, z). Using a seven-point centered finite difference operator, one obtains linear systems with a seven-diagonal sparse matrix that is in general nonsymmetric. For ellipticity, we need a, b, c > 0 on [0,1] x [0,1] x [0,1]. We consider the following nine test problems.
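The following sketch assembles the seven-diagonal matrix for a constant-coefficient model case of (4.6), a Laplacian plus a convection term in x as in Problem 1 below, using Kronecker products of one-dimensional difference operators; the grid size and the convection coefficient are illustrative assumptions.

    import scipy.sparse as sp

    def seven_point_operator(n, c=1000.0):
        # Centered 7-point discretization of  u_xx + u_yy + u_zz + c u_x  on the unit
        # cube with an n x n x n interior grid and homogeneous Dirichlet boundary values.
        h = 1.0 / (n + 1)
        I = sp.identity(n, format="csr")
        D2 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2   # second difference
        D1 = sp.diags([-1.0, 1.0], [-1, 1], shape=(n, n)) / (2.0 * h)      # first central difference
        A = (sp.kron(sp.kron(D2, I), I)        # u_xx
             + sp.kron(sp.kron(I, D2), I)      # u_yy
             + sp.kron(sp.kron(I, I), D2)      # u_zz
             + c * sp.kron(sp.kron(D1, I), I)) # c * u_x
        return A.tocsr()

    A = seven_point_operator(12)   # order 1728, seven nonzero diagonals, nonsymmetric for c != 0
    print(A.shape, A.nnz)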

Problem 1:

    u_{xx} + u_{yy} + u_{zz} + 1000 u_x = F    (4.7)

with solution u = xyz(1-x)(1-y)(1-z). The large coefficient of u_x causes a loss of diagonal dominance.

Problem 2:

    u_{xx} + u_{yy} + u_{zz} + 1000 u_y = F    (4.8)

with solution u = xyz(1-x)(1-y)(1-z). This is similar to Problem 1, the convection term now being in the y-direction.

Problem 3:

    u_{xx} + u_{yy} + u_{zz} + exp(xyz)(u_x + u_y - u_z) = F    (4.9)

with solution u = x + y + z. This problem turns out to be a difficult one for CG-like methods.

Problem 4:

    (exp(xyz) u_x)_x + (exp(-xyz) u_y)_y + (exp(-xyz) u_z)_z - 25(x+y+z) u_x - 25[(x+y+z)u]_x - u/(1+x+y+z) = F    (4.10)

with solution u = exp(xyz) sin(x) sin(y) sin(z). This problem has a positive definite symmetric part (1/2)(A + A^T) for any mesh size [6].

Problem 5:

    u_{xx} + u_{yy} + u_{zz} + x u_x - y u_y + z u_z + (x+y+z) u / (xyz) = F    (4.11)

with solution u = exp(xyz) sin(x) sin(y) sin(z). This problem [9] leads to an extremely ill-conditioned linear system with an indefinite symmetric part.

Problem 6:

    u_{xx} + u_{yy} + u_{zz} - 5 x^2 (u_x + u_y + u_z) = F    (4.12)

with solution u = exp(xyz) sin(x) sin(y) sin(z).

Problem 7:

    u_{xx} + u_{yy} + u_{zz} - (1 + x^2) u_x + (u_y + u_z) = F    (4.13)

with solution u = exp(xyz) sin(x) sin(y) sin(z).

Problem 8:

    u_{xx} + u_{yy} + u_{zz} - x^2 u_x + u = F    (4.14)

with solution u = exp(xyz) sin(x) sin(y) sin(z). This is another extremely ill-conditioned problem.

Problem 9:

    u_{xx} + u_{yy} + u_{zz} - [(1-2x) u_x + (1-2y) u_y + (1-2z) u_z] = F    (4.15)

with solution u = exp(xyz) sin(x) sin(y) sin(z).

The above problems were tested on a 12 x 12 x 12 grid. The spectra of Problems 2, 3, 5, and 7, as well as those of the matrices preconditioned by ILU, MILU, and ILUT, are shown in Appendix A. The spectra for the remaining problems can be found in [2].

4.2.2 Numerical Results

Before we consider the individual results, we present a statistical summary of all the results in Tables 4.1 and 4.2. Table 4.1 summarizes the number of successes for each method with and without preconditioning. A success is defined as a run that terminates within 1728 iterations and delivers an absolute actual error smaller than 10^{-3} in the maximum norm.

Table 4.1: Number of successes per method for elliptic problems. (The table lists, for each solver, BICG, CGS, BICGSTAB, CGNE, CGNR, and restarted GMRES, and for each preconditioning, none, ILU(0), ILU(1), MILU(0), MILU(1), ILUT(0), ILUT(1), the number of successful runs, together with row and column totals.)

The method with the most successes is BICG; the preconditioner with the most successes is ILUT. The combinations with the most successes are BICG with ILUT(0) and ILUT(1), CGS with ILUT, and CGNE with ILUT.

Considering these results, we see that none of the methods above is able to solve all nine problems. If one investigates the results more closely, one sees that CGNE with ILUT(0) and ILUT(1) comes close to solving all nine problems: it barely misses the success criterion for Problem 5, whereas BICG and CGS with ILUT preconditioning deliver incorrect results for Problem 5. So CGNE converges extremely slowly there, whereas all the other methods fail completely. The use of another preconditioner can lead to better results. This has been done for the row projection methods [2] that are equivalent to CGNE using a block Jacobi (SUM) or block SSOR preconditioner (RP-9). Faster convergence is achieved in almost all cases (see Table 4.2), which shows that, for the problems considered, CGNE possesses a robustness that the other solvers do not.

If we consider the convergence speed, that is, which solver/preconditioner combination is the overall fastest for the problems considered, CGS and BICGSTAB with ILUT turn out to be the overall fastest. The results are given in Table 4.2. Clearly, BICGSTAB is an efficient solver when it converges. Unfortunately, it is not as stable as its name seems to indicate.

Table 4.2: Overall fastest method, and CGNE with preconditioning. (For each of the nine problems the table gives iteration counts and achieved accuracies for CGS and BICGSTAB with their fastest preconditioner, for CGNE with its fastest preconditioner, and for the row projection methods SUM and RP-9.)

Considering the individual results (see Tables A.1 to A.6 in Appendix A), we can make further observations. Without preconditioning, CGNE is the most successful method. BICG and CGS with ILUT are the most stable combinations, but combining CGS with ILU or MILU makes it the most unstable solver for the problems considered. We also see that CGS, when it converges, is often more accurate than BICG, but BICG is less sensitive to "blowups"; both effects appear to be a consequence of the squared residual polynomial in CGS. Also, BICGSTAB breaks down far more often than CGS, whereas CGS tends to diverge instead. No "blowups" can be observed for GMRES, but the residual often stagnates, and the algorithm then neither finishes nor produces a reliable result. A different convergence behavior of CGNE and CGNR without preconditioning can be observed; the most significant difference occurs for Problem 8. This deviation in the convergence behavior of the two methods, which are equivalent, is caused by the use of a different stopping criterion: investigating the actual error shows that the errors are approximately identical, but the residuals differ.

The spectra of the matrices (see Figures A.1 to A.9) give further insight into the convergence behavior. In Table 4.3, we give the percentage of eigenvalues lying in the left half

of the complex plane.

Table 4.3: Percentage of eigenvalues in the left complex halfplane. (For each of the nine problems the table gives the percentage of eigenvalues with negative real part for the unpreconditioned matrix and for the matrices preconditioned with ILU, MILU, and ILUT.)

If all eigenvalues have a positive real part, the solvers in general succeed. Overall, the solvers fail for matrices with eigenvalues with negative real part. The only exception here is Problem 2 with MILU preconditioning, for which all solvers converge. Changing the start vector, however, can make them fail for this problem. The extremely low percentage of eigenvalues in the left part of the complex plane increases the chance of choosing a "good" start vector; convergence is, of course, not guaranteed in the general case.

None of the solvers was able to solve Problem 5. An investigation of the spectrum of the matrix of Problem 5 (see the corresponding figure in Appendix A) gives further insight: a large part of the spectrum lies in the left complex halfplane. Preconditioning improves the spectrum (see the corresponding spectra), but a fairly high percentage of eigenvalues remains in the left part of the complex plane.

4.3 Harwell/Boeing Collection (RUA)

4.3.1 Description of the Matrices

The Harwell/Boeing collection consists of a variety of test matrices [5]. We consider here only ten of them, taken from the subset RUA. We selected matrices with different characteristics: some are small (e.g., GRE 115), some are large (e.g., SAYLR4), some are very sparse (e.g., GRE 115), others have fairly many elements (e.g., MCFE), some are well conditioned (e.g., STEAM2), and others are extremely ill-conditioned (e.g., FS 760 3). The purpose of this study was to get an overview of the convergence behavior of the methods considered on the variety of cases these matrices provide. The matrices and some of their characteristics are given in Table 4.4.

Table 4.4: Data on some Harwell/Boeing matrices. (For the matrices FS 760 1, FS 760 3, GRE 115, HWATT 2, MCFE, ORSREG 1, PDE 95, PORES 2, SAYLR4, and STEAM2 the table lists the order, the number of nonzeros, the Frobenius norm, and the largest and smallest elements in absolute value.)

4.3.2 Numerical Results and Observations

We summarize here the results obtained with the selected matrices of the Harwell/Boeing collection. Table 4.5 contains the number of successes for each method. A success is here any run that terminates within n iterations (where n is the order of the system) and delivers an actual absolute error smaller than 10^{-3} in the maximum norm.

Table 4.5: Number of successes per method for the H/B collection. (As in Table 4.1, the table lists the number of successful runs for each solver and each preconditioning, together with row and column totals.)

The method with the most successes is, as for the elliptic problems, BICG; the preconditioner with the most successes, however, is one of the MILU variants rather than ILUT.

The combination with the most successes is BICG with MILU. ILU and ILUT are about equally successful, whereas the other MILU variant is clearly the worst preconditioner for this set.

Table 4.6 summarizes which of all the above combinations converges most rapidly. The numbers were obtained by giving a point to the combination with the lowest number of iterations for a particular problem; if several combinations needed the same number of iterations, which occurred fairly often, each one obtained a point.

Table 4.6: Overall fastest convergence. (For each solver and each preconditioning the table gives the number of problems for which that combination was the fastest.)

The results show that BICGSTAB with MILU is the overall fastest combination, followed by BICGSTAB with ILU. Overall, we see that BICG is the most reliable solver and BICGSTAB the fastest, which agrees with the results of Section 4.2.

Investigating the individual results (see Tables B.1 to B.6 in Appendix B), further observations can be made. For the smaller matrices, we also investigated the spectra.

For FS 760 1, almost all solver/preconditioner combinations are successful. Preconditioning with ILU leads to a spectrum that is close to that of the identity matrix (see Figure B.2), which explains the extremely good convergence in this case. One observes a lack of accuracy for BICG, BICGSTAB, and CGNE with ILUT preconditioning: the spectrum (see Figure B.3) shows eigenvalues close to zero, the smallest eigenvalue being of order 10^{-4}, so the preconditioned system is nearly singular. If no preconditioner is used, convergence deteriorates significantly, especially for CG on the normal equations, that is, for CGNE and CGNR. This is not surprising, as for the normal equations the condition number of the matrix is squared.

For FS 760 3, all methods fail. A look at the spectra of FS 760 1 and FS 760 3 explains the different convergence behavior. Figure B.1 gives the real parts of the eigenvalues of both matrices; the imaginary parts are zero for most eigenvalues, and there are only a few eigenvalues with imaginary parts smaller than 10^{-3}. Clearly, FS 760 3 is extremely ill-conditioned, whereas FS 760 1 is fairly well conditioned. None of the preconditioners applied to FS 760 3 could generate a well-conditioned system.

For GRE 115, ILU preconditioning fails for all solvers. Without preconditioning, only restarted GMRES fails, and the use of CGNR leads to low accuracy. Overall, MILU and ILUT preconditioning improve convergence here, whereas ILU preconditioning fails. Considering the spectra of GRE 115 and of the preconditioned matrices (see Figures B.4, B.5, and B.6), this behavior can be explained: ILU preconditioning generates one extreme outlying eigenvalue and leads to a singular system, because one eigenvalue is zero. All these spectra have a few eigenvalues in the left part of the complex plane.

For HWATT 2, a preconditioner is absolutely necessary, and MILU is clearly the wrong choice, as all solvers fail with it. CGNR and restarted GMRES fail overall. Many of the runs

stop early but deliver incorrect results. A thorough investigation of the spectrum gives further insight: HWATT 2 is a nearly singular matrix with real eigenvalues, most of which are slightly smaller than zero, the rest being approximately one.

A look at the spectrum of MCFE (see Figure B.7) shows that this matrix is extremely ill-conditioned. Preconditioning with MILU improves the spectrum considerably (see Figure B.8); consequently, all solvers perform well in this case. Overall, ILU, ILUT, and MILU preconditioning turn out to be successful for BICG, CGS, BICGSTAB, and GMRES.

ORSREG 1 can be solved by almost all solver/preconditioner combinations. Only CGNE and CGNR without preconditioning converge so slowly that they do not finish within n iterations. Here, level-1 preconditioning is always significantly better than level-0 preconditioning.

PDE 95 is a positive definite matrix (see Figure B.9) and can therefore be solved well in most cases. For CGNE and CGNR, convergence slows down significantly due to the squared condition number of the normal equations. ILU and MILU preconditioning preserve positive definiteness and reduce the spread between the largest and smallest eigenvalues (see Figures B.10 and B.11). ILUT, however, destroys positive definiteness, but it generates a spectrum for which good convergence can be expected: all eigenvalues lie in the right half of the complex plane, close to the point (1, 0) (see Figure B.12).

For PORES 2, preconditioning with ILU leads to success for all solvers except CGNR and CGNE. For BICG and CGS, ILUT is also successful, but convergence is significantly slower.

All preconditioners clearly improve the conditioning of SAYLR4 and lead to good convergence for all solvers. If no preconditioner is used, convergence is extremely slow for BICG, CGS, and BICGSTAB, and CGNE, CGNR, and restarted GMRES even fail.

STEAM2 is a negative definite matrix (see Figure B.13). Its spectrum shows several big gaps, which can cause slow convergence, as can be observed for most solvers. Preconditioning with ILU leads to a spectrum with eigenvalues close to one (see Figure B.14) and to convergence within one iteration for BICG, CGS, BICGSTAB, and CGNE.


More information

SOR as a Preconditioner. A Dissertation. Presented to. University of Virginia. In Partial Fulllment. of the Requirements for the Degree

SOR as a Preconditioner. A Dissertation. Presented to. University of Virginia. In Partial Fulllment. of the Requirements for the Degree SOR as a Preconditioner A Dissertation Presented to The Faculty of the School of Engineering and Applied Science University of Virginia In Partial Fulllment of the Reuirements for the Degree Doctor of

More information

Course Notes: Week 1

Course Notes: Week 1 Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues

More information

-.- Bi-CG... GMRES(25) --- Bi-CGSTAB BiCGstab(2)

-.- Bi-CG... GMRES(25) --- Bi-CGSTAB BiCGstab(2) .... Advection dominated problem -.- Bi-CG... GMRES(25) --- Bi-CGSTAB BiCGstab(2) * Universiteit Utrecht -2 log1 of residual norm -4-6 -8 Department of Mathematics - GMRES(25) 2 4 6 8 1 Hybrid Bi-Conjugate

More information

The rate of convergence of the GMRES method

The rate of convergence of the GMRES method The rate of convergence of the GMRES method Report 90-77 C. Vuik Technische Universiteit Delft Delft University of Technology Faculteit der Technische Wiskunde en Informatica Faculty of Technical Mathematics

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 16-02 The Induced Dimension Reduction method applied to convection-diffusion-reaction problems R. Astudillo and M. B. van Gijzen ISSN 1389-6520 Reports of the Delft

More information

Abstract. It is well-known that Bi-CG can be adapted so that hybrid methods with computational complexity

Abstract. It is well-known that Bi-CG can be adapted so that hybrid methods with computational complexity Example - - -6 Universiteit Utrecht * -8 - - Department of Mathematics - -6 5 6 7 8 9 Maintaining convergence properties of BiCGstab methods in nite precision arithmetic by Gerard L.G. Sleijpen and Hen

More information

Further experiences with GMRESR

Further experiences with GMRESR Further experiences with GMRESR Report 92-2 C. Vui Technische Universiteit Delft Delft University of Technology Faculteit der Technische Wisunde en Informatica Faculty of Technical Mathematics and Informatics

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Classical Iterations We have a problem, We assume that the matrix comes from a discretization of a PDE. The best and most popular model problem is, The matrix will be as large

More information

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294)

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294) Conjugate gradient method Descent method Hestenes, Stiefel 1952 For A N N SPD In exact arithmetic, solves in N steps In real arithmetic No guaranteed stopping Often converges in many fewer than N steps

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems Topics The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems What about non-spd systems? Methods requiring small history Methods requiring large history Summary of solvers 1 / 52 Conjugate

More information

A MULTIGRID ALGORITHM FOR. Richard E. Ewing and Jian Shen. Institute for Scientic Computation. Texas A&M University. College Station, Texas SUMMARY

A MULTIGRID ALGORITHM FOR. Richard E. Ewing and Jian Shen. Institute for Scientic Computation. Texas A&M University. College Station, Texas SUMMARY A MULTIGRID ALGORITHM FOR THE CELL-CENTERED FINITE DIFFERENCE SCHEME Richard E. Ewing and Jian Shen Institute for Scientic Computation Texas A&M University College Station, Texas SUMMARY In this article,

More information

Algorithms that use the Arnoldi Basis

Algorithms that use the Arnoldi Basis AMSC 600 /CMSC 760 Advanced Linear Numerical Analysis Fall 2007 Arnoldi Methods Dianne P. O Leary c 2006, 2007 Algorithms that use the Arnoldi Basis Reference: Chapter 6 of Saad The Arnoldi Basis How to

More information

1 Conjugate gradients

1 Conjugate gradients Notes for 2016-11-18 1 Conjugate gradients We now turn to the method of conjugate gradients (CG), perhaps the best known of the Krylov subspace solvers. The CG iteration can be characterized as the iteration

More information

Reduced Synchronization Overhead on. December 3, Abstract. The standard formulation of the conjugate gradient algorithm involves

Reduced Synchronization Overhead on. December 3, Abstract. The standard formulation of the conjugate gradient algorithm involves Lapack Working Note 56 Conjugate Gradient Algorithms with Reduced Synchronization Overhead on Distributed Memory Multiprocessors E. F. D'Azevedo y, V.L. Eijkhout z, C. H. Romine y December 3, 1999 Abstract

More information

Introduction. Chapter One

Introduction. Chapter One Chapter One Introduction The aim of this book is to describe and explain the beautiful mathematical relationships between matrices, moments, orthogonal polynomials, quadrature rules and the Lanczos and

More information

Iterative Linear Solvers

Iterative Linear Solvers Chapter 10 Iterative Linear Solvers In the previous two chapters, we developed strategies for solving a new class of problems involving minimizing a function f ( x) with or without constraints on x. In

More information

Incomplete Block LU Preconditioners on Slightly Overlapping. E. de Sturler. Delft University of Technology. Abstract

Incomplete Block LU Preconditioners on Slightly Overlapping. E. de Sturler. Delft University of Technology. Abstract Incomplete Block LU Preconditioners on Slightly Overlapping Subdomains for a Massively Parallel Computer E. de Sturler Faculty of Technical Mathematics and Informatics Delft University of Technology Mekelweg

More information

Notes on Some Methods for Solving Linear Systems

Notes on Some Methods for Solving Linear Systems Notes on Some Methods for Solving Linear Systems Dianne P. O Leary, 1983 and 1999 and 2007 September 25, 2007 When the matrix A is symmetric and positive definite, we have a whole new class of algorithms

More information

Krylov Subspace Methods that Are Based on the Minimization of the Residual

Krylov Subspace Methods that Are Based on the Minimization of the Residual Chapter 5 Krylov Subspace Methods that Are Based on the Minimization of the Residual Remark 51 Goal he goal of these methods consists in determining x k x 0 +K k r 0,A such that the corresponding Euclidean

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication.

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication. CME342 Parallel Methods in Numerical Analysis Matrix Computation: Iterative Methods II Outline: CG & its parallelization. Sparse Matrix-vector Multiplication. 1 Basic iterative methods: Ax = b r = b Ax

More information

KRYLOV SUBSPACE ITERATION

KRYLOV SUBSPACE ITERATION KRYLOV SUBSPACE ITERATION Presented by: Nab Raj Roshyara Master and Ph.D. Student Supervisors: Prof. Dr. Peter Benner, Department of Mathematics, TU Chemnitz and Dipl.-Geophys. Thomas Günther 1. Februar

More information

IDR(s) Master s thesis Goushani Kisoensingh. Supervisor: Gerard L.G. Sleijpen Department of Mathematics Universiteit Utrecht

IDR(s) Master s thesis Goushani Kisoensingh. Supervisor: Gerard L.G. Sleijpen Department of Mathematics Universiteit Utrecht IDR(s) Master s thesis Goushani Kisoensingh Supervisor: Gerard L.G. Sleijpen Department of Mathematics Universiteit Utrecht Contents 1 Introduction 2 2 The background of Bi-CGSTAB 3 3 IDR(s) 4 3.1 IDR.............................................

More information

The Lanczos and conjugate gradient algorithms

The Lanczos and conjugate gradient algorithms The Lanczos and conjugate gradient algorithms Gérard MEURANT October, 2008 1 The Lanczos algorithm 2 The Lanczos algorithm in finite precision 3 The nonsymmetric Lanczos algorithm 4 The Golub Kahan bidiagonalization

More information

On the influence of eigenvalues on Bi-CG residual norms

On the influence of eigenvalues on Bi-CG residual norms On the influence of eigenvalues on Bi-CG residual norms Jurjen Duintjer Tebbens Institute of Computer Science Academy of Sciences of the Czech Republic duintjertebbens@cs.cas.cz Gérard Meurant 30, rue

More information

Residual iterative schemes for largescale linear systems

Residual iterative schemes for largescale linear systems Universidad Central de Venezuela Facultad de Ciencias Escuela de Computación Lecturas en Ciencias de la Computación ISSN 1316-6239 Residual iterative schemes for largescale linear systems William La Cruz

More information

In order to solve the linear system KL M N when K is nonsymmetric, we can solve the equivalent system

In order to solve the linear system KL M N when K is nonsymmetric, we can solve the equivalent system !"#$% "&!#' (%)!#" *# %)%(! #! %)!#" +, %"!"#$ %*&%! $#&*! *# %)%! -. -/ 0 -. 12 "**3! * $!#%+,!2!#% 44" #% &#33 # 4"!#" "%! "5"#!!#6 -. - #% " 7% "3#!#3! - + 87&2! * $!#% 44" ) 3( $! # % %#!!#%+ 9332!

More information

1 Extrapolation: A Hint of Things to Come

1 Extrapolation: A Hint of Things to Come Notes for 2017-03-24 1 Extrapolation: A Hint of Things to Come Stationary iterations are simple. Methods like Jacobi or Gauss-Seidel are easy to program, and it s (relatively) easy to analyze their convergence.

More information

Preconditioning Techniques for Large Linear Systems Part III: General-Purpose Algebraic Preconditioners

Preconditioning Techniques for Large Linear Systems Part III: General-Purpose Algebraic Preconditioners Preconditioning Techniques for Large Linear Systems Part III: General-Purpose Algebraic Preconditioners Michele Benzi Department of Mathematics and Computer Science Emory University Atlanta, Georgia, USA

More information

Krylov Space Solvers

Krylov Space Solvers Seminar for Applied Mathematics ETH Zurich International Symposium on Frontiers of Computational Science Nagoya, 12/13 Dec. 2005 Sparse Matrices Large sparse linear systems of equations or large sparse

More information

Laboratoire d'informatique Fondamentale de Lille

Laboratoire d'informatique Fondamentale de Lille Laboratoire d'informatique Fondamentale de Lille Publication AS-181 Modied Krylov acceleration for parallel environments C. Le Calvez & Y. Saad February 1998 c LIFL USTL UNIVERSITE DES SCIENCES ET TECHNOLOGIES

More information

Chapter 7. Iterative methods for large sparse linear systems. 7.1 Sparse matrix algebra. Large sparse matrices

Chapter 7. Iterative methods for large sparse linear systems. 7.1 Sparse matrix algebra. Large sparse matrices Chapter 7 Iterative methods for large sparse linear systems In this chapter we revisit the problem of solving linear systems of equations, but now in the context of large sparse systems. The price to pay

More information

Lecture 18 Classical Iterative Methods

Lecture 18 Classical Iterative Methods Lecture 18 Classical Iterative Methods MIT 18.335J / 6.337J Introduction to Numerical Methods Per-Olof Persson November 14, 2006 1 Iterative Methods for Linear Systems Direct methods for solving Ax = b,

More information

The purpose of the second Lanczos paper on the subject was \to adopt the general principles of the previous investigation to the specic demands that a

The purpose of the second Lanczos paper on the subject was \to adopt the general principles of the previous investigation to the specic demands that a THE UNSYMMETRIC LANCZOS ALGORITHMS AND THEIR RELATIONS TO PADE APPROXIMATION, CONTINUED FRACTIONS, AND THE QD ALGORITHM MARTIN H. GUTKNECHT Abstract. First, several algorithms based on the unsymmetric

More information

Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices

Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices Eun-Joo Lee Department of Computer Science, East Stroudsburg University of Pennsylvania, 327 Science and Technology Center,

More information

Gradient Method Based on Roots of A

Gradient Method Based on Roots of A Journal of Scientific Computing, Vol. 15, No. 4, 2000 Solving Ax Using a Modified Conjugate Gradient Method Based on Roots of A Paul F. Fischer 1 and Sigal Gottlieb 2 Received January 23, 2001; accepted

More information

On the Preconditioning of the Block Tridiagonal Linear System of Equations

On the Preconditioning of the Block Tridiagonal Linear System of Equations On the Preconditioning of the Block Tridiagonal Linear System of Equations Davod Khojasteh Salkuyeh Department of Mathematics, University of Mohaghegh Ardabili, PO Box 179, Ardabil, Iran E-mail: khojaste@umaacir

More information

Lecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm

Lecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm CS 622 Data-Sparse Matrix Computations September 19, 217 Lecture 9: Krylov Subspace Methods Lecturer: Anil Damle Scribes: David Eriksson, Marc Aurele Gilles, Ariah Klages-Mundt, Sophia Novitzky 1 Introduction

More information

4.8 Arnoldi Iteration, Krylov Subspaces and GMRES

4.8 Arnoldi Iteration, Krylov Subspaces and GMRES 48 Arnoldi Iteration, Krylov Subspaces and GMRES We start with the problem of using a similarity transformation to convert an n n matrix A to upper Hessenberg form H, ie, A = QHQ, (30) with an appropriate

More information

Performance Evaluation of GPBiCGSafe Method without Reverse-Ordered Recurrence for Realistic Problems

Performance Evaluation of GPBiCGSafe Method without Reverse-Ordered Recurrence for Realistic Problems Performance Evaluation of GPBiCGSafe Method without Reverse-Ordered Recurrence for Realistic Problems Seiji Fujino, Takashi Sekimoto Abstract GPBiCG method is an attractive iterative method for the solution

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Peter Deuhard. for Symmetric Indenite Linear Systems

Peter Deuhard. for Symmetric Indenite Linear Systems Peter Deuhard A Study of Lanczos{Type Iterations for Symmetric Indenite Linear Systems Preprint SC 93{6 (March 993) Contents 0. Introduction. Basic Recursive Structure 2. Algorithm Design Principles 7

More information

Comparison of Fixed Point Methods and Krylov Subspace Methods Solving Convection-Diffusion Equations

Comparison of Fixed Point Methods and Krylov Subspace Methods Solving Convection-Diffusion Equations American Journal of Computational Mathematics, 5, 5, 3-6 Published Online June 5 in SciRes. http://www.scirp.org/journal/ajcm http://dx.doi.org/.436/ajcm.5.5 Comparison of Fixed Point Methods and Krylov

More information

1e N

1e N Spectral schemes on triangular elements by Wilhelm Heinrichs and Birgit I. Loch Abstract The Poisson problem with homogeneous Dirichlet boundary conditions is considered on a triangle. The mapping between

More information

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES ITERATIVE METHODS BASED ON KRYLOV SUBSPACES LONG CHEN We shall present iterative methods for solving linear algebraic equation Au = b based on Krylov subspaces We derive conjugate gradient (CG) method

More information

SOLVING HERMITIAN POSITIVE DEFINITE SYSTEMS USING INDEFINITE INCOMPLETE FACTORIZATIONS

SOLVING HERMITIAN POSITIVE DEFINITE SYSTEMS USING INDEFINITE INCOMPLETE FACTORIZATIONS SOLVING HERMITIAN POSITIVE DEFINITE SYSTEMS USING INDEFINITE INCOMPLETE FACTORIZATIONS HAIM AVRON, ANSHUL GUPTA, AND SIVAN TOLEDO Abstract. Incomplete LDL factorizations sometimes produce an indenite preconditioner

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 06-05 Solution of the incompressible Navier Stokes equations with preconditioned Krylov subspace methods M. ur Rehman, C. Vuik G. Segal ISSN 1389-6520 Reports of the

More information

Largest Bratu solution, lambda=4

Largest Bratu solution, lambda=4 Largest Bratu solution, lambda=4 * Universiteit Utrecht 5 4 3 2 Department of Mathematics 1 0 30 25 20 15 10 5 5 10 15 20 25 30 Accelerated Inexact Newton Schemes for Large Systems of Nonlinear Equations

More information

The parallel computation of the smallest eigenpair of an. acoustic problem with damping. Martin B. van Gijzen and Femke A. Raeven.

The parallel computation of the smallest eigenpair of an. acoustic problem with damping. Martin B. van Gijzen and Femke A. Raeven. The parallel computation of the smallest eigenpair of an acoustic problem with damping. Martin B. van Gijzen and Femke A. Raeven Abstract Acoustic problems with damping may give rise to large quadratic

More information

Key words. linear equations, polynomial preconditioning, nonsymmetric Lanczos, BiCGStab, IDR

Key words. linear equations, polynomial preconditioning, nonsymmetric Lanczos, BiCGStab, IDR POLYNOMIAL PRECONDITIONED BICGSTAB AND IDR JENNIFER A. LOE AND RONALD B. MORGAN Abstract. Polynomial preconditioning is applied to the nonsymmetric Lanczos methods BiCGStab and IDR for solving large nonsymmetric

More information

Lecture 11: CMSC 878R/AMSC698R. Iterative Methods An introduction. Outline. Inverse, LU decomposition, Cholesky, SVD, etc.

Lecture 11: CMSC 878R/AMSC698R. Iterative Methods An introduction. Outline. Inverse, LU decomposition, Cholesky, SVD, etc. Lecture 11: CMSC 878R/AMSC698R Iterative Methods An introduction Outline Direct Solution of Linear Systems Inverse, LU decomposition, Cholesky, SVD, etc. Iterative methods for linear systems Why? Matrix

More information

Conjugate Gradient (CG) Method

Conjugate Gradient (CG) Method Conjugate Gradient (CG) Method by K. Ozawa 1 Introduction In the series of this lecture, I will introduce the conjugate gradient method, which solves efficiently large scale sparse linear simultaneous

More information

Preconditioned GMRES Revisited

Preconditioned GMRES Revisited Preconditioned GMRES Revisited Roland Herzog Kirk Soodhalter UBC (visiting) RICAM Linz Preconditioning Conference 2017 Vancouver August 01, 2017 Preconditioned GMRES Revisited Vancouver 1 / 32 Table of

More information

A DISSERTATION. Extensions of the Conjugate Residual Method. by Tomohiro Sogabe. Presented to

A DISSERTATION. Extensions of the Conjugate Residual Method. by Tomohiro Sogabe. Presented to A DISSERTATION Extensions of the Conjugate Residual Method ( ) by Tomohiro Sogabe Presented to Department of Applied Physics, The University of Tokyo Contents 1 Introduction 1 2 Krylov subspace methods

More information

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) Contents 1 Vector Spaces 1 1.1 The Formal Denition of a Vector Space.................................. 1 1.2 Subspaces...................................................

More information

Eigenvalue problems and optimization

Eigenvalue problems and optimization Notes for 2016-04-27 Seeking structure For the past three weeks, we have discussed rather general-purpose optimization methods for nonlinear equation solving and optimization. In practice, of course, we

More information

4.6 Iterative Solvers for Linear Systems

4.6 Iterative Solvers for Linear Systems 4.6 Iterative Solvers for Linear Systems Why use iterative methods? Virtually all direct methods for solving Ax = b require O(n 3 ) floating point operations. In practical applications the matrix A often

More information

Krylov Space Methods. Nonstationary sounds good. Radu Trîmbiţaş ( Babeş-Bolyai University) Krylov Space Methods 1 / 17

Krylov Space Methods. Nonstationary sounds good. Radu Trîmbiţaş ( Babeş-Bolyai University) Krylov Space Methods 1 / 17 Krylov Space Methods Nonstationary sounds good Radu Trîmbiţaş Babeş-Bolyai University Radu Trîmbiţaş ( Babeş-Bolyai University) Krylov Space Methods 1 / 17 Introduction These methods are used both to solve

More information

Notes on PCG for Sparse Linear Systems

Notes on PCG for Sparse Linear Systems Notes on PCG for Sparse Linear Systems Luca Bergamaschi Department of Civil Environmental and Architectural Engineering University of Padova e-mail luca.bergamaschi@unipd.it webpage www.dmsa.unipd.it/

More information

A Method for Constructing Diagonally Dominant Preconditioners based on Jacobi Rotations

A Method for Constructing Diagonally Dominant Preconditioners based on Jacobi Rotations A Method for Constructing Diagonally Dominant Preconditioners based on Jacobi Rotations Jin Yun Yuan Plamen Y. Yalamov Abstract A method is presented to make a given matrix strictly diagonally dominant

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 9 Minimizing Residual CG

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf-Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2018/19 Part 4: Iterative Methods PD

More information

Splitting Iteration Methods for Positive Definite Linear Systems

Splitting Iteration Methods for Positive Definite Linear Systems Splitting Iteration Methods for Positive Definite Linear Systems Zhong-Zhi Bai a State Key Lab. of Sci./Engrg. Computing Inst. of Comput. Math. & Sci./Engrg. Computing Academy of Mathematics and System

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 3: Iterative Methods PD

More information

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1 Parallel Numerics, WT 2016/2017 5 Iterative Methods for Sparse Linear Systems of Equations page 1 of 1 Contents 1 Introduction 1.1 Computer Science Aspects 1.2 Numerical Problems 1.3 Graphs 1.4 Loop Manipulations

More information

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors u satisfying

The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors u satisfying I.2 Quadratic Eigenvalue Problems 1 Introduction The quadratic eigenvalue problem QEP is to find scalars λ and nonzero vectors u satisfying where Qλx = 0, 1.1 Qλ = λ 2 M + λd + K, M, D and K are given

More information

Lecture 8: Fast Linear Solvers (Part 7)

Lecture 8: Fast Linear Solvers (Part 7) Lecture 8: Fast Linear Solvers (Part 7) 1 Modified Gram-Schmidt Process with Reorthogonalization Test Reorthogonalization If Av k 2 + δ v k+1 2 = Av k 2 to working precision. δ = 10 3 2 Householder Arnoldi

More information

Numerical Methods - Numerical Linear Algebra

Numerical Methods - Numerical Linear Algebra Numerical Methods - Numerical Linear Algebra Y. K. Goh Universiti Tunku Abdul Rahman 2013 Y. K. Goh (UTAR) Numerical Methods - Numerical Linear Algebra I 2013 1 / 62 Outline 1 Motivation 2 Solving Linear

More information

An advanced ILU preconditioner for the incompressible Navier-Stokes equations

An advanced ILU preconditioner for the incompressible Navier-Stokes equations An advanced ILU preconditioner for the incompressible Navier-Stokes equations M. ur Rehman C. Vuik A. Segal Delft Institute of Applied Mathematics, TU delft The Netherlands Computational Methods with Applications,

More information