Wavelet based preconditioners for sparse linear systems

Applied Mathematics and Computation 171 (2005) 203-224

B.V. Rathish Kumar *, Mani Mehra
Department of Mathematics, Indian Institute of Technology, Kanpur 208 016, India

Abstract

We introduce a class of efficient preconditioners, based on the Daubechies family of wavelets, for the sparse unsymmetric linear systems that arise in the numerical solution of partial differential equations (PDEs) in a wide variety of scientific and engineering disciplines. Complete and incomplete discrete wavelet transforms, in conjunction with row and column permutations, are used to construct these preconditioners. With these wavelet transforms, the transformed matrix is permuted to band form. The efficiency of the preconditioners with several Krylov subspace methods is illustrated on matrices from the Harwell Boeing collection and the Tim Davis collection. Matrices arising in the solution of the regularized Burgers equation and of free convection in a porous enclosure are also tested. Our results indicate that the preconditioner based on the incomplete discrete Haar wavelet transform is both cheaper to construct and gives good convergence.

© 2005 Elsevier Inc. All rights reserved.

Keywords: Preconditioning; Sparse matrices; Wavelet transform; Krylov subspace solvers

* Corresponding author. E-mail addresses: bvrk@iitk.ac.in (B.V. Rathish Kumar), manimeh@iitk.ac.in (M. Mehra).
doi:10.1016/j.amc.2005.01.060

B.V. Rathish Kumar, M. Mehra / Appl. Math. Comput. 171 (2005) 203-224

1. Introduction

Consider solving the large and sparse linear system

\[ Ax = b \tag{1.1} \]

that arises when numerical methods such as finite differences or finite elements are used to solve partial differential equations (PDEs). Krylov subspace iterative methods such as Conjugate Gradient (CG), Biconjugate Gradient (BICG), Conjugate Gradient Squared (CGS), Bi-Conjugate Gradient Stabilized (Bi-CGSTAB) and the restarted Generalized Minimal Residual method (GMRES(k)) are widely employed in solving (1.1). It is well known that the rate of convergence of such methods is strongly influenced by the spectral properties of $A$. It is therefore natural to try to transform the original system into one having the same solution but more favorable spectral properties. A preconditioner is a matrix that can be used to accomplish such a transformation. If $M$ is a non-singular matrix which approximates $A$ ($M \approx A$), the transformed linear system

\[ M^{-1}Ax = M^{-1}b \tag{1.2} \]

has the same solution as system (1.1), but the convergence rate of iterative methods applied to (1.2) may be much higher. It is further desirable that $M^{-1}v$ can be computed cheaply for any vector $v$. A large number of preconditioning strategies have been developed; a brief survey is provided in [1].

Recently, there has been increased interest in using wavelet based preconditioners with Krylov subspace methods (KSMs) for the linear system (1.1). Chan et al. [2] used the Discrete Wavelet Transform (DWT; the complete Discrete Wavelet Transform (cdwt) is what is usually referred to as the DWT) to improve the performance of the sparse approximate inverse preconditioners proposed by Grote and Huckle [3]. They dealt with matrices with smooth inverses resulting from the 1-D Laplacian operator and from linear PDEs with variable coefficients. Chen [4] used the DWT with permutations to construct preconditioners for dense systems arising from boundary element analysis (BEA) of linear PDEs.
Ford et al. [5,6] considered dense matrices with local non-smoothness and showed that wavelet compression can be used to design preconditioners for such dense systems after isolating the local non-smoothness. All these studies largely focus on algorithms for constructing DWT based preconditioners for dense symmetric/unsymmetric linear systems arising in BEA of linear PDEs, solved by the GMRES method.

The DWT transforms (1.1) to

\[ W A W^{T} \tilde{x} = W b, \qquad \tilde{x} = W x. \tag{1.3} \]

Since the wavelet transform $W$ is orthogonal, the eigenvalues of $\tilde{A} = WAW^{T}$ are the same as those of $A$, and the 2-norm condition number of $\tilde{A}$ is likewise unaffected by the DWT. The Incomplete Discrete Wavelet Transform (idwt) $W_1$, which is

approximately orthogonal, is cheaper and easier to construct. $W_1$ may serve as an alternative to $W$ for projecting $A$ into wavelet space. The standard DWT leads to a preconditioner with non-zero entries dispersed throughout, and the cost of applying such a preconditioner is often too high for practical purposes. By using the cdwt and idwt with permutation, one can improve the positioning of the non-zero entries to give a preconditioner that is cheaper to apply and also gives good convergence.

In this study we propose Haar and Daubechies wavelet based preconditioners for large, sparse, non-symmetric linear systems resulting from non-linear PDE analysis. Our algorithms use the Discrete Wavelet Transform with permutations, based on both cdwt and idwt. While it is generally agreed that the construction of an efficient general purpose preconditioner is not possible, there is still considerable interest in developing methods which perform well on a wide range of problems. The algorithms proposed in this study are tested on a variety of matrices resulting from finite difference and finite element discretizations of non-linear PDEs and on matrices from the Harwell Boeing collection. The preconditioners are also tested with five different Krylov subspace methods.

The paper is organized as follows. In Section 2 we give a quick overview of sparse and band preconditioners. In Section 3 we summarize some basics of wavelets and present the cdwt and idwt based on the pyramid algorithm. In Section 4 we introduce our new concept of the Incomplete Discrete Wavelet Transform with permutation (idwtper) and describe the properties of the banded wavelet preconditioner using idwtper. Implementation details and the results of numerical experiments are discussed in Section 5. Finally, we draw some conclusions in Section 6.

2. Sparse and band preconditioner

A simple preconditioner for a sparse matrix is the banded matrix constructed by setting to zero all entries of the matrix outside a chosen diagonal band. The wider the band, the more accurately the preconditioner approximates the original matrix. But matrices with large band width ($l$) are expensive to store and demand a large amount of computation at each iterative step in which such a matrix is used as a preconditioner. It is therefore important to choose $l$ so that it balances these two conflicting considerations. To choose a preconditioner for the linear system (1.1) we consider the following splitting of $A$:

\[ A = D + C, \tag{2.1} \]

where $D$ is a band matrix with wrap-around boundaries. For the case of $l = 3$, the matrix $D$ is simply

\[
\begin{pmatrix}
A_{1,1} & A_{1,2} & & & A_{1,n} \\
A_{2,1} & A_{2,2} & A_{2,3} & & \\
 & A_{3,2} & \ddots & \ddots & \\
 & & \ddots & \ddots & A_{n-1,n} \\
A_{n,1} & & & A_{n,n-1} & A_{n,n}
\end{pmatrix}, \tag{2.2}
\]

and the associated preconditioned system

\[ (I + D^{-1}C)x = D^{-1}b \tag{2.3} \]

has the preconditioner $M^{-1} = D^{-1}$. Clearly, with the above choice of $l = 3$, $M^{-1}$ will not approximate $A^{-1}$ well in all cases. In our experience, merely increasing $l$ is not sufficient to improve this preconditioner, as the improvements are only marginal. Here we discuss a way, based on cdwt and idwt, to derive a new and improved preconditioner.

3. Wavelet preliminaries

Multiresolution analysis (MRA) is the theory used by Ingrid Daubechies to show that for any non-negative integer $n$ there exists an orthogonal wavelet with compact support such that all derivatives up to order $n$ exist. Here we use compactly supported wavelets such as Haar, D4 and D6 (DN stands for the Daubechies wavelet of order N). MRA describes a sequence of nested approximation spaces $V_j$ in $L^2(\mathbb{R})$ such that the closure of their union equals $L^2(\mathbb{R})$:

\[ \{0\} \subset \cdots \subset V_{-1} \subset V_0 \subset V_1 \subset \cdots \subset L^2(\mathbb{R}). \tag{3.1} \]

The orthogonality of scaling functions and wavelets, together with the dyadic coupling between MRA spaces, leads to a relation between scaling function coefficients and wavelet coefficients on different scales. This yields a fast and accurate algorithm due to Mallat [8], known as the pyramid algorithm. A straightforward implementation of the pyramid algorithm leads to difficulties in handling boundary points, because the procedure requires several data values which lie outside the boundaries and are not known. Based on how these values are treated, we define two notions, namely cdwt and idwt. Both cdwt and idwt are defined by filter coefficients $a_0, a_1, \ldots, a_{D-1}$, where $D$ is the order of the wavelet transform. The filter coefficients $b_0, b_1, \ldots, b_{D-1}$ are derived from the $a_i$ by the relation

\[ b_i = (-1)^i a_{D-1-i}. \tag{3.2} \]
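As an illustration of relation (3.2), the following Python sketch builds the high-pass coefficients $b_i$ from the low-pass coefficients $a_i$ and checks the properties the later sections rely on. It is a sketch under stated assumptions, not the authors' implementation (which is in MATLAB): NumPy, the function names `daubechies_lowpass` and `highpass_from_lowpass`, and the normalization $\sum_k a_k^2 = 1$ are choices made for this example, and only the standard Haar ($D = 2$) and D4 ($D = 4$) coefficients are tabulated.

```python
import numpy as np

def daubechies_lowpass(D):
    """Low-pass filter a_0..a_{D-1}, normalized so that sum(a**2) == 1 (assumed convention)."""
    if D == 2:   # Haar
        return np.array([1.0, 1.0]) / np.sqrt(2.0)
    if D == 4:   # Daubechies order 4
        s3 = np.sqrt(3.0)
        return np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))
    raise ValueError("only D = 2 and D = 4 are tabulated in this sketch")

def highpass_from_lowpass(a):
    """Relation (3.2): b_i = (-1)**i * a_{D-1-i}."""
    D = len(a)
    signs = (-1.0) ** np.arange(D)
    return signs * a[::-1]

a = daubechies_lowpass(4)
b = highpass_from_lowpass(a)
print("sum a_k^2 :", np.sum(a * a))               # unit norm of the low-pass filter
print("a . b     :", np.dot(a, b))                # the two filters are orthogonal
print("sum b_k   :", np.sum(b))                   # zeroth moment of the high-pass filter
print("sum k b_k :", np.dot(np.arange(4), b))     # first moment of the high-pass filter
```

For D4 the last two printed values vanish up to rounding, which corresponds to the $D/2 = 2$ vanishing moments of the wavelet.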

For $f \in L^2(\mathbb{R})$, $f_j(x)$ denotes the projection of $f$ onto the space $V_j$. In terms of the scaling functions $\phi_{j,k}$ it is defined by

\[ f_j(x) = P_{V_j} f(x) = \sum_k s^j_k \, \phi_{j,k}(x). \tag{3.3} \]

The projection also has a formulation in terms of scaling functions and wavelet functions $\psi_{j,k}$, where $j$ is the resolution level and $k$ is the translation:

\[ P_{V_L} f(x) = \sum_k s^r_k \, \phi_{r,k}(x) + \sum_{j=r}^{L-1} \sum_k d^j_k \, \psi_{j,k}(x), \tag{3.4} \]

where $r$ is the lowest and $L$ the highest resolution level. Thus we obtain the relations

\[ s^{j-1}_l = \sum_{k=0}^{D-1} a_k \, s^j_{2l+k}, \qquad d^{j-1}_l = \sum_{k=0}^{D-1} b_k \, s^j_{2l+k}. \tag{3.5} \]

3.1. Complete Discrete Wavelet Transform (cdwt)

If the function $f$ is periodic, the scaling and wavelet coefficients are periodic as well:

\[ s^j_k = s^j_{k + 2^j p}, \qquad d^j_k = d^j_{k + 2^j p}, \qquad p \in \mathbb{Z}. \tag{3.6} \]

Hence it is enough to consider $2^j$ coefficients of either type at level $j$. The pyramid algorithm for the periodic wavelet transform is therefore defined by

\[ s^{j-1}_l = \sum_{k=0}^{D-1} a_k \, s^j_{\langle 2l+k \rangle_{2^j}}, \qquad d^{j-1}_l = \sum_{k=0}^{D-1} b_k \, s^j_{\langle 2l+k \rangle_{2^j}}, \qquad l = 0, 1, \ldots, 2^{j-1} - 1, \tag{3.7} \]

where $\langle i \rangle_{2^j}$ denotes $i$ modulo $2^j$. For a given vector $s$ from the vector space $\mathbb{R}^n$ one may construct an infinite periodic sequence of period $n$ and use it as the coefficients of a scaling function expansion $f_L(x)$ in some fixed subspace $V_L$ of $L^2$ ($L$ is an integer). Hereafter we refer to the periodic wavelet transform as cdwt. The transform $W : s \to w$ ($w = Ws$) is implemented by the pyramid algorithm (3.7). Denote by $s = s^{(L)}$ a column vector of $A$ at wavelet level $L$. The pyramid algorithm then transforms the vector $s^{(L)}$ to $w$, defined as

\[ w = \left[ (s^{(r)})^T \; (f^{(r)})^T \; (f^{(r+1)})^T \; \cdots \; (f^{(L-1)})^T \right]^T \tag{3.8} \]

in a level by level manner as described below:

\[
\begin{array}{ccccccccc}
s^{(L)} & \to & s^{(L-1)} & \to & s^{(L-2)} & \to \cdots \to & s^{(m)} & \to \cdots \to & s^{(r)} \\
 & \searrow & & \searrow & & \searrow & & \searrow & \\
 & & f^{(L-1)} & & f^{(L-2)} & \cdots & f^{(m)} & \cdots & f^{(r)}
\end{array}
\]

where $s^{(j)}$ and $f^{(j)}$ are of length $2^j$. Let $\psi$ denote the wavelet function. We expect the new vector $w$ to be nearly sparse because the usual moment conditions

\[ \int_{-\infty}^{\infty} \psi(x) \, x^p \, dx = 0 \quad \text{for } p = 0, 1, \ldots, D/2 - 1 \tag{3.9} \]

are equivalent to the vector moment conditions

\[ \sum_{k=0}^{D-1} (-1)^k k^p a_k = 0 \quad \text{for } p = 0, 1, \ldots, D/2 - 1. \tag{3.10} \]

The larger $D$ is, the better is the compression in $w$, but the compact support is larger as well. As the periodized wavelet transform satisfies the orthonormal relation $WW^T = I$, we call it cdwt.

3.2. Incomplete Discrete Wavelet Transform (idwt)

Unlike the cdwt, our idwt does not require periodic boundary conditions, as the function $f$ need not be periodic for the idwt to be applied. The idwt assumes that all values outside the boundaries are equal to zero, as shown in Fig. 1. Further, it is much simpler to implement than its complete counterpart (cdwt).

Fig. 1. The pyramid algorithm and boundary points.
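The difference between the two boundary treatments can be made concrete with a one-level transform matrix. The sketch below is an illustration only: it assumes NumPy, the standard D4 coefficients, and a hypothetical helper `level_matrix` that places the smooth coefficients in the first half of the output and the detail coefficients in the second half. With `periodic=True` it applies the wrap-around index of (3.7); with `periodic=False` it drops every term that would reach beyond the boundary, which is the zero-value assumption of the idwt.

```python
import numpy as np

# Standard D4 low-pass coefficients, normalized so that sum(a**2) == 1 (assumed convention).
D4 = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
               3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))

def level_matrix(a, n, periodic):
    """One pyramid level on a length-n vector (n even).

    Rows 0..n/2-1 produce the smooth part, rows n/2..n-1 the detail part.
    """
    D = len(a)
    b = (-1.0) ** np.arange(D) * a[::-1]      # relation (3.2)
    W = np.zeros((n, n))
    half = n // 2
    for l in range(half):
        for k in range(D):
            col = 2 * l + k
            if col >= n:
                if not periodic:
                    continue                  # idwt: values beyond the boundary are zero
                col %= n                      # cdwt: periodic wrap-around
            W[l, col] += a[k]
            W[half + l, col] += b[k]
    return W

n = 16
W_c = level_matrix(D4, n, periodic=True)      # complete (periodic) step
W_i = level_matrix(D4, n, periodic=False)     # incomplete (zero-padded) step
err_c = np.abs(W_c @ W_c.T - np.eye(n)).max()
E = W_i @ W_i.T - np.eye(n)
rows_off = np.nonzero(np.abs(E).max(axis=1) > 1e-12)[0]
print("cdwt step, max |W W^T - I| :", err_c)
print("idwt step, rows where W W^T differs from I :", rows_off)
```

In this one-level D4 example the periodic step is orthogonal to rounding error, while the zero-padded step departs from orthogonality only in the two rows whose filters are cut off at the boundary. For Haar ($D = 2$) no filter crosses the boundary, so the two matrices coincide.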

The pyramid algorithm for idwt is

\[ s^{j-1}_l = \sum_{k=0}^{D-1} a_k \, s^j_{2l+k}, \qquad d^{j-1}_l = \sum_{k=0}^{D-1} b_k \, s^j_{2l+k}, \tag{3.11} \]

where the points outside the boundary are zero. The transform $W_1 : s \to w$ ($w = W_1 s$) is implemented by (3.11) and satisfies the orthonormal relation approximately, $W_1 W_1^T \approx I$. We can show the power of this idwt on a Calderon-Zygmund matrix. Consider the Calderon-Zygmund type matrix $A = (a_{ij})$, where

\[ a_{ij} = \begin{cases} 3/2 & \text{if } i = j, \\ \dfrac{1}{|i - j|} & \text{otherwise.} \end{cases} \tag{3.12} \]

As shown in Fig. 2, it is smooth, with entries decreasing in magnitude away from the diagonal. When a cdwt is applied to $A$, the resulting matrix $\tilde{A} = WAW^T$ after thresholding has a weak finger pattern, with most of the largest entries confined to a narrow diagonal band. The matrix $\tilde{A}_1 = W_1 A W_1^T$ obtained by applying the idwt to $A$ is also shown in Fig. 2, and it is very similar to $\tilde{A}$.

4. DWT-based preconditioners

4.1. The wavelet transform with permutation (DWTPer)

A discrete wavelet transform with permutation based on cdwt is defined in [4]. It is equivalent to a wavelet transform followed by permutations of rows and columns. We refer to this wavelet transform with permutation as DWTPer (or cdwtper). This transform has the effect of preserving the general structure (by which we mean the 'shape' of the areas of singularity) of a matrix after a wavelet transform has been applied. The precise effect of applying DWTPer to a diagonal matrix is established in [4].

4.1.1. Incomplete Discrete Wavelet Transform with permutation (idwtper)

Here we propose a new notion, the Incomplete Discrete Wavelet Transform with permutation (idwtper). Denote by $s = s^{(L)}$ a column vector of $A$ at the wavelet level $L$. The pyramid algorithm based on idwt then transforms the vector $s^{(L)}$ to

\[ w = \left[ (s^{(r)})^T \; (f^{(r)})^T \; (f^{(r+1)})^T \; \cdots \; (f^{(L-1)})^T \right]^T. \tag{4.1} \]
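The zero-padded pyramid step (3.11) and the coefficient ordering of (4.1) can be sketched as follows. This is a minimal illustration, assuming NumPy and the standard Haar and D4 coefficients; the helpers `idwt_step` and `idwt_pyramid` are hypothetical names introduced here, and the argument `r` is the coarsest level at which the recursion stops.

```python
import numpy as np

# Standard orthonormal filters (assumed normalization: sum(a**2) == 1).
HAAR = np.array([1.0, 1.0]) / np.sqrt(2.0)
D4 = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
               3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))

def idwt_step(s, a):
    """One application of (3.11): returns (s^{j-1}, d^{j-1}), taking values beyond the boundary as zero."""
    D = len(a)
    b = (-1.0) ** np.arange(D) * a[::-1]               # relation (3.2)
    padded = np.concatenate([s, np.zeros(D - 2)])      # zeros stand in for the unknown values
    half = len(s) // 2
    smooth = np.array([padded[2 * l: 2 * l + D] @ a for l in range(half)])
    detail = np.array([padded[2 * l: 2 * l + D] @ b for l in range(half)])
    return smooth, detail

def idwt_pyramid(s, a, r):
    """Return w = [s^(r), f^(r), f^(r+1), ..., f^(L-1)] as in (4.1), for len(s) == 2**L."""
    details = []
    smooth = np.asarray(s, dtype=float)
    while len(smooth) > 2 ** r:
        smooth, d = idwt_step(smooth, a)
        details.append(d)                              # f^(L-1) is produced first, f^(r) last
    return np.concatenate([smooth] + details[::-1])

w_haar = idwt_pyramid(np.ones(8), HAAR, r=0)           # r = 0 for Haar
w_d4 = idwt_pyramid(np.ones(8), D4, r=1)               # r = 1 for D4
print(w_haar)
print(w_d4)
```

For a constant input the Haar transform leaves a single non-zero coefficient, while with D4 every detail coefficient whose filter lies inside the domain vanishes and the coefficients next to the boundary do not. This is the price of replacing the unknown exterior values by zeros.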

Fig. 2. Mesh plot of the matrix. Top center: original matrix, bottom left: its cdwtper, bottom right: its idwtper.

Assume $n = 2^L$ and let $r$ be an integer such that $2^r < D$ and $2^{r+1} \ge D$; thus $r = 0$ for $D = 2$ (Haar wavelets) and $r = 1$ for $D = 4$ (Daubechies order 4 wavelets). In matrix form $w$ is expressed as

\[ w = P_{r+1} W_{1,r+1} \cdots P_{L-1} W_{1,L-1} \, P_L W_{1,L} \, s^{(L)} \equiv W_1 s^{(L)}, \tag{4.2} \]

where

\[ P_m = \begin{pmatrix} \overline{P}_m & \\ & I_m \end{pmatrix}_{n \times n} \tag{4.3} \]

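with $\overline{P}_m$ a permutation matrix of size $2^m = 2^L - k_m$, that is, $\overline{P}_m = I(1, 3, \ldots, 2^m - 1, 2, 4, \ldots, 2^m)$, and where

\[ W_{1m} = \begin{pmatrix} \overline{W}_{1m} & \\ & I_m \end{pmatrix}_{n \times n} \]

with $\overline{W}_{1m}$ an orthogonal (sparse) matrix of size $2^m = 2^L - k_m$ and $I_m$ an identity matrix of size $k_m$. The one level transformation matrix $\overline{W}_{1m}$ is a compact diagonal block matrix. For example, with the Daubechies order $D = 4$ wavelets with $m = 2$ vanishing moments, it is

\[
\overline{W}_{1m} = \begin{pmatrix}
a_0 & a_1 & a_2 & a_3 & & & & \\
b_0 & b_1 & b_2 & b_3 & & & & \\
 & & a_0 & a_1 & a_2 & a_3 & & \\
 & & b_0 & b_1 & b_2 & b_3 & & \\
 & & & & \ddots & \ddots & \ddots & \ddots \\
 & & & & a_0 & a_1 & a_2 & a_3 \\
 & & & & b_0 & b_1 & b_2 & b_3 \\
 & & & & & & a_0 & a_1 \\
 & & & & & & b_0 & b_1
\end{pmatrix}.
\]

Now we define the new one level idwtper matrix (similar to $W_{1m}$)

\[
\widehat{W}_{1m} = \begin{pmatrix}
a_0 & \phi & a_1 & \phi & a_2 & \phi & \cdots & a_{D-1} & \phi & & & & \\
\phi & I & \phi & \phi & \phi & \phi & \cdots & \phi & \phi & & & & \\
b_0 & \phi & b_1 & \phi & b_2 & \phi & \cdots & b_{D-1} & \phi & & & & \\
\phi & \phi & \phi & I & \phi & \phi & \cdots & \phi & \phi & & & & \\
 & & & & \ddots & & & \ddots & & & & & \\
 & & & & & & & & & a_0 & \phi & a_1 & \phi \\
 & & & & & & & & & \phi & I & \phi & \phi \\
 & & & & & & & & & b_0 & \phi & b_1 & \phi \\
 & & & & & & & & & \phi & \phi & \phi & I
\end{pmatrix}_{n \times n}. \tag{4.4}
\]

Here $I$ is an identity matrix of size $2^{L-m} - 1$ and the $\phi$'s are block zero matrices. For $m = L$, both $I$ and $\phi$ are of size 0, i.e. $\widehat{W}_{1L} = \overline{W}_{1L} = W_{1L}$. Further, this idwtper for a vector $s^{(L)} \in \mathbb{R}^n$ can be defined by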

\[ \hat{w} = \widehat{W}_1 s^{(L)} \quad \text{with} \quad \widehat{W}_1 = \widehat{W}_{1,r+1} \widehat{W}_{1,r+2} \cdots \widehat{W}_{1,L}, \]

based on $(L + 1 - r)$ levels. For a matrix $A_{n \times n}$, the idwtper gives

\[ \widehat{A}_1 = \widehat{W}_1 A \widehat{W}_1^T. \tag{4.5} \]

To relate $\widehat{A}_1$ to $\tilde{A}_1$ from a standard idwt, we define the permutation matrix

\[ P = P_L^T P_{L-1}^T \cdots P_{r+2}^T P_{r+1}^T, \tag{4.6} \]

where the matrices $P_k$ come from (4.3). First, by induction, we can prove that

\[ \widehat{W}_{1k} = \left( \prod_{l=1}^{L-k} P_{k+l} \right)^{T} W_{1k} \left( \prod_{l=1}^{L-k} P_{k+l} \right) \quad \text{for } k = r+1, r+2, \ldots, L, \]

that is,

\[
\begin{aligned}
\widehat{W}_{1,L} &= W_{1,L}, \\
\widehat{W}_{1,L-1} &= P_L^T W_{1,L-1} P_L, \\
&\;\;\vdots \\
\widehat{W}_{1,r+1} &= P_L^T P_{L-1}^T \cdots P_{r+2}^T \, W_{1,r+1} \, P_{r+2} \cdots P_{L-1} P_L.
\end{aligned}
\]

Now we can verify that

\[
\begin{aligned}
P W_1 &= (P_L^T P_{L-1}^T \cdots P_{r+1}^T)(P_{r+1} W_{1,r+1} \cdots P_L W_{1,L}) \\
&= \widehat{W}_{1,r+1} \, (P_L^T P_{L-1}^T \cdots P_{r+2}^T)(P_{r+2} W_{1,r+2} \cdots P_L W_{1,L}) \\
&\;\;\vdots \\
&= \widehat{W}_{1,r+1} \widehat{W}_{1,r+2} \cdots \widehat{W}_{1,L-2} \, P_L^T W_{1,L-1} \, (P_L W_{1,L}) \\
&= \widehat{W}_{1,r+1} \widehat{W}_{1,r+2} \cdots \widehat{W}_{1,L} = \widehat{W}_1.
\end{aligned}
\]

Therefore $\widehat{W}_1 = P W_1$ and $\widehat{A}_1 = P \tilde{A}_1 P^T$. From this relation we conclude that the idwtper can be implemented in a level by level manner, either directly using $\widehat{W}_{1m}$ (via $\widehat{W}_1$) or indirectly using $P_m$ (via $P$) after an idwt, and we obtain the same result.

The cost in terms of flops of performing a cdwt may be calculated in a straightforward manner as in [7]. Since DWTPer simply involves a permutation of the DWT, the flop count for DWTPer is identical. The cost of applying

a block size $N$, order $D$, level $l$ idwtper to an $n \times n$ matrix is less than $8DnN(1 - 1/2^l)$.

4.2. Banded Wavelet Preconditioner using cdwtper and idwtper

As far as preconditioning is concerned, to solve (1.1) we propose the following algorithm:

1. Apply an idwtper to $Ax = b$ to obtain $\widehat{A}_1 \hat{x} = \hat{b}$.
2. Select a suitable band form $M$ of $\widehat{A}_1$.
3. Use $M$ as a preconditioner to solve $\widehat{A}_1 \hat{x} = \hat{b}$.

For cdwtper, replace $\widehat{A}_1$ by $\widehat{A}$ in the above algorithm. As described in Section 2, the band size of $M$ determines the cost of a preconditioning step. Therefore, we explore the possibility of constructing an effective preconditioner based on a suitable band. Let LEV denote the actual number of wavelet levels used ($1 \le \text{LEV} \le (L + 1 - r)$). With the help of this DWTPer we can transform a band matrix $A$ into another band matrix $\widehat{A}$. Fig. 3 shows the original matrix and the cdwtper and idwtper of a matrix taken from the group of Saylor (petroleum reservoir simulation matrices) with $D = 4$, LEV = 3 and $n = 216$. The band width of the new matrices under cdwtper and idwtper is increased and satisfies the overestimate given by Theorem 4.1.

We now introduce the definition of a band matrix. Band($A$, $a$, $b$, $k$): a block band matrix $A_{n \times n}$ with blocks of size $k \times k$ is called a Band($A$, $a$, $b$, $k$) if its lower block band width is $a$ and its upper block band width is $b$. When $k = 1$, Band($A$, $a$, $b$, 1) = Band($A$, $a$, $b$).

The strategy we take to start the preconditioning step is to partition $A = D + C$, where $D$ is a Band($D$, $a$, $a$) for some integer $a$. First apply DWTPer with LEV $\le (L + 1 - r)$ wavelet levels, to give

\[ \widehat{A} \hat{x} = (\widehat{D} + \widehat{C}) \hat{x} = \hat{b}. \]

Now $\widehat{D}$ is also a band matrix, and it is at most a Band($\widehat{D}$, $k$, $k$) matrix with $k$ as predicted by the following theorem proved in [4].

Theorem 4.1. Let $A$ be a band($a$, $b$) matrix. Then the new DWT of $l$ levels, based on Daubechies' order $D$ wavelets, transforms $A$ into $\widehat{A}$, which is at most a band($k_1$, $k_2$) matrix with

\[ k_1 - a = k_2 - b = D(2^{l-1} - 1). \tag{4.7} \]

Let $B$ denote the Band($\widehat{A}$, $k$, $k$) part of the matrix $\widehat{A}$. Then select the preconditioner of band width $l$ such that $l \le k$ and $M$ = Band($B$, $l$, $l$).
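The three steps of the algorithm in Section 4.2 can be sketched as below. This is a hedged illustration rather than the authors' code: it assumes NumPy, uses only the Haar wavelet (for which cdwt and idwt coincide), applies the transform in place so that smooth and detail coefficients stay at interleaved positions, and uses a plain stationary iteration as a stand-in for the Krylov solvers of the paper. The names `haar_dwtper_matrix`, `band_part`, `transformed_band_preconditioner` and `richardson`, as well as the small test matrix, are assumptions made for this example, and `band_part` keeps a band without wrap-around.

```python
import numpy as np

def haar_dwtper_matrix(n, levels):
    """In-place (permuted) Haar transform matrix; n must be divisible by 2**levels."""
    c = 1.0 / np.sqrt(2.0)
    W = np.eye(n)
    stride = 1
    for _ in range(levels):
        step = np.eye(n)
        for i in range(0, n, 2 * stride):
            j = i + stride
            step[i, i], step[i, j] = c, c      # smooth coefficient stays at position i
            step[j, i], step[j, j] = c, -c     # detail coefficient stays at position j
        W = step @ W
        stride *= 2                            # next level acts on the smooth positions only
    return W

def band_part(A, l):
    """Keep the entries with |i - j| <= l and zero the rest (no wrap-around)."""
    return np.triu(np.tril(A, l), -l)

def transformed_band_preconditioner(A, b, levels, l):
    W = haar_dwtper_matrix(A.shape[0], levels)
    A_hat = W @ A @ W.T                        # step 1: transformed system
    b_hat = W @ b
    M = band_part(A_hat, l)                    # step 2: band form of the transformed matrix
    return W, A_hat, b_hat, M

def richardson(A_hat, b_hat, M, iters):
    """Step 3 stand-in: x <- x + M^{-1}(b_hat - A_hat x), starting from x = 0."""
    x = np.zeros_like(b_hat)
    for _ in range(iters):
        x = x + np.linalg.solve(M, b_hat - A_hat @ x)
    return x

n = 16
A = 4 * np.eye(n) - 1.5 * np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)   # small unsymmetric band matrix
b = A @ np.ones(n)
W, A_hat, b_hat, M = transformed_band_preconditioner(A, b, levels=2, l=3)
x_hat = richardson(A_hat, b_hat, M, iters=20)
x = W.T @ x_hat                                # back-transform to the original variables
print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

Because $W$ is orthogonal here, the solution of the original system is recovered as $x = W^T \hat{x}$. In practice the band width `l` trades the cost of solving with $M$ against the quality of the approximation, and a Krylov method would replace the stationary iteration.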

Fig. 3. Level 3 Daubechies 4 transforms of the matrix pores1. Top center: original matrix, bottom left: cdwtper, bottom right: idwtper.

5. Numerical experiments

In this section we present the results of numerical experiments on a few non-linear problems and on a range of matrices from the Harwell Boeing collection [9] and the Tim Davis collection. The right hand side of each linear system from the Harwell-Boeing collection was computed from the solution vector $x$ of all ones, the choice used, e.g., in [10]. All numerical experiments were computed in double precision using a MATLAB implementation. The initial guess was always $x_0 = 0$ and the stopping criterion is

\[ \frac{\| b - Ax \|_2}{\| b \|_2} \le 10^{-6}. \tag{5.1} \]
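The experimental setup described above (right hand side generated from a solution of all ones, zero initial guess, relative residual test (5.1)) can be written down in a few lines. The sketch below assumes NumPy and uses a small made-up matrix; `relative_residual` and `converged` are hypothetical helper names, not functions from the authors' MATLAB code.

```python
import numpy as np

def relative_residual(A, x, b):
    """Left-hand side of (5.1): ||b - A x||_2 / ||b||_2."""
    return np.linalg.norm(b - A @ x) / np.linalg.norm(b)

def converged(A, x, b, tol=1e-6):
    """Stopping test (5.1) with the default tolerance used in the experiments."""
    return relative_residual(A, x, b) <= tol

A = np.array([[4.0, -1.0, 0.0],
              [-2.0, 4.0, -1.0],
              [0.0, -2.0, 4.0]])     # illustrative unsymmetric matrix
b = A @ np.ones(3)                   # right hand side from the solution vector of all ones
x0 = np.zeros(3)                     # initial guess
print(relative_residual(A, x0, b))   # starts at 1 because x0 = 0
print(converged(A, np.ones(3), b))   # the exact solution passes the test
```

With a zero initial guess the relative residual starts at exactly 1, so the tolerance directly measures the reduction achieved by the iterative solver.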

In our experiments we were primarily concerned with testing the effectiveness of the preconditioners, so we did not attempt to optimize the choice of tolerance. For some small problems such as IBM32 and pores1 we tested the preconditioners with a tolerance of $10^{-8}$; by default we used $10^{-6}$. We demonstrate the effectiveness of the preconditioners for a few standard KSMs, namely CGM, BICGM, Bi-CGSTAB, CGS and GMRES(k). To begin with, we present the results for all these solvers and subsequently focus on popular solvers such as GMRES and Bi-CGSTAB. Initially, matrices such as sherman1, saylr3, pores1 and IBM32 from the Harwell Boeing collection are used. The matrices encountered in the finite difference analysis of the unsteady Burgers equation are also considered. The numbers of iterations for both the unpreconditioned and the preconditioned iterative methods are given in the respective tables. A means slow convergence or no convergence.

In our numerical experiments we tested four different preconditioners, namely cdwtper-haar (complete DWT with permutation based on the Haar wavelet), cdwtper-d4 (complete DWT with permutation based on the compactly supported wavelet of order four), cdwtper-d6 (complete DWT with permutation based on the compactly supported wavelet of order six) and idwtper-haar (incomplete DWT with permutation based on the Haar wavelet). The convergence results without preconditioning are listed under M = I. We also considered a standard DWT-based preconditioner formed by setting to zero all entries whose magnitude is less than a chosen threshold value (as done in [11]), and found that such a preconditioner becomes singular for non-linear problems. All the numerical experiments were carried out using MATLAB installed on a Sun E2 workstation with SunW ultrasparc II, 4 MHZ, dual processor, using double precision arithmetic under the Solaris 8 operating system.
To begin with, we tested four Krylov subspace solvers on the matrices given below.

Sherman1: This matrix arises in oil reservoir simulation on a $10 \times 10 \times 10$ grid, using a seven point finite-difference approximation with NC equations and unknowns per grid block. Here the size is $n = 1000$, nz = 2394 (where nz is the number of non-zero entries) and NC = 1. The outcome of the numerical experiments is given in Table 1. From Table 1 one can see that our wavelet transform based preconditioners are effective with all four iterative solvers for unsymmetric matrices. In most cases the idwtper-haar based preconditioner is effective in accelerating the convergence of the iterative solver. In Fig. 4 the logarithm of the relative residual norm is plotted against the number of iterations to depict the convergence history of the preconditioned KSMs.

Saylr3: This Saylor petroleum engineering matrix arises in 3D reservoir simulation. Here $n = 1000$ and nz = 3750. On this matrix we tested our preconditioner with different band widths ($l$). We found that the choice of l = leads to a singular preconditioner. Results corresponding l = 1 are

presented in Table 2. The level 3, D4 transforms of the matrix saylr3 and the convergence history plots are provided in Fig. 5.

Table 1
Convergence results for sherman1: unpreconditioned (M = I)

                Bi-CGSTAB   GMRES(2)   BICG   CGS
M = I           2           >1         36     334
DWTPer-haar     1           19         1      12
DWTPer-D4       164         379        128    164
DWTPer-D6       171         748        124    182
idwtper-haar    99          133        78     87

Fig. 4. Convergence behavior of the sherman1 problem (log of relative residual norm against number of iterations; curves: None, D4, haar, D6, ihaar). Top left: with Bi-CGSTAB, top right: GMRES(2), bottom left: BICG, bottom right: CGS.

Table 2
Convergence results for saylr3: unpreconditioned (M = I)

                Bi-CGSTAB   GMRES(2)   BICG   CGS
M = I           244         1          366    332
DWTPer-haar     11          16         83     94
DWTPer-D4       171         38         12     1
DWTPer-D6       168         747        11     18
idwtper-haar    84          132        77     9

Fig. 5. Level 3 Daubechies 4 transforms of the matrix saylr3. Top left: original matrix, top right: idwtper. Convergence behaviour of the saylr3 problem (log of relative residual norm against number of iterations; curves: None, D4, haar, D6, ihaar). Bottom left: with Bi-CGSTAB, bottom right: with GMRES(2).

Pores1: This matrix is extracted from the PORES package for reservoir simulation. Its size is $n = 30$ and nz = 180. On this matrix we show results for two

different band width l = 3, . The convergence history is provided in Fig. 6. Convergence results for unpreconditioned (top) and l = (middle), l = 3 (bottom) are presented in Table 3. In this case DWTPer-D6 does not work effectively.

Fig. 6. Convergence behavior of the pores1 problem (log of relative residual norm against number of iterations; curves: None, D4, haar, D6, ihaar). Top left: with Bi-CGSTAB, top right: GMRES(2), bottom left: BICG, bottom right: CGS.

IBM32: The size is $n = 32$ and nz = 126. On this matrix we show results in Table 4 for two different band width l =,1: unpreconditioned (top) and l = 1 (middle) l = (bottom). Fig. 7 carries the convergence details.

Regularized Burgers equation: In this case, $A$ comes from the finite difference analysis of the regularized Burgers equation, defined with $m > 0$ by

\[ \frac{\partial u(x,t)}{\partial t} + u(x,t) \frac{\partial u(x,t)}{\partial x} = m \frac{\partial^2 u(x,t)}{\partial x^2} \quad \text{for } t > 0 \text{ and } 0 < x < 1, \qquad u(x, 0) = u_0(x). \tag{5.2} \]

Table 3
Convergence results for Pores1: unpreconditioned (M = I)
Columns: Bi-CGSTAB, GMRES(2), BICG, CGS

M = I 221 167 77 1

DWTPer-haar 96 2 1 96
DWTPer-D4 61 23 41 7
DWTPer-D6 143 89 136
idwtper-haar 22 2 22 22

DWTPer-haar 198 2 1
DWTPer-D4 198 113
DWTPer-D6 1162 24 113 36
idwtper-haar 22 2 22 2

Fig. 7. Level 3 Daubechies 4 transforms of the matrix IBM32. Top left: original matrix, top right: cdwtper, bottom left: idwtper. Convergence behaviour of the IBM32 problem (log of relative residual against number of iterations; curves: None, D4, haar, D6, ihaar). Bottom right: with Bi-CGSTAB.

Table 4
Convergence results for IBM32: unpreconditioned (M = I)
Columns: Bi-CGSTAB, GMRES(2), BICG, CGS

M = I 88 34 47

DWTPer-haar 4 74 33 36
DWTPer-D4 37 7 34 31
DWTPer-D6 2 7 34 34
idwtper-haar 33 24 26 3

DWTPer-haar 3 147 38 47
DWTPer-D4 62 134 38 46
DWTPer-D6 7 119 34 37
idwtper-haar 2 9 38 44

Fig. 8 shows the solution at time t = .18 (without and with preconditioner), where the time step is $\Delta t = 10^{-3}$ and the size is $n = 1024$. Related problems arise in many branches of science and engineering, particularly fluid mechanics and petroleum reservoir simulation. The finite difference analysis of this transient non-linear PDE model calls the KSMs repeatedly until the steady state solution is attained. Our preconditioners are found to be effective in all the KSM calls. The convergence details corresponding to the matrix obtained in the last stage of the simulation are provided in Table 5. Here we show our results only for the DWTPer-haar, DWTPer-D4 and idwtper-haar preconditioners.

We now continue to demonstrate the effectiveness of the preconditioners for two standard and more popular solvers, Bi-CGSTAB and GMRES(k), on matrices from the Tim Davis and Harwell Boeing collections. The effect of the preconditioners on a non-linear problem modeling natural convection in a porous enclosure is also studied. We recall that Bi-CGSTAB requires two matrix-vector multiplications per iteration, whereas GMRES(k) requires only one matrix-vector multiplication per iteration. Symbol o denotes convergence with GMRES(k), k = 1. For Bi-CGSTAB, matrices such as sherman4, pores2, saylr1 and sherman1 from the Harwell Boeing collection are used. With GMRES(k), matrices such as sherman4, sherman1 and saylr1 are used. In addition, matrices related to the simulation of flow in a lid driven cavity (DRICAV) and to the finite element computation of the Navier-Stokes equations from FIDAP are considered. These two matrices are taken from the Tim Davis collection.
Results corresponding to these tests are provided in Tables 6 and 7. It is reported in [3] that convergence is difficult to obtain for the matrix pores2. For pores2 our method works for band width 2 with Bi-CGSTAB; with GMRES(k), however, we do not obtain convergence. For sherman2, GMRES(2) reduced the relative residual below $10^{-5}$ after four and seven steps, but never reached $10^{-8}$. This may be due to the very large condition number of sherman2, so we tried a tolerance of $10^{-5}$.

Convection in porous enclosure: To conclude this series of numerical experiments, we considered the problem of convection in a porous enclosure. The free

convection of heat from a hot vertical wall in a fluid saturated porous enclosure with insulated top and bottom walls is governed by

\[
\nabla^2 \psi = \frac{\partial T}{\partial y}, \qquad
\frac{\partial \psi}{\partial y} \frac{\partial T}{\partial x} - \frac{\partial \psi}{\partial x} \frac{\partial T}{\partial y} = \frac{1}{Ra} \nabla^2 T. \tag{5.3}
\]

Fig. 8. Solution of the Burgers equation at time t = .18. Top: without preconditioner, bottom: with preconditioner.

Table 5
Convergence results for Burgers equation: unpreconditioned (M = I)

                Bi-CGSTAB   GMRES(2)   BICG   CGS
M = I           29          78         7      61
DWTPer-haar     17          68         2      37
DWTPer-D4       8           14         1      2
idwtper-haar    17          66         46     46

Table 6
Convergence results with Bi-CGSTAB: unpreconditioned (M = I)
Columns: Matrix, M = I, DWTPer-D4, DWTPer-haar, idwtper-haar

Sherman4 84 1 49
pores2 193 174 14 2334
saylr1 2 17 197 1
Sherman1 19 7 3 48
FIDAP 133 227 112 16

Table 7
Convergence results with GMRES(k): unpreconditioned (M = I)
Columns: Matrix, M = I, DWTPer-D4, DWTPer-haar, idwtper-haar

Sherman4 228 228 229
Sherman2 361 12 486 48
Saylr1 194 83 1 142
DRICAV 16 118 47 47

The other vertical wall is maintained at ambient temperature, and $\psi$ is taken to be zero on all the walls. The linear system resulting from the finite element analysis of the coupled non-linear PDEs is solved iteratively by GMRES(k) and Bi-CGSTAB. The solution of the non-linear system (5.3) is obtained to an accuracy of $10^{-4}$ in the relative error of the field variables in seventeen global iterations. At each global iteration a call is made to GMRES(k)/Bi-CGSTAB. Results corresponding to the tenth global iteration are provided in Table 8. Similar results regarding the efficiency of the preconditioner are seen at every global iteration. In this case our preconditioner is not found to be effective with the Bi-CGSTAB solver.

The efficiency of our preconditioner is also tested with the CGM solver. Matrices from the Harwell Boeing collection related to dynamic analysis in structural engineering, such as BCSSTK1, BCSSTK4 and BCSSTK27, are considered. The matrix Plat362, arising in the finite difference analysis of Platzman's oceanographic model and known to be a difficult sparse matrix, has also been considered for testing. The results of these numerical experiments are provided in Table 9.

Table 8
Convergence results for Convection problem: unpreconditioned (M = I)

                GMRES(2)   Bi-CGSTAB
M = I           177        6
DWTPer-D4       148        8
DWTPer-haar     146        6
idwtper-haar    146        6

Table 9
Convergence results with CGM: unpreconditioned (M = I)
Columns: Matrix, M = I, DWTPer-haar, idwtper-haar

BCSSTK1 78 1
BCSSTK4 324 123 17
BCSSTK27 7 212 2
Plat362 427 242 23

6. Conclusion

The notion of idwtper has been proposed. Using idwtper/cdwtper, a given banded linear system, when projected into wavelet space, retains a banded form. Taking advantage of the compression properties of wavelet transforms, idwtper/cdwtper preconditioners based on the Daubechies family of wavelets are designed. The proposed class of preconditioners is found to be efficient in accelerating the rate of convergence of iterative solvers such as Bi-CGSTAB, GMRES(k), BICG, CGS and CGM. They are successfully tested on several unsymmetric sparse linear systems from both the Harwell Boeing and the Tim Davis matrix collections. Further, they are tested on a few non-linear problems, including some from the CFD context. Overall, the preconditioner based on the incomplete discrete Haar wavelet transform is relatively more effective in most test cases.

References

[1] M. Benzi, M. Tuma, A sparse approximate inverse preconditioner for nonsymmetric linear systems, SIAM J. Sci. Comput. 19 (1998) 141-183.
[2] T.F. Chan, W.P. Tang, W.L. Wan, Wavelet sparse approximate inverse preconditioners, BIT 37 (1997) 644-660.
[3] M. Grote, T. Huckle, Parallel preconditioning with sparse approximate inverses, SIAM J. Sci. Comput. 18 (3) (1997) 838-853.
[4] K. Chen, Discrete wavelet transforms accelerated sparse preconditioners for dense boundary element systems, Elec. Trans. Numer. Anal. 8 (1999) 138-153.
[5] J. Ford, K. Chen, L. Scales, A new wavelet transform preconditioner for iterative solution of elastohydrodynamic lubrication problems, Int. J. Comput. Math. 75 (2000) 497-513.

[6] J. Ford, K. Chen, Wavelet-based preconditioners for dense matrices with non-smooth local features, J. Numer. Math. 41 (2) (2001) 282-307.
[7] O.M. Nielsen, Wavelets in Scientific Computing, Ph.D. thesis, Technical University of Denmark, Lyngby, 1998.
[8] S.G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. PAMI 11 (7) (1989) 674-693.
[9] I.S. Duff, R. Grimes, J.G. Lewis, Users' Guide for the Harwell-Boeing Sparse Matrix Collection, Technical Report RAL-92-086, Rutherford Appleton Laboratory, Chilton, UK, 1992.
[10] Z. Zlatev, Computational Methods for General Sparse Matrices, Kluwer, Dordrecht, the Netherlands, 1991.
[11] D. Miller, Wavelet transforms and linear algebra, M.Sc. dissertation, University of Liverpool, UK, 1995.