
J. Appl. Math. & Computing Vol. 15(2004), No. 1, pp. 299-312

BILUS: A BLOCK VERSION OF ILUS FACTORIZATION

DAVOD KHOJASTEH SALKUYEH AND FAEZEH TOUTOUNIAN

Abstract. ILUS factorization has many desirable properties, such as its amenability to the skyline format, the ease with which stability may be monitored, and the possibility of constructing a preconditioner with symmetric structure. In this paper we introduce a new preconditioning technique for general sparse linear systems based on the ILUS factorization strategy. The resulting preconditioner has the same properties as the ILUS preconditioner. Some theoretical properties of the new preconditioner are discussed, and numerical experiments on test matrices from the Harwell-Boeing collection are reported. Our results indicate that the new preconditioner is cheaper to construct than the ILUS preconditioner.

AMS Mathematics Subject Classification: 65F10, 65F50.
Key words and phrases: ILUS factorization, preconditioning, Schur complement, skyline format.

Received May 8, 2003. Revised October 8, 2003. © 2004 Korean Society for Computational & Applied Mathematics and Korean SIGCAM.

1. Introduction

Central to many scientific and engineering problems is the solution of large sparse linear systems of equations of the form

$$Ax = b, \qquad (1)$$

where $A$ is a matrix of dimension $N$, usually nonsymmetric and unstructured. It is now accepted that, for solving very large sparse linear systems, iterative methods are becoming the methods of choice, due to their more favorable memory and computational costs compared with direct solution methods based on Gaussian elimination. One drawback of many iterative methods is their lack of robustness, i.e., an iterative method may fail to produce an acceptable solution for a given problem. A common strategy to enhance the robustness of iterative methods is to exploit preconditioning techniques.

However, most robust preconditioners are derived from some type of incomplete LU factorization of the coefficient matrix. It can be observed experimentally that an ILU factorization may produce factors $L$ and $U$ such that the norm $\|(LU)^{-1}\|$ is very large. The long recurrences associated with solving with these factors are unstable [3,6,7], producing solutions with extremely large components. A sign of this severely poor preconditioning is erratic behavior of the iterative method, for example, divergence of the iterations due to large numerical errors. One possible remedy for determining in advance whether or not a factorization will fail due to instability is to estimate a norm of $(LU)^{-1}$ in some way. In [6], E. Chow and Y. Saad proposed the ILUS factorization, which has many desirable properties, such as its amenability to the skyline format, the ease with which stability may be monitored, and the possibility of constructing a preconditioner with symmetric structure. In this paper we show how to use the ILUS strategy in block form to construct the BILUS factorization, which has the same properties as the ILUS preconditioner. Our results indicate that the new preconditioner is cheaper to construct than the ILUS preconditioner.

This paper is organized as follows. In section 2, we give a brief description of the ILUS preconditioner. In section 3, we introduce a block version of the incomplete LU preconditioner in sparse skyline format and describe some of its theoretical properties. In section 4, the BILUS factorization is applied to a block-tridiagonal matrix, and we observe that, in this case, the BILUS Algorithm is a variant of a general block ILU factorization. In section 5, we consider the use of the preconditioning on test matrices from the Harwell-Boeing collection. Some concluding remarks are given in section 6.

2. ILUS factorization

In this section, we briefly review the ILUS factorization proposed in [6]. In ILUS factorization, the sequence of matrices

$$A_{k+1} = \begin{pmatrix} A_k & v_k \\ w_k & \alpha_{k+1} \end{pmatrix}, \qquad (2)$$

where $A_n = A$, is considered. If $A_k$ is nonsingular and its LDU factorization

$$A_k = L_k D_k U_k \qquad (3)$$

is already available, then the LDU factorization of $A_{k+1}$ is

$$A_{k+1} = \begin{pmatrix} L_k & 0 \\ y_k & 1 \end{pmatrix} \begin{pmatrix} D_k & 0 \\ 0 & d_{k+1} \end{pmatrix} \begin{pmatrix} U_k & z_k \\ 0 & 1 \end{pmatrix}, \qquad (4)$$

in which

$$z_k = D_k^{-1} L_k^{-1} v_k, \qquad (5)$$
$$y_k = w_k U_k^{-1} D_k^{-1}, \qquad (6)$$
$$d_{k+1} = \alpha_{k+1} - y_k D_k z_k. \qquad (7)$$

Hence, each row and column of the factorization can be obtained by solving two unit lower triangular systems and computing a scaled dot product. Since a sparse approximate solution is often required in preconditioning, the ILU factorization based on this approach consists of two approximate sparse triangular solves and a sparse dot product. There are a number of ways to compute the sparse approximations required in (5) and (6); see, for example, [5, 8, 11, 12, 13, 14]. One technique of approximation proposed in [6] is to use the truncated Neumann series

$$z_k = D_k^{-1} L_k^{-1} v_k \approx D_k^{-1} (I + E_k + E_k^2 + \cdots + E_k^p) v_k, \qquad (8)$$

in which $E_k = I - L_k$ and $p$ is a small natural number. Note that the matrices $E_k$ are never formed, that the series is evaluated with Horner's rule, and that the vectors $E_k^j v_k$ should be computed in sparse-sparse mode. If the number of nonzero elements in $z_k$ exceeds the fill-in tolerance lfil, then some of the fill-in elements must be dropped according to some strategy.

As mentioned in [6], one advantage of the ILUS factorization strategy is that it can estimate $\|L^{-1}\|$ and $\|U^{-1}\|$ easily and thus determine the stability of the $L$ and $U$ factors. When instability has been detected, e.g., when a norm estimate exceeds some stable limit norm, the ILUS factorization code exits and indicates that the solver should switch to another preconditioner, restart ILUS with more allowed fill-in, or attempt one of the other strategies described in [6]. The ILUS Algorithm can be summarized as follows.

Algorithm 1. ILUS
1. Set $D_1 = a_{11}$, $L_1 = U_1 = 1$
2. For $k = 1, \ldots, n-1$ Do:
3. Compute a sparse $z_k \approx D_k^{-1} L_k^{-1} v_k$
4. Compute a sparse $y_k \approx w_k U_k^{-1} D_k^{-1}$
5. Compute $d_{k+1} := \alpha_{k+1} - y_k D_k z_k$
6. Form $L_{k+1}$, $D_{k+1}$ and $U_{k+1}$ via (4)
7. Estimate $\|L_{k+1}^{-1}\|$ and $\|U_{k+1}^{-1}\|$ and exit if either exceeds some limit
8. EndDo
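To make steps 3-7 concrete, the following is a minimal sketch of one ILUS bordering step in dense NumPy arithmetic for readability; a real implementation keeps $L$, $D$, $U$ in sparse skyline format and performs the solves in sparse-sparse mode. The function names and the simple drop-by-count rule are illustrative assumptions, not the paper's code.

```python
# A minimal sketch of one ILUS bordering step (Algorithm 1), written with
# dense NumPy arrays for readability; the paper stores L, D, U in sparse
# skyline format. Function names and the drop rule are illustrative.
import numpy as np

def neumann_solve(L, v, p):
    """Approximate L^{-1} v by the truncated Neumann series of eq. (8),
    (I + E + ... + E^p) v with E = I - L, evaluated by Horner's rule."""
    E = np.eye(L.shape[0]) - L
    z = v.copy()
    for _ in range(p):
        z = v + E @ z        # Horner: z <- v + E z
    return z

def drop_by_count(z, lfil):
    """Keep only the lfil largest-magnitude entries of z."""
    if np.count_nonzero(z) > lfil:
        z = z.copy()
        z[np.argsort(np.abs(z))[:-lfil]] = 0.0
    return z

def ilus_step(L, D, U, v, w, alpha, p=2, lfil=10):
    """Grow A_k ~= L D U by one row and column via eqs. (4)-(7)."""
    k = L.shape[0]
    d_inv = 1.0 / np.diag(D)
    z = drop_by_count(d_inv * neumann_solve(L, v, p), lfil)    # eq. (5)
    y = drop_by_count(d_inv * neumann_solve(U.T, w, p), lfil)  # eq. (6)
    d = alpha - y @ D @ z                                      # eq. (7)
    L1 = np.block([[L, np.zeros((k, 1))], [y[None, :], np.eye(1)]])
    U1 = np.block([[U, z[:, None]], [np.zeros((1, k)), np.eye(1)]])
    D1 = np.block([[D, np.zeros((k, 1))], [np.zeros((1, k)), np.array([[d]])]])
    return L1, D1, U1
```

Since $U_k^T$ is unit lower triangular, the same Neumann-series routine serves for both (5) and (6).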

3. BILUS method

Let us consider the sequence of matrices

$$A_{k+1} = \begin{pmatrix} A_k & V_k \\ W_k & \Delta_{k+1} \end{pmatrix}, \qquad k = 1, \ldots, l-1, \qquad (9)$$

where $A_l = A$, $V_k \in \mathbb{R}^{s_k \times m}$, $W_k \in \mathbb{R}^{m \times s_k}$ and $\Delta_{k+1} \in \mathbb{R}^{m \times m}$, in which $A_k \in \mathbb{R}^{s_k \times s_k}$ and $m \ll N$. If $A_k$ is nonsingular and its LDU factorization

$$A_k = L_k D_k U_k \qquad (10)$$

is already available, then the LDU factorization of $A_{k+1}$ is

$$A_{k+1} = \begin{pmatrix} L_k & 0 \\ Y_k & \tilde L_k \end{pmatrix} \begin{pmatrix} D_k & 0 \\ 0 & \tilde D_k \end{pmatrix} \begin{pmatrix} U_k & Z_k \\ 0 & \tilde U_k \end{pmatrix}, \qquad (11)$$

in which

$$Y_k = W_k U_k^{-1} D_k^{-1}, \qquad (12)$$
$$Z_k = D_k^{-1} L_k^{-1} V_k, \qquad (13)$$

and

$$\tilde L_k \tilde D_k \tilde U_k = \Delta_{k+1} - Y_k D_k Z_k \equiv R_k. \qquad (14)$$

So we can obtain the matrices $Y_k$ and $Z_k$ by solving two unit lower triangular systems with the multiple right-hand sides $W_k^T$ and $V_k$, respectively. The matrices $\tilde L_k$, $\tilde D_k$ and $\tilde U_k$ can be obtained by computing the LDU factorization of $R_k$. Hence, in this way, we can obtain the LDU factorization of the matrix $A$, provided the LDU factorizations of all the intermediate matrices $R_k$ exist. To make an incomplete LDU factorization, which we call BILUS (Block version of ILUS factorization), we can solve the systems (12) and (13) approximately. We can also use an incomplete (or exact) LDU factorization algorithm for (14); for example, Algorithm 1 can be used for this purpose.

One of the advantages of BILUS over ILUS is that the computation of the rows of $Y_k$ and the columns of $Z_k$ can be done in parallel, which can save much CPU time. Moreover, like the ILUS factorization, the BILUS factorization strategy is able to estimate $\|L^{-1}\|$ and $\|U^{-1}\|$ easily and so determine the stability of the $L$ and $U$ factors. If, for the lower triangular factor, we use, as in [6], the infinity-norm bound based on $\|L^{-1} e\|_\infty$, where $e$ is the vector of all ones, then the solution and norm of $L_{k+1}^{-1} e$ may be updated easily from $L_k^{-1} e$. For the upper triangular factor, as in [6], we can use the infinity norm of its transpose, which can be estimated in the same way. As in ILUS, when instability has been detected, an appropriate strategy should be used. Numerical results show that the new preconditioner is cheaper to construct than the ILUS preconditioner. Furthermore, the new technique ensures convergence rates of the preconditioned iteration that are comparable with those obtained with ILUS preconditioners. A sketch of the BILUS Algorithm can be written as follows.

Algorithm 2. BILUS
1. Set $A_1$ = the submatrix of $A$ consisting of the first $m_1$ rows and columns, and compute an approximate (or exact) LDU factorization of $A_1$, i.e., $A_1 \approx L_1 D_1 U_1$ (or $A_1 = L_1 D_1 U_1$)
2. For $k = 1, \ldots, l-1$ Do:
3. Compute a sparse $Y_k \approx W_k U_k^{-1} D_k^{-1}$
4. Compute a sparse $Z_k \approx D_k^{-1} L_k^{-1} V_k$
5. Set $R_k = \Delta_{k+1} - Y_k D_k Z_k$
6. Compute an incomplete (or exact) LDU factorization of $R_k$, i.e., $R_k \approx \tilde L_k \tilde D_k \tilde U_k$ (or $R_k = \tilde L_k \tilde D_k \tilde U_k$)
7. Form $L_{k+1}$, $D_{k+1}$ and $U_{k+1}$ via (11)
8. Estimate $\|L_{k+1}^{-1}\|$ and $\|U_{k+1}^{-1}\|$ and exit if either exceeds some limit
9. EndDo
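To illustrate steps 3-8, here is a minimal dense sketch of one BILUS bordering step. Exact triangular solves stand in for the sparse approximate solves of steps 3-4 (which in BILUS may also run in parallel across the columns of $Z_k$ and rows of $Y_k$); the helper names `ldu` and `bilus_step` are illustrative assumptions, not the paper's code.

```python
# A minimal dense sketch of one BILUS bordering step (Algorithm 2).
# Exact solves stand in for the sparse approximations of steps 3-4.
import numpy as np

def ldu(R):
    """Exact LDU factorization R = L D U with unit triangular L and U
    (Doolittle elimination without pivoting; adequate for a sketch)."""
    m = R.shape[0]
    L, M = np.eye(m), R.astype(float).copy()
    for j in range(m - 1):
        piv = M[j + 1:, j] / M[j, j]
        L[j + 1:, j] = piv
        M[j + 1:] -= np.outer(piv, M[j])
    D = np.diag(np.diag(M))
    return L, D, np.linalg.solve(D, M)   # U = D^{-1} M is unit upper

def bilus_step(L, D, U, V, W, Delta):
    """Grow A_k ~= L D U to A_{k+1} via eqs. (11)-(14)."""
    Z = np.linalg.solve(D, np.linalg.solve(L, V))          # eq. (13)
    Y = np.linalg.solve(D, np.linalg.solve(U.T, W.T)).T    # eq. (12)
    Lt, Dt, Ut = ldu(Delta - Y @ D @ Z)                    # eq. (14): R_k
    k, m = L.shape[0], Delta.shape[0]
    Lnew = np.block([[L, np.zeros((k, m))], [Y, Lt]])
    Dnew = np.block([[D, np.zeros((k, m))], [np.zeros((m, k)), Dt]])
    Unew = np.block([[U, Z], [np.zeros((m, k)), Ut]])
    # Stability monitoring (step 8): track the norm of L^{-1} e.
    stab = np.linalg.norm(np.linalg.solve(Lnew, np.ones(k + m)), np.inf)
    return Lnew, Dnew, Unew, stab
```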

3.1. Analysis

In this section we first show that the exact block version of the LDU factorization described above exists if $A$ is an SPD matrix or an M-matrix. Then we derive conditions which guarantee that the matrices $R_k$ are nonsingular and that the BILUS Algorithm with pivoting will not break down.

Definition 1. $A \in \mathbb{R}^{N \times N}$ is an M-matrix if $a_{ij} \le 0$ for all $i \ne j$, $A$ is nonsingular and $A^{-1} \ge 0$.

The following lemmas, which are proved in [1], will be useful later.

Lemma 1. Let $A$ be an M-matrix that is partitioned in block matrix form, $A = (A_{ij})$, where the $A_{ii}$ are square matrices. Then the matrices $A_{ii}$ on the diagonal of $A$ are M-matrices.

Lemma 2. Let $A$ be an M-matrix that is partitioned in two-by-two block form, i.e.,

$$A = \begin{pmatrix} B & E \\ F & C \end{pmatrix}, \qquad (15)$$

where $B$ is a square matrix. Then the Schur complement

$$S = C - F B^{-1} E \qquad (16)$$

exists and is itself an M-matrix.

Now we state and prove the following lemma.

Lemma 3. Let $A$ be an SPD matrix (M-matrix) and let all steps of Algorithm 2 be done exactly. Then the BILUS Algorithm will not break down.

Proof. By substituting $Z_k$ and $Y_k$ in relation (14), we have

$$R_k = \Delta_{k+1} - W_k A_k^{-1} V_k.$$

So $R_k$ is the Schur complement of $A_k$ in $A_{k+1}$, and by Lemma 2 it is an SPD matrix (M-matrix), because $A_{k+1}$ is a leading principal submatrix of $A$ and hence an SPD matrix (M-matrix). Therefore the LDU factorization of $R_k$ exists.

Now we consider the existence of the BILUS factorization. As we know, when $R_k$ is nonsingular it is possible to obtain the LDU factorization of $P R_k$ for some permutation matrix $P$. So it is enough to find conditions which imply the invertibility of $R_k$. Let $\hat A_k = L_k D_k U_k$ be the incomplete LDU factorization of $A_k$ generated by Algorithm 2. By $P_k$ and $Q_k$ we denote the residuals obtained at steps 3 and 4 of Algorithm 2, respectively, i.e.,

$$P_k = W_k - Y_k D_k U_k, \qquad Q_k = V_k - L_k D_k Z_k.$$

Hence, from equation (11) we see that Algorithm 2 (at step $k$) generates an incomplete LDU factorization of the intermediate matrix

$$\Omega_{k+1} = \begin{pmatrix} L_k D_k U_k & L_k D_k Z_k \\ Y_k D_k U_k & \Delta_{k+1} \end{pmatrix} = \begin{pmatrix} \hat A_k & V_k - Q_k \\ W_k - P_k & \Delta_{k+1} \end{pmatrix},$$

where $Y_k$ and $Z_k$ are obtained at steps 3 and 4 of Algorithm 2, respectively. Furthermore, we note that $R_k$ is the Schur complement of $\hat A_k$ in $\Omega_{k+1}$, so the invertibility of $\Omega_{k+1}$ implies the invertibility of $R_k$. Defining the matrix

$$E_{k+1} \equiv A_{k+1} - \Omega_{k+1} = \begin{pmatrix} A_k - \hat A_k & Q_k \\ P_k & 0 \end{pmatrix},$$

we observe that if $A_{k+1}$ is nonsingular and $\|E_{k+1}\| < \|A_{k+1}^{-1}\|^{-1}$ for some matrix norm, then $\Omega_{k+1}$ is nonsingular too, since $\Omega_{k+1} = A_{k+1}(I - A_{k+1}^{-1} E_{k+1})$ and $\|A_{k+1}^{-1} E_{k+1}\| < 1$ (see [10], page 218). So, if steps 3 and 4 of Algorithm 2 are done with enough accuracy and the norm of $A_k - \hat A_k$ is sufficiently small, then $R_k$ is nonsingular. When $R_k$ is nonsingular, it is possible to obtain an incomplete or exact LDU factorization of $P R_k$ for some permutation matrix $P$. Hence, an incomplete block LDU factorization of $P_{k+1} A_{k+1}$ can be obtained for the permutation matrix

$$P_{k+1} = \begin{pmatrix} I & 0 \\ 0 & P \end{pmatrix},$$

since

$$P R_k = P \Delta_{k+1} - (P Y_k) D_k Z_k.$$

Here we note that the dimension of the identity matrix $I$ is the same as the dimension of $A_k$. In order to have a nonsingular matrix $P_{k+1} \hat A_{k+1}$ with enough accuracy, it is better to perform the exact LDU factorization of $P R_k$. This computation is not expensive, since the order of the matrix $R_k$ is small ($m \ll N$). Finally, as we have observed, the nonsingularity of the matrices $A_k$, $k = 1, \ldots, l$, together with computation of sufficient accuracy, guarantees that the BILUS Algorithm with pivoting will not break down.

4. BILUS in a special case

Consider the block-tridiagonal matrix blocked in the form

$$A = \begin{pmatrix} G_1 & E_2 & & & \\ F_2 & G_2 & E_3 & & \\ & \ddots & \ddots & \ddots & \\ & & F_{l-1} & G_{l-1} & E_l \\ & & & F_l & G_l \end{pmatrix}. \qquad (17)$$

Let $G$ be the block-diagonal matrix consisting of the diagonal blocks $G_i$, $L$ the block strictly lower triangular matrix consisting of the sub-diagonal blocks $F_i$, and $U$ the block strictly upper triangular matrix consisting of the super-diagonal blocks $E_i$. Then $A$ is of the form $A = L + G + U$. First, we investigate the exact LDU factorization of $A$. Consider the sequence of matrices

$$A_{k+1} = \begin{pmatrix} G_1 & E_2 & & & \\ F_2 & G_2 & E_3 & & \\ & \ddots & \ddots & \ddots & \\ & & F_k & G_k & E_{k+1} \\ & & & F_{k+1} & G_{k+1} \end{pmatrix}, \qquad k = 1, \ldots, l-1, \qquad (18)$$

with $A_1 = G_1$. By letting $\Delta_{k+1} = G_{k+1}$, $W_k = (0 \ \cdots \ 0 \ F_{k+1})$ and $V_k = (0 \ \cdots \ 0 \ E_{k+1}^T)^T$, a little computation allows us to rewrite (14) as

$$R_k = \tilde L_k \tilde D_k \tilde U_k = G_{k+1} - W_k A_k^{-1} V_k = G_{k+1} - F_{k+1} (\tilde L_{k-1} \tilde D_{k-1} \tilde U_{k-1})^{-1} E_{k+1}. \qquad (19)$$

Therefore, by defining $\Lambda_1 = G_1 = \tilde L_0 \tilde D_0 \tilde U_0$ and $\Lambda_k = \tilde L_{k-1} \tilde D_{k-1} \tilde U_{k-1}$ for $k = 2, \ldots, l$, we have

$$\Lambda_{k+1} = G_{k+1} - F_{k+1} \Lambda_k^{-1} E_{k+1}, \qquad k = 1, \ldots, l-1.$$

With the above notation and expressions it can easily be seen that

$$A = (L + \Lambda) \Lambda^{-1} (\Lambda + U), \qquad (20)$$

where

$$\Lambda = \mathrm{diag}(\Lambda_1, \ldots, \Lambda_l).$$

Relation (20) shows that, by computing approximations of the matrices $\Lambda_k$, $k = 1, \ldots, l$, it is possible to obtain an incomplete block LDU factorization of the matrix $A$. Algorithm 3 produces a sparse approximation $X_k$ of the matrix $\Lambda_k$ for $k = 1, \ldots, l$; a sketch in code follows the algorithm and the general scheme below.

Algorithm 3.
1. Compute an incomplete LDU factorization of $G_1$, i.e., $G_1 \approx L_0 D_0 U_0 = X_1$
2. For $k = 1, \ldots, l-1$ Do:
3. Compute a sparse $\hat F_{k+1} \approx F_{k+1} U_{k-1}^{-1} D_{k-1}^{-1}$
4. Compute a sparse $\hat E_{k+1} \approx D_{k-1}^{-1} L_{k-1}^{-1} E_{k+1}$
5. Set $X_{k+1} = G_{k+1} - \hat F_{k+1} D_{k-1} \hat E_{k+1}$
6. Compute an incomplete LDU factorization of $X_{k+1}$, i.e., $X_{k+1} \approx L_k D_k U_k$
7. EndDo

As is known [2,4], based on formula (20) a general incomplete LDU factorization for the block-tridiagonal matrix (17) takes the following form. Set $Z_1 = G_1$ and $X_1 = \mathrm{approx}_2(Z_1)$. For $k = 1, \ldots, l-1$, compute

$$Z_{k+1} = G_{k+1} - F_{k+1} \, \mathrm{approx}_1(Z_k^{-1}) \, E_{k+1},$$

and let

$$X_{k+1} = \mathrm{approx}_2(Z_{k+1}).$$

Then the block ILU factorization matrix is defined to be

$$\tilde A = (L + \tilde\Lambda) \tilde\Lambda^{-1} (\tilde\Lambda + U),$$

where

$$\tilde\Lambda = \mathrm{diag}(X_1, \ldots, X_l).$$

The role of $\mathrm{approx}_1(\cdot)$ is to control a prespecified sparsity structure of the approximation of $Z_k^{-1}$, and $\mathrm{approx}_2(\cdot)$ is meant either to control a prescribed sparsity pattern of the $X_k$, and hence make them easy to factor, or, if the blocks $X_k^{-1}$ are formed explicitly, to make their application to a vector easy to compute.
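As an illustration, the following sketch carries out the pivot-block recurrence behind (20) and applies the resulting factorization as a preconditioner, using exact dense solves where Algorithm 3 would use sparse approximations and dropping. The function names and the zero-based list convention ($G[k] = G_{k+1}$, $F[k] = F_{k+2}$, $E[k] = E_{k+2}$) are illustrative assumptions.

```python
# A sketch of the block-tridiagonal case: compute the pivot blocks of
# (20) and apply M = (L + Lam) Lam^{-1} (Lam + U) as a preconditioner.
# Exact dense solves stand in for the sparse approximations of Algorithm 3.
# Zero-based lists: G[k] = G_{k+1}; F[k] = F_{k+2}; E[k] = E_{k+2}.
import numpy as np

def pivot_blocks(G, F, E):
    """Lam_1 = G_1,  Lam_{k+1} = G_{k+1} - F_{k+1} Lam_k^{-1} E_{k+1}."""
    X = [G[0].astype(float)]
    for k in range(len(G) - 1):
        X.append(G[k + 1] - F[k] @ np.linalg.solve(X[k], E[k]))
    return X

def precond_solve(X, F, E, r):
    """Solve M x = r with M = (L + Lam) Lam^{-1} (Lam + U); r and the
    returned x are lists of block vectors."""
    l = len(X)
    u = [np.linalg.solve(X[0], r[0])]          # (L + Lam) u = r, forward
    for k in range(1, l):
        u.append(np.linalg.solve(X[k], r[k] - F[k - 1] @ u[k - 1]))
    x = [None] * l
    x[l - 1] = u[l - 1]                        # (Lam + U) x = Lam u, backward
    for k in range(l - 2, -1, -1):
        x[k] = u[k] - np.linalg.solve(X[k], E[k] @ x[k + 1])
    return x
```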

The main difference between the BILUS factorization for a block-tridiagonal matrix and the above ILU factorization scheme can easily be seen if we rewrite Algorithm 3 in the following form. Set $Z_1 = G_1 = L_1 D_1 U_1$ and $X_1 = \mathrm{approx}_4(Z_1) = \mathrm{approx}_4(L_1 D_1 U_1)$, and for $k = 1, \ldots, l-1$ compute

$$Z_{k+1} = G_{k+1} - \mathrm{approx}_3(F_{k+1} X_k^{-1} E_{k+1}),$$
$$X_{k+1} = \mathrm{approx}_4(Z_{k+1}) = \mathrm{approx}_4(L_{k+1} D_{k+1} U_{k+1}).$$

In this case, the role of $\mathrm{approx}_4(\cdot)$ is to control the prescribed sparsity patterns of $L_k$ and $U_k$, while $\mathrm{approx}_3(\cdot)$ is, like $\mathrm{approx}_1(\cdot)$, to control a prescribed sparsity pattern of the approximation of $Z_k$. As we have observed, for a block-tridiagonal matrix $A$, the above algorithm is a variant of the BILUS Algorithm.

5. Numerical examples

For one of our experiments we consider the equation

$$-\Delta u + \frac{\partial}{\partial x}(e^x u) - xu = f(x, y), \qquad (x, y) \in \Omega = (0,1) \times (0,1). \qquad (21)$$

Discretizing (21) on an $n_x \times n_y$ grid, using second-order centered differences for the Laplacian and centered differences for the first-derivative term, gives a linear system of equations of order $N = n_x n_y$. In our test we take $n_x = n_y = 32$, so this yields a matrix of order $N = 1024$ which is an M-matrix [9]. The boundary conditions are taken so that the exact solution of the system is $x = [1, \ldots, 1]^T$. We call this example F2DA; a discretization sketch is given after Table 1. We also use some matrices from the Harwell-Boeing collection. These matrices and their properties are shown in Table 1.

Table 1
matrix      order   symmetric   positive definite
NOS3         960    yes         yes
NOS5         468    yes         yes
GR-30-30     900    yes         yes
FIDAP001     216    yes         no
SHERMAN4    1104    no          no
CAVITY05    1182    no          no
CAVITY06    1182    no          no
CAVITY07    1182    no          no
CAVITY08    1182    no          no
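For reference, here is a sketch of how a matrix of this type can be assembled. It assumes the convection-diffusion form of (21) as reconstructed above, a lexicographic unknown ordering, and Dirichlet boundary values folded into the right-hand side, all of which are illustrative choices rather than the paper's code.

```python
# A sketch of assembling an F2DA-type matrix, assuming the reconstructed
# form of (21); ordering and boundary treatment are illustrative.
import numpy as np
import scipy.sparse as sp

def f2da_matrix(n=32):
    """Five-point centered-difference discretization of
    -Lap(u) + (e^x u)_x - x u on the unit square, mesh h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    A = sp.lil_matrix((n * n, n * n))
    for j in range(n):                 # y-line index
        for i in range(n):             # x-point index
            r, x = j * n + i, (i + 1) * h
            A[r, r] = 4.0 / h**2 - x   # Laplacian diagonal and -xu term
            # centered difference of (e^x u)_x touches only E/W neighbors
            if i > 0:
                A[r, r - 1] = -1.0 / h**2 - np.exp(x - h) / (2 * h)
            if i < n - 1:
                A[r, r + 1] = -1.0 / h**2 + np.exp(x + h) / (2 * h)
            if j > 0:
                A[r, r - n] = -1.0 / h**2
            if j < n - 1:
                A[r, r + n] = -1.0 / h**2
    return A.tocsr()
```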

For these examples, the right-hand side of each system is taken such that the exact solution is $x = [1, \ldots, 1]^T$. The split-preconditioned conjugate gradient method and the left-preconditioned GMRES(10) method are used for solving the systems. For all the examples we used the stopping criterion

$$\|b - A x_i\|_2 \le 10^{-6}.$$

For preserving sparsity we used the two following strategies [6,13]:

1. The truncated Neumann series described in section 2, for a small $p$.
2. Dropping (i.e., replacing by zero) entries of $z_i$, the columns of $Z_k$ (respectively $y_j$, the rows of $Y_k$), which are smaller than the relative tolerance $\tau_i$ ($\gamma_j$) obtained by multiplying $\tau$ by the original 2-norm of the $i$-th row ($j$-th column) of $A$ corresponding to $z_i$ ($y_j$); a sketch of this rule follows.
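A minimal sketch of strategy 2, with illustrative names; `a_norm` stands for the precomputed 2-norm of the row or column of $A$ associated with the vector being filtered.

```python
# A minimal sketch of dropping strategy 2; names are illustrative.
import numpy as np

def drop_relative(z, a_norm, tau):
    """Zero every entry of z smaller in magnitude than the relative
    tolerance tau * a_norm, where a_norm is the 2-norm of the row or
    column of A corresponding to z."""
    z = z.copy()
    z[np.abs(z) < tau * a_norm] = 0.0
    return z
```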

A bloc version of ILUS factorization 309 the required CPU time for computing three preconditioners as a function of m for matrices SHERMAN4, F2DA and CAVITY05 and it shows that the rate of decrease slows down after m sufficiently increases. So we can conclude that the BILUS Algorithm is a robust technique for constructing a good preconditioner. The column 7 show that the total CPU time of BILUS is always much less than that of ILUS preconditioner and decreases more slowly with increase m. Table 3 PGMRES(10) CPU time(s) matrix p, τ m iterations precon. solve total 1 2 1.392 0.261 1.653 FIDAP001 2, 10 5 6 2 0.651 0.251 0.902 12 2 0.571 0.266 0.831 1 8 186.548 19.008 205.556 SHERMAN4 1, 10 3 23 8 11.536 18.858 30.394 46 7 7.281 17.004 24.285 1 4 148.223 15.602 163.825 F2DA 1, 10 2 8 4 21.361 14.921 36.282 16 4 12.027 14.942 26.969 1 4 246.714 11.116 257.830 CAVITY05 3, 10 2 6 4 50.904 11.157 62.060 12 4 28.751 11.126 39.877 1 5 246.965 13.810 260.775 CAVITY06 3, 10 2 6 5 51.043 13.820 64.863 12 5 28.841 13.810 42.651 1 18 246.985 48.300 295.285 CAVITY07 3, 10 2 6 7 51.113 19.248 70.361 12 7 29.192 18.937 48.129 1 247.496 CAVITY08 3, 10 2 6 51.173 12 15 28.891 40.439 69.330 The results for CAVITY8 (Table 3 sign ) show that for m = 1, 6 we have no solution after 500 iterations but for m = 12 we have the solution after 15 iterations. In fact for m = 1, 6 an instability has been detected since L 1 e is equal to 6.7 10 5, 4.6 10 3, 3.1 10 3 for m = 1, 6, 12, respectively. Figure 1 shows the required CPU time for computing three preconditioners as a function of m for matrices SHERMAN4, F2DA and CAVITY05 and it shows that the rate of decrease slows down after m sufficiently increases. So we can conclude that the BILUS Algorithm is a robust technique for constructing a good preconditioner.

Figure 1. CPU times for computing the three preconditioners as a function of $m$ for the matrices SHERMAN4, F2DA and CAVITY05.

6. Conclusion

We have proposed a new preconditioner for general sparse linear systems based on the ILUS factorization strategy. The constructed preconditioner has the same properties as the ILUS preconditioner: it can estimate $\|L^{-1}\|$ and $\|U^{-1}\|$ easily and so determine the stability of the factors $L$ and $U$. One of the advantages of BILUS over ILUS is that the computations can be done in parallel, which can save CPU time. Our results indicate that the new preconditioner is cheaper to construct than the ILUS preconditioner and, in addition, retains the efficiency and robustness of the ILUS preconditioner.

7. Acknowledgements

The authors are grateful to the referee for his/her comments, which substantially improved the quality of this paper.

References

[1] O. Axelsson, Iterative Solution Methods, Cambridge University Press, Cambridge, 1996.
[2] O. Axelsson and B. Polman, On approximate factorization methods for block matrices suitable for vector and parallel processors, Linear Algebra and its Applications, Vol. 77(1986), 3-26.
[3] A. M. Bruaset, A. Tveito and R. Winther, On the stability of relaxed incomplete LU factorization, Math. Comp., Vol. 54(1990), 701-719.
[4] T. F. Chan and P. S. Vassilevski, A framework for block ILU factorizations using block-size reduction, Mathematics of Computation, Vol. 64(1995), 129-156.
[5] E. Chow and Y. Saad, Approximate inverse preconditioners via sparse-sparse iterations, SIAM J. Sci. Comput., Vol. 19(1998), 995-1023.
[6] E. Chow and Y. Saad, ILUS: an incomplete LU preconditioner in sparse skyline format, International Journal for Numerical Methods in Fluids, Vol. 25(1997), 739-748.
[7] H. C. Elman, A stability analysis of incomplete LU factorization, Math. Comp., Vol. 47(1986), 191-217.
[8] M. J. Grote and T. Huckle, Parallel preconditioning with sparse approximate inverses, SIAM J. Sci. Comput., Vol. 18(1997), 838-853.
[9] A. Kerayechian and D. Khojasteh Salkuyeh, On the existence, uniqueness and approximation of a class of elliptic problems, International Journal of Applied Mathematics, Vol. 11(2002), No. 1, 49-60.
[10] D. Kincaid and W. Cheney, Numerical Analysis, Brooks/Cole Publishing Company, 1996.
[11] L. Y. Kolotilina and A. Y. Yeremin, Factorized sparse approximate inverse preconditionings I. Theory, SIAM J. Matrix Anal. Appl., 14(1993), 45-58.

[12] Y. Saad, ILUT: a dual threshold incomplete LU factorization, Numerical Linear Algebra with Applications, 1(1994), 387-402.
[13] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Press, New York, 1995.
[14] Y. Saad, Preconditioned Krylov subspace methods for CFD applications, in: W. G. Habashi, ed., Solution Techniques for Large-Scale CFD Problems, Wiley, New York, 1995, 139-158.
[15] Y. Saad and M. H. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 7(1986), 856-869.

Davod Khojasteh Salkuyeh received his B.Sc from Sharif University of Technology, Tehran, Iran, and his M.Sc from Ferdowsi University of Mashhad, Mashhad, Iran. At present he is writing his Ph.D thesis under the supervision of Professor Faezeh Toutounian at Ferdowsi University of Mashhad. His research interests are mainly iterative methods for sparse linear systems and the finite element method.

Department of Mathematics, School of Mathematical Sciences, Ferdowsi University of Mashhad, P.O. Box 91775-9177948953, Mashhad, Iran.
e-mail: khojaste@math.um.ac.ir

Faezeh Toutounian received her B.Sc in Mathematics from Ferdowsi University of Mashhad, Iran, two M.Sc degrees in mathematical statistics and applied computing, and her Ph.D in Mathematics from Paris VI University, France. She spent two sabbatical years, in 1985 and 1996, at Paris VI University. She is currently a Professor of Mathematics at Ferdowsi University of Mashhad. Her research interests are mainly numerical linear algebra, iterative methods and error analysis.

Department of Mathematics, School of Mathematical Sciences, Ferdowsi University of Mashhad, P.O. Box 91775-9177948953, Mashhad, Iran.
e-mail: toutouni@math.um.ac.ir