35 Additive Schwarz for the Schur Complement Method

Luiz M. Carvalho and Luc Giraud

1 Introduction

Domain decomposition methods for solving elliptic boundary value problems have been receiving increasing attention for the last two decades, to a large extent because of their potential on new parallel computers. We describe here two domain decomposition algorithms based on nonoverlapping subregions for solving self-adjoint elliptic problems in two dimensions, and we report on some experimental results. Both alternatives can be viewed as block diagonal preconditioners for the Schur complement matrix. The first is the classical block Jacobi preconditioner, where all but one of the diagonal blocks are related to the interfaces; the remaining block is diagonal and corresponds to the cross-points. The second preconditioner introduces an overlap between the blocks by including the cross-points and their couplings in the diagonal blocks of the block Jacobi preconditioner; in this case, the block related to the cross-points is dropped. We refer to the latter as Algebraic Additive Schwarz (AAS), since we sum the contributions of each block on the overlap. Because the nonzero sub-blocks of the Schur complement are dense matrices, we consider approximations that are inexpensive to construct and to invert. The algebraic approximation of the interface operator is constructed with the probing technique studied in [CM92] for the two-subdomain case and extended in [CMS92] to many subdomains. The proposed preconditioner belongs to the class of one-level preconditioners, like, for instance, Dirichlet-Neumann [BW86] and Neumann-Neumann [RT91]; it therefore does not implement any coarse grid component, whereas most current preconditioners include a coarse correction to propagate the error globally. We refer the reader to [BPS86, Smi90, Man93] for a detailed presentation of some of these preconditioners, and to [SBG96, CM94] for complete surveys of domain decomposition methods. We stress that the AAS approach, which first appeared in a few numerical experiments in [GT93], improves the convergence rate of the well-known block Jacobi preconditioner on some model problems while retaining the same computational complexity.

First, in Section 2, we describe the AAS preconditioner. In Section 3, we present the parallel implementation of both preconditioners on distributed memory platforms; their performance is compared in Section 4, and we conclude in Section 5.

2 The AAS Preconditioner

We consider the following second-order self-adjoint elliptic problem on an open polygonal domain \Omega \subset \mathbb{R}^2:

  -\frac{\partial}{\partial x}\left(a(x,y)\,\frac{\partial u}{\partial x}\right) - \frac{\partial}{\partial y}\left(b(x,y)\,\frac{\partial u}{\partial y}\right) = f(x,y) \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \partial\Omega,    (2.1)

where a(x,y) and b(x,y) are positive functions on \Omega. We assume that the domain \Omega is partitioned into N nonoverlapping subdomains \Omega_1, ..., \Omega_N with boundaries \partial\Omega_1, ..., \partial\Omega_N. We discretise (2.1) by either finite differences or finite elements, resulting in a symmetric positive definite linear system Au = f. Grouping the points corresponding to the interfaces between the subdomains (B) in the vector u_B and those corresponding to the interiors (I) of the subdomains in u_I, we obtain the reordered problem:

  \begin{pmatrix} A_{II} & A_{IB} \\ A_{IB}^T & A_{BB} \end{pmatrix} \begin{pmatrix} u_I \\ u_B \end{pmatrix} = \begin{pmatrix} f_I \\ f_B \end{pmatrix}.    (2.2)

Eliminating u_I from the second block row of (2.2) leads to the following reduced equation for u_B:

  S u_B = f_B - A_{IB}^T A_{II}^{-1} f_I, \quad \text{where} \quad S = A_{BB} - A_{IB}^T A_{II}^{-1} A_{IB}    (2.3)

is referred to as the Schur complement matrix. Let

  B = \Big( \bigcup_{i=1}^{m} E_i \Big) \cup V    (2.4)

be a partition of the interface B into edge points, E_i, and vertex points, V (see Figure 1); E_i can be written as E_i = (\partial\Omega_j \cap \partial\Omega_l) \setminus V, with j \neq l. Let R_{E_i} and R_V denote the pointwise restriction maps from B onto the nodes of E_i and of V, and let R_{E_i}^T and R_V^T be the corresponding extension maps. The diagonal block associated with E_i is defined by

  S_{ii} = R_{E_i} S R_{E_i}^T, \quad i = 1, ..., m.    (2.5)

Let \tilde{S}_{ii} be an approximation of S_{ii}, i = 1, ..., m. Then the approximate block Jacobi preconditioner, M_{bJ}, can be represented by

  M_{bJ} = \sum_{i=1}^{m} R_{E_i}^T \tilde{S}_{ii}^{-1} R_{E_i} + R_V^T \tilde{S}_{VV}^{-1} R_V.    (2.6)
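Operators of the form (2.6) are applied blockwise: restrict the residual to each block, solve with the pre-factorised approximate block, extend back, and sum. The following sketch illustrates this mechanism (a toy illustration under our own assumptions, not the authors' code; all names and the test matrix are hypothetical). With disjoint index sets it realises the block Jacobi form; when the sets overlap on the cross-points, as in the AAS preconditioner defined below, the summation automatically accumulates the overlapping contributions.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def apply_block_preconditioner(r, blocks):
        # Computes z = sum_i R_i^T S_ii^{-1} R_i r.  Each entry of `blocks`
        # is a pair (idx, factor): `idx` holds the global indices of the
        # unknowns of block i (the restriction map R_i) and `factor` is an
        # LU factorisation of the (approximate) block S_ii.
        z = np.zeros_like(r)
        for idx, factor in blocks:
            z[idx] += lu_solve(factor, r[idx])  # extend and sum on overlaps
        return z

    # Hypothetical toy problem: an SPD matrix standing in for S, split into
    # two blocks that share one cross-point (index 3), mimicking AAS.
    rng = np.random.default_rng(0)
    n = 7
    S = rng.standard_normal((n, n))
    S = S @ S.T + n * np.eye(n)
    E1, E2 = np.arange(0, 4), np.arange(3, 7)  # overlap at index 3
    blocks = [(E, lu_factor(S[np.ix_(E, E)])) for E in (E1, E2)]
    z = apply_block_preconditioner(rng.standard_normal(n), blocks)

In the setting of the paper the exact blocks are replaced by tridiagonal probing approximations, so each local solve costs only O(n_i) operations for a block of size n_i.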

[Figure 1: Decomposition of the interface points for (a) the block Jacobi preconditioner and (b) AAS.]

Let

  B = \bigcup_{i=1}^{m} \hat{E}_i    (2.7)

be another decomposition of B, where \hat{E}_i = \partial\Omega_j \cap \partial\Omega_l (see Figure 1(b)). \hat{E}_i and \hat{E}_j may overlap at one point belonging to V, so that \hat{E}_i \cap \hat{E}_j \neq \emptyset for some i \neq j. Let \hat{R}_{E_i} denote the pointwise restriction map from B onto the nodes of \hat{E}_i and \hat{R}_{E_i}^T the corresponding extension map. We define

  S_{ii} = \hat{R}_{E_i} S \hat{R}_{E_i}^T, \quad i = 1, ..., m.    (2.8)

Let \hat{S}_{ii} be an approximation of S_{ii}, i = 1, ..., m. Then we consider the Algebraic Additive Schwarz (AAS) preconditioner defined as

  M_{AAS} = \sum_{i=1}^{m} \hat{R}_{E_i}^T \hat{S}_{ii}^{-1} \hat{R}_{E_i}.    (2.9)

3 Parallel Implementation

We use the probing technique to construct \tilde{S}_{ii} and \hat{S}_{ii} [CM92]. Let P = [p_i] be a matrix whose columns are the probe vectors p_i; the probing technique can then be applied by multiplying S with P. This matrix-matrix approach requires only two communications for the construction and is more efficient than the more classical approach based on a sequence of matrix-probe-vector products. Moreover, the matrix-matrix approach allows the algorithm to overlap part of the communication with computation. The implementation of \hat{R}_{E_i}^T \hat{S}_{ii}^{-1} \hat{R}_{E_i} in AAS requires two extra neighbour-to-neighbour communications to exchange information on the cross-points; these communications are not needed for block Jacobi. The classical PCG algorithm requires two synchronisations per iteration to compute the inner products; we have implemented the variant proposed in [DER93], where only one synchronisation per iteration is needed.
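To give a flavour of the probing construction, the following is a minimal sketch under our own simplifying assumptions, not the authors' implementation (which uses twelve probe vectors to build more accurate tridiagonal approximations [CG97]). Three 0/1 probe vectors with staggered supports are enough to read off every entry of an exactly tridiagonal operator from the product SP, because the three nonzeros of any row fall into three different columns of SP.

    import numpy as np

    def probe_tridiagonal(matvec, n, k=3):
        # Probe vector j has ones at positions i with i % k == j.  For an
        # exactly tridiagonal operator, the nonzeros of a row sit in three
        # consecutive columns, which are distinct modulo k = 3, so band
        # entry (i, j) can be read off column j % k of the product S P.
        P = np.zeros((n, k))
        for j in range(k):
            P[j::k, j] = 1.0
        SP = np.column_stack([matvec(P[:, j]) for j in range(k)])
        T = np.zeros((n, n))
        for j in range(n):
            for i in range(max(0, j - 1), min(n, j + 2)):
                T[i, j] = SP[i, j % k]
        return T

    # Sanity check: probing recovers an exactly tridiagonal matrix.
    n = 8
    S = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    assert np.allclose(probe_tridiagonal(lambda v: S @ v, n), S)

Applied to the true interface blocks, whose entries decay away from the diagonal, the same products yield an approximate tridiagonal band; forming the products with all probe vectors at once is what keeps the parallel construction down to the two communications mentioned above.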

4 Experimental Results

The elliptic problem (2.1) was discretised by the standard five-point difference stencil on an n x n mesh. The approximations \tilde{S}_{ii} and \hat{S}_{ii} are tridiagonal matrices built using twelve probing vectors [CG97]. The direct solver used for the local Dirichlet problems A_{ii}, i = 1, ..., N, is MA27 [DR83], while the tridiagonal matrices \tilde{S}_{ii} and \hat{S}_{ii} are factorised using LAPACK routines. The stopping criterion is a reduction of the relative residual below 10^{-5}. The grid is either uniform or nonuniform. The message passing library used is MPI, and the parallel platform is a Cray T3D (IDRIS, France). In the parallel experiments we have used as many processors as subdomains, and the computations have been performed in 64-bit precision.

[Figure 2: Definition of the function a(.,.) on \Omega, the unit square of \mathbb{R}^2; the values a = 10^{-3} and a = 10^{3} alternate over the subregions of \Omega.]

Table 1 gives the number of iterations of the PCG. Table 2 gives the total parallel elapsed time, in seconds, for solving Ax = b; that is, the analysis and factorisation steps of MA27, the preconditioner construction, the PCG iterations, and the computation of the global solution by local back substitutions.

Table 1: Number of iterations of the PCG on a 257 x 257 grid.

                             AAS                      block Jacobi
  # subdomains   Poisson  Aniso.  Nonunif.    Poisson  Aniso.  Nonunif.
       4            17      11       14          18      11       16
      16            23      29       27          24      35       37
      64            33      52       40          35      82       60

Table 1 shows that AAS performs better than block Jacobi in the presence of anisotropic phenomena: the growth in the number of iterations as the number of subdomains increases is smaller for AAS. We have observed such behaviour for a wide class of model test problems, but we report only two of those results here. For all the experiments the function b(.,.) has been set to one. In the first example, the anisotropy is physically present in the definition of the problem through the function a(.,.) as defined in Figure 2, while in the second example the anisotropy is introduced numerically into the Poisson problem by the discretisation on a nonuniform grid. This grid is used to solve the drift-diffusion equations [GT93] involved in MOSFET device simulation.

Table 2: Times (in seconds) on the Cray T3D for solving Ax = b on a 257 x 257 grid.

                             AAS                      block Jacobi
  # subdomains   Poisson  Aniso.  Nonunif.    Poisson  Aniso.  Nonunif.
       4          10.77    9.64    10.18       10.97    9.64    10.57
      16           2.17    2.43     2.33        2.20    2.66     2.72
      64           0.58    0.78     0.65        0.59    1.07     0.84

As a first remark, we note that the time per iteration is almost the same for PCG with either preconditioner; the neighbour-to-neighbour communications required by AAS do not penalise its performance on the Cray T3D. We can observe in Table 2 that the reduction in the number of PCG iterations is not directly reflected in the time reduction. This is due to the significant amount of time spent in the MA27 analysis and factorisation routines for the solution of the local problems A_{ii}, i = 1, ..., N, which are common to both preconditioners; these routines account for 20% of the global solution time in the case of 64 subdomains. Consequently, the global time is not a linear but an affine function of the number of iterations: the global parallel solution time equals the time for the symbolic and numerical factorisation of the local Dirichlet matrices, plus the time for the back substitutions (once the interface problem has been solved), plus the number of iterations times the time per iteration (which is the same for both preconditioners).
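In symbols (our own shorthand, not notation from the paper), this cost model reads

  T_global = T_fact + T_back + n_it * t_iter,

where T_fact is the symbolic and numerical factorisation time for the local Dirichlet matrices, T_back the time for the local back substitutions, n_it the PCG iteration count, and t_iter the preconditioner-independent time per iteration. Since T_fact alone accounts for 20% of T_global in the 64-subdomain runs, a given reduction in n_it translates into a proportionally smaller reduction in the total time.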

Table 3 shows the performance of the parallel device simulation code, using PCG with either AAS or block Jacobi for the solution of the linear systems involved in the nonlinear solver. Both numerical and physical anisotropies are encountered, and AAS performs better than block Jacobi.

Table 3: Times in seconds and number of iterations for a continuation step of the MOSFET device simulation on a 257 x 257 grid.

                        AAS                  block Jacobi
  # subdomains    time   # iterations    time   # iterations
       4         857.6       1520       861.6       1543
      16         214.3       2731       240.1       3422
      64          67.5       4877        88.9       7300

The AAS preconditioner can easily be extended by using a larger overlap between the edges E_i. Experimental results have shown that this does not significantly accelerate the convergence for the MOSFET problem. In Table 4, we vary the size of the overlap and report the total number of linear iterations involved in the solution of the drift-diffusion equations.

Table 4: Iterations using preconditioners with overlap sizes 0 (block Jacobi), 1 (AAS), 2 and 3 on a 129 x 129 grid for the MOSFET problem.

                  # points in the overlap between the \hat{E}_i
  # subdomains      0       1       3       5
       4          1660    1693    1678    1671
      16          3610    2842    2797    2720
      64          8385    5177    5419    5599

5 Conclusions

We have compared the so-called Algebraic Additive Schwarz (AAS) preconditioner with an approximate block Jacobi preconditioner for the Schur complement domain decomposition method for solving elliptic PDEs. We have shown that AAS performs better, both in number of iterations and in computing time, for problems with anisotropic phenomena. This behaviour has been illustrated on linear systems arising from model problems and from the real-life application of device modelling. Although AAS does not introduce a global coupling among the subdomains, the growth of the number of iterations as the number of subdomains increases is slower than that observed with block Jacobi. AAS can thus be a cheap way of improving the simple block Jacobi preconditioner when it is difficult to define or to implement a coarse problem in the preconditioner applied to the Schur complement system. For a detailed presentation of this work we refer to [CG97].

Acknowledgement

We would like to thank the referee, whose comments helped to improve the presentation of this paper.

REFERENCES

[BPS86] Bramble J., Pasciak J., and Schatz A. (1986) The construction of preconditioners for elliptic problems by substructuring I. Math. Comp. 47(175): 103-134.
[BW86] Bjørstad P. E. and Widlund O. B. (1986) Iterative methods for the solution of elliptic problems on regions partitioned into substructures. SIAM Journal on Numerical Analysis 23(6): 1093-1120.
[CG97] Carvalho L. M. and Giraud L. (1997) Parallel domain decomposition for device modelling (in preparation). Technical report, CERFACS, Toulouse, France.
[CM92] Chan T. F. and Mathew T. P. (1992) The interface probing technique in domain decomposition. SIAM J. Matrix Analysis and Applications 13.
[CM94] Chan T. F. and Mathew T. P. (1994) Domain Decomposition Algorithms, volume 3 of Acta Numerica, pages 61-143. Cambridge University Press, Cambridge.
[CMS92] Chan T., Mathew T., and Shao J.-P. (1992) Fourier and probe variants of the vertex space domain decomposition algorithm. In Keyes D., Chan T., Meurant G., Scroggs J., and Voigt R. (eds) Fifth International Symposium on Domain Decomposition Methods for Partial Differential Equations, pages 236-249. SIAM, Philadelphia.
[DER93] D'Azevedo E., Eijkhout V., and Romine C. (1993) Reducing communication costs in the conjugate gradient algorithm on distributed memory multiprocessors. Technical Report CS-93-185, Computer Science Department, University of Tennessee, Knoxville.
[DR83] Duff I. S. and Reid J. K. (1983) The multifrontal solution of indefinite sparse symmetric linear systems. ACM TOMS 9: 302-325.
[GT93] Giraud L. and Tuminaro R. (1993) Domain decomposition algorithms for the drift-diffusion equations. In Sincovec R., Keyes D., Leuze M., Petzold L., and Reed D. (eds) Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, pages 719-726. SIAM.
[Man93] Mandel J. (1993) Balancing domain decomposition. Comm. Numer. Meth. Engrg. 9: 233-241.
[RT91] De Roeck Y.-H. and Le Tallec P. (1991) Analysis and test of a local domain decomposition preconditioner. In Glowinski R., Kuznetsov Y., Meurant G., Périaux J., and Widlund O. (eds) Fourth International Symposium on Domain Decomposition Methods for Partial Differential Equations. SIAM, Philadelphia, PA, USA.
[Smi90] Smith B. F. (1990) Domain decomposition algorithms for the partial differential equations of linear elasticity. PhD thesis, Courant Institute of Mathematical Sciences, New York.
[SBG96] Smith B., Bjørstad P., and Gropp W. (1996) Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, New York, 1st edition.