Parallel algebraic multilevel Schwarz preconditioners for a class of elliptic PDE systems

Size: px

Start display at page:

Download "Parallel algebraic multilevel Schwarz preconditioners for a class of elliptic PDE systems"

Bruce Simpson
5 years ago
Views:

1 Parae agebraic mutieve Schwarz preconditioners for a cass of eiptic PDE systems A. Borzì Vaentina De Simone Daniea di Serafino October 3, 213 Abstract Agebraic mutieve preconditioners for agebraic probems arising from the discretization of a cass of systems of couped eiptic partia differentia equations (PDEs) are presented. These preconditioners are based on modifications of Schwarz methods and of the smoothed aggregation technique, where the coarsening strategy and the restriction and proongation operators are defined using a point-based approach with a primary matrix corresponding to a singe PDE. The preconditioners are impemented in a parae computing framework and are tested on two representative PDE systems. The resuts of the numerica experiments show the effectiveness and the scaabiity of the proposed methods. A convergence theory for the twoeve case is presented. Keywords: systems of eiptic PDEs, agebraic mutieve preconditioners, Schwarz methods, smoothed aggregation, parae computing. MSC: 65F8, 65N55, 49J2. 1 Introduction It is a genera consensus that Kryov sovers preconditioned with mutieve Schwarz (MLS) methods [27, 3] are highy efficient and robust iterative schemes for soving inear systems that arise from the discretization of partia differentia equation (PDE) probems. MLS methods combine the natura paraeism of the domain decomposition technique, which spits a given probem into a coection of probems of the same type on smaer subdomains, with the optima convergence property resuting from the appication of the mutigrid strategy, which expoits a representation of the probem on a hierarchy of discretization meshes. MLS methods can be impemented with the agebraic approach, that is, using ony information on the system matrix, without assuming specific knowedge of the discretization used. Work partiay supported by INdAM-GNCS Projects Advanced numerica methods for arge-scae noninear constrained optimization probems (211) and Numerica methods and software for preconditioning inear systems arising in PDE and optimization probems (212), and by BMBF Verbundproject 5M213 ROENOBIO: Robust energy optimization of fermentation processes for the production of biogas and wine. Institut für Mathematik, Universität Würzburg, Campus Huband Nord, Emi-Fischer-Str. 3, 9774 Würzburg, Germany (afio.borzi@mathematik.uni-wuerzburg.de). Dipartimento di Matematica e Fisica, Seconda Università degi Studi di Napoi, viae A. Lincon 5, 811 Caserta, Itay (vaentina.desimone@unina2.it). Dipartimento di Matematica e Fisica, Seconda Università degi Studi di Napoi, viae A. Lincon 5, 811 Caserta, Itay, and Istituto di Cacoo e Reti ad Ate Prestazioni (ICAR), CNR, Napoi, Itay (daniea.diserafino@unina2.it). 1

2 In the case of scaar PDEs, the deveopment of agebraic MLS methods foows canonica guideines: the subprobems in one-eve Schwarz methods are usuay obtained by buiding an overapping partition of the adjacency graph of the matrix [7], whie the coarse-eve correction is obtained by appying typica coarsening strategies of agebraic mutigrid (AMG) methods; see, e.g., [15, 29]. However, many appication probems require to sove systems of PDEs and an effective extension of agebraic MLS methods is not straightforward. It is the purpose of this paper to present and investigate an agebraic MLS strategy for a cass of systems of eiptic PDEs. This cass is defined in terms of the structure of the system and of the properties of the singe eiptic equations, therefore we do not caim that our approach is purey agebraic athough ony genera information is required. A reasonabe step towards a successfu MLS procedure for PDE systems is the use of a pointbased approach, where the unknown variabes that may correspond to a grid point are treated simutaneousy (see, e.g., [18, 2, 22, 26]). In this case, one usuay reorders the unknowns so that those corresponding to the same grid point are consecutive, thus obtaining a bock-sparse matrix of coefficients where the bocks account for the contribution of a the equations to a singe grid point and the bock structure provides a compact information on the overa probem. As noted in [18], the point-based approach can be appied aso in a competey agebraic context where no physica grid points are avaiabe, provided that we are abe to identify bocks of variabes that can be treated as a whoe. In this context, a fundamenta issue is the definition of connectivity between bocks of variabes, which is the basis for the coarsening strategy [29]. In the scaar case, the connectivity among neighboring variabes is given by the stenci of the discretized differentia operator, that is naturay refected in the structure and the entries of the matrix. In the case of PDE systems, the point-based view requires a generaization of the connectivity concept to bocks of variabes resuting from the presence of mutipe PDE operators. The overapping bock-partitions used by the Schwarz methods are usuay buit by reasoning as in the scaar case, but on the graph obtained by considering the matrix bocks as points. However, the generaization of the coarsening process is more critica, since the strength of the bock connections must be defined. Some strategies for bock coarsening are outined in [18]. However, as noted in [2], the transfer operators resuting from the point-based strategy might significanty depend not ony on the differentia part, but aso on the zero-order terms in the equations, thus resuting in a possibe oss of efficiency of the mutieve method. On the other hand, the effective geometric mutigrid practice [3] is to negect the non-differentia part. For this reason, we aim at deveoping an agebraic coarsening that cosey resembes the geometric approach. In this paper, we consider a cass of eiptic PDE systems of the foowing type d k=1 x k ( d (i) k u (i) x k ) + n e m=1 b (i,m) u (m) = f (i), i = 1,..., n e, (1) where Ω R d and d (i) k, b(i,m) L (Ω) and f (i) L 2 (Ω) are given. We note that the couping of different unknowns takes pace ony through the zero-order terms. On the boundary Ω, the foowing Robin boundary conditions are assigned α u(i) n + β u(i) = γ (i), i = 1,..., n e, (2) where α, β, R, γ (i) L 2 (Ω) and n is the outward norma to Ω. The formuation (1)-(2) covers a wide cass of appications, incuding, e.g., optimaity systems of some PDE-constrained optimization probems and diffusion-reaction systems. For our purpose, we require that the second-order operators have simiar structure in terms of anisotropy. 2

3 We anayse a point-based agebraic approach for buiding MLS preconditioners where the restriction and proongation operators associated with both the Schwarz matrix partitioning and the coarsening strategy are defined by using a primary matrix that resuts from the discretization of the second-order differentia operator corresponding to a singe equation in the PDE system and negecting the zero-order couping terms. In particuar, we use this primary matrix in the context of the smoothed aggregation coarsening technique [4, 5, 32]. Furthermore, we present a parae impementation of the proposed preconditioners in the framework of MLD2P4 (Mutieve Domain Decomposition Parae Preconditioners Package based on PSBLAS) [11]. We remark that our approach has the great advantage of requiring a setup process whose computationa and memory compexity is simiar to that corresponding to a singe PDE. We notice that the point-based singe-operator approach has aready been expoited in [2], in the context of (sequentia) AMG methods with Stüben-type coarsening. Our aim is to extend and assess this methodoogy in the case of parae agebraic MLS methods based on smoothed aggregation. We show that our MLS preconditioners are abe to achieve the effectiveness of the scaar case as we as to yied good parae performance. They aso appear to be robust with respect to probem parameters such as the reguarization parameter in eiptic PDE-constrained optimization probems. As expected, the effectiveness of the preconditioners worsen on systems of PDEs where the eiptic operators show strong anisotropies in different directions; nevertheess, advantageous computationa performance is achieved as ong as the differences in the anisotropies are moderate. Aong this ine, we remark that our numerica experience and resuts reported in the mathematica iterature demonstrate that the use of Kryov iterations in combinations with MLS preconditioners provides a robust aternative to the usage of a pure mutigrid scheme. In fact, in the atter case more sophisticated ad-hoc coarsening techniques and intergrid transfer operators generay need to be constructed to guarantee an efficient mutieve method. This paper is organized as foows. In Section 2 we provide representative eiptic PDE systems that wi be used to test our MLS approach. In Section 3 we describe our MLS preconditioners. In Section 4 we present their parae impementation. In Section 5 we report the resuts of numerica experiments aimed at anayzing the effectiveness and the parae performance of the preconditioners on the seected test probems. In Section 6 we provide theoretica converge resuts for a twoeve case. A section of concusions competes this work. 2 Eiptic PDE systems An important cass of eiptic PDE probems of type (1)-(2) is given by the optimaity systems arising from a variety of eiptic PDE-constrained optimization probems [23]. In particuar, we consider the foowing distributed optima contro probem 1 2 y z 2 L 2 (Ω) + ν 2 u 2 L 2 (Ω), s.t. y u = g in Ω, (3) min u L2 (Ω) y = on Ω, where Ω R d is convex, and g, z L 2 (Ω) are given functions. In this context, y and z are the so-caed state and target functions, respectivey. The function u L 2 (Ω) represents the contro and the reguarization parameter ν > is the weight of the cost of the contro. For d = 2, probem (3) may describe the probem of controing the temperature distribution y of a materia pate, kept equa to zero aong the boundary, through the appication of a source u 3

4 aimed at getting a target temperature profie. The optima soution of (3) is characterized by the foowing optimaity conditions: y u = g in Ω, p + y = z in Ω, νu p = in Ω, y =, p = on Ω. However, we can use the condition νu p = to reduce the system as foows: ν y p = ν g in Ω, p + y = z in Ω, y =, p = on Ω. (4) This is one of the test probems used to anayse the behavior of the MLS preconditioners presented in this paper (detais about it are given in Section 5.1). Notice that the i-conditioning of this probem increases as ν gets smaer. Another wide cass of differentia probems of the type (1)-(2) is given by steady-state inear diffusion-reaction probems that describe a variety of appications, incuding chemica and bioogica processes [25]. In particuar, we consider diffusion-reaction probems where some diffusion operators are anisotropic aong different directions as foows: 2 u x1 2 2 u x2 2 + u + v w = f in Ω, ε 2 v x1 2 2 v x2 2 + u + v w = g in Ω, 2 w x 2 1 ε 2 w x 2 2 u v + w = h in Ω, (5) u = v = w = on Ω, where < ɛ < 1 and f, g, h L 2 (Ω). This probem is significant in the context of this work, since, as ε decreases, it becomes a more difficut test case for MLS methods because of the different anisotropies. 3 Mutieve Schwarz preconditioners for systems of eiptic PDEs In this section, we present our approach for buiding MLS preconditioners for inear systems arising from the discretization of eiptic differentia probems of the type (1)-(2). After discretization, these probems resut in an agebraic system of the foowing form where A (i) B (i,m) A (i) u (i) + n e m=1 B (i,m) u (m) = f (i), i = 1,..., n e, (6) is the discrete operator corresponding to the differentia part of the ith PDE and represents the zero-order term associated with the unknown vector u (m) in that PDE. We 4

5 assume that A (i) and the right-hand side f (i) take into account aso the discretization of the boundary conditions. We remark that the structure (6) identifies the cass of probems that we are considering. As we discuss next, MLS agorithms construct a hierarchy of coarser systems of the type (6); the index =,..., L in (6) refers to the th coarsening eve, where L identifies the origina (finest) system and identifies the coarsest system. System (6) can be aso represented in a more compact form as foows where A u = f, (7) A = Aˆ + B, ( Aˆ = diag A (1),..., A (n e) ( u = u (i) ), B = ) (, f = i=1,...,n e ( B (i,m) ) ) f (i). i=1,...,n e i,m=1,...,n e, The tota number of grid points at eve is denoted by N and the tota number of unknowns in (7) by N := n e N ; furthermore, the set of row indices of the matrix A is denoted by V. Note that, even if we use a point-based approach, we describe our method without reordering the rows and coumns of system (7) to have bocks of coefficients associated with grid points. This is a consequence of our choice of the primary matrix in the point-based approach, as shown next. We consider agebraic one-eve additive Schwarz (AS) methods and reca how they work in a genera scaar agebraic approach (see [7] for detais). In this case, one starts from a -overap partition of the index set V, i.e., from a set of t disjoint nonempty sets V,j V such that t V,j = V j, and generate overapping partitions of V by using the adjacency graph of the matrix A. For δ >, a δ-overap partition of V is constructed recursivey, by considering the sets V,j δ V δ 1,j obtained by incuding the vertices that are adjacent to any vertex in V δ 1,j. Once a δ-overap partition of V has been buit, we can associate with each of its sets V,j the restriction operator R,j that maps a vector u of size N onto the vector made ony of the entries of u with indices in V,j, and the proongation operator P,j = R,j T. These are the basic ingredients to obtain Schwarz preconditioners. In our approach, we define the overapping partitions and the corresponding restriction and proongation operators by appying the primary-matrix approach outined in the introduction, i.e., by using a singe bock A (j) in Aˆ, which is assumed to be representative of a the bocks. The choice of the primary matrix is a deicate issue. Our numerica experience and previous work on the eipticity symbo of PDE systems [34, 35] suggest that the equation with better (in the sense of more uniform) eipticity shoud provide the primary matrix. On the other hand, if the discretized differentia operators A (i) have simiar features, any of them can be chosen as the primary matrix. For the two probems described in Section 2, the primary matrix is chosen according to these guideines (see Sections 5.1 and 5.2). In the foowing, to simpify the notation, we denote by A the bock seected as primary matrix. We use the strategy previousy described to buid a δ-overap partition of the index set V associated with A, { } V,j, j = 1,..., t, and the associated restriction and proongation operators, R,j and P,j, respectivey. To buid Schwarz methods for the whoe system (7), we extend R,j and P,j as foows ˆP,j = I ne P,j, ˆR,j = I ne R,j, (9) (8) 5

6 where is the Kronecker product and I ne is the identity matrix of dimension n e. The operators (9) are used as in cassica Schwarz methods to buid the foowing matrices: A,j = ˆR,j A ˆP,j. (1) This procedure naturay associates a row-index set V,j with each matrix A,j. We define our AS preconditioner as foows M AS = t ˆP,j A 1,j ˆR,j, (11) where A,j is supposed to be nonsinguar. An approximate inverse S,j of A,j is usuay considered instead of A 1,j, to save computationa cost. Variants of this preconditioner are obtained by negecting the overap when the restriction or the proongation operators are appied, as in the scaar case. To buid the coarse index set and the reevant transfer operators, we use the smoothed aggregation technique [4, 5, 32]; however, we do not appy the bock approach proposed in [32], where the connectivity among bocks of variabes is measured using the norms of the matrix bocks resuting from the point-based approach. Instead, we buid the coarse index set and the proongation and restriction operators by using ony the seected matrix A, which is our { primary matrix. } The indices in V are grouped into subsets, forming a disjoint covering C,q, q = 1,..., N 1 of V. Each subset C,q, contains indices strongy couped to a certain root index in V ; two indices r and s are said strongy couped if the foowing hods: ars ϑ a rra ss, where ars denotes the (r, s)th entry of the matrix A and ϑ > is a suitaby chosen threshod [32]. A tentative proongator P 1, mapping a vector of ength N 1 onto a vector of ength N, is buit by expoiting information on the near nu space of A [24]; for exampe, if A stems from the discretization of the negative Lapacian, P 1 has the foowing form P 1 = ( p 1 sq ), p 1 sq = { 1 if s C,q, otherwise. (12) To reduce the energy of the proongator coumns, a damped Jacobi smoother is appied to P 1, thus obtaining P 1 = (I N ω D 1 A ) P 1, where I N is the identity matrix of dimension N, D is the diagona part of A, ω = 4/(3ρ ) and ρ is an upper bound on the spectra radius of D 1 A [4]. To obtain a smoothed proongator P 1 for system (7), the operator P 1 is extended as foows: The coarse-eve matrix A 1 is computed as ˆP 1 = I ne P 1. (13) A 1 = ˆR 1 A ˆP 1, (14) where ˆR 1 = ˆP T 1. We observe that A 1, ˆR 1 and ˆP 1 identify the associated index space, V 1 ; furthermore, A 1 has the same bock structure as A, hence the strategy described so far can be appied at any eve. 6

7 We note that, by construction, the restriction and proongation operators, and therefore the coarsening, are independent of the zero-order terms in the PDEs. On the other hand, to take into account the couping introduced by the matrix B, the matrix A 1 in (14) as we as the matrices A,j in (1) are computed by appying the reevant proongation and restriction operators to the matrix A and not to Aˆ. Therefore, using as primary matrix the N N bock A reduces the computationa cost for the coarsening and the construction of the restriction and proongation operators with respect to the cassica bock approach. On the other hand, the primary matrix A 1 = R 1 A P 1, to be used at the next coarse eve, must be computed too, but the cost of this operation is generay paid off by the smaer cost of the coarsening and the transfer operators, as ong as the number n e of PDEs is arge enough. The hierarchy of coarse matrices and the inter-eve transfer operators may be combined in severa ways with the AS preconditioners at each eve, to obtain different mutieve preconditioners (see, e.g., [1, 27]). The simpest combination is the additive scheme, where the coarseeve correction is added to the AS preconditioner at each eve; if two eves and 1 are considered, the additive mutieve Schwarz (AMLS) preconditioner takes the foowing form: M AMLS = ˆP 1 A 1 ˆR t ˆP,j A 1,j ˆR,j. Letting P L,j = ˆP L,j, j = 1,..., t L, P = ˆP L ˆP L 1 ˆP, =,..., L 1, P,j = P ˆP,j, = 1,..., L 1, j = 1,..., t, (15) P,j = P, j = t = 1, R,j = P T,j, =,..., L, j = 1,..., t, it is easy to verify that the additive mutieve Schwarz preconditioner using the L + 1 eves =,..., L (where L and denote the finest and the coarsest eve, respectivey) can be written as foows [27]: M AMLS = L t = P,j A 1,j R,j; (16) this matrix form wi be used in Section 6 to prove some theoretica resuts. Mutipicative combinations of the coarse matrices and the transfer operators with the AS preconditioners may be aso considered (see, e.g., [27] for detais); the resuting preconditioners are caed hybrid, since they are additive at each eve and mutipicative in passing from eve to eve 1 and vice versa. An exampe is given by the symmetrized mutipicative mutieve Schwarz (SMMLS) preconditioner [27, Agorithm 3.2.2], that we use in our numerica experiments in Section 5. For simpicity, we iustrate it in Figure 1 instead of providing its matrix form. Notice that the SMMLS scheme is equivaent to a V-cyce with zero starting guess at the finest eve; furthermore, this preconditioner is symmetric if the matrix A is symmetric. 4 Parae impementation issues The MLS preconditioners described in the previous sections were impemented in the framework of MLD2P4, a Fortran 95 package of parae agebraic mutieve Schwarz preconditioners based 7

8 SMMLS(, A, f ) if ( = ) then u = M AS f f 1 = ˆR 1 (f A u ) u 1 = SMMLS( 1, A 1, f 1 ) u = u + ˆP 1 u 1 u = u + M AS (f A u ) ese u = A 1 f endif return u Figure 1: The symmetrized mutipicative mutieve Schwarz method. on smoothed aggregation, designed for distributed-memory architectures [11]. MLD2P4 has a ayered software architecture, which is the resut of an object-based moduar approach refecting the intrinsic hierarchy of mutieve methods. In this context, objects of increasing compexity are obtained by suitaby combining simper objects, that are genera enough to ensure fexibiity and reuse, whie meeting performance requirements. This approach aows the introduction of new features into the package, as we as the modification of existing ones. At the bottom of this architecture, the PSBLAS ibrary [16,17] provides parae sparse inear agebra computation and communication kernes used as basic buiding bocks in the preconditioner setup and appication (see [9,11] for detais). These kernes are impemented on the top of BLACS [13] and MPI [28], which are de-facto standard message-passing environments. The preconditioners avaiabe in MLD2P4 can be used with the Kryov sovers impemented in PSBLAS. Whie a detaied description of the parae impementation of our MLS preconditioners is beyond the scope of this paper, here we outine very genera impementation choices, to better understand the parae performance resuts reported in the next section. The preconditioner data structure of MLD2P4 assume a row-bock distribution of the input matrix, where the rows may aso be non-contiguous, and a consistent distribution of the associated vectors of unknowns and right-hand sides; the number of row-bocks equas the number of avaiabe processors. This distribution naturay induces a partitioning of the row-index set associated with the matrix, which is the basis for buiding the overapping partitions used by the Schwarz methods. To keep the same data partitioning in a the phases of the construction and appication of our MLS preconditioners, we consider a point-based reordering of (7), where the unknowns associated with the same grid point are consecutive. We denote the resuting system as foows with A ū = f, (17) A = ΠA Π T, ū = Π T u, f = Πf, where Π is a suitabe permutation matrix. Note that A, ū and f have a bock structure, where 8

9 each bock accounts for a the variabes associated with a grid point. We have A (Ā(r,s) ) = + B (r,s) Ā (r,s), B (r,s) R n e n e, ( ū = ū (r) The primary matrix A = A (j) bock Ā (r,s) ) r=1,...,n, ( ) (r) f = f, r=1,...,n r,s=1,...,n, ū (r) R n e, f (r) R n e. is obtained from A by considering the jth row and coumn of each. Notice that, even if we do not assume any specific knowedge of the discretization scheme appied to the PDE probem, we must be abe to distinguish between the part of the matrix accounting for the discretization of the differentia terms and the part of the matrix accounting for the zero-order terms. In our opinion, this is not a strong requirement when soving agebraic probems arising from the discretization of a system of PDEs. A row-bock decomposition of system (17) is considered, where a the rows associated with the same grid point are assigned to the same processor; this naturay resuts in a partition of the primary matrix A and the corresponding row-index set V. This is the -overap partition used by Schwarz methods to buid a δ-overap partition of V and its extension to the whoe index set V. The impementation of our one-eve Schwarz methods is accompished by modifying the MLD2P4 components that identify the overap indices and retrieve the corresponding matrix rows on each processor. Concerning the coarsening strategy, MLD2P4 impements a decouped aggregation agorithm, that acts ocay on the subset of indices assigned to each processor, negecting any connection with indices hed by other processors. This agorithm does not require communication among the processors, thus aowing a significant time saving with respect to other parae aggregation agorithms. On the other hand, it may generate nonuniform aggregates near the boundary indices (the indices hed by a processor that are adjacent, in the matrix graph, to indices in other processors) and is dependent on the number of processors and on the initia partitioning of the matrix. Nevertheess, this approach has shown to be abe to produce good resuts in practice [6, 9, 1, 31]. In our case, the decouped aggregation is performed on V and then the resuting aggregates and proongation operators are extended according to the point-based ordering of the unknowns. The oca indices to be aggregated on each processor are determined by the initia -overap partition of V. The aggregation and the construction of the smoothed restriction and proongation operators in our MLS preconditioners were obtained by suitaby modifying the corresponding tasks impemented in MLD2P4, to dea with the primary matrix instead of the whoe matrix. We note that the modifications introduced in MLD2P4 are impemented by using the data structures of MLD2P4 and the basic sparse inear agebra computation and communication kernes provided by PSBLAS, to preserve the objectives of efficiency, portabiity, and genera appicabiity of the origina package. 5 Numerica experiments We performed numerica experiments to evauate the effectiveness of our parae point-based agebraic MLS preconditioners. We appied the preconditioners to sove agebraic probems arising from the discretization of the optimaity system (4) and the anisotropic diffusion-reaction probem (5). These PDE probems are defined on the domain Ω = [, 1] [, 1] and are approximated by second-order centra finite-differences on a uniform n n grid. Henceforth, the corresponding inear systems are referred to as probem 1 and probem 2, respectivey. Notice 9

10 that the former is nonsymmetric, whie the atter is symmetric positive definite and anisotropic. For each probem we considered different vaues of the parameters and different sizes of the discretization grids. We report resuts obtained using the SMMLS scheme with different eves. The coarsest matrix A was repicated on the processors and the corresponding system was soved by the sparse LU factorization impemented in UMFPACK [12], avaiabe through its MLD2P4 interface. According to [32], the threshod ϑ in the aggregation agorithm was chosen adaptivey as ϑ =.8 L, where is the current eve. The bock-jacobi (BJAC) method was used as basic Schwarz preconditioner, with the ILU() factorization to approximate the bocks A 1,j in (11); notice that ILU() reduces to an incompete LDL T factorization when A,j is symmetric positive definite. The preconditioners were couped with Kryov sovers from PSBLAS, i.e., the BiCGStab method was appied to probem 1 and the Conjugate Gradient (CG) method to probem 2. The zero vector was used as initia guess. The iterations were stopped when the 2-norm of the reative residua was smaer than a fixed toerance, set to 1 6. Furthermore, a maximum number of 1 iterations was considered. In the foowing, the SMMLS preconditioners are denoted by xlev, where x is the number of eves. For comparison purposes, we aso considered the cassica bock version of the SMMLS preconditioners based on the smoothed aggregation technique, as impemented in the Triinos/ML package [19]. A the components and parameters of the Triinos/ML preconditioners were chosen as in our preconditioners, except the threshod ϑ in the aggregation agorithm, which was set to the same vaue at a the eves because ML does not aow a variabe choice of it (see Sections 5.1 and 5.2 for more detais). The ML preconditioners were couped with the BiCGStab and CG impementations provided by AztecOO in the context of the Triinos framework [21], using the zero initia guess and the stopping criterion based on the 2-norm of the reative residua, with toerance 1 6 and maximum number of iterations equa to 2. A the experiments were carried out on a HP XC 6 Linux custer avaiabe at ICAR-CNR, Napes branch, Itay. Each node of the custer comprises an Inte Itanium 2 Madison bi-processor, with cock frequency of 1.4 Ghz, and 4 GB of RAM; it runs HP Linux for High Performance Computing 3 (kerne ). The interconnection network is Quadrics QsNetII Ean 4, which has a sustained bandwidth of 9 MB/sec. and a atency of about 5 µsec. for arge messages. Our preconditioners were impemented within MLD2P , instaed on the top of PSBLAS 2.3.4, ATLAS 3.6., BLACS 1.1 and the HP MPI impementation 2.1. UMFPACK 4.4 was used with MLD2P4. The software was compied with the version 4.3 of the GNU Compier Coection. The tests were performed in doube precision; the execution times were measured using the PSBLAS function psb_wtime. We used Triinos 8..5, instaed in the same environment. 5.1 PDE-constrained optimization probem In the PDE-constrained optimization probem (3), the target function z and the right-hand side g were chosen as foows: z(x, y) = sin(2πx) cos(2πy), g(x, y) = sin(2πx) sin(2πy). The preconditioners were buit using as a primary matrix the one arising from the discretization of the negative Lapacian and were tested varying the reguarization parameter, ν, the number of grid points at the finest eve, N L = n 2, and the number of processors, np. Note that with this choice of the primary matrix the mutieve proongation and restriction operators are independent of ν. 1

11 np BJAC 2LEV 3LEV 4LEV 5LEV ν = ν = ν = Tabe 1: Probem 1 - number of iterations as ν varies, for n = 1 ( indicates that 1 iterations were performed without satisfying the accuracy requirement). np 2LEV 3LEV 4LEV 5LEV n = Tabe 2: Probem 1 - number of iterations for n = 5 and ν = 1 3. Tabe 1 shows the number of BiCGStab iterations for ν = 1 3, 1 5, 1 7 and n = 1. This grid size correspond to a inear system with 3 miion unknowns. We see that, with a the eves, the number of iterations is amost constant with respect to ν and np. Notice that the BJAC preconditioner eads to convergence ony with ν = 1 3. We remark that resuts of further experiments show that our mutieve preconditioners provide the same performance when higher accuracy is required. In particuar, the number of iterations with toerance 1 12 is ower than twice the number of iterations obtained with toerance 1 6. We aso report, in Tabe 2, the iterations obtained by the mutieve preconditioners with grid size n = 5, in the case ν = 1 3. By comparing these data with the corresponding ones for n = 1, we see that the number of iterations is amost independent of the grid size. The same behaviour is obtained with ν = 1 5, 1 7. Thus, the preconditioners exhibit the typica behaviour of mutieve Schwarz methods for scaar probems. Next, we anayse the strong scaing of our MLS preconditioners. In Figure 2 we depict the execution times and the speedup vaues obtained with the different preconditioners for n = 1 11

12 3 time LEV 3LEV 4LEV 5LEV speedup LEV 3LEV 4LEV 5LEV np np Figure 2: Probem 1 - execution time and speedup for n = 1 and ν = 1 7. ν 2LEV 3LEV 4LEV Cassica New Tabe 3: Probem 1 - number of iterations obtained on one processor with the cassica bock version (Cassica) and our version (New) of SMMLS, for n = 5 and ϑ = 1 2 ( indicates that 2 iterations were performed without satisfying the accuracy requirement). and ν = 1 7 (simiar vaues were obtained with ν = 1 3, 1 5 ). The CPU times incude both the setup of the preconditioner and the soution of the inear system. We note that, on a singe processor, arger grid sizes (e.g., n = 2) resuted in too high memory requirements, yieding either out-of-memory fauts in UMFPACK or huge execution times, which do not aow a fair evauation of strong scaing. Therefore, we do not consider arger probems here. Figure 2 shows that 5LEV achieves the best performance in terms of both time and speedup; in particuar, the speedup on 32 processors is about 23, which can be considered satisfactory. As expected, 2LEV resuts much more expensive than the other preconditioners, because of the cost of repicating, factorizing and soving a arge coarse-eve system. It aso appears that the BJAC preconditioner is much ess efficient than the mutieve preconditioners with at east 3 eves. In another series of experiments, we compared our preconditioners with the corresponding Triinos/ML preconditioners, impementing the cassica bock approach to systems of PDEs. The experiments were performed using a singe processor, since we are interested in evauating our coarsening approach against the cassica one rather than our parae impementation against the Triinos one. We considered n = 5, from 2 to 4 eves, and different vaues of the aggregation threshod, i.e., ϑ =, 1 1, 1 2, 1 3, for both Triinos and our preconditioners, in order to make a fair comparison. In Tabe 3 we report resuts obtained with ϑ = 1 2 ; the behaviour of the preconditioners with the other vaues of ϑ is simiar. We see that with ν = 1 3 the 12

13 np BJAC 3LEV 4LEV 5LEV ε = ε = ε = Tabe 4: Probem 2 - number of iterations as ε varies, for n = 1 ( indicates that maxit was achieved without satisfying the accuracy requirement). two preconditioners are comparabe. However, as the vaue of ν decreases, the cassica bock preconditioner oses its effectiveness when more than two eves are used, and generay does not ead to convergence, whie our preconditioners ead to convergence with a modest number of iterations. 5.2 Anisotropic diffusion-reaction probem We discuss the resuts obtained on the anisotropic diffusion-reaction probem (5) using ε = 1 i, i =, 2, 4. Our purpose is to evauate the behaviour of our preconditioners as the difference among the differentia operators, due to the different anisotropies, increases. In this case we chose as a primary matrix the one arising from the discretization of the negative Lapacian, i.e., the matrix corresponding to the isotropic differentia operator in the PDE system (5). This makes the mutieve transfer operators independent of ε, as for probem 1. We set n = 1. In Tabe 4 we report the number of iterations of the CG sover preconditioned with BJAC and SMMLS. As expected, with the SMMLS preconditioner the number of CG iterations generay increases as ε decreases. However, this number is aways much smaer than with BJAC, yieding a smaer execution time for a the vaues of ε, as shown in Figure 3 (eft). For ε = 1, 1 2 the number of iterations is amost constant as the number of processors varies; conversey, for ε = 1 4 a growth of the number of iterations is aready visibe on 8 processors. We note that the significant reduction of the number of iterations observed on one processor for ε = 1 4 is a consequence of the greater effectiveness of the basic Schwarz preconditioner within our mutieve framework (the bock-jacobi method reduces to an incompete factorization when one processor is considered). The speedups corresponding to the anisotropic probem with ε = 1 2 are potted in Figure 3 (right). We see that a satisfactory speedup is obtained with both the 4LEV and 5LEV preconditioners. 13

14 1 3 BJAC 3LEV 4LEV 5LEV 3 25 BJAC 3LEV 4LEV 5LEV time 1 2 speedup np np Figure 3: Probem 2 - execution time and speedup for ε = 1 2 and n = 1. ε 3LEV 4LEV 5LEV Cassica New Tabe 5: Probem 2 - number of iterations obtained on one processor with the cassica bock version (Cassica) and our version (New) of SMMLS, for n = 5 and ϑ = 1 2. We concude this section comparing our preconditioners with the Triinos/ML preconditioners using the cassica bock approach. The experiments were performed on one processor, with n = 5 and ϑ =, 1 1, 1 2, 1 3. In Tabe 5 we report the number of iterations obtained with ϑ = 1 2, which generay produced the better resuts for amost a the vaues of ε. We see that for ε = 1 2 the cassica bock approach is more effective; nevertheess, the number of iterations obtained with our preconditioners is fairy good, especiay if we take into account the ower computationa cost that resuts from buiding the proongation and restriction operators by using a primary matrix associated with a singe PDE. As previousy noted, for ε = 1 4 the resuts obtained by our preconditioner on a singe processor are not representative of its genera behaviour. 6 Convergence anaysis In this section, we discuss the convergence properties of the AMLS preconditioners presented in Section 3. Specificay, we show how to extend cassica convergence resuts of additive mutieve Schwarz preconditioners to our schemes (see, e.g., [8, 27, 33]). For each eve, et W be the vector space of the grid functions u associated with the index space V, and et W,j, j = 1,..., t, be the vector spaces of the grid functions u,j associated with the index sets V,j of the overapping partition of V used by the AS method. We denote with 14

15 (, ) the discrete L 2 scaar product either in W or W,j and by = (, ) the corresponding discrete L 2 -norms; furthermore, we denote with A ˆ the energy norm associated with the matrix Aˆ in (8). Anaogousy, we denote with W the vector space of the grid functions u associated with the index set V corresponding to the primary matrix A at eve, and with W,j, j = 1,..., t, the vector spaces associated with the index sets V,j, a endowed with the corresponding L 2 scaar products, denoted again by (, ). We note that W can be regarded as the Cartesian product of n e spaces W (i), each isomorphic to W ; anaogousy, W,j can be regarded as the Cartesian product of n e spaces W (i),j isomorphic to W,j. Therefore, we can identify each space W (i) with W and each space W (i),j with W,j. This aows to express the L 2 scaar product of u, v W in terms of the scaar products of their components u (i), w (i), i = 1,..., n e, as foows ( (u, v ) = u (i) ( ), w (i) ) + + u (n e), w (n e). Simiary, the L 2 scaar product in W,j can be written in terms of the scaar product in W,j. The mutieve restriction and proongation operators defined in section 3 are such that ˆR : W W 1, ˆP : W 1 W and ( ˆR u, v 1 ) 1 = (u, ˆP v 1 ), (18) for a u W and v 1 W 1. Simiar resuts hod for the restriction and proongation operators ˆR,j and ˆP,j of the AS method at each eve. Furthermore, we require that the operator S,j : W,j W,j that approximates the inverse of A,j be symmetric positive definite for a j, athough A,j may not have this property. Indeed, whie this assumption appears restrictive, it greaty faciitates the anaysis. On the other hand, it is possibe to construct smoothers that are symmetric and the practice shows agreement with our theoretica statements aso when using nonsymmetric smoothers. By substituting A 1,j with S,j, the AMLS preconditioner in (16) takes the foowing form M = L t P,j S,j R,j, = where we negect the superscript AMLS in M for brevity. Thus, etting we get E,j = S,j R,j A, (19) MA = L t P,j E,j. = For simpicity, we consider the twoeve preconditioner. In this case it foows from (15) that P,j = ˆP,j and R,j = ˆR,j ; furthermore, we can convenienty omit the index and identify the coarse eve with j =. Therefore, in the twoeve case M and MA can be written as M = ˆP j S j ˆR j, MA = ˆP j E j. (2) We aso reca that A = ˆ A + B and suppose that ˆ A is symmetric positive definite. 15

16 We first discuss the convergence properties of the preconditioner M in the case where A is symmetric positive definite. We foow the same ine of reasoning as [8], but we make sighty different assumptions that take into account the roe of Aˆ in the costruction of M. These assumptions are specified next: (A.1) there exists a constant ξ > such that for a u W we can find u j W j, j =,..., t, satisfying u = t ˆP j u j and ( S 1 j u j, u j ) < ξ ( ˆ A u, u ) ; (A.2) there exists a constant ω > such that for a u j W j, j =,..., t, we have ( ) ) A ˆ ˆP j u j, ˆP j u j ω (S 1 j u j, u j ; (A.3) there exists a constant α > such that, for a u W and u j W j, j = 1,..., t, we have ( ) ( Au, ˆ ˆP j u j α ˆ Au, u ) ( ) A ˆ ˆP j u j, ˆP j u j (A.4) for any operator H such that ( ˆ Au, Hu) > for a nonzero u W, there exists a constant µ (, 1) such that for a u W, we have (Bu, Hu) µ ( ˆ Au, Hu); (21) furthermore, there exists a constant γ > such that for a u, v W, we have (Bu, v) γ( ˆ Au, u) ( ˆ Av, v). A the constants ξ, ω, α, µ and γ are assumed to be independent of t. Furthermore, without oss of generaity, we consider ξ, ω, α 1. Assumption (A.1) means that any function in the soution space W can be decomposed into a sum of functions in W j and this decomposition is stabe with respect to the energy norm induced by A. ˆ Assumption (A.4) means that B can be regarded as a sma perturbation of A; ˆ in particuar, inequaity (21) impies that (1 µ)( ˆ Au, Hu) (Au, Hu) (1 + µ)( ˆ Au, Hu). (22) Assumptions (A.2) and (A.4) ensure that the argest eigenvaue of S j A j is bounded by ω, i.e., S j does not provide a too bad approximation to A 1 j. Finay, (A.3) requires that the overapping of the spaces W j is bounded in terms of the energy norm induced by A. ˆ The foowing theorem provides a bound on the condition number of MA. Theorem 1 Let A be symmetric positive definite. Under the assumptions (A.1)-(A.4), with MA satisfying the assumptions for H in (A.4), it hods: ; κ(ma) ωξ(1 + α ) 2 (1 + µ) 3 (1 µ) 3, where κ(ma) is the spectra condition number of MA. 16

17 Proof. Foowing [8], we first estimate the smaest eigenvaue of MA. By using assumptions (A.1) and (A.4), as we as (18), (19) and (2), we get ( ˆ Au, u) = ( ˆ Au, = = ˆP j u j ) = (Au, ( ˆR j Au, u j ) (S 1 j (S 1 E j u, u j ) j u j, u j ) ξ ( Au, ˆ u) ( ˆR j Bu, u j ) ( ˆR j Bu, u j ) ˆP j u j ) (Bu, (S 1 ( ˆR j Au, E j u) j E j u, E j u) = ξ ( ˆ Au, u) (Au, MAu) + µ( ˆ Au, u). ˆP j u j ) + µ( ˆ Au, u) (Bu, u) It foows that and thus, by (22), we have (1 µ) 2 ( ˆ Au, u) ξ(au, MAu) (23) (Au, u) ξ(1 + µ) (Au, MAu), (24) (1 µ) 2 which gives the foowing estimate of the smaest eigenvaue of the preconditioned matrix: λ min (MA) (1 µ)2 ξ(1 + µ). (25) Next, we provide an upper bound on the argest eigenvaue of MA. We first observe that, by assumption (A.3), we have ( Au, ˆ ˆP j u j ) = ( Au, ˆ ˆP u ) + ( Au, ˆ ˆP j u j ) ( Au, ˆ u) ( Aˆ ˆP u, ˆP u ) + α ( Au, ˆ u) (1 + α ) ( Au, ˆ u) ( Aˆ ˆP j u j, ˆP j u j ). ( Aˆ ˆP j u j, ˆP j u j ) 17

18 Then, by expoiting assumptions (A.2) and (A.4), and equaities (18), (19) and (2), we get ( ˆ Au, MAu) = By (22) it foows that ( Au, ˆ ˆP j E j u) (1 + α ) ( Au, ˆ u) ( Aˆ ˆP j E j u, ˆP j E j u) ω (1 + α ) ( Au, ˆ u) = ω (1 + α ) ( Au, ˆ u) (S 1 j E j u, E j u) (Au, ˆP j E j u) = ω (1 + α ) ( ˆ Au, u) (Au, MAu). (Au, MAu) ω(1 + α ) 2 (1 + µ) 2 (Au, u) 1 µ and hence ω(1 + α ) 2 (1 + µ) 2 λ max (MA). 1 µ The previous inequaity, together with (25), yieds the thesis. In the genera case of A being nonsymmetric, one can appy a nonsymmetric Kryov sover, such as GMRES or BiCGStab, to sove the preconditioned system MAu = Mf. In particuar, for the GMRES method we have the foowing convergence estimate [14]: ( ) m r m 2ˆ A 1 β2 1 β 2 r 2 A ˆ, 2 where r m = Mf MAu m is the residua corresponding to the mth iterate, and β 1 and β 2 are defined as foows: ( Au, ˆ MAu) MAu A ˆ β 1 = min u = u 2ˆ, β 2 = max. u = u ˆ A A Notice that β 1 is required to be positive to get convergence of the preconditioned GMRES method [14]. Furthermore, from the Cauchy-Schwarz inequaity we have β 1 /β 2 1. To provide a ower bound on β 1 /β 2, we make a further assumption: (A.5) there exists a constant α 1 > such that, for any u W and u j W j, j = 1,..., t, it is (Bu, ˆP j u j ) α 1 ( Au, ˆ u) ( Aˆ ˆP j u j, ˆP j u j ). (26) 18

19 This means that B is bounded by ˆ A in some sense. As the costants in (A.1)-(A.4), α 1 is assumed to be independent of t. The foowing theorem hods: Theorem 2 Under the assumptions (A.1)-(A.5), with MA satisfying the assumptions for H in (A.4), β 1 (1 µ) 2 β ω ξ α (1 + µ)(1 + γ + α + α 1 ) >. Proof. From estimate (23), which hods because by hypothesis ( Au, ˆ MAu) > for u =, and from (22) it foows that (1 µ)2 β 1 ξ(1 + µ). (27) Next, we estimate β 2 foowing the procedure iustrated in [8]. Consider any u W and et w = t ˆP j E j u. By using assumption (A.3), we have It foows that w 2ˆ A = ( Aw, ˆ ˆP j E j u) = w ˆ ( Aw, ˆ ˆP j E j u) α w ˆ A α ˆP j E j u 2ˆ A ˆP j E j u 2ˆ By using the previous inequaity, the expression of MA in (2) and assumption (A.2), we get A. A. MAu 2ˆ A = ˆP j E j u 2ˆ A 2 ˆP E u 2ˆ A ˆP E u 2ˆ A + 2α ˆP j E j u 2ˆ A 2ωα ˆP j E j u 2ˆ A (S 1 j E j u, E j u). (28) By using assumptions (A.2), (A.3) and (A.4), as we as equaities (18) and (19), and inequa- 19

20 ity (26), we obtain = (S 1 j E j u, E j u) = ( Au, ˆ ˆP j E j u) + = ( ˆ Au, ˆP E u) + ( ˆR j Au, E j u) = (Bu, ˆP j E j u) (Au, ˆP j E j u) ( Au, ˆ ˆP j E j u) + (Bu, ˆP E u) + (1 + γ) u A ˆ ˆP E u A ˆ + (α + α 1 ) u A ˆ 2 (1 + γ + α + α 1 ) u ˆ A ˆP j E j u 2ˆ 2ω (1 + γ + α + α 1 ) u ˆ Therefore, we get the foowing bound: By using (28), we have and hence A A (S 1 (Bu, ˆP j E j u) ˆP j E j u 2ˆ j E j u, E j u) (S 1 j E j u, E j u) 4ω (1 + γ + α + α 1 ) 2 u 2ˆ A MAu 2ˆ A 8ω2 α (1 + γ + α + α 1 ) 2 u 2ˆ A β 2 2 2ωα (1 + γ + α + α 1 ). The thesis foows from the previous bound and (27). Ceary, in specific appications we must verify that assumptions (A.1)-(A.5) hod. To this aim, the bock-diagona form of Aˆ and ˆP (see (8), (9) and (13)) and the fact that the scaar products in W and in W j, j = 1,..., t, can be expressed in terms of the scaar products in the corresponding spaces associated with the primary matrix (i.e., W and W j ) may be of great hep. In particuar, we can expoit resuts concerning AMLS methods for scaar eiptic PDE probems (see, e.g., [4, 8]).. A 7 Concusions A point-based approach for buiding agebraic mutieve Schwarz preconditioners based on smoothed aggregation was discussed. These preconditioners were designed for soving agebraic probems arising from the discretization of a cass of systems of eiptic PDEs with zeroorder couping terms. In this approach, the restriction and proongation operators associated with the mutieve hierarchy and with the Schwarz matrix partitioning were defined based on a 2

21 primary matrix corresponding to the second-order differentia operator of a singe PDE. Further, a parae impementation of the proposed preconditioners was iustrated. Numerica experiments with eiptic optimaity systems and with muti-species anisotropic diffusion-reaction modes showed the effectiveness and the parae efficiency of the proposed preconditioners. Convergence resuts for the twoeve version of the proposed additive Schwarz preconditioner were presented. References [1] P. Bastian, W. Hackbusch, and G. Wittum, Additive and Mutipicative Mutigrid A Comparison, Computing, 6 (1998), pp [2] A. Borzì and G. Borzì, An agebraic mutigrid method for a cass of eiptic differentia systems, SIAM J. Sci. Comput., 25 (23), pp [3] A. Brandt, Mutigrid techniques: 1984 guide with appications to fuid dynamics, vo. 85 of GMD-Studien, Geseschaft für Mathematik und Datenverarbeitung mbh, St. Augustin, [4] M. Brezina and P. Vaněk, A back-box iterative sover based on a two-eve Schwarz method, Computing, 63 (1999), pp [5] M. Brezina, P. Vaněk, and P. S. Vassievski, An improved convergence anaysis of smoothed aggregation agebraic mutigrid, Numer. Linear Agebra App., 19 (212), pp [6] A. Buttari, P. D Ambra, D. di Serafino, and S. Fiippone, 2LEV-D2P4: a package of highperformance preconditioners for scientific and engineering appications, App. Agebra Engrg. Comm. Comput., 18 (27), pp [7] X.-C. Cai and Y. Saad, Overapping domain decomposition agorithms for genera sparse matrices, Numer. Linear Agebra App., 3 (1996), pp [8] T. F. Chan and J. Zou, A convergence theory of mutieve additive Schwarz methods on unstructured meshes, Numer. Agorithms, 13 (1996), pp [9] P. D Ambra, D. di Serafino, and S. Fiippone, On the deveopment of PSBLAS-based parae two-eve Schwarz preconditioners, App. Numer. Math., 57 (27), pp [1] P. D Ambra, D. di Serafino, and S. Fiippone, Performance anaysis of parae Schwarz preconditioners in the LES of turbuent channe fows, Comput. Math. App., 65 (213), pp [11] P. D Ambra, D. di Serafino, and S. Fiippone, MLD2P4: a package of parae agebraic mutieve domain decomposition preconditioners in Fortran 95, ACM Trans. Math. Software, 37 (21), art. 3, 23 pages. [12] T. A. Davis, Agorithm 832: UMFPACK V4.3 an unsymmetric-pattern mutifronta method, ACM Trans. Math. Software, 3 (24), pp [13] J. Dongarra and R. C. Whaey, A user s guide to the BLACS v1.1, Lapack Working Note 94, Tech. Report UT-CS , University of Tennessee, March 1995 (updated May 1997). 21

22 [14] S. Eisenstat, H. Eman, and M. Schutz, Variationa iterative methods for nonsymmetric systems of inear equations, SIAM J. Numer. Ana., 2 (1983), pp [15] R. D. Fagout, An introduction to agebraic mutigrid, Computing in Science and Engineering, 8 (26), pp [16] S. Fiippone and A. Buttari, PSBLAS: User s and Reference Guide, Avaiabe from [17] S. Fiippone and M. Coajanni, PSBLAS: a ibrary for parae inear agebra computation on sparse matrices, ACM Trans. Math. Software, 26 (2), pp [18] T. Füenbach and K. Stüben, Agebraic mutigrid for seected PDE systems, in Eiptic and paraboic probems (Roduc/Gaeta, 21), Word Sci. Pub., River Edge, NJ, 22, pp [19] M. W. Gee, C. M. Siefert, J. J. Hu, R. S. Tuminaro, and M. G. Saa, ML 5. smoothed aggregation user s guide, Tech. Report SAND , Sandia Nationa Laboratories, Abuquerque, NM, and Livermore, CA, USA, 26. [2] M. Griebe, D. Oetz, and M. A. Schweitzer, An agebraic mutigrid method for inear easticity, SIAM J. Sci. Comput., 25 (23), pp [21] M. A. Heroux, R. A. Bartett, V. E. Howe, R. J. Hoekstra, J. J. Hu, T. G. Koda, R. B. Lehoucq, K. R. Long, R. P. Pawowski, E. T. Phipps, A. G. Sainger, H. K. Thornquist, R. S. Tuminaro, J. M. Wienbring, A. Wiiams, and K. S. Staney, An overview of the Triinos Project, ACM Trans. Math. Soft., 31 (25), pp [22] D. Lahaye, H. De Gersen, S. Vandewae, and K. Hameyer, Agebraic mutigrid for compex symmetric systems, IEEE Trans. on Magnetics, 36 (2), pp [23] J.-L. Lions, Optima contro of systems governed by partia differentia equations., Springer- Verag, New York, [24] J. Mande, M. Brezina, and P. Vaněk, Energy optimization of agebraic mutigrid bases, Computing, 62 (1999), pp [25] F. Rothe, Goba Soutions of Reaction-Diffusion Systems, vo. 172 of Lecture Notes in Mathematics, Springer-Verag, Berin, [26] J. Ruge and K. Stüben, Agebraic mutigrid (AMG), in Mutigrid methods, S. F. McCormick, ed., vo. 3 of Frontiers in Appied Mathematics, SIAM, [27] B. F. Smith, P. E. Bjørstad, and W. D. Gropp, Domain Decomposition, Cambridge University Press, Cambridge, [28] M. Snir, S. Otto, S. Huss-Lederman, D. W. Waker, and J. J. Dongarra, MPI: The Compete Reference. Vo. 1 The MPI Core, Scientific and Engineering Computation, The MIT Press, Cambridge, MA, second ed., [29] K. Stüben, An introduction to agebraic mutigrid, in Mutigrid, U. Trottenberg, C. Oosteree, and A. Schüer, eds., Academic Press, 21, pp [3] A. Tosei and O. Widund, Domain Decomposition Methods - Agorithms and Theory, Springer,

Smoothers for ecient multigrid methods in IGA

Smoothers for ecient multigrid methods in IGA Smoothers for ecient mutigrid methods in IGA Cemens Hofreither, Stefan Takacs, Water Zuehner DD23, Juy 2015 supported by The work was funded by the Austrian Science Fund (FWF): NFN S117 (rst and third