Parallel sparse linear solvers and applications in CFD

Size: px

Start display at page:

Download "Parallel sparse linear solvers and applications in CFD"

Tracy Day
5 years ago
Views:

1 Parallel sparse linear solvers and applications in CFD Jocelyne Erhel Joint work with Désiré Nuentsa Wakam () and Baptiste Poirriez () SAGE team, Inria Rennes, France journée Calcul Intensif Distribué dans l Industrie, Université Paris 13, 22 janvier / 21

2 Outline Solver interface Interface to direct and iterative solvers: MUMPS, SuperLU Dist, Hypre, Petsc, parms,etc SLSI [Nuentsa Wakam et al 2010] available on demand System solver in H2OLab platform [Erhel et al 2009] Application to CFD problems (m): a Krylov method for general matrices combining Domain Decomposition and deflation : a Krylov method for SPD matrices combining Domain Decomposition and deflation 2 / 21

3 Preconditioned Ax = b, A R n n x, b R n B = AM 1 (m): a Krylov subspace method [Saad and Schultz 1986, Meurant s book 1999, Saad s book 2003, Simoncini and Szyld 2007, Erhel 2011,...] Fix x 0, then r 0 = b Ax 0 K m(b, r 0) = span{r 0, Br 0,... B m 1 r 0} D A Find x m x 0 + K m(b, r 0) such that r m 2 = b Bx m 2 = min u x0 +Km(B,r 0 ) b Bu 2 Building blocks of Initial step: choose x 0, compute r 0 First step: generate an orthonormal basis V m+1 = [v 0,..., v m] of K m+1(b, r 0) such that v 0 = r 0/β, β = r 0 2, BV m = V m+1 H m Second step: approximate solution x m = x 0 + M 1 V my m r m = r 0 BV my m = V m+1(βe 1 H my m) y m = min y R m βe 1 H my 2 3 / 21

4 ... practical Arnoldi process 1: v 0 = r 0 / r 0 2 2: for k = 0,... do 3: p = Bv k 4: for i = 1 : k do 5: h ik = vi T p 6: p = p h ik v i 7: end for 8: h k+1,k = p 2 9: v k+1 = p/h k+1,k 10: end for BV m = V m+1 H m Granularity in parallel algorithms Communication-avoiding strategies Generate the basis vectors [Reichel 1990, Bai et al 1994] Orthogonalize the basis [De Sturler 1994, Erhel 1995, Sidje 1997] Improve the strategy [Hoemmen 2010, Demmel et al 2011] Preconditioning use multilevel methods to deal with large systems Schwarz preconditioning [Atenekeng Kahou et al 2007, Dufaud+Tromeur-Dervout 2010, Giraud+Haidar 2009, Smith et al s book 1996,...] Filtering and Schur complement [Li et al 2003, Grigori et al 2011] Multilevel parallelism [Nuentsa Wakam et al 2011, Giraud et al 2010,...] Complexity and stagnation with restarted (m) Use deflation to recover possible loss of information Deflation by preconditioning [Erhel et al 1996, Burrage et al 1998, Baglama et al 1998,...] Deflation by augmented basis [Morgan 1995, Morgan 2002,...] D A Work in the team SAGE Combine communication-avoiding... and Deflation... and domain decomposition preconditioners [Nuentsa Wakam 2011, Nuentsa Wakam+Erhel+Gropp 2013, Nuentsa Wakam+Pacull 2013, Nuentsa Wakam+Erhel 2014] 4 / 21

5 ... practical Arnoldi process 1: v 0 = r 0 / r 0 2 2: for k = 0,... do 3: p = Bv k 4: for i = 1 : k do 5: h ik = vi T p 6: p = p h ik v i 7: end for 8: h k+1,k = p 2 9: v k+1 = p/h k+1,k 10: end for BV m = V m+1 H m Granularity in parallel algorithms Communication-avoiding strategies Generate the basis vectors [Reichel 1990, Bai et al 1994] Orthogonalize the basis [De Sturler 1994, Erhel 1995, Sidje 1997] Improve the strategy [Hoemmen 2010, Demmel et al 2011] Preconditioning use multilevel methods to deal with large systems Schwarz preconditioning [Atenekeng Kahou et al 2007, Dufaud+Tromeur-Dervout 2010, Giraud+Haidar 2009, Smith et al s book 1996,...] Filtering and Schur complement [Li et al 2003, Grigori et al 2011] Multilevel parallelism [Nuentsa Wakam et al 2011, Giraud et al 2010,...] Complexity and stagnation with restarted (m) Use deflation to recover possible loss of information Deflation by preconditioning [Erhel et al 1996, Burrage et al 1998, Baglama et al 1998,...] Deflation by augmented basis [Morgan 1995, Morgan 2002,...] D A Work in the team SAGE Combine communication-avoiding... and Deflation... and domain decomposition preconditioners [Nuentsa Wakam 2011, Nuentsa Wakam+Erhel+Gropp 2013, Nuentsa Wakam+Pacull 2013, Nuentsa Wakam+Erhel 2014] 4 / 21

6 ... practical Arnoldi process 1: v 0 = r 0 / r 0 2 2: for k = 0,... do 3: p = Bv k 4: for i = 1 : k do 5: h ik = vi T p 6: p = p h ik v i 7: end for 8: h k+1,k = p 2 9: v k+1 = p/h k+1,k 10: end for BV m = V m+1 H m Granularity in parallel algorithms Communication-avoiding strategies Generate the basis vectors [Reichel 1990, Bai et al 1994] Orthogonalize the basis [De Sturler 1994, Erhel 1995, Sidje 1997] Improve the strategy [Hoemmen 2010, Demmel et al 2011] Preconditioning use multilevel methods to deal with large systems Schwarz preconditioning [Atenekeng Kahou et al 2007, Dufaud+Tromeur-Dervout 2010, Giraud+Haidar 2009, Smith et al s book 1996,...] Filtering and Schur complement [Li et al 2003, Grigori et al 2011] Multilevel parallelism [Nuentsa Wakam et al 2011, Giraud et al 2010,...] Complexity and stagnation with restarted (m) Use deflation to recover possible loss of information Deflation by preconditioning [Erhel et al 1996, Burrage et al 1998, Baglama et al 1998,...] Deflation by augmented basis [Morgan 1995, Morgan 2002,...] D A Work in the team SAGE Combine communication-avoiding... and Deflation... and domain decomposition preconditioners [Nuentsa Wakam 2011, Nuentsa Wakam+Erhel+Gropp 2013, Nuentsa Wakam+Pacull 2013, Nuentsa Wakam+Erhel 2014] 4 / 21

7 ... practical Arnoldi process 1: v 0 = r 0 / r 0 2 2: for k = 0,... do 3: p = Bv k 4: for i = 1 : k do 5: h ik = vi T p 6: p = p h ik v i 7: end for 8: h k+1,k = p 2 9: v k+1 = p/h k+1,k 10: end for BV m = V m+1 H m Granularity in parallel algorithms Communication-avoiding strategies Generate the basis vectors [Reichel 1990, Bai et al 1994] Orthogonalize the basis [De Sturler 1994, Erhel 1995, Sidje 1997] Improve the strategy [Hoemmen 2010, Demmel et al 2011] Preconditioning use multilevel methods to deal with large systems Schwarz preconditioning [Atenekeng Kahou et al 2007, Dufaud+Tromeur-Dervout 2010, Giraud+Haidar 2009, Smith et al s book 1996,...] Filtering and Schur complement [Li et al 2003, Grigori et al 2011] Multilevel parallelism [Nuentsa Wakam et al 2011, Giraud et al 2010,...] Complexity and stagnation with restarted (m) Use deflation to recover possible loss of information Deflation by preconditioning [Erhel et al 1996, Burrage et al 1998, Baglama et al 1998,...] Deflation by augmented basis [Morgan 1995, Morgan 2002,...] D A Work in the team SAGE Combine communication-avoiding... and Deflation... and domain decomposition preconditioners [Nuentsa Wakam 2011, Nuentsa Wakam+Erhel+Gropp 2013, Nuentsa Wakam+Pacull 2013, Nuentsa Wakam+Erhel 2014] 4 / 21

8 ... practical Arnoldi process 1: v 0 = r 0 / r 0 2 2: for k = 0,... do 3: p = Bv k 4: for i = 1 : k do 5: h ik = vi T p 6: p = p h ik v i 7: end for 8: h k+1,k = p 2 9: v k+1 = p/h k+1,k 10: end for BV m = V m+1 H m Granularity in parallel algorithms Communication-avoiding strategies Generate the basis vectors [Reichel 1990, Bai et al 1994] Orthogonalize the basis [De Sturler 1994, Erhel 1995, Sidje 1997] Improve the strategy [Hoemmen 2010, Demmel et al 2011] Preconditioning use multilevel methods to deal with large systems Schwarz preconditioning [Atenekeng Kahou et al 2007, Dufaud+Tromeur-Dervout 2010, Giraud+Haidar 2009, Smith et al s book 1996,...] Filtering and Schur complement [Li et al 2003, Grigori et al 2011] Multilevel parallelism [Nuentsa Wakam et al 2011, Giraud et al 2010,...] Complexity and stagnation with restarted (m) Use deflation to recover possible loss of information Deflation by preconditioning [Erhel et al 1996, Burrage et al 1998, Baglama et al 1998,...] Deflation by augmented basis [Morgan 1995, Morgan 2002,...] D A Work in the team SAGE Combine communication-avoiding... and Deflation... and domain decomposition preconditioners [Nuentsa Wakam 2011, Nuentsa Wakam+Erhel+Gropp 2013, Nuentsa Wakam+Pacull 2013, Nuentsa Wakam+Erhel 2014] 4 / 21

9 Communication-avoiding strategy: Newton basis building blocks Initial step: run one cycle of (m) and compute shifts for the Newton basis First step: build a basis K m+1 = [k 0, k 1,..., k m] of the Krylov subspace K m+1 (B, r 0 ) such that BK m = K m+1 T m D A Second step: compute an orthonormal basis of K m+1 (B, r 0 ) Compute the QR factorization K m+1 = V m+1 R m+1 RODDEC [Sidje 1997, Erhel 1995] or TSQR [Demmel et al 2011] BK m = V m+1 R m+1 T m BV m = V m+1 R m+1 T mr 1 m }{{} Hm Third step: approximate solution x m = x 0 + M 1 V my m r m = r 0 BK my m = V m+1 (βe 1 H my m) with β = r 0 2 y m = min y R m βe 1 H my 2 5 / 21

10 Communication-avoiding strategy: Newton basis building blocks Initial step: run one cycle of (m) and compute shifts for the Newton basis First step: build a basis K m+1 = [k 0, k 1,..., k m] of the Krylov subspace K m+1 (B, r 0 ) such that BK m = K m+1 T m D A Second step: compute an orthonormal basis of K m+1 (B, r 0 ) Compute the QR factorization K m+1 = V m+1 R m+1 RODDEC [Sidje 1997, Erhel 1995] or TSQR [Demmel et al 2011] BK m = V m+1 R m+1 T m BV m = V m+1 R m+1 T mr 1 m }{{} Hm Third step: approximate solution x m = x 0 + M 1 V my m r m = r 0 BK my m = V m+1 (βe 1 H my m) with β = r 0 2 y m = min y R m βe 1 H my 2 5 / 21

11 Communication-avoiding strategy: Newton basis building blocks Initial step: run one cycle of (m) and compute shifts for the Newton basis First step: build a basis K m+1 = [k 0, k 1,..., k m] of the Krylov subspace K m+1 (B, r 0 ) such that BK m = K m+1 T m D A Second step: compute an orthonormal basis of K m+1 (B, r 0 ) Compute the QR factorization K m+1 = V m+1 R m+1 RODDEC [Sidje 1997, Erhel 1995] or TSQR [Demmel et al 2011] BK m = V m+1 R m+1 T m BV m = V m+1 R m+1 T mr 1 m }{{} Hm Third step: approximate solution x m = x 0 + M 1 V my m r m = r 0 BK my m = V m+1 (βe 1 H my m) with β = r 0 2 y m = min y R m βe 1 H my 2 5 / 21

12 Communication-avoiding strategy: Newton basis building blocks Initial step: run one cycle of (m) and compute shifts for the Newton basis First step: build a basis K m+1 = [k 0, k 1,..., k m] of the Krylov subspace K m+1 (B, r 0 ) such that BK m = K m+1 T m D A Second step: compute an orthonormal basis of K m+1 (B, r 0 ) Compute the QR factorization K m+1 = V m+1 R m+1 RODDEC [Sidje 1997, Erhel 1995] or TSQR [Demmel et al 2011] BK m = V m+1 R m+1 T m BV m = V m+1 R m+1 T mr 1 m }{{} Hm Third step: approximate solution x m = x 0 + M 1 V my m r m = r 0 BK my m = V m+1 (βe 1 H my m) with β = r 0 2 y m = min y R m βe 1 H my 2 5 / 21

13 combined with Domain Decomposition Main steps with Domain Decomposition preconditioning Partition the weighted graph of the matrix in parallel with PARMETIS. Redistribute the matrix and right-hand-side according to the PARMETIS partitioning. Perform a parallel iterative row and column scaling on the matrix and the right-hand side vector [Amestoy et al, 2008]. Define the overlap between the submatrices for the additive Schwarz preconditioner. D A M 1 RAS = Setup the submatrices (ILU or LU factorization). Solve iteratively the preconditioned system using. D (R 0 k )T (A δ k ) 1 R δ k k=1 6 / 21

14 Deflation strategies Restarted (m) x m = x 0 + M 1 V my m where y m minimizes r m 2 The convergence rate depends on the spectral distribution in B Smallest eigenvalues slow down the convergence Deflation occurs when the Krylov subspace is large enough With restarting : loss of spectral information, risk of stalling D A Accelerating the restarted [Simoncini and Szyld, 2007] Approximate the smallest eigenvalues and the associated invariant subspace U r Explicit deflation technique [Erhel et al 1996; Burrage et al 1998; Moriya et al 2000 ]: B M 1 x = b with M 1 = (I n + U r ( λ n T 1 I r )Ur T and T = Ur T BUr Augmented techniques [Morgan 2000, 2002, Giraud et al, 2010]: x m x 0 + span{u r } + K m(b, r 0 ) 7 / 21

15 Deflation strategies Restarted (m) x m = x 0 + M 1 V my m where y m minimizes r m 2 The convergence rate depends on the spectral distribution in B Smallest eigenvalues slow down the convergence Deflation occurs when the Krylov subspace is large enough With restarting : loss of spectral information, risk of stalling D A Accelerating the restarted [Simoncini and Szyld, 2007] Approximate the smallest eigenvalues and the associated invariant subspace U r Explicit deflation technique [Erhel et al 1996; Burrage et al 1998; Moriya et al 2000 ]: B M 1 x = b with M 1 = (I n + U r ( λ n T 1 I r )Ur T and T = Ur T BUr Augmented techniques [Morgan 2000, 2002, Giraud et al, 2010]: x m x 0 + span{u r } + K m(b, r 0 ) 7 / 21

16 D: with adaptive preconditioning deflation D(m, r) Perform one cycle of restarted (m) and compute shifts for the Newton basis compute U r, a basis of a coarse subspace Build M 1 r I n + U r ( λ n T 1 I r )U T r, T = UT r BUr Apply (m) to B M 1 r At each restart, update r and the basis U r D A Adaptive D(m,r) Detect a potential slow convergence [Sosonkina et al 1998] Switch to D(m,r) only if necessary [Nuentsa Wakam et al 2013] 8 / 21

17 D: with adaptive preconditioning deflation D(m, r) Perform one cycle of restarted (m) and compute shifts for the Newton basis compute U r, a basis of a coarse subspace Build M 1 r I n + U r ( λ n T 1 I r )U T r, T = UT r BUr Apply (m) to B M 1 r At each restart, update r and the basis U r D A Adaptive D(m,r) Detect a potential slow convergence [Sosonkina et al 1998] Switch to D(m,r) only if necessary [Nuentsa Wakam et al 2013] 8 / 21

18 D: with adaptive preconditioning deflation D(m, r) Perform one cycle of restarted (m) and compute shifts for the Newton basis compute U r, a basis of a coarse subspace Build M 1 r I n + U r ( λ n T 1 I r )U T r, T = UT r BUr Apply (m) to B M 1 r At each restart, update r and the basis U r D A Adaptive D(m,r) Detect a potential slow convergence [Sosonkina et al 1998] Switch to D(m,r) only if necessary [Nuentsa Wakam et al 2013] 8 / 21

19 Implementation of Deflated in PETSc New KSP type : D D A Usage in Petsc Available in PETSc Use D just as any other KSP with the following options 9 / 21

20 Experiments with CFD matrices FLUOREM matrices in MatrixMarket collection large, sparse, nonsymmetric matrices linearization of Navier-Stokes: symmetric profile with structured blocks Schwarz preconditioning combined with or D [Nuentsa Wakam+Erhel+Gropp 2013; Nuentsa Wakam+Pacull 2013] D A RM07R n= 381,689; nnz=37,464,962 HV15R n= 2,017,169; nnz=283,073, / 21

Combining D with domain decomposition RM07R HV15R D A CPU time on parallel computers RM07R, n = 381,689, nz = 37,464,962 D FULL- (48) D(32,2,15) ITS Time (s) ITS Time (s) ITS Time (s) r 16 92 214 169

21 Combining D with domain decomposition RM07R HV15R D A CPU time on parallel computers RM07R, n = 381,689, nz = 37,464,962 D FULL- (48) D(32,2,15) ITS Time (s) ITS Time (s) ITS Time (s) r HV15R, n = 2,017,169, nz = 283,073,458 D FULL- (150) D(96,10,70) ITS Time (s) ITS Time (s) ITS Time (s) r (8.4E-08) 3, (6.8E-01) / 21

22 A: deflation with an adaptive augmented basis Building blocks Initial step: run one cycle of (m) and compute shifts for the Newton basis Compute U r = [u 0, u 1,..., u r 1 ] a basis of a coarse subspace First step: build a basis K m+1 = [k 0, k 1,..., k m] of the Krylov subspace K m+1 (B, r 0 ) such that D A BK m = K m+1 T m Define the augmented subspace C s = K m(b, r 0 ) + span{u r } with s = m + r with the basis [ Km U r ] compute BU r = ˆK r D r Define the augmented subspace Ĉ s+1 = K m+1 (B, r 0 ) + span{bu r } with the basis [ Km+1 ˆK r ] 12 / 21

23 A: deflation with an adaptive augmented basis Building blocks Initial step: run one cycle of (m) and compute shifts for the Newton basis Compute U r = [u 0, u 1,..., u r 1 ] a basis of a coarse subspace First step: build a basis K m+1 = [k 0, k 1,..., k m] of the Krylov subspace K m+1 (B, r 0 ) such that D A BK m = K m+1 T m Define the augmented subspace C s = K m(b, r 0 ) + span{u r } with s = m + r with the basis [ Km U r ] compute BU r = ˆK r D r Define the augmented subspace Ĉ s+1 = K m+1 (B, r 0 ) + span{bu r } with the basis [ Km+1 ˆK r ] 12 / 21

24 A: deflation with an adaptive augmented basis Building blocks Second step: Compute an orthonormal basis of Ĉ s+1 QR factorize the augmented basis [ K m+1 ˆK r ] = Vs+1 R s+1 BK m = V m+1 R m+1 T m BV m = V m+1 R m+1 T mr 1 m D A BU r = (V m+1 R m+1,r + V r R r )D r Define the basis W s = [ V m U r ] BW s = V s+1 H s Third step: x s = x 0 + M 1 W s y s r s = r 0 BW s y s = V s+1 (βe 1 H s y s ) and β = r 0 2 y s = min y R s βe 1 H s y 2 Final step: Adaptively update r and the coarse basis U r 13 / 21

25 A: deflation with an adaptive augmented basis Building blocks Second step: Compute an orthonormal basis of Ĉ s+1 QR factorize the augmented basis [ K m+1 ˆK r ] = Vs+1 R s+1 BK m = V m+1 R m+1 T m BV m = V m+1 R m+1 T mr 1 m D A BU r = (V m+1 R m+1,r + V r R r )D r Define the basis W s = [ V m U r ] BW s = V s+1 H s Third step: x s = x 0 + M 1 W s y s r s = r 0 BW s y s = V s+1 (βe 1 H s y s ) and β = r 0 2 y s = min y R s βe 1 H s y 2 Final step: Adaptively update r and the coarse basis U r 13 / 21

26 A: deflation with an adaptive augmented basis Building blocks Second step: Compute an orthonormal basis of Ĉ s+1 QR factorize the augmented basis [ K m+1 ˆK r ] = Vs+1 R s+1 BK m = V m+1 R m+1 T m BV m = V m+1 R m+1 T mr 1 m D A BU r = (V m+1 R m+1,r + V r R r )D r Define the basis W s = [ V m U r ] BW s = V s+1 H s Third step: x s = x 0 + M 1 W s y s r s = r 0 BW s y s = V s+1 (βe 1 H s y s ) and β = r 0 2 y s = min y R s βe 1 H s y 2 Final step: Adaptively update r and the coarse basis U r 13 / 21

27 A in PETSc New KSP type : A figure D A Usage in Petsc Use A just as KSPSetType(ksp, KSPA) or -ksp type agmres, -pc type asm,... Options : -ksp gmres restart m, -ksp agmres eig r, -ksp max its maxits, -ksp agmres smv smv -ksp agmres bgv bgv, / 21

28 Experiments with Augmented RM07R: effect of the restarting RM07R: effect of the number of subdomains RM07R-A(m,2)-(m) RM07R-(m)-A(m,2)-D (32) A(32,2) (48) A(48,2) (32)-D32 A(32,2)-D32 (32)-D64 A(32,2)-D64 D A Rel. Res. Norm e-06 Rel. Res. Norm e-06 1e-08 1e-08 1e e Number of Iterations Number of Iterations CPU time on parallel computers RM07R, n = 381,689, nz = 37,464,962 D (32) A(32,r) ITS Time (s) ITS Time (s) / 21

29 Parallel CPU Time with A Convection-Diffusion test cases 3DCONSKY 121 : size = 1,771,561; nonzeros = 50,178,241 3DCOSKY 161 : size= 4,173,281; nonzeros = 118,645,121 [Nuentsa Wakam+Erhel 2014] D A 3DCONSKY 121 3DCONSKY 161 CPU Time DCONSKY_121 (16) A(16,1) Subdomains CPU Time DCONSKY_161 (16) A(16,1) Subdomains 16 / 21

30 Augmented Conjugate Gradient A Symmetric Positive Definite (SPD) matrix is a Krylov method short recurrences and minimization properties preconditioning M 1 SIDNUR Coarse grid and balancing [Nicolaides 1987, Mandel 1993, DD proceedings, Giraud et al.] Z basis of a coarse subspace restriction of A: nonsingular small matrix A c = Z T AZ projections: P = I AZA 1 c Z T and P T = I ZA 1 c (AZ) T coarse grid: ZA 1 c Z T balancing: C b = P T M 1 P + ZA 1 c Z T Coarse grid and augmented CG [Erhel et al 2000, Saad et al. 2000, Tang et al. 2009, Poirriez 2011, Nataf et al] x 0 = ZA 1 c Z T b C a = P T M 1 C a is equivalent to C b (if no loss of orthogonality) 17 / 21

31 SIDNUR: AugCG and Domain Decomposition Balancing Neumann Neumann applied to a Schur complement Neumann-Neumann preconditioning M 1 Balancing with a coarse grid Z SIDNUR SIDNUR [Poirriez 2011, Pichot et al. 2014] domain decomposition provided by the user coarse grid : signature of subdomains [Frank and Vuik 2001] C++ library mutual factorization of local Schur complements and local matrices management of floating subdomains numerical experiments with 3D fracture networks 18 / 21

32 Experiments with SIDNUR Flow computations in Discrete Fracture Networks random domain generated with MPFRAC software SPD sparse matrix solving with SIDNUR [Poirriez 2011] SIDNUR Number of iterations CPU time S c h u r S c h u r N b o f ite ra tio n s C P U T im e x x x x x x x x x x s y s te m s iz e n 6 x x x x x x x x x x s y s te m s iz e 19 / 21

33 Comparing SIDNUR with other solvers UMFPACK: direct solver Boomer-AMG: algebraic multigrid : Conjugate Gradient preconditioned by Boomer-AMG [Poirriez 2011] SIDNUR CPU time P C G A M G U m f S c h u r re la tiv e C P U tim e 1 0,1 6 x x x x x x x x x x s y s te m s iz e 20 / 21

34 Conclusion D KSP module: deflation in (m) with or without Newton basis A KSP module: augmented Newton basis in (m) Deflation combined with Schwarz domain decomposition preconditioning Robustness: reduce the restarting effects and the domain decomposition effects Efficiency: increase granularity and scalability Numerical experiments with CFD problems: D and A faster than SIDNUR Deflation combined with Schur domain decomposition SIDNUR: Balancing Domain Decomposition Robustness: reduce the domain decomposition effects Efficiency: parallel Schur and Neumann Neumann computations Numerical experiments with 3D fracture networks: faster than multigrid and Library soon available as free software 21 / 21

C. Vuik 1 R. Nabben 2 J.M. Tang 1

Deflation acceleration of block ILU preconditioned Krylov methods C. Vuik 1 R. Nabben 2 J.M. Tang 1 1 Delft University of Technology J.M. Burgerscentrum 2 Technical University Berlin Institut für Mathematik