Multipréconditionnement adaptatif pour les méthodes de décomposition de domaine. Nicole Spillane (CNRS, CMAP, École Polytechnique)

Multipréconditionnement adaptatif pour les méthodes de décomposition de domaine Nicole Spillane (CNRS, CMAP, École Polytechnique) C. Bovet (ONERA), P. Gosselet (ENS Cachan), A. Parret Fréaud (SafranTech), D.J. Rixen (TU Munich), F.-X. Roux (ONERA) 17 novembre 2017 Séminaire du LJLL

Find u Rn s.t. Ku = f where I K is symmetric positive definite (spd) X I K= Kτ {τ Th } I All Kτ are symmetric positive semi-definite (spsd). Nicole Spillane (CNRS,CMAP) 2/29

Nicole Spillane (CNRS,CMAP) 3/29 Two observations about parallel linear solvers 1. Direct solvers are robust (but require a lot of memory). e.g., Cholesky factorization w/ forward backward substitutions K = L L u = L \ 2. Iterative solvers are naturally parallel (but not robust). e.g., Preconditioned Conjugate Gradient L f \ Error Bound (log scale) 10 1 10 1 10 3 10 5 10 7 κ = 10 κ = 100 κ = 1000 [ ] m u u m A κ 1 2, κ = λ max. u u 0 A κ + 1 λ min 10 9 0 20 40 60 80 100 Iterations

Nicole Spillane (CNRS,CMAP) 3/29 Two observations about parallel linear solvers 1. Direct solvers are robust (but require a lot of memory). e.g., Cholesky factorization w/ forward backward substitutions K = L L u = L \ 2. Iterative solvers are naturally parallel (but not robust). e.g., Preconditioned Conjugate Gradient L f \ Error Bound (log scale) 10 1 10 1 10 3 10 5 10 7 10 9 0 20 40 60 80 100 Iterations κ = 10 κ = 100 κ = 1000 [ ] m u u m A κ 1 2, κ = λ max. u u 0 A κ + 1 λ min Hybrid Direct/Iterative solvers should be both naturally parallel and robust e.g., Domain Decomposition

1 Balancing Domain Decomposition 2 GenEO Coarse Space 3 Multipreconditioned BDD 4 Adaptive Multipreconditioned BDD Balancing Domain Decomposition

Balancing Domain Decomposition (BDD) : Ku = f (K spd) Global Formulation Ω Γ Ω s Γ s J. Mandel. Balancing domain decomposition. Comm. Numer. Methods Engrg., 9(3) :233 241, 1993. ( ) ( ) KΓΓ K ΓI u,γ = K IΓ BDD is a hybrid solver : K II It solves Au,Γ = b with PCG. u,i ( fγ f I Ku = f ) (boundaries) (interiors) u,i = K 1 II (f I K IΓ u,γ ) and (K ΓΓ K ΓI K 1 II K IΓ ) u,γ = f Γ K ΓI K 1 II f I. }{{}}{{} :=A :=b Within each iteration, a direct solve is performed : K 1 II. Nicole Spillane (CNRS,CMAP) 4/29

Nicole Spillane (CNRS,CMAP) 5/29 Balancing Domain Decomposition (BDD) : Ku = f (K spd) Parallel Formulation Ω Γ Ω s Γ s A = K ΓΓ K ΓI K 1 II K IΓ ; K II = K 1 I 1I 1... K N I N I N.

Nicole Spillane (CNRS,CMAP) 5/29 Balancing Domain Decomposition (BDD) : Ku = f (K spd) Parallel Formulation Ω Γ Ω s Γ s A = K ΓΓ K ΓI K 1 II K IΓ ; K II = K 1 I 1I 1... K N I N I N. The operator A is a sum of local contributions : S s := K s Γ N s Γ K s s Γ s I (K s s I s I s) 1 K s I s Γ s A = R s S s R s R s s=1 Γ s Γ R s

Balancing Domain Decomposition (BDD) : Ku = f (K spd) Parallel Formulation Ω Γ Ω s Γ s Good preconditioner : good approximation of A 1, cheap to compute. A = K ΓΓ K ΓI K 1 II K IΓ ; K II = K 1 I 1I 1... K N I N I N. The operator A is a sum of local contributions : S s := K s Γ N s Γ K s s Γ s I (K s s I s I s) 1 K s I s Γ s A = R s S s R s R s s=1 Γ s Γ R s The preconditioner H also : N H := R s D s S s 1 D s R s, with s=1 N R s D s R s = I. Nicole Spillane (CNRS,CMAP) 5/29 s=1

Nicole Spillane (CNRS,CMAP) 6/29 Some remarks The kernel of S s is handled by deflation and H := N R s D s S s D s R s s=1 ( : pseudoinverse). Eigenvalues of the preconditioned operator satisfy λ min (HA) 1 as a consequence of : N s=1 Rs D s R s = I. The choice of the scaling matrices D s is important (problems with heterogeneous coefficients or subdomains)

Nicole Spillane (CNRS,CMAP) 7/29 Elasticity with homogeneous subdomains N = 81 subdomains, ν = 0.4, E 1 = 10 7 and E 2 = 10 12. Young s modulus E Regular Partition Error (log scale) 10 0 10 1 10 2 10 3 10 4 10 5 10 6 0 2 4 6 8 10 12 Iterations

Lack of Robustness : Heterogeneous Elasticity N = 81 subdomains, ν = 0.4, E 1 = 10 7 and E 2 = 10 12. Young s modulus E Error (log scale) 10 0 10 1 10 2 10 3 10 4 Metis Partition 10 5 10 6 Three obectives for new DD methods : 0 20 40 60 80 100 120 140 Iterations Reliability : robustness and scalability. Efficiency : adapt automatically to difficulty. Simplicity : non invasive implementation. Nicole Spillane (CNRS,CMAP) 8/29

1 Balancing Domain Decomposition 2 GenEO Coarse Space 3 Multipreconditioned BDD 4 Adaptive Multipreconditioned BDD GenEO Coarse Space

Nicole Spillane (CNRS,CMAP) 9/29 Deflation, Coarse space : Ideal choice ((H)Ax = (H)b) General convergence result for PCG : [ ] m x x m A κ 1 2, where κ = λ max(ha) x x 0 A κ + 1 λ min (HA).

Nicole Spillane (CNRS,CMAP) 9/29 Deflation, Coarse space : Ideal choice ((H)Ax = (H)b) Convergence result for BDD [Mandel, Tezaur, 1996] : [ ] m x x m A λmax 1 2, because λ min 1. x x 0 A λmax + 1

Nicole Spillane (CNRS,CMAP) 9/29 Deflation, Coarse space : Ideal choice ((H)Ax = (H)b) Convergence result for BDD [Mandel, Tezaur, 1996] : [ ] m x x m A λmax 1 2, because λ min 1. x x 0 A λmax + 1 Deflation strategy [Nicolaides, 1987] [Dostál, 1988] (i) Compute (λ k, x k ) such that HAx k = λ k x k. (ii) Define U := {x k; λ k τ} (span(u) is the coarse space) (I Π) : A-orthogonal projection onto span(u). (iii) Solve Π Ax = Π b }{{} DD solver and Guaranteed Convergence : x x m A x x 0 A 2 (I Π )Ax = (I Π) b. }{{} direct solver [ ] m τ 1 τ + 1 BUT too expensive to compute.

GenEO coarse space for BDD (Generalized Eigenvalues in the Overlaps) 1. In each subdomain we want x s, S s x s τ x s, R s AR s x s for all x s so we solve : ) N S s x s k = λs k Rs AR s x s k (A, = R s S s R s. 2. Let the coarse space be } U := {R s x s k ; s = 1,, N and λs k τ ( w/ e.g. τ = 0.1). 3. Let (I Π) be the A-orthogonal projection onto span(u) Ax = b Π Ax = Π b }{{} DD solver and s=1 (I Π )Ax = (I Π) b. }{{} direct solver Guaranteed convergence : [ ] m x x m A N /τ 1 2, N : number of neighbours. x x 0 A N /τ + 1 Nicole Spillane (CNRS,CMAP) 10/29

Numerical Illustration : Heterogeneous Elasticity N = 81 subdomains, ν = 0.4, E 1 = 10 7 and E 2 = 10 12, τ = 0.1 Young s modulus E Metis Partition Error (log scale) 10 0 10 1 10 2 10 3 10 4 10 5 10 6 BDD BDD + GenEO 0 20 40 60 80 100 120 140 Iterations Size of the coarse space : n 0 = 349 including 212 rigid body modes. N. S. and D. J. Rixen. Int. J. Numer. Meth. Engng., 2013. N. S., V. Dolean, P. Hauret, F. Nataf, C. Pechstein, and R. Scheichl. Numer. Math., 2014. Nicole Spillane (CNRS,CMAP) 11/29

1 Balancing Domain Decomposition 2 GenEO Coarse Space 3 Multipreconditioned BDD 4 Adaptive Multipreconditioned BDD Multipreconditioned BDD

Nicole Spillane (CNRS,CMAP) 12/29 What is multipreconditioning? (H)Ax = (H)b PCG : Minimize x x i+1 A over x i + span(hr i ). An enlarged minimization space implies better convergence : deflation (e.g. with a spectral coarse space), block methods, or multipreconditioning! MPCG : Minimize x x i+1 A over x i + span(h 1 r i,, H N r i ) where H 1,..., H N are N preconditioners. R. Bridson and C. Greif. A multipreconditioned conjugate gradient algorithm. SIAM J. Matrix Anal. Appl., 2006. Very natural for DD : one preconditioner per subdomain H = N } R s D s S {{ s 1 D s R } s. :=H s s=1

What is multipreconditioning? (H)Ax = (H)b PCG : Minimize x x i+1 A over x i + span(hr i ). An enlarged minimization space implies better convergence : deflation (e.g. with a spectral coarse space), block methods, or multipreconditioning! MPCG : Minimize x x i+1 A over x i + span(h 1 r i,, H N r i ) where H 1,..., H N are N preconditioners. D. J. Rixen. Substructuring and Dual Methods in Structural Analysis. PhD thesis, 1997. R. Bridson and C. Greif. A multipreconditioned conjugate gradient algorithm. SIAM J. Matrix Anal. Appl., 2006. C. Greif, T. Rees, and D. Szyld. MPGMRES : a generalized minimum residual method with multiple preconditioners. Technical report, 2011 (now published in SeMA Journal). C. Greif, T. Rees, and D. Szyld. Additive Schwarz with variable weights. DD21 proceedings, 2014. P. Gosselet, D. J. Rixen, F.-X. Roux, and N. S. Simultaneous FETI and block FETI...International Journal for Numerical Methods in Engineering, 2015. Nicole Spillane (CNRS,CMAP) 12/29

Multipreconditioned CG for Ax = b prec. by {H s } s=1,,n A, H R n n spd, MPCG H = N s=1 Hs, where H s spsd. 1 x 0 given ; Initial Guess 2 r 0 = b Ax 0; 3 Z 0 = [ H 1 r 0 H N r 0 ] ; 4 P 0 = Z 0; Initial search directions 5 for i = 0, 1,, convergence do 6 Q i = AP i ; 7 α i = (Q i P i ) (P i r i ); 8 x i+1 = x i + P i α i ; Update approximate solution 9 r i+1 = r i Q i α i ; Update residual 10 Z i+1 = [ H 1 r i+1 H N r i+1 ] ; Multiprecondition 11 β i,j = (Q j P j ) (Q j Z i+1 ), j = 0,, i; 12 P i+1 = Z i+1 i j=0 P jβ i,j ; Project and orthog. 13 end 14 Return x i+1 ; PCG x 0 given ; r 0 = b Ax 0 ; z 0 = Hr 0; p 0 = z 0; for i = 0, 1,, conv. do end q i = Ap i ; α i = (q i p i ) 1 (p i r i ); x i+1 = x i + α i p i ; r i+1 = r i α i q i ; z i+1 = Hr i+1 ; β i = (q i p i ) 1 (q i z i+1 ); p i+1 = z i+1 β i p i ; Return x i+1 ; Nicole Spillane (CNRS,CMAP) 13/29

MultiPreconditioned CG for Ax = b prec. by {H s } s=1,,n A, H R n n spd, MPCG H = N s=1 Hs, where H s spsd. 1 x 0 given ; Initial Guess 2 r 0 = b Ax 0; 3 Z 0 = [ H 1 r 0 H N r i+1 ] ; Multiprecondition 4 P 0 = Z 0; Initial search directions 5 for i = 0, 1,, convergence do 6 Q i = AP i ; 7 α i = (Q i P i ) (P i r i ); 8 x i+1 = x i + P i α i ; Update approximate solution 9 r i+1 = r i Q i α i ; Update residual 10 Z i+1 = [ H 1 r i+1 H N r i+1 ] ; Multiprecondition 11 β i,j = (Q j P j ) (Q j Z i+1 ), j = 0,, i; 12 P i+1 = Z i+1 i j=0 P jβ i,j ; Orthogonalize 13 end 14 Return x i+1 ; Remark x i, r i R n. Z i, P i, R n N n : size of problem, N : nb of precs. Properties 1. Global minimization over x 0 + i 1 j=0 range(p j) of dimension : i N. 2. P i AP j = 0 (i j), but no short recurrence. Nicole Spillane (CNRS,CMAP) 14/29

Numerical Illustration (Heterogeneous Elasticity E 2 /E 1 = 10 5 ) 10 0 10 1 Error (log scale) 10 2 10 3 10 4 10 5 10 6 10 7 MPBDD BDD BDD+GenEO 0 20 40 60 80 100 120 140 Iterations Dimension of the minimization space 1,500 1,000 500 MPBDD BDD BDD+GenEO 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 Iterations Iterations Nicole Spillane (CNRS,CMAP) 15/29 Number of local solves 2.5 104 2 1.5 1 0.5 0 MPBDD BDD

Balancing Domain Decomposition GenEO Coarse Space Multipreconditioned BDD Adaptive Multipreconditioned BDD Solid propellant (linear elasticity with FETI) in collaboration with C. Bovet (ONERA) 6721 stiff inclusions, E2 /E1 6 106, 57 Mdofs, 360 subdomains on 1440 cores Nicole Spillane (CNRS,CMAP) 16/29

Nicole Spillane (CNRS,CMAP) 17/29 Good convergence but a possible limitation Local contributions H s r i form a good minimization space. Not adaptive : invert P i AP i R N N at each iteration in α i = (P i AP i ) (P i r i ). Introduce adaptativity into multipreconditioned CG. a few local vectors should suffice to accelerate convergence (previous work on GenEO).

1 Balancing Domain Decomposition 2 GenEO Coarse Space 3 Multipreconditioned BDD 4 Adaptive Multipreconditioned BDD Adaptive Multipreconditioned BDD

Adaptive Multipreconditioned CG for Ax = b preconditioned by N s=1 Hs. (τ R + is chosen by the user e.g., τ = 0.1) 1 x 0 chosen; 2 r 0 = b Ax 0 ; Z 0 = Hr 0 ; P 0 = Z 0; 3 for i = 0, 1,, convergence do 4 Q i = AP i ; 5 α i = (Qi P i ) (P i r i ); 6 x i+1 = x i + P i α i ; 7 r i+1 = r i Q i α i ; 8 t i = (P iα i ) A(P i α i ) r i+1 Hr ; i+1 9 if t i < τ then τ-test 10 Z i+1 = [ H 1 r i+1 H N ] r i+1 ; Multiprec. 11 else 12 Z i+1 = Hr i+1 ; Precondition 13 end 14 β i,j = (Qj P j ) (Qj Z i+1 ), j = 0,, i; 15 P i+1 = Z i+1 i P j β i,j ; j=0 16 end 17 Return x i+1 ; Remark x i, r i R n. Z i, P i R n N or R n, n : size of problem, N : nb of precs. Properties 1. Global minimization over x 0 + i 1 j=0 range(p j) of dim i dim i N. 2. P i AP j = 0 (i j), but no short recurrence. Nicole Spillane (CNRS,CMAP) 18/29

Theoretical Results (1/2) : Choice of the τ-test? Theorem If the τ test returns t i τ then x x i+1 A x x i A ( ) 1 1/2. 1 + τ Proof (inspired by [Axelsson, Kaporin, 01]) : n n x = x 0 + P i α i = x i + P j α j x x i 2 A = x x i 1 2 A x x i 2 A i=0 j=i x i+1 = x i + P i α i ; t i = (P iα i ) A(P i α i ) r i+1 Hr ; i+1 if t i < τ then ] Z i+1 = [H 1 r i+1 H N r i+1 else end Z i+1 = Hr i+1 ; n P j α j 2 A x x i 1 2 A = x x i 2 A+ P i 1 α i 1 2 A. j=i r i 2 H P i 1 α i 1 2 A = 1+ x x i 2 A r i 2 H }{{} = x x i 2 AHA/ x x i 2 A λ min 1 1+ P i 1α i 1 2 A r i 2 H } {{ } =t i 1 1+τ. Nicole Spillane (CNRS,CMAP) 19/29

Nicole Spillane (CNRS,CMAP) 20/29 Theoretical Results (2/2) : No extra work for well conditioned problems Theorem t i 1/λ max so if τ < 1/λ max then AMPCG is PCG.

Nicole Spillane (CNRS,CMAP) 21/29 Local Adaptive MPCG for N s=1 As x = b preconditioned by N s=1 Hs. (τ R + is chosen by the user) 1 x 0 chosen ; r 0 = b Ax 0 ; Z 0 = Hr 0 ; P 0 = Z 0; 2 for i = 0, 1,, convergence do 3 Q i = AP i ; 4 α i = (Q i P i ) (P i r i ); 5 x i+1 = x i + P i α i ; 6 r i+1 = r i Q i α i ; 7 Z i+1 = Hr i+1 ; initialize Z i+1 8 for s = 1,, N do 9 ti s = P iα i, A s P i α i r ; i+1 Hs r i+1 10 if ti s < τ then local τ-test 11 Z i+1 = [ Z i+1 H s ] r i+1 ; 12 end 13 end 14 β i,j = (Qj P j ) (Qj Z i+1 ), j = 0,, i; 15 P i+1 = Z i+1 i P j β i,j ; j=0 16 end 17 Return x i+1 ; An additional assumption A = Benefits N R } s {{ S s R } s. :=A s s=1 Between 1 and N search directions per iteration. Reduces the cost of storage and of inverting P i AP i.

Numerical Illustration (0/6) : Homogenous subdomains Dimension of the minimization space 800 600 400 200 AMPBDD (Global) AMPBDD (Local) MPBDD BDD BDD+GenEO 0 2 4 6 8 10 12 Error (log scale) 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 0 2 4 6 8 10 12 Iterations AMPBDD (Global) AMPBDD (Local) MPBDD BDD BDD+GenEO 0 2 4 6 8 10 12 Iterations Iterations Nicole Spillane (CNRS,CMAP) 22/29 Number of local solves 4,000 3,000 2,000 1,000 0 AMPBDD-G AMPBDD-L MPBDD BDD

Numerical Illustration (1/6) Elasticity with E 2 /E 1 = 10 5 10 0 10 1 Error (log scale) 10 2 10 3 10 4 10 5 10 6 10 7 AMPBDD (Global) AMPBDD (Local) MPBDD BDD BDD+GenEO 0 20 40 60 80 100 120 140 Iterations Dimension of the minimization space 1,500 1,000 500 AMPBDD (Global) AMPBDD (Local) MPBDD BDD BDD+GenEO 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 Iterations Iterations Nicole Spillane (CNRS,CMAP) 23/29 Number of local solves 2.5 104 2 1.5 1 0.5 0 AMPBDD (Global) AMPBDD (Local) MPBDD BDD

Numerical Illustration (2/6) : Zoom on new methods Dimension of the minimization space 1,500 1,000 500 AMPBDD (Global) AMPBDD (Local) MPBDD BDD+GenEO 0 5 10 15 20 25 Error (log scale) 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 0 5 10 15 20 25 Iterations AMPBDD (Global) AMPBDD (Local) MPBDD BDD+GenEO 0 5 10 15 20 25 Iterations Iterations Nicole Spillane (CNRS,CMAP) 24/29 Number of local solves 8,000 6,000 4,000 2,000 0 AMPBDD (Global) AMPBDD (Local) MPBDD

Numerical Illustration (3/6) : Weak Scalability FETI Software : Z-Set Wall time [s] 2,500 2,000 1,500 1,000 500 0 FETI AMPFETIL AMPFETIG Cluster : Cobalt at CCRT/TGCC 1422 computational nodes with Intel Broadwell Processors : 2.4 GHz, 28 cores 128 Go SDRAM, infiniband Mellanox network 7 cores per subdomain and local factorizations are performed with mumps. 8 27 64 125 216 343 512 Nb. of subdomains N #DOFs ( 10 6 ) #cores #iter. # min. space time (s) 8 1.6 56 69 132 89.58 64 12.5 448 112 742 210.80 216 42.0 1512 107 2257 277.30 512 99.2 3584 108 5729 376.30 C. Bovet, P. Gosselet, A. Parret Freaud, and N. S. Adaptive multi preconditioned FETI : scalability results and robustness assessment. Computers and Structures, 2017. Nicole Spillane (CNRS,CMAP) 25/29

Nicole Spillane (CNRS,CMAP) 26/29 Numerical Illustration (4/6) : Composite Weave Pattern FETI in collaboration with A. Parret Freaud (Safran Tech)

Numerical Illustration (5/6) : Choice of τ 1 35 Global Local Contraction factor ρ 0.8 0.6 0.4 Number of iterations 30 25 20 15 10 4 10 3 10 2 10 1 10 0 10 1 10 4 10 3 10 2 10 1 10 0 10 1 Threshold τ (log scale) Threshold τ (log scale) Dimension of the minimization space 1,000 800 600 400 Global Local Number of local solves 7,000 6,000 5,000 Global Local 10 4 10 3 10 2 10 1 10 0 10 1 10 4 10 3 10 2 10 1 10 0 10 1 Threshold τ (log scale) Threshold τ (log scale) Nicole Spillane (CNRS,CMAP) 27/29

Nicole Spillane (CNRS,CMAP) 28/29 Numerical Illustration (6/6) : 400 stiff inclusions FETI in collaboration with C. Bovet (ONERA) τ = 0.01, E max /E min = 10 5, 480 subdomains, 32 Mdofs

Conclusion AMPDD Robustness and Efficiency. Perspectives Best adaptation process in non symmetric cases? Best adaptation process in a fully algebraic context? Implementation of AMPDD in an open source code. Restart! Recycle! within AMPDD Krylov subspace solvers. Theoretical Analysis at the PDE level. N. S. An Adaptive Multipreconditioned Conjugate Gradient SISC, 2016. N. S. Algebraic Adaptive Multi Preconditioning applied to Restricted Additive Schwarz. DD23 Proceedings, 2016. C. Bovet, P. Gosselet, A. Parret Freaud, and N. S. Adaptive multi preconditioned FETI : scalability results and robustness assessment. Computers and Structures, 2017. C. Bovet, P. Gosselet, and N. S. Multipreconditioning for nonsymmetric problems : the case of orthomin and bicg. Note au CRAS, 2017. Nicole Spillane (CNRS,CMAP) 29/29