Domain Decomposition-based contour integration eigenvalue solvers
Vassilis Kalantzis, joint work with Yousef Saad
Computer Science and Engineering Department, University of Minnesota - Twin Cities, USA
SIAM ALA 2015, 10-30-2015
VK, YS (U of M) DD-CINT techniques 10-30-2015 1 / 31
Acknowledgments
Joint work with Eric Polizzi and James Kestyn (UM Amherst).
Special thanks to Minnesota Supercomputing Institute for allowing access to its supercomputers.
Work supported by NSF and DOE (DE-SC0008877).
Contents
1 Introduction
2 The Domain Decomposition framework
3 Domain Decomposition-based contour integration
4 Implementation in HPC architectures
5 Experiments
6 Discussion
Introduction
The sparse symmetric eigenvalue problem: $Ax = \lambda x$, where $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$. A pair $(\lambda, x)$ is called an eigenpair of $A$.
Focus in this talk: find all $(\lambda, x)$ pairs inside the interval $[\alpha, \beta]$, where $\alpha, \beta \in \mathbb{R}$ and $\lambda_1 \le \alpha$, $\beta \le \lambda_n$.
Typical approach: project $A$ on a low-dimensional subspace by solving $V^T A V y = \theta V^T V y$, $x = Vy$.
$V$: Krylov, (Generalized-Jacobi)-Davidson, contour integration, ...
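The projection step above can be illustrated with a small dense sketch. Everything here is an assumption made for illustration (the 1D Laplacian test matrix, the helper name `rayleigh_ritz`, the choice to orthonormalize $V$ first so that $V^T V = I$); it is not the implementation used in the talk.

```python
import numpy as np

def rayleigh_ritz(A, V):
    """Rayleigh-Ritz on span(V): solve V^T A V y = theta V^T V y, x = V y.

    V is orthonormalized first, which turns the generalized projected
    problem into a standard symmetric one (hypothetical helper).
    """
    Q, _ = np.linalg.qr(V)                 # orthonormal basis of span(V)
    theta, Y = np.linalg.eigh(Q.T @ A @ Q)
    return theta, Q @ Y                    # Ritz values and Ritz vectors

# Illustrative test matrix: 1D Laplacian of size n
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam, X = np.linalg.eigh(A)

# A subspace that contains the three lowest eigenvectors, in a mixed basis
rng = np.random.default_rng(0)
V = X[:, :3] @ rng.standard_normal((3, 3))

theta, Xr = rayleigh_ritz(A, V)
residual = np.linalg.norm(A @ Xr - Xr * theta)
```

When span(V) is an exact invariant subspace, as in this toy setup, the Ritz pairs coincide with eigenpairs of $A$; in practice $V$ is only an approximation and the projection is repeated.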
Contour integration (CINT)
$$V := P\hat V = \frac{1}{2i\pi} \oint_\Gamma (\zeta I - A)^{-1}\, d\zeta \; \hat V \equiv X X^T \hat V,$$
with $\Gamma$ a complex contour with endpoints $[\alpha, \beta]$. $V$ is an exact invariant subspace.
Numerical approximation:
$$P\hat V \approx \tilde P \hat V = \sum_{j=1}^{n_c} \omega_j (\zeta_j I - A)^{-1} \hat V, \qquad \rho(z) = \sum_{j=1}^{n_c} \frac{\omega_j}{\zeta_j - z},$$
with (weight, pole) pairs $(\omega_j, \zeta_j)$, $j = 1, \ldots, n_c$.
Trapezoidal, Midpoint, Gauss-Legendre, ...
Zolotarev, Least-Squares, ...
FEAST (Polizzi), Sakurai-Sugiura (SS), SS-CIRR, ...
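A minimal sketch of the quadrature-based filter, assuming a circular contour through $\alpha$ and $\beta$ discretized with a midpoint rule (the talk's experiments use Gauss-Legendre; the circle, the midpoint rule, and the helper names `circle_quadrature`, `rho`, `apply_filter` are choices made here for illustration only). It shows that $\tilde P$ shares the eigenvectors of $A$ and maps each eigenvalue $\lambda$ to $\rho(\lambda)$.

```python
import numpy as np

def circle_quadrature(alpha, beta, nc):
    """Midpoint rule on the circle with diameter [alpha, beta].

    Returns weights omega_j and poles zeta_j such that
    P ~ sum_j omega_j (zeta_j I - A)^{-1}; the factor 1/(2 i pi) is
    already folded into the weights.
    """
    c, r = 0.5 * (alpha + beta), 0.5 * (beta - alpha)
    theta = 2 * np.pi * (np.arange(nc) + 0.5) / nc
    zeta = c + r * np.exp(1j * theta)
    omega = r * np.exp(1j * theta) / nc
    return omega, zeta

def rho(z, omega, zeta):
    """Rational filter rho(z) = sum_j omega_j / (zeta_j - z), real z."""
    return np.sum(omega / (zeta - z)).real

def apply_filter(A, V, omega, zeta):
    """Apply the approximate projector to V, one linear solve per pole."""
    n = A.shape[0]
    Z = np.zeros(V.shape, dtype=complex)
    for w, z in zip(omega, zeta):
        Z += w * np.linalg.solve(z * np.eye(n) - A, V)
    return Z.real  # poles come in conjugate pairs, so the sum is real

# Toy symmetric matrix with known eigenvalues 1, 2, ..., 10
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))
lam = np.arange(1.0, 11.0)
A = (Q * lam) @ Q.T

omega, zeta = circle_quadrature(2.5, 5.5, 16)   # interval holds 3, 4, 5
rho_vals = np.array([rho(l, omega, zeta) for l in lam])
Z = apply_filter(A, Q, omega, zeta)             # filter the eigenvectors
```

The values in `rho_vals` are close to 1 for eigenvalues inside the interval and small outside, which is the filtering behavior the slide refers to.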
Main characteristics of CINT
Can be seen as a (rational) filtering technique
Different levels of parallelism
Eigenvalue problem → Linear systems with multiple right-hand sides
In this talk
We study contour integration from a Domain Decomposition (DD) point of view
Two ideas:
  Use DD to derive CINT schemes
  Use DD to accelerate FEAST or other CINT-based methods
We target parallel architectures
Partitioning of the domain (Metis, Scotch, ...)
Figure: An edge-separator (vertex-based partitioning)
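The local viewpoint (assume M partitions)
Stack the interior variables $u_1, u_2, \ldots, u_M$ into $u$, followed by the interface variables $y$:
$$\begin{pmatrix} B_1 & & & & E_1 \\ & B_2 & & & E_2 \\ & & \ddots & & \vdots \\ & & & B_M & E_M \\ E_1^T & E_2^T & \cdots & E_M^T & C \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_M \\ y \end{pmatrix} = \lambda \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_M \\ y \end{pmatrix}$$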
Pictorially: [figure]
Write as
$$A = \begin{pmatrix} B & E \\ E^T & C \end{pmatrix}.$$
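The reordering into interior and interface unknowns can be sketched on a toy problem. The example below is an assumption made for illustration: a 1D chain (path graph) Laplacian cut at a single edge into two subdomains, rather than a Metis or Scotch partition of a real mesh. With an edge-separator, the interface unknowns are the two end nodes of the cut edge.

```python
import numpy as np

# 1D Laplacian on a chain of n nodes; the edge (k-1, k) is cut
n, k = 10, 5
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# New ordering: interior of subdomain 1, interior of subdomain 2,
# then the interface nodes k-1 and k (the ends of the cut edge)
perm = np.r_[0:k - 1, k + 1:n, k - 1, k]
Ap = A[np.ix_(perm, perm)]

nu = n - 2                     # number of interior unknowns
n1 = k - 1                     # interior size of subdomain 1
B = Ap[:nu, :nu]               # block diagonal: diag(B_1, B_2)
E = Ap[:nu, nu:]               # interior-interface coupling
C = Ap[nu:, nu:]               # interface-interface block
```

After the permutation, `B` has no coupling between the two interior blocks, and all interaction between subdomains goes through `E` and `C`.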
Expressing $(A - \zeta I)^{-1}$ in DD
Let $\zeta \in \mathbb{C}$ and recall that
$$A = \begin{pmatrix} B & E \\ E^T & C \end{pmatrix}.$$
Then
$$(A - \zeta I)^{-1} = \begin{pmatrix} (B - \zeta I)^{-1} + F(\zeta) S(\zeta)^{-1} F(\zeta)^T & -F(\zeta) S(\zeta)^{-1} \\ -S(\zeta)^{-1} F(\zeta)^T & S(\zeta)^{-1} \end{pmatrix},$$
where
$$F(\zeta) = (B - \zeta I)^{-1} E, \qquad S(\zeta) = C - \zeta I - E^T (B - \zeta I)^{-1} E.$$
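The block formula for $(A - \zeta I)^{-1}$ can be checked numerically. The sketch below assumes a small random symmetric matrix and an arbitrary complex shift; the function name `dd_resolvent` is hypothetical, and explicit inverses are used only because the example is tiny.

```python
import numpy as np

def dd_resolvent(B, E, C, zeta):
    """Assemble (A - zeta I)^{-1} from its DD blocks.

    F(zeta) = (B - zeta I)^{-1} E
    S(zeta) = C - zeta I - E^T (B - zeta I)^{-1} E
    """
    nu, ns = B.shape[0], C.shape[0]
    Binv = np.linalg.inv(B - zeta * np.eye(nu))
    F = Binv @ E
    S = C - zeta * np.eye(ns) - E.T @ F
    Sinv = np.linalg.inv(S)
    return np.block([[Binv + F @ Sinv @ F.T, -F @ Sinv],
                     [-Sinv @ F.T,           Sinv]])

# Random symmetric test matrix, split into interior/interface blocks
rng = np.random.default_rng(1)
n, ns = 12, 4
nu = n - ns
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
B, E, C = A[:nu, :nu], A[:nu, nu:], A[nu:, nu:]

zeta = 0.3 + 0.7j                       # arbitrary complex shift
R_dd = dd_resolvent(B, E, C, zeta)
R_ref = np.linalg.inv(A - zeta * np.eye(n))
```

Note that `F.T` is a plain transpose, not a conjugate transpose: $S(\zeta)$ is complex symmetric, not Hermitian.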
Spectral projectors and DD
As previously,
$$(A - \zeta I)^{-1} = \begin{pmatrix} (B - \zeta I)^{-1} + F(\zeta) S(\zeta)^{-1} F(\zeta)^T & -F(\zeta) S(\zeta)^{-1} \\ -S(\zeta)^{-1} F(\zeta)^T & S(\zeta)^{-1} \end{pmatrix}.$$
Then,
$$P_{DD} = \frac{-1}{2i\pi} \oint_\Gamma (A - \zeta I)^{-1}\, d\zeta \equiv \begin{pmatrix} H & -W \\ -W^T & G \end{pmatrix},$$
with
$$H = \frac{-1}{2i\pi} \oint_\Gamma \left[ (B - \zeta I)^{-1} + F(\zeta) S(\zeta)^{-1} F(\zeta)^T \right] d\zeta,$$
$$G = \frac{-1}{2i\pi} \oint_\Gamma S(\zeta)^{-1}\, d\zeta, \qquad W = \frac{-1}{2i\pi} \oint_\Gamma F(\zeta) S(\zeta)^{-1}\, d\zeta.$$
Extracting approximate eigenspaces
Let $\hat V$ be a set of multiple right-hand sides (mrhs) to multiply $P_{DD}$:
$$P_{DD} \begin{pmatrix} \hat V_u \\ \hat V_s \end{pmatrix} = \begin{pmatrix} H \hat V_u - W \hat V_s \\ -W^T \hat V_u + G \hat V_s \end{pmatrix} = \begin{pmatrix} Z_u \\ Z_s \end{pmatrix},$$
with
$$Z_u = \frac{-1}{2i\pi} \oint_\Gamma (B - \zeta I)^{-1} \hat V_u\, d\zeta + \frac{1}{2i\pi} \oint_\Gamma F(\zeta) S(\zeta)^{-1} \left[ \hat V_s - F(\zeta)^T \hat V_u \right] d\zeta,$$
$$Z_s = \frac{-1}{2i\pi} \oint_\Gamma S(\zeta)^{-1} \left[ \hat V_s - F(\zeta)^T \hat V_u \right] d\zeta.$$
In practice, with the factor $-1/(2i\pi)$ absorbed into the quadrature weights $\omega_j$:
$$Z_u = \sum_{j=1}^{n_c} \omega_j (B - \zeta_j I)^{-1} \hat V_u - \sum_{j=1}^{n_c} \omega_j F(\zeta_j) S(\zeta_j)^{-1} \left[ \hat V_s - F(\zeta_j)^T \hat V_u \right],$$
$$Z_s = \sum_{j=1}^{n_c} \omega_j S(\zeta_j)^{-1} \left[ \hat V_s - F(\zeta_j)^T \hat V_u \right].$$
Pseudocode - Full projector (DD-FP)
for $j = 1$ to $n_c$ do
    $W_u := (B - \zeta_j I)^{-1} \hat V_u$   (local)
    $W_s := \hat V_s - E^T W_u$   (local)
    $W_s := S(\zeta_j)^{-1} W_s$ ;  $Z_s := Z_s + \omega_j W_s$   (distributed)
    $W_u := W_u - (B - \zeta_j I)^{-1} E W_s$ ;  $Z_u := Z_u + \omega_j W_u$   (local)
end for
Practical considerations
For each $\zeta_j$, $j = 1, \ldots, n_c$: two solves with $B - \zeta_j I$ + one solve with $S(\zeta_j)$
The procedure can be repeated with an updated $\hat V$
Equivalent to FEAST tied with a DD solver
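The DD-FP loop can be written out as a short dense sketch and compared against applying $\sum_j \omega_j (A - \zeta_j I)^{-1}$ directly. Several things here are assumptions for illustration: the circular contour with a midpoint rule (`circle_rule`), dense `numpy.linalg.solve` calls in place of the local and distributed sparse solvers, and forming $S(\zeta_j)$ explicitly.

```python
import numpy as np

def circle_rule(alpha, beta, nc):
    """Midpoint rule on the circle with diameter [alpha, beta].

    The weights include the factor -1/(2 i pi), so they multiply
    (A - zeta_j I)^{-1} (note the sign convention).
    """
    c, r = 0.5 * (alpha + beta), 0.5 * (beta - alpha)
    theta = 2 * np.pi * (np.arange(nc) + 0.5) / nc
    return -r * np.exp(1j * theta) / nc, c + r * np.exp(1j * theta)

def dd_fp(B, E, C, Vu, Vs, omega, zeta):
    """Full-projector sweep: two solves with B - zeta_j I and one
    solve with S(zeta_j) per pole."""
    nu, ns = B.shape[0], C.shape[0]
    Zu = np.zeros(Vu.shape, dtype=complex)
    Zs = np.zeros(Vs.shape, dtype=complex)
    for w, z in zip(omega, zeta):
        Bz = B - z * np.eye(nu)
        Wu = np.linalg.solve(Bz, Vu)                  # local
        Ws = Vs - E.T @ Wu                            # local
        S = C - z * np.eye(ns) - E.T @ np.linalg.solve(Bz, E)
        Ws = np.linalg.solve(S, Ws)                   # distributed in practice
        Zs += w * Ws
        Wu = Wu - np.linalg.solve(Bz, E @ Ws)         # local
        Zu += w * Wu
    return Zu, Zs

# Small random symmetric test problem
rng = np.random.default_rng(2)
n, ns, m = 12, 4, 3
nu = n - ns
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
B, E, C = A[:nu, :nu], A[:nu, nu:], A[nu:, nu:]
V = rng.standard_normal((n, m))

omega, zeta = circle_rule(-0.5, 0.5, 8)
Zu, Zs = dd_fp(B, E, C, V[:nu], V[nu:], omega, zeta)

# Reference: apply the quadrature to the full matrix directly
Z_ref = sum(w * np.linalg.solve(A - z * np.eye(n), V)
            for w, z in zip(omega, zeta))
```

In a distributed setting the solves with $B - \zeta_j I$ decouple across subdomains, while the solve with $S(\zeta_j)$ is the step that requires communication.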
An alternative scheme
CINT along the interface unknowns:
$$P_{DD} = \frac{-1}{2i\pi} \oint_\Gamma (A - \zeta I)^{-1}\, d\zeta \equiv [P_1, P_2], \qquad P_2 = \begin{pmatrix} -W \\ G \end{pmatrix},$$
$$G = \frac{-1}{2i\pi} \oint_\Gamma S(\zeta)^{-1}\, d\zeta, \qquad W = \frac{-1}{2i\pi} \oint_\Gamma (B - \zeta I)^{-1} E S(\zeta)^{-1}\, d\zeta.$$
Advantage: does not involve the inverse of the whole matrix.
Writing $P_{DD} = X X^T$ with $X = \begin{pmatrix} U \\ Y \end{pmatrix}$ gives
$$P_2 = \begin{pmatrix} U Y^T \\ Y Y^T \end{pmatrix} = X Y^T.$$
Just capture the range of $P_2$: $P_2 \times \texttt{randn}()$
Also: Lanczos on $P_2 P_2^T$ (sequential, doubles the work)
Pseudocode - Partial projector (DD-PP)
for $j = 1$ to $n_c$ do
    $W_s := S(\zeta_j)^{-1} \hat V_s$ ;  $Z_s := Z_s + \omega_j W_s$   (distributed)
    $W_u := -(B - \zeta_j I)^{-1} E W_s$ ;  $Z_u := Z_u + \omega_j W_u$   (local)
end for
Compared with DD-FP, the steps $W_u := (B - \zeta_j I)^{-1} \hat V_u$ and $W_s := \hat V_s - E^T W_u$ are skipped, since only the block column $P_2$ is applied to $\hat V_s$.
Practical considerations
For each $\zeta_j$, $j = 1, \ldots, n_c$: one solve with $B - \zeta_j I$ + one solve with $S(\zeta_j)$
More like a one-shot method
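A dense sketch of the partial-projector sweep, under the same illustrative assumptions as any small NumPy experiment: a circular contour with a midpoint rule, weights that include the factor $-1/(2i\pi)$, and $S(\zeta_j)$ formed explicitly. The helper names are hypothetical. The result is compared with the quadrature applied to the full matrix and the right-hand side $(0; \hat V_s)$, which is what applying the block column $P_2$ amounts to.

```python
import numpy as np

def circle_rule(alpha, beta, nc):
    """Midpoint rule on a circle; weights multiply (A - zeta_j I)^{-1}."""
    c, r = 0.5 * (alpha + beta), 0.5 * (beta - alpha)
    theta = 2 * np.pi * (np.arange(nc) + 0.5) / nc
    return -r * np.exp(1j * theta) / nc, c + r * np.exp(1j * theta)

def dd_pp(B, E, C, Vs, omega, zeta):
    """Partial-projector sweep: one solve with S(zeta_j) and one solve
    with B - zeta_j I per pole (forming S explicitly is extra work that
    is specific to this toy version)."""
    nu, ns = B.shape[0], C.shape[0]
    Zu = np.zeros((nu, Vs.shape[1]), dtype=complex)
    Zs = np.zeros(Vs.shape, dtype=complex)
    for w, z in zip(omega, zeta):
        Bz = B - z * np.eye(nu)
        S = C - z * np.eye(ns) - E.T @ np.linalg.solve(Bz, E)
        Ws = np.linalg.solve(S, Vs)            # interface solve
        Zs += w * Ws
        Wu = -np.linalg.solve(Bz, E @ Ws)      # back-substitution to interior
        Zu += w * Wu
    return Zu, Zs

# Small random symmetric test problem
rng = np.random.default_rng(3)
n, ns, m = 12, 4, 3
nu = n - ns
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
B, E, C = A[:nu, :nu], A[:nu, nu:], A[nu:, nu:]
Vs = rng.standard_normal((ns, m))              # random interface block

omega, zeta = circle_rule(-0.5, 0.5, 8)
Zu, Zs = dd_pp(B, E, C, Vs, omega, zeta)

# Reference: quadrature on the full matrix with right-hand side (0; Vs)
rhs = np.vstack([np.zeros((nu, m)), Vs])
Z_ref = sum(w * np.linalg.solve(A - z * np.eye(n), rhs)
            for w, z in zip(omega, zeta))
```

Using a random `Vs` mirrors the slide's suggestion of capturing the range of $P_2$ by multiplying it with a random matrix.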
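A more detailed analysis of DD-PP
Spectral Schur complement: $\lambda \in \sigma(A) \iff \det[S(\lambda)] = 0$ (for $\lambda \notin \sigma(B)$)
The eigenvector satisfies
$$x = \begin{pmatrix} -(B - \lambda I)^{-1} E y \\ y \end{pmatrix}, \quad \text{with } y: \; S(\lambda) y = 0.$$
If $(B - \zeta I)^{-1}$ is analytic in $[\alpha, \beta]$, then, over the eigenvalues $\lambda$ enclosed by $\Gamma$:
$$W = \frac{-1}{2i\pi} \oint_\Gamma (B - \zeta I)^{-1} E S(\zeta)^{-1}\, d\zeta, \qquad \mathrm{range}(W) \subseteq \mathrm{span}\{ (B - \lambda I)^{-1} E y \},$$
$$G = \frac{-1}{2i\pi} \oint_\Gamma S(\zeta)^{-1}\, d\zeta, \qquad \mathrm{range}(G) \subseteq \mathrm{span}\{ y \}.$$
Another idea: solve $S(\lambda) y = 0$ directly ([VK,RLi,YS],[VanB,Kra],[Sak])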
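The link between eigenpairs of $A$ and the spectral Schur complement can be checked on a small example. This sketch assumes a random symmetric matrix (so that, generically, no eigenvalue of $A$ is also an eigenvalue of $B$) and uses an SVD to extract the null vector of $S(\lambda)$; both choices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, ns = 12, 4
nu = n - ns
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
B, E, C = A[:nu, :nu], A[:nu, nu:], A[nu:, nu:]

def schur(lmbda):
    """Spectral Schur complement S(lambda) = C - lambda I - E^T (B - lambda I)^{-1} E."""
    return C - lmbda * np.eye(ns) - E.T @ np.linalg.solve(B - lmbda * np.eye(nu), E)

# Take one eigenvalue of A and evaluate S at it
lam = np.linalg.eigvalsh(A)[3]
S = schur(lam)

# S(lam) should be numerically singular; its null vector is the interface part y
_, sv, Vt = np.linalg.svd(S)
y = Vt[-1]

# Recover the interior part and assemble the full eigenvector
u = -np.linalg.solve(B - lam * np.eye(nu), E @ y)
x = np.concatenate([u, y])
residual = np.linalg.norm(A @ x - lam * x) / np.linalg.norm(x)
```

The ratio `sv[-1] / sv[0]` measures how close $S(\lambda)$ is to singular, and `residual` confirms that the assembled vector is an eigenvector of $A$.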
A closer look at the Schur complement
So far: Eigenvalue problem → Linear systems with mrhs → Schur complement
From the DD framework we have
$$S(\zeta) = \begin{pmatrix} S_1(\zeta) & E_{12} & \cdots & E_{1M} \\ E_{21} & S_2(\zeta) & \cdots & E_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ E_{M1} & E_{M2} & \cdots & S_M(\zeta) \end{pmatrix},$$
where
$$S_i(\zeta) = C_i - \zeta I - E_i^T (B_i - \zeta I)^{-1} E_i, \quad i = 1, \ldots, M,$$
is the local Schur complement (complex symmetric).
Solving linear systems with the Schur complement
Straightforward approach
  Form and factorize $S(\zeta)$
  Extremely robust but impractical for 3D problems
Alternative
  Use an approximation of $S(\zeta)$
  Lots of ideas (pARMS, LORASC, ...)
Typical preconditioners implemented:
  Block Jacobi: use $C_i$, $C_i - \zeta I$ or $S_i(\zeta)$, $i = 1, \ldots, M$
  Global approximation: use $C$, $C - \zeta I$ or $S(\zeta)$
  Memory vs. robustness
Important: magnitude of the imaginary part of a pole
Implementation and computing environment
Hardware
  ITASCA HP Linux cluster at the Minnesota Supercomputing Institute
  1,091 HP ProLiant BL280c G6 blade servers, each with two-socket, quad-core 2.8 GHz Intel Xeon X5560 Nehalem EP (24 GB per node)
  40-gigabit QDR InfiniBand (IB) interconnect
Software
  The software was written in C++ on top of PETSc (MPI)
  Linked to AMD, METIS, UMFPACK, MUMPS, MKL-BLAS, MKL-LAPACK
  Compiled with mpiicpc (-O3)
Parallel implementation
Figure: process grid, with axes labeled COMM DD and COMM XX
  (0,0) (0,1) (0,2) (0,3)
  (1,0) (1,1) (1,2) (1,3)
  (2,0) (2,1) (2,2) (2,3)
  (3,0) (3,1) (3,2) (3,3)
Different levels of parallelism (1-D, 2-D, 3-D grids)
COMM DD is the communicator used for Domain Decomposition
Different choices for XX
  XX = mrhs: parallelize the multiple right-hand sides
  XX = poles: parallelize the number of poles
Selection depends on the linear solver (direct or iterative)
Experimental framework
CINT + Subspace Iteration
  CINT-SI: standard FEAST approach
  Direct (MUMPS) or iterative (preconditioned) solver
CINT + DD
  DD-FP: implements the full projector
  DD-PP: implements the partial projector
  Schur complement: exact or approximate
Details
  # MPI processes = # cores
  Quadrature rule: Gauss-Legendre
  Eigenvalue tolerance: 1e-8
Numerical illustration
A comparison of DD-FP and DD-PP I
Figure: 2D Laplacian, 50 × 50, [α, β] = [1.6, 1.7]. Average relative residual versus iterations for DD-FP ($n_c = 4$) and DD-PP ($n_c = 4$, $8$, $12$).
Numerical illustration (cont. from previous)
A comparison of DD-FP and DD-PP II
Test on a 2D 1001 × 1000 Laplacian
Table: Time is listed in seconds. A 2-D grid of processors was used. S(ζ) factorized explicitly. Number of poles: n_c = 4.

                          MPI 16 × 4         MPI 32 × 4         MPI 64 × 4
[α, β]           Its   CINT-SI   DD-FP    CINT-SI   DD-FP    CINT-SI   DD-FP
Exterior eigvls
[λ_1, λ_20]       3       88       37        53       26        42       20
[λ_1, λ_50]       3      159       65        88       38        65       30
[λ_1, λ_100]      5      432      172       241       98       136       65
Interior eigvls
[λ_201, λ_220]    3       89       37        53       26        42       20
[λ_201, λ_250]    4      286      113       164       67       110       48
[λ_201, λ_300]    4      440      214       245      122       141       84

Exterior: number of right-hand sides = number of eigvls + 20
Interior: number of right-hand sides = 2 × number of eigvls
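As a worked reading of the table, the ratio of CINT-SI time to DD-FP time can be computed for every interval and process grid. This speedup is a derived quantity introduced here for illustration; it is not a figure reported on the slides, and the dictionary layout is an arbitrary choice.

```python
# Timings in seconds copied from the table: for each interval, a list of
# (CINT-SI, DD-FP) pairs for the process grids 16x4, 32x4 and 64x4.
timings = {
    "[l1,l20]":     [(88, 37), (53, 26), (42, 20)],
    "[l1,l50]":     [(159, 65), (88, 38), (65, 30)],
    "[l1,l100]":    [(432, 172), (241, 98), (136, 65)],
    "[l201,l220]":  [(89, 37), (53, 26), (42, 20)],
    "[l201,l250]":  [(286, 113), (164, 67), (110, 48)],
    "[l201,l300]":  [(440, 214), (245, 122), (141, 84)],
}

# Speedup of DD-FP over CINT-SI for each interval and grid
speedup = {
    interval: [cint_si / dd_fp for cint_si, dd_fp in pairs]
    for interval, pairs in timings.items()
}

all_ratios = [r for ratios in speedup.values() for r in ratios]
lowest, highest = min(all_ratios), max(all_ratios)

for interval, ratios in speedup.items():
    print(interval, " ".join(f"{r:.2f}" for r in ratios))
```

Across all entries of the table DD-FP is faster than CINT-SI, with ratios ranging from roughly 1.7 to 2.5.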
Efficiency of DD-FP
Figure: Efficiency of DD-FP (from previous table). Ratio versus number of MPI processes for the intervals [λ_1, λ_50], [λ_201, λ_250], [λ_1, λ_100] and [λ_201, λ_300].
Conclusion
In this talk
  We presented a DD-based form of contour integration
  DD can be used to:
    Solve the linear systems in numerical integration
    Derive DD-based contour integration methods
  The Schur complement holds a central role
Considerations
  Higher moments of the Schur complement integral
  Block Krylov subspaces
  Preconditioning of indefinite linear systems