Domain Decomposition-based contour integration eigenvalue solvers

Size: px

Start display at page:

Download "Domain Decomposition-based contour integration eigenvalue solvers"

Kathlyn Paul
6 years ago
Views:

1 Domain Decomposition-based contour integration eigenvalue solvers Vassilis Kalantzis joint work with Yousef Saad Computer Science and Engineering Department University of Minnesota - Twin Cities, USA SIAM ALA 2015, VK, YS (U of M) DD-CINT techniques / 31

2 Acknowledgments Joint work with Eric Polizzi and James Kestyn (UM Amherst). Special thanks to Minnesota Supercomputing Institute for allowing access to its supercomputers. Work supported by NSF and DOE (DE-SC ). VK, YS (U of M) DD-CINT techniques / 31

3 Contents 1 Introduction 2 The Domain Decomposition framework 3 Domain Decomposition-based contour integration 4 Implementation in HPC architectures 5 Experiments 6 Discussion VK, YS (U of M) DD-CINT techniques / 31

4 Introduction The sparse symmetric eigenvalue problem Ax = λx where A R n n, x R n and λ R. A pair (λ,x) is called an eigenpair of A. Focus in this talk Find all (λ,x) pairs inside the interval [α,β] where α, β R and λ 1 α, β λ n. Typical approach Project A on a low-dimensional subspace by V AVy = θv Vy, x = Vy. V: Krylov, (Generalized-Jacobi)-Davidson, contour integration,... VK, YS (U of M) DD-CINT techniques / 31

5 Contour integration (CINT) V := PˆV = 1 (ζi A) 1 dζ ˆV XX ˆV, 2iπ with Γ complex contour with endpoints [α,β]. V is an exact invariant subspace Numerical approximation PˆV P ˆV n c = j=1 Γ ω j (ζ j I A) 1ˆV, n c ρ(z) = with (weight,pole) (ω j,ζ j ), j = 1,...,n c. Trapezoidal, Midpoint, Gauss-Legendre,... Zolotarev, Least-Squares,... j=1 FEAST (Polizzi), Sakurai-Sugiura (SS), SS-CIRR,... ω j ζ j z VK, YS (U of M) DD-CINT techniques / 31

6 Main characteristics of CINT Can be seen as a (rational) filtering technique Different levels of parallelism Eigenvalue problem Linear systems with multiple right-hand sides In this talk We study contour integration from a Domain Decomposition (DD) point-of-view Two ideas: Use DD to derive CINT schemes Use DD to accelerate FEAST or other CINT-based method We target parallel architectures VK, YS (U of M) DD-CINT techniques / 31

7 Contents 1 Introduction 2 The Domain Decomposition framework 3 Domain Decomposition-based contour integration 4 Implementation in HPC architectures 5 Experiments 6 Discussion VK, YS (U of M) DD-CINT techniques / 31

8 Partitioning of the domain (Metis, Scotch,...) Figure: An edge-separator (vertex-based partitioning) VK, YS (U of M) DD-CINT techniques / 31

9 The local viewpoint assume M partitions Stack interior variables u 1,u 2,...,u P into u, then interface variables y, B 1 E 1 u 1 u 1 B 2 E 2 u 2 u = λ. B M E M u M u M E1 E2 EM C y y VK, YS (U of M) DD-CINT techniques / 31

10 Pictorially: Write as ( ) B E A = E. C VK, YS (U of M) DD-CINT techniques / 31

11 Contents 1 Introduction 2 The Domain Decomposition framework 3 Domain Decomposition-based contour integration 4 Implementation in HPC architectures 5 Experiments 6 Discussion VK, YS (U of M) DD-CINT techniques / 31

12 Expressing (A ζi) 1 in DD Let ζ C and recall that ( ) B E A = E. C VK, YS (U of M) DD-CINT techniques / 31

13 Expressing (A ζi) 1 in DD Let ζ C and recall that ( ) B E A = E. C Then ( (B ζi) 1 +F(ζ)S(ζ) 1 F(ζ) F(ζ)S(ζ) 1 ) (A ζi) 1 = S(ζ) 1 F(ζ) S(ζ) 1, where F(ζ) = (B ζi) 1 E S(ζ) = C ζi E T (B ζi) 1 E. VK, YS (U of M) DD-CINT techniques / 31

14 Spectral projectors and DD As previously, ( (B ζi) 1 +F(ζ)S(ζ) 1 F(ζ) F(ζ)S(ζ) 1 ) (A ζi) 1 = S(ζ) 1 F(ζ) S(ζ) 1, Then, P DD = 1 (A ζi) 1 dζ 2iπ Γ ( ) H W W G VK, YS (U of M) DD-CINT techniques / 31

15 Spectral projectors and DD As previously, ( (B ζi) 1 +F(ζ)S(ζ) 1 F(ζ) F(ζ)S(ζ) 1 ) (A ζi) 1 = S(ζ) 1 F(ζ) S(ζ) 1, Then, P DD = 1 (A ζi) 1 dζ 2iπ Γ ( ) H W W G H = 1 2iπ Γ [(B ζi) 1 +F(ζ)S(ζ) 1 F(ζ) ]dζ G = 1 2iπ Γ S(ζ) 1 dζ W = 1 2iπ Γ F(ζ)S(ζ) 1 dζ. VK, YS (U of M) DD-CINT techniques / 31

16 Extracting approximate eigenspaces Let ˆV be a set of mrhs to multiply P ) ( ) ( ) (ˆVu HˆVu WˆV s Zu P DD = ˆV s W ˆVu,with +GˆV s Z s Z u = 1 2iπ Γ (B ζi) 1ˆVu dζ 1 2iπ Γ F(ζ)S(ζ) 1 [ˆV s F(ζ) ˆVu ]dζ Z s = 1 2iπ Γ S(ζ) 1 [ˆV s F(ζ) ˆVu ]dζ. VK, YS (U of M) DD-CINT techniques / 31

17 Extracting approximate eigenspaces Let ˆV be a set of mrhs to multiply P ) ( ) ( ) (ˆVu HˆVu WˆV s Zu P DD = ˆV s W ˆVu,with +GˆV s Z s Z u = 1 2iπ Γ (B ζi) 1ˆVu dζ 1 2iπ Γ F(ζ)S(ζ) 1 [ˆV s F(ζ) ˆVu ]dζ Z s = 1 2iπ Γ S(ζ) 1 [ˆV s F(ζ) ˆVu ]dζ. In practice: n c n c Z u = ω j (B ζ j I) 1ˆVu ω j F(ζ j )S(ζ) 1 [ˆV s F(ζ) ˆVu ], j=1 j=1 n c Z s = ω j S(ζ j ) 1 [ˆV s F(ζ j ) ˆVu ]. j=1 VK, YS (U of M) DD-CINT techniques / 31

18 Pseudocode - Full projector (DD-FP) 1: for j = 1 to n c do 2: W u := (B ζ j I) 1ˆVu (local) 3: W s := ˆV s E W u (local) 4: W s := S(ζ j ) 1 W s ; Z s := Z s +ω j W s (distributed) 5: W u := W u (B ζ j ) 1 EW s ; Zu := Z u +ω j W u (local) 6: end for Practical considerations For each ζ j, j = 1,...,n c : Two solves with B ζ j I + One solve with S(ζ j ) The procedure can be repeated with an updated ˆV Equivalent to FEAST tied with a DD solver VK, YS (U of M) DD-CINT techniques / 31

19 An alternative scheme CINT along the interface unknowns P DD = 1 ( ) W (A ζi) 1 dζ = [P 1,P 2 ], 2iπ Γ G G = 1 S(ζ) 1 dζ, W = 1 (B ζi) 1 ES(ζ) 1 dζ. 2iπ 2iπ Γ Advantage: Does not involve the inverse of whole matrix. Γ VK, YS (U of M) DD-CINT techniques / 31

20 An alternative scheme CINT along the interface unknowns P DD = 1 ( ) W (A ζi) 1 dζ = [P 1,P 2 ], 2iπ Γ G G = 1 S(ζ) 1 dζ, W = 1 (B ζi) 1 ES(ζ) 1 dζ. 2iπ 2iπ Γ Advantage: Does not involve the inverse of whole matrix. Γ ( ) U P DD = XX, X = Y ( ) UY P DD = YY Just capture the range of P 2 = XY P 2 randn() Also: Lanczos on P 2 P 2 (sequential, doubles the work) VK, YS (U of M) DD-CINT techniques / 31

21 Pseudocode - Partial projector (DD-PP) 1: for j = 1 to n c do 2: W u := (B ζ j I) 1ˆVu (local) 3: W s := ˆV s E W u (local) 4: W s := S(ζ j ) 1ˆVs ; Zs := Z s +ω j W s (distributed) 5: W u := W u (B ζ j ) 1 EW s ; Zu := Z u +ω j W u (local) 6: end for Practical considerations For each ζ j, j = 1,...,n c : One solve with B ζ j I + One solve with S(ζ j ) More like a one-shot method VK, YS (U of M) DD-CINT techniques / 31

22 A more detailed analysis of DD-PP Spectral Schur complement λ det[s(λ)] = 0 (λ / σ(b)) The eigenvector satisfies ( ) (B λi) 1 Ey x =, with y := S(λ)y = 0. y If (B ζi) 1 analytic in [α,β] W = 1 (B ζi) 1 ES(ζ) 1 dζ {(B λi) 1 Ey} Γ 2iπ Γ G = 1 S(ζ) 1 dζ { y} Γ 2iπ Γ Another idea: Solve S(λ)y = 0 directly ([VK,RLi,YS],[VanB,Kra],[Sak]) VK, YS (U of M) DD-CINT techniques / 31

23 Contents 1 Introduction 2 The Domain Decomposition framework 3 Domain Decomposition-based contour integration 4 Implementation in HPC architectures 5 Experiments 6 Discussion VK, YS (U of M) DD-CINT techniques / 31

24 A closer look at the Schur complement So far: Eigenvalue problem Linear systems with mrhs Schur complement From the DD framework we have S 1 (ζ) E E 1M E 21 S 2 (ζ)... E 2M S(ζ) =....., where E M1 E M2... S M (ζ) S i (ζ) = C i ζi E T i (B i ζi) 1 E i, i = 1,...,M, is the local Schur complement (complex symmetric). VK, YS (U of M) DD-CINT techniques / 31

25 Solving linear systems with the Schur complement Straightforward approach Form and factorize S(ζ) Extremely robust but impractical for 3D problems Alternative Use an approximation of S(ζ) Lots of ideas (parms, LORASC,...) Typical preconditioners implemented: Block Jacobi: Use C i, C i ζi or S i (ζ), i = 1,...,M Global approximation: Use C, C ζi or S(ζ) Memory Vs robustness Important: magnitude of the imaginary part of a pole VK, YS (U of M) DD-CINT techniques / 31

26 Contents 1 Introduction 2 The Domain Decomposition framework 3 Domain Decomposition-based contour integration 4 Implementation in HPC architectures 5 Experiments 6 Discussion VK, YS (U of M) DD-CINT techniques / 31

27 Implementation and computing environment Hardware ITASCA HP Linux cluster at Minnesota Supercomputing Inst. 1,091 HP ProLiant BL280c G6 blade servers, each with two-socket, quad-core 2.8 GHz Intel Xeon X5560 Nehalem EP (24 GB per node) Software 40-gigabit QDR InfiniBand (IB) interconnect The software was written in C++ and on top of PETSc (MPI) Linked to AMD, METIS, UMFPACK, MUMPS, MKL-BLAS, MKL-LAPACK Compiled with mpiicpc (-O3) VK, YS (U of M) DD-CINT techniques / 31

28 Parallel implementation (0,0) (0,1) (0,2) (0,3) (1,0) (1,1) (1,2) (1,3) (2,0) (2,1) (2,2) (2,3) (3,0) (3,1) (3,2) (3,3) COMM DD COMM XX Different levels of parallelism (1-D, 2-D, 3-D grids) COMM DD is the communicator used for Domain Decomposition Different choices for XX XX = mrhs: Parallelize the multiple right-hand sides XX = poles: Parallelize the number of poles Selection depends on the linear solver (direct or iterative) VK, YS (U of M) DD-CINT techniques / 31

29 Experimental framework CINT + Subspace Iteration CINT-SI: standard FEAST approach Direct (MUMPS) or iterative (preconditioned) solver CINT + DD DD-FP: implements the full projector DD-PP: implements the partial projector Schur complement: exact or approximate Details # MPI processes # cores Quadrature rule: Gauss-Legendre Eig/vle tolerance: 1e 8 VK, YS (U of M) DD-CINT techniques / 31

30 Numerical illustration A comparison of DD-FP and DD-PP I D Laplacean [α,β]=[1.6,1.7] Average relative residual DD FP, n = 4 c DD PP: n = 4 c DD PP: n = 8 c DD PP: n = 12 c Iterations VK, YS (U of M) DD-CINT techniques / 31

31 Numerical illustration (cont. from previous) A comparison of DD-FP and DD-PP II VK, YS (U of M) DD-CINT techniques / 31

32 Test on a 2D Laplacian Table: Time is listed in seconds. A 2-D grid of processors was used. S(ζ) factorized explicitly. Number of poles: n c = 4 Exterior eigvls Interior eigvls MPI 16 4 MPI 32 4 MPI 64 4 [α, β] Its CINT-SI DD-FP CINT-SI DD-FP CINT-SI DD-FP [λ 1,λ 20 ] [λ 1,λ 50 ] [λ 1,λ 100 ] [λ 201,λ 220 ] [λ 201,λ 250 ] [λ 201,λ 300 ] Exterior: Number of right-hand sides number of eigvls + 20 Interior: Number of right-hand sides 2 number of eigvls VK, YS (U of M) DD-CINT techniques / 31

33 Efficiency of DD-FP Ratio Efficiency of DD FP (from previous table) [λ 1,λ 50 ] [λ 201,λ 250 ] [λ 1,λ 100 ] [λ,λ ] MPI Processes VK, YS (U of M) DD-CINT techniques / 31

34 Contents 1 Introduction 2 The Domain Decomposition framework 3 Domain Decomposition-based contour integration 4 Implementation in HPC architectures 5 Experiments 6 Discussion VK, YS (U of M) DD-CINT techniques / 31

35 Conclusion In this talk We presented a DD-based form of contour integration DD can be used to: Solve the linear systems in numerical integration Derive DD-based contour integration methods The Schur complement holds a central role Considerations Higher moments of the Schur complement integral Block Krylov subspaces Preconditioning of indefinite linear systems VK, YS (U of M) DD-CINT techniques / 31

Divide and conquer algorithms for large eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota

Divide and conquer algorithms for large eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota PMAA 14 Lugano, July 4, 2014 Collaborators: Joint work with: