SPARSE SOLVERS FOR THE POISSON EQUATION. Margreet Nool. CWI, Multiscale Dynamics. November 9, 2015



OUTLINE OF THIS TALK
1 FISHPACK, LAPACK, PARDISO
2 SYSTEM OVERVIEW OF CARTESIUS
3 POISSON EQUATION
4 SOLVERS
5 EXAMPLE OF USE
6 CONCLUSIONS AND REMARKS

2D POISSON PROBLEM
[Figure: "2D Poisson problem solution at Cartesius", plotted against n_x = n_y; curves: pardiso 1 thread, pardiso 12 threads, pardiso 24 threads, fishpack, lapack mkl]
- Results on 1 node of Cartesius with 24 cores
- LAPACK: fastest implementation on Cartesius
- PARDISO: shared-memory multiprocessing parallel direct sparse solver by Olaf Schenk [00-04], optimized for Intel®

2D POISSON PROBLEM
[Figure: "2D Poisson problem accuracy", residual plotted against n_x = n_y; curves: pardiso, fishpack, lapack]
- LAPACK: maximum problem size n_x = n_y = 1300
- FISHPACK: convergence up to problem size n_x = n_y = 1400
- PARDISO: maximum problem size n_x = n_y = 5600


CLUSTER MACHINE
Cartesius, the Dutch supercomputer at SURFsara, is a cluster machine.

Node type   Number   Cores   CPU     Clock     Memory
thin                         E v3    2.6 GHz   64 GB
thin                         E v2    2.4 GHz   64 GB
fat                          E       GHz       256 GB
gpu                          E v2    2.5 GHz   96 GB

- 40,960 cores, GPUs: Pflop/s (peak performance)
- 117 TB memory (CPU + GPGPU)
- Fat nodes have 4 times more memory than thin nodes, but are slower

NODES AND CORES
- A Cartesius node can have 24 or 32 cores
- Within a node: shared memory
- Across nodes: distributed memory
- Nodes can be configured in different ways
[Figure: one node with 8 cores, configured as (a) 8 MPI processes, (b) 8 OpenMP threads, (c) 4 MPI processes]

SOFTWARE: MKL LIBRARY
Intel® Math Kernel Library is a library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math. The routines in MKL are hand-optimized specifically for Intel® processors.
Sparse solvers:
- MKL PARDISO: Parallel Direct Sparse Solver Interface
- Parallel Direct Sparse Solver for Clusters Interface
- Direct Sparse Solver (DSS) Interface Routines
- Iterative Sparse Solvers (based on Reverse Communication Interface)

SOFTWARE
Intel® Poisson solvers for a single node:
- Two-dimensional Helmholtz problem on a Cartesian plane
- Two-dimensional Poisson problem on a Cartesian plane
- Two-dimensional Laplace problem on a Cartesian plane
- Helmholtz problem on a sphere
- Poisson problem on a sphere
- Three-dimensional Helmholtz problem
- Three-dimensional Poisson problem
- Three-dimensional Laplace problem


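1D CELL CENTERED DIRICHLET BC
Hundsdorfer and Verwer: consider a cell centered grid with nodes $x_i = (i - \tfrac{1}{2})h$; $i = 1, \dots, M$; $h = 1/M$.
For Dirichlet BC we need in $x_0 = -\tfrac{1}{2}h$ and in $x_{M+1} = 1 + \tfrac{1}{2}h$ the virtual values $u_0$ and $u_{M+1}$, such that
$$\tfrac{1}{2}(u_0 + u_1) = \gamma_0, \qquad \tfrac{1}{2}(u_M + u_{M+1}) = \gamma_M.$$
We obtain the following semi-discrete system
$$\begin{aligned}
u_1' &= \frac{1}{h^2}\,(-3u_1 + u_2) + \frac{2}{h^2}\,\gamma_0, \\
u_i' &= \frac{1}{h^2}\,(u_{i-1} - 2u_i + u_{i+1}), \qquad 2 \le i \le M-1, \\
u_M' &= \frac{1}{h^2}\,(u_{M-1} - 3u_M) + \frac{2}{h^2}\,\gamma_M.
\end{aligned}$$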
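The two boundary rows follow from eliminating the virtual values with the averaging conditions. A short derivation, using only the relations stated on this slide:

```latex
u_0 = 2\gamma_0 - u_1
\;\Rightarrow\;
\frac{1}{h^2}\,(u_0 - 2u_1 + u_2)
  = \frac{1}{h^2}\,(-3u_1 + u_2) + \frac{2}{h^2}\,\gamma_0 ,
\qquad
u_{M+1} = 2\gamma_M - u_M
\;\Rightarrow\;
\frac{1}{h^2}\,(u_{M-1} - 2u_M + u_{M+1})
  = \frac{1}{h^2}\,(u_{M-1} - 3u_M) + \frac{2}{h^2}\,\gamma_M .
```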

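1D CELL CENTERED DIRICHLET BC
1D Poisson matrix A of size M and RHS vector b are defined by
$$A = \frac{1}{h^2}
\begin{pmatrix}
 3 & -1 &        &        &    \\
-1 &  2 & -1     &        &    \\
   & \ddots & \ddots & \ddots &    \\
   &    & -1     &  2     & -1 \\
   &    &        & -1     &  3
\end{pmatrix},
\qquad
b = \begin{pmatrix}
b_1 + \frac{2}{h^2}\gamma_0 \\ b_2 \\ b_3 \\ \vdots \\ b_{M-1} \\ b_M + \frac{2}{h^2}\gamma_M
\end{pmatrix}$$
Note: A is minus the semi-discrete difference operator; with this sign the Poisson matrix is symmetric positive definite
Note: correction on the RHS vector b (the boundary values enter the first and last entries)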
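As an illustration of the 1D system, here is a minimal pure-Python sketch that assembles the tridiagonal matrix and corrected right-hand side and solves it with the Thomas algorithm. The sign convention (solving -u'' = f so that the diagonal is positive), the function names and the test problem u(x) = sin(pi x) + 1 + x are assumptions made for this example, not taken from the talk.

```python
import math

def assemble_1d(M, f, gamma0, gammaM):
    """Tridiagonal system for -u'' = f on a cell-centred grid, M >= 2 cells."""
    h = 1.0 / M
    s = 1.0 / h**2
    diag = [2.0 * s] * M
    diag[0] = diag[-1] = 3.0 * s            # boundary cells: 3 instead of 2
    lower = [-s] * M                        # lower[0] is never used
    upper = [-s] * M                        # upper[-1] is never used
    rhs = [f((i + 0.5) * h) for i in range(M)]   # cell centres (i - 1/2) h, i = 1..M
    rhs[0] += 2.0 * s * gamma0              # Dirichlet correction, left boundary
    rhs[-1] += 2.0 * s * gammaM             # Dirichlet correction, right boundary
    return lower, diag, upper, rhs

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def max_error(M):
    """Max-norm error against the assumed exact solution u = sin(pi x) + 1 + x."""
    exact = lambda x: math.sin(math.pi * x) + 1.0 + x
    f = lambda x: math.pi**2 * math.sin(math.pi * x)    # f = -u''
    lo, di, up, b = assemble_1d(M, f, exact(0.0), exact(1.0))
    u = thomas(lo, di, up, b)
    h = 1.0 / M
    return max(abs(u[i] - exact((i + 0.5) * h)) for i in range(M))
```

Calling `max_error(32)` and `max_error(64)` should show the error dropping by roughly a factor of four when the grid is refined once, as expected for a second-order scheme.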

2D AND 3D CELL CENTERED DIRICHLET BC
2D Poisson matrix A of size $M^2 \times M^2$ for M = 4 is defined by $A = \frac{1}{h^2}\,[\ ]$
For the 3D case we distinguish 3 diagonal parts
- [ ] for cells on edges
- [ ] for cells on surfaces
- [ ] for inner cells
supplemented with 3 sub diagonals and 3 super diagonals
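The structure of the 2D matrix can be made concrete with a small sketch. The code below builds a dense matrix for the cell-centred Dirichlet stencil, scaled by h^2. The row-major cell numbering, the positive-diagonal sign convention and the function name `poisson_2d` are assumptions for illustration. Each neighbour that falls outside the grid is a virtual cell, and eliminating it adds 1 to the diagonal.

```python
def poisson_2d(M):
    """Dense M^2 x M^2 cell-centred Dirichlet Poisson matrix, scaled by h^2."""
    n = M * M
    A = [[0] * n for _ in range(n)]
    for j in range(M):
        for i in range(M):
            k = j * M + i                      # row-major cell index (assumed ordering)
            A[k][k] = 4
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < M and 0 <= jj < M:
                    A[k][jj * M + ii] = -1     # real neighbour cell
                else:
                    A[k][k] += 1               # virtual cell eliminated via the Dirichlet BC
    return A

A = poisson_2d(4)
corner, edge, inner = A[0][0], A[1][1], A[5][5]
nonzeros = sum(1 for row in A for v in row if v != 0)
```

For M = 4 this gives diagonal entries 6 in the corner cells, 5 in the other boundary cells and 4 in the inner cells, with 64 nonzeros in total.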


POISSON SOLVER FOR LARGE 2D AND 3D SIMULATIONS
Poisson solvers:
- PARDISO (MKL)
- CLUSTER_SPARSE_SOLVER (MKL)
- MUMPS Release 5.0.1

ANALYSIS, FACTORIZATION, SOLVE
To solve $A x = b$ we factorize A into $A = L D L^T$.
For PARDISO, CLUSTER_SPARSE_SOLVER and MUMPS we can distinguish three main phases:
- analysis and reordering
- factorization
- solution
Note 1: Each phase can be called independently (not for FISHPACK)
Note 2: Once the matrix has been factorized we may restrict ourselves to the solution phase
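The benefit of separating factorization from solution can be sketched with a small dense LDL^T routine. This is only an illustration of the idea: it has no pivoting, reordering or sparsity handling, and the function names are hypothetical. The sparse solvers named above do far more work in each phase.

```python
def ldlt(A):
    """Factor a symmetric matrix as A = L D L^T (no pivoting); returns (L, d)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return L, d

def ldlt_solve(L, d, b):
    """Solve A x = b from the factors: L y = b, D z = y, L^T x = z."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    z = [y[i] / d[i] for i in range(n)]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x

# 1D cell-centred Poisson matrix for M = 4, scaled by h^2 (assumed sign convention)
A = [[3.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 3.0]]

L, d = ldlt(A)                                   # factorization phase: done once
x1 = ldlt_solve(L, d, [2.0, 0.0, 0.0, 2.0])      # solution phase, first RHS
x2 = ldlt_solve(L, d, [1.0, 1.0, 1.0, 1.0])      # solution phase, second RHS, same factors
```

The factors `L` and `d` are reused for every new right-hand side, which is why Note 2 matters when the same matrix is solved many times.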

ANALYSIS, FACTORIZATION, SOLVE
Analysis phase
- reordering of the matrix to reduce fill-in
- choosing pivots using a selection criterion to preserve sparsity
- matrix input distributions:
  - CRS for PARDISO and CLUSTER_SPARSE_SOLVER
  - Centralized assembled matrix format for MUMPS
- matrix only on host or distributed over processes
- if desired an analysis report is made
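The two input formats differ mainly in how the nonzeros are indexed: an assembled format lists (row, column, value) triplets, whereas CRS (compressed row storage) keeps one row-pointer array plus column indices and values. A minimal conversion sketch follows, assuming zero-based indices and no duplicate entries. The function name is hypothetical, and the real interfaces have further conventions (index base, which triangle of a symmetric matrix is stored) that are described in their manuals and not modelled here.

```python
def coo_to_crs(n, rows, cols, vals):
    """Convert zero-based (row, col, value) triplets to CRS arrays."""
    row_ptr = [0] * (n + 1)
    for r in rows:                      # count the entries in each row
        row_ptr[r + 1] += 1
    for i in range(n):                  # prefix sum gives the start of each row
        row_ptr[i + 1] += row_ptr[i]
    col_idx = [0] * len(vals)
    values = [0.0] * len(vals)
    next_free = row_ptr[:-1]            # slice copy: next free slot per row
    for r, c, v in sorted(zip(rows, cols, vals)):   # ascending columns within a row
        p = next_free[r]
        col_idx[p] = c
        values[p] = v
        next_free[r] += 1
    return row_ptr, col_idx, values

# 3 x 3 example matrix [[3, -1, 0], [-1, 2, -1], [0, -1, 3]], triplets in arbitrary order
rows = [2, 0, 1, 1, 0, 2, 1]
cols = [2, 0, 0, 2, 1, 1, 1]
vals = [3.0, 3.0, -1.0, -1.0, -1.0, -1.0, 2.0]
row_ptr, col_idx, values = coo_to_crs(3, rows, cols, vals)
```

Row `i` of the matrix then occupies positions `row_ptr[i]` up to, but not including, `row_ptr[i + 1]` in `col_idx` and `values`.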

ANALYSIS, FACTORIZATION, SOLVE
Factorization phase
- most time consuming phase
- most memory consuming phase
- if desired a report about the factorization is made
- pivot strategy
- required only once?

ANALYSIS, FACTORIZATION, SOLVE
Solution phase
- Post-processing: iterative refinement
- Error analysis
Compute $r = Ax - b$, then $\max_{i=1,\dots,m} |r_i| < 10^{-12}$.
Let $x_{cont}$ be the solution of the continuous problem, then
Residual: $\|x - x_{cont}\|_2$ or Residual: $\max_{i=1,\dots,m} |x(i) - x_{cont}(i)|$
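Iterative refinement and the residual test can be sketched in a few lines. In the example below, `sloppy_solve` is a hypothetical stand-in for a factorization of limited accuracy (the exact 2 x 2 inverse scaled by 1.001). The loop corrects the solution until the max-norm of r = Ax - b falls below the tolerance. This is a generic textbook form of refinement, not the internal procedure of any of the solvers.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def max_norm(v):
    return max(abs(t) for t in v)

def two_norm(v):
    return math.sqrt(sum(t * t for t in v))

def residual(A, x, b):
    """r = A x - b, as defined on the slide."""
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]

def refine(solve, A, b, x, tol=1e-12, max_steps=10):
    """Iterative refinement: stop once max|r_i| < tol or after max_steps."""
    for _ in range(max_steps):
        r = residual(A, x, b)
        if max_norm(r) < tol:
            break
        dx = solve(r)
        x = [xi - di for xi, di in zip(x, dx)]   # r = A x - b, so the correction is subtracted
    return x

A = [[3.0, -1.0], [-1.0, 3.0]]
b = [2.0, 2.0]                                   # exact solution is [1, 1]

def sloppy_solve(rhs):
    # inverse of A is (1/8) [[3, 1], [1, 3]]; the factor 1.001 mimics an inexact solve
    return [1.001 * (3.0 * rhs[0] + rhs[1]) / 8.0,
            1.001 * (rhs[0] + 3.0 * rhs[1]) / 8.0]

x0 = sloppy_solve(b)
x = refine(sloppy_solve, A, b, x0)
error_max = max_norm([xi - 1.0 for xi in x])     # against the known solution
error_two = two_norm([xi - 1.0 for xi in x])
```

Each refinement step here reduces the error by about three orders of magnitude, so a handful of cheap solution-phase calls is enough to reach the tolerance.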


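2D POISSON PROBLEM
Solve
$$\Delta U(x, y) = \left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) U(x, y)$$
using a 4-pt centered 2-nd order difference scheme.
2D POISSON PROBLEM WITH KNOWN SOLUTION
$$U(x, y) = \exp\left( -C\left( (x - x_0)^2 + (y - y_0)^2 \right) \right)$$
$$\Delta U(x, y) = \left( -4C + 4C^2 \left( (x - x_0)^2 + (y - y_0)^2 \right) \right) \exp\left( -C\left( (x - x_0)^2 + (y - y_0)^2 \right) \right)$$
on a uniform grid defined on $x \in [0, 1]$ and $y \in [0, 1]$ and $C \in \{\ ,\ , 10^4, 10^6\}$ and $x_0 = y_0 = 0.5$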
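The closed-form Laplacian of the test solution can be checked numerically against the centred difference stencil (four neighbours plus the centre point). The sketch below uses C = 10 and the point (0.3, 0.6), both chosen arbitrarily for illustration, and compares two step sizes to confirm second-order behaviour.

```python
import math

def U(x, y, C, x0=0.5, y0=0.5):
    """Gaussian test solution."""
    return math.exp(-C * ((x - x0) ** 2 + (y - y0) ** 2))

def laplace_U(x, y, C, x0=0.5, y0=0.5):
    """Analytic Laplacian of U in two dimensions."""
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    return (-4.0 * C + 4.0 * C * C * r2) * math.exp(-C * r2)

def stencil(x, y, C, h):
    """Centred second-order difference approximation of the Laplacian."""
    return (U(x - h, y, C) + U(x + h, y, C)
            + U(x, y - h, C) + U(x, y + h, C)
            - 4.0 * U(x, y, C)) / h ** 2

C = 10.0                  # illustrative value, not one of the values used in the talk's plots
x, y = 0.3, 0.6           # arbitrary sample point
errs = [abs(stencil(x, y, C, h) - laplace_U(x, y, C)) for h in (0.02, 0.01)]
ratio = errs[0] / errs[1]
```

Halving h should reduce the difference by a factor close to four. For large C the Gaussian becomes very narrow, so a much finer grid is needed before this asymptotic behaviour shows.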

2D POISSON PROBLEM
[Figure: U plotted over X and Y for four values of C, panels (d) to (g); panel (f) shows C = 10^4 and panel (g) shows C = 10^6]

2D POISSON PROBLEM
[Figure, four panels plotted against nx = ny, one curve per value of C:
(h) convergence: 2D Poisson problem accuracy, residual (2-norm);
(i) reordering PARDISO: 2D reordering phase on 1 node;
(j) factorization PARDISO: 2D factorize phase on 1 node;
(k) solution PARDISO: 2D solution phase on 1 node]

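3D POISSON PROBLEM
Solve
$$\Delta U(x, y, z) = \left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \right) U(x, y, z)$$
using a 6-pt centered 2-nd order difference scheme.
3D POISSON PROBLEM WITH KNOWN SOLUTION
$$U(x, y, z) = \exp\left( -C\left( (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 \right) \right)$$
$$\Delta U(x, y, z) = \left( -6C + 4C^2 \left( (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 \right) \right) \exp\left( -C\left( (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 \right) \right)$$
on a uniform grid defined on $x \in [0, 1]$, $y \in [0, 1]$ and $z \in [0, 1]$ and $C \in \{\ ,\ , 10^4, 10^6\}$ and $x_0 = y_0 = z_0 = 0.5$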
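The constant term of the right-hand side depends on the dimension, because each second derivative of the Gaussian contributes $-2C$. A short derivation, with $r^2$ introduced here as shorthand for the squared distance to the centre:

```latex
r^2 = (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2, \qquad U = e^{-C r^2},
\\[4pt]
\frac{\partial U}{\partial x} = -2C\,(x - x_0)\,U,
\qquad
\frac{\partial^2 U}{\partial x^2} = \left( -2C + 4C^2 (x - x_0)^2 \right) U,
\\[4pt]
\Delta U
= \frac{\partial^2 U}{\partial x^2}
+ \frac{\partial^2 U}{\partial y^2}
+ \frac{\partial^2 U}{\partial z^2}
= \left( -6C + 4C^2 r^2 \right) e^{-C r^2}.
```

With only two coordinates the same computation gives the constant $-4C$ of the 2D problem.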

3D POISSON PROBLEM: CLUSTER_SPARSE_SOLVER
[Figure, four panels, 12 cores, CLUSTER_SPARSE_SOLVER:
(l) convergence: 3D Poisson problem accuracy, residual (max norm);
(m) reordering: 3D reordering phase;
(n) factorization: 3D factorize phase;
(o) solution: 3D solution phase]

3D POISSON PROBLEM: CLUSTER_SPARSE_SOLVER
[Figure, six panels, CLUSTER_SPARSE_SOLVER:
12 cores: (p) reordering, (q) factorization, (r) solution;
24 cores: (s) reordering, (t) factorization, (u) solution]
FIGURE: Number of cores per node 12 (upper) and 24 (lower) figures

3D POISSON PROBLEM: MUMPS
[Figure, six panels, MUMPS:
upper row: (a) reordering, (b) factorization, (c) solution;
lower row: (d) reordering, (e) factorization, (f) solution]
FIGURE: Number of cores per node 12 (upper) and 24 (lower) figures

3D POISSON PROBLEM: CLUSTER_SPARSE_SOLVER VERSUS MUMPS
[Figure, six panels:
CLUSTER_SPARSE_SOLVER, 24 cores: (a) reordering, (b) factorization, (c) solution;
MUMPS: (d) reordering, (e) factorization, (f) solution]
FIGURE: CLUSTER_SPARSE_SOLVER (upper) versus MUMPS (lower) figures; number of cores per node 24

3D POISSON PROBLEM: MUMPS
[Figure, three panels showing speedup for MUMPS: (a) reordering phase, (b) factorization phase, (c) solution phase]
FIGURE: Speedup compared with 1 node

3D POISSON PROBLEM
Analysis report for 3D MUMPS on 64 nodes
[Table with columns: n_x, N, NZ, operations, host (MBYTES), avg (MBYTES), total (MBYTES)]


CONCLUSIONS, REMARKS AND QUESTIONS
- 2D Poisson problems up to n_x = n_y = 5400 on a single node
- 2D Poisson problems up to n_x = n_y = on 32 nodes
- 3D Poisson problems up to n_x = n_y = n_z = 128 on a single node
- 3D Poisson problems up to n_x = n_y = n_z = 256 on 64 nodes
- MUMPS is very suitable for cluster machines
- CLUSTER_SPARSE_SOLVER can handle larger problems than MUMPS
- the solution phase of CLUSTER_SPARSE_SOLVER is slower than that of MUMPS
- use MKL software where possible, also for MUMPS
- parallelization with MUMPS or CLUSTER_SPARSE_SOLVER is NOT difficult
- forget about FISHPACK: it is no longer the fastest solver
- results obtained by FISHPACK are not reliable

CONCLUSIONS, REMARKS AND QUESTIONS
- Is it possible to accelerate Anna's code?
- Is the 3D approach suitable for Anna?
- More questions
