An improved parallel Poisson solver for space charge calculation in beam dynamics simulation


DAWEI ZHENG, URSULA VAN RIENEN, University of Rostock, Rostock, 18059, Germany
JI QIANG, LBNL, Berkeley, CA 94720, USA
12th International Computational Accelerator Physics Conference (ICAP'15), Shanghai, 2015
Oct 13, 2015 © 2015 UNIVERSITÄT ROSTOCK, FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK

Outline
- Green's function (GF) and integrated Green's function (IGF) methods
- The improved parallel Poisson solver
  - For the pruned Fourier transform: use a novel fast implicit zero-padded FFT
  - For the IGF calculation: use the reduced integrated Green's function (RIGF)
  - For the Fourier transform of the extended IGF: use the discrete cosine transform for the RIGF
- Study of the improved parallel Poisson solver
  - Comparisons with the commonly used Poisson solver
  - The strong scaling and weak scaling study of the parallel solver
  - MPI+OpenMP routine of the parallel solver and the performance comparison

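Physics model
Lorentz transform (laboratory frame $\hat{K}$, rest frame $K$): $x = \hat{x}$, $y = \hat{y}$, $z = \gamma\hat{z}$
Equation of motion: $\frac{d\hat{\mathbf{p}}_l}{d\hat{t}} = q_l\,(\hat{\mathbf{E}}_l + \hat{\mathbf{v}}_l \times \hat{\mathbf{B}}_l)$, $\frac{d\hat{\mathbf{r}}_l}{d\hat{t}} = \frac{\hat{\mathbf{p}}_l}{m_l \gamma_l}$
Field calculation: $-\Delta\varphi = \rho/\varepsilon_0$, $\mathbf{E} = -\nabla\varphi$
Field transform: $\hat{\mathbf{E}}_\perp = \gamma\,\mathbf{E}_\perp$, $\hat{\mathbf{E}}_\parallel = \mathbf{E}_\parallel$, $\hat{\mathbf{B}} = \frac{\gamma}{c^2}\,\mathbf{v} \times \mathbf{E}$
Figure: A schematic model of bunch motion with transformation between laboratory frame $\hat{K}$ and rest frame $K$.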
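The field transform on this slide can be sketched in a few lines. This is a minimal illustration, assuming the bunch moves along +z with speed βc, SI units, and no magnetic field in the rest frame; the function name `rest_to_lab_fields` is hypothetical and not taken from the talk.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def rest_to_lab_fields(e_rest, beta):
    """Map a rest-frame electric field to lab-frame (E, B).

    Assumes motion along +z with velocity beta*c and B = 0 in the rest frame.
    """
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    ex, ey, ez = e_rest
    v = beta * C_LIGHT
    # Transverse components are scaled by gamma, the longitudinal one is unchanged.
    e_lab = (gamma * ex, gamma * ey, ez)
    # B = (gamma / c^2) v x E with v = (0, 0, v), so v x E = (-v*Ey, v*Ex, 0).
    factor = gamma / C_LIGHT**2
    b_lab = (-factor * v * ey, factor * v * ex, 0.0)
    return e_lab, b_lab
```

For β = 0.6 (γ = 1.25) the transverse field components grow by a factor 1.25 while the longitudinal component is unchanged, and the resulting magnetic field is purely transverse.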

Green's function (GF) method
Green's function in free space:
$G(\mathbf{r},\mathbf{r}') = G(x,x',y,y',z,z') = \frac{1}{\sqrt{(x-x')^2 + (y-y')^2 + (z-z')^2}}$ (1)
Using the midpoint rule for the integral (GF integral), the potential $\varphi$ reads
$\varphi(\mathbf{r}_l) \approx \frac{1}{4\pi\varepsilon_0} \sum_{l'=(1,1,1)}^{(N_x,N_y,N_z)} \rho(\mathbf{r}_{l'})\, \tilde{G}(\mathbf{r}_l,\mathbf{r}_{l'})$, (2)
where $l = (i,j,k)$, $l' = (i',j',k')$ and $\tilde{G}(\mathbf{r}_l,\mathbf{r}_{l'}) = h_x h_y h_z\, G(\mathbf{r}_l,\mathbf{r}_{l'})$.

Integrated Green's function (IGF) method
In
$\varphi(\mathbf{r}_l) \approx \frac{1}{4\pi\varepsilon_0} \sum_{l'=(1,1,1)}^{(N_x,N_y,N_z)} \rho(\mathbf{r}_{l'})\, \tilde{G}(\mathbf{r}_l,\mathbf{r}_{l'})$, (2)
$\tilde{G}(\mathbf{r}_l,\mathbf{r}_{l'})$ (also written as $\tilde{G}(\mathbf{r}_l - \mathbf{r}_{l'}, 0)$) can be calculated by the IGF integral [1] as:
$\tilde{G}_{IGF}(\mathbf{r}_l,\mathbf{r}_{l'}) = \int_{\mathbf{r}_{l'} - \mathbf{h}/2}^{\mathbf{r}_{l'} + \mathbf{h}/2} G(\mathbf{r}_l,\mathbf{r}')\, dx'\, dy'\, dz'$, (3)
where $\mathbf{h} = (h_x,h_y,h_z)$ denotes the discrete step sizes.
The discrete integrated Green's function (IGF) formula is given by
$IGF(\mathbf{r},0) = \iiint G(\mathbf{r},0)\, dx\, dy\, dz$. (4)
[1] J. Qiang, S. Lidia, R.D. Ryne, and C. Limborg-Deprey, Phys. Rev. ST Accel. Beams, vol 9, (2006), and vol 10, (2007).

Integrated Green's function
With $r = \sqrt{x^2 + y^2 + z^2}$:
$IGF(\mathbf{r},0) = -\frac{z^2}{2}\arctan\!\left(\frac{xy}{z\,r}\right) - \frac{y^2}{2}\arctan\!\left(\frac{xz}{y\,r}\right) - \frac{x^2}{2}\arctan\!\left(\frac{yz}{x\,r}\right) + yz\ln(x + r) + xz\ln(y + r) + xy\ln(z + r)$ (4)
Here, $\tilde{G}(\mathbf{r}_l - \mathbf{r}_{l'}, 0)$ is obtained by the summation of the IGFs over the cell $[(\mathbf{r}_l - \mathbf{r}_{l'}) - \mathbf{h}/2,\ (\mathbf{r}_l - \mathbf{r}_{l'}) + \mathbf{h}/2]$.
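The closed-form antiderivative (4) and the summation over the cell can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the cell integral is formed as the signed sum of the antiderivative over the eight cell corners, and the sketch assumes no corner coordinate is exactly zero (a production code has to treat those limits separately).

```python
import math
from itertools import product


def igf_antiderivative(x, y, z):
    """Triple antiderivative of 1/r, Eq. (4); requires x, y, z all non-zero."""
    r = math.sqrt(x * x + y * y + z * z)
    return (-0.5 * z * z * math.atan(x * y / (z * r))
            - 0.5 * y * y * math.atan(x * z / (y * r))
            - 0.5 * x * x * math.atan(y * z / (x * r))
            + y * z * math.log(x + r)
            + x * z * math.log(y + r)
            + x * y * math.log(z + r))


def igf_cell(center, h):
    """Integral of 1/r over the cell [center - h/2, center + h/2]."""
    total = 0.0
    for signs in product((-1, 1), repeat=3):
        corner = [c + 0.5 * s * hw for c, s, hw in zip(center, signs, h)]
        # Upper limits enter with +, lower limits with -, per direction.
        total += signs[0] * signs[1] * signs[2] * igf_antiderivative(*corner)
    return total
```

Far from the origin the cell integral approaches the midpoint value h_x h_y h_z / r, so `igf_cell((3.0, 2.0, 5.0), (1.0, 1.0, 1.0))` agrees closely with `1 / math.sqrt(38)`; close to the origin the two differ, which is where the IGF pays off.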

Hockney and Eastwood's routine of 3D convolution [2]
Using the 3D discrete Fourier transform and the convolution theorem, the potential can be expressed as:
$[\varphi_{ex}]_{i,j,k} = \frac{1}{4\pi\varepsilon_0}\, FFT^{-1}\{[FFT\,\tilde{G}_{ex}]_{i,j,k} \cdot [FFT\,\rho_{ex}]_{i,j,k}\}_{2N_x,2N_y,2N_z}$. (5)
- $\tilde{G}_{ex}(2N_x,2N_y,2N_z)$ is obtained by the real even symmetric extension of $\tilde{G}(N_x,N_y,N_z)$.
- Take the 3D FFT of $\tilde{G}_{ex}$ to obtain $FFT(\tilde{G}_{ex})$.
[2] R.W. Hockney and J.W. Eastwood. Computer Simulation Using Particles. Institute of Physics Publishing, Bristol, (1992).

- Extend $\rho(N_x,N_y,N_z)$ by padding zeros to get $\rho_{ex}(2N_x,2N_y,2N_z)$ and apply a pruned 3D FFT to obtain $FFT\,\rho_{ex}$.
Figure: The pruned 3D FFT, applied direction by direction (FFT in x, FFT in y, FFT in z).
- Multiply $FFT(\tilde{G}_{ex})$ by $FFT(\rho_{ex})$ element by element at corresponding indices.
- $\varphi_{ex}$: inverse pruned 3D FFT of the product.
- $\varphi$: the first $(N_x,N_y,N_z)$ values of $\varphi_{ex}$.
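A one-dimensional analogue of these steps shows why the zero padding and the even extension reproduce the open-boundary sum (2). This is a sketch under stated assumptions: a naive O(n²) DFT stands in for the FFT, the kernel list `g` is a placeholder for tabulated values of the Green's function at distances 0..N, and the prefactor 1/(4πε₀) is omitted. All function names are hypothetical.

```python
import cmath


def dft(x, sign=-1):
    """Naive unnormalized DFT (sign=-1) or inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
                for m in range(n)) for k in range(n)]


def convolve_zero_padded(rho, g):
    """rho: N charge values; g: N+1 kernel values, g[m] = kernel at distance m."""
    n = len(rho)
    g_ex = list(g) + list(g[-2:0:-1])      # real even symmetric extension, length 2N
    rho_ex = list(rho) + [0.0] * n         # zero padding, length 2N
    spectrum = [a * b for a, b in zip(dft(g_ex), dft(rho_ex))]
    phi_ex = [v.real / (2 * n) for v in dft(spectrum, sign=+1)]
    return phi_ex[:n]                      # keep the first N values


def convolve_direct(rho, g):
    """Reference: direct open-boundary summation."""
    n = len(rho)
    return [sum(rho[j] * g[abs(i - j)] for j in range(n)) for i in range(n)]
```

Both routines return the same N values up to rounding; the padded half of `phi_ex` is discarded because it contains the wrap-around part of the cyclic convolution.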

Improvement of the pruned Fourier transform

1D fftbackward and fftforward transform [3]
Define the unit $\omega_m$ with $\omega_m^m = 1$, and write the pseudo-code in Fortran style:
Algorithm 1 fftpadbackward
Input: f
Output: f, u
function FFTPADBACKWARD
    for k = 1, ..., n do
        u(k) ← ω_{2n}^{k-1} f(k)
    end for
    f(:) ← IFFT[f(:)]
    u(:) ← IFFT[u(:)]
end function
Algorithm 2 fftpadforward
Input: f, u
Output: f
function FFTPADFORWARD
    f(:) ← FFT[f(:)]
    u(:) ← FFT[u(:)]
    for k = 1, ..., n do
        f(k) ← f(k) + ω_{2n}^{-k+1} u(k)
    end for
end function
[3] John C. Bowman and Malcolm Roberts. Efficient Dealiased Convolutions without Padding. SIAM Journal on Scientific Computing 33, 1 (2011).

Explicitly and implicitly zero-padded FFT
Define $f_{ex} = [f;\ 0]_{2n}$ and $[f, u]_n = \mathrm{fftpadbackward}_n(f)$, then
$[IFFT_{2n}(f_{ex})_{odd},\ IFFT_{2n}(f_{ex})_{even}] = [f, u]$.
- Compared with the explicitly zero-padded FFT, fftpadbackward is obtained without the last bit-reversal stage (address change).
- The fftpadforward transform is obtained by the inverse procedure.
- The 2D and 3D fftpadbackward (fftpadforward) follow the same procedure as the 2D and 3D IFFT (FFT): the 1D transform is executed along each direction in turn.

The 3D backward modified FFT routine
Figure: The 3D backward modified FFT routine, built up step by step (subscripts e and o denote the even and odd index parts):
1. Pad zeros in the i direction: $Ex\rho_{i,j,k} = [\rho;\ 0]_{2N_i}$.
2. 1D real-to-complex (R2C) FFT along i and ijk2jik transpose: $Ex\rho_{i,j,k} \to C\rho_{j,i,k}$. Complex vectors: $N_i(C\rho) := N_i(\rho) + 1$.
3. 1D fftpadbackward along j and jik2kij transpose: $C\rho_{j,i,k} \to C\rho_{k,i,je},\ C\rho_{k,i,jo}$.
4. 1D fftpadbackward along k for $C\rho_{k,i,je}$: $C\rho_{k,i,je} \to C\rho_{ke,i,je},\ C\rho_{ko,i,je}$.
5. 1D fftpadbackward along k for $C\rho_{k,i,jo}$: $C\rho_{k,i,jo} \to C\rho_{ke,i,jo},\ C\rho_{ko,i,jo}$.

Improvement of the IGF calculation and Fourier transform for the extended GF

Improvement of the IGF calculation: using the reduced integrated Green's function (RIGF)
Figure: The local relative error of the GF integral.
The RIGF is defined as:
$\tilde{G}_{RIGF}(\mathbf{r}_{(i,j,k)},0) = \tilde{G}_{IGF}(\mathbf{r}_{(i,j,k)},0)$ for $1 \le w \le R_w$ (IGF portion),
$\tilde{G}_{RIGF}(\mathbf{r}_{(i,j,k)},0) = h_x h_y h_z\, G(\mathbf{r}_{(i,j,k)},0)$ for $R_w < w \le N_w + 1$ (GF portion),
where $w$ stands for the grid index in the x, y and z direction respectively, and $R_w$ is determined by an adaptive method.

Adaptive accuracy control (z is the longitudinal direction for long bunches)
- Calculate $\tilde{G}_{IGF}(\mathbf{r}_{(1,1,k)},0)$ and $\tilde{G}_{GF}(\mathbf{r}_{(1,1,k)},0)$, and record $\delta\tilde{G}(\mathbf{r}_{(1,1,k)},0) = \tilde{G}_{IGF}(\mathbf{r}_{(1,1,k)},0) - \tilde{G}_{GF}(\mathbf{r}_{(1,1,k)},0)$.
- Locate the stable area of the IGF: the first $k$ for which the magnitude of $(\tilde{G}_{IGF,k-1} - \tilde{G}_{IGF,k})/\tilde{G}_{IGF,k-1}$ drops below $1/\log_2 N_z$. Record this $k$ as $k_{stable}$.
- Locate the accuracy tolerance: the proper $R_z$ is the first $k$ where the magnitude of $\delta\tilde{G}_k/\delta\tilde{G}_{k_{stable}}$ drops down to $10^{-s}$, $s \ge 0$.
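The two search steps can be sketched as below. This is an illustration under assumptions, not the solver's routine: the function name is hypothetical, the number of tabulated values is used in place of $N_z$, and the sample data are a one-dimensional stand-in (cell-integrated 1/z versus its midpoint value, with h = 1) rather than the 3D on-axis IGF and GF values.

```python
import math


def choose_switch_index(g_igf, g_gf, s=1):
    """Return (k_stable, R_z) as 1-based grid indices.

    List position p holds the value for grid index k = p + 1.
    """
    n = len(g_igf)
    delta = [abs(a - b) for a, b in zip(g_igf, g_gf)]
    tol = 1.0 / math.log2(n)
    # Step 1: first index where the relative decrease of the IGF falls below tol.
    p_stable = next(p for p in range(1, n)
                    if abs((g_igf[p - 1] - g_igf[p]) / g_igf[p - 1]) < tol)
    # Step 2: first index where the IGF-GF difference has dropped to 10^-s
    # of its value at k_stable.
    p_switch = next(p for p in range(p_stable, n)
                    if delta[p] / delta[p_stable] <= 10.0 ** (-s))
    return p_stable + 1, p_switch + 1


# Hypothetical 1D stand-in data on 64 points (not values from the talk).
g_igf = [math.log((m + 0.5) / (m - 0.5)) for m in range(1, 65)]
g_gf = [1.0 / m for m in range(1, 65)]
k_stable, r_z = choose_switch_index(g_igf, g_gf, s=1)
```

A larger `s` pushes the switching index further out, so more cells are evaluated with the IGF; with `s = 0` the switch happens at `k_stable` itself.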

Figure: $\delta\tilde{G}_{R_z+1}$ (left; plotted quantity $(\tilde{G}_{IGF,k-1} - \tilde{G}_{IGF,k})/\tilde{G}_{IGF,k-1}$ versus $k$) and the error $\hat{\eta}_\varphi$ of the RIGF method for various $R_z$ (right).
$s$ is chosen as 1 or 2. When higher accuracy is preferred, a larger $s$ can be chosen analogously for the switching (not necessary).

Substitution of Fourier transform for GF values
Theorem: Define $g = \{g_1, g_2, \ldots, g_{n+1}\}$, and let $g_{ex}$ be the real even symmetric extension (RESE) of $g$, i.e. $g_{ex} = \{g_1, g_2, \ldots, g_{n+1}, g_n, g_{n-1}, \ldots, g_2\}_{2n}$. Then
$FFT_{2n}(RESE(g)) = 2\, RESE(DCT_{n+1}(g))$.
- $FFT(\tilde{G}_{ex})$ can be expressed implicitly by $DCT(\tilde{G})$ and its RESE.
- Apart from the RESE, the computational complexity reduces significantly, from $O(2n\log_2 2n)$ to $O(n\log_2 n)$.

Substitution of 3D FFT for GFs: 3D DCT
For 3D: $FFT(RESE(\tilde{G})) = 8\, RESE(DCT(\tilde{G}))$.
Expressing implicitly: define $A(w) = FFT(RESE(\tilde{G}))(w)$, where $w = \{i, j, k\}$, $w = 1, 2, \ldots, 2N_w$, and $B(W) = 8\, DCT(\tilde{G})(W)$, where $W = \{I, J, K\}$, $W = 1, 2, \ldots, N_w + 1$. Then
$A(w) = B(W)$, with $W = w$ for $w \in [1, N_w + 1]$ and $W = 2N_w - w + 2$ for $w \in [N_w + 2, 2N_w]$.
Theorem: The inverse transform of the DCT is the same as the DCT itself, apart from a constant factor.
Therefore, the IFFT and the FFT of the real even symmetric vectors $\tilde{G}_{ex}$ are the same up to scaling.
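Both theorems can be checked numerically in 1D. The sketch below assumes the DCT-I normalization $DCT(g)_k = \tfrac{1}{2}(g_0 + (-1)^k g_n) + \sum_{m=1}^{n-1} g_m \cos(\pi m k/n)$ (0-based indices), which is the convention under which the factor 2 of the slide holds; other libraries scale the DCT-I differently. Naive transforms stand in for the fast ones, and the function names are hypothetical.

```python
import cmath
import math


def rese(g):
    """Real even symmetric extension: n+1 values -> 2n values."""
    return list(g) + list(g[-2:0:-1])


def dct1(g):
    """DCT-I with the normalization assumed in the lead-in."""
    n = len(g) - 1
    return [0.5 * (g[0] + (-1) ** k * g[n])
            + sum(g[m] * math.cos(math.pi * m * k / n) for m in range(1, n))
            for k in range(n + 1)]


def dft(x):
    """Naive unnormalized forward DFT."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n)
                for m in range(n)) for k in range(n)]


g = [1.0 / (1 + m) for m in range(9)]          # n + 1 = 9 sample values
lhs = dft(rese(g))                              # FFT_2n(RESE(g))
rhs = [2.0 * v for v in rese(dct1(g))]          # 2 RESE(DCT_{n+1}(g))
twice = dct1(dct1(g))                           # equals (n / 2) * g
```

Here `lhs` and `rhs` agree to rounding error, and applying `dct1` twice returns `g` scaled by n/2, which is the constant factor mentioned in the second theorem. In 3D the factor 2 appears once per direction, giving the 8 in the slide.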

The RIGF calculation and the related 3D DCT
Figure: The RIGF (Green's function) calculation and the related 3D DCT, carried out on each processor and built up step by step:
1. The RIGF calculation in parallel: $\tilde{G}_{k,j,i} = RIGF(z, y, x)$.
2. 1D DCT along k and kji2ijk transpose: $\tilde{G}_{k,j,i} \to \tilde{G}_{i,j,k}$.
3. 1D DCT along i and ijk2jik transpose: $\tilde{G}_{i,j,k} \to \tilde{G}_{j,i,k}$.
4. 1D DCT along j and extension in the j direction with odd and even indices: $\tilde{G}_{j,i,k} \to \tilde{G}_{je,i,k},\ \tilde{G}_{jo,i,k}$.
5. jik2kij transpose: $\tilde{G}_{je,i,k} \to \tilde{G}_{k,i,je}$ and $\tilde{G}_{jo,i,k} \to \tilde{G}_{k,i,jo}$.

Multiplication in spectral domain
$C\rho_{ke,i,jx} = RESE(\tilde{G}_{k,i,jx})_{even} \cdot C\rho_{ke,i,jx}$,
$C\rho_{ko,i,jx} = RESE(\tilde{G}_{k,i,jx})_{odd} \cdot C\rho_{ko,i,jx}$,
where $x$ stands for e (even) or o (odd).
Figure: $\tilde{G}_{k,i,je}$ multiplies $C\rho_{ke,i,je}$ and $C\rho_{ko,i,je}$; $\tilde{G}_{k,i,jo}$ multiplies $C\rho_{ke,i,jo}$ and $C\rho_{ko,i,jo}$.

The 3D forward modified FFT routine
Figure: The 3D forward modified FFT routine:
1. 1D fftpadforward along k: $C\rho_{ke,i,je},\ C\rho_{ko,i,je} \to C\rho_{k,i,je}$ and $C\rho_{ke,i,jo},\ C\rho_{ko,i,jo} \to C\rho_{k,i,jo}$.
2. kij2jik transpose and 1D fftpadforward along j: $C\rho_{k,i,je},\ C\rho_{k,i,jo} \to C\rho_{j,i,k}$. Complex vectors: $N_i(C\rho) := N_i(\rho) + 1$.
3. jik2ijk transpose and 1D complex-to-real (C2R) FFT along i: $C\rho_{j,i,k} \to Ex\rho_{i,j,k}$ (real vectors).

Study of the improved parallel Poisson solver

Comparisons with the commonly used Poisson solver: electric field accuracy
Example 1: An ideal uniform ellipsoidal bunch with aspect ratio of 30.
Figure: Comparison of the relative error of the electric field. Panels: (a) IGF at $(:, N_y/2, :)$, (b) RIGF at $(:, N_y/2, :)$, (c) IGF at $(:, :, N_z/2)$, (d) RIGF at $(:, :, N_z/2)$.

Bunch RMS size during tracking in IMPACT-T [4]
Example 2: An extremely long bunch, $R_x = 0.15$ m, $R_y = 0.15$ m, $L_z = 1.0e5$ m, at the initial sections of a virtual light source accelerator. $dt = 1.0e{-12}$ s, $t_{step} = 20{,}000{,}000$, $N_{x,y,z} = 64$.
Figure: A schematic comparison of the RMS size in the x, y, z directions (left to right).
[4] IMPACT-T: A 3D Parallel Particle Tracking Code in Time Domain jiqiang/impact-t/index.html.

Normalized RMS emittance of the bunch during tracking in IMPACT-T
Figure: A schematic comparison of the normalized RMS emittance in the x, y, z directions (left to right).

Speed-up study with the commonly used Poisson solver (serial routine, Example 1)
Comparison of the elapsed time of different GF methods with increasing grid resolution.
Table: elapsed time (s) for each $N_w + 1$; columns: MGCG | IGF + (updated) H&E | RIGF + implicitly zero-padded FFT (numerical entries not recoverable).
The program is implemented in C, compiled with gcc, and run on a 2.60 GHz Intel Core CPU.
MGCG: multigrid conjugate gradient method. H&E: Hockney and Eastwood's method.

Speed-up study with the commonly used Poisson solver in the IMPACT-T code (parallel routine)
Comparison between IGF and RIGF.
Table: columns $N_w$ | $N_p$ | $t_{RIGF}$ (s) | $t_{IGF}$ (s), in two side-by-side column groups (numerical entries not recoverable).
The parallel Poisson solver is written in Fortran 90 with Open MPI, compiled with the Intel compiler, and executed on the Edison supercomputer at NERSC. $t_{step} = 20{,}000$.

Weak scaling study
Weak scaling: keep the "problem-process size ratio" $N_x N_y N_z / N_p$ constant.
Figure: Time (s) versus number of cores. The "size-process ratios" are $16^3$ (left), $32^3$ (middle), and $64^3$ (right); $N_p = 1, 2, 4, \ldots$

Strong scaling study
Strong scaling: the problem size $N_x N_y N_z$ is fixed while $N_p$ increases.
Figure: Time (s) versus number of cores. $N_w$ starts at 64 (left), 128 (middle) and 256 (right); $N_p = 1, 2, 4, \ldots$
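The two scaling definitions translate into simple figures of merit. The sketch below uses the usual definitions (strong scaling: speed-up $T_1/T_p$ and efficiency $T_1/(p\,T_p)$; weak scaling: efficiency $T_1/T_p$ at constant work per process). The function names are hypothetical, and any timings passed in are the caller's own measurements; none are taken from the talk.

```python
def strong_scaling(times):
    """times: {N_p: wall time} for a fixed problem size.

    Returns {N_p: (speed-up, parallel efficiency)} relative to the smallest N_p.
    """
    base_p = min(times)
    base_t = times[base_p]
    return {p: (base_t / t, base_t * base_p / (t * p))
            for p, t in sorted(times.items())}


def weak_scaling_efficiency(times):
    """times: {N_p: wall time} with N_x*N_y*N_z / N_p held constant."""
    base_t = times[min(times)]
    return {p: base_t / t for p, t in sorted(times.items())}
```

For example, made-up strong-scaling timings `{1: 8.0, 2: 4.0, 4: 2.5}` give a speed-up of 3.2 and an efficiency of 0.8 on 4 processes; in the weak-scaling case an ideal solver keeps the efficiency at 1, i.e. a flat time curve.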

The Open MPI+OpenMP routine
- OpenMP (Open Multi-Processing) is a shared-memory multiprocessing API (application programming interface), which works together with the MPI processes within the same computational node.
- The set-up of OpenMP through the OpenMP library adds to the computing cost.
- The OpenMP parallel parts are mostly the "do-loops".
- The SIMD (single instruction, multiple data) vectorization has not been further optimized so far.

Figure: CPU time (s) versus the number of OpenMP threads per process in use, for the numa_node and depth settings.
For the Intel compiler: -cc numa_node (NUMA: non-uniform memory access) and -cc depth are two important options for thread affinity in compiling.

Conclusion and outlook
- We reviewed the GF and IGF methods.
- The improved parallel Poisson solver's routine improves efficiency in three respects: using the RIGF, using the discrete cosine transform for the RIGF, and using a novel fast implicit zero-padded convolution routine.
- Study of the improved parallel Poisson solver: comparisons with the commonly used Poisson solver; strong scaling and weak scaling; MPI+OpenMP routine.
- Further HPC routine update with the MIC (Many Integrated Core) architecture (Open MPI+OpenMP).
- Limitation of the data transpose with MPI_Alltoall.
