An improved parallel Poisson solver for space charge calculation in beam dynamics simulation


1 An improved parallel Poisson solver for space charge calculation in beam dynamics simulation

Dawei Zheng, Ursula van Rienen: University of Rostock, Rostock, 18059, Germany
Ji Qiang: LBNL, Berkeley, CA 94720, USA

12th International Computational Accelerator Physics Conference (ICAP'15), Shanghai, 2015

Oct 13, 2015. © 2015 Universität Rostock, Fakultät für Informatik und Elektrotechnik, Institut für Allgemeine Elektrotechnik

2 Outline

- Green's function (GF) and integrated Green's function (IGF) methods
- The improved parallel Poisson solver
  - For the pruned Fourier transform: use a novel fast implicitly zero-padded FFT
  - For the IGF calculation: use the reduced integrated Green's function (RIGF)
  - For the Fourier transform of the extended IGF: use the discrete cosine transform for the RIGF
- Study of the improved parallel Poisson solver
  - Comparisons with the commonly used Poisson solver
  - The strong scaling and weak scaling study of the parallel solver
  - MPI+OpenMP routine of the parallel solver and the performance comparison

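3 Physics model

Lorentz transform (laboratory frame $\hat{K}$ to rest frame $K$):
$$x = \hat{x}, \qquad y = \hat{y}, \qquad z = \gamma \hat{z}$$

Equation of motion (laboratory frame):
$$\frac{d\hat{\mathbf{p}}_l}{d\hat{t}} = q_l\left(\hat{\mathbf{E}}_l + \hat{\mathbf{v}}_l \times \hat{\mathbf{B}}_l\right), \qquad \frac{d\hat{\mathbf{r}}_l}{d\hat{t}} = \frac{\hat{\mathbf{p}}_l}{m_l \gamma_l}$$

Field calculation (rest frame):
$$-\Delta\varphi = \frac{\rho}{\varepsilon_0}, \qquad \mathbf{E} = -\nabla\varphi$$

Field transform (rest frame to laboratory frame):
$$\hat{\mathbf{E}}_\perp = \gamma \mathbf{E}_\perp, \qquad \hat{\mathbf{E}}_\parallel = \mathbf{E}_\parallel, \qquad \hat{\mathbf{B}} = \frac{\gamma}{c^2}\, \mathbf{v} \times \mathbf{E}$$

Figure: A schematic model of bunch motion with transformation between the laboratory frame $\hat{K}$ and the rest frame $K$.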
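The field transform above can be illustrated with a minimal Python sketch. It assumes the bunch moves along +z with speed βc; the function names `cross` and `rest_to_lab_fields` are hypothetical and chosen only for this illustration.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rest_to_lab_fields(e_rest, beta):
    """Map the rest-frame field E to the laboratory-frame fields (E_hat, B_hat).

    Assumption: the bunch moves along +z with speed beta * c.
    """
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    ex, ey, ez = e_rest
    # Transverse components are scaled by gamma, the longitudinal one is kept.
    e_lab = (gamma * ex, gamma * ey, ez)
    v = (0.0, 0.0, beta * C_LIGHT)
    # B_hat = (gamma / c^2) v x E
    b_lab = tuple(gamma / C_LIGHT ** 2 * comp for comp in cross(v, e_rest))
    return e_lab, b_lab
```

For β = 0.6 (γ = 1.25) and E = (1, 2, 3), the transverse field components grow by the factor 1.25 while the longitudinal component is unchanged.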

4 Green's function (GF) method

Green's function in free space:
$$G(\mathbf{r},\mathbf{r}') = G(x,x',y,y',z,z') = \frac{1}{\sqrt{(x-x')^2 + (y-y')^2 + (z-z')^2}} \qquad (1)$$

Using the midpoint rule for the integral (GF integral), the potential $\varphi$ reads
$$\varphi(\mathbf{r}_l) \approx \frac{1}{4\pi\varepsilon_0} \sum_{l'=(1,1,1)}^{(N_x,N_y,N_z)} \rho(\mathbf{r}_{l'})\, \tilde{G}(\mathbf{r}_l,\mathbf{r}_{l'}), \qquad (2)$$
where $l = (i,j,k)$, $l' = (i',j',k')$ and $\tilde{G}(\mathbf{r}_l,\mathbf{r}_{l'}) = h_x h_y h_z\, G(\mathbf{r}_l,\mathbf{r}_{l'})$.

5 Integrated Green's function (IGF) method

In
$$\varphi(\mathbf{r}_l) \approx \frac{1}{4\pi\varepsilon_0} \sum_{l'=(1,1,1)}^{(N_x,N_y,N_z)} \rho(\mathbf{r}_{l'})\, \tilde{G}(\mathbf{r}_l,\mathbf{r}_{l'}), \qquad (2)$$
the quantity $\tilde{G}(\mathbf{r}_l,\mathbf{r}_{l'})$ (also written $\tilde{G}(\mathbf{r}_l - \mathbf{r}_{l'}, 0)$) can be calculated by the IGF integral [1] as
$$\tilde{G}_{IGF}(\mathbf{r}_l,\mathbf{r}_{l'}) = \int_{\mathbf{r}_{l'}-\mathbf{h}/2}^{\mathbf{r}_{l'}+\mathbf{h}/2} G(\mathbf{r}_l,\mathbf{r}')\,dx'\,dy'\,dz', \qquad (3)$$
where $\mathbf{h} = (h_x,h_y,h_z)$ denotes the discrete step sizes. The discrete integrated Green's function (IGF) formula is given by
$$\mathrm{IGF}(\mathbf{r},0) = \iiint G(\mathbf{r},0)\,dx\,dy\,dz. \qquad (4)$$

[1] J. Qiang, S. Lidia, R. D. Ryne, and C. Limborg-Deprey, Phys. Rev. ST Accel. Beams, vol. 9 (2006), and vol. 10 (2007).

6 Integrated Green's function

$$\begin{aligned}
\mathrm{IGF}(\mathbf{r},0) = {} & -\frac{z^2}{2}\arctan\frac{xy}{z\sqrt{x^2+y^2+z^2}} - \frac{y^2}{2}\arctan\frac{xz}{y\sqrt{x^2+y^2+z^2}} - \frac{x^2}{2}\arctan\frac{yz}{x\sqrt{x^2+y^2+z^2}} \\
& + yz\ln\left(x+\sqrt{x^2+y^2+z^2}\right) + xz\ln\left(y+\sqrt{x^2+y^2+z^2}\right) + xy\ln\left(z+\sqrt{x^2+y^2+z^2}\right) \qquad (4)
\end{aligned}$$

Here, $\tilde{G}(\mathbf{r}_l - \mathbf{r}_{l'},0)$ is obtained by the summation of the IGF values over the cell $[(\mathbf{r}_l - \mathbf{r}_{l'}) - \mathbf{h}/2,\ (\mathbf{r}_l - \mathbf{r}_{l'}) + \mathbf{h}/2]$.
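As an illustration of the cell summation, the following Python sketch evaluates the closed-form antiderivative at the eight cell corners and combines them with alternating signs, which is how a definite triple integral follows from its antiderivative. The names `igf_antiderivative` and `igf_cell` are hypothetical. The sketch assumes that no corner has a zero coordinate, which holds when cell centres sit at integer multiples of the step size.

```python
import math
from itertools import product


def igf_antiderivative(x, y, z):
    """Indefinite triple integral of 1/sqrt(x^2 + y^2 + z^2), Eq. (4)."""
    r = math.sqrt(x * x + y * y + z * z)
    return (-0.5 * z * z * math.atan(x * y / (z * r))
            - 0.5 * y * y * math.atan(x * z / (y * r))
            - 0.5 * x * x * math.atan(y * z / (x * r))
            + y * z * math.log(x + r)
            + x * z * math.log(y + r)
            + x * y * math.log(z + r))


def igf_cell(center, h):
    """Integral of 1/r over the cell center +/- h/2 (signed sum over 8 corners)."""
    total = 0.0
    for signs in product((-1, 1), repeat=3):
        corner = [c + s * 0.5 * step for c, s, step in zip(center, signs, h)]
        # +1 when an even number of lower limits is used, -1 otherwise.
        parity = signs[0] * signs[1] * signs[2]
        total += parity * igf_antiderivative(*corner)
    return total
```

Far from the origin the result approaches the midpoint value $h_x h_y h_z / r$; for a unit cell centred at (10, 0, 0) it is very close to 0.1, while at the origin the IGF stays finite where the plain GF is singular.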

7 Hockney and Eastwood's routine of 3D convolution [2]

Using the 3D discrete Fourier transform and the convolution theorem, the potential can be expressed as
$$[\varphi_{ex}]_{i,j,k} = \frac{1}{4\pi\varepsilon_0}\,\mathrm{FFT}^{-1}\left\{[\mathrm{FFT}\,\tilde{G}_{ex}]_{i,j,k}\,[\mathrm{FFT}\,\rho_{ex}]_{i,j,k}\right\}_{2N_x,2N_y,2N_z}. \qquad (5)$$

- $\tilde{G}_{ex}(2N_x,2N_y,2N_z)$ is obtained by the real even symmetric extension of $\tilde{G}(N_x,N_y,N_z)$.
- Take the 3D FFT of $\tilde{G}_{ex}$ to obtain $\mathrm{FFT}(\tilde{G}_{ex})$.

[2] R. W. Hockney and J. W. Eastwood, Computer Simulation Using Particles. Institute of Physics Publishing, Bristol (1992).

8 Extend $\rho(N_x,N_y,N_z)$ by padding zeros to get $\rho_{ex}(2N_x,2N_y,2N_z)$ and apply a pruned 3D FFT to obtain $\mathrm{FFT}(\rho_{ex})$.

Figure: The pruned 3D FFT, applied in turn as FFT in x, FFT in y, and FFT in z.

- Multiply $\mathrm{FFT}(\tilde{G}_{ex})$ and $\mathrm{FFT}(\rho_{ex})$ element by element at corresponding indices.
- IFFT: take the inverse pruned 3D FFT of the product to obtain $\varphi_{ex}$.
- $\varphi$: the first $(N_x,N_y,N_z)$ values of $\varphi_{ex}$.
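The steps of this routine can be sketched in one dimension. The Python example below is a minimal illustration, not the authors' implementation: it uses a naive O(n²) DFT from the standard library instead of an FFT, omits the prefactor $1/(4\pi\varepsilon_0)$, and the names `dft` and `hockney_eastwood_1d` are hypothetical.

```python
import cmath


def dft(x, sign=-1):
    """Naive O(n^2) DFT; sign=-1 is forward, sign=+1 is the unnormalised inverse."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def hockney_eastwood_1d(rho, green):
    """Open-boundary convolution phi[i] = sum_m rho[m] * green[|i - m|].

    rho has n values; green has n + 1 values for separations 0..n.
    """
    n = len(rho)
    g_ex = list(green) + list(green[-2:0:-1])   # real even symmetric extension, length 2n
    rho_ex = list(rho) + [0.0] * n              # explicit zero padding, length 2n
    spectrum = [a * b for a, b in zip(dft(g_ex), dft(rho_ex))]
    phi_ex = [v / (2 * n) for v in dft(spectrum, sign=+1)]
    return [v.real for v in phi_ex[:n]]         # keep the first n values
```

The zero padding prevents the circular convolution from wrapping around, so the first n values agree with the direct open-boundary sum.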

9 Improvement of the pruned Fourier transform

10 1D fftpadbackward and fftpadforward transform [3]

Define the unit root $\omega_m$ with $\omega_m^m = 1$, and write the pseudo-code in Fortran style:

Algorithm 1 fftpadbackward
Input: f
Output: f, u
  function FFTPADBACKWARD
    for k = 1 → n do
      u(k) ← ω_{2n}^{k-1} f(k);
    end for
    f(:) ← IFFT[f(:)];
    u(:) ← IFFT[u(:)];
  end function

Algorithm 2 fftpadforward
Input: f, u
Output: f
  function FFTPADFORWARD
    f(:) ← FFT[f(:)];
    u(:) ← FFT[u(:)];
    for k = 1 → n do
      f(k) ← f(k) + ω_{2n}^{-k+1} u(k);
    end for
  end function

[3] Bowman, John C., and Malcolm Roberts. Efficient Dealiased Convolutions without Padding. SIAM Journal on Scientific Computing 33, 1 (2011).

11 Explicitly and implicitly zero-padded FFT

Define $f_{ex} = \begin{bmatrix} f \\ 0 \end{bmatrix}_{2n}$ and $[f, u]_n = \mathrm{fftpadbackward}_n(f)$; then
$$[\mathrm{IFFT}_{2n}(f_{ex})_{odd},\ \mathrm{IFFT}_{2n}(f_{ex})_{even}] = [f, u],$$
where odd and even refer to the 1-based (Fortran) indices.

Compared with the explicitly zero-padded FFT, fftpadbackward is obtained without the last bit-reversal stage (address change). The fftpadforward transform is obtained by the inverse procedure. The 2D and 3D fftpadbackward (fftpadforward) follow the same procedure as the 2D and 3D IFFT (FFT): the 1D transform is executed along each direction in turn.
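The identity between the explicitly and the implicitly padded transform can be checked with a small Python sketch. It is an illustration under stated assumptions: a naive DFT stands in for the FFT, indices are 0-based (so the 0-based even samples correspond to the 1-based odd ones), and the 1/(2n) normalisation in `fftpad_forward` is a choice made here so that the pair round-trips. The function names are hypothetical.

```python
import cmath


def dft(x, sign):
    """Naive unnormalised DFT; sign=-1 is forward (FFT), sign=+1 is backward (IFFT)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def fftpad_backward(f):
    """Return the even- and odd-indexed samples of the length-2n backward transform
    of f padded with n zeros, using only length-n transforms."""
    n = len(f)
    u = [cmath.exp(2j * cmath.pi * k / (2 * n)) * f[k] for k in range(n)]
    return dft(f, +1), dft(u, +1)


def fftpad_forward(f, u):
    """Inverse procedure: recombine even/odd samples into the first n output values."""
    n = len(f)
    f_hat, u_hat = dft(f, -1), dft(u, -1)
    return [(f_hat[k] + cmath.exp(-2j * cmath.pi * k / (2 * n)) * u_hat[k]) / (2 * n)
            for k in range(n)]
```

Only two length-n transforms are needed per direction, and the zeros are never stored.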

12 Pad zeros in i direction

$$\mathrm{Ex}\rho_{i,j,k} = \begin{bmatrix} \rho \\ 0 \end{bmatrix}_{2N_i}$$

Figure: The 3D backward modified FFT routine.

13 1D real to complex FFT along i and ijk2jik transpose

1D R2C FFT along i and ijk2jik transpose: $\mathrm{Ex}\rho_{i,j,k} \rightarrow C\rho_{j,i,k}$.
Complex vectors: $N_i(C\rho) := N_i(\rho) + 1$.

Figure: The 3D backward modified FFT routine.

14 1D complex FFT along j and jik2kij transpose

1D fftpadbackward along j and jik2kij transpose: $C\rho_{j,i,k} \rightarrow C\rho_{k,i,je},\ C\rho_{k,i,jo}$.

Figure: The 3D backward modified FFT routine.

15 1D complex FFT along k for $C\rho_{k,i,je}$

1D fftpadbackward along k: $C\rho_{k,i,je} \rightarrow C\rho_{ke,i,je},\ C\rho_{ko,i,je}$.

Figure: The 3D backward modified FFT routine.

16 1D complex FFT along k for $C\rho_{k,i,jo}$

1D fftpadbackward along k: $C\rho_{k,i,jo} \rightarrow C\rho_{ke,i,jo},\ C\rho_{ko,i,jo}$.

Complete flow of the routine:
- Pad zeros in the i direction: $\mathrm{Ex}\rho_{i,j,k} = \begin{bmatrix} \rho \\ 0 \end{bmatrix}_{2N_i}$.
- 1D R2C FFT along i and ijk2jik transpose: $\mathrm{Ex}\rho_{i,j,k} \rightarrow C\rho_{j,i,k}$ (complex vectors, $N_i(C\rho) := N_i(\rho) + 1$).
- 1D fftpadbackward along j and jik2kij transpose: $C\rho_{j,i,k} \rightarrow C\rho_{k,i,je},\ C\rho_{k,i,jo}$.
- 1D fftpadbackward along k: $C\rho_{k,i,je} \rightarrow C\rho_{ke,i,je},\ C\rho_{ko,i,je}$ and $C\rho_{k,i,jo} \rightarrow C\rho_{ke,i,jo},\ C\rho_{ko,i,jo}$.

Figure: The 3D backward modified FFT routine.

17 Improvement of the IGF calculation and Fourier transform for the extended GF

18 Improvement of the IGF calculation: using the reduced integrated Green's function (RIGF)

Figure: The local relative error of the GF integral.

The RIGF is defined as
$$\tilde{G}_{RIGF}(\mathbf{r}_{(i,j,k)},0) = \begin{cases} \tilde{G}_{IGF}(\mathbf{r}_{(i,j,k)},0), & 1 \le w \le R_w \quad \text{(IGF portion)} \\ h_x h_y h_z\, G(\mathbf{r}_{(i,j,k)},0), & R_w < w \le N_w + 1 \quad \text{(GF portion)} \end{cases}$$
where $w$ stands for the index along x, y, and z respectively, and $R_w$ is determined by an adaptive method.

19 Adaptive accuracy control (z is the longitudinal direction for long bunches)

- Calculate $\tilde{G}_{IGF}(\mathbf{r}_{(1,1,k)},0)$ and $\tilde{G}_{GF}(\mathbf{r}_{(1,1,k)},0)$, and record $\delta\tilde{G}(\mathbf{r}_{(1,1,k)},0) = \tilde{G}_{IGF}(\mathbf{r}_{(1,1,k)},0) - \tilde{G}_{GF}(\mathbf{r}_{(1,1,k)},0)$.
- Locate the stable area of the IGF: find the first $k$ for which the magnitude of $(\tilde{G}_{IGF,k-1} - \tilde{G}_{IGF,k}) / \tilde{G}_{IGF,k-1}$ drops below $1/\log_2 N_z$. Record this $k$ as $k_{stable}$.
- Locate the accuracy tolerance: the proper $R_z$ is the first $k$ at which the magnitude of $\delta\tilde{G}_k / \delta\tilde{G}_{k_{stable}}$ drops to $10^{-s}$, $s \ge 0$.

20 Figure: The $\delta\tilde{G}_{R_z+1}$ (left) and the error $\hat{\eta}_\varphi$ with various $R_z$ (right). Axis labels: $(\tilde{G}_{IGF,k-1} - \tilde{G}_{IGF,k}) / \tilde{G}_{IGF,k-1}$ versus $k$ (left); $\hat{\eta}_\varphi$ for the RIGF method versus $R_z$ (right).

$s$ is chosen as 1 or 2. When higher accuracy is preferred, a larger $s$ can be chosen analogously for the switching point (not necessary).
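The adaptive selection of $R_z$ can be sketched in Python as follows. This is one possible reading of the procedure, not the authors' code: it assumes that the 1-based index $k$ corresponds to a separation of $k-1$ cells along z, skips the singular GF value at zero separation, and falls back to the last index when a threshold is never reached. The names `igf_cell` and `choose_r_z` are hypothetical.

```python
import math
from itertools import product


def igf_antiderivative(x, y, z):
    """Indefinite triple integral of 1/r, Eq. (4)."""
    r = math.sqrt(x * x + y * y + z * z)
    return (-0.5 * z * z * math.atan(x * y / (z * r))
            - 0.5 * y * y * math.atan(x * z / (y * r))
            - 0.5 * x * x * math.atan(y * z / (x * r))
            + y * z * math.log(x + r) + x * z * math.log(y + r) + x * y * math.log(z + r))


def igf_cell(center, h):
    """Integral of 1/r over the cell center +/- h/2 (signed sum over 8 corners)."""
    total = 0.0
    for signs in product((-1, 1), repeat=3):
        corner = [c + s * 0.5 * step for c, s, step in zip(center, signs, h)]
        total += signs[0] * signs[1] * signs[2] * igf_antiderivative(*corner)
    return total


def choose_r_z(n_z, h, s=1):
    """Return (k_stable, R_z) as 1-based indices along z for i = j = 1."""
    hx, hy, hz = h
    # Index d = k - 1 is the separation in cells along z (assumption of this sketch).
    igf = [igf_cell((0.0, 0.0, d * hz), h) for d in range(n_z)]
    gf = [None] + [hx * hy * hz / (d * hz) for d in range(1, n_z)]  # singular at d = 0
    threshold = 1.0 / math.log2(n_z)
    d_stable = next((d for d in range(1, n_z)
                     if abs((igf[d - 1] - igf[d]) / igf[d - 1]) < threshold), n_z - 1)
    ref = igf[d_stable] - gf[d_stable]
    d_switch = next((d for d in range(d_stable, n_z)
                     if abs((igf[d] - gf[d]) / ref) <= 10.0 ** (-s)), n_z - 1)
    return d_stable + 1, d_switch + 1
```

A call such as `choose_r_z(100, (1.0, 1.0, 1.0), s=1)` returns the two indices; a larger `s` pushes the switching index further out, so more cells use the IGF.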

21 Substitution of Fourier transform for GF values

Theorem. Define $g = \{g_1, g_2, \ldots, g_{n+1}\}$, and let $g_{ex}$ be the real even symmetric extension (RESE) of $g$, i.e. $g_{ex} = \{g_1, g_2, \ldots, g_{n+1}, g_n, g_{n-1}, \ldots, g_2\}_{2n}$. Then
$$\mathrm{FFT}_{2n}(\mathrm{RESE}(g)) = 2\,\mathrm{RESE}(\mathrm{DCT}_{n+1}(g)).$$

$\mathrm{FFT}(\tilde{G}_{ex})$ can be expressed implicitly by $\mathrm{DCT}(\tilde{G})$ and its RESE. Apart from the RESE, the computational complexity reduces significantly, from $O(2n \log_2 2n)$ to $O(n \log_2 n)$.
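The theorem can be verified numerically with a short Python sketch. It assumes the DCT-I convention written in the docstring of `dct1`, for which the factor 2 holds; library DCT routines may differ from this by a constant factor. The helper names are hypothetical and a naive DFT replaces the FFT.

```python
import cmath
import math


def dft(x):
    """Naive forward DFT (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def dct1(g):
    """DCT-I of n + 1 values:
    C[k] = (g[0] + (-1)^k g[n]) / 2 + sum_{m=1}^{n-1} g[m] cos(pi m k / n)."""
    n = len(g) - 1
    return [0.5 * (g[0] + (-1) ** k * g[n])
            + sum(g[m] * math.cos(math.pi * m * k / n) for m in range(1, n))
            for k in range(n + 1)]


def rese(g):
    """Real even symmetric extension: n + 1 values become 2n values."""
    return list(g) + list(g[-2:0:-1])
```

For any real `g`, `dft(rese(g))` and `[2 * v for v in rese(dct1(g))]` agree to rounding error, so the length-2n transform of the extended array never has to be formed explicitly.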

22 Substitution of 3D FFT for GFs: 3D DCT

For 3D: $\mathrm{FFT}(\mathrm{RESE}(\tilde{G})) = 8\,\mathrm{RESE}(\mathrm{DCT}(\tilde{G}))$.

Expressing this implicitly: define $A(w) = \mathrm{FFT}(\mathrm{RESE}(\tilde{G}))(w)$, where $w = \{i,j,k\}$, $w = 1,2,\ldots,2N_w$, and $B(W) = 8\,\mathrm{DCT}(\tilde{G})(W)$, where $W = \{I,J,K\}$, $W = 1,2,\ldots,N_w+1$. Then
$$A(w) = B(W), \quad \text{when } W = \begin{cases} w, & w \in [1, N_w+1]; \\ 2N_w - w + 2, & w \in [N_w+2, 2N_w]. \end{cases}$$

Theorem. The inverse transform of the DCT is the same as the DCT itself, apart from a constant factor. Therefore, the IFFT and the FFT of the real even symmetric array $\tilde{G}_{ex}$ are the same up to scaling.
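The index mapping and the self-inverse property can be illustrated in one dimension with the Python sketch below. It assumes the same DCT-I convention as written in the `dct1` docstring, for which applying the transform twice returns the input scaled by n/2; the names `dct1` and `dct_index` are hypothetical.

```python
import math


def dct1(g):
    """DCT-I of n + 1 values:
    C[k] = (g[0] + (-1)^k g[n]) / 2 + sum_{m=1}^{n-1} g[m] cos(pi m k / n)."""
    n = len(g) - 1
    return [0.5 * (g[0] + (-1) ** k * g[n])
            + sum(g[m] * math.cos(math.pi * m * k / n) for m in range(1, n))
            for k in range(n + 1)]


def dct_index(w, n_w):
    """1-based index W into the DCT array for the 1-based FFT index w in 1..2*n_w."""
    return w if w <= n_w + 1 else 2 * n_w - w + 2
```

With this mapping, the value at any index of the extended spectrum is read directly from the DCT array of length $N_w + 1$, so the extended array does not need to be stored.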

23 The RIGF calculation in parallel

On each processor: $\tilde{G}_{k,j,i} = \mathrm{RIGF}(z, y, x)$.

Figure: The RIGF calculation and the related 3D DCT.

24 The subsequent 1D DCT along k and kji2ijk transpose

1D DCT along k and kji2ijk transpose: $\tilde{G}_{k,j,i} \rightarrow \tilde{G}_{i,j,k}$.

Figure: The RIGF calculation and the related 3D DCT.

25 The 1D DCT along i and ijk2jik transpose

1D DCT along i and ijk2jik transpose: $\tilde{G}_{i,j,k} \rightarrow \tilde{G}_{j,i,k}$.

Figure: The RIGF calculation and the related 3D DCT.

26 The 1D DCT along j and extension in j direction with odd/even indices

1D DCT along j and extension of j with odd and even indices: $\tilde{G}_{j,i,k} \rightarrow \tilde{G}_{je,i,k},\ \tilde{G}_{jo,i,k}$.

Figure: The RIGF calculation and the related 3D DCT.

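27 The jik2kij transpose for $\tilde{G}_{k,i,je}$ and $\tilde{G}_{k,i,jo}$

jik2kij transpose: $\tilde{G}_{je,i,k} \rightarrow \tilde{G}_{k,i,je}$ and $\tilde{G}_{jo,i,k} \rightarrow \tilde{G}_{k,i,jo}$.

Complete flow on each processor:
- $\tilde{G}_{k,j,i} = \mathrm{RIGF}(z, y, x)$.
- 1D DCT along k and kji2ijk transpose: $\tilde{G}_{k,j,i} \rightarrow \tilde{G}_{i,j,k}$.
- 1D DCT along i and ijk2jik transpose: $\tilde{G}_{i,j,k} \rightarrow \tilde{G}_{j,i,k}$.
- 1D DCT along j and extension of j with odd and even indices: $\tilde{G}_{j,i,k} \rightarrow \tilde{G}_{je,i,k},\ \tilde{G}_{jo,i,k}$.
- jik2kij transpose: $\tilde{G}_{je,i,k} \rightarrow \tilde{G}_{k,i,je}$ and $\tilde{G}_{jo,i,k} \rightarrow \tilde{G}_{k,i,jo}$.

Figure: The Green's function calculation and the related 3D DCT.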

28 Multiplication in spectral domain

The products are formed in place:
$$C\rho_{ke,i,jx} = \mathrm{RESE}(\tilde{G}_{k,i,jx})_{even}\; C\rho_{ke,i,jx}, \qquad C\rho_{ko,i,jx} = \mathrm{RESE}(\tilde{G}_{k,i,jx})_{odd}\; C\rho_{ko,i,jx},$$
where $x$ stands for e (even) or o (odd).

Figure: $\tilde{G}_{k,i,je}$ multiplies $C\rho_{ke,i,je}$ and $C\rho_{ko,i,je}$; $\tilde{G}_{k,i,jo}$ multiplies $C\rho_{ke,i,jo}$ and $C\rho_{ko,i,jo}$.

29 The 3D forward modified FFT routine

- 1D fftpadforward along k: $C\rho_{ke,i,je},\ C\rho_{ko,i,je} \rightarrow C\rho_{k,i,je}$ and $C\rho_{ke,i,jo},\ C\rho_{ko,i,jo} \rightarrow C\rho_{k,i,jo}$.
- kij2jik transpose and 1D fftpadforward along j: $C\rho_{k,i,je},\ C\rho_{k,i,jo} \rightarrow C\rho_{j,i,k}$ (complex vectors, $N_i(C\rho) := N_i(\rho) + 1$).
- jik2ijk transpose and 1D C2R FFT along i: $C\rho_{j,i,k} \rightarrow \mathrm{Ex}\rho_{i,j,k}$ (real vectors).

Figure: The 3D forward modified FFT routine.
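Putting the pieces together, the following Python sketch shows a one-dimensional analogue of the whole routine: a DCT for the Green's function spectrum, an implicitly padded backward transform of the charge density, the even/odd multiplication, and the implicitly padded forward transform. It is an illustration under stated assumptions, not the authors' parallel code: a naive DFT replaces the FFT, the DCT-I convention is the one in the docstring, the prefactor $1/(4\pi\varepsilon_0)$ is omitted, and all function names are hypothetical.

```python
import cmath
import math


def dft(x, sign):
    """Naive unnormalised DFT; sign=-1 is forward, sign=+1 is backward."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def dct1(g):
    """DCT-I: C[k] = (g[0] + (-1)^k g[n]) / 2 + sum_{m=1}^{n-1} g[m] cos(pi m k / n)."""
    n = len(g) - 1
    return [0.5 * (g[0] + (-1) ** k * g[n])
            + sum(g[m] * math.cos(math.pi * m * k / n) for m in range(1, n))
            for k in range(n + 1)]


def implicit_convolve_1d(rho, green):
    """phi[i] = sum_m rho[m] * green[|i - m|] without explicit padding or extension."""
    n = len(rho)
    # Spectrum of the extended Green's function from an (n + 1)-point DCT-I.
    d = [2.0 * v for v in dct1(green)]
    g_hat = [d[j] if j <= n else d[2 * n - j] for j in range(2 * n)]
    # fftpadbackward: even/odd samples of the padded backward transform of rho.
    twiddle = [cmath.exp(2j * cmath.pi * k / (2 * n)) for k in range(n)]
    even = dft(rho, +1)
    odd = dft([t * r for t, r in zip(twiddle, rho)], +1)
    # Multiply with the matching even/odd samples of the Green's function spectrum.
    even = [g_hat[2 * l] * even[l] for l in range(n)]
    odd = [g_hat[2 * l + 1] * odd[l] for l in range(n)]
    # fftpadforward: recombine into the first n values of the convolution.
    f_hat, u_hat = dft(even, -1), dft(odd, -1)
    return [((f_hat[k] + twiddle[k].conjugate() * u_hat[k]) / (2 * n)).real
            for k in range(n)]
```

Because the spectrum of the extended Green's function is real and even, the backward transform of the density can be paired with it directly, which is the point of the theorem on the FFT and IFFT of real even symmetric arrays.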

30 Study of the improved parallel Poisson solver

31 Comparisons with the commonly used Poisson solver: electric field accuracy

Example 1: An ideal uniform ellipsoidal bunch with aspect ratio of 30.

Figure: Comparison of the relative error of the electric field. Panels: (a) IGF at $(:, N_y/2, :)$; (b) RIGF at $(:, N_y/2, :)$; (c) IGF at $(:, :, N_z/2)$; (d) RIGF at $(:, :, N_z/2)$.

32 Bunch RMS size during tracking in IMPACT-T [4]

Example 2: An extremely long bunch, $R_x = 0.15$ m, $R_y = 0.15$ m, $L_z = 1.0\mathrm{e}5$ m, at the initial sections of a virtual light source accelerator. $dt = 1.0\mathrm{e}{-12}$ s, $t_{step} = 20{,}000{,}000$, $N_{x,y,z} = 64$.

Figure: A schematic comparison of the RMS size in x, y, z direction (left to right).

[4] IMPACT-T: A 3D Parallel Particle Tracking Code in Time Domain, jiqiang/impact-t/index.html.

33 Normalized RMS emittance of the bunch during tracking in IMPACT-T

Figure: A schematic comparison of normalized RMS emittance in x, y, z direction (left to right).

34 Speed-up study with the commonly used Poisson solver (serial routine, Example 1)

Table: Comparison of the elapsed time (in seconds) of different GF methods with increasing grid resolution. Columns: $N_w + 1$; MGCG; IGF + (updated) H&E; RIGF + implicitly zero-padded FFT. (Timing values are not recoverable from the transcription.)

The program is implemented in C, compiled with gcc, and run on a 2.60 GHz Intel Core CPU.
MGCG: multigrid conjugate gradient method. H&E: Hockney and Eastwood's method.

35 Speed-up study with the commonly used Poisson solver in IMPACT-T code (parallel routine)

Table: Comparison between IGF and RIGF. Columns: $N_w$; $N_p$; $t_{RIGF}$; $t_{IGF}$ (times in seconds). (Timing values are not recoverable from the transcription.)

The parallel Poisson solver is written in Fortran 90 with Open MPI, compiled with the Intel compiler, and executed on the Edison supercomputer at NERSC. $t_{step} = 20{,}000$.

36 Weak scaling study

Weak scaling: keep the "problem-process size ratio" $N_x N_y N_z / N_p$ constant.

Figure: Time (s) versus number of cores. The "size-process ratios" are $16^3$ (left), $32^3$ (middle), and $64^3$ (right); $N_p = 1, 2, 4, \ldots$

37 Strong scaling study

Strong scaling: the problem size $N_x N_y N_z$ is fixed while $N_p$ increases.

Figure: Time (s) versus number of cores. $N_w$ starts at 64 (left), 128 (middle) and 256 (right); $N_p = 1, 2, 4, \ldots$
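The quantities usually read off such scaling plots can be computed as in the Python sketch below. The definitions are the standard ones (speed-up and parallel efficiency for strong scaling, time ratio for weak scaling); the function names are hypothetical, and any timings passed in are placeholders supplied by the user, not results from this study.

```python
def strong_scaling(times):
    """times: {cores: seconds} for a fixed problem size.

    Returns {cores: (speed_up, efficiency)} relative to the smallest core count.
    """
    base_cores = min(times)
    base = times[base_cores]
    return {p: (base / t, (base / t) * base_cores / p)
            for p, t in sorted(times.items())}


def weak_scaling_efficiency(times):
    """times: {cores: seconds} with the problem size per process held constant.

    Ideal weak scaling keeps the time constant, i.e. an efficiency of 1.
    """
    base = times[min(times)]
    return {p: base / t for p, t in sorted(times.items())}
```

For example, with the made-up timings `{1: 8.0, 2: 4.0, 4: 2.5}` the strong-scaling efficiency on 4 cores is 0.8.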

38 The Open MPI+OpenMP routine

- OpenMP (Open Multi-Processing) is a shared-memory multiprocessing API (application programming interface) that works together with the MPI processes within the same compute node.
- Setting up OpenMP through the OpenMP library adds to the computing cost.
- The OpenMP parallel regions are mostly "do-loops".
- SIMD (single instruction, multiple data) has not been further optimized to date.

39 Figure: CPU time for different numbers of OpenMP threads per process. Two panels, Time (s) versus Threads, each comparing the numa_node and depth settings.

For the Intel compiler: -cc numa_node (NUMA: non-uniform memory access) and -cc depth are two important options for thread affinity in compiling.

40 Conclusion and outlook

- We reviewed the GF and IGF methods.
- The improved parallel Poisson solver routine gains efficiency in three respects: using the RIGF, using the discrete cosine transform for the RIGF, and using a novel fast implicitly zero-padded convolution routine.
- Study of the improved parallel Poisson solver: comparisons with the commonly used Poisson solver; strong scaling and weak scaling; MPI+OpenMP routine.
- Further HPC routine update with the MIC (Many Integrated Core) architecture (Open MPI+OpenMP).
- Limitation of the data transpose with MPI_Alltoall.


AN FFT-BASED APPROACH IN ACCELERATION OF DISCRETE GREEN S FUNCTION METHOD FOR AN- TENNA ANALYSIS Progress In Electromagnetics Research M, Vol. 29, 17 28, 2013 AN FFT-BASED APPROACH IN ACCELERATION OF DISCRETE GREEN S FUNCTION METHOD FOR AN- TENNA ANALYSIS Salma Mirhadi *, Mohammad Soleimani, and Ali

More information

Introduction to Practical FFT and NFFT

Introduction to Practical FFT and NFFT Introduction to Practical FFT and NFFT Michael Pippig and Daniel Potts Department of Mathematics Chemnitz University of Technology September 14, 211 supported by BMBF grant 1IH81B Table of Contents 1 Serial

More information

ORBIT Code Review and Future Directions. S. Cousineau, A. Shishlo, J. Holmes ECloud07

ORBIT Code Review and Future Directions. S. Cousineau, A. Shishlo, J. Holmes ECloud07 ORBIT Code Review and Future Directions S. Cousineau, A. Shishlo, J. Holmes ECloud07 ORBIT Code ORBIT (Objective Ring Beam Injection and Transport code) ORBIT is an object-oriented, open-source code started

More information

16. Solution of elliptic partial differential equation

16. Solution of elliptic partial differential equation 16. Solution of elliptic partial differential equation Recall in the first lecture of this course. Assume you know how to use a computer to compute; but have not done any serious numerical computations

More information

Novel Simulation Methods in the code-framework Warp

Novel Simulation Methods in the code-framework Warp Novel Simulation Methods in the code-framework Warp J.-L. Vay1,3, D.P. Grote2,3, R.H. Cohen2,3, A. Friedman2,3, S.M. Lund2,3, E. Cormier-Michel1, W.M. Fawley1, M.A. Furman1, C.G.R. Geddes1 1Lawrence Berkeley

More information

Hybrid parallelization of a pseudo-spectral DNS code and its computational performance on RZG s idataplex system Hydra

Hybrid parallelization of a pseudo-spectral DNS code and its computational performance on RZG s idataplex system Hydra Hybrid parallelization of a pseudo-spectral DNS code and its computational performance on RZG s idataplex system Hydra Markus Rampp 1, Liang Shi 2, Marc Avila 3,2, Björn Hof 2,4 1 Computing Center of the

More information

Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29

Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29 Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29 Outline A few words on MD applications and the GROMACS package The main work in an MD simulation Parallelization Stream computing

More information

Chain Rule. MATH 311, Calculus III. J. Robert Buchanan. Spring Department of Mathematics

Chain Rule. MATH 311, Calculus III. J. Robert Buchanan. Spring Department of Mathematics 3.33pt Chain Rule MATH 311, Calculus III J. Robert Buchanan Department of Mathematics Spring 2019 Single Variable Chain Rule Suppose y = g(x) and z = f (y) then dz dx = d (f (g(x))) dx = f (g(x))g (x)

More information

FAS and Solver Performance

FAS and Solver Performance FAS and Solver Performance Matthew Knepley Mathematics and Computer Science Division Argonne National Laboratory Fall AMS Central Section Meeting Chicago, IL Oct 05 06, 2007 M. Knepley (ANL) FAS AMS 07

More information

Introduction to numerical computations on the GPU

Introduction to numerical computations on the GPU Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming

More information

Curvilinear Coordinates

Curvilinear Coordinates University of Alabama Department of Physics and Astronomy PH 106-4 / LeClair Fall 2008 Curvilinear Coordinates Note that we use the convention that the cartesian unit vectors are ˆx, ŷ, and ẑ, rather than

More information

Mass-conserving and positive-definite semi-lagrangian advection in NCEP GFS: Decomposition for massively parallel computing without halo

Mass-conserving and positive-definite semi-lagrangian advection in NCEP GFS: Decomposition for massively parallel computing without halo Mass-conserving and positive-definite semi-lagrangian advection in NCEP GFS: Decomposition for massively parallel computing without halo Hann-Ming Henry Juang Environmental Modeling Center, NCEP NWS,NOAA,DOC,USA

More information

The Fast Multipole Method in molecular dynamics

The Fast Multipole Method in molecular dynamics The Fast Multipole Method in molecular dynamics Berk Hess KTH Royal Institute of Technology, Stockholm, Sweden ADAC6 workshop Zurich, 20-06-2018 Slide BioExcel Slide Molecular Dynamics of biomolecules

More information

Beam dynamics calculation

Beam dynamics calculation September 6 Beam dynamics calculation S.B. Vorozhtsov, Е.Е. Perepelkin and V.L. Smirnov Dubna, JINR http://parallel-compute.com Outline Problem formulation Numerical methods OpenMP and CUDA realization

More information

Turbulence in geodynamo simulations

Turbulence in geodynamo simulations Turbulence in geodynamo simulations Nathanaël Schaeffer, A. Fournier, D. Jault, J. Aubert, H-C. Nataf,... ISTerre / CNRS / Université Grenoble Alpes Journée des utilisateurs CIMENT, Grenoble, 23 June 2016

More information

Interpolation and function representation with grids in stochastic control

Interpolation and function representation with grids in stochastic control with in classical Sparse for Sparse for with in FIME EDF R&D & FiME, Laboratoire de Finance des Marchés de l Energie (www.fime-lab.org) September 2015 Schedule with in classical Sparse for Sparse for 1

More information

Application of Wavelets to N body Particle In Cell Simulations

Application of Wavelets to N body Particle In Cell Simulations Application of Wavelets to N body Particle In Cell Simulations Balša Terzić (NIU) Northern Illinois University, Department of Physics April 7, 2006 Motivation Gaining insight into the dynamics of multi

More information

Find the indicated derivative. 1) Find y(4) if y = 3 sin x. A) y(4) = 3 cos x B) y(4) = 3 sin x C) y(4) = - 3 cos x D) y(4) = - 3 sin x

Find the indicated derivative. 1) Find y(4) if y = 3 sin x. A) y(4) = 3 cos x B) y(4) = 3 sin x C) y(4) = - 3 cos x D) y(4) = - 3 sin x Assignment 5 Name Find the indicated derivative. ) Find y(4) if y = sin x. ) A) y(4) = cos x B) y(4) = sin x y(4) = - cos x y(4) = - sin x ) y = (csc x + cot x)(csc x - cot x) ) A) y = 0 B) y = y = - csc

More information

Domain Decomposition-based contour integration eigenvalue solvers

Domain Decomposition-based contour integration eigenvalue solvers Domain Decomposition-based contour integration eigenvalue solvers Vassilis Kalantzis joint work with Yousef Saad Computer Science and Engineering Department University of Minnesota - Twin Cities, USA SIAM

More information

Beam-Beam Simulation for PEP-II

Beam-Beam Simulation for PEP-II Beam-Beam Simulation for PEP-II Yunhai Cai Accelerator Research Department-A, SLAC e + e Factories 2003, SLAC October 14, 2003 Outline Method of simulation: - Boundary condition - Parallel computing -

More information

PDE Solvers for Fluid Flow

PDE Solvers for Fluid Flow PDE Solvers for Fluid Flow issues and algorithms for the Streaming Supercomputer Eran Guendelman February 5, 2002 Topics Equations for incompressible fluid flow 3 model PDEs: Hyperbolic, Elliptic, Parabolic

More information

(a) The points (3, 1, 2) and ( 1, 3, 4) are the endpoints of a diameter of a sphere.

(a) The points (3, 1, 2) and ( 1, 3, 4) are the endpoints of a diameter of a sphere. MATH 4 FINAL EXAM REVIEW QUESTIONS Problem. a) The points,, ) and,, 4) are the endpoints of a diameter of a sphere. i) Determine the center and radius of the sphere. ii) Find an equation for the sphere.

More information

Optimization of Particle-In-Cell simulations for Vlasov-Poisson system with strong magnetic field

Optimization of Particle-In-Cell simulations for Vlasov-Poisson system with strong magnetic field Optimization of Particle-In-Cell simulations for Vlasov-Poisson system with strong magnetic field Edwin Chacon-Golcher Sever A. Hirstoaga Mathieu Lutz Abstract We study the dynamics of charged particles

More information

Solving PDEs with CUDA Jonathan Cohen

Solving PDEs with CUDA Jonathan Cohen Solving PDEs with CUDA Jonathan Cohen jocohen@nvidia.com NVIDIA Research PDEs (Partial Differential Equations) Big topic Some common strategies Focus on one type of PDE in this talk Poisson Equation Linear

More information

Computation of photoelectron and Auger-electron diffraction III. Evaluation of angle-resolved intensities PAD3

Computation of photoelectron and Auger-electron diffraction III. Evaluation of angle-resolved intensities PAD3 Computer Physics Communications 112 (1998) 911101 Computation of photoelectron and Auger-electron diffraction III. Evaluation of angle-resolved intensities PAD3 X.Chen,G.R.Harp 1,Y.Ueda,D.K.Saldin 2 Department

More information

A simple Concept for the Performance Analysis of Cluster-Computing

A simple Concept for the Performance Analysis of Cluster-Computing A simple Concept for the Performance Analysis of Cluster-Computing H. Kredel 1, S. Richling 2, J.P. Kruse 3, E. Strohmaier 4, H.G. Kruse 1 1 IT-Center, University of Mannheim, Germany 2 IT-Center, University

More information

Speeding up simulations of relativistic systems using an optimal boosted frame

Speeding up simulations of relativistic systems using an optimal boosted frame Speeding up simulations of relativistic systems using an optimal boosted frame J.-L. Vay1,3, W. M. Fawley1, C. G. R. Geddes1, E. Cormier-Michel1, D. P. Grote2,3 1Lawrence Berkeley National Laboratory,

More information

Some notes on efficient computing and setting up high performance computing environments

Some notes on efficient computing and setting up high performance computing environments Some notes on efficient computing and setting up high performance computing environments Andrew O. Finley Department of Forestry, Michigan State University, Lansing, Michigan. April 17, 2017 1 Efficient

More information

Particle in Cell method

Particle in Cell method Particle in Cell method Birdsall and Langdon: Plasma Physics via Computer Simulation Dawson: Particle simulation of plasmas Hockney and Eastwood: Computer Simulations using Particles we start with an electrostatic

More information

Performance Evaluation of Scientific Applications on POWER8

Performance Evaluation of Scientific Applications on POWER8 Performance Evaluation of Scientific Applications on POWER8 2014 Nov 16 Andrew V. Adinetz 1, Paul F. Baumeister 1, Hans Böttiger 3, Thorsten Hater 1, Thilo Maurer 3, Dirk Pleiter 1, Wolfram Schenck 4,

More information

Direct Numerical Simulations of converging-diverging channel flow

Direct Numerical Simulations of converging-diverging channel flow Intro Numerical code Results Conclusion Direct Numerical Simulations of converging-diverging channel flow J.-P. Laval (1), M. Marquillie (1) Jean-Philippe.Laval@univ-lille1.fr (1) Laboratoire de Me canique

More information

arxiv: v2 [math.na] 21 Apr 2014

arxiv: v2 [math.na] 21 Apr 2014 Combustion Theory and Modelling Vol. 00, No. 00, Month 200x, 1 25 RESEARCH ARTICLE High-Order Algorithms for Compressible Reacting Flow with Complex Chemistry arxiv:1309.7327v2 [math.na] 21 Apr 2014 M.

More information

SOLUTIONS, PROBLEM SET 11

SOLUTIONS, PROBLEM SET 11 SOLUTIONS, PROBLEM SET 11 1 In this problem we investigate the Lagrangian formulation of dynamics in a rotating frame. Consider a frame of reference which we will consider to be inertial. Suppose that

More information

Spectral simulations of wall-bounded flows on massively-parallel computers

Spectral simulations of wall-bounded flows on massively-parallel computers Paper 1 1 Spectral simulations of wall-bounded flows on massively-parallel computers By Qiang Li, Philipp Schlatter & Dan S. Henningson Linné Flow Centre, KTH Mechanics SE-100 44 Stockholm, Sweden 1.

More information

ELEG 305: Digital Signal Processing

ELEG 305: Digital Signal Processing ELEG 305: Digital Signal Processing Lecture 18: Applications of FFT Algorithms & Linear Filtering DFT Computation; Implementation of Discrete Time Systems Kenneth E. Barner Department of Electrical and

More information

Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster

Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster Yuta Hirokawa Graduate School of Systems and Information Engineering, University of Tsukuba hirokawa@hpcs.cs.tsukuba.ac.jp

More information

Cyclops Tensor Framework

Cyclops Tensor Framework Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r

More information

Metrics and Curvature

Metrics and Curvature Metrics and Curvature How to measure curvature? Metrics Euclidian/Minkowski Curved spaces General 4 dimensional space Cosmological principle Homogeneity and isotropy: evidence Robertson-Walker metrics

More information

Parallel Sparse Tensor Decompositions using HiCOO Format

Parallel Sparse Tensor Decompositions using HiCOO Format Figure sources: A brief survey of tensors by Berton Earnshaw and NVIDIA Tensor Cores Parallel Sparse Tensor Decompositions using HiCOO Format Jiajia Li, Jee Choi, Richard Vuduc May 8, 8 @ SIAM ALA 8 Outline

More information

Lattice Quantum Chromodynamics on the MIC architectures

Lattice Quantum Chromodynamics on the MIC architectures Lattice Quantum Chromodynamics on the MIC architectures Piotr Korcyl Universität Regensburg Intel MIC Programming Workshop @ LRZ 28 June 2017 Piotr Korcyl Lattice Quantum Chromodynamics on the MIC 1/ 25

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:305:45 CBC C222 Lecture 8 Frequency Analysis 14/02/18 http://www.ee.unlv.edu/~b1morris/ee482/

More information

S12.1 SOLUTIONS TO PROBLEMS 12 (ODD NUMBERS)

S12.1 SOLUTIONS TO PROBLEMS 12 (ODD NUMBERS) OLUTION TO PROBLEM 2 (ODD NUMBER) 2. The electric field is E = φ = 2xi + 2y j and at (2, ) E = 4i + 2j. Thus E = 2 5 and its direction is 2i + j. At ( 3, 2), φ = 6i + 4 j. Thus the direction of most rapid

More information

FYS 3120: Classical Mechanics and Electrodynamics

FYS 3120: Classical Mechanics and Electrodynamics FYS 3120: Classical Mechanics and Electrodynamics Formula Collection Spring semester 2014 1 Analytical Mechanics The Lagrangian L = L(q, q, t), (1) is a function of the generalized coordinates q = {q i

More information

Selected electrostatic problems (Lecture 14)

Selected electrostatic problems (Lecture 14) Selected electrostatic problems (Lecture 14) February 1, 2016 236/441 Lecture outline In this lecture, we demonstrate how to solve several electrostatic problems related to calculation of the fields generated

More information

WRF performance tuning for the Intel Woodcrest Processor

WRF performance tuning for the Intel Woodcrest Processor WRF performance tuning for the Intel Woodcrest Processor A. Semenov, T. Kashevarova, P. Mankevich, D. Shkurko, K. Arturov, N. Panov Intel Corp., pr. ak. Lavrentieva 6/1, Novosibirsk, Russia, 630090 {alexander.l.semenov,tamara.p.kashevarova,pavel.v.mankevich,

More information

Parallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems

Parallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems Parallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems G. Hager HPC Services, Computing Center Erlangen, Germany E. Jeckelmann Theoretical Physics, Univ.

More information

Edwin Chacon-Golcher 1, Sever A. Hirstoaga 2 and Mathieu Lutz 3. Introduction

Edwin Chacon-Golcher 1, Sever A. Hirstoaga 2 and Mathieu Lutz 3. Introduction ESAIM: PROCEEDINGS AND SURVEYS, March 2016, Vol. 53, p. 177-190 M. Campos Pinto and F. Charles, Editors OPTIMIZATION OF PARTICLE-IN-CELL SIMULATIONS FOR VLASOV-POISSON SYSTEM WITH STRONG MAGNETIC FIELD

More information

Electromagnetically Induced Flows in Water

Electromagnetically Induced Flows in Water Electromagnetically Induced Flows in Water Michiel de Reus 8 maart 213 () Electromagnetically Induced Flows 1 / 56 Outline 1 Introduction 2 Maxwell equations Complex Maxwell equations 3 Gaussian sources

More information

R. Glenn Brook, Bilel Hadri*, Vincent C. Betro, Ryan C. Hulguin, and Ryan Braby Cray Users Group 2012 Stuttgart, Germany April 29 May 3, 2012

R. Glenn Brook, Bilel Hadri*, Vincent C. Betro, Ryan C. Hulguin, and Ryan Braby Cray Users Group 2012 Stuttgart, Germany April 29 May 3, 2012 R. Glenn Brook, Bilel Hadri*, Vincent C. Betro, Ryan C. Hulguin, and Ryan Braby Cray Users Group 2012 Stuttgart, Germany April 29 May 3, 2012 * presenting author Contents Overview on AACE Overview on MIC

More information

Convolutional networks. Sebastian Seung

Convolutional networks. Sebastian Seung Convolutional networks Sebastian Seung Convolutional network Neural network with spatial organization every neuron has a location usually on a grid Translation invariance synaptic strength depends on locations

More information

Particle-in-Cell Code BEAMPATH for Beam Dynamics Simulations in Linear Accelerators and Beamlines*

Particle-in-Cell Code BEAMPATH for Beam Dynamics Simulations in Linear Accelerators and Beamlines* SLAC-PUB-1081 5 October 004 Particle-in-Cell Code BEAMPATH for Beam Dynamics Simulations in Linear Accelerators and Beamlines* Yuri K. Batygin Stanford Linear Accelerator Center, Stanford University, Stanford,

More information

Graduate Accelerator Physics. G. A. Krafft Jefferson Lab Old Dominion University Lecture 1

Graduate Accelerator Physics. G. A. Krafft Jefferson Lab Old Dominion University Lecture 1 Graduate Accelerator Physics G. A. Krafft Jefferson Lab Old Dominion University Lecture 1 Course Outline Course Content Introduction to Accelerators and Short Historical Overview Basic Units and Definitions

More information

EOS 250 Geophysical Fields and Fluxes Volume Integrals

EOS 250 Geophysical Fields and Fluxes Volume Integrals EOS 25 Geophysical Fields and Fluxes olume Integrals February 17, 21 Overview These notes cover the following: Quantities defined per unit volume : volume integrals olume Integrals We have seen in the

More information

IAMG Proceedings. The projective method approach in the electromagnetic integral equations solver. presenting author

IAMG Proceedings. The projective method approach in the electromagnetic integral equations solver. presenting author The projective method approach in the electromagnetic integral equations solver M. KRUGLYAKOV 1 AND L. BLOSHANSKAYA 2 1 Lomonsov Moscow State University, Russia m.kruglyakov@gmail.com 2 SUNY New Paltz,

More information

Numerical Modelling in Fortran: day 10. Paul Tackley, 2016

Numerical Modelling in Fortran: day 10. Paul Tackley, 2016 Numerical Modelling in Fortran: day 10 Paul Tackley, 2016 Today s Goals 1. Useful libraries and other software 2. Implicit time stepping 3. Projects: Agree on topic (by final lecture) (No lecture next

More information

The Shallow Water Equations

The Shallow Water Equations If you have not already done so, you are strongly encouraged to read the companion file on the non-divergent barotropic vorticity equation, before proceeding to this shallow water case. We do not repeat

More information

arxiv: v1 [hep-lat] 7 Oct 2010

arxiv: v1 [hep-lat] 7 Oct 2010 arxiv:.486v [hep-lat] 7 Oct 2 Nuno Cardoso CFTP, Instituto Superior Técnico E-mail: nunocardoso@cftp.ist.utl.pt Pedro Bicudo CFTP, Instituto Superior Técnico E-mail: bicudo@ist.utl.pt We discuss the CUDA

More information

2.20 Fall 2018 Math Review

2.20 Fall 2018 Math Review 2.20 Fall 2018 Math Review September 10, 2018 These notes are to help you through the math used in this class. This is just a refresher, so if you never learned one of these topics you should look more

More information