An improved parallel Poisson solver for space charge calculation in beam dynamics simulation
|
|
- Jerome Bell
- 5 years ago
- Views:
Transcription
1 An improved parallel Poisson solver for space charge calculation in beam dynamics simulation DAWEI ZHENG, URSULA VAN RIENEN University of Rostock, Rostock, 18059, Germany JI QIANG LBNL, Berkeley, CA 94720, USA 12th International Computational Accelerator Physics Conference (ICAP 15) Shanghai, 2015 Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 0
2 Outline Green s function (GF) and integrated Green s function (IGF) methods The improved parallel Poisson solver For the pruned Fourier transform: use a novel fast implicit zero-padded FFT For the IGF calculation: use reduced integrated Green s function (RIGF) For Fourier transform for the extended IGF: use discrete cosine transform for RIGF Study of the improved parallel Poisson solver Comparisons with the commonly used Poisson solver The strong scaling and weak scaling study of the parallel solver MPI+OpenMP routine of the parallel solver and the performance comparison Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 0
3 Physics model Lorentz transform x = ˆx y = ŷ z = γẑ Equation of motion d^p l dˆt = q l(^e l +^v l ^B l), d^r l dˆt = ^p l m lγ l t ϕ = ρ ε0 E = ϕ Field calculation ^E = γe ^E = E ^B = γ c 2 v E Field transform A schematic model of bunch motion with transformation between laboratory frame ˆK and rest frame K. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 1
4 Green s function (GF) method Green s function in free space: G(r,r ) = G(x,x,y,y,z,z ) = 1 (x x ) 2 + (y y ) 2 + (z z ) 2 (1) Using the midpoint rule for the integral (GF integral), the potential ϕ reads as ϕ(r l ) 1 (N x,n y,n z ) 4πε 0 l =(1,1,1) ρ(r l ) G(r l,r l ), (2) where l = (i,j,k), l = (i,j,k ) and G(r l,r l ) = h x h y h z G(r l,r l ). Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 2
5 Integrated Green s function (IGF) method In ϕ(r l ) 1 (N x,n y,n z ) 4πε 0 l =(1,1,1) ρ(r l ) G(r l,r l ), (2) where G(r l,r l ) (also as G(r l r l,0)) can be calculated by IGF integral 1 as: r l +h/2 G IGF (r l,r l ) = where h = (h x,h y,h z ) denotes the discrete stepsizes. r l h/2 G(r l,r )dx dy dz, (3) the discrete integrated Green s function (IGF) formula is given by IGF(r,0) = G(r, 0)dxdy dz. (4) 1 J.Qiang, S. Lidia, R.D. Ryne, and C. Limborg-Deprey, Phys. Rev. ST Accel. Beams, vol 9, (2006), and vol 10, (2007). Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 3
6 Integrated Green s function IGF(r, 0). = z2 2 arctan( xy z x 2 + y 2 + z 2 ) y 2 2 arctan( xz y x 2 + y 2 + z 2 ) x 2 2 arctan( yz x x 2 + y 2 + z 2 ) + yz ln(x + x 2 + y 2 + z 2 ) +xz ln(y + x 2 + y 2 + z 2 ) + xy ln(z + x 2 + y 2 + z 2 ) (4) Here, G(r l r l,0) is obtained by the summation of the IGFs over the cell [(r l r l ) h/2,(r l r l ) + h/2]. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 4
7 Hockney and Eastwood s routine of 3D convolution 2 Using 3D discrete Fourier transformation and convolution theory, the potential can be expressed as: [ϕ ex ] i,j,k = 1 4πε 0 FFT 1 {[FFT G ex ] i,j,k [FFTρ ex ] i,j,k } 2Nx,2N y,2n z. (5) G ex (2N x,2n y,2n z ) obtained by the real even symmetric extension of G(N x,n y,n z ) Take 3D FFT of G ex to obtain FFT( G ex ) 2 R.W. Hockney and J. W. Eastwood. Computer Simulation Using Particles. Institut of Physics Publishing, Bristol, (1992). Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 5
8 Extend ρ(n x,n y,n z ) by padding zeros to get ρ ex (2N x,2n y,2n z ) and implement a pruned 3D FFT to obtain FFTρ ex FFT in x FFT in y x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x FFT in z Multiply the FFT( G ex ) to FFT(ρ ex ) with corresponding indices. IFFT(ρ ex ): inverse pruned 3D FFT of FFTρ ex ϕ: First (N x,n y,n z ) values of ϕ ex Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 6
9 Improvement of the pruned Fourier transform Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 6
10 1D fftbackward and fftforward transform 3 Define the unit ω m m = 1, and write the pseudo-code in Fortran style: Algorithm 1 fftpadbackward Input: f Output: f, u 1: function FFTPADBACKWARD 2: for k = 1 n do 3: u(k) ω k 1 2n 4: end for 5: f (:) IFFT[f (:)]; 6: u(:) IFFT[u(:)]; 7: end function f (k); Algorithm 2 fftpadforward Input: f, u Output: f 1: function FFTPADFORWARD 2: f (:) FFT[f (:)]; 3: u(:) FFT[u(:)]; 4: for k = 1 n do 5: f (k) f (k) + ω k+1 6: end for 7: end function 2n u(k); 3 Bowman, John C., and Malcolm Roberts. Efficient Dealiased Convolutions without Padding. SIAM Journal on Scientific Computing 33, 1 (2011.1): Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 7
11 Explicitly and implicitly zero-padded FFT Define f ex = [ ] f and [f,u] 0 n = fftpadbackward n (f ), then 2n [IFFT 2n (f ex ) odd, IFFT 2n (f ex )even] = [f,u]. The fftpadbackward is obtained without the last bit reversal stage (address change) comparing the explicitly zero-padded FFT. The fftpadforward transform is obtained by the inverse procedure. The 2D and 3D fftpadbackward (fftpadforward) are the same procedures as 2D and 3D IFFT (FFT): executing 1D transform along each direction serially. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 8
12 Pad zeros in i direction Exρ i,j,k Exρ i,j,k = [ ] ρ 0 2N i Figure : The 3D backward modified FFT routine. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 9
13 1D real to complex FFT along i and ijk2jik transpose 1D R2C FFT along i Cρ j,i,k Exρ i,j,k and ijk2jik transpose [ ] ρ Exρ i,j,k = 0 2N i Complex vectors N i (Cρ) := N i (ρ) + 1 The 3D backward modified FFT routine. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 9
14 1D complex FFT along j and jik2kij transpose Cρ k,i,je 1D fftpadbackward along j and jik2kij transpose 1D R2C FFT along i Cρ j,i,k Exρ i,j,k and ijk2jik transpose [ ] ρ Exρ i,j,k = 0 2N i Complex vectors N i (Cρ) := N i (ρ) + 1 Cρ k,i,jo Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 9
15 1D complex FFT along k for Cρ k,i,je Cρ k,i,je 1D fftpadbackward along k Cρ ke,i,je Cρ ko,i,je 1D fftpadbackward along j and jik2kij transpose 1D R2C FFT along i Cρ j,i,k Exρ i,j,k and ijk2jik transpose [ ] ρ Exρ i,j,k = 0 2N i Complex vectors N i (Cρ) := N i (ρ) + 1 Cρ k,i,jo Oct 13, 2015 c 2015 UNIVERSITÄTFigure ROSTOCK : FAKULTÄT The 3D FÜR INFORMATIK backward UNDmodified ELEKTROTECHNIK, FFT INSTITUT routine. FÜR ALLGEMEINE ELEKTROTECHNIK 9
16 1D complex FFT along k for Cρ k,i,jo Cρ k,i,je 1D fftpadbackward along k Cρ ke,i,je Cρ ko,i,je 1D fftpadbackward along j and jik2kij transpose 1D R2C FFT along i Cρ j,i,k Exρ i,j,k and ijk2jik transpose [ ] ρ Exρ i,j,k = 0 2N i Complex vectors N i (Cρ) := N i (ρ) + 1 Cρ k,i,jo 1D fftpadbackward along k Cρ ke,i,jo Cρ ko,i,jo Figure : The 3D backward modified FFT routine. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 9
17 Improvement of the IGF calculation and Fourier transform for the extended GF Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 9
18 Improvement of the IGF calculation: using the reduced integrated Green s function (RIGF) The local relative error of the GF integral. The RIGF function shows as: G RIGF (r (i,j,k),0) = { G IGF (r (i,j,k),0) 1 w R w ; IGF portion h x h y h z G(r (i,j,k),0) R w N w + 1; GF portion where the w presents the indices x,y,z respectively, and R w is determined by an adaptive method. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 10,
19 Adaptive accuracy control (z is the longitudinal direction for long bunches) Calculate G IGF (r (1,1,k),0), G GF (r (1,1,k),0), and record δ G(r (1,1,k),0) = G IGF (r (1,1,k),0) G GF (r (1,1,k),0). Locate the stable area of IGF: the first k for which the magnitude of ( G IGFk 1 G IGFk )/ G IGFk 1 drops down to be less than 1/log 2 N z. Record k as k stable. Locate the accuracy tolerance: the proper R z is as the first k, where the magnitude of δ G k /δ G k stable drops down to 10 s, s 0. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 11
20 10 2 ( G IGFk 1 G IGFk )/ G IGFk k ˆηϕ for RIGF method R z Figure : The δ G Rz +1 (left) and the error ˆη ϕ with various R z (right). s is chosen as 1 or 2. When higher accuracy is preferred, we can choose large analogously s for switching (not necessary). Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 12
21 Substitution of Fourier transform for GF values Theorem Define g = {g 1,g 2,...g n+1 }, and let g ex be the real even symmetric extension (RESE) of g, i.e. g ex = {g 1,g 2,...g n+1,g n,g n 1,...g 2 } 2n. Then FFT 2n (RESE(g)) = 2 RESE(DCT n+1 (g)) FFT ( G ex ) can be expressed implicitly by DCT (G) and the RESE of it. The computational complexity reduces significantly from O((log 2 2n)) to O((log 2 n)) besides the RESE. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 13
22 Substitution of 3D FFT for GFs: 3D DCT For 3D: FFT(RESE( G)) = 8 RESE(DCT( G)). Expressing implicitly: define A(w) = FFT(RESE( G))(w), where w = {i, j, k}, w = 1,2,...,N w + 1, and B(W ) = 8 DCT( G)(W ), where W = {I,J,K}, W = 1,2,...,2N w { w, w [1,Nw + 1]; A(w) = B(W ),whenw = 2N w w + 2, w [N w + 2,2N w ];. Theorem The inverse transforms of DCT is the same as DCT itself, apart from a constant factor. Therefore, the IFFT and FFT of real-even symmetry vectors G ex are scaled the same. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 14
23 The RIGF calculation in parallel G k,j,i = RIGF(z, y, x) On each processor The RIGF calculation and the related 3D DCT. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 15
24 The followed 1D DCT along k and kji2ijk transpose G k,j,i = RIGF(z, y, x) On each processor 1D DCT along k and,kji2ijk transpose G i,j,k The RIGF calculation and the related 3D DCT. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 15
25 The 1D DCT along i and ijk2jik transpose G k,j,i = RIGF(z, y, x) On each processor 1D DCT along k and,kji2ijk transpose 1D DCT along i and ijk2jik transpose G i,j,k G j,i,k The RIGF calculation and the related 3D DCT. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 15
26 The 1D DCT along j and extension in j direction with odd/even indices G k,j,i = RIGF(z, y, x) On each processor G je,i,k 1D DCT along k and,kji2ijk transpose 1D DCT along i and ijk2jik transpose G jo,i,k 1D DCT along j and extend j with odd, even indices G i,j,k G j,i,k The RIGF calculation and the related 3D DCT. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 15
27 The jik2kij transpose for G k,i,je and G k,i,jo G k,j,i = RIGF(z, y, x) On each processor 1D DCT along k and,kji2ijk transpose 1D DCT along i and ijk2jik transpose G je,i,k jik2kij transpose G k,i,je 1D DCT along j and extend j with odd, even indices G jo,i,k jik2kij transpose G k,i,jo G i,j,k G j,i,k The Green s function calculation and the related 3D DCT. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 15
28 Multiplication in spectral domain Cρ ke,i,je Cρ ke,i,je G k,i,je Cρ ko,i,je Cρ ko,i,je Cρ ke,i,jx = RESE( G k,i,jx )even Cρ ke,i,jx, Cρ ko,i,jx = RESE( G k,i,jx ) odd Cρ ko,i,jx, where x expresses e (even) or o (odd). Cρ ke,i,jo Cρ ke,i,jo G k,i,jo Cρ ko,i,jo Cρ ko,i,jo Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 16
29 The 3D forward modified FFT routine. Cρ ke,i,je Cρ ko,i,je 1D fftpadforward along k Cρ k,i,je kij2jik transpose and 1D fftpadforward along j Cρ ke,i,jo Cρ ko,i,jo 1D fftpadforward along k Cρ k,i,jo Cρ j,i,k jik2ijk transpose and 1D C2R FFT along i Complex vectors N i (Cρ) := N i (ρ) + 1 Exρ i,j,k Real vectors The 3D forward modified FFT routine. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 17
30 Study of the improved parallel Poisson solver Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 17
31 Comparisons with the commonly used Poisson solver: electric field accuracy Example 1: An ideal uniform ellipsoidal bunch with aspect ratio of 30. (a) IGF at (:, Ny Ny 2,:) (b) RIGF at (:, 2,:) Comparison of the relative error of electric field. (c) IGF at (:,:, Nz Nz 2 ) (d) RIGF at (:,:, 2 ) Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 18
32 Bunch RMS size during tracking in IMPACT-T 4 Example 2: An extremely long bunch, R x = 0.15m,R y = 0.15m,L z = 1.0e5m, at the initial sections of a virtual light source accelerator. dt = 1.0e 12 s, t step = 20,000,000, N x,y,z = 64. A schematic comparison of the RMS size in x, y, z direction (left to right). 4 IMPACT-T: A 3D Parallel Particle Tracking Code in Time Domain jiqiang/impact-t/index.html. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 19
33 Normalized RMS emittance of the bunch during tracking in IMPACT-T A schematic comparison of normalized RMS emittance in x, y, z direction (left to right). Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 20
34 Speed-up study with the commonly used Poisson solver (serial routine, Example 1 ) Comparison of different GF methods elapsed time with increasing grids resolution. N w +1 MGCG IGF+ (updated) H&E RIGF+implicitly zero-padded s s s s s s s s s s s s The computation program is implemented in C and compiled by gcc under Intel Core 2.60 GHz CPU. MGCG multigrid conjugate gradient method. H&E Hockney and Eastwood s method. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 21
35 Speed-up study with the commonly used Poisson solver in IMPACT-T code (parallel routine) Comparison between IGF and RIGF N w N p t RIGF t IGF N w N p t RIGF t IGF s s s s s s s s s s s s The parallel Poisson solver is written in Fortran 90 with Open MPI, compiled by Intel compiler, and executed in the Edison supercomputer at the NERSC. t step = 20,000. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 22
36 Weak scaling study Weak scaling: keep the "problem-process size ratio N x N y N z /N p constant. Figure : The "size-process ratios are 16 3 (left), 32 3 (middle), and 64 3 (right), N p = 1,2,4,..., Time (s) 3 2 Time (s) Time (s) Cores Cores Cores Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 23
37 Strong scaling study Strong scaling: the problems size N x N y N z is fixed while increasing N p. N w starts at 64 (left), 128 (middle) and 256 (right). Np = 1,2,4,..., Time (s) 10 2 Time (s) 10 1 Time (s) Cores Cores Cores Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 24
38 The Open MPI+OpenMP routine The OpenMP (Open Multiprocessing) is a shared memory multiprocessing API (Application Programming Interface), which cooperates with the MPI process together in the same computational node The preparation of OpenMP to the OpenMP library adds to the computing cost The OpenMP parallel parts are mostly for do-loops" The SIMD (single instruction multiple data), up to date, is not further optimized Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 25
39 Time (s) Threads Time (s) Threads Numa_node depth Numa_node depth CPU time for different OpenMP threads per process in use. For Intel compiler: -cc numa_node (NUMA: Non-uniform memory access), -cc depth are two important options for thread affinity in compiling. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 26
40 Conclusion and outlook We reviewed the GF and IGF methods. The improved parallel Poisson solver s routine with three aspects of efficiency improvement: using the RIGF, using the discrete cosine transform for RIGF, and using a novel fast implicit zero-padded convolution routine. Study of the improved parallel Poisson solver: comparisons with commonly used Poisson solver; strong scaling and weak scaling; MPI+OpenMP routine. Further HPC routine update with MIC (Many Integrated Core) Architecture (Open MPI+OpenMP). Limitation of the data transpose with MPI _alltoall. Oct 13, 2015 c 2015 UNIVERSITÄT ROSTOCK FAKULTÄT FÜR INFORMATIK UND ELEKTROTECHNIK, INSTITUT FÜR ALLGEMEINE ELEKTROTECHNIK 27
EFFICIENT 3D POISSON SOLVERS FOR SPACE-CHARGE SIMULATION
Abstract EFFICIENT 3D POISSON SOLVERS FOR SPACE-CHARGE SIMULATION Three-dimensional Poisson solver plays an important role in the self-consistent space-charge simulation. In this paper, we present several
More informationFast Space Charge Calculations with a Multigrid Poisson Solver & Applications
Fast Space Charge Calculations with a Multigrid Poisson Solver & Applications Gisela Pöplau Ursula van Rienen Rostock University DESY, Hamburg, April 26, 2005 Bas van der Geer Marieke de Loos Pulsar Physics
More informationarxiv: v1 [physics.acc-ph] 21 Nov 2011
arxiv:1111.4971v1 [physics.acc-ph] 21 Nov 2011 On FFT based convolutions and correlations, with application to solving Poisson s equation in an open rectangular pipe Abstract Robert D. Ryne Lawrence Berkeley
More informationSPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics
SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS
More informationThe Fastest Convolution in the West
The Fastest Convolution in the West Malcolm Roberts and John Bowman University of Alberta 2011-02-15 www.math.ualberta.ca/ mroberts 1 Outline Convolution Definition Applications Fast Convolutions Dealiasing
More information3D Space Charge Routines: The Software Package MOEVE and FFT Compared
3D Space Charge Routines: The Software Package MOEVE and FFT Compared Gisela Pöplau DESY, Hamburg, December 4, 2007 Overview Algorithms for 3D space charge calculations Properties of FFT and iterative
More informationMultiscale Modelling, taking into account Collisions
Multiscale Modelling, taking into account Collisions Andreas Adelmann (Paul Scherrer Institut) October 28, 2017 Multiscale Modelling, taking into account Collisions October 28, 2017 Page 1 / 22 1 Motivation
More informationLarge Scale Parallel Wake Field Computations with PBCI
Large Scale Parallel Wake Field Computations with PBCI E. Gjonaj, X. Dong, R. Hampel, M. Kärkkäinen, T. Lau, W.F.O. Müller and T. Weiland ICAP `06 Chamonix, 2-6 October 2006 Technische Universität Darmstadt,
More informationarxiv: v1 [physics.comp-ph] 22 Nov 2012
A Customized 3D GPU Poisson Solver for Free BCs Nazim Dugan a, Luigi Genovese b, Stefan Goedecker a, a Department of Physics, University of Basel, Klingelbergstr. 82, 4056 Basel, Switzerland b Laboratoire
More informationERLANGEN REGIONAL COMPUTING CENTER
ERLANGEN REGIONAL COMPUTING CENTER Making Sense of Performance Numbers Georg Hager Erlangen Regional Computing Center (RRZE) Friedrich-Alexander-Universität Erlangen-Nürnberg OpenMPCon 2018 Barcelona,
More informationAccelerator Theory and Simulation at Rostock University
Accelerator Theory and Simulation at Rostock University Hans-Walter Glock, Thomas Flisgen Gisela Pöplau Linear Collider Forum, 1st Meeting, June 14-15, 2010 DESY, Hamburg 15.06.2010 2010 UNIVERSITÄT ROSTOCK
More informationA particle-in-cell method with adaptive phase-space remapping for kinetic plasmas
A particle-in-cell method with adaptive phase-space remapping for kinetic plasmas Bei Wang 1 Greg Miller 2 Phil Colella 3 1 Princeton Institute of Computational Science and Engineering Princeton University
More informationEXASCALE COMPUTING. Implementation of a 2D Electrostatic Particle in Cell algorithm in UniÞed Parallel C with dynamic load-balancing
ExaScience Lab Intel Labs Europe EXASCALE COMPUTING Implementation of a 2D Electrostatic Particle in Cell algorithm in UniÞed Parallel C with dynamic load-balancing B. Verleye P. Henry R. Wuyts G. Lapenta
More informationCarlo Cavazzoni, HPC department, CINECA
Large Scale Parallelism Carlo Cavazzoni, HPC department, CINECA Parallel Architectures Two basic architectural scheme: Distributed Memory Shared Memory Now most computers have a mixed architecture + accelerators
More informationTangent Plane. Linear Approximation. The Gradient
Calculus 3 Lia Vas Tangent Plane. Linear Approximation. The Gradient The tangent plane. Let z = f(x, y) be a function of two variables with continuous partial derivatives. Recall that the vectors 1, 0,
More informationFinite Element Methods with Linear B-Splines
Finite Element Methods with Linear B-Splines K. Höllig, J. Hörner and M. Pfeil Universität Stuttgart Institut für Mathematische Methoden in den Ingenieurwissenschaften, Numerik und geometrische Modellierung
More informationLaboratory Report Rostock University
Laboratory Report Rostock University TESLA Collaboration Meeting DESY, September 15-17, 2003 Karsten Rothemund, Jelena Maksimovic, Viktor Maksimovic, Gisela Pöplau, Dirk Hecht, Ulla van Rienen Partly supported
More informationPerformance of the fusion code GYRO on three four generations of Crays. Mark Fahey University of Tennessee, Knoxville
Performance of the fusion code GYRO on three four generations of Crays Mark Fahey mfahey@utk.edu University of Tennessee, Knoxville Contents Introduction GYRO Overview Benchmark Problem Test Platforms
More informationArbitrary High-Order Discontinuous Galerkin Method for Solving Electromagnetic Field Problems in Time Domain
Arbitrary High-Order Discontinuous Galerkin Method for Solving Electromagnetic Field Problems in Time Domain KAI PAPKE, CHRISTIAN R. BAHLS AND URSULA VAN RIENEN University of Rostock, Institute of General
More informationPractical in Numerical Astronomy, SS 2012 LECTURE 9
Practical in Numerical Astronomy, SS 01 Elliptic partial differential equations. Poisson solvers. LECTURE 9 1. Gravity force and the equations of hydrodynamics. Poisson equation versus Poisson integral.
More informationTowards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters
Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters HIM - Workshop on Sparse Grids and Applications Alexander Heinecke Chair of Scientific Computing May 18 th 2011 HIM
More informationIntroduction to Practical FFT and NFFT
Introduction to Practical FFT and NFFT Michael Pippig and Daniel Potts Faculty of Mathematics Chemnitz University of Technology 07.09.2010 supported by BMBF grant 01IH08001B Table of Contents 1 Serial
More informationLogo. A Massively-Parallel Multicore Acceleration of a Point Contact Solid Mechanics Simulation DRAFT
Paper 1 Logo Civil-Comp Press, 2017 Proceedings of the Fifth International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering, P. Iványi, B.H.V Topping and G. Várady (Editors)
More informationThe Lattice Boltzmann Method for Laminar and Turbulent Channel Flows
The Lattice Boltzmann Method for Laminar and Turbulent Channel Flows Vanja Zecevic, Michael Kirkpatrick and Steven Armfield Department of Aerospace Mechanical & Mechatronic Engineering The University of
More informationNumerical simulation of the Gross-Pitaevskii equation by pseudo-spectral and finite element methods comparison of GPS code and FreeFem++
. p.1/13 10 ec. 2014, LJLL, Paris FreeFem++ workshop : BECASIM session Numerical simulation of the Gross-Pitaevskii equation by pseudo-spectral and finite element methods comparison of GPS code and FreeFem++
More informationMassively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem
Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Katharina Kormann 1 Klaus Reuter 2 Markus Rampp 2 Eric Sonnendrücker 1 1 Max Planck Institut für Plasmaphysik 2 Max Planck Computing
More informationLab Report TEMF - TU Darmstadt
Institut für Theorie Elektromagnetischer Felder Lab Report TEMF - TU Darmstadt W. Ackermann, R. Hampel, M. Kunze T. Lau, W.F.O. Müller, S. Setzer, T. Weiland, I. Zagorodnov TESLA Collaboration Meeting
More informationModel Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University
Model Order Reduction via Matlab Parallel Computing Toolbox E. Fatih Yetkin & Hasan Dağ Istanbul Technical University Computational Science & Engineering Department September 21, 2009 E. Fatih Yetkin (Istanbul
More informationEfficient algorithms for symmetric tensor contractions
Efficient algorithms for symmetric tensor contractions Edgar Solomonik 1 Department of EECS, UC Berkeley Oct 22, 2013 1 / 42 Edgar Solomonik Symmetric tensor contractions 1/ 42 Motivation The goal is to
More informationFourier methods. Lukas Palatinus. Institute of Physics AS CR, Prague, Czechia
Fourier methods Lukas Palatinus Institute of Physics AS CR, Prague, Czechia Outline Fourier transform: mathematician s viewpoint crystallographer s viewpoint programmer s viewpoint Practical aspects libraries,
More informationHomework #1 Solution
February 7, 4 Department of Electrical and Computer Engineering University of Wisconsin Madison ECE 734 VLSI Array Structures for Digital Signal Processing Homework # Solution Due: February 6, 4 in class.
More informationEfficient Algorithms for the Fast Computation of Space Charge Effects Caused by Charged Particles in Particle Accelerators
Efficient Algorithms for the Fast Computation of Space Charge Effects Caused by Charged Particles in Particle Accelerators Dissertation zur Erlangung des akademischen Grades Doktor-Ingenieur (Dr.-Ing.)
More informationState-of-the-Art Channel Coding
Institut für State-of-the-Art Channel Coding Prof. Dr.-Ing. Volker Kühn Institute of Communications Engineering University of Rostock, Germany Email: volker.kuehn@uni-rostock.de http://www.int.uni-rostock.de/
More informationArtificial Intelligence & Neuro Cognitive Systems Fakultät für Informatik. Robot Dynamics. Dr.-Ing. John Nassour J.
Artificial Intelligence & Neuro Cognitive Systems Fakultät für Informatik Robot Dynamics Dr.-Ing. John Nassour 25.1.218 J.Nassour 1 Introduction Dynamics concerns the motion of bodies Includes Kinematics
More informationA Hybrid Method for the Wave Equation. beilina
A Hybrid Method for the Wave Equation http://www.math.unibas.ch/ beilina 1 The mathematical model The model problem is the wave equation 2 u t 2 = (a 2 u) + f, x Ω R 3, t > 0, (1) u(x, 0) = 0, x Ω, (2)
More informationHPMPC - A new software package with efficient solvers for Model Predictive Control
- A new software package with efficient solvers for Model Predictive Control Technical University of Denmark CITIES Second General Consortium Meeting, DTU, Lyngby Campus, 26-27 May 2015 Introduction Model
More informationStatic-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems
Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems Ichitaro Yamazaki University of Tennessee, Knoxville Xiaoye Sherry Li Lawrence Berkeley National Laboratory MS49: Sparse
More information25. Chain Rule. Now, f is a function of t only. Expand by multiplication:
25. Chain Rule The Chain Rule is present in all differentiation. If z = f(x, y) represents a two-variable function, then it is plausible to consider the cases when x and y may be functions of other variable(s).
More informationPeriodic motions. Periodic motions are known since the beginning of mankind: Motion of planets around the Sun; Pendulum; And many more...
Periodic motions Periodic motions are known since the beginning of mankind: Motion of planets around the Sun; Pendulum; And many more... Periodic motions There are several quantities which describe a periodic
More information! Circular Convolution. " Linear convolution with circular convolution. ! Discrete Fourier Transform. " Linear convolution through circular
Previously ESE 531: Digital Signal Processing Lec 22: April 18, 2017 Fast Fourier Transform (con t)! Circular Convolution " Linear convolution with circular convolution! Discrete Fourier Transform " Linear
More informationIntroduction of a Stabilized Bi-Conjugate Gradient iterative solver for Helmholtz s Equation on the CMA GRAPES Global and Regional models.
Introduction of a Stabilized Bi-Conjugate Gradient iterative solver for Helmholtz s Equation on the CMA GRAPES Global and Regional models. Peng Hong Bo (IBM), Zaphiris Christidis (Lenovo) and Zhiyan Jin
More informationAN FFT-BASED APPROACH IN ACCELERATION OF DISCRETE GREEN S FUNCTION METHOD FOR AN- TENNA ANALYSIS
Progress In Electromagnetics Research M, Vol. 29, 17 28, 2013 AN FFT-BASED APPROACH IN ACCELERATION OF DISCRETE GREEN S FUNCTION METHOD FOR AN- TENNA ANALYSIS Salma Mirhadi *, Mohammad Soleimani, and Ali
More informationIntroduction to Practical FFT and NFFT
Introduction to Practical FFT and NFFT Michael Pippig and Daniel Potts Department of Mathematics Chemnitz University of Technology September 14, 211 supported by BMBF grant 1IH81B Table of Contents 1 Serial
More informationORBIT Code Review and Future Directions. S. Cousineau, A. Shishlo, J. Holmes ECloud07
ORBIT Code Review and Future Directions S. Cousineau, A. Shishlo, J. Holmes ECloud07 ORBIT Code ORBIT (Objective Ring Beam Injection and Transport code) ORBIT is an object-oriented, open-source code started
More information16. Solution of elliptic partial differential equation
16. Solution of elliptic partial differential equation Recall in the first lecture of this course. Assume you know how to use a computer to compute; but have not done any serious numerical computations
More informationNovel Simulation Methods in the code-framework Warp
Novel Simulation Methods in the code-framework Warp J.-L. Vay1,3, D.P. Grote2,3, R.H. Cohen2,3, A. Friedman2,3, S.M. Lund2,3, E. Cormier-Michel1, W.M. Fawley1, M.A. Furman1, C.G.R. Geddes1 1Lawrence Berkeley
More informationHybrid parallelization of a pseudo-spectral DNS code and its computational performance on RZG s idataplex system Hydra
Hybrid parallelization of a pseudo-spectral DNS code and its computational performance on RZG s idataplex system Hydra Markus Rampp 1, Liang Shi 2, Marc Avila 3,2, Björn Hof 2,4 1 Computing Center of the
More informationParallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29
Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29 Outline A few words on MD applications and the GROMACS package The main work in an MD simulation Parallelization Stream computing
More informationChain Rule. MATH 311, Calculus III. J. Robert Buchanan. Spring Department of Mathematics
3.33pt Chain Rule MATH 311, Calculus III J. Robert Buchanan Department of Mathematics Spring 2019 Single Variable Chain Rule Suppose y = g(x) and z = f (y) then dz dx = d (f (g(x))) dx = f (g(x))g (x)
More informationFAS and Solver Performance
FAS and Solver Performance Matthew Knepley Mathematics and Computer Science Division Argonne National Laboratory Fall AMS Central Section Meeting Chicago, IL Oct 05 06, 2007 M. Knepley (ANL) FAS AMS 07
More informationIntroduction to numerical computations on the GPU
Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming
More informationCurvilinear Coordinates
University of Alabama Department of Physics and Astronomy PH 106-4 / LeClair Fall 2008 Curvilinear Coordinates Note that we use the convention that the cartesian unit vectors are ˆx, ŷ, and ẑ, rather than
More informationMass-conserving and positive-definite semi-lagrangian advection in NCEP GFS: Decomposition for massively parallel computing without halo
Mass-conserving and positive-definite semi-lagrangian advection in NCEP GFS: Decomposition for massively parallel computing without halo Hann-Ming Henry Juang Environmental Modeling Center, NCEP NWS,NOAA,DOC,USA
More informationThe Fast Multipole Method in molecular dynamics
The Fast Multipole Method in molecular dynamics Berk Hess KTH Royal Institute of Technology, Stockholm, Sweden ADAC6 workshop Zurich, 20-06-2018 Slide BioExcel Slide Molecular Dynamics of biomolecules
More informationBeam dynamics calculation
September 6 Beam dynamics calculation S.B. Vorozhtsov, Е.Е. Perepelkin and V.L. Smirnov Dubna, JINR http://parallel-compute.com Outline Problem formulation Numerical methods OpenMP and CUDA realization
More informationTurbulence in geodynamo simulations
Turbulence in geodynamo simulations Nathanaël Schaeffer, A. Fournier, D. Jault, J. Aubert, H-C. Nataf,... ISTerre / CNRS / Université Grenoble Alpes Journée des utilisateurs CIMENT, Grenoble, 23 June 2016
More informationInterpolation and function representation with grids in stochastic control
with in classical Sparse for Sparse for with in FIME EDF R&D & FiME, Laboratoire de Finance des Marchés de l Energie (www.fime-lab.org) September 2015 Schedule with in classical Sparse for Sparse for 1
More informationApplication of Wavelets to N body Particle In Cell Simulations
Application of Wavelets to N body Particle In Cell Simulations Balša Terzić (NIU) Northern Illinois University, Department of Physics April 7, 2006 Motivation Gaining insight into the dynamics of multi
More informationFind the indicated derivative. 1) Find y(4) if y = 3 sin x. A) y(4) = 3 cos x B) y(4) = 3 sin x C) y(4) = - 3 cos x D) y(4) = - 3 sin x
Assignment 5 Name Find the indicated derivative. ) Find y(4) if y = sin x. ) A) y(4) = cos x B) y(4) = sin x y(4) = - cos x y(4) = - sin x ) y = (csc x + cot x)(csc x - cot x) ) A) y = 0 B) y = y = - csc
More informationDomain Decomposition-based contour integration eigenvalue solvers
Domain Decomposition-based contour integration eigenvalue solvers Vassilis Kalantzis joint work with Yousef Saad Computer Science and Engineering Department University of Minnesota - Twin Cities, USA SIAM
More informationBeam-Beam Simulation for PEP-II
Beam-Beam Simulation for PEP-II Yunhai Cai Accelerator Research Department-A, SLAC e + e Factories 2003, SLAC October 14, 2003 Outline Method of simulation: - Boundary condition - Parallel computing -
More informationPDE Solvers for Fluid Flow
PDE Solvers for Fluid Flow issues and algorithms for the Streaming Supercomputer Eran Guendelman February 5, 2002 Topics Equations for incompressible fluid flow 3 model PDEs: Hyperbolic, Elliptic, Parabolic
More information(a) The points (3, 1, 2) and ( 1, 3, 4) are the endpoints of a diameter of a sphere.
MATH 4 FINAL EXAM REVIEW QUESTIONS Problem. a) The points,, ) and,, 4) are the endpoints of a diameter of a sphere. i) Determine the center and radius of the sphere. ii) Find an equation for the sphere.
More informationOptimization of Particle-In-Cell simulations for Vlasov-Poisson system with strong magnetic field
Optimization of Particle-In-Cell simulations for Vlasov-Poisson system with strong magnetic field Edwin Chacon-Golcher Sever A. Hirstoaga Mathieu Lutz Abstract We study the dynamics of charged particles
More informationSolving PDEs with CUDA Jonathan Cohen
Solving PDEs with CUDA Jonathan Cohen jocohen@nvidia.com NVIDIA Research PDEs (Partial Differential Equations) Big topic Some common strategies Focus on one type of PDE in this talk Poisson Equation Linear
More informationComputation of photoelectron and Auger-electron diffraction III. Evaluation of angle-resolved intensities PAD3
Computer Physics Communications 112 (1998) 911101 Computation of photoelectron and Auger-electron diffraction III. Evaluation of angle-resolved intensities PAD3 X.Chen,G.R.Harp 1,Y.Ueda,D.K.Saldin 2 Department
More informationA simple Concept for the Performance Analysis of Cluster-Computing
A simple Concept for the Performance Analysis of Cluster-Computing H. Kredel 1, S. Richling 2, J.P. Kruse 3, E. Strohmaier 4, H.G. Kruse 1 1 IT-Center, University of Mannheim, Germany 2 IT-Center, University
More informationSpeeding up simulations of relativistic systems using an optimal boosted frame
Speeding up simulations of relativistic systems using an optimal boosted frame J.-L. Vay1,3, W. M. Fawley1, C. G. R. Geddes1, E. Cormier-Michel1, D. P. Grote2,3 1Lawrence Berkeley National Laboratory,
More informationSome notes on efficient computing and setting up high performance computing environments
Some notes on efficient computing and setting up high performance computing environments Andrew O. Finley Department of Forestry, Michigan State University, Lansing, Michigan. April 17, 2017 1 Efficient
More informationDefinition. A signal is a sequence of numbers. sequence is also referred to as being in l 1 (Z), or just in l 1. A sequence {x(n)} satisfying
Signals and Systems. Definition. A signal is a sequence of numbers {x(n)} n Z satisfying n Z x(n)
More informationParticle in Cell method
Particle in Cell method Birdsall and Langdon: Plasma Physics via Computer Simulation Dawson: Particle simulation of plasmas Hockney and Eastwood: Computer Simulations using Particles we start with an electrostatic
More informationPerformance Evaluation of Scientific Applications on POWER8
Performance Evaluation of Scientific Applications on POWER8 2014 Nov 16 Andrew V. Adinetz 1, Paul F. Baumeister 1, Hans Böttiger 3, Thorsten Hater 1, Thilo Maurer 3, Dirk Pleiter 1, Wolfram Schenck 4,
More informationDirect Numerical Simulations of converging-diverging channel flow
Intro Numerical code Results Conclusion Direct Numerical Simulations of converging-diverging channel flow J.-P. Laval (1), M. Marquillie (1) Jean-Philippe.Laval@univ-lille1.fr (1) Laboratoire de Me canique
More informationarxiv: v2 [math.na] 21 Apr 2014
Combustion Theory and Modelling Vol. 00, No. 00, Month 200x, 1 25 RESEARCH ARTICLE High-Order Algorithms for Compressible Reacting Flow with Complex Chemistry arxiv:1309.7327v2 [math.na] 21 Apr 2014 M.
More informationSOLUTIONS, PROBLEM SET 11
SOLUTIONS, PROBLEM SET 11 1 In this problem we investigate the Lagrangian formulation of dynamics in a rotating frame. Consider a frame of reference which we will consider to be inertial. Suppose that
More informationSpectral simulations of wall-bounded flows on massively-parallel computers
Paper 1 1 Spectral simulations of wall-bounded flows on massively-parallel computers By Qiang Li, Philipp Schlatter & Dan S. Henningson Linné Flow Centre, KTH Mechanics SE-100 44 Stockholm, Sweden 1.
More informationELEG 305: Digital Signal Processing
ELEG 305: Digital Signal Processing Lecture 18: Applications of FFT Algorithms & Linear Filtering DFT Computation; Implementation of Discrete Time Systems Kenneth E. Barner Department of Electrical and
More informationPerformance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster
Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster Yuta Hirokawa Graduate School of Systems and Information Engineering, University of Tsukuba hirokawa@hpcs.cs.tsukuba.ac.jp
More informationCyclops Tensor Framework
Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r
More informationMetrics and Curvature
Metrics and Curvature How to measure curvature? Metrics Euclidian/Minkowski Curved spaces General 4 dimensional space Cosmological principle Homogeneity and isotropy: evidence Robertson-Walker metrics
More informationParallel Sparse Tensor Decompositions using HiCOO Format
Figure sources: A brief survey of tensors by Berton Earnshaw and NVIDIA Tensor Cores Parallel Sparse Tensor Decompositions using HiCOO Format Jiajia Li, Jee Choi, Richard Vuduc May 8, 8 @ SIAM ALA 8 Outline
More informationLattice Quantum Chromodynamics on the MIC architectures
Lattice Quantum Chromodynamics on the MIC architectures Piotr Korcyl Universität Regensburg Intel MIC Programming Workshop @ LRZ 28 June 2017 Piotr Korcyl Lattice Quantum Chromodynamics on the MIC 1/ 25
More informationEE482: Digital Signal Processing Applications
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:305:45 CBC C222 Lecture 8 Frequency Analysis 14/02/18 http://www.ee.unlv.edu/~b1morris/ee482/
More informationS12.1 SOLUTIONS TO PROBLEMS 12 (ODD NUMBERS)
OLUTION TO PROBLEM 2 (ODD NUMBER) 2. The electric field is E = φ = 2xi + 2y j and at (2, ) E = 4i + 2j. Thus E = 2 5 and its direction is 2i + j. At ( 3, 2), φ = 6i + 4 j. Thus the direction of most rapid
More informationFYS 3120: Classical Mechanics and Electrodynamics
FYS 3120: Classical Mechanics and Electrodynamics Formula Collection Spring semester 2014 1 Analytical Mechanics The Lagrangian L = L(q, q, t), (1) is a function of the generalized coordinates q = {q i
More informationSelected electrostatic problems (Lecture 14)
Selected electrostatic problems (Lecture 14) February 1, 2016 236/441 Lecture outline In this lecture, we demonstrate how to solve several electrostatic problems related to calculation of the fields generated
More informationWRF performance tuning for the Intel Woodcrest Processor
WRF performance tuning for the Intel Woodcrest Processor A. Semenov, T. Kashevarova, P. Mankevich, D. Shkurko, K. Arturov, N. Panov Intel Corp., pr. ak. Lavrentieva 6/1, Novosibirsk, Russia, 630090 {alexander.l.semenov,tamara.p.kashevarova,pavel.v.mankevich,
More informationParallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems
Parallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems G. Hager HPC Services, Computing Center Erlangen, Germany E. Jeckelmann Theoretical Physics, Univ.
More informationEdwin Chacon-Golcher 1, Sever A. Hirstoaga 2 and Mathieu Lutz 3. Introduction
ESAIM: PROCEEDINGS AND SURVEYS, March 2016, Vol. 53, p. 177-190 M. Campos Pinto and F. Charles, Editors OPTIMIZATION OF PARTICLE-IN-CELL SIMULATIONS FOR VLASOV-POISSON SYSTEM WITH STRONG MAGNETIC FIELD
More informationElectromagnetically Induced Flows in Water
Electromagnetically Induced Flows in Water Michiel de Reus 8 maart 213 () Electromagnetically Induced Flows 1 / 56 Outline 1 Introduction 2 Maxwell equations Complex Maxwell equations 3 Gaussian sources
More informationR. Glenn Brook, Bilel Hadri*, Vincent C. Betro, Ryan C. Hulguin, and Ryan Braby Cray Users Group 2012 Stuttgart, Germany April 29 May 3, 2012
R. Glenn Brook, Bilel Hadri*, Vincent C. Betro, Ryan C. Hulguin, and Ryan Braby Cray Users Group 2012 Stuttgart, Germany April 29 May 3, 2012 * presenting author Contents Overview on AACE Overview on MIC
More informationConvolutional networks. Sebastian Seung
Convolutional networks Sebastian Seung Convolutional network Neural network with spatial organization every neuron has a location usually on a grid Translation invariance synaptic strength depends on locations
More informationParticle-in-Cell Code BEAMPATH for Beam Dynamics Simulations in Linear Accelerators and Beamlines*
SLAC-PUB-1081 5 October 004 Particle-in-Cell Code BEAMPATH for Beam Dynamics Simulations in Linear Accelerators and Beamlines* Yuri K. Batygin Stanford Linear Accelerator Center, Stanford University, Stanford,
More informationGraduate Accelerator Physics. G. A. Krafft Jefferson Lab Old Dominion University Lecture 1
Graduate Accelerator Physics G. A. Krafft Jefferson Lab Old Dominion University Lecture 1 Course Outline Course Content Introduction to Accelerators and Short Historical Overview Basic Units and Definitions
More informationEOS 250 Geophysical Fields and Fluxes Volume Integrals
EOS 25 Geophysical Fields and Fluxes olume Integrals February 17, 21 Overview These notes cover the following: Quantities defined per unit volume : volume integrals olume Integrals We have seen in the
More informationIAMG Proceedings. The projective method approach in the electromagnetic integral equations solver. presenting author
The projective method approach in the electromagnetic integral equations solver M. KRUGLYAKOV 1 AND L. BLOSHANSKAYA 2 1 Lomonsov Moscow State University, Russia m.kruglyakov@gmail.com 2 SUNY New Paltz,
More informationNumerical Modelling in Fortran: day 10. Paul Tackley, 2016
Numerical Modelling in Fortran: day 10 Paul Tackley, 2016 Today s Goals 1. Useful libraries and other software 2. Implicit time stepping 3. Projects: Agree on topic (by final lecture) (No lecture next
More informationThe Shallow Water Equations
If you have not already done so, you are strongly encouraged to read the companion file on the non-divergent barotropic vorticity equation, before proceeding to this shallow water case. We do not repeat
More informationarxiv: v1 [hep-lat] 7 Oct 2010
arxiv:.486v [hep-lat] 7 Oct 2 Nuno Cardoso CFTP, Instituto Superior Técnico E-mail: nunocardoso@cftp.ist.utl.pt Pedro Bicudo CFTP, Instituto Superior Técnico E-mail: bicudo@ist.utl.pt We discuss the CUDA
More information2.20 Fall 2018 Math Review
2.20 Fall 2018 Math Review September 10, 2018 These notes are to help you through the math used in this class. This is just a refresher, so if you never learned one of these topics you should look more
More information