Introduction to numerical computations on the GPU

1 Introduction to numerical computations on the GPU. Lucian Covaci, Tuesday 1 November 2011

2 Outline
- NVIDIA Tesla and GeForce video cards: architecture
- CUDA C: programming framework (thread hierarchy and memory access)
- 2 very simple examples
- my work (kernel polynomial method): Thrust and Cusp libraries

3 CUDA-enabled NVIDIA GeForce and Tesla

5 Fastest supercomputer to date: Tianhe-1A (China)
- 14,336 Xeon X5670 processors and 7,168 NVIDIA Tesla M2050 GPUs
- speed: theoretical peak performance of 4.7 petaflops (4.7 million GFlops)

16 CUDA specifications (Wikipedia)

35 Scientific computing on GPUs: GPGPU applications showcase on nvidia.com

36 Physics applications
- theoretical speedup vs. real speedup: is the problem parallelizable?
- single- versus double-precision computation
- best suited so far:
  - visualization and imaging
  - fluid dynamics (lattice Boltzmann simulations)
  - molecular dynamics
  - Monte Carlo simulations of spin systems
  - wave packet dynamics in quantum mechanics (Chebyshev expansions)
  - mean-field solutions of many-body problems (KPM)
  - many more to come...
- programming languages: C/C++, Fortran (PGI), Python, Matlab, Perl, Java, Mathematica
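The gap between theoretical and real speedup is commonly estimated with Amdahl's law, which the slide does not name; it is brought in here as an assumption about what "parallelizable problem?" refers to. In the minimal sketch below, `parallel_fraction` and `device_speedup` are hypothetical parameter names for the share of runtime that can run on the GPU and the factor by which that share is accelerated.

```python
def amdahl_speedup(parallel_fraction, device_speedup):
    """Overall speedup when only part of the runtime is accelerated.

    parallel_fraction: share of the original runtime that can be offloaded (0..1).
    device_speedup: factor by which the offloaded share runs faster (> 0).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if device_speedup <= 0.0:
        raise ValueError("device_speedup must be positive")
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / device_speedup)


# A code that is 90% parallelizable gains less than 10x overall,
# even if the parallel part runs 100x faster on the device.
overall = amdahl_speedup(0.9, 100.0)
```

Under this model the serial remainder bounds the achievable gain, which is one way to read the difference between peak hardware figures and measured application speedups.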

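37 Chebyshev polynomial expansion

Let $f : [-1, 1] \to \mathbb{R}$ be an integrable function.

Set of orthogonal functions:
$$\phi_n(x) = \frac{T_n(x)}{\pi\sqrt{1-x^2}}, \qquad \langle \phi_n | \phi_m \rangle = \int_{-1}^{1} \pi\sqrt{1-x^2}\;\phi_n(x)\,\phi_m(x)\,dx = \frac{1+\delta_{n,0}}{2}\,\delta_{n,m}$$

Chebyshev polynomials of the first kind:
$$T_n(x) = \cos[n \arccos(x)], \qquad T_0(x) = 1, \quad T_1(x) = x, \quad T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$$

Chebyshev polynomials of the second kind:
$$U_n(x) = \frac{\sin[(n+1)\arccos(x)]}{\sin[\arccos(x)]}, \qquad U_0(x) = 1, \quad U_{-1}(x) = 0, \quad U_{n+1}(x) = 2x\,U_n(x) - U_{n-1}(x)$$

The two kinds are related by a principal-value integral:
$$\mathcal{P}\!\int_{-1}^{1} \frac{T_n(y)}{(y-x)\sqrt{1-y^2}}\,dy = \pi\,U_{n-1}(x)$$

Then any integrable function $f(x)$ can be expressed as:
$$f(x) = \frac{1}{\pi\sqrt{1-x^2}}\left[\mu_0 + 2\sum_{n=1}^{\infty}\mu_n T_n(x)\right], \qquad \mu_n = \int_{-1}^{1} f(x)\,T_n(x)\,dx$$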
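The expansion can be checked numerically on a function whose moments are known in closed form. The sketch below is plain CPU Python, not GPU code, and is only an illustration: the test function `semicircle`, the helper names, and the midpoint quadrature in the variable $\theta = \arccos(x)$ are all choices made here, not part of the talk. For $f(x) = \frac{2}{\pi}\sqrt{1-x^2}$ one has $\pi\sqrt{1-x^2}\,f(x) = 1 - T_2(x)$, so $\mu_0 = 1$, $\mu_2 = -1/2$, and all other moments vanish.

```python
import math


def chebyshev_moments(f, n_moments, n_quad=200):
    """mu_n = integral of f(x) T_n(x) over [-1, 1].

    Uses x = cos(theta), T_n(cos(theta)) = cos(n*theta), dx = sin(theta) dtheta,
    and a midpoint rule on [0, pi].
    """
    h = math.pi / n_quad
    moments = []
    for n in range(n_moments):
        total = 0.0
        for k in range(n_quad):
            theta = (k + 0.5) * h
            total += f(math.cos(theta)) * math.cos(n * theta) * math.sin(theta)
        moments.append(total * h)
    return moments


def reconstruct(moments, x):
    """Evaluate [mu_0 + 2 sum_n mu_n T_n(x)] / (pi sqrt(1 - x^2)) for |x| < 1."""
    t_prev, t_curr = 1.0, x          # T_0(x), T_1(x)
    series = moments[0]
    for n in range(1, len(moments)):
        series += 2.0 * moments[n] * t_curr
        # three-term recursion: T_{n+1} = 2 x T_n - T_{n-1}
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return series / (math.pi * math.sqrt(1.0 - x * x))


def semicircle(x):
    """Hypothetical test function with moments mu_0 = 1, mu_2 = -1/2."""
    return (2.0 / math.pi) * math.sqrt(1.0 - x * x)


mu = chebyshev_moments(semicircle, 6)
value = reconstruct(mu, 0.3)   # should agree with semicircle(0.3)
```

For functions that are not low-order polynomials times the weight, the series has to be truncated at a finite number of moments, and the reconstruction is then only approximate.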

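38 Expansion of the Green's function

$$G_{ij}(\omega) = \frac{-i}{\sqrt{1-\omega^2}}\left[\mu_0 + 2\sum_{n=1}^{\infty}\mu_n\,e^{-i n \arccos(\omega)}\right], \qquad \mu_n = \langle i |\,T_n(H)\,| j \rangle$$

where
- $| j \rangle = c^{\dagger}_{j\uparrow} | 0 \rangle$ for the regular GF
- $| j \rangle = c_{j\downarrow} | 0 \rangle$ for the anomalous GF
- $\langle i | = \langle 0 |\, c_{i\uparrow}$

Define $| j_n \rangle = T_n(H)\,| j \rangle$, so that
$$| j_0 \rangle = | j \rangle, \qquad | j_1 \rangle = H\,| j \rangle, \qquad | j_{n+1} \rangle = 2H\,| j_n \rangle - | j_{n-1} \rangle$$

- for each iteration step: one sparse matrix-vector multiplication, followed by $\mu_n = \langle i | j_n \rangle$
- choose $| i \rangle$ and $| j \rangle$ to obtain different components of the Green's function

A. Weisse et al., Rev. Mod. Phys. 78, 275 (2006)
L. Covaci, F. Peeters and M. Berciu, Phys. Rev. Lett. 105, (2010)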
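The iteration above can be sketched on the CPU to make the data flow explicit: each step is one sparse matrix-vector product plus one dot product. This is plain Python with a hand-written CSR product, not the Thrust/Cusp code used in the talk. The helper names, the open tight-binding chain with hopping 0.5 (chosen so that the spectrum already lies inside $[-1, 1]$), and the site indices are assumptions made for illustration only.

```python
import math


def csr_matvec(indptr, indices, data, x):
    """y = H x for a matrix stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for p in range(indptr[row], indptr[row + 1]):
            y[row] += data[p] * x[indices[p]]
    return y


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def kpm_moments(indptr, indices, data, i, j, n_moments):
    """mu_n = <i| T_n(H) |j> for basis vectors |i>, |j> (site indices)."""
    size = len(indptr) - 1
    bra = [0.0] * size
    bra[i] = 1.0
    prev = [0.0] * size
    prev[j] = 1.0                                    # |j_0> = |j>
    curr = csr_matvec(indptr, indices, data, prev)   # |j_1> = H |j>
    mu = [dot(bra, prev), dot(bra, curr)]
    for _ in range(2, n_moments):
        h_curr = csr_matvec(indptr, indices, data, curr)
        # |j_{n+1}> = 2 H |j_n> - |j_{n-1}>
        nxt = [2.0 * a - b for a, b in zip(h_curr, prev)]
        mu.append(dot(bra, nxt))
        prev, curr = curr, nxt
    return mu[:n_moments]


def chain_csr(n_sites, hopping=0.5):
    """Hypothetical test Hamiltonian: open chain, nearest-neighbour hopping."""
    indptr, indices, data = [0], [], []
    for row in range(n_sites):
        for col in (row - 1, row + 1):
            if 0 <= col < n_sites:
                indices.append(col)
                data.append(hopping)
        indptr.append(len(indices))
    return indptr, indices, data


indptr, indices, data = chain_csr(6)
mu = kpm_moments(indptr, indices, data, 2, 3, 10)
```

Only two work vectors are kept at any time, so memory use stays linear in the system size regardless of how many moments are computed. A general Hamiltonian would first have to be rescaled so that its spectrum fits into $[-1, 1]$, a step this sketch skips by construction.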

39 Use the Thrust and Cusp libraries to perform sparse matrix-vector multiplications and vector dot products on three GeForce GTX 580 cards

40 Thrust: CUDA library with an interface resembling the C++ Standard Template Library (e.g. vector containers and generic algorithms)

41 Cusp: CUDA library for sparse linear algebra. It provides a high-level interface for manipulating sparse matrices and solving sparse linear systems. Cusp is based on Thrust.

42 Summary
- GPU hardware is optimized to run a massively parallel grid of threads, all performing the same computation (SIMD)
- not all algorithms can be sped up on the GPU
- basic ingredients for fast code:
  - ability to start a large number of threads on independent components of an array (linear algebra, molecular dynamics, grid partitioning for Monte Carlo on spin systems)
  - minimal memory transfer between CPU and GPU

43 References
- nvidia.com for information on CUDA
- CUDA by Example, J. Sanders and E. Kandrot
- online lecture notes from the Harvard 2011 course
- information and examples on Thrust
- information and examples on Cusp
