Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters

1 Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters. H. Köstler. 2nd International Symposium Computer Simulations on GPU, Freudenstadt,

2 Contents: Motivation; waLBerla software concepts; LBM simulations on Tsubame; Future work

3 Computational Science and LSS
  Applications: multiphysics (fluid, structure), medical imaging, laser
  Computer Science: HPC / hardware, performance engineering, software engineering
  Applied Math: LBM, multigrid, FEM, numerics
  Code excerpt on the slide: USE_SweepSection( getlbmsweepuid() ){ USE_Sweep(){ swusefunction( "LBM", sweep::LBMsweep, FSUIDSet::all(), hsCPU, BSUIDSet::all() ); } USE_After(){ //Communication } }

4 Problems
  Hardware: modern HPC clusters are massively parallel (intra-core, intra-node, and inter-node).
  Software: applications become more complex with increasing computational power (more complex physical models, code development in interdisciplinary teams).
  Algorithm: many variants exist; components and parameters depend on computational domain or grid, type of problem, ...

5 Applications: waLBerla

6 waLBerla: parallel block-structured grid framework

7 GPU geometric multigrid solver on Tsubame [plot: runtime in ms vs. unknowns in millions]; computational steering (VIPER); CFD, fluid-structure interaction

8 Boltzmann equation
  Mesoscopic approach to solving the Navier-Stokes equations.
  The Boltzmann equation describes the statistical distribution of one particle in a fluid:
  $\frac{\partial f}{\partial t} + \zeta \cdot \nabla f = \Omega(f)$
  where f is the probability distribution function (PDF), ζ the particle velocity, and Ω(f) the change due to collision.
  Models the behavior of fluids in statistical physics.
  The Lattice Boltzmann Method (LBM) solves the discrete Boltzmann equation.

9 Particulate Flow Simulation: D3Q19 LBM cell; collide and stream; K. Iglberger; F = m a, M = J α; simulation done by Ch. Feichtinger
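The collide and stream steps on a D3Q19 cell can be sketched in a few lines of plain Python. This is a minimal illustration, not waLBerla code: it assumes a single-relaxation-time (BGK) collision operator, lattice units, and periodic boundaries, none of which are stated on the slide, and the names `equilibrium`, `collide`, `step`, and `omega` are hypothetical.

```python
import itertools

# D3Q19 velocity set: all lattice vectors with squared length 0, 1 or 2.
C = [c for c in itertools.product((-1, 0, 1), repeat=3)
     if sum(x * x for x in c) <= 2]
# Standard D3Q19 weights, chosen by squared length of the velocity.
W = [{0: 1 / 3, 1: 1 / 18, 2: 1 / 36}[sum(x * x for x in c)] for c in C]


def equilibrium(rho, u):
    """Second-order equilibrium PDFs for density rho and velocity u."""
    usq = sum(x * x for x in u)
    feq = []
    for c, w in zip(C, W):
        cu = sum(ci * ui for ci, ui in zip(c, u))
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def collide(f, omega):
    """BGK collision (an assumption): relax the 19 PDFs of one cell."""
    rho = sum(f)
    u = [sum(fi * c[d] for fi, c in zip(f, C)) / rho for d in range(3)]
    feq = equilibrium(rho, u)
    return [fi - omega * (fi - fe) for fi, fe in zip(f, feq)]


def step(grid, n, omega):
    """One collide-and-stream step on a periodic n x n x n lattice.

    grid maps (x, y, z) to a list of 19 PDFs.
    """
    post = {pos: collide(f, omega) for pos, f in grid.items()}
    new = {pos: [0.0] * len(C) for pos in grid}
    for (x, y, z), f in post.items():
        for q, (cx, cy, cz) in enumerate(C):
            # Stream: PDF q moves to the neighbor in direction C[q].
            dest = ((x + cx) % n, (y + cy) % n, (z + cz) % n)
            new[dest][q] = f[q]
    return new


if __name__ == "__main__":
    n = 3
    cells = itertools.product(range(n), repeat=3)
    grid = {pos: equilibrium(1.0, [0.0, 0.0, 0.0]) for pos in cells}
    grid[(1, 1, 1)] = equilibrium(1.2, [0.05, 0.0, 0.0])  # small disturbance
    grid = step(grid, n, omega=1.0)
    print("total mass:", sum(sum(f) for f in grid.values()))
```

Both steps conserve mass: the collision leaves the cell density unchanged, and streaming on a periodic lattice only moves PDFs between cells.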

10 CPU-GPU cluster software concepts: waLBerla

11 waLBerla: Block concept

12 waLBerla: Sweep concept

13 waLBerla: Communication concept

14 Overlapping of work and communication

15 waLBerla: Subblocks
  Assumption: a block corresponds to a (shared-memory) compute node.
  A block can be heterogeneous (CPU + GPU).
  Distributed-memory communication (via MPI) is not required within one block.
  One block is divided into subblocks of different sizes for (static) load balancing.
  Subblocks map to (local) devices.
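One way to picture static load balancing with subblocks of different sizes is to cut a block into slices and hand each local device a share proportional to its throughput. The sketch below is an illustration under that assumption, not the scheme waLBerla uses; `split_block` and the relative rates are hypothetical.

```python
def split_block(num_slices, device_rates):
    """Split num_slices among devices in proportion to their rates.

    Uses the largest-remainder method so that the subblock sizes are
    integers and sum to num_slices exactly.
    """
    total_rate = sum(device_rates.values())
    ideal = {d: num_slices * r / total_rate for d, r in device_rates.items()}
    sizes = {d: int(share) for d, share in ideal.items()}
    leftover = num_slices - sum(sizes.values())
    # Give the remaining slices to the devices with the largest remainders.
    by_remainder = sorted(ideal, key=lambda d: ideal[d] - sizes[d], reverse=True)
    for d in by_remainder[:leftover]:
        sizes[d] += 1
    return sizes


if __name__ == "__main__":
    # Hypothetical relative rates: one CPU socket and three GPUs per node.
    rates = {"cpu": 1.0, "gpu0": 3.0, "gpu1": 3.0, "gpu2": 3.0}
    print(split_block(100, rates))
```

Because the split is computed once from fixed rates, it is static: it does not react if the actual per-device run times drift during the simulation.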

16 Domain decomposition on one compute node

17 LBM simulations on Tsubame 2.0: Results

18 Tsubame 2.0 in Japan
  Compute nodes: 1442
  Processor: Intel Xeon X5670
  GPU: 3 x Nvidia Tesla M2050
  LINPACK performance: 1.2 Petaflops
  Power consumption: 1.4 MW
  Interconnect: QDR Infiniband

19 Performance Model I
  Input: algorithm (LBM kernel), generic implementation, hardware information (bandwidth, peak performance).
  Assumption:
  $t_{\text{total}} = t_{\text{comp,outer}} + \max\left(t_{\text{comp,inner}},\; t_{\text{buffer}} + t_{\text{comm,GPU-CPU}} + t_{\text{comm,MPI}}\right)$
  Computation time is limited by memory bandwidth and instruction throughput.
  Communication time is limited by network bandwidth and latency (for direct and collective communication).
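The timing assumption of Performance Model I can be evaluated directly: the outer computation time is always paid, while the inner computation runs concurrently with buffer handling, GPU-CPU transfer, and MPI communication, so only the longer of the two counts. The function below is a sketch of that reading of the model; the function name and all numbers are illustrative and are not measurements from the talk.

```python
def total_time(t_comp_outer, t_comp_inner, t_buffer, t_comm_gpu_cpu, t_comm_mpi):
    """Time per step when inner computation overlaps with communication."""
    communication = t_buffer + t_comm_gpu_cpu + t_comm_mpi
    return t_comp_outer + max(t_comp_inner, communication)


if __name__ == "__main__":
    # Illustrative values in milliseconds (assumptions, not measured data).
    hidden = total_time(1.0, 8.0, 0.5, 1.0, 2.0)    # inner compute dominates
    exposed = total_time(1.0, 2.0, 0.5, 1.0, 2.0)   # communication dominates
    print(hidden, exposed)
```

In the first case the communication is fully hidden behind the inner computation; in the second the step time is set by the communication chain.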

20 Performance Model II
  Single node performance on Tsubame.
  Machine balance: $B_m = \frac{\text{sustainable bandwidth}}{\text{peak performance}}$
  Code balance: $B_c = \frac{\text{no. of bytes loaded and stored}}{\text{no. of executed FLOPs}}$
  Lightspeed estimate: $l = \min\left(1, \frac{B_m}{B_c}\right)$
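The balance quantities of Performance Model II are simple ratios: machine balance is sustainable bandwidth over peak performance, code balance is bytes moved over FLOPs executed, and the lightspeed estimate is the smaller of 1 and their quotient. The sketch below spells this out. The hardware figures and the FLOP count are placeholders chosen for illustration, not values from the talk; the byte count assumes a double-precision D3Q19 cell update that loads and stores all 19 PDFs once.

```python
def machine_balance(sustainable_bandwidth, peak_performance):
    """B_m in bytes per FLOP (bytes/s divided by FLOP/s)."""
    return sustainable_bandwidth / peak_performance


def code_balance(bytes_loaded_and_stored, flops_executed):
    """B_c in bytes per FLOP for one unit of work (here: one cell update)."""
    return bytes_loaded_and_stored / flops_executed


def lightspeed(b_m, b_c):
    """Fraction of peak performance attainable: l = min(1, B_m / B_c)."""
    return min(1.0, b_m / b_c)


if __name__ == "__main__":
    # Placeholder hardware: 100 GB/s sustainable, 500 GFLOP/s peak (assumed).
    b_m = machine_balance(100e9, 500e9)
    # 19 PDFs x 8 bytes, loaded and stored once each; FLOP count is assumed.
    bytes_per_cell = 19 * 8 * 2
    b_c = code_balance(bytes_per_cell, 200)
    print("B_m =", b_m, "B_c =", b_c, "l =", lightspeed(b_m, b_c))
```

A value of l well below 1 means the kernel is limited by memory bandwidth rather than by arithmetic throughput on that hardware.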

21 Single Compute Node Performance I

22 Single Compute Node Performance II

23 Single Compute Node Performance III

24 Single Compute Node Performance IV

25 Weak scaling, 3 GPUs per node

26 Strong scaling, 3 GPUs per node

27 Test case: Packed bed of hollow cylinders

28 Porous media: 100x100x1536, 1D domain decomposition

29 Porous media: 100x100x1536, 1D domain decomposition

30 Porous media: 100x100x1536, 1D/2D/3D

31 Porous media: 256x256x3600, 1D/2D

32 Future Work
  Tests on Nvidia Kepler cluster
  Main focus in waLBerla is currently on Juqueen and SuperMUC
  Programming paradigms on future HPC clusters?
  Code generation techniques to improve portability
  Dynamic load balancing

A Framework for Hybrid Parallel Flow Simulations with a Trillion Cells in Complex Geometries

SC13, November 21st 2013. Christian Godenschwager, Florian Schornbaum, Martin Bauer, Harald Köstler, Ulrich