Performance Analysis of a List-Based Lattice-Boltzmann Kernel


1 Performance Analysis of a List-Based Lattice-Boltzmann Kernel. First Talk, MuCoSim, 29 June 2016. Michael Hußnätter, RRZE HPC Group, Friedrich-Alexander University of Erlangen-Nuremberg.

2 Outline: Lattice Boltzmann, List-Based Data Layout, Run Length Encoding, Roofline Analysis.

3-4 Lattice Boltzmann Overview (1): Originating from the lattice gas automaton. Discrete time steps and a discrete particle grid; particles reside only at the grid nodes. Grid nodes are connected by velocity vectors (c_α). The particle distribution is changed in a two-step approach; Particle Distribution Functions (PDFs) aggregate particles (f_α).

5 Lattice Boltzmann Overview (2): Combining the cellular gas automaton with the Boltzmann equation leads to f_α(x + c_α Δt, t + Δt) - f_α(x, t) = -ω (f_α - f_α^eq), where f_α^eq depends on the macroscopic velocity and density of the lattice. [Figure: D2Q9 stencil with center C and the directions N, NE, E, SE, S, SW, W, NW.] Easy implementation by a two-step approach, with f_α* denoting the post-collision PDF. Collide step: f_α*(x, t) = f_α(x, t) - ω (f_α(x, t) - f_α^eq). Stream step: f_α(x + c_α Δt, t + Δt) = f_α*(x, t).
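The slide only notes that f_α^eq depends on the macroscopic density and velocity. For reference, a common choice is the standard second-order BGK equilibrium; this exact form is an assumption for illustration, not taken from the talk:

    % Moments of the PDFs and the generic second-order BGK equilibrium;
    % w_\alpha are the lattice weights, c_s the lattice speed of sound.
    \rho = \sum_\alpha f_\alpha, \qquad
    \rho\,\mathbf{u} = \sum_\alpha \mathbf{c}_\alpha f_\alpha, \qquad
    f_\alpha^{eq} = w_\alpha\,\rho \left[ 1
        + \frac{\mathbf{c}_\alpha \cdot \mathbf{u}}{c_s^2}
        + \frac{(\mathbf{c}_\alpha \cdot \mathbf{u})^2}{2 c_s^4}
        - \frac{\mathbf{u} \cdot \mathbf{u}}{2 c_s^2} \right]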

6-8 Lattice Boltzmann PDF Streaming: Two possibilities for PDF streaming, the pull scheme (each cell reads the PDFs from its neighboring cells and writes locally) and the push scheme (each cell reads locally and writes to its neighboring cells).
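A minimal sketch of the two streaming variants for one PDF direction on a direct-addressed grid; f_src/f_dst are the SoA slices of that direction and nbr(c, d) maps a cell to its neighbor in direction d. All names are illustrative, not the talk's code:

    /* pull: every cell gathers PDF d from its upstream neighbor */
    typedef long (*nbr_fn)(long cell, int dir);

    static void stream_pull(double *f_dst, const double *f_src,
                            long num_cells, int inv_dir, nbr_fn nbr)
    {
        for (long c = 0; c < num_cells; ++c)
            f_dst[c] = f_src[nbr(c, inv_dir)];
    }

    /* push: every cell scatters its PDF d to the downstream neighbor */
    static void stream_push(double *f_dst, const double *f_src,
                            long num_cells, int dir, nbr_fn nbr)
    {
        for (long c = 0; c < num_cells; ++c)
            f_dst[nbr(c, dir)] = f_src[c];
    }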

9-11 Lattice Boltzmann No-Slip Boundary: Reflecting PDFs into the same cell but into the opposite direction. [Figure: a fluid node F next to a solid node S, shown at time steps t = 0 and t = 1; the PDF pointing into the solid node is reflected back into the fluid node.]

12 Lattice Boltzmann Data Layout: Field data layout (SoA), easy address calculation for neighboring PDFs, separate source and destination cell storage. [Figure: grid section with fluid (F) and solid (S) nodes; the cell storage holds one array per direction, e.g. all N-direction PDFs followed by all S-direction PDFs, including entries for the solid cells.]

13 Lattice Boltzmann Simple Kernel (pseudocode):
    foreach cell in cellstorage do
        if cell is fluidcell then
            stream and collide
        end
    end
    swap cell storages
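A minimal C sketch of this direct-addressed kernel for D2Q9, assuming an SoA layout src[dir * ncells + cell] with cell = y * nx + x, a fluid-flag array, and pull streaming; boundary and no-slip handling are omitted, and all names are illustrative rather than the talk's actual code:

    /* Direct-addressed D2Q9 sweep: SoA field, fluid flag test, pull stream,
     * BGK collision. Interior cells only; no-slip handling omitted. */
    enum { Q = 9 };
    static const int    cx[Q] = { 0, 0, 1, 0,-1, 1, 1,-1,-1 };
    static const int    cy[Q] = { 0, 1, 0,-1, 0, 1,-1,-1, 1 };
    static const double w[Q]  = { 4.0/9, 1.0/9, 1.0/9, 1.0/9, 1.0/9,
                                  1.0/36, 1.0/36, 1.0/36, 1.0/36 };

    void lbm_sweep(double *dst, const double *src, const unsigned char *is_fluid,
                   long nx, long ny, double omega)
    {
        const long ncells = nx * ny;
        for (long y = 1; y < ny - 1; ++y) {
            for (long x = 1; x < nx - 1; ++x) {
                const long c = y * nx + x;      /* direct address of the cell */
                if (!is_fluid[c]) continue;     /* the if the list layout eliminates */

                double f[Q], rho = 0.0, ux = 0.0, uy = 0.0;
                for (int d = 0; d < Q; ++d) {
                    /* pull: read PDF d from the upstream neighbor */
                    const long n = (y - cy[d]) * nx + (x - cx[d]);
                    f[d] = src[(long)d * ncells + n];
                    rho += f[d];
                    ux  += cx[d] * f[d];
                    uy  += cy[d] * f[d];
                }
                ux /= rho; uy /= rho;

                for (int d = 0; d < Q; ++d) {   /* BGK collision, second-order f_eq */
                    const double cu  = 3.0 * (cx[d] * ux + cy[d] * uy);
                    const double feq = w[d] * rho *
                        (1.0 + cu + 0.5 * cu * cu - 1.5 * (ux * ux + uy * uy));
                    dst[(long)d * ncells + c] = f[d] - omega * (f[d] - feq);
                }
            }
        }
        /* the caller swaps dst and src afterwards */
    }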

14 List-Layout Motivation: LBM performance is usually limited by memory capacity and memory bandwidth, and the direct addressing scheme wastes valuable memory in complex domains. Goal: reduce memory requirements by omitting non-fluid cells, which at the same time eliminates the if in the main loop. Challenge: the convenient address calculation is lost (Godenschwager et al., SC13).

15-18 List-Layout Basics: Only fluid cells are kept in the cell storage; an additional adjacency list stores, for every PDF of every fluid cell, a pull pointer (N*, ...) to the location the PDF is streamed from. [Figure: grid section with fluid (F) and solid (S) nodes, the compacted cell storage with the N-direction PDFs of the fluid cells, and the adjacency list with the corresponding pull pointers N*.]
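A sketch of the data structures this layout implies, using index-based pull pointers and Q = 19 (consistent with the 3 * 19 * 8 Byte/LUP figure later in the talk); names and types are illustrative assumptions:

    /* List-based (indirect addressing) layout: only fluid cells are stored,
     * plus one pull index per stored PDF. */
    #include <stdlib.h>

    enum { Q = 19 };

    typedef struct {
        long    num_fluid;   /* number of fluid cells (no solid cells stored)      */
        double *f_src;       /* SoA cell storage, Q * num_fluid PDFs (source)      */
        double *f_dst;       /* SoA cell storage, Q * num_fluid PDFs (destination) */
        long   *adj;         /* adjacency list: Q * num_fluid pull indices         */
    } lbm_list_t;

    static lbm_list_t lbm_list_alloc(long num_fluid)
    {
        lbm_list_t l;
        l.num_fluid = num_fluid;
        l.f_src = malloc((size_t)(Q * num_fluid) * sizeof(double));
        l.f_dst = malloc((size_t)(Q * num_fluid) * sizeof(double));
        l.adj   = malloc((size_t)(Q * num_fluid) * sizeof(long));
        return l;
    }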

19-20 List-Layout No-Slip Boundary: No-slip without any intermediate time step. [Figure: cell storage with the eight non-center PDFs (N, NE, E, SE, S, SW, W, NW) of a fluid cell and the adjacency list with the corresponding pull pointers (N*, NE*, ...); for directions blocked by a solid neighbor, the pull pointer refers back to the opposite-direction PDF of the same cell.]
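A sketch of how the adjacency list can encode this, so that the no-slip reflection needs no extra branch in the kernel; neighbor_fluid_index() and inv[] are assumed helpers, not part of the talk's code:

    /* Build the pull index for PDF d of fluid cell c.
     * neighbor_fluid_index(c, d) returns the compacted index of the upstream
     * neighbor, or -1 if that node is solid; inv[d] is the opposite direction. */
    extern long neighbor_fluid_index(long c, int d);   /* assumed helper */
    extern const int inv[];                            /* assumed table  */

    static long pull_index(long c, int d, long num_fluid)
    {
        long n = neighbor_fluid_index(c, d);
        if (n >= 0)
            return (long)d * num_fluid + n;        /* regular pull from neighbor     */
        else
            return (long)inv[d] * num_fluid + c;   /* bounce back: own opposite PDF  */
    }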

21 List-Layout Kernel (pseudocode):
    foreach cell in cellstorage do
        get pull pointers from adjacency list
        stream and collide
    end
    swap cell storages
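A minimal C sketch of this sweep: every PDF is gathered through the adjacency list, so the loop body contains no fluid/solid branch. The collision is reduced to a placeholder (see the direct-addressed sketch above for a full BGK collision); names are illustrative only:

    enum { Q = 19 };

    void lbm_list_sweep(double *f_dst, const double *f_src, const long *adj,
                        long num_fluid, double omega)
    {
        for (long c = 0; c < num_fluid; ++c) {
            double f[Q], rho = 0.0;
            for (int d = 0; d < Q; ++d) {
                f[d] = f_src[adj[(long)d * num_fluid + c]];  /* pull via adjacency list */
                rho += f[d];
            }
            for (int d = 0; d < Q; ++d) {
                double feq = rho / Q;                        /* placeholder equilibrium */
                f_dst[(long)d * num_fluid + c] = f[d] - omega * (f[d] - feq);
            }
        }
        /* the caller swaps f_src and f_dst afterwards */
    }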

22-36 List-Layout Run Length Encoding: Within a run of consecutive fluid cells the pull pointers of a given direction are consecutive as well, so they do not have to be stored and loaded per cell. An RLE list records these runs; the pull pointers are fetched from the adjacency list only once per run and then simply advanced for the following cells. [Figure: grid section with fluid (F) and solid (S) nodes; cell storage with the E- and W-direction PDFs of a run of six fluid cells, the corresponding adjacency-list pointers (E*, W*), and the RLE list entries that replace the per-cell pointers.]
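A sketch of what an RLE entry could look like under this scheme, based on the figure and the kernel on the next slide; this is an assumption, not the talk's actual data structure:

    /* One run-length-encoded block of consecutive fluid cells: only the
     * starting pull index per direction is needed, since inside the run
     * the pull indices advance by one per cell. */
    enum { Q = 19 };

    typedef struct {
        long start;        /* index of the first fluid cell of the run     */
        long length;       /* number of consecutive fluid cells in the run */
        long pull[Q];      /* starting pull index per direction            */
    } rle_block_t;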

37 List-Layout Kernel with RLE (pseudocode):
    foreach rleblock in rlelist do                // RLE loop
        get pull pointers from adjacency list
        foreach cell in rleblock do               // one macroscopic loop
            calculate macroscopic values
        end
        foreach cell in rleblock do               // nine collide loops
            collide and store directions pairwise
        end
    end
    swap cell storages
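A structural C sketch of this kernel, reusing Q and the hypothetical rle_block_t from the sketch above. The talk splits the collision into nine pairwise loops; here a single loop with a placeholder equilibrium stands in, so only the loop structure is meant to be accurate:

    void lbm_rle_sweep(double *f_dst, const double *f_src,
                       const rle_block_t *rle, long num_blocks,
                       long num_fluid, double omega)
    {
        for (long b = 0; b < num_blocks; ++b) {               /* RLE loop          */
            const rle_block_t *blk = &rle[b];
            double rho[blk->length];                          /* per-run scratch   */

            for (long i = 0; i < blk->length; ++i) {          /* macroscopic loop  */
                double r = 0.0;
                for (int d = 0; d < Q; ++d)
                    r += f_src[blk->pull[d] + i];             /* consecutive pulls */
                rho[i] = r;                                   /* velocity omitted  */
            }

            for (long i = 0; i < blk->length; ++i) {          /* collide loop(s)   */
                for (int d = 0; d < Q; ++d) {
                    double f   = f_src[blk->pull[d] + i];
                    double feq = rho[i] / Q;                  /* placeholder f_eq  */
                    f_dst[(long)d * num_fluid + blk->start + i]
                        = f - omega * (f - feq);
                }
            }
        }
    }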

38 Roofline Analysis

39 Roofline: Emmy's Characteristics. Maximal floating-point performance for operands in L1: 2 load ports, 1 store port, 1-cycle throughput per add (mul); at 2.2 GHz this delivers 88 GFLOP/s. Achievable memory bandwidth: determined on a full socket with likwid-bench's copy_avx benchmark, yielding 40.6 GByte/s.

40-43 Roofline: Determining the Bottleneck. The kernel performs 198 FLOP per lattice update (LUP) and transfers 3 * 19 * 8 Byte/LUP = 456 Byte/LUP, giving an operational intensity of about 0.43 FLOP/Byte. [Figure: roofline plot of GFLOP/s over operational intensity (FLOP/Byte), with the memory-bandwidth limit drawn in; the kernel's operational intensity lies well below the ridge point, i.e. in the memory-bound region.]
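A quick check with these numbers (a back-of-the-envelope consequence of the figures on this slide, not an additional measurement):

    P \leq \min\left(P_{\mathrm{peak}},\; I \cdot b_S\right)
        = \min\left(88~\mathrm{GFLOP/s},\;
            0.43~\mathrm{FLOP/Byte} \times 40.6~\mathrm{GByte/s}\right)
        \approx 17.6~\mathrm{GFLOP/s},
    \qquad
    \frac{40.6~\mathrm{GByte/s}}{456~\mathrm{Byte/FLUP}} \approx 89~\mathrm{MFLUP/s}

So the kernel is far from the 88 GFLOP/s roof and clearly limited by memory bandwidth.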

44 Roofline: FLOP vs. FLUP. For Lattice Boltzmann, more FLOP/s does not necessarily lead to a shorter time to solution, since the FLOPs per lattice update depend strongly on the implementation. Fluid Lattice UPdates per second (FLUP/s) are therefore introduced for comparable results. The considered implementation requires 456 Byte per FLUP. The roofline performance estimation is adapted accordingly, based on the achievable memory bandwidth for the given number of cores.

45 Roofline Test Case: Channel. 25,000,000 cells; high-pressure boundary (green in the figure) and low-pressure boundary (red).

46 Roofline: Emmy's Memory Bandwidth. [Figure: memory bandwidth in GByte/s over the number of cores; curves for the theoretical limit of 1600 MHz quad-channel memory, copy_avx with 1 load / 1 store stream, and copy_avx with 19 load / 1 store streams.]

47 Roofline Performance Evaluation. [Figure: performance in MFLUP/s over the number of cores; curves for the roofline estimate, the roofline estimate based on the 19-load/1-store bandwidth ("Roofline 19/1"), and the measured list-based LBM kernel.]

48-49 Upcoming Talk Overview: Short recap of Lattice Boltzmann; detailed ECM performance estimation and evaluation for Ivy Bridge and Haswell.


51 Backup Slide: SoA vs. AoS. [Figure: Structure of Arrays (SoA) stores each PDF direction contiguously (all C values, then all N, then all S, ...), while Array of Structures (AoS) stores all directions of one cell together (C, N, S, W, E, NW, NE, SW, SE per cell).]
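A minimal illustration of the two layouts for D2Q9 as plain C declarations (array sizes and names are illustrative only):

    /* Structure of Arrays (SoA): one contiguous array per PDF direction,
     * the layout used by the field and list kernels in this talk. */
    enum { Q = 9, NCELLS = 1024 };

    double f_soa[Q][NCELLS];            /* f_soa[dir][cell] */

    /* Array of Structures (AoS): all directions of one cell stored together. */
    typedef struct {
        double c, n, s, w, e, nw, ne, sw, se;
    } cell_pdfs_t;

    cell_pdfs_t f_aos[NCELLS];          /* f_aos[cell].dir  */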

Reference: C. Godenschwager, F. Schornbaum, M. Bauer, H. Köstler, U. Rüde: A Framework for Hybrid Parallel Flow Simulations with a Trillion Cells in Complex Geometries. SC13, November 21, 2013.
