Performance Analysis of a List-Based Lattice-Boltzmann Kernel
1 Performance Analysis of a List-Based Lattice-Boltzmann Kernel
First Talk MuCoSim, 29 June 2016
Michael Hußnätter, RRZE HPC Group, Friedrich-Alexander University of Erlangen-Nuremberg
2 Outline
- Lattice Boltzmann
- List-Based Data Layout
- Run Length Encoding
- Roofline Analysis
3-4 Lattice Boltzmann Overview (1)
- Originating from lattice gas automaton
- Discrete time steps and discrete particle grid
- Particles only reside at the grid nodes
- Grid nodes are connected by velocity vectors ($c_\alpha$)
- Particle distribution is changed in a two-step approach
- Particle Distribution Functions (PDFs) aggregate particles ($f_\alpha$)
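5 Lattice Boltzmann Overview (2)
- Combining Cellular Gas Automaton and Boltzmann equation leads to:
  $f_\alpha(x + c_\alpha \Delta t,\, t + \Delta t) - f_\alpha(x, t) = -\omega \left( f_\alpha - f_\alpha^{eq} \right)$
  where $f_\alpha^{eq}$ depends on the macroscopic velocity and density of the lattice
- D2Q9 stencil directions: NW, N, NE / W, C, E / SW, S, SE
- Easy implementation by a two-step approach, where $\tilde{f}_\alpha$ denotes the post-collision value:
  Stream step: $f_\alpha(x + c_\alpha \Delta t,\, t + \Delta t) = \tilde{f}_\alpha(x,\, t + \Delta t)$
  Collide step: $\tilde{f}_\alpha(x,\, t + \Delta t) = f_\alpha(x, t) - \omega \left( f_\alpha - f_\alpha^{eq} \right)$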
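The collide step can be sketched for a single D2Q9 cell as follows. This is a minimal illustration, not the kernel from the talk: the equilibrium used here is the common second-order D2Q9 form, which is an assumption (the slide only says that the equilibrium depends on density and velocity), and the direction order and the names `macroscopic`, `equilibrium` and `collide` are chosen for this example only.

```python
# D2Q9 velocity vectors c_alpha in the order C, N, S, W, E, NW, NE, SW, SE
# and the matching standard lattice weights (assumed, not taken from the slides).
C = [(0, 0), (0, 1), (0, -1), (-1, 0), (1, 0), (-1, 1), (1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def macroscopic(f):
    """Density and velocity of one cell from its nine PDFs."""
    rho = sum(f)
    ux = sum(fa * cx for fa, (cx, _) in zip(f, C)) / rho
    uy = sum(fa * cy for fa, (_, cy) in zip(f, C)) / rho
    return rho, ux, uy


def equilibrium(rho, ux, uy):
    """Second-order D2Q9 equilibrium (assumed form)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def collide(f, omega):
    """Post-collision PDFs: f - omega * (f - f_eq)."""
    rho, ux, uy = macroscopic(f)
    feq = equilibrium(rho, ux, uy)
    return [fa - omega * (fa - fe) for fa, fe in zip(f, feq)]
```

The collision relaxes each PDF towards its equilibrium while leaving the cell's density and momentum unchanged, which is why only the stream step has to touch neighboring cells.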
6-8 Lattice Boltzmann PDF Streaming
- Two possibilities for PDF streaming: pull scheme and push scheme
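The difference between the two streaming schemes can be shown on a toy lattice. The sketch below assumes a hypothetical one-dimensional, periodic lattice with three velocities, chosen only to keep the example short. Push iterates over source cells and writes to the neighbor, pull iterates over destination cells and reads from the neighbor.

```python
VEL = (-1, 0, 1)  # hypothetical 1D three-velocity lattice, for illustration only


def stream_push(src):
    """Each cell writes its PDFs into the neighbor they move towards."""
    n = len(src)
    dst = [[0.0] * len(VEL) for _ in range(n)]
    for x in range(n):
        for a, c in enumerate(VEL):
            dst[(x + c) % n][a] = src[x][a]
    return dst


def stream_pull(src):
    """Each cell reads the PDFs that arrive from its neighbors."""
    n = len(src)
    dst = [[0.0] * len(VEL) for _ in range(n)]
    for x in range(n):
        for a, c in enumerate(VEL):
            dst[x][a] = src[(x - c) % n][a]
    return dst
```

Both functions produce the same destination storage. What differs is whether the scattered memory accesses are the loads (pull) or the stores (push).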
9-11 Lattice Boltzmann No-Slip Boundary
- Reflecting PDFs into the same cell but in the opposite direction
- [Figure, animated: a fluid node (F) next to a solid node (S), shown from time step t = 0 to time step t = 1]
12 Lattice Boltzmann Data Layout
- Field data layout (SoA)
- Easy address calculation for neighboring PDFs
- Source and destination cell storage
- [Figure: grid section with fluid (F) and solid (S) nodes; the cell storage holds all N PDFs followed by all S PDFs]
13 Lattice Boltzmann Simple Kernel
    foreach cell in cellstorage do
        if cell is fluidcell then
            stream and collide
        end
    end
    swap cell storages
14 List-Layout Motivation
- LBM performance is usually limited by memory capacity and memory bandwidth
- The direct addressing scheme wastes valuable memory resources on complex domains
- Goal: reduce memory requirements by omitting non-fluid cells, which at the same time eliminates the if in the main loop
- Challenge: convenient address calculation is lost (Godenschwager)
15-18 List-Layout Basics
- [Figure, animated: grid section with fluid (F) and solid (S) nodes; cell storage with N entries; adjacency list with the matching N* entries]
19-20 List-Layout No-Slip Boundary
- No-slip without any intermediate time step
- [Figure: cell storage entries N, NE, E, SE, S, SW, W, NW; adjacency list entries N*, NE*, E*, SE*, S*, SW*, W*, NW*]
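One way to read "no-slip without any intermediate time step" is that the reflection is encoded directly in the adjacency list: if the upstream neighbor of a PDF is not a fluid cell, the pull pointer refers to the opposite direction of the cell itself. The sketch below builds such a list for a small 2D grid. The grid encoding, the SoA index formula and the name `build_adjacency` are assumptions made for this example, and cells outside the grid are treated as solid.

```python
# Direction vectors with y growing downwards, so N points to the row above.
DIRS = {"N": (0, -1), "NE": (1, -1), "E": (1, 0), "SE": (1, 1),
        "S": (0, 1), "SW": (-1, 1), "W": (-1, 0), "NW": (-1, -1)}
NAMES = list(DIRS)
OPPOSITE = {"N": "S", "NE": "SW", "E": "W", "SE": "NW",
            "S": "N", "SW": "NE", "W": "E", "NW": "SE"}


def build_adjacency(grid):
    """Return the fluid cell list and one pull pointer per (direction, cell).

    grid is a list of strings made of 'F' (fluid) and 'S' (solid).
    PDFs are stored SoA over fluid cells only: index = direction * n + cell.
    """
    fluid = [(x, y) for y, row in enumerate(grid)
             for x, ch in enumerate(row) if ch == "F"]
    cell_id = {pos: i for i, pos in enumerate(fluid)}
    n = len(fluid)

    def pdf_index(cell, name):
        return NAMES.index(name) * n + cell

    adjacency = []
    for name in NAMES:
        cx, cy = DIRS[name]
        for x, y in fluid:
            upstream = (x - cx, y - cy)  # a PDF moving along c arrives from x - c
            if upstream in cell_id:
                adjacency.append(pdf_index(cell_id[upstream], name))
            else:
                # No-slip: pull the reflected PDF from the cell itself.
                adjacency.append(pdf_index(cell_id[(x, y)], OPPOSITE[name]))
    return fluid, adjacency
```

With this encoding the main loop needs neither a fluid check nor a separate boundary pass. Every pull pointer is valid, and solid cells never appear in the storage.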
21 List-Layout Kernel
    foreach cell in cellstorage do
        get pullpointers from adjacencylist
        stream and collide
    end
    swap cell storages
22-36 List-Layout Run Length Encoding
- [Figure, animated: grid section with fluid (F) and solid (S) nodes; cell storage with E and W PDFs; adjacency list with the matching E* and W* entries; RLE list]
37 List-Layout Kernel with RLE
    foreach rleblock in rlelist do                // RLE loop
        get pullpointers from adjacencylist
        foreach cell in rleblock do               // one macroscopic loop
            calculate macroscopic values
        end
        foreach cell in rleblock do               // nine collide loops
            collide and store directions pairwise
        end
    end
    swap cellstorages
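The slides do not spell out how the RLE list is built. One plausible reading, sketched here as an assumption, is that a run groups consecutive cells whose pull pointers are also consecutive in every direction, so that the inner loops can work with one base pointer per direction and a running offset. The function name `rle_blocks` and the `(start, length)` block format are hypothetical.

```python
def rle_blocks(pointers):
    """Split the cell range into runs with contiguous pull pointers.

    pointers[a][i] is the pull index for direction a of cell i.
    Returns a list of (start, length) blocks covering all cells.
    """
    n = len(pointers[0])
    blocks = []
    start = 0
    for i in range(1, n + 1):
        # The run continues only if every direction advances by exactly one.
        contiguous = i < n and all(p[i] == p[i - 1] + 1 for p in pointers)
        if not contiguous:
            blocks.append((start, i - start))
            start = i
    return blocks
```

Inside one block only the pull pointers of the first cell have to be fetched from the adjacency list, which matches the "get pullpointers" step sitting outside the inner loops in the kernel above.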
38 Roofline Analysis
39 Roofline Emmy's Characteristics
- Maximal floating point performance for operands in L1: 2 load ports, 1 store port, 1 cy throughput per add (mul); 2.2 GHz delivers 88 GFLOP/s
- Achievable memory bandwidth: determined on a full socket with likwid-bench's copy_avx, which yielded 40.6 GByte/s
40-43 Roofline Determining Bottleneck
- [Figure, animated: roofline estimation, performance in GFLOP/s over operational intensity in FLOP/Byte, with the memory limit marked]
- 198 FLOP / LUP
- 3 * 19 * 8 Byte / LUP = 456 Byte / LUP
- Operational intensity: 0.43 FLOP / Byte
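Written out, the numbers above combine with Emmy's peak performance and measured bandwidth as follows. The symbols $I$, $P_\text{peak}$ and $b_S$ are introduced here for illustration. The factor 3 presumably counts load, store and write-allocate traffic for each of the 19 double-precision PDFs, but the slides do not state this.

```latex
\begin{align*}
I &= \frac{198\ \text{FLOP/LUP}}{3 \cdot 19 \cdot 8\ \text{Byte/LUP}}
   = \frac{198}{456}\ \frac{\text{FLOP}}{\text{Byte}}
   \approx 0.43\ \frac{\text{FLOP}}{\text{Byte}} \\
P &= \min\left(P_\text{peak},\; I \cdot b_S\right)
   = \min\left(88,\; \frac{198}{456} \cdot 40.6\right)\ \text{GFLOP/s}
   \approx 17.6\ \text{GFLOP/s}
\end{align*}
```

The bandwidth term is far below the 88 GFLOP/s compute limit, so the kernel is expected to be memory bound.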
44 Roofline FLOP vs FLUP
- Lattice Boltzmann: more FLOPs will not necessarily lead to a shorter time to solution
- FLOPs per lattice update depend heavily on the implementation
- Fluid Lattice UPdate(s) per second (FLUP/s) introduced for comparable results
- The considered implementation requires 456 Byte per FLUP
- Adapted Roofline performance estimation based on the achievable memory bandwidth for a given number of cores
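The adapted estimate divides an achievable memory bandwidth by the bytes moved per fluid lattice update. A minimal sketch, assuming 1 GByte = 10^9 Byte and using the full-socket copy_avx bandwidth from the slides (the function name `mflups_limit` is hypothetical):

```python
BYTES_PER_FLUP = 3 * 19 * 8  # 456 Byte per fluid lattice update, from the slides


def mflups_limit(bandwidth_gbyte_per_s, bytes_per_flup=BYTES_PER_FLUP):
    """Upper bound in MFLUP/s for a purely memory-bound kernel."""
    return bandwidth_gbyte_per_s * 1e9 / bytes_per_flup / 1e6


# Full-socket bandwidth measured with likwid-bench copy_avx: 40.6 GByte/s.
full_socket = mflups_limit(40.6)
print(f"{full_socket:.1f} MFLUP/s")  # prints "89.0 MFLUP/s"
```

Calling the function with the bandwidth measured for a given core count gives the roofline limit for that core count.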
45 Roofline TestCase: Channel
- 25,000,000 cells
- High pressure boundary (green)
- Low pressure boundary (red)
46 Roofline Emmy's MemBandwidth
- [Figure: memory bandwidth in GByte/s over number of cores; series: theoretical limit (1600 MHz quad-channel), copy_avx with 1 load / 1 store, copy_avx with 19 loads / 1 store]
47 Roofline Performance Evaluation
- [Figure: performance in MFLUP/s over number of cores; series: Roofline, Roofline 19/1, List LBM]
48-49 Upcoming Talk Overview
- Short recap of Lattice Boltzmann
- Detailed ECM performance estimation and evaluation for IvyBridge and Haswell
51 Backup Slide SoA vs AoS
- Struct of Arrays (SoA): C C C N N N S S S
- Array of Structs (AoS): C N S W E NW NE SW SE C N S
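The two layouts differ only in how a (cell, direction) pair maps to a storage index. The sketch below uses the direction order shown for AoS on the slide. The function names and the helper `layout` are assumptions made for this example.

```python
NAMES = ["C", "N", "S", "W", "E", "NW", "NE", "SW", "SE"]


def index_soa(cell, direction, num_cells):
    """Struct of Arrays: all PDFs of one direction are stored contiguously."""
    return direction * num_cells + cell


def index_aos(cell, direction, num_cells):
    """Array of Structs: all PDFs of one cell are stored contiguously."""
    return cell * len(NAMES) + direction


def layout(index_fn, num_cells):
    """Direction name held by each storage slot, in memory order."""
    slots = [None] * (num_cells * len(NAMES))
    for cell in range(num_cells):
        for d, name in enumerate(NAMES):
            slots[index_fn(cell, d, num_cells)] = name
    return slots
```

For three cells, `layout(index_soa, 3)` starts with C C C N N N S S S and `layout(index_aos, 3)` starts with C N S W E NW NE SW SE C N S, as on the slide. With SoA, the same direction of neighboring cells is adjacent in memory, which is what makes the address calculation for neighboring PDFs simple.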
More information416 Distributed Systems
416 Distributed Systems RAID, Feb 26 2018 Thanks to Greg Ganger and Remzi Arapaci-Dusseau for slides Outline Using multiple disks Why have multiple disks? problem and approaches RAID levels and performance
More informationQR Factorization of Tall and Skinny Matrices in a Grid Computing Environment
QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment Emmanuel AGULLO (INRIA / LaBRI) Camille COTI (Iowa State University) Jack DONGARRA (University of Tennessee) Thomas HÉRAULT
More informationLattice Boltzmann Method for Fluid Simulations
1 / 16 Lattice Boltzmann Method for Fluid Simulations Yuanxun Bill Bao & Justin Meskas Simon Fraser University April 7, 2011 2 / 16 Ludwig Boltzmann and His Kinetic Theory of Gases The Boltzmann Transport
More informationScientific Computing II
Scientific Computing II Molecular Dynamics Simulation Michael Bader SCCS Summer Term 2015 Molecular Dynamics Simulation, Summer Term 2015 1 Continuum Mechanics for Fluid Mechanics? Molecular Dynamics the
More informationVEHICULAR TRAFFIC FLOW MODELS
BBCR Group meeting Fri. 25 th Nov, 2011 VEHICULAR TRAFFIC FLOW MODELS AN OVERVIEW Khadige Abboud Outline Introduction VANETs Why we need to know traffic flow theories Traffic flow models Microscopic Macroscopic
More informationBandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet)
Compression Motivation Bandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet) Storage: Store large & complex 3D models (e.g. 3D scanner
More informationIMPLEMENTING THE LATTICE-BOLTZMANN
IMPLEMENTING THE LATTICE-BOLTZMANN METHOD A RESEARCH ON BOUNDARY CONDITION TECHNIQUES by S.C. Wetstein in partial fulfillment of the requirements for the degree of Bachelor of Science in Applied Physics
More informationThe Blue Gene/P at Jülich Case Study & Optimization. W.Frings, Forschungszentrum Jülich,
The Blue Gene/P at Jülich Case Study & Optimization W.Frings, Forschungszentrum Jülich, 26.08.2008 Jugene Case-Studies: Overview Case Study: PEPC Case Study: racoon Case Study: QCD CPU0CPU3 CPU1CPU2 2
More informationdistributed approaches For Proportional and max-min fairness in random access ad-hoc networks
distributed approaches For Proportional and max-min fairness in random access ad-hoc networks Xin Wang, Koushik Kar Rensselaer Polytechnic Institute OUTline Introduction Motivation and System model Proportional
More informationPower System Analysis Prof. A. K. Sinha Department of Electrical Engineering Indian Institute of Technology, Kharagpur. Lecture - 21 Power Flow VI
Power System Analysis Prof. A. K. Sinha Department of Electrical Engineering Indian Institute of Technology, Kharagpur Lecture - 21 Power Flow VI (Refer Slide Time: 00:57) Welcome to lesson 21. In this
More informationParallel Transposition of Sparse Data Structures
Parallel Transposition of Sparse Data Structures Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng Department of Computer Science, Virginia Tech Niels Bohr Institute, University of Copenhagen Scientific Computing
More informationDirect Self-Consistent Field Computations on GPU Clusters
Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd
More information