On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code

Size: px
Start display at page:

Download "On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code"

Transcription

1 On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code E Calore, S F Schifano, R Tripiccione Enrico Calore INFN Ferrara, Italy 7 th Workshop on UnConventional High Performance Computing August 26, 2014 Porto, Portugal E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

2 Outline 1 LBM at glance, D2Q37 model 2 OpenCL 3 Implementation details 4 Results and conclusions We addressed the issue of portability of code across several computing architectures preserving performances. E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

3 The D2Q37 Lattice Boltzmann Model Lattice Boltzmann method (LBM) is a class of computational fluid dynamics (CFD) methods simulation of synthetic dynamics described by the discrete Boltzmann equation, instead of the Navier-Stokes equations a set of virtual particles called populations arranged at edges of a discrete and regular grid interacting by propagation and collision reproduce after appropriate averaging the dynamics of fluids D2Q37 is a D2 model with 37 components of velocity (populations) suitable to study behaviour of compressible gas and fluids optionally in presence of combustion 1 effects correct treatment of Navier-Stokes, heat transport and perfect-gas (P = ρt ) equations 1 chemical reactions turning cold-mixture of reactants into hot-mixture of burnt product. E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

4 Computational Scheme of LBM foreach time step foreach lattice point propagate ( ) ; endfor foreach lattice point collide ( ) ; endfor endfor Embarassing parallelism All sites can be processed in parallel applying in sequence propagate and collide. Challenge Design an efficient implementation able exploit a large fraction of available peak performance. E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

5 D2Q37: propagation scheme perform accesses to neighbour-cells at distance 1,2, and 3 generate memory-accesses with sparse addressing patterns E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

6 D2Q37: boundary-conditions After propagation, boundary conditions are enforced at top and bottom edges of the lattice. 2D lattice with period-boundaries along X-direction at the top and the bottom boundary conditions are enforced: to adjust some values at sites y = and y = N y 3... N y 1 e.g. set vertical velocity to zero At left and and right edges we apply periodic boundary conditions. E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

7 D2Q37 collision collision is computed at each lattice-cell after computation of boundary conditions computational intensive: for the D2Q37 model requires 7500 DP floating-point operations completely local: arithmetic operations require only the populations associate to the site computation of propagate and collide kernels are kept separate after propagate but before collide we may need to perform collective operations (e.g. divergence of of the velocity field) if we include computations conbustion effects. E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

8 Open Computing Language (OpenCL) programming framework for heterogenous architectures: CPU+accelerators computing model: host-code plus one or more kernels running on accelerators kernels are executed by a set of work-items each processing an item of the data-set (data-parallelism) work-items are grouped into work-groups, each executed by a compute-unit and processing K work-items in parallel using vector instructions e.g.: on Xeon-Phi work-groups are mapped on (virtual-)cores processing each up to 8 double-precisions floting-point data memory model identifies a hierarchy of four spaces which differ for size and access-time : private, local, global and constant memory OCL aims to guarantee portability of both code and performances across several architectures E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

9 OCL Saxpy kernel C = s A B, s R, A, B, C R n kernel void saxpy ( global double A, global double B, global double C, const double s ) { / / get g l o b a l thread ID int id = get_global_id ( 0 ) ; } C [ id ] = s A [ id ] + B [ id ] ; each work-item executes the saxpy kernel computing just one data-item of the output array first it computes its unique global identifier id and then uses it to address the id th data-item of arrays A, B and C. E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

10 Memory layout for LB : AoS vs SoA / / l a t t i c e stored as AoS : typedef struct { double p1 ; / / p o p u l a t i o n 1 double p2 ; / / p o p u l a t i o n 2... double p37 ; / / p o p u l a t i o n 37 } pop_t ; pop_t lattice2d [ SIZEX SIZEY ] ; AoS: corresponding populations of different sites are interleaved, causing strided memory-access and leading to coalescing issues. / / l a t t i c e stored as AoS : typedef struct { double p1 [ SIZEX SIZEY ] ; / / p o p u l a t i o n 1 array double p2 [ SIZEX SIZEY ] ; / / p o p u l a t i o n 2 array... double p37 [ SIZEX SIZEY ] ; / / p o p u l a t i o n 37 array } pop_t ; pop_t lattice2d ; SoA: corresponding populations of different sites are allocated at contiguous memory addresses, enabling coalescing of accesses, and making use of full memory bandwidth. E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

11 Grids Layout Uni-dimensional array of NTHREADS, each thread processing one lattice site. Example: physical lattice of cells; the size of work-groups is 1 4. L y = α N wi, α N; (L y L x )/N wi = N wg E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

12 Hardware used: Eurora prototype Eurora (Eurotech and Cineca) Hot water cooling system Deliver 3,209 MFLOPs per Watt of sustained performance 1 st in the Green500 of June 2013 Computing Nodes: 64 Processor Type: Intel Xeon 2.10GHz Intel Xeon 3.10GHz Accelerator Type: MIC - Intel Xeon-Phi 5120D GPU - NVIDIA Tesla K20x E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

13 OpenCL Benchmark of Propagate (Xeon-Phi) Performance of propagate as function of the number of work-items N wi per work-group, and the number of work-groups N wg. E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

14 OpenCL Benchmark of Collide (Xeon-Phi) Performance of collide as function of the number of work-items N wi per work-group, and the number of work-groups N wg. E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

15 2 x NVIDIA K20s GPU CUDA OpenCL Run time on 2 x GPU (NVIDIA K20s) [msec] per iteration Propagate BC Collide E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

16 2 x Intel Xeon Phi MIC C OpenCL Run time on 2 x MIC (Intel Xeon Phi) [msec] per iteration Propagate BC Collide E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

17 Propagate [msec] per iteration C C Opt. CUDA OpenCL Run time (Propagate x2048 lattice) 0 MIC GPU CPU2 CPU3 E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

18 Collide C C Opt. CUDA OpenCL Run time (Collide x2048 lattice) [msec] per iteration MIC GPU CPU2 CPU3 E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

19 Scalability on Eurora Nodes Weak regime lattice size: No_devices. Strong regime lattice size: E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

20 Limitations to strong scalability E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

21 Conclusions 1 we have presented an OCL implementation of a fluid-dynamic simulation based on Lattice Boltzmann methods 2 code portability: it has been succesfully ported and run on several computing architectures, including CPU, GPU and MIC systems 3 performance portability: results are of the same level of codes written using more native programming frameworks, such as CUDA or C 4 the good news: this results make OpenCL a good framework to develop code, easily portable across several architecture preserving performances 5 the bad news: not all vendors are today commited to support this standard because considered a low-level approach. E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

22 Simulation of the Rayleigh-Taylor (RT) Instability Instability at the interface of two fluids of different densities triggered by gravity. A cold-dense fluid over a less dense and warmer fluid triggers an instability that mixes the two fluid-regions (till equilibrium is reached). E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

23 Acknowledgments Luca Biferale, Mauro Sbragaglia, Patrizio Ripesi University of Tor Vergata and INFN Roma, Italy Andrea Scagliarini University of Barcelona, Spain Filippo Mantovani BSC institute, Spain Enrico Calore, Sebastiano Fabio Schifano, Raffaele Tripiccione, University and INFN of Ferrara, Italy Federico Toschi Eindhoven University of Technology The Netherlands, and CNR-IAC, Roma Italy This work has been performed in the framework of the INFN COKA and SUMA projects. We would like to thank CINECA (ITALY) and JSC (GERMANY) institutes for access to their systems. E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

24 Thanks for Your attention E. Calore (INFN of Ferrara) Portability, performance and scalability UCHPC, August 26, / 24

Performance Evaluation of Scientific Applications on POWER8

Performance Evaluation of Scientific Applications on POWER8 Performance Evaluation of Scientific Applications on POWER8 2014 Nov 16 Andrew V. Adinetz 1, Paul F. Baumeister 1, Hans Böttiger 3, Thorsten Hater 1, Thilo Maurer 3, Dirk Pleiter 1, Wolfram Schenck 4,

More information

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts

More information

COMPARISON OF CPU AND GPU IMPLEMENTATIONS OF THE LATTICE BOLTZMANN METHOD

COMPARISON OF CPU AND GPU IMPLEMENTATIONS OF THE LATTICE BOLTZMANN METHOD XVIII International Conference on Water Resources CMWR 2010 J. Carrera (Ed) c CIMNE, Barcelona, 2010 COMPARISON OF CPU AND GPU IMPLEMENTATIONS OF THE LATTICE BOLTZMANN METHOD James.E. McClure, Jan F. Prins

More information

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU Khramtsov D.P., Nekrasov D.A., Pokusaev B.G. Department of Thermodynamics, Thermal Engineering and Energy Saving Technologies,

More information

arxiv: v1 [hep-lat] 7 Oct 2010

arxiv: v1 [hep-lat] 7 Oct 2010 arxiv:.486v [hep-lat] 7 Oct 2 Nuno Cardoso CFTP, Instituto Superior Técnico E-mail: nunocardoso@cftp.ist.utl.pt Pedro Bicudo CFTP, Instituto Superior Técnico E-mail: bicudo@ist.utl.pt We discuss the CUDA

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique Claude Tadonki MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Monthly CRI Seminar MINES ParisTech - CRI June 06, 2016, Fontainebleau (France)

More information

Parallelism of MRT Lattice Boltzmann Method based on Multi-GPUs

Parallelism of MRT Lattice Boltzmann Method based on Multi-GPUs Parallelism of MRT Lattice Boltzmann Method based on Multi-GPUs 1 School of Information Engineering, China University of Geosciences (Beijing) Beijing, 100083, China E-mail: Yaolk1119@icloud.com Ailan

More information

Lattice Boltzmann fluid-dynamics on the QPACE supercomputer

Lattice Boltzmann fluid-dynamics on the QPACE supercomputer , Procedia Procedia Computer Computer Science Science 1 00 (2010) (2010) 1069 1076 1 8 Procedia Computer Science www.elsevier.com/locate/procedia International Conference on Computational Science, ICCS

More information

Introduction to numerical computations on the GPU

Introduction to numerical computations on the GPU Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming

More information

Accelerating Model Reduction of Large Linear Systems with Graphics Processors

Accelerating Model Reduction of Large Linear Systems with Graphics Processors Accelerating Model Reduction of Large Linear Systems with Graphics Processors P. Benner 1, P. Ezzatti 2, D. Kressner 3, E.S. Quintana-Ortí 4, Alfredo Remón 4 1 Max-Plank-Institute for Dynamics of Complex

More information

A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures

A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,

More information

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters ANTONINO TUMEO, ORESTE VILLA Collaborators: Karol Kowalski, Sriram Krishnamoorthy, Wenjing Ma, Simone Secchi May 15, 2012 1 Outline!

More information

A Framework for Hybrid Parallel Flow Simulations with a Trillion Cells in Complex Geometries

A Framework for Hybrid Parallel Flow Simulations with a Trillion Cells in Complex Geometries A Framework for Hybrid Parallel Flow Simulations with a Trillion Cells in Complex Geometries SC13, November 21 st 2013 Christian Godenschwager, Florian Schornbaum, Martin Bauer, Harald Köstler, Ulrich

More information

Lattice Quantum Chromodynamics on the MIC architectures

Lattice Quantum Chromodynamics on the MIC architectures Lattice Quantum Chromodynamics on the MIC architectures Piotr Korcyl Universität Regensburg Intel MIC Programming Workshop @ LRZ 28 June 2017 Piotr Korcyl Lattice Quantum Chromodynamics on the MIC 1/ 25

More information

Simulation of Lid-driven Cavity Flow by Parallel Implementation of Lattice Boltzmann Method on GPUs

Simulation of Lid-driven Cavity Flow by Parallel Implementation of Lattice Boltzmann Method on GPUs Simulation of Lid-driven Cavity Flow by Parallel Implementation of Lattice Boltzmann Method on GPUs S. Berat Çelik 1, Cüneyt Sert 2, Barbaros ÇETN 3 1,2 METU, Mechanical Engineering, Ankara, TURKEY 3 METU-NCC,

More information

CRYPTOGRAPHIC COMPUTING

CRYPTOGRAPHIC COMPUTING CRYPTOGRAPHIC COMPUTING ON GPU Chen Mou Cheng Dept. Electrical Engineering g National Taiwan University January 16, 2009 COLLABORATORS Daniel Bernstein, UIC, USA Tien Ren Chen, Army Tanja Lange, TU Eindhoven,

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

Performance Analysis of a List-Based Lattice-Boltzmann Kernel

Performance Analysis of a List-Based Lattice-Boltzmann Kernel Performance Analysis of a List-Based Lattice-Boltzmann Kernel First Talk MuCoSim, 29. June 2016 Michael Hußnätter RRZE HPC Group Friedrich-Alexander University of Erlangen-Nuremberg Outline Lattice Boltzmann

More information

arxiv: v1 [cs.dc] 8 Mar 2017

arxiv: v1 [cs.dc] 8 Mar 2017 CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. 0000; 00:1 17 Published online in Wiley InterScience (www.interscience.wiley.com). Evaluation of DVFS techniques

More information

Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures

Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures José I. Aliaga Performance and Energy Analysis of the Iterative Solution of Sparse

More information

Physics plans and ILDG usage

Physics plans and ILDG usage Physics plans and ILDG usage in Italy Francesco Di Renzo University of Parma & INFN Parma The MAIN ILDG USERS in Italy are the ROME groups A (by now) well long track of ILDG-based projects mainly within

More information

GPU Computing Activities in KISTI

GPU Computing Activities in KISTI International Advanced Research Workshop on High Performance Computing, Grids and Clouds 2010 June 21~June 25 2010, Cetraro, Italy HPC Infrastructure and GPU Computing Activities in KISTI Hongsuk Yi hsyi@kisti.re.kr

More information

The Lattice Boltzmann Method for Laminar and Turbulent Channel Flows

The Lattice Boltzmann Method for Laminar and Turbulent Channel Flows The Lattice Boltzmann Method for Laminar and Turbulent Channel Flows Vanja Zecevic, Michael Kirkpatrick and Steven Armfield Department of Aerospace Mechanical & Mechatronic Engineering The University of

More information

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Welcome to MCS 572. content and organization expectations of the course. definition and classification Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson

More information

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic Jan Verschelde joint work with Xiangcheng Yu University of Illinois at Chicago

More information

Nuclear Physics and Computing: Exascale Partnerships. Juan Meza Senior Scientist Lawrence Berkeley National Laboratory

Nuclear Physics and Computing: Exascale Partnerships. Juan Meza Senior Scientist Lawrence Berkeley National Laboratory Nuclear Physics and Computing: Exascale Partnerships Juan Meza Senior Scientist Lawrence Berkeley National Laboratory Nuclear Science and Exascale i Workshop held in DC to identify scientific challenges

More information

Janus: FPGA Based System for Scientific Computing Filippo Mantovani

Janus: FPGA Based System for Scientific Computing Filippo Mantovani Janus: FPGA Based System for Scientific Computing Filippo Mantovani Physics Department Università degli Studi di Ferrara Ferrara, 28/09/2009 Overview: 1. The physical problem: - Ising model and Spin Glass

More information

The Lattice Boltzmann Simulation on Multi-GPU Systems

The Lattice Boltzmann Simulation on Multi-GPU Systems The Lattice Boltzmann Simulation on Multi-GPU Systems Thor Kristian Valderhaug Master of Science in Computer Science Submission date: June 2011 Supervisor: Anne Cathrine Elster, IDI Norwegian University

More information

Lattice Boltzmann approach to liquid - vapour separation

Lattice Boltzmann approach to liquid - vapour separation Lattice Boltzmann approach to liquid - vapour separation T.Biciușcă 1,2, A.Cristea 1, A.Lamura 3, G.Gonnella 4, V.Sofonea 1 1 Centre for Fundamental and Advanced Technical Research, Romanian Academy Bd.

More information

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric

More information

MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors

MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors J. Dongarra, M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov University of Tennessee, Knoxville 05 / 03 / 2013 MAGMA:

More information

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay SP-CNN: A Scalable and Programmable CNN-based Accelerator Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay Motivation Power is a first-order design constraint, especially for embedded devices. Certain

More information

Practical Combustion Kinetics with CUDA

Practical Combustion Kinetics with CUDA Funded by: U.S. Department of Energy Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton Practical Combustion Kinetics with CUDA GPU Technology Conference March 20, 2015 Russell Whitesides

More information

Some thoughts about energy efficient application execution on NEC LX Series compute clusters

Some thoughts about energy efficient application execution on NEC LX Series compute clusters Some thoughts about energy efficient application execution on NEC LX Series compute clusters G. Wellein, G. Hager, J. Treibig, M. Wittmann Erlangen Regional Computing Center & Department of Computer Science

More information

Julian Merten. GPU Computing and Alternative Architecture

Julian Merten. GPU Computing and Alternative Architecture Future Directions of Cosmological Simulations / Edinburgh 1 / 16 Julian Merten GPU Computing and Alternative Architecture Institut für Theoretische Astrophysik Zentrum für Astronomie Universität Heidelberg

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

Universität Dortmund UCHPC. Performance. Computing for Finite Element Simulations

Universität Dortmund UCHPC. Performance. Computing for Finite Element Simulations technische universität dortmund Universität Dortmund fakultät für mathematik LS III (IAM) UCHPC UnConventional High Performance Computing for Finite Element Simulations S. Turek, Chr. Becker, S. Buijssen,

More information

arxiv: v1 [hep-lat] 4 Jan 2018 Portable multi-node LQCD Monte Carlo simulations using OpenACC

arxiv: v1 [hep-lat] 4 Jan 2018 Portable multi-node LQCD Monte Carlo simulations using OpenACC International Journal of Modern Physics C c World Scientific Publishing Company 1 arxiv:1801.01473v1 [hep-lat] 4 Jan 2018 Portable multi-node LQCD Monte Carlo simulations using OpenACC Claudio Bonati Università

More information

OpenCL: Programming Heterogeneous Architectures

OpenCL: Programming Heterogeneous Architectures OpenCL: Programming Heterogeneous Architectures Porting BigDFT to OpenCL Brice Videau (LIG - NANOSIM) April 24, 2012 1 / 45 Introduction 2 / 45 Needs in computing resources are innite Benets for physicists

More information

Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and

Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and Accelerating the Multifrontal Method Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes {rflucas,genew,ddavis}@isi.edu and grimes@lstc.com 3D Finite Element

More information

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA S7255: CUTT: A HIGH- PERFORMANCE TENSOR TRANSPOSE LIBRARY FOR GPUS Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA MOTIVATION Tensor contractions are the most computationally intensive part of quantum

More information

APPLICATION OF CUDA TECHNOLOGY FOR CALCULATION OF GROUND STATES OF FEW-BODY NUCLEI BY FEYNMAN'S CONTINUAL INTEGRALS METHOD

APPLICATION OF CUDA TECHNOLOGY FOR CALCULATION OF GROUND STATES OF FEW-BODY NUCLEI BY FEYNMAN'S CONTINUAL INTEGRALS METHOD APPLICATION OF CUDA TECHNOLOGY FOR CALCULATION OF GROUND STATES OF FEW-BODY NUCLEI BY FEYNMAN'S CONTINUAL INTEGRALS METHOD M.A. Naumenko, V.V. Samarin Joint Institute for Nuclear Research, Dubna, Russia

More information

Dense Arithmetic over Finite Fields with CUMODP

Dense Arithmetic over Finite Fields with CUMODP Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,

More information

Explore Computational Power of GPU in Electromagnetics and Micromagnetics

Explore Computational Power of GPU in Electromagnetics and Micromagnetics Explore Computational Power of GPU in Electromagnetics and Micromagnetics Presenter: Sidi Fu, PhD candidate, UC San Diego Advisor: Prof. Vitaliy Lomakin Center of Magnetic Recording Research, Department

More information

11 Parallel programming models

11 Parallel programming models 237 // Program Design 10.3 Assessing parallel programs 11 Parallel programming models Many different models for expressing parallelism in programming languages Actor model Erlang Scala Coordination languages

More information

On the Use of a Many core Processor for Computational Fluid Dynamics Simulations

On the Use of a Many core Processor for Computational Fluid Dynamics Simulations On the Use of a Many core Processor for Computational Fluid Dynamics Simulations Sebastian Raase, Tomas Nordström Halmstad University, Sweden {sebastian.raase,tomas.nordstrom} @ hh.se Preface based on

More information

ZFS - Flow Solver AIA NVIDIA Workshop

ZFS - Flow Solver AIA NVIDIA Workshop ZFS - Flow Solver AIA NVIDIA Workshop Andreas Lintermann, Matthias Meinke, Wolfgang Schröder Institute of Aerodynamics and Chair of Fluid Mechanics RWTH Aachen University Outline Softwareproject ZFS (applications

More information

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013

More information

arxiv: v1 [hep-lat] 8 Nov 2014

arxiv: v1 [hep-lat] 8 Nov 2014 Staggered Dslash Performance on Intel Xeon Phi Architecture arxiv:1411.2087v1 [hep-lat] 8 Nov 2014 Department of Physics, Indiana University, Bloomington IN 47405, USA E-mail: ruizli AT umail.iu.edu Steven

More information

Direct Self-Consistent Field Computations on GPU Clusters

Direct Self-Consistent Field Computations on GPU Clusters Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd

More information

Solving PDEs with CUDA Jonathan Cohen

Solving PDEs with CUDA Jonathan Cohen Solving PDEs with CUDA Jonathan Cohen jocohen@nvidia.com NVIDIA Research PDEs (Partial Differential Equations) Big topic Some common strategies Focus on one type of PDE in this talk Poisson Equation Linear

More information

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Sherry Li Lawrence Berkeley National Laboratory Piyush Sao Rich Vuduc Georgia Institute of Technology CUG 14, May 4-8, 14, Lugano,

More information

Case Study: Quantum Chromodynamics

Case Study: Quantum Chromodynamics Case Study: Quantum Chromodynamics Michael Clark Harvard University with R. Babich, K. Barros, R. Brower, J. Chen and C. Rebbi Outline Primer to QCD QCD on a GPU Mixed Precision Solvers Multigrid solver

More information

Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs

Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs Christopher P. Stone, Ph.D. Computational Science and Engineering, LLC Kyle Niemeyer, Ph.D. Oregon State University 2 Outline

More information

Introduction to Benchmark Test for Multi-scale Computational Materials Software

Introduction to Benchmark Test for Multi-scale Computational Materials Software Introduction to Benchmark Test for Multi-scale Computational Materials Software Shun Xu*, Jian Zhang, Zhong Jin xushun@sccas.cn Computer Network Information Center Chinese Academy of Sciences (IPCC member)

More information

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017 HYCOM and Navy ESPC Future High Performance Computing Needs Alan J. Wallcraft COAPS Short Seminar November 6, 2017 Forecasting Architectural Trends 3 NAVY OPERATIONAL GLOBAL OCEAN PREDICTION Trend is higher

More information

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters Jonathan Lifflander, G. Carl Evans, Anshu Arya, Laxmikant Kale University of Illinois Urbana-Champaign May 25, 2012 Work is overdecomposed

More information

Sparse LU Factorization on GPUs for Accelerating SPICE Simulation

Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Nano-scale Integrated Circuit and System (NICS) Laboratory Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Xiaoming Chen PhD Candidate Department of Electronic Engineering Tsinghua University,

More information

Available online at ScienceDirect. Procedia Engineering 61 (2013 ) 94 99

Available online at  ScienceDirect. Procedia Engineering 61 (2013 ) 94 99 Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 6 (203 ) 94 99 Parallel Computational Fluid Dynamics Conference (ParCFD203) Simulations of three-dimensional cavity flows with

More information

A Stochastic-based Optimized Schwarz Method for the Gravimetry Equations on GPU Clusters

A Stochastic-based Optimized Schwarz Method for the Gravimetry Equations on GPU Clusters A Stochastic-based Optimized Schwarz Method for the Gravimetry Equations on GPU Clusters Abal-Kassim Cheik Ahamed and Frédéric Magoulès Introduction By giving another way to see beneath the Earth, gravimetry

More information

Multi-GPU Simulations of the Infinite Universe

Multi-GPU Simulations of the Infinite Universe () Multi-GPU of the Infinite with with G. Rácz, I. Szapudi & L. Dobos Physics of Complex Systems Department Eötvös Loránd University, Budapest June 22, 2018, Budapest, Hungary Outline 1 () 2 () Concordance

More information

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark Block AIR Methods For Multicore and GPU Per Christian Hansen Hans Henrik B. Sørensen Technical University of Denmark Model Problem and Notation Parallel-beam 3D tomography exact solution exact data noise

More information

Multicore Parallelization of Determinant Quantum Monte Carlo Simulations

Multicore Parallelization of Determinant Quantum Monte Carlo Simulations Multicore Parallelization of Determinant Quantum Monte Carlo Simulations Andrés Tomás, Che-Rung Lee, Zhaojun Bai, Richard Scalettar UC Davis SIAM Conference on Computation Science & Engineering Reno, March

More information

Multiscale simulations of complex fluid rheology

Multiscale simulations of complex fluid rheology Multiscale simulations of complex fluid rheology Michael P. Howard, Athanassios Z. Panagiotopoulos Department of Chemical and Biological Engineering, Princeton University Arash Nikoubashman Institute of

More information

Protein Structure Prediction on GPU: a Declarative Approach in a Multi-agent Framework

Protein Structure Prediction on GPU: a Declarative Approach in a Multi-agent Framework Protein Structure Prediction on GPU: a Declarative Approach in a Multi-agent Framework Federico Campeotto (1),(2) Agostino Dovier (2) Enrico Pontelli (1) campe8@nmsu.edu agostino.dovier@uniud.it epontell@cs.nmsu.edu

More information

arxiv: v1 [cs.dc] 4 Sep 2014

arxiv: v1 [cs.dc] 4 Sep 2014 and NVIDIA R GPUs arxiv:1409.1510v1 [cs.dc] 4 Sep 2014 O. Kaczmarek, C. Schmidt and P. Steinbrecher Fakultät für Physik, Universität Bielefeld, D-33615 Bielefeld, Germany E-mail: okacz, schmidt, p.steinbrecher@physik.uni-bielefeld.de

More information

Hydra. A library for data analysis in massively parallel platforms. A. Augusto Alves Jr and Michael D. Sokoloff

Hydra. A library for data analysis in massively parallel platforms. A. Augusto Alves Jr and Michael D. Sokoloff Hydra A library for data analysis in massively parallel platforms A. Augusto Alves Jr and Michael D. Sokoloff University of Cincinnati aalvesju@cern.ch Presented at NVIDIA s GPU Technology Conference,

More information

A simple Concept for the Performance Analysis of Cluster-Computing

A simple Concept for the Performance Analysis of Cluster-Computing A simple Concept for the Performance Analysis of Cluster-Computing H. Kredel 1, S. Richling 2, J.P. Kruse 3, E. Strohmaier 4, H.G. Kruse 1 1 IT-Center, University of Mannheim, Germany 2 IT-Center, University

More information

- Part 4 - Multicore and Manycore Technology: Chances and Challenges. Vincent Heuveline

- Part 4 - Multicore and Manycore Technology: Chances and Challenges. Vincent Heuveline - Part 4 - Multicore and Manycore Technology: Chances and Challenges Vincent Heuveline 1 Numerical Simulation of Tropical Cyclones Goal oriented adaptivity for tropical cyclones ~10⁴km ~1500km ~100km 2

More information

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics)

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Eftychios Sifakis CS758 Guest Lecture - 19 Sept 2012 Introduction Linear systems

More information

Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting. Thomas C. Schulthess

Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting. Thomas C. Schulthess Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting Thomas C. Schulthess 1 Cray XC30 with 5272 hybrid, GPU accelerated compute nodes Piz Daint Compute node:

More information

Code Generation for GPU Accelerators in the Domain of Image Preprocessing

Code Generation for GPU Accelerators in the Domain of Image Preprocessing Code Generation for GPU Accelerators in the Domain of Image Preprocessing Oliver Reiche, Richard Membarth, Frank Hannig, and Jürgen Teich Hardware/Software Co-Design, University of Erlangen-Nuremberg Dagstuhl,

More information

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization)

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) Schodinger equation: Hψ = Eψ Choose a basis set of wave functions Two cases: Orthonormal

More information

Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics

Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics Jorge González-Domínguez Parallel and Distributed Architectures Group Johannes Gutenberg University of Mainz, Germany j.gonzalez@uni-mainz.de

More information

S3D Direct Numerical Simulation: Preparation for the PF Era

S3D Direct Numerical Simulation: Preparation for the PF Era S3D Direct Numerical Simulation: Preparation for the 10 100 PF Era Ray W. Grout, Scientific Computing SC 12 Ramanan Sankaran ORNL John Levesque Cray Cliff Woolley, Stan Posey nvidia J.H. Chen SNL NREL

More information

Prof. Brant Robertson Department of Astronomy and Astrophysics University of California, Santa

Prof. Brant Robertson Department of Astronomy and Astrophysics University of California, Santa Accelerated Astrophysics: Using NVIDIA GPUs to Simulate and Understand the Universe Prof. Brant Robertson Department of Astronomy and Astrophysics University of California, Santa Cruz brant@ucsc.edu, UC

More information

Measuring freeze-out parameters on the Bielefeld GPU cluster

Measuring freeze-out parameters on the Bielefeld GPU cluster Measuring freeze-out parameters on the Bielefeld GPU cluster Outline Fluctuations and the QCD phase diagram Fluctuations from Lattice QCD The Bielefeld hybrid GPU cluster Freeze-out conditions from QCD

More information

Parallel Transposition of Sparse Data Structures

Parallel Transposition of Sparse Data Structures Parallel Transposition of Sparse Data Structures Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng Department of Computer Science, Virginia Tech Niels Bohr Institute, University of Copenhagen Scientific Computing

More information

Using AmgX to accelerate a PETSc-based immersed-boundary method code

Using AmgX to accelerate a PETSc-based immersed-boundary method code 29th International Conference on Parallel Computational Fluid Dynamics May 15-17, 2017; Glasgow, Scotland Using AmgX to accelerate a PETSc-based immersed-boundary method code Olivier Mesnard, Pi-Yueh Chuang,

More information

Array-of-Struct particles for ipic3d on MIC. Alec Johnson and Giovanni Lapenta. EASC2014 Stockholm, Sweden April 3, 2014

Array-of-Struct particles for ipic3d on MIC. Alec Johnson and Giovanni Lapenta. EASC2014 Stockholm, Sweden April 3, 2014 Array-of-Struct particles for ipic3d on MIC Alec Johnson and Giovanni Lapenta Centre for mathematical Plasma Astrophysics Mathematics Department KU Leuven, Belgium EASC2014 Stockholm, Sweden April 3, 2014

More information

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Jorge González-Domínguez*, Bertil Schmidt*, Jan C. Kässens**, Lars Wienbrandt** *Parallel and Distributed Architectures

More information

arxiv: v1 [astro-ph.im] 7 Apr 2013

arxiv: v1 [astro-ph.im] 7 Apr 2013 A Performance Comparison of Different Graphics Processing Units Running Direct N-Body Simulations R. Capuzzo Dolcetta a, M. Spera a a Dep. of Physics, Sapienza, University of Roma, P.le A. Moro 5, Roma,

More information

Dr. Andrea Bocci. Using GPUs to Accelerate Online Event Reconstruction. at the Large Hadron Collider. Applied Physicist

Dr. Andrea Bocci. Using GPUs to Accelerate Online Event Reconstruction. at the Large Hadron Collider. Applied Physicist Using GPUs to Accelerate Online Event Reconstruction at the Large Hadron Collider Dr. Andrea Bocci Applied Physicist On behalf of the CMS Collaboration Discover CERN Inside the Large Hadron Collider at

More information

Verbundprojekt ELPA-AEO. Eigenwert-Löser für Petaflop-Anwendungen Algorithmische Erweiterungen und Optimierungen

Verbundprojekt ELPA-AEO. Eigenwert-Löser für Petaflop-Anwendungen Algorithmische Erweiterungen und Optimierungen Verbundprojekt ELPA-AEO http://elpa-aeo.mpcdf.mpg.de Eigenwert-Löser für Petaflop-Anwendungen Algorithmische Erweiterungen und Optimierungen BMBF Projekt 01IH15001 Feb 2016 - Jan 2019 7. HPC-Statustagung,

More information

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices DICEA DEPARTMENT OF CIVIL, ENVIRONMENTAL AND ARCHITECTURAL ENGINEERING PhD SCHOOL CIVIL AND ENVIRONMENTAL ENGINEERING SCIENCES XXX CYCLE A robust multilevel approximate inverse preconditioner for symmetric

More information

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications Christopher Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, Wen-Mei W. Hwu University of Illinois at Urbana-Champaign

More information

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method NUCLEAR SCIENCE AND TECHNIQUES 25, 0501 (14) Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method XU Qi ( 徐琪 ), 1, YU Gang-Lin ( 余纲林 ), 1 WANG Kan ( 王侃 ),

More information

Accelerating Quantum Chromodynamics Calculations with GPUs

Accelerating Quantum Chromodynamics Calculations with GPUs Accelerating Quantum Chromodynamics Calculations with GPUs Guochun Shi, Steven Gottlieb, Aaron Torok, Volodymyr Kindratenko NCSA & Indiana University National Center for Supercomputing Applications University

More information

Performance Analysis of Lattice QCD Application with APGAS Programming Model

Performance Analysis of Lattice QCD Application with APGAS Programming Model Performance Analysis of Lattice QCD Application with APGAS Programming Model Koichi Shirahata 1, Jun Doi 2, Mikio Takeuchi 2 1: Tokyo Institute of Technology 2: IBM Research - Tokyo Programming Models

More information

Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster

Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster Yuta Hirokawa Graduate School of Systems and Information Engineering, University of Tsukuba hirokawa@hpcs.cs.tsukuba.ac.jp

More information

Marawacc: A Framework for Heterogeneous Computing in Java

Marawacc: A Framework for Heterogeneous Computing in Java : A Framework for Heterogeneous Computing in Java Juan Fumero, Michel Steuwer, Christophe Dubach Code The University of Edinburgh UK Many-Core Developer Conference 2016 1 / 23 Code 2 / 23 Code 3 / 23 Code

More information

Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry

Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry and Eugene DePrince Argonne National Laboratory (LCF and CNM) (Eugene moved to Georgia Tech last week)

More information

arxiv: v1 [physics.comp-ph] 22 Nov 2012

arxiv: v1 [physics.comp-ph] 22 Nov 2012 A Customized 3D GPU Poisson Solver for Free BCs Nazim Dugan a, Luigi Genovese b, Stefan Goedecker a, a Department of Physics, University of Basel, Klingelbergstr. 82, 4056 Basel, Switzerland b Laboratoire

More information

Accelerated Prediction of the Polar Ice and Global Ocean (APPIGO)

Accelerated Prediction of the Polar Ice and Global Ocean (APPIGO) DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Accelerated Prediction of the Polar Ice and Global Ocean (APPIGO) Eric Chassignet Center for Ocean-Atmosphere Prediction

More information

S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA

S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA Date: 16th May 2012 Wed, 3pm to 3.25pm(Adv. Session) Sathyanarayana K., Manish Banga, and Ravi Kumar G. V. V. Engineering Services,

More information

Innovations in dispersion modeling using FALL3D and operations at the Buenos Aires VAAC

Innovations in dispersion modeling using FALL3D and operations at the Buenos Aires VAAC www.bsc.es Innovations in dispersion modeling using FALL3D and operations at the Buenos Aires VAAC A. Folch Barcelona Supercomputing Center (BSC-CNS), Spain Co-workers: A. Costa (1), G. Macedonio (1),

More information

Optimization Techniques for Parallel Code 1. Parallel programming models

Optimization Techniques for Parallel Code 1. Parallel programming models Optimization Techniques for Parallel Code 1. Parallel programming models Sylvain Collange Inria Rennes Bretagne Atlantique http://www.irisa.fr/alf/collange/ sylvain.collange@inria.fr OPT - 2017 Goals of

More information

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Katharina Kormann 1 Klaus Reuter 2 Markus Rampp 2 Eric Sonnendrücker 1 1 Max Planck Institut für Plasmaphysik 2 Max Planck Computing

More information

Efficient algorithms for symmetric tensor contractions

Efficient algorithms for symmetric tensor contractions Efficient algorithms for symmetric tensor contractions Edgar Solomonik 1 Department of EECS, UC Berkeley Oct 22, 2013 1 / 42 Edgar Solomonik Symmetric tensor contractions 1/ 42 Motivation The goal is to

More information