2D Implicit Charge- and Energy-Conserving Particle-in-Cell Application Using CUDA. Christopher Leibs, Karthik Murthy


1 2D Implicit Charge- and Energy-Conserving Particle-in-Cell Application Using CUDA. Christopher Leibs, Karthik Murthy. Mentors: Dana Knoll and Allen McPherson. IS&T CoDesign Summer School 2012, Los Alamos National Laboratory, NM. LA-UR : Approved for public release; distribution is unlimited.

2 Agenda. Co-Design Summer School. Problem: 2D implicit energy and charge conservation. 2D implicit PIC method outline. CUDA implementation. Successful strategies: exploiting texture memory for storing the electric and magnetic fields; use of intrinsics and strength-reduction operations; sorting particles by Cell-x and Cell-y; sorting particles by done-ness and velocity directions; 1/2 ions + 1/2 electrons on each GPU. Unsuccessful strategies: red-black strategy of launching blocks of GPU threads; ions on one GPU and electrons on another GPU.

3 Co-Design Summer School. The Los Alamos IS&T Co-Design Summer School was inaugurated in [...]. Students from diverse technical backgrounds, including nuclear engineering, applied mathematics, and computer science, form teams that work together to solve a focused co-design problem. Participants: Emmanuel Cieren (Applied Mathematics, ENSTA ParisTech); Nicolas Feltman (Computer Science, Carnegie Mellon University); Christopher Leibs (Applied Mathematics, University of Colorado); Colleen McCarthy (Applied Mathematics, North Carolina State University); Karthik Murthy (Computer Science, Rice University); Yijie Wang (Computer Science, University of South Florida).

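4 Problem - Plasma Simulation. The simulation cycles between two components. MOMENT SOLVER: solve Maxwell's equations, taking $\rho$, $J$ (charge, current density) to $E$, $B$ (electric, magnetic fields). Interpolate fields to particles. PARTICLE PUSHER: push particles with the force $F = q(E + v \times B)$, updating $r$, $v$ (position, velocity). Interpolate particles to fields. The push can use an implicit or an explicit method.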

5 Problem - Explicit Particle-In-Cell Method. Main idea: interpolate field values to the particles; push particles; interpolate particle information to field locations; solve field equations and update values. Constraints: finite-grid instability (the cell size dx must resolve the Debye length $\lambda_D$); tight CFL constraint (dt must be small enough); can be computationally demanding. Solution: try to use implicit methods to relax these conditions.

6 Problem - Implicit Particle-In-Cell Method. Chen, Chacón and Barnes* developed a 1D electrostatic PIC method that relaxes the CFL condition, is stable against the finite-grid instability, conserves charge, conserves energy, and controls momentum. We will draw heavily on many of these ideas. * An energy- and charge-conserving, implicit, electrostatic particle-in-cell algorithm. Journal of Computational Physics, 230, 2011.

7 Today's Problem. Application to demonstrate the 2D implicit method: Island Equilibrium. Figure 2.3: Initial conditions. A contour plot of the density function (in blue) with the field lines of the magnetic field (in orange). For this figure, $\omega_{ce}/\omega_{pe} = 0.3$, =0.25, and the domain length is [ 2, 2 ] [, ].

8 2D Implicit PIC - Cell in the Electric-Magnetic Field. [Figure: staggered grid cell showing where $E_x, J_x$; $E_y, J_y$; $E_z, J_z$; and $B_x$, $B_y$, $B_z$ are located relative to the indices $i$, $i+\frac{1}{2}$, $i+1$; $j$, $j+\frac{1}{2}$, $j+1$; and $k$, $k+\frac{1}{2}$, $k+1$.]

9 2D Implicit PIC - Particle Sub-stepping Outline. Initialization (fields, particles). Loop over all particles: while the accumulated sub-step time is less than dt, run the time estimator, the particle push, cell crossing, and accumulation. Then: fields, compute work, write output.

10 2D Implicit PIC - Time Estimation (Control Momentum). Sub-step times are chosen to help control momentum by comparing a first-order (Euler) and a second-order (Heun) integration scheme. The estimate is then compared with a fractional value of the gyrofrequency and a distance limiter, to help alleviate stress on the Picard iteration. The local error estimates are
$$\ell_{e,r} \approx \frac{\Delta\tau^2}{2}\, a(r), \qquad \ell_{e,v} \approx \frac{\Delta\tau^2}{2}\, (\nabla a\, v).$$
We choose $\Delta\tau$ such that
$$\sqrt{\ell_{e,r}(\Delta\tau)^2 + \ell_{e,v}(\Delta\tau)^2} < \epsilon_a + \epsilon_r \lVert r_0(\Delta\tau) \rVert_2,$$
where $r_0(\Delta\tau)$ is the initial residual of the equations of motion.
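The Euler/Heun comparison described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the problem is reduced to one dimension, `accel` is a made-up acceleration field, and `estimate_substep`, `eps_a`, `eps_r`, and `r0_norm` are hypothetical names. The gyrofrequency and distance limiters mentioned on the slide are not included.

```c
/* Hypothetical 1D acceleration field, used only for illustration. */
static double accel(double x) { return -x; }

/* Squared difference between a first-order (Euler) and a second-order
 * (Heun) update of (x, v) over a trial sub-step dtau. The difference
 * serves as a local error estimate. */
static double euler_heun_error2(double x, double v, double dtau)
{
    /* Euler predictor. */
    double xe = x + dtau * v;
    double ve = v + dtau * accel(x);
    /* Heun corrector: average the slopes at the start and at the
     * predicted end point. */
    double xh = x + 0.5 * dtau * (v + ve);
    double vh = v + 0.5 * dtau * (accel(x) + accel(xe));
    double ex = xh - xe;
    double ev = vh - ve;
    return ex * ex + ev * ev;
}

/* Halve the trial sub-step until the error estimate drops below
 * eps_a + eps_r * r0_norm. The result never exceeds dt_remaining. */
double estimate_substep(double x, double v, double dt_remaining,
                        double eps_a, double eps_r, double r0_norm)
{
    double tol = eps_a + eps_r * r0_norm;
    double dtau = dt_remaining;
    int i;
    for (i = 0; i < 60 && euler_heun_error2(x, v, dtau) > tol * tol; ++i)
        dtau *= 0.5;
    return dtau;
}
```

For this `accel`, the position part of the estimate reduces to $\frac{\Delta\tau^2}{2} a(x)$, which matches the form of $\ell_{e,r}$ on the slide.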

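11 2D Implicit PIC - Energy Conserving Particle Push. Crank-Nicolson discretization over sub-step $\nu$:
$$\frac{r_p^{\nu+1} - r_p^{\nu}}{\Delta\tau^{\nu}} = v_p^{\nu+1/2}, \qquad \frac{v_p^{\nu+1} - v_p^{\nu}}{\Delta\tau^{\nu}} = \frac{q_p}{m_p}\left[ E(r_p^{\nu+1/2}) + v_p^{\nu+1/2} \times B(r_p^{\nu+1/2}) \right],$$
with the mid-step values
$$v_p^{\nu+1/2} = \frac{v_p^{\nu} + v_p^{\nu+1}}{2}, \qquad r_p^{\nu+1/2} = \frac{r_p^{\nu} + r_p^{\nu+1}}{2},$$
and fields interpolated to the particle with the shape function $S$:
$$F(r_p^{\nu+1/2}) = \sum_{i,j} F_{i,j}\, S(r_{i,j} - r_p^{\nu+1/2}).$$
[Figure: staggered grid cell with $E_x, J_x$; $E_y, J_y$; $E_z, J_z$; $B_x$, $B_y$, $B_z$.] The system is converged through fixed-point (Picard) iterations for $r_p^{\nu+1}$ and $v_p^{\nu+1}$.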
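A minimal C sketch of a Picard-iterated Crank-Nicolson push follows. It assumes uniform fields $(E_x, E_y, B_z)$ so that no grid interpolation is needed; the deck's code interpolates $E$ and $B$ from the grid at the mid-step position, which would put the position inside the iteration as well. The names `Particle`, `cn_push`, `qm`, `tol`, and `max_iter` are hypothetical.

```c
typedef struct { double x, y, vx, vy; } Particle;

static double absd(double a) { return a < 0.0 ? -a : a; }

/* One Crank-Nicolson sub-step of length dtau for a particle with
 * charge-to-mass ratio qm in uniform fields (ex, ey, bz). The implicit
 * velocity equation is solved by Picard (fixed-point) iteration.
 * Returns the number of iterations performed. */
int cn_push(Particle *p, double qm, double ex, double ey, double bz,
            double dtau, double tol, int max_iter)
{
    double vx1 = p->vx;   /* initial guess: new velocity = old velocity */
    double vy1 = p->vy;
    int it;
    for (it = 1; it <= max_iter; ++it) {
        /* Mid-step velocity from the current guess. */
        double vxh = 0.5 * (p->vx + vx1);
        double vyh = 0.5 * (p->vy + vy1);
        /* v x B with B = (0, 0, bz) is (vy*bz, -vx*bz). */
        double nvx = p->vx + dtau * qm * (ex + vyh * bz);
        double nvy = p->vy + dtau * qm * (ey - vxh * bz);
        double change = absd(nvx - vx1) + absd(nvy - vy1);
        vx1 = nvx;
        vy1 = nvy;
        if (change < tol)
            break;
    }
    /* Position update uses the converged mid-step velocity. */
    p->x += dtau * 0.5 * (p->vx + vx1);
    p->y += dtau * 0.5 * (p->vy + vy1);
    p->vx = vx1;
    p->vy = vy1;
    return it > max_iter ? max_iter : it;
}
```

With $E = 0$, the converged update satisfies $(v^{\nu+1} - v^{\nu}) \cdot (v^{\nu+1} + v^{\nu}) = 0$, so the kinetic energy is unchanged up to the Picard tolerance. The iteration contracts only when `dtau * qm * bz / 2` is below one in magnitude.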

12 2D Implicit PIC - Cell Crossing (Conserve Charge)

13 2D Implicit PIC - Cell Crossing (Conserve Charge)

14 2D Implicit PIC - Cell Crossing (Conserve Charge). Some attempts: the linear intercept is good enough (fast but not accurate); bisection method wrapped around the original CN (accurate but slow); fix the final boundary value in CN and solve a new system for the free dimension and time (fast but not stable); estimate the time of crossing with an explicit solve to accelerate the above methods. Lesson learned: cell crossing was (much) harder than we anticipated.
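The first attempt in the list, the linear intercept, can be sketched as follows. This is an illustration only: it assumes the particle starts inside the cell and moves along a straight line during the sub-step, and the function names are hypothetical. The slides give no implementation details.

```c
/* Fraction s in (0, 1] of a straight-line move from x0 to x1 at which
 * the particle first reaches a face of the cell [lo, hi]. Returns 1.0
 * when the move stays inside the cell. Assumes lo <= x0 <= hi. */
double crossing_fraction_1d(double x0, double x1, double lo, double hi)
{
    if (x1 > hi) return (hi - x0) / (x1 - x0);
    if (x1 < lo) return (lo - x0) / (x1 - x0);
    return 1.0;
}

/* In 2D the first face hit corresponds to the smaller of the two
 * per-axis fractions. */
double crossing_fraction_2d(double x0, double y0, double x1, double y1,
                            double xlo, double xhi,
                            double ylo, double yhi)
{
    double sx = crossing_fraction_1d(x0, x1, xlo, xhi);
    double sy = crossing_fraction_1d(y0, y1, ylo, yhi);
    return sx < sy ? sx : sy;
}
```

Truncating the sub-step at this fraction changes the mid-step velocity of the implicit push, so the stopping point is only approximate. That is consistent with the slide's description of this approach as fast but not accurate.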

15 2D Implicit PIC - Current Accumulation. Each particle must accumulate its sub-step-weighted current to the grid:
$$J_{i,j}^{n+1/2} = \frac{1}{\Delta t}\, \frac{1}{\Delta x\, \Delta y} \sum_{p} \sum_{\nu} q_p\, S(r_{i,j} - r_p^{\nu+1/2})\, v_p^{\nu+1/2}\, \Delta\tau^{\nu}.$$
[Figure: staggered grid cell with $E_x, J_x$; $E_y, J_y$; $E_z, J_z$; $B_x$, $B_y$, $B_z$.] Lesson learned (for parallel implementation): this is a map from a high-dimensional set (particles) to a lower-dimensional set (grid). Care is needed to ensure particles are not competing for write access.
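The accumulation can be illustrated with a serial, one-dimensional C sketch. The assumptions are loud ones: a periodic grid of `NX` nodes, a quadratic B-spline shape function, non-negative particle positions, and hypothetical names throughout (`accum_current_1d`, `NX`). The deck's version is two-dimensional and runs on the GPU.

```c
#define NX 8   /* hypothetical number of nodes on a periodic 1D grid */

static double absd(double a) { return a < 0.0 ? -a : a; }

/* Quadratic B-spline shape function, support |x| <= 1.5 cells. */
static double b2(double x)
{
    x = absd(x);
    if (x <= 0.5) return 0.75 - x * x;
    if (x <= 1.5) return 0.5 * (1.5 - x) * (1.5 - x);
    return 0.0;
}

/* Add one sub-step's contribution
 *     q * S(x_i - x_half) * v_half * dtau / (dt * dx)
 * to the three nodes nearest the mid-step position x_half.
 * Assumes x_half >= 0 so that the integer cast rounds correctly. */
void accum_current_1d(double *j, double q, double x_half, double v_half,
                      double dtau, double dt, double dx)
{
    double xc = x_half / dx;          /* position in cell units */
    int i0 = (int)(xc + 0.5);         /* nearest node */
    int di;
    for (di = -1; di <= 1; ++di) {
        int i = i0 + di;
        double w = b2((double)i - xc);
        int iw = ((i % NX) + NX) % NX;    /* periodic wrap */
        j[iw] += q * w * v_half * dtau / (dt * dx);
    }
}
```

Because the three weights sum to one, the total deposited current equals `q * v_half * dtau / (dt * dx)`. In a parallel version, several particles would update the same `j[iw]`, which is the write-access conflict the lesson refers to.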

16 2D Implicit PIC - Implementation
void runpic(){
    read_fields();
    read_particles();
    for(int p=0; p<n; ++p){
        /* sub-step each particle until it has covered the full time step */
        while(tau<dt){
            time_estimator();
            push_particle();
            cell_crossing();
        }
        accum_current();
    }
    accum_charge();
    time_average_current();
    export_data();
}

17 GPUs. Built a version of PIC using CUDA, capable of exploiting multiple GPUs. Experiment results on: one node of Darwin (2x Tesla M2090s); Scooter (1x Kepler GTX 680). Fig. credit: Nvidia documentation

18 GPUs. Kernels launch a grid of blocks. Each block contains a set of threads. Blocks are scheduled onto SMs by a hardware scheduler. The order of execution of threads or blocks cannot be guaranteed. Fig. credit: Nvidia documentation

19 CUDA 2D PIC - Lesson 1: Locality. Parallelization strategy: assign groups of cells (mesh blocks) to a single CUDA block.

20 CUDA 2D PIC - Lesson 2: Locality. Parallelization strategy: reflect the memory hierarchy in the accumulation of current density.

21 CUDA 2D PIC - Lesson 3: Locality. Parallelization strategy: drifting particles need to be re-sorted.

22 CUDA 2D PIC - Exploiting Texture Memory. Texture memory is special read-only memory, optimized for access patterns exhibiting spatial locality. Each SM has its own texture cache. Special texture units help accelerate fetching of data (Z-order curve). Employed for electric and magnetic fields: the electric and magnetic fields are constant; field access patterns in the force computation exhibit spatial locality; the span of the shape functions allows for efficient texture cache performance. They are perfect candidates for texture memory.

23 Big Picture each block works on a mesh of cells E,B fields local J fields global J fields

24 Performance and Optimizations(1). Tunable parameters: mesh cells per block; number of particle sub-steps before re-sort; max number of crossings; red-black offsets (discussed later). [Chart: time in seconds.]

25 Performance and Optimizations(2). Bitwise hacks, intrinsics and strength reductions: optimized shape functions using bitwise operations (combo-hack!); use of fused multiply-add (__fmaf_rn) and other intrinsics; converting division into multiplication by pre-computing constant values; loop unrolling (#pragma unroll). [Chart: time in seconds.]
#define SIGN_MASK 0x7fffffff
union combo_hack{ unsigned int in; float fl; };
__device__ float b2(float x){
    combo_hack flip;
    flip.fl = x;
    flip.in = flip.in & SIGN_MASK;   /* clear the sign bit: flip.fl = |x| */
    if(flip.fl <= 1.5f) {
        if(flip.fl > 0.5f)
            return __fmaf_rn(0.5f*flip.fl, (flip.fl - 3.0f), 1.125f);
        return __fmaf_rn(-x, x, 0.75f);
    }
    return 0.0f;
}
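For readers without a GPU at hand, the same sign-mask idea can be tried on the host. The sketch below is a plain C rendering, not the authors' kernel: `abs_by_mask` and `b2_host` are hypothetical names, `memcpy` replaces the union so the bit manipulation is portable, and ordinary multiply-add expressions stand in for the `__fmaf_rn` intrinsic.

```c
#include <stdint.h>
#include <string.h>

/* Clear the IEEE-754 sign bit of a float. Same effect as masking with
 * 0x7fffffff through a union, written with memcpy for portability. */
static float abs_by_mask(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7fffffffu;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Quadratic B-spline shape function:
 *   0.75 - x^2           for |x| <= 0.5
 *   0.5 * (1.5 - |x|)^2  for 0.5 < |x| <= 1.5
 *   0                    otherwise
 * The middle branch is expanded to 0.5*|x|*(|x| - 3) + 1.125 so that
 * it has the multiply-add form used on the GPU. */
float b2_host(float x)
{
    float ax = abs_by_mask(x);
    if (ax <= 1.5f) {
        if (ax > 0.5f)
            return 0.5f * ax * (ax - 3.0f) + 1.125f;
        return -x * x + 0.75f;
    }
    return 0.0f;
}
```

The expansion $0.5\,(1.5 - |x|)^2 = 0.5\,|x|\,(|x| - 3) + 1.125$ is what allows each branch to be evaluated as a single fused multiply-add on the device.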

26 Performance and Optimizations(3). Sorting strategies: particles are sorted by Cell-x and Cell-y. Within mesh cells, particles are sorted by particle done-ness, particle x-velocity direction, and particle y-velocity direction. [Chart: time in seconds.]
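The sort order described above can be expressed as a single comparison function. The sketch below uses the C standard library's `qsort` on the host purely to show the key ordering; the struct layout and the names `SortParticle`, `compare_particles`, and `sort_particles` are hypothetical, and the slides do not say which sorting routine the GPU code uses.

```c
#include <stdlib.h>

typedef struct {
    int cell_x, cell_y;   /* indices of the cell containing the particle */
    int done;             /* 1 once the particle has finished its time step */
    float vx, vy;         /* velocity components */
} SortParticle;

/* Compare two ints, returning -1, 0, or 1. */
static int cmp_int(int a, int b) { return (a > b) - (a < b); }

/* Key order: cell_x, cell_y, done-ness, sign of vx, sign of vy. */
static int compare_particles(const void *pa, const void *pb)
{
    const SortParticle *a = pa;
    const SortParticle *b = pb;
    int c;
    if ((c = cmp_int(a->cell_x, b->cell_x)) != 0) return c;
    if ((c = cmp_int(a->cell_y, b->cell_y)) != 0) return c;
    if ((c = cmp_int(a->done, b->done)) != 0) return c;
    if ((c = cmp_int(a->vx >= 0.0f, b->vx >= 0.0f)) != 0) return c;
    return cmp_int(a->vy >= 0.0f, b->vy >= 0.0f);
}

/* Sort n particles in place by the composite key above. */
void sort_particles(SortParticle *p, size_t n)
{
    qsort(p, n, sizeof *p, compare_particles);
}
```

Grouping particles that share a cell, a completion state, and a direction of travel places threads that are likely to follow the same branches next to each other. That is presumably the motivation for the secondary keys.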

27 Performance and Optimizations(4). Intuition: avoid write conflicts in the overlap region (atomics are expensive).

28 Performance and Optimizations(4). Red-black scheduling: thwarted by the block scheduler. Advantage in reduction of atomics vs. texture cache misses. [Chart: time in seconds.]

29 Performance and Optimizations(5). Targeting multiple GPUs (Tesla M2090s). Unsuccessful attempt: ions on one GPU and electrons on the second GPU. Successful attempt: 1/2 ions + 1/2 electrons on each GPU. [Chart: time in seconds; values shown: 70 and 42.]

30 Conclusions. Co-Design was a wonderful experience. Successful strategies: exploiting texture memory for storing the electric and magnetic fields; use of intrinsics and strength-reducing operations; sorting particles by Cell-x and Cell-y; sorting particles by done-ness and velocity directions; 1/2 ions + 1/2 electrons on each GPU. Unsuccessful strategies: red-black strategy of launching blocks of GPU threads; ions on one GPU and electrons on another GPU. Future: dynamic load balancing (launch blocks to match the density profile); domain decomposition across multiple GPUs.

31 EXTRA. For a typical run, we load particles (40 million ions, 40 million electrons) on grids of size [...] or [...]. The total time of the simulation is $t = 10/\omega_{pe}$, with an artificial mass ratio of $m_i/m_e = 100$. Figure 2.3: Initial conditions. A contour plot of the density function (in blue) with the field lines of the magnetic field (in orange). For this figure, $\omega_{ce}/\omega_{pe} = 0.3$, =0.25, and the domain length is [ 2, 2 ] [, ].


A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters ANTONINO TUMEO, ORESTE VILLA Collaborators: Karol Kowalski, Sriram Krishnamoorthy, Wenjing Ma, Simone Secchi May 15, 2012 1 Outline!

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

Beam Propagation Method Solution to the Seminar Tasks

Beam Propagation Method Solution to the Seminar Tasks Beam Propagation Method Solution to the Seminar Tasks Matthias Zilk The task was to implement a 1D beam propagation method (BPM) that solves the equation z v(xz) = i 2 [ 2k x 2 + (x) k 2 ik2 v(x, z) =

More information

General Physics - E&M (PHY 1308) - Lecture Notes. General Physics - E&M (PHY 1308) Lecture Notes

General Physics - E&M (PHY 1308) - Lecture Notes. General Physics - E&M (PHY 1308) Lecture Notes General Physics - E&M (PHY 1308) Lecture Notes Lecture 014: RC Circuits and Magnetism SteveSekula, 21 March 2011 (created 7 March 2011) Capacitors in Circuits no tags What happens if we add a capacitor

More information

An Algorithmic Framework of Large-Scale Circuit Simulation Using Exponential Integrators

An Algorithmic Framework of Large-Scale Circuit Simulation Using Exponential Integrators An Algorithmic Framework of Large-Scale Circuit Simulation Using Exponential Integrators Hao Zhuang 1, Wenjian Yu 2, Ilgweon Kang 1, Xinan Wang 1, and Chung-Kuan Cheng 1 1. University of California, San

More information

Hybrid Simulation Method ISSS-10 Banff 2011

Hybrid Simulation Method ISSS-10 Banff 2011 Hybrid Simulation Method ISSS-10 Banff 2011 David Burgess Astronomy Unit Queen Mary University of London With thanks to Dietmar Krauss-Varban Space Plasmas: From Sun to Earth Space Plasma Plasma is (mostly)

More information

Accelerating Quantum Chromodynamics Calculations with GPUs

Accelerating Quantum Chromodynamics Calculations with GPUs Accelerating Quantum Chromodynamics Calculations with GPUs Guochun Shi, Steven Gottlieb, Aaron Torok, Volodymyr Kindratenko NCSA & Indiana University National Center for Supercomputing Applications University

More information

Behavioral Simulations in MapReduce

Behavioral Simulations in MapReduce Behavioral Simulations in MapReduce Guozhang Wang, Marcos Vaz Salles, Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers, Johannes Gehrke, Walker White Cornell University 1 What are Behavioral Simulations?

More information

Population annealing study of the frustrated Ising antiferromagnet on the stacked triangular lattice

Population annealing study of the frustrated Ising antiferromagnet on the stacked triangular lattice Population annealing study of the frustrated Ising antiferromagnet on the stacked triangular lattice Michal Borovský Department of Theoretical Physics and Astrophysics, University of P. J. Šafárik in Košice,

More information

CS-206 Concurrency. Lecture 13. Wrap Up. Spring 2015 Prof. Babak Falsafi parsa.epfl.ch/courses/cs206/

CS-206 Concurrency. Lecture 13. Wrap Up. Spring 2015 Prof. Babak Falsafi parsa.epfl.ch/courses/cs206/ CS-206 Concurrency Lecture 13 Wrap Up Spring 2015 Prof. Babak Falsafi parsa.epfl.ch/courses/cs206/ Created by Nooshin Mirzadeh, Georgios Psaropoulos and Babak Falsafi EPFL Copyright 2015 EPFL CS-206 Spring

More information

Plasma Physics Prof. V. K. Tripathi Department of Physics Indian Institute of Technology, Delhi

Plasma Physics Prof. V. K. Tripathi Department of Physics Indian Institute of Technology, Delhi Plasma Physics Prof. V. K. Tripathi Department of Physics Indian Institute of Technology, Delhi Module No. # 01 Lecture No. # 22 Adiabatic Invariance of Magnetic Moment and Mirror Confinement Today, we

More information

Dense Arithmetic over Finite Fields with CUMODP

Dense Arithmetic over Finite Fields with CUMODP Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,

More information

Fluid Animation. Christopher Batty November 17, 2011

Fluid Animation. Christopher Batty November 17, 2011 Fluid Animation Christopher Batty November 17, 2011 What distinguishes fluids? What distinguishes fluids? No preferred shape Always flows when force is applied Deforms to fit its container Internal forces

More information

Supplementary Figure 1: Chemical compound space. Errors depending on the size of the training set for models with T = 1, 2, 3 interaction passes

Supplementary Figure 1: Chemical compound space. Errors depending on the size of the training set for models with T = 1, 2, 3 interaction passes 9 8 7 6 5 4 3 2 1 0 10 3 10 4 10 5 6 5 4 3 2 1 0 10000 25000 50000 100000 Supplementary Figure 1: Chemical compound space. Errors depending on the size of the training set for models with T = 1, 2, 3 interaction

More information

NIMEQ: MHD Equilibrium Solver for NIMROD

NIMEQ: MHD Equilibrium Solver for NIMROD NIMEQ: MHD Equilibrium Solver for NIMOD E.C.Howell, C..Sovinec University of Wisconsin-Madison 5 th Annual Meeting of Division of Plasma Physics Dallas, Texas, Nov. 17-Nov. 1,8 1 Abstract A Grad-Shafranov

More information

A particle-in-cell method with adaptive phase-space remapping for kinetic plasmas

A particle-in-cell method with adaptive phase-space remapping for kinetic plasmas A particle-in-cell method with adaptive phase-space remapping for kinetic plasmas Bei Wang 1 Greg Miller 2 Phil Colella 3 1 Princeton Institute of Computational Science and Engineering Princeton University

More information

Direct Self-Consistent Field Computations on GPU Clusters

Direct Self-Consistent Field Computations on GPU Clusters Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd

More information

Randomized Selection on the GPU. Laura Monroe, Joanne Wendelberger, Sarah Michalak Los Alamos National Laboratory

Randomized Selection on the GPU. Laura Monroe, Joanne Wendelberger, Sarah Michalak Los Alamos National Laboratory Randomized Selection on the GPU Laura Monroe, Joanne Wendelberger, Sarah Michalak Los Alamos National Laboratory High Performance Graphics 2011 August 6, 2011 Top k Selection on GPU Output the top k keys

More information

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National

More information

Sparse LU Factorization on GPUs for Accelerating SPICE Simulation

Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Nano-scale Integrated Circuit and System (NICS) Laboratory Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Xiaoming Chen PhD Candidate Department of Electronic Engineering Tsinghua University,

More information

Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei

Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei 1/20 Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei H. Watanabe ISSP, The M. Suzuki H. Inaoka N. Ito Kyushu University RIKEN AICS The, RIKEN AICS Outline 1. Introduction 2. Benchmark results

More information

CSC321 Lecture 8: Optimization

CSC321 Lecture 8: Optimization CSC321 Lecture 8: Optimization Roger Grosse Roger Grosse CSC321 Lecture 8: Optimization 1 / 26 Overview We ve talked a lot about how to compute gradients. What do we actually do with them? Today s lecture:

More information

GPU Accelerated Markov Decision Processes in Crowd Simulation

GPU Accelerated Markov Decision Processes in Crowd Simulation GPU Accelerated Markov Decision Processes in Crowd Simulation Sergio Ruiz Computer Science Department Tecnológico de Monterrey, CCM Mexico City, México sergio.ruiz.loza@itesm.mx Benjamín Hernández National

More information

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics)

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Eftychios Sifakis CS758 Guest Lecture - 19 Sept 2012 Introduction Linear systems

More information

UCSD CSE 21, Spring 2014 [Section B00] Mathematics for Algorithm and System Analysis

UCSD CSE 21, Spring 2014 [Section B00] Mathematics for Algorithm and System Analysis UCSD CSE 21, Spring 2014 [Section B00] Mathematics for Algorithm and System Analysis Lecture 14 Class URL: http://vlsicad.ucsd.edu/courses/cse21-s14/ Lecture 14 Notes Goals for this week Big-O complexity

More information

An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors

An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Contemporary Mathematics Volume 218, 1998 B 0-8218-0988-1-03024-7 An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Michel Lesoinne

More information

Concurrent Divide-and-Conquer Library

Concurrent Divide-and-Conquer Library with Petascale Electromagnetics Applications, Tech-X Corporation CScADS Workshop on Libraries and Algorithms for Petascale Applications, 07/30/2007, Snowbird, Utah Background Particle In Cell (PIC) in

More information

MULTIGRID CALCULATIONS FOB. CASCADES. Antony Jameson and Feng Liu Princeton University, Princeton, NJ 08544

MULTIGRID CALCULATIONS FOB. CASCADES. Antony Jameson and Feng Liu Princeton University, Princeton, NJ 08544 MULTIGRID CALCULATIONS FOB. CASCADES Antony Jameson and Feng Liu Princeton University, Princeton, NJ 0544 1. Introduction Development of numerical methods for internal flows such as the flow in gas turbines

More information

Chapter 5. Formulation of FEM for Unsteady Problems

Chapter 5. Formulation of FEM for Unsteady Problems Chapter 5 Formulation of FEM for Unsteady Problems Two alternatives for formulating time dependent problems are called coupled space-time formulation and semi-discrete formulation. The first one treats

More information

Julian Merten. GPU Computing and Alternative Architecture

Julian Merten. GPU Computing and Alternative Architecture Future Directions of Cosmological Simulations / Edinburgh 1 / 16 Julian Merten GPU Computing and Alternative Architecture Institut für Theoretische Astrophysik Zentrum für Astronomie Universität Heidelberg

More information

Practical Combustion Kinetics with CUDA

Practical Combustion Kinetics with CUDA Funded by: U.S. Department of Energy Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton Practical Combustion Kinetics with CUDA GPU Technology Conference March 20, 2015 Russell Whitesides

More information

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2) INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder

More information

Cyclops Tensor Framework

Cyclops Tensor Framework Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r

More information

Practical Free-Start Collision Attacks on full SHA-1

Practical Free-Start Collision Attacks on full SHA-1 Practical Free-Start Collision Attacks on full SHA-1 Inria and École polytechnique, France Nanyang Technological University, Singapore Joint work with Thomas Peyrin and Marc Stevens Séminaire Cryptologie

More information