2D Implicit Charge- and Energy-Conserving Particle-in-Cell Application Using CUDA
Christopher Leibs, Karthik Murthy
1 2D Implicit Charge- and Energy-Conserving Particle-in-Cell Application Using CUDA
Christopher Leibs, Karthik Murthy
Mentors: Dana Knoll and Allen McPherson
IS&T CoDesign Summer School 2012, Los Alamos National Laboratory, NM
LA-UR : Approved for public release; distribution is unlimited.
2 Agenda
- Co-Design Summer Problem: 2D implicit energy- and charge-conservation
- 2D implicit PIC method outline
- CUDA implementation
- Successful strategies
  - Exploiting texture memory for storing the electric and magnetic fields
  - Usage of intrinsics and strength-reduction operations
  - Sorting particles by Cell-x and Cell-y
  - Sorting particles by done-ness and velocity directions
  - 1/2 ions + 1/2 electrons on each GPU
- Unsuccessful strategies
  - Red-black strategy of launching blocks of GPU threads
  - Ions on one GPU and electrons on another GPU
3 Co-Design Summer School
The Los Alamos IS&T Co-Design Summer School was inaugurated in
Students from diverse technical backgrounds, including nuclear engineering, applied mathematics, and computer science, form teams that work together to solve a focused co-design problem...
- Emmanuel Cieren, Applied Mathematics, ENSTA ParisTech
- Nicolas Feltman, Computer Science, Carnegie Mellon University
- Christopher Leibs, Applied Mathematics, University of Colorado
- Colleen McCarthy, Applied Mathematics, North Carolina State University
- Karthik Murthy, Computer Science, Rice University
- Yijie Wang, Computer Science, University of South Florida
4 Problem - Plasma Simulation
The simulation cycles between two components (implicit or explicit method):
- MOMENT SOLVER: solve Maxwell's equations for E, B (electric, magnetic fields), driven by the charge and current density J.
- Interpolate fields to particles.
- PARTICLE PUSHER: push particles with the force $F = q(E + v \times B)$, updating r, v (position, velocity).
- Interpolate particles to fields.
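As a small illustration of the force in the particle pusher, the following is a minimal host-side C sketch of $F = q(E + v \times B)$ for one particle. The `vec3` type and the function names are hypothetical and are not taken from the slides; the actual application evaluates this inside CUDA kernels with interpolated fields.

```c
#include <assert.h>

/* Hypothetical 3-vector type; not taken from the slides. */
typedef struct { double x, y, z; } vec3;

/* Cross product a x b. */
vec3 cross(vec3 a, vec3 b) {
    vec3 c = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return c;
}

/* Lorentz force F = q (E + v x B) on a single particle of charge q. */
vec3 lorentz_force(double q, vec3 E, vec3 v, vec3 B) {
    vec3 vxB = cross(v, B);
    vec3 F = { q * (E.x + vxB.x),
               q * (E.y + vxB.y),
               q * (E.z + vxB.z) };
    return F;
}
```

For example, a charge q = 2 moving along +x through a field B along +z with E = 0 feels a force along -y, which is the expected direction of gyration.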
5 Problem - Explicit Particle-In-Cell Method
Main idea
- Interpolate field values to the particles
- Push particles
- Interpolate particle information to field locations
- Solve field equations and update values
Constraints!
- Finite grid instability (need dx to resolve the Debye length, $\lambda_D$)
- Tight CFL constraint (need dt small enough)
- Can be computationally demanding
Solution: try to use implicit methods to relax these conditions!
6 Problem - Implicit Particle-In-Cell Method
Chen, Chacón and Barnes* developed a 1D electrostatic PIC method that:
- relaxes the CFL condition,
- is stable against the finite grid instability,
- conserves charge,
- conserves energy, and
- controls momentum.
We will draw heavily from many of these ideas.
* An energy- and charge-conserving, implicit, electrostatic particle-in-cell algorithm. Journal of Computational Physics, 230, 2011.
7 Today's Problem
Application to demonstrate the 2D implicit method: island equilibrium.
Figure 2.3: Initial conditions. A contour plot of the density function (in blue) with the field lines of the magnetic field (in orange). For this figure, $\omega_{ce}/\omega_{pe} = 0.3$, =0.25, and the domain length is [ 2, 2 ] [, ].
8 2D Implicit PIC - Cell in the Electric-Magnetic Field
[Figure: staggered grid cell showing where $E_x, J_x$; $E_y, J_y$; $E_z, J_z$; $B_x$; $B_y$; and $B_z$ are stored at integer and half-integer indices i, j, k.]
9 2D Implicit PIC - Particle Sub-stepping Outline
- Initialization: fields, particles
- Loop over all particles; for each particle, repeat while the accumulated sub-step time is less than dt:
  - time estimator
  - particle push
  - cell crossing
  - accumulation
- Fields: compute work
- Write output
10 2D Implicit PIC - Time Estimation (Control Momentum)
- Sub-step times are chosen to help control momentum by comparing a first-order (Euler) and a second-order (Heun) integration scheme.
- The estimate is then compared with a fractional value of the gyrofrequency and a distance limiter in order to help alleviate stresses in the Picard iteration.
Local error estimates for position and velocity:
$$\ell_{e,r} \approx \frac{\Delta\tau^2}{2}\, a(r^{\nu}), \qquad \ell_{e,v} \approx \frac{\Delta\tau^2}{2}\, (\nabla a \cdot v)$$
We choose $\Delta\tau$ such that:
$$\sqrt{\ell_{e,r}(\Delta\tau)^2 + \ell_{e,v}(\Delta\tau)^2} < \epsilon_a + \epsilon_r \, \| r_0(\Delta\tau) \|_2$$
where $r_0(\Delta\tau)$ is the initial residual of the equations of motion.
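The Euler-versus-Heun comparison can be sketched in one dimension as follows. This is a minimal host-side C sketch under stated assumptions: the acceleration depends on position only, a single hypothetical tolerance `tol` stands in for the combined absolute and relative tolerance, the sub-step is simply halved until the estimate passes, and the gyrofrequency and distance limiters are omitted. The names `choose_substep`, `accel_fn`, and `spring_accel` are illustrative, not from the slides.

```c
#include <assert.h>

/* Hypothetical acceleration callback a(r); not from the slides. */
typedef double (*accel_fn)(double r);

/* Example acceleration used for illustration: a(r) = -r. */
double spring_accel(double r) { return -r; }

/* Halve the sub-step until the Euler/Heun difference is below tol.
 * Euler:  r1 = r + dtau*v,             v1 = v + dtau*a(r)
 * Heun:   rh = r + dtau/2*(v + v1),    vh = v + dtau/2*(a(r) + a(r1))
 * so the differences are
 *   le_r = rh - r1 = dtau^2/2 * a(r)
 *   le_v = vh - v1 = dtau/2 * (a(r1) - a(r))
 */
double choose_substep(double r, double v, accel_fn a,
                      double dtau_max, double tol) {
    assert(dtau_max > 0.0 && tol > 0.0);
    double dtau = dtau_max;
    for (int k = 0; k < 60; ++k) {
        double a0 = a(r);
        double r_euler = r + dtau * v;
        double le_r = 0.5 * dtau * dtau * a0;
        double le_v = 0.5 * dtau * (a(r_euler) - a0);
        /* Compare squared norms to avoid a square root. */
        if (le_r * le_r + le_v * le_v < tol * tol)
            break;
        dtau *= 0.5;
    }
    return dtau;
}
```

Starting from r = 1, v = 0 with a maximum sub-step of 1 and a tolerance of 0.01, the sketch halves the step three times before the error estimate is accepted.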
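11 2D Implicit PIC - Energy Conserving Particle Push
Crank-Nicolson discretization:
$$\frac{r_p^{\nu+1} - r_p^{\nu}}{\Delta\tau} = v_p^{\nu+1/2}, \qquad \frac{v_p^{\nu+1} - v_p^{\nu}}{\Delta\tau} = \frac{q_p}{m_p}\left[ E(r_p^{\nu+1/2}) + v_p^{\nu+1/2} \times B(r_p^{\nu+1/2}) \right]$$
with the half-step values
$$v_p^{\nu+1/2} = \frac{v_p^{\nu} + v_p^{\nu+1}}{2}, \qquad r_p^{\nu+1/2} = \frac{r_p^{\nu} + r_p^{\nu+1}}{2}$$
and fields interpolated to the particle with the shape function S:
$$F(r_p^{\nu+1/2}) = \sum_{i,j} F_{i,j}\, S(r_{i,j} - r_p^{\nu+1/2})$$
[Figure: staggered grid cell with the locations of $E_x, J_x$; $E_y, J_y$; $E_z, J_z$; $B_x$; $B_y$; $B_z$.]
Converged through fixed-point (Picard) iterations for $r_p^{\nu+1}$ and $v_p^{\nu+1}$.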
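A minimal host-side C sketch of one Crank-Nicolson sub-step solved by Picard iteration is given below. It assumes spatially uniform E and B, so the field interpolation through the shape function is left out and only the velocity needs iterating; the names `cn_push`, `cross3`, and the argument list are hypothetical. With E = 0 the converged update leaves the kinetic energy unchanged, which is the property the slide emphasizes.

```c
#include <assert.h>

/* Hypothetical 3-vector type; not taken from the slides. */
typedef struct { double x, y, z; } vec3;

vec3 cross3(vec3 a, vec3 b) {
    vec3 c = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return c;
}

/* One Crank-Nicolson sub-step in uniform fields E and B.
 * qm is the charge-to-mass ratio q/m. Returns 1 if the Picard
 * iteration converged within max_iter iterations, 0 otherwise. */
int cn_push(vec3 *r, vec3 *v, double qm, vec3 E, vec3 B,
            double dtau, int max_iter, double tol) {
    assert(dtau > 0.0 && max_iter > 0 && tol > 0.0);
    vec3 v_new = *v;               /* initial guess: old velocity */
    int converged = 0;
    for (int it = 0; it < max_iter; ++it) {
        /* Half-step velocity from the current iterate. */
        vec3 vh = { 0.5 * (v->x + v_new.x),
                    0.5 * (v->y + v_new.y),
                    0.5 * (v->z + v_new.z) };
        vec3 vxB = cross3(vh, B);
        vec3 next = { v->x + dtau * qm * (E.x + vxB.x),
                      v->y + dtau * qm * (E.y + vxB.y),
                      v->z + dtau * qm * (E.z + vxB.z) };
        double dx = next.x - v_new.x;
        double dy = next.y - v_new.y;
        double dz = next.z - v_new.z;
        v_new = next;
        if (dx * dx + dy * dy + dz * dz < tol * tol) {
            converged = 1;
            break;
        }
    }
    /* Advance the position with the final half-step velocity. */
    r->x += dtau * 0.5 * (v->x + v_new.x);
    r->y += dtau * 0.5 * (v->y + v_new.y);
    r->z += dtau * 0.5 * (v->z + v_new.z);
    *v = v_new;
    return converged;
}
```

The fixed-point map contracts when the sub-step is small compared with the gyro-period, which is one reason the time estimator compares the sub-step against a fraction of the gyrofrequency.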
12 2D Implicit PIC - Cell Crossing (Conserve Charge)
13 2D Implicit PIC - Cell Crossing (Conserve Charge)
14 2D Implicit PIC - Cell Crossing (Conserve Charge)
Some attempts
- Assume the linear intercept is good enough (fast but not accurate)
- Bisection method wrapped around the original CN (accurate but slow)
- Fix the final boundary value in CN and solve the new system for the free dimension and time (fast but not stable)
- Estimate the time of crossing with an explicit solve to accelerate the above methods
Lesson learned
- Cell crossing was (much) harder than we anticipated
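The first attempt, the linear intercept, can be sketched as follows: treat the sub-step as a straight line from the old to the new position and find the fraction of the step at which the line first leaves the cell. This is a minimal host-side C sketch; the function names and the rectangular cell bounds are hypothetical, and the slides note that this approach is fast but not accurate.

```c
#include <assert.h>

/* Fraction of a straight move from p0 to p1 at which the interval
 * [lo, hi] is left along one axis; 1.0 if the move stays inside. */
double face_fraction(double p0, double p1, double lo, double hi) {
    if (p1 > hi) return (hi - p0) / (p1 - p0);
    if (p1 < lo) return (lo - p0) / (p1 - p0);
    return 1.0;
}

/* Fraction of the step at which the particle first crosses a face
 * of the cell [xlo, xhi] x [ylo, yhi]; 1.0 means no crossing. */
double crossing_fraction(double x0, double y0, double x1, double y1,
                         double xlo, double xhi,
                         double ylo, double yhi) {
    assert(x0 >= xlo && x0 <= xhi && y0 >= ylo && y0 <= yhi);
    double sx = face_fraction(x0, x1, xlo, xhi);
    double sy = face_fraction(y0, y1, ylo, yhi);
    return sx < sy ? sx : sy;
}
```

A caller could shorten the sub-step by the returned fraction so that the particle stops on the cell face, then continue in the neighbouring cell with the remaining time.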
15 2D Implicit PIC - Current Accumulation
Each particle must accumulate its sub-step weighted current to the grid:
$$J^{n+1/2}_{i,j} = \frac{1}{dt}\,\frac{1}{dx\,dy} \sum_{p} \sum_{\nu} q_p\, S(r_{i,j} - r_p^{\nu+1/2})\, v_p^{\nu+1/2}\, \Delta\tau^{\nu}$$
[Figure: staggered grid cell with the locations of $E_x, J_x$; $E_y, J_y$; $E_z, J_z$; $B_x$; $B_y$; $B_z$.]
Lesson learned (for parallel implementation)
- This is a map from a high-dimensional set (particles) to a lower-dimensional set (grid).
- We must be careful to ensure particles are not competing for write access.
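One common way to keep particles from competing for write access is to give each worker a private copy of the grid and sum the copies afterwards. The sketch below shows this in serial host-side C, with the loop over `NWORKERS` standing in for CUDA blocks. It is reduced to one dimension with a linear (two-point) shape function on a periodic grid; those simplifications, the constants `NG` and `NWORKERS`, and the function name are assumptions for illustration, not the scheme used in the application.

```c
#include <assert.h>

#define NG 8        /* hypothetical number of grid nodes (periodic) */
#define NWORKERS 2  /* stand-in for the number of CUDA blocks */

/* Accumulate sub-step weighted current q*v*dtau/(dt*dx) to the grid.
 * Each worker writes only to its own private buffer; the buffers
 * are summed in a separate reduction pass. */
void accumulate_current(const double *x, const double *v,
                        const double *q, const double *dtau,
                        int np, double dx, double dt, double *J) {
    double local[NWORKERS][NG] = {{0.0}};
    for (int w = 0; w < NWORKERS; ++w) {
        for (int p = w; p < np; p += NWORKERS) {
            assert(x[p] >= 0.0);
            int i = (int)(x[p] / dx);        /* node to the left */
            double f = x[p] / dx - i;        /* offset in [0, 1) */
            double c = q[p] * v[p] * dtau[p] / (dt * dx);
            local[w][i % NG]       += (1.0 - f) * c;
            local[w][(i + 1) % NG] += f * c;
        }
    }
    /* Reduction: no two workers ever wrote to the same buffer. */
    for (int i = 0; i < NG; ++i) {
        J[i] = 0.0;
        for (int w = 0; w < NWORKERS; ++w)
            J[i] += local[w][i];
    }
}
```

The private buffers play the role of the local J fields per block, and the final loop plays the role of the merge into the global J field.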
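16 2D Implicit PIC - Implementation

    void runpic(){
        read_fields();
        read_particles();
        for(int p=0; p<n; ++p){        // loop over all particles
            while(tau<dt){             // sub-step until the full step dt is covered
                time_estimator();      // choose the next sub-step
                push_particle();       // Crank-Nicolson push (Picard)
                cell_crossing();       // stop at cell faces to conserve charge
            }
            accum_current();
        }
        accum_charge();
        time_average_current();
        export_data();
    }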
17 GPUs
- Built a version of PIC using CUDA
- Capable of exploiting multiple GPUs
- Experiment results on:
  - One node of Darwin (2x Tesla M2090s)
  - Scooter (1x Kepler GTX 680)
Fig. credit: Nvidia documentation
18 GPUs
- Kernels launch a grid of blocks
- Each block contains a set of threads
- Blocks are scheduled onto SMs by a hardware scheduler
- The order of execution of threads or blocks can't be guaranteed
Fig. credit: Nvidia documentation
19 CUDA 2D PIC - Lesson 1: Locality
Parallelization strategy: assign groups of cells (mesh blocks) to a single CUDA block.
20 CUDA 2D PIC - Lesson 2: Locality
Parallelization strategy: reflect the memory hierarchy in the accumulation of current density.
21 CUDA 2D PIC - Lesson 3: Locality
Parallelization strategy: drifting particles need to be re-sorted.
22 CUDA 2D PIC - Exploiting Texture Memory
Texture memory is
- Special read-only memory
- Optimized for access patterns exhibiting spatial locality
- Each SM has its own texture cache
- Special texture units help accelerate fetching of data (Z-order curve)
Employed for electric and magnetic fields
- Electric and magnetic fields are constant
- Field access patterns in the force computation exhibit spatial locality
- The span of the shape functions allows for efficient texture cache performance
- Perfect candidates for texture memory
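To illustrate why a Z-order curve helps 2D spatial locality, the sketch below computes a Z-order (Morton) index by interleaving the bits of two 16-bit cell coordinates, so that cells close together in 2D tend to be close together in memory. This is only an illustration of the idea in host-side C; the actual layout used by the texture hardware is not described in the slides, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Spread the low 16 bits of v so that bit k moves to bit 2k. */
uint32_t spread_bits16(uint32_t v) {
    v &= 0x0000ffffu;
    v = (v | (v << 8)) & 0x00ff00ffu;
    v = (v | (v << 4)) & 0x0f0f0f0fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Z-order index: x bits in even positions, y bits in odd ones. */
uint32_t morton2d(uint32_t x, uint32_t y) {
    assert(x < 65536u && y < 65536u);
    return spread_bits16(x) | (spread_bits16(y) << 1);
}
```

With this ordering the four cells of any aligned 2x2 tile occupy four consecutive indices, which matches the small footprint of a shape function centred on one cell.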
23 Big Picture
Each block works on a mesh of cells.
[Figure labels: E, B fields; local J fields; global J fields.]
24 Performance and Optimizations(1)
Tunable parameters
- Mesh cells per block
- Number of particle sub-steps before re-sort
- Maximum number of crossings
- Red-black offsets (discussed later)
[Chart: time in seconds]
25 Performance and Optimizations(2)
Bitwise hacks, intrinsics and strength reductions
- Optimized shape functions using bitwise operations (combo-hack!)
- Usage of fused multiply-add (__fmaf_rn) and other intrinsics
- Converting division into multiplication by pre-computing constant values
- Loop unrolling (#pragma unroll)
[Chart: time in seconds]

    #define SIGN_MASK 0x7fffffff   // clears the sign bit of a 32-bit float

    union combo_hack{
        unsigned int in;
        float fl;
    };

    // Quadratic B-spline shape function
    __device__ float b2(float x){
        combo_hack flip;
        flip.fl = x;
        flip.in = flip.in & SIGN_MASK;   // flip.fl is now |x|
        if(flip.fl <= 1.5f) {
            if(flip.fl > 0.5f)
                // 0.5*|x|*(|x| - 3) + 1.125 == 0.5*(1.5 - |x|)^2
                return __fmaf_rn(0.5f*flip.fl, (flip.fl - 3.0f), 1.125f);
            return __fmaf_rn(-x, x, 0.75f);   // 0.75 - x^2
        }
        return 0.0f;
    }
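A plain host-side reference of the same quadratic B-spline can be handy for checking an optimized version. The sketch below is written in C without the bit mask or the fused multiply-add intrinsic; the name `b2_ref` is hypothetical. It makes the two branches explicit: $0.75 - x^2$ for $|x| \le 0.5$ and $0.5\,(1.5 - |x|)^2$ for $0.5 < |x| \le 1.5$.

```c
#include <assert.h>

/* Reference quadratic B-spline shape function (host side). */
float b2_ref(float x) {
    float ax = x < 0.0f ? -x : x;       /* |x| without bit tricks */
    if (ax <= 0.5f)
        return 0.75f - ax * ax;
    if (ax <= 1.5f) {
        float t = 1.5f - ax;
        return 0.5f * t * t;
    }
    return 0.0f;
}
```

A useful sanity check is that the weights of the three nearest grid points sum to one, for example at x = 0 the weights are 0.125, 0.75, and 0.125.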
26 Performance and Optimizations(3)
Sorting strategies
- Particles are sorted by Cell-x and Cell-y
- Within mesh cells, particles are sorted by
  - particle done-ness
  - particle x-velocity direction
  - particle y-velocity direction
[Chart: time in seconds]
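One way to realize such a multi-level sort is to pack the criteria into a single integer key, with the cell index in the high bits and the done flag and velocity signs in the low bits, and then sort by that key. The host-side C sketch below uses `qsort` for brevity. The `particle` layout, the bit packing, and the choice that unfinished particles come before finished ones are all assumptions for illustration; the slides do not specify them, and a GPU implementation would more likely use a parallel sort.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical particle record; not the layout used in the slides. */
typedef struct {
    int cell_x, cell_y;   /* cell indices */
    int done;             /* nonzero once the particle finished dt */
    float vx, vy;         /* velocity components */
    unsigned key;         /* filled in by sort_particles */
} particle;

/* Pack cell index, done flag and velocity signs into one key. */
unsigned sort_key(const particle *p, int nx) {
    assert(p->cell_x >= 0 && p->cell_x < nx && p->cell_y >= 0);
    unsigned cell = (unsigned)(p->cell_y * nx + p->cell_x);
    unsigned done = p->done ? 1u : 0u;
    unsigned sx = p->vx < 0.0f ? 0u : 1u;
    unsigned sy = p->vy < 0.0f ? 0u : 1u;
    return (cell << 3) | (done << 2) | (sx << 1) | sy;
}

int cmp_key(const void *a, const void *b) {
    unsigned ka = ((const particle *)a)->key;
    unsigned kb = ((const particle *)b)->key;
    return (ka > kb) - (ka < kb);
}

/* Sort n particles on a grid that is nx cells wide. */
void sort_particles(particle *p, int n, int nx) {
    for (int i = 0; i < n; ++i)
        p[i].key = sort_key(&p[i], nx);
    qsort(p, (size_t)n, sizeof *p, cmp_key);
}
```

Grouping particles with the same done-ness and velocity direction means neighbouring threads tend to take the same branches and touch the same cells.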
27 Performance and Optimizations(4) Intuition Avoid write conflicts in overlap region (atomics are expensive)
28 Performance and Optimizations(4)
Red-black scheduling
- Thwarted by the block scheduler
- Advantage in reduction of atomics vs. texture cache misses
[Chart: time in seconds]
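The red-black idea can be sketched as a checkerboard colouring of mesh blocks: all blocks of one colour are processed in one pass and the other colour in a second pass, so that blocks sharing an edge never write to their overlap region at the same time. The host-side C sketch below only builds the list of block indices for each pass; the function names are hypothetical. Note that a two-colour checkerboard separates edge neighbours only, not diagonal ones, and the slides report that this strategy did not pay off in practice.

```c
#include <assert.h>

/* Checkerboard colour of the mesh block at (bx, by): 0 or 1. */
int block_color(int bx, int by) {
    return (bx + by) & 1;
}

/* Write the linear indices of all blocks with the given colour
 * into ids (row-major order) and return how many were written. */
int blocks_of_color(int nbx, int nby, int color, int *ids) {
    assert(nbx > 0 && nby > 0 && (color == 0 || color == 1));
    int n = 0;
    for (int by = 0; by < nby; ++by)
        for (int bx = 0; bx < nbx; ++bx)
            if (block_color(bx, by) == color)
                ids[n++] = by * nbx + bx;
    return n;
}
```

Each list would drive one kernel launch, with the second launch issued only after the first has completed.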
29 Performance and Optimizations(5)
Targeting multiple GPUs (Tesla M2090s)
- Unsuccessful attempt: ions on one GPU and electrons on the second GPU
- Successful attempt: 1/2 ions + 1/2 electrons on each GPU
[Chart: time in seconds; values shown: 70 and 42]
30 Conclusions
Co-Design was a wonderful experience.
Successful strategies
- Exploiting texture memory for storing the electric and magnetic fields
- Usage of intrinsics and strength-reducing operations
- Sorting particles by Cell-x and Cell-y
- Sorting particles by done-ness and velocity directions
- 1/2 ions + 1/2 electrons on each GPU
Unsuccessful strategies
- Red-black strategy of launching blocks of GPU threads
- Ions on one GPU and electrons on another GPU
Future
- Dynamic load balancing: launch blocks to match the density profile
- Domain decomposition across multiple GPUs
31 EXTRA
For a typical run, we load particles (40 million ions, 40 million electrons) on grids of size or . The total time of the simulation is $t = 10/\omega_{pe}$, with an artificial mass ratio of $m_i/m_e = 100$.
More informationAccelerating computation of eigenvectors in the nonsymmetric eigenvalue problem
Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National
More informationSparse LU Factorization on GPUs for Accelerating SPICE Simulation
Nano-scale Integrated Circuit and System (NICS) Laboratory Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Xiaoming Chen PhD Candidate Department of Electronic Engineering Tsinghua University,
More informationHuge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei
1/20 Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei H. Watanabe ISSP, The M. Suzuki H. Inaoka N. Ito Kyushu University RIKEN AICS The, RIKEN AICS Outline 1. Introduction 2. Benchmark results
More informationCSC321 Lecture 8: Optimization
CSC321 Lecture 8: Optimization Roger Grosse Roger Grosse CSC321 Lecture 8: Optimization 1 / 26 Overview We ve talked a lot about how to compute gradients. What do we actually do with them? Today s lecture:
More informationGPU Accelerated Markov Decision Processes in Crowd Simulation
GPU Accelerated Markov Decision Processes in Crowd Simulation Sergio Ruiz Computer Science Department Tecnológico de Monterrey, CCM Mexico City, México sergio.ruiz.loza@itesm.mx Benjamín Hernández National
More informationParallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics)
Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Eftychios Sifakis CS758 Guest Lecture - 19 Sept 2012 Introduction Linear systems
More informationUCSD CSE 21, Spring 2014 [Section B00] Mathematics for Algorithm and System Analysis
UCSD CSE 21, Spring 2014 [Section B00] Mathematics for Algorithm and System Analysis Lecture 14 Class URL: http://vlsicad.ucsd.edu/courses/cse21-s14/ Lecture 14 Notes Goals for this week Big-O complexity
More informationAn Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors
Contemporary Mathematics Volume 218, 1998 B 0-8218-0988-1-03024-7 An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Michel Lesoinne
More informationConcurrent Divide-and-Conquer Library
with Petascale Electromagnetics Applications, Tech-X Corporation CScADS Workshop on Libraries and Algorithms for Petascale Applications, 07/30/2007, Snowbird, Utah Background Particle In Cell (PIC) in
More informationMULTIGRID CALCULATIONS FOB. CASCADES. Antony Jameson and Feng Liu Princeton University, Princeton, NJ 08544
MULTIGRID CALCULATIONS FOB. CASCADES Antony Jameson and Feng Liu Princeton University, Princeton, NJ 0544 1. Introduction Development of numerical methods for internal flows such as the flow in gas turbines
More informationChapter 5. Formulation of FEM for Unsteady Problems
Chapter 5 Formulation of FEM for Unsteady Problems Two alternatives for formulating time dependent problems are called coupled space-time formulation and semi-discrete formulation. The first one treats
More informationJulian Merten. GPU Computing and Alternative Architecture
Future Directions of Cosmological Simulations / Edinburgh 1 / 16 Julian Merten GPU Computing and Alternative Architecture Institut für Theoretische Astrophysik Zentrum für Astronomie Universität Heidelberg
More informationPractical Combustion Kinetics with CUDA
Funded by: U.S. Department of Energy Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton Practical Combustion Kinetics with CUDA GPU Technology Conference March 20, 2015 Russell Whitesides
More informationINF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)
INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder
More informationCyclops Tensor Framework
Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r
More informationPractical Free-Start Collision Attacks on full SHA-1
Practical Free-Start Collision Attacks on full SHA-1 Inria and École polytechnique, France Nanyang Technological University, Singapore Joint work with Thomas Peyrin and Marc Stevens Séminaire Cryptologie
More information