Accelerating Three-Body Potentials using GPUs NVIDIA Tesla K20X
|
|
- Edward Hoover
- 6 years ago
- Views:
Transcription
1 Using a Hybrid Cray Supercomputer to Model Non-Icing Surfaces for Cold- Climate Wind Turbines Accelerating Three-Body Potentials using GPUs NVIDIA Tesla K20X GE Global Research Masako Yamada
2 Opportunity in Cold-Climate Wind Wind energy production > 285 GW/year and growing Cold regions favorable Lower human population Good wind conditions GW opportunity from ~$2million/MW installed Technical need Anti-icing surfaces 3-10% energy losses due to icing Shut-downs Active heating expensive VTT Technical Research Centre of Finland 13_wind_energy.jsp?lang=en 2
3 ALCC Awards million hours DOE ASCR Leadership Computing Challenge Awards Energy-relevant applications 1. Non-Icing Surfaces for Cold-Climate Wind Turbines Jaguar (Cray XK6) at Oak Ridge National Lab Molecular dynamics using LAMMPS 1 million mw water molecule droplets on engineered surfaces Completed >300 simulations Achieved >200x speedup from 2011 to 2013 >5x from GPU acceleration 2. Accelerated Non-Icing Surfaces for Cold-Climate Wind Turbines Titan (Cray XK7, hybrid) at Oak Ridge National Lab Time parallelization via Parallel Replica method Expected x faster results 3
4 Titan enables leadership-class study Size of simulation ~ 1 million molecules Droplet size >> critical nucleus size Mimic physical dimensions (*somewhat) Duration of simulation ~ 1 microsecond Nucleation is an activated process Freezing rarely observed in MD simulations Number of simulations ~ 100 s Study requires embarrassingly parallel runs Different surfaces, ambient temperatures, conductivity Multiple replicates required due to stochastic nature *million molecule droplet ~ 50nm diameter 4
5 Personal history with MD LOGO DRAFT Year Software/Language # of Molecules Hardware 1995 Pascal Few Desktop Mac 2000 C, Fortran90 Hundreds IBM SP, SGI O2K 2010 NAMD, LAMMPS 1000 s Linux HPC Present GPU-enabled LAMMPS Millions Titan
6 >200x overall speedup since Switched to mw water potential 3-body model is more expensive/complex than 2-body but Particle reduction at least 3x Timestep increase 10x No long-range forces 2. LAMMPS dynamic load balance 2-3x 3. GPU acceleration of 3-body model 5x 2011: 6 femtosecond/1024 CPU-second (SPC/E) 2013: 2 picoseconds/1024 CPU-second (mw) 6
7 1. mw water potential Stillinger Weber 3-body particle = one water molecule Introduced in 2009, Nature paper in 2011 Bulk water properties comparable or better than existing point-charge models Much faster than point-charge models Exemplary test case by authors: 180x faster than SPC/E GE production simulation: 40-50x faster than SPC/E asymmetric million molecule droplet on engineered surface; loaded onto 64 nodes SPC/E mw 7
8 2. LAMMPS dynamic load balance Introduced in 2012 Adjusts size of processor sub-domains to equalize number of particles 2-3x speedup for 1 million molecule droplets on 64 nodes (with user-specified processor mapping) No load balancing Default load balancing User-specified mapping 8
9 3. GPU-acceleration of 3-body potential See details W. Michael Brown and Masako Yamada Implementing Molecular Dynamics on Hybrid High Performance Computers Three-Body Potentials. Computer Physics Communications Computer Physics Communications, (2013) 9
10 Load 1 million molecules on Host/CPU + 1 million molecules 64 nodes Processor sub-domains correspond to spatial partitioning of droplet MPI tasks/node 1 core/paired-unit 10
11 Host Memory Per node ~ 15,000 molecules DRAFT Host AMD Opteron 6274 CPU Core0 Core1 Core2 Core3 Core4 Core5 Core6 Core7 Core8 Core9 Core10 Core11 Core12 Core13 Core14 Core15 Work iterm Work item Work item Work item Kernel Work item Work item Work item Work Group Processor 1 Processor 2. Processor 14 Accelerator NVIDIA Tesla K20X GPU Core 1 Core 2 Core 192 Private Private Private Local Memory Global Memory Work item = fundamental unit of activity 11
12 Parallelization in LAMMPS Host Time integration Thermostat/barostat Bond/angle calculations Statistics Accelerator 3-body potential Neighbor-lists 12
13 Generic 3-body potential U = i j i k>j φ p i, p j, p k r ij < r c, r ik < r c 0 otherwise Good candidate for GPU 1. Occupies majority of computational time 2. Can be decomposed into independent kernels/work-items p i i p j r ij r ik j k Stillinger-Weber MEAM Tersoff REBO/AIREBO Bond-order (0,0,0) p k r c = cutoff r α = neighbor skin 13
14 Redundant Computation Approach Atom-decomposition 1 atom 1 computational kernel only fewest operations (and effective parallelization) but shared memory access a bottleneck Force-decomposition 1 atom 3 computational kernels required redundant computations but reduced shared memory issues many work-items = more effective use of cores 14
15 Stillinger-Weber Parallelization U = φ 2 (r ij ) + φ 3 r ij, r ik, θ jik i j<i i j i k>j 2-body operations Atom i 3 kernels no data dependencies 3-body operations (r ij < r α ).AND. (r ik < r α ) ==.TRUE. update forces on i only 3-body operations (r ij < r α ).AND. (r ik < r α ) ==.FALSE. neighbor-of-neighbor interactions 15
16 Neighbor List 3-body force-decomposition approach involves neighbor-of-neighbor operations Requires additional overhead increase in border size shared by two processes neighbor list for ghost atoms straddling across cores GPU implementation not necessarily faster than CPU but less time spent in host-accelerator data transfer (note: neighbor lists are huge) 16
17 GPU acceleration benefit >5x speedup achieved in production water droplet of 1 million molecules on engineered surface (64 nodes) Not limited to Stillinger-Weber -- applicable to MEAM, Tersoff, REBO, AIREBO, Bond-order, etc. 17
18 Implementation 18
19 6 different surfaces Interaction potential developed at GE Global Research 19
20 Freezing front propagation Visualization of latent heat release 20
21 Bottom View Side View Visualizing crystalline regions Steinhardt-Nelson order parameter particle mobility DRAFT
22 Advanced visualization Mike Matheson, Oak Ridge National Lab Will include visuals/movies here 22
23 Next steps Quasi time parallelization using Parallel Replica Method Launch dozens of replicates simultaneously; monitor ensemble behavior Expected outcome: x faster results Analysis and application of simulation results 23
24 Credits Mike Brown (ORNL) GPU acceleration Paul Crozier (Sandia) dynamic load balancing Valeria Molinero (Utah) mw potential Aaron Keyes (Umich, Berkeley) Steinhardt-Nelson order parameters Art Voter/Danny Perez (LANL) Parallel Replica method Mike Matheson (ORNL) -- Visualization Jack Wells, Suzy Tichenor (ORNL) General Azar Alizadeh, Branden Moore, Rick Arthur, Margaret Blohm (GE Global Research) This research was conducted in part under the auspices of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy under Contract No. DEAC05-00OR22725 with UT- Battelle, LLC. This research was also conducted in part under the auspices of the GE Global Research High Performance Computing program. 24
25 References W. Michael Brown, W. M and Yamada, M. Implementing Molecular Dynamics on Hybrid High Performance Computers Three-Body Potentials. Computer Physics Communications Computer Physics Communications, (2013) Shi, B. and Dhir, V. K. Molecular dynamics simulation of the contact angle of liquids on solid surfaces. The Journal of Chemical Physics, 130, 3 (01/21/ 2009), ; Sergi, D., Scocchi, G. and Ortona, A. Molecular dynamics simulations of the contact angle between water droplets and graphite surfaces. Fluid Phase Equilibria, 332, 0 (10/25/ 2012), Oxtoby, D. W. Homogeneous nucleation: theory and experiment. Journal of Physics: Condensed Matter, 4, ), Plimpton, S. Fast Parallel Algorithms for Short-Range Molecular Dynamics. Journal of Computational Physics, 117, 1 (3/1/ 1995), Humphrey, W., Dalke, A. and Schulten, K. VMD: Visual molecular dynamics. Journal of Molecular Graphics, 14, 1 (2// 1996), Keys, A. S. Shape Matching Analysis Code. University of Michigan, City, 2011; Keys, A. S., Iacovella, C. R. and Glotzer, S. C. Characterizing Structure Through Shape Matching and Applications to Self-Assembly. Annual Review of Condensed Matter Physics, 2, 1 (2011/03/ ), ; Steinhardt, P. J., Nelson, D. R. and Ronchetti, M. Bond-orientational order in liquids and glasses. Physical Review B, 28, 2 (07/15/ 1983), Stillinger, F. H. and Weber, T. A. Computer simulation of local order in condensed phases of silicon. Physical Review B, 31, 8 (04/15/ 1985), Berendsen, H. J. C., Grigera, J. R. and Straatsma, T. P. The missing term in effective pair potentials. The Journal of Physical Chemistry, 91, 24 (1987/11/ ), Molinero, V. and Moore, E. B. Water Modeled As an Intermediate Element between Carbon and Silicon. The Journal of Physical Chemistry B, 113, 13 (2009/04/ ), ; Moore, E. B. and Molinero, V. Structural transformation in supercooled water controls the crystallization rate of ice. Nature, 479, 7374 (11/24/print 2011), Yamada, M., Mossa, S., Stanley, H. E. and Sciortino, F. Interplay between Time-Temperature Transformation and the Liquid-Liquid Phase Transition in Water. Physical Review Letters, 88, 19 (04/26/ 2002), Brown, W. M., Wang, P., Plimpton, S. J. and Tharrington, A. N. Implementing molecular dynamics on hybrid high performance computers short range forces. Computer Physics Communications, 182, 4 (4// 2011), Voter, A. F. Parallel replica method for dynamics of infrequent events. Physical Review B, 57, 22 (06/01/ 1998), R13985-R
26 Titan 299,008 Opteron cores (18688 nodes) 18,688 NVIDIA Tesla K20X GPU accelerators Each node 16 cores AMD Opteron CPUs 1 Tesla K20X GPU accelerator 2688 compute cores Gemini interconnect (ASIC, MPI messages) PCI-Express 2.0 bus LAMMPS one of the acceptance testing applications for Titan 26
Accelerating Three-Body Molecular Dynamics Potentials Using NVIDIA Tesla K20X GPUs. GE Global Research Masako Yamada
Accelerating Three-Body Molecular Dynamics Potentials Using NVIDIA Tesla K20X GPUs GE Global Research Masako Yamada Overview of MD Simulations Non-Icing Surfaces for Wind Turbines Large simulations ~ 1
More informationParallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29
Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29 Outline A few words on MD applications and the GROMACS package The main work in an MD simulation Parallelization Stream computing
More informationLAMMPS Performance Benchmark on VSC-1 and VSC-2
LAMMPS Performance Benchmark on VSC-1 and VSC-2 Daniel Tunega and Roland Šolc Institute of Soil Research, University of Natural Resources and Life Sciences VSC meeting, Neusiedl am See, February 27-28,
More informationParallel Multivariate SpatioTemporal Clustering of. Large Ecological Datasets on Hybrid Supercomputers
Parallel Multivariate SpatioTemporal Clustering of Large Ecological Datasets on Hybrid Supercomputers Sarat Sreepathi1, Jitendra Kumar1, Richard T. Mills2, Forrest M. Hoffman1, Vamsi Sripathi3, William
More informationParallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2
1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013
More informationClaude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique
Claude Tadonki MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Monthly CRI Seminar MINES ParisTech - CRI June 06, 2016, Fontainebleau (France)
More informationEfficient Parallelization of Molecular Dynamics Simulations on Hybrid CPU/GPU Supercoputers
Efficient Parallelization of Molecular Dynamics Simulations on Hybrid CPU/GPU Supercoputers Jaewoon Jung (RIKEN, RIKEN AICS) Yuji Sugita (RIKEN, RIKEN AICS, RIKEN QBiC, RIKEN ithes) Molecular Dynamics
More informationSupporting Information for Solid-liquid Thermal Transport and its Relationship with Wettability and the Interfacial Liquid Structure
Supporting Information for Solid-liquid Thermal Transport and its Relationship with Wettability and the Interfacial Liquid Structure Bladimir Ramos-Alvarado, Satish Kumar, and G. P. Peterson The George
More informationRWTH Aachen University
IPCC @ RWTH Aachen University Optimization of multibody and long-range solvers in LAMMPS Rodrigo Canales William McDoniel Markus Höhnerbach Ahmed E. Ismail Paolo Bientinesi IPCC Showcase November 2016
More informationThe Performance Evolution of the Parallel Ocean Program on the Cray X1
The Performance Evolution of the Parallel Ocean Program on the Cray X1 Patrick H. Worley Oak Ridge National Laboratory John Levesque Cray Inc. 46th Cray User Group Conference May 18, 2003 Knoxville Marriott
More informationA microsecond a day keeps the doctor away: Efficient GPU Molecular Dynamics with GROMACS
GTC 20130319 A microsecond a day keeps the doctor away: Efficient GPU Molecular Dynamics with GROMACS Erik Lindahl erik.lindahl@scilifelab.se Molecular Dynamics Understand biology We re comfortably on
More informationPiz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting. Thomas C. Schulthess
Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting Thomas C. Schulthess 1 Cray XC30 with 5272 hybrid, GPU accelerated compute nodes Piz Daint Compute node:
More informationUsing GPUs for faster LAMMPS particle simulations
Using GPUs for faster LAMMPS particle simulations Paul S. Crozier Exploiting New Computer Architectures in Molecular Dynamics Simulations March 23, 2011 Sandia National Laboratories is a multi-program
More informationOptimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures
1 Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures Hasan Metin Aktulga, Chris Knight, Paul Coffman, Tzu-Ray Shan, Wei Jiang Abstract Hybrid parallelism
More informationINITIAL INTEGRATION AND EVALUATION
INITIAL INTEGRATION AND EVALUATION OF SLATE PARALLEL BLAS IN LATTE Marc Cawkwell, Danny Perez, Arthur Voter Asim YarKhan, Gerald Ragghianti, Jack Dongarra, Introduction The aim of the joint milestone STMS10-52
More informationMolecular dynamics simulation. CS/CME/BioE/Biophys/BMI 279 Oct. 5 and 10, 2017 Ron Dror
Molecular dynamics simulation CS/CME/BioE/Biophys/BMI 279 Oct. 5 and 10, 2017 Ron Dror 1 Outline Molecular dynamics (MD): The basic idea Equations of motion Key properties of MD simulations Sample applications
More informationPerformance of the fusion code GYRO on three four generations of Crays. Mark Fahey University of Tennessee, Knoxville
Performance of the fusion code GYRO on three four generations of Crays Mark Fahey mfahey@utk.edu University of Tennessee, Knoxville Contents Introduction GYRO Overview Benchmark Problem Test Platforms
More informationInvestigation of an Unusual Phase Transition Freezing on heating of liquid solution
Investigation of an Unusual Phase Transition Freezing on heating of liquid solution Calin Gabriel Floare National Institute for R&D of Isotopic and Molecular Technologies, Cluj-Napoca, Romania Max von
More informationEnergetics in Ice VI
Energetics in Ice VI Nishanth Sasankan 2011 NSF / REU PROJECT Physics Department University of Notre Dame Advisor: Dr. Kathie E. Newman Abstract There are many different phases of Ice, which exist in different
More informationSupercomputing: Why, What, and Where (are we)?
Supercomputing: Why, What, and Where (are we)? R. Govindarajan Indian Institute of Science, Bangalore, INDIA govind@serc.iisc.ernet.in (C)RG@SERC,IISc Why Supercomputer? Third and Fourth Legs RG@SERC,IISc
More informationWeile Jia 1, Long Wang 1, Zongyan Cao 1, Jiyun Fu 1, Xuebin Chi 1, Weiguo Gao 2, Lin-Wang Wang 3
A plane wave pseudopotential density functional theory molecular dynamics code on multi-gpu machine - GPU Technology Conference, San Jose, May 17th, 2012 Weile Jia 1, Long Wang 1, Zongyan Cao 1, Jiyun
More informationHYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017
HYCOM and Navy ESPC Future High Performance Computing Needs Alan J. Wallcraft COAPS Short Seminar November 6, 2017 Forecasting Architectural Trends 3 NAVY OPERATIONAL GLOBAL OCEAN PREDICTION Trend is higher
More informationPerformance Analysis of Lattice QCD Application with APGAS Programming Model
Performance Analysis of Lattice QCD Application with APGAS Programming Model Koichi Shirahata 1, Jun Doi 2, Mikio Takeuchi 2 1: Tokyo Institute of Technology 2: IBM Research - Tokyo Programming Models
More informationThe Fast Multipole Method in molecular dynamics
The Fast Multipole Method in molecular dynamics Berk Hess KTH Royal Institute of Technology, Stockholm, Sweden ADAC6 workshop Zurich, 20-06-2018 Slide BioExcel Slide Molecular Dynamics of biomolecules
More informationJulian Merten. GPU Computing and Alternative Architecture
Future Directions of Cosmological Simulations / Edinburgh 1 / 16 Julian Merten GPU Computing and Alternative Architecture Institut für Theoretische Astrophysik Zentrum für Astronomie Universität Heidelberg
More informationA Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters
A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters ANTONINO TUMEO, ORESTE VILLA Collaborators: Karol Kowalski, Sriram Krishnamoorthy, Wenjing Ma, Simone Secchi May 15, 2012 1 Outline!
More informationAccelerating linear algebra computations with hybrid GPU-multicore systems.
Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)
More informationMolecular Dynamics with LAMMPS on a Hybrid Cray Supercomputer
Molecular Dynamics with LAMMPS on a Hybrid Cray Supercomputer W. Michael Brown Na@onal Center for Computa@onal Sciences Oak Ridge Na@onal Laboratory NVIDIA Technology Theater, Supercompu8ng 2012 November
More informationPractical Combustion Kinetics with CUDA
Funded by: U.S. Department of Energy Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton Practical Combustion Kinetics with CUDA GPU Technology Conference March 20, 2015 Russell Whitesides
More informationDynamic Scheduling for Work Agglomeration on Heterogeneous Clusters
Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters Jonathan Lifflander, G. Carl Evans, Anshu Arya, Laxmikant Kale University of Illinois Urbana-Champaign May 25, 2012 Work is overdecomposed
More informationA framework for detailed multiphase cloud modeling on HPC systems
Center for Information Services and High Performance Computing (ZIH) A framework for detailed multiphase cloud modeling on HPC systems ParCo 2009, 3. September 2009, ENS Lyon, France Matthias Lieber a,
More informationWhat is Classical Molecular Dynamics?
What is Classical Molecular Dynamics? Simulation of explicit particles (atoms, ions,... ) Particles interact via relatively simple analytical potential functions Newton s equations of motion are integrated
More informationProcessing NOAA Observation Data over Hybrid Computer Systems for Comparative Climate Change Analysis
Processing NOAA Observation Data over Hybrid Computer Systems for Comparative Climate Change Analysis Xuan Shi 1,, Dali Wang 2 1 Department of Geosciences, University of Arkansas, Fayetteville, AR 72701,
More informationAcceleration of WRF on the GPU
Acceleration of WRF on the GPU Daniel Abdi, Sam Elliott, Iman Gohari Don Berchoff, Gene Pache, John Manobianco TempoQuest 1434 Spruce Street Boulder, CO 80302 720 726 9032 TempoQuest.com THE WORLD S FASTEST
More informationHigh-Performance Scientific Computing
High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org
More informationTwo case studies of Monte Carlo simulation on GPU
Two case studies of Monte Carlo simulation on GPU National Institute for Computational Sciences University of Tennessee Seminar series on HPC, Feb. 27, 2014 Outline 1 Introduction 2 Discrete energy lattice
More informationDiffusion of Water and Diatomic Oxygen in Poly(3-hexylthiophene) Melt: A Molecular Dynamics Simulation Study
Diffusion of Water and Diatomic Oxygen in Poly(3-hexylthiophene) Melt: A Molecular Dynamics Simulation Study Julia Deitz, Yeneneh Yimer, and Mesfin Tsige Department of Polymer Science University of Akron
More informationAccelerating Quantum Chromodynamics Calculations with GPUs
Accelerating Quantum Chromodynamics Calculations with GPUs Guochun Shi, Steven Gottlieb, Aaron Torok, Volodymyr Kindratenko NCSA & Indiana University National Center for Supercomputing Applications University
More informationMolecular Dynamics Simulations
Molecular Dynamics Simulations Dr. Kasra Momeni www.knanosys.com Outline LAMMPS AdHiMad Lab 2 What is LAMMPS? LMMPS = Large-scale Atomic/Molecular Massively Parallel Simulator Open-Source (http://lammps.sandia.gov)
More informationFirst, a look at using OpenACC on WRF subroutine advance_w dynamics routine
First, a look at using OpenACC on WRF subroutine advance_w dynamics routine Second, an estimate of WRF multi-node performance on Cray XK6 with GPU accelerators Based on performance of WRF kernels, what
More informationProf. Brant Robertson Department of Astronomy and Astrophysics University of California, Santa
Accelerated Astrophysics: Using NVIDIA GPUs to Simulate and Understand the Universe Prof. Brant Robertson Department of Astronomy and Astrophysics University of California, Santa Cruz brant@ucsc.edu, UC
More informationJASS Modeling and visualization of molecular dynamic processes
JASS 2009 Konstantin Shefov Modeling and visualization of molecular dynamic processes St Petersburg State University, Physics faculty, Department of Computational Physics Supervisor PhD Stepanova Margarita
More informationGraphics Card Computing for Materials Modelling
Graphics Card Computing for Materials Modelling Case study: Analytic Bond Order Potentials B. Seiser, T. Hammerschmidt, R. Drautz, D. Pettifor Funded by EPSRC within the collaborative multi-scale project
More informationA Computation- and Communication-Optimal Parallel Direct 3-body Algorithm
A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm Penporn Koanantakool and Katherine Yelick {penpornk, yelick}@cs.berkeley.edu Computer Science Division, University of California,
More informationXiaoxia Li. Group of HPC & Cheminformatics Institute of Process Engineering Chinese Academy of Sciences, Beijing
GTC 2016 San Jose, Californiae, 7 April, 2016 Xiaoxia Li Group of HPC & Cheminformatics Institute of Process Engineering Chinese Academy of Sciences, Beijing Outline 1 2 3 Reaction mechanisms of coal pyrolysis?
More informationMultiscale simulations of complex fluid rheology
Multiscale simulations of complex fluid rheology Michael P. Howard, Athanassios Z. Panagiotopoulos Department of Chemical and Biological Engineering, Princeton University Arash Nikoubashman Institute of
More informationPetascale Quantum Simulations of Nano Systems and Biomolecules
Petascale Quantum Simulations of Nano Systems and Biomolecules Emil Briggs North Carolina State University 1. Outline of real-space Multigrid (RMG) 2. Scalability and hybrid/threaded models 3. GPU acceleration
More informationNucleation rate (m -3 s -1 ) Radius of water nano droplet (Å) 1e+00 1e-64 1e-128 1e-192 1e-256
Supplementary Figures Nucleation rate (m -3 s -1 ) 1e+00 1e-64 1e-128 1e-192 1e-256 Calculated R in bulk water Calculated R in droplet Modified CNT 20 30 40 50 60 70 Radius of water nano droplet (Å) Supplementary
More informationPerformance Evaluation of Scientific Applications on POWER8
Performance Evaluation of Scientific Applications on POWER8 2014 Nov 16 Andrew V. Adinetz 1, Paul F. Baumeister 1, Hans Böttiger 3, Thorsten Hater 1, Thilo Maurer 3, Dirk Pleiter 1, Wolfram Schenck 4,
More informationIntroduction to numerical computations on the GPU
Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming
More informationGPU-Accelerated Analysis and Visualization of Large Structures Solved by Molecular Dynamics Flexible Fitting
GPU-Accelerated Analysis and Visualization of Large Structures Solved by Molecular Dynamics Flexible Fitting John E. Stone, Ryan McGreevy, Barry Isralewitz, and Klaus Schulten Theoretical and Computational
More informationDirect Self-Consistent Field Computations on GPU Clusters
Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd
More informationThe QMC Petascale Project
The QMC Petascale Project Richard G. Hennig What will a petascale computer look like? What are the limitations of current QMC algorithms for petascale computers? How can Quantum Monte Carlo algorithms
More information上海超级计算中心 Shanghai Supercomputer Center. Lei Xu Shanghai Supercomputer Center San Jose
上海超级计算中心 Shanghai Supercomputer Center Lei Xu Shanghai Supercomputer Center 03/26/2014 @GTC, San Jose Overview Introduction Fundamentals of the FDTD method Implementation of 3D UPML-FDTD algorithm on GPU
More informationParallel Eigensolver Performance on High Performance Computers 1
Parallel Eigensolver Performance on High Performance Computers 1 Andrew Sunderland STFC Daresbury Laboratory, Warrington, UK Abstract Eigenvalue and eigenvector computations arise in a wide range of scientific
More informationNew approaches to strongly interacting Fermi gases
New approaches to strongly interacting Fermi gases Joaquín E. Drut The Ohio State University INT Program Simulations and Symmetries Seattle, March 2010 In collaboration with Timo A. Lähde Aalto University,
More informationSP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay
SP-CNN: A Scalable and Programmable CNN-based Accelerator Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay Motivation Power is a first-order design constraint, especially for embedded devices. Certain
More informationLattice Boltzmann simulations on heterogeneous CPU-GPU clusters
Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts
More informationFrom Piz Daint to Piz Kesch : the making of a GPU-based weather forecasting system. Oliver Fuhrer and Thomas C. Schulthess
From Piz Daint to Piz Kesch : the making of a GPU-based weather forecasting system Oliver Fuhrer and Thomas C. Schulthess 1 Piz Daint Cray XC30 with 5272 hybrid, GPU accelerated compute nodes Compute node:
More informationReactive Molecular Dynamics on Massively Parallel Heterogeneous Architectures
Reactive Molecular Dynamics on Massively Parallel Heterogeneous Architectures Sudhir B Kylasa, Hasan Metin Aktulga, Ananth Y Grama Abstract We present a parallel implementation of the ReaxFF force field
More informationGPU Accelerated Markov Decision Processes in Crowd Simulation
GPU Accelerated Markov Decision Processes in Crowd Simulation Sergio Ruiz Computer Science Department Tecnológico de Monterrey, CCM Mexico City, México sergio.ruiz.loza@itesm.mx Benjamín Hernández National
More informationCollision and Coalescence 3/3/2010. ATS 351 Lab 7 Precipitation. Droplet Growth by Collision and Coalescence. March 7, 2006
ATS 351 Lab 7 Precipitation March 7, 2006 Droplet Growth by Collision and Coalescence Growth by condensation alone takes too long ( 15 C -) Occurs in clouds with tops warmer than 5 F Greater the speed
More informationS3D Direct Numerical Simulation: Preparation for the PF Era
S3D Direct Numerical Simulation: Preparation for the 10 100 PF Era Ray W. Grout, Scientific Computing SC 12 Ramanan Sankaran ORNL John Levesque Cray Cliff Woolley, Stan Posey nvidia J.H. Chen SNL NREL
More informationAtomistic molecular simulations for engineering applications: methods, tools and results. Jadran Vrabec
Atomistic molecular simulations for engineering applications: methods, tools and results Jadran Vrabec Motivation Simulation methods vary in their level of detail The more detail, the more predictive power
More informationAdaptive Heterogeneous Computing with OpenCL: Harnessing hundreds of GPUs and CPUs
Adaptive Heterogeneous Computing with OpenCL: Harnessing hundreds of GPUs and CPUs Simon McIntosh-Smith simonm@cs.bris.ac.uk Head of Microelectronics Research University of Bristol, UK 1 ! Collaborators
More informationINTENSIVE COMPUTATION. Annalisa Massini
INTENSIVE COMPUTATION Annalisa Massini 2015-2016 Course topics The course will cover topics that are in some sense related to intensive computation: Matlab (an introduction) GPU (an introduction) Sparse
More informationA Brief Guide to All Atom/Coarse Grain Simulations 1.1
A Brief Guide to All Atom/Coarse Grain Simulations 1.1 Contents 1. Introduction 2. AACG simulations 2.1. Capabilities and limitations 2.2. AACG commands 2.3. Visualization and analysis 2.4. AACG simulation
More informationEnhancing Amber for Use on the Blue Waters High- Performance Computing Resource
Enhancing Amber for Use on the Blue Waters High- Performance Computing Resource Daniel R. Roe a, Jason M. Swails b, Romelia Salomon-Ferrer c, Adrian E. Roitberg b, and Thomas E. Cheatham III a * a. Department
More informationFaster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs
Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs Christopher P. Stone, Ph.D. Computational Science and Engineering, LLC Kyle Niemeyer, Ph.D. Oregon State University 2 Outline
More informationGPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications
GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications Christopher Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, Wen-Mei W. Hwu University of Illinois at Urbana-Champaign
More informationMolecular Dynamics Simulation of Nanometric Machining Under Realistic Cutting Conditions Using LAMMPS
Molecular Dynamics Simulation of Nanometric Machining Under Realistic Cutting Conditions Using LAMMPS Rapeepan Promyoo Thesis Presentation Advisor: Dr. Hazim El-Mounayri Department of Mechanical Engineering
More informationCode Timings of a Bulk Liquid Simulation
1 Parallel Molecular Dynamic Code for Large Simulations using Truncated Octahedron Periodics M.M. Micci, L.N. Long and J.K. Little a a Aerospace Department, The Pennsylvania State University, 233 Hammond
More informationEfficient Molecular Dynamics on Heterogeneous Architectures in GROMACS
Efficient Molecular Dynamics on Heterogeneous Architectures in GROMACS Berk Hess, Szilárd Páll KTH Royal Institute of Technology GTC 2012 GROMACS: fast, scalable, free Classical molecular dynamics package
More informationNuclear Physics and Computing: Exascale Partnerships. Juan Meza Senior Scientist Lawrence Berkeley National Laboratory
Nuclear Physics and Computing: Exascale Partnerships Juan Meza Senior Scientist Lawrence Berkeley National Laboratory Nuclear Science and Exascale i Workshop held in DC to identify scientific challenges
More informationAccelerated Prediction of the Polar Ice and Global Ocean (APPIGO)
DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Accelerated Prediction of the Polar Ice and Global Ocean (APPIGO) Eric Chassignet Center for Ocean-Atmosphere Prediction
More informationPerformance Evaluation of MPI on Weather and Hydrological Models
NCAR/RAL Performance Evaluation of MPI on Weather and Hydrological Models Alessandro Fanfarillo elfanfa@ucar.edu August 8th 2018 Cheyenne - NCAR Supercomputer Cheyenne is a 5.34-petaflops, high-performance
More informationarxiv: v1 [hep-lat] 7 Oct 2010
arxiv:.486v [hep-lat] 7 Oct 2 Nuno Cardoso CFTP, Instituto Superior Técnico E-mail: nunocardoso@cftp.ist.utl.pt Pedro Bicudo CFTP, Instituto Superior Técnico E-mail: bicudo@ist.utl.pt We discuss the CUDA
More informationShear Properties and Wrinkling Behaviors of Finite Sized Graphene
Shear Properties and Wrinkling Behaviors of Finite Sized Graphene Kyoungmin Min, Namjung Kim and Ravi Bhadauria May 10, 2010 Abstract In this project, we investigate the shear properties of finite sized
More informationHow to deal with uncertainties and dynamicity?
How to deal with uncertainties and dynamicity? http://graal.ens-lyon.fr/ lmarchal/scheduling/ 19 novembre 2012 1/ 37 Outline 1 Sensitivity and Robustness 2 Analyzing the sensitivity : the case of Backfilling
More informationDirect Molecular Simulation of Nonequilibrium Flows
Direct Molecular Simulation of Nonequilibrium Flows Thomas E. Schwartzentruber Aerospace Engineering and Mechanics University of Minnesota AFOSR Young Investigator Program Grant # FA9550-10-1-0075 Direct
More informationStructure of the First and Second Neighbor Shells of Water: Quantitative Relation with Translational and Orientational Order.
Structure of the First and Second Neighbor Shells of Water: Quantitative Relation with Translational and Orientational Order Zhenyu Yan, Sergey V. Buldyrev,, Pradeep Kumar, Nicolas Giovambattista 3, Pablo
More informationGPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic
GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic Jan Verschelde joint work with Xiangcheng Yu University of Illinois at Chicago
More informationArizona Cloud Seeding Efforts: A Salt River Project Perspective. James Walter SRP Surface Water Resources CPWAC/CPWP Joint Meeting March 30, 2018
Arizona Cloud Seeding Efforts: A Salt River Project Perspective James Walter SRP Surface Water Resources Importance of Winter Precipitation Winter Cloud Seeding 101 Early Cloud Seeding Projects Questions
More information11 Parallel programming models
237 // Program Design 10.3 Assessing parallel programs 11 Parallel programming models Many different models for expressing parallelism in programming languages Actor model Erlang Scala Coordination languages
More informationGPU Computing Activities in KISTI
International Advanced Research Workshop on High Performance Computing, Grids and Clouds 2010 June 21~June 25 2010, Cetraro, Italy HPC Infrastructure and GPU Computing Activities in KISTI Hongsuk Yi hsyi@kisti.re.kr
More informationLocal Hyperdynamics. Arthur F. Voter Theoretical Division Los Alamos National Laboratory Los Alamos, NM, USA
Local Hyperdynamics Arthur F. Voter Theoretical Division National Laboratory, NM, USA Summer School in Monte Carlo Methods for Rare Events Division of Applied Mathematics Brown University Providence, RI
More informationHigh-Performance Computing, Planet Formation & Searching for Extrasolar Planets
High-Performance Computing, Planet Formation & Searching for Extrasolar Planets Eric B. Ford (UF Astronomy) Research Computing Day September 29, 2011 Postdocs: A. Boley, S. Chatterjee, A. Moorhead, M.
More informationEfficient implementation of the overlap operator on multi-gpus
Efficient implementation of the overlap operator on multi-gpus Andrei Alexandru Mike Lujan, Craig Pelissier, Ben Gamari, Frank Lee SAAHPC 2011 - University of Tennessee Outline Motivation Overlap operator
More informationUnraveling the mysteries of quarks with hundreds of GPUs. Ron Babich NVIDIA
Unraveling the mysteries of quarks with hundreds of GPUs Ron Babich NVIDIA Collaborators and QUDA developers Kip Barros (LANL) Rich Brower (Boston University) Mike Clark (NVIDIA) Justin Foley (University
More informationPerm State University Research-Education Center Parallel and Distributed Computing
Perm State University Research-Education Center Parallel and Distributed Computing A 25-minute Talk (S4493) at the GPU Technology Conference (GTC) 2014 MARCH 24-27, 2014 SAN JOSE, CA GPU-accelerated modeling
More informationIntroduction to Benchmark Test for Multi-scale Computational Materials Software
Introduction to Benchmark Test for Multi-scale Computational Materials Software Shun Xu*, Jian Zhang, Zhong Jin xushun@sccas.cn Computer Network Information Center Chinese Academy of Sciences (IPCC member)
More informationAccelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers
UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric
More informationParallel Eigensolver Performance on High Performance Computers
Parallel Eigensolver Performance on High Performance Computers Andrew Sunderland Advanced Research Computing Group STFC Daresbury Laboratory CUG 2008 Helsinki 1 Summary (Briefly) Introduce parallel diagonalization
More informationCyclops Tensor Framework
Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r
More informationTuning And Understanding MILC Performance In Cray XK6 GPU Clusters. Mike Showerman, Guochun Shi Steven Gottlieb
Tuning And Understanding MILC Performance In Cray XK6 GPU Clusters Mike Showerman, Guochun Shi Steven Gottlieb Outline Background Lattice QCD and MILC GPU and Cray XK6 node architecture Implementation
More informationScalable and Power-Efficient Data Mining Kernels
Scalable and Power-Efficient Data Mining Kernels Alok Choudhary, John G. Searle Professor Dept. of Electrical Engineering and Computer Science and Professor, Kellogg School of Management Director of the
More informationComputer Simulation of Lignocellulosic Biomass
Computer Simulation of Lignocellulosic Biomass Loukas Petridis UT/ORNL Center for Molecular Biophysics Oak Ridge National Laboratory Cellulosic Ethanol Production Ligocellulosic Biomass Cellulose Hemicellulose
More informationHow is molecular dynamics being used in life sciences? Davide Branduardi
How is molecular dynamics being used in life sciences? Davide Branduardi davide.branduardi@schrodinger.com Exploring molecular processes with MD Drug discovery and design Protein-protein interactions Protein-DNA
More informationWelcome to MCS 572. content and organization expectations of the course. definition and classification
Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson
More informationApplications of Mathematical Economics
Applications of Mathematical Economics Michael Curran Trinity College Dublin Overview Introduction. Data Preparation Filters. Dynamic Stochastic General Equilibrium Models: Sunspots and Blanchard-Kahn
More information