Nuclear Physics and Computing: Exascale Partnerships. Juan Meza, Senior Scientist, Lawrence Berkeley National Laboratory


Nuclear Science and Exascale
- Workshop held in Washington, DC, to identify scientific challenges in nuclear physics
- Identify areas that could benefit from extreme-scale computing
- Hosted by DOE Advanced Scientific Computing Research and Nuclear Physics

Five Major Areas
- Nuclear Structure and Nuclear Reactions
- Nuclear Forces and Cold QCD
- Nuclear Astrophysics
- Hot and Dense QCD
- Accelerator Physics

Some of the report's findings
- 19 Priority Research Directions identified
- Extreme-scale computing will provide the computational resources required to perform calculations that will unify nuclear physics research
- Foster collaborations between nuclear physicists, computational scientists, applied mathematicians, and physicists outside of nuclear physics

Cold QCD and Nuclear Forces
- Spectrum of QCD
- How QCD makes a proton
- From QCD to nuclei
- Fundamental symmetries
- Linear solvers for the lattice Dirac equation (a solver sketch follows this list)
- Deflation techniques and other preconditioners for the Dirac operator
- High-precision algorithms
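The linear-solver item above is the kind of kernel a Krylov method addresses. Below is a minimal sketch of conjugate gradient on a sparse Hermitian positive-definite system, standing in for the normal equations of a lattice Dirac solve; the operator, right-hand side, and tolerance are all illustrative stand-ins, not a real Wilson or staggered Dirac operator.

```python
# Minimal sketch: conjugate gradient on a sparse Hermitian positive-definite
# system, standing in for the normal equations D^dagger D x = D^dagger b that
# arise in lattice Dirac solves.  The operator below is a generic sparse
# matrix, not an actual Wilson/staggered Dirac operator.
import numpy as np
import scipy.sparse as sp

def cg(A, b, tol=1e-8, maxiter=1000):
    """Plain conjugate gradient; A must be Hermitian positive definite."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = np.vdot(r, r).real
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs_old / np.vdot(p, Ap).real
        x += alpha * p
        r -= alpha * Ap
        rs_new = np.vdot(r, r).real
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Stand-in operator: a 1-D discrete Laplacian shifted to be positive definite.
n = 1000
A = sp.diags([-1, 2.1, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.random.default_rng(0).standard_normal(n)
x = cg(A, b)
print("residual:", np.linalg.norm(b - A @ x))
```

Production lattice codes replace the plain iteration with preconditioned or multigrid-accelerated variants, but the data-access pattern (repeated sparse matrix-vector products plus global reductions) is the same, which is what makes these solvers sensitive to the architecture trends discussed later.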

Nuclear Structure and Nuclear Reactions
- Predictive capability for the entire periodic table
- ab initio calculations of light nuclei and their reactions
- Green's Function Monte Carlo
- ab initio no-core shell model, extended by the resonating group method (NCSM/RGM)
- Coupled cluster method

Nuclear Astrophysics
- Nonlinear algebraic equations
- Adaptive mesh refinement
- Sparse, structured linear systems of equations, with physics-based preconditioners
- Nonlinear, stiff, coupled, and large systems of ODEs (a small integrator sketch follows this list)
- Need multi-core versions of all of these
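Stiff, coupled ODE systems of this kind are normally handled with implicit integrators. The sketch below uses SciPy's BDF method on the classic Robertson chemical-kinetics problem purely as a stand-in; a real nuclear reaction network has far more species and temperature-dependent rates.

```python
# Minimal sketch: implicit (BDF) integration of a stiff ODE system.
# The Robertson kinetics problem stands in for a nuclear reaction network,
# which would couple many more species with temperature-dependent rates.
import numpy as np
from scipy.integrate import solve_ivp

def robertson(t, y):
    y1, y2, y3 = y
    return [-0.04 * y1 + 1e4 * y2 * y3,
             0.04 * y1 - 1e4 * y2 * y3 - 3e7 * y2 ** 2,
             3e7 * y2 ** 2]

sol = solve_ivp(robertson, (0.0, 1e5), [1.0, 0.0, 0.0],
                method="BDF", rtol=1e-6, atol=1e-10)
print("steps taken:", sol.t.size, "final state:", sol.y[:, -1])
```

An explicit method would need step sizes set by the fastest reaction and take orders of magnitude more steps; the implicit solve trades that for small linear systems at every step, which is where the sparse-solver and preconditioner needs above come from.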

Hot and Dense QCD
- Inversion of the fermion matrix, where the non-zero entries fluctuate significantly during the calculation of new field configurations
- Need for deflation techniques, domain decomposition, and multigrid (a deflation sketch follows this list)
- Notorious sign problem
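Deflation here means removing the lowest eigenmodes of the operator from the system before handing the remainder to a Krylov solver, since those modes are what stall convergence. The sketch below shows the idea on a generic sparse Hermitian matrix; the eigensolver, projector, and CG call are stand-ins for the tuned kernels a lattice code would use.

```python
# Minimal sketch: eigenvector deflation for a Hermitian positive-definite
# operator.  The low modes are inverted exactly in their subspace and a
# Krylov solver handles the deflated remainder.  Everything here is a
# stand-in for production lattice QCD kernels.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh, cg

n = 2000
A = sp.diags([-1, 2.05, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.random.default_rng(1).standard_normal(n)

# Approximate the k lowest eigenpairs (the "slow" modes that stall CG).
k = 20
vals, V = eigsh(A, k=k, sigma=0, which="LM")   # shift-invert around 0

# Exact solve in the deflation subspace, Krylov solve on the remainder.
x_low = V @ ((V.T @ b) / vals)   # A^{-1} restricted to span(V)
r = b - A @ x_low                # residual after removing the low modes
x_rest, info = cg(A, r)
x = x_low + x_rest
print("converged:", info == 0, "residual:", np.linalg.norm(b - A @ x))
```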

Accelerator Physics
- Maximize the production, efficiency, and beam purity of rare isotope beams
- Optimal design for an electron-ion collider
- Design optimization of complex electromagnetic structures
- Advanced methods for accelerator simulation

Sampling of Math/CS needs
- Solution of systems of linear equations
- Deflation techniques, domain decomposition, multigrid, effective preconditioners
- Monte Carlo for high-dimensional integrals (a small sketch follows this list)
- Nonlinear algebraic equations
- Adaptive mesh refinement
- Nonlinear, stiff, coupled, and large systems of ODEs
- Global optimization, multi-objective optimization
- Multicore algorithms
- High-precision algorithms
- Data management and visualization
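One item in the list, Monte Carlo for high-dimensional integrals, is easy to illustrate: the error of a plain Monte Carlo estimate falls as 1/sqrt(N) independent of dimension, which is why it survives where tensor-product quadrature grids do not. The integrand and dimension below are arbitrary choices made so that the exact answer is known; they are not taken from the report.

```python
# Minimal sketch: plain Monte Carlo estimate of a 20-dimensional integral.
# The integrand is a separable Gaussian over the unit hypercube, chosen only
# because its exact value is known; the statistical error scales as
# 1/sqrt(N) regardless of dimension.
import numpy as np
from math import erf, pi, sqrt

d, N = 20, 200_000
rng = np.random.default_rng(42)
x = rng.random((N, d))                       # uniform samples in [0,1]^d
f = np.exp(-np.sum((x - 0.5) ** 2, axis=1))  # integrand values

estimate = f.mean()
std_err = f.std(ddof=1) / sqrt(N)
exact = (sqrt(pi) * erf(0.5)) ** d           # product of 1-D Gaussian integrals
print(f"MC estimate {estimate:.5f} +/- {std_err:.5f}, exact {exact:.5f}")
```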

Computer Architecture Trends

37th List: The TOP10
1. K Computer (Fujitsu SPARC64 VIIIfx 2.0 GHz, Tofu Interconnect), RIKEN Advanced Institute for Computational Science, Japan: 548,352 cores, Rmax 8.162 Pflop/s, 9.90 MW
2. Tianhe-1A (NUDT TH MPP, Xeon 6C, NVidia, FT-1000 8C), National SuperComputer Center in Tianjin, China: 186,368 cores, Rmax 2.566 Pflop/s, 4.04 MW
3. Jaguar (Cray XT5, HC 2.6 GHz), Oak Ridge National Laboratory, USA: 224,162 cores, Rmax 1.759 Pflop/s, 6.95 MW
4. Nebulae (Dawning TC3600 Blade, Intel X5650, NVidia Tesla C2050 GPU), National Supercomputing Centre in Shenzhen, China: 120,640 cores, Rmax 1.271 Pflop/s, 2.58 MW
5. TSUBAME-2 (NEC/HP ProLiant, Xeon 6C, NVidia, Linux/Windows), GSIC, Tokyo Institute of Technology, Japan: 73,278 cores, Rmax 1.192 Pflop/s, 1.40 MW
6. Cielo (Cray XE6, 8C 2.4 GHz), DOE/NNSA/LANL/SNL, USA: 142,272 cores, Rmax 1.110 Pflop/s, 3.98 MW
7. Pleiades (SGI Altix ICE 8200EX/8400EX), NASA/Ames Research Center/NAS, USA: 111,104 cores, Rmax 1.088 Pflop/s, 4.10 MW
8. Hopper (Cray XE6, 6C 2.1 GHz), DOE/SC/LBNL/NERSC, USA: 153,408 cores, Rmax 1.054 Pflop/s, 2.91 MW
9. Tera 100 (Bull bullx super-node S6010/S6030), Commissariat a l'Energie Atomique (CEA), France: 138,368 cores, Rmax 1.050 Pflop/s, 4.59 MW
10. Roadrunner (IBM BladeCenter QS22/LS21), DOE/NNSA/LANL, USA: 122,400 cores, Rmax 1.042 Pflop/s, 2.34 MW
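The power-efficiency slide below can be read straight off this table: Rmax divided by power gives Gflops per kW. A small script sketching that calculation for a few of the entries (numbers copied from the table; the selection is only illustrative):

```python
# Quick sanity check on the Power Efficiency slide: Rmax / Power from the
# TOP10 table above, converted to Gflops per kW.  Values are copied from the
# table; only a few entries are shown for brevity.
systems = {
    "K Computer": (8.162, 9.90),   # (Rmax in Pflop/s, Power in MW)
    "Tianhe-1A":  (2.566, 4.04),
    "Jaguar":     (1.759, 6.95),
    "TSUBAME-2":  (1.192, 1.40),
}
for name, (rmax_pflops, power_mw) in systems.items():
    gflops_per_kw = (rmax_pflops * 1e6) / (power_mw * 1e3)
    print(f"{name:12s} {gflops_per_kw:7.0f} Gflops/kW")
```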

Projected Performance Development [chart: TOP500 performance development and projection, 1994 through about 2020, from 100 Mflop/s up to 1 Eflop/s, for the list sum (SUM), the #1 system (N=1), and the #500 system (N=500)]. Top 500 slides courtesy of Erich Strohmaier, LBNL

Power Consumption [chart: average power (MW) of the TOP10, TOP50, and TOP500 systems, 2008-2011]. Top 500 slides courtesy of Erich Strohmaier, LBNL

Power Efficiency [chart: Linpack performance per power (Gflops/kW) of the TOP10, TOP50, and TOP500 systems, 2008-2011]. Top 500 slides courtesy of Erich Strohmaier, LBNL

Cores per Socket [chart: number of TOP500 systems by cores per socket (1, 2, 4, 6, 8, 9, 10, 12, 16), 2002-2011]. Top 500 slides courtesy of Erich Strohmaier, LBNL

Computing Performance Improvements will be Harder than Ever
- We used to rely on processor speed increases plus parallelism
- Single processors are not getting faster
- The key challenge is energy
[chart: transistor count (thousands), clock frequency (MHz), power (W), and cores per chip, 1970-2010]
- Future: all performance gains come from added concurrency, including new on-chip concurrency

New Processors Mean New Software
[chart: power breakdown across interconnect, memory, and processors for usual processors (130 watts) versus manycore processors (75 watts)]
- Exascale systems will be built from chips with thousands of tiny processor cores
- The architecture (how they will be organized) is still an R&D problem, but likely a mixture of core types
- They will require a different kind of programming and new software

Memory and Network Also Important
[chart: power breakdown across interconnect, memory, and processors for the usual memory + network (75 megawatts) versus a new memory + network (25 megawatts)]
- Memory is as important as processors in energy
- Requires basic R&D to lower energy use, via memory stacking and other innovations
- True for all computational problems, but especially data-intensive ones

Math and Computing Science Trends

Strong Scaling Drives Change in Algorithm Requirements
- Parallel computing has thrived on weak scaling for the past 15 years
- Flat CPU performance increases the emphasis on strong scaling
- The focus on strong scaling will change requirements:
  - Concurrency: will double every 18 months
  - Implicit methods: improve time-to-solution (pay for the allreduce)
  - Multiscale/AMR methods: only apply computation where it is required (need better approaches for load balancing)
Slide courtesy of John Shalf, LBNL

Exascale Machines Require New Algorithms
- Even with innovations: arithmetic (flops) is free, data movement is expensive (in time and energy)
- This is a new model for algorithms and applied mathematics on these machines
- Algorithms must avoid data movement, a significant change from the tradition of counting arithmetic operations (a small traffic-counting sketch follows this list)
- Exascale will enable new science problems, which will also require new algorithms
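A concrete way to see the "flops are free, data movement is expensive" point is to count memory traffic for matrix multiplication: a naive triple loop re-reads one of the matrices from memory for every output row, while a cache-blocked version reuses each tile many times before evicting it. The sketch below only counts traffic under a simple one-level-cache model; the problem size and block size are illustrative, not measurements of any real machine.

```python
# Minimal sketch: estimated main-memory traffic for naive vs. blocked n x n
# matrix multiply under a simple one-level-cache model.  The flop count is
# identical (2 n^3); only the data movement changes, which is the argument
# for communication-avoiding algorithm design.  Sizes are illustrative only.
def traffic_words(n, block=None):
    if block is None:
        # Naive ijk loop: B is streamed from memory once per row of C.
        return n * n * n + 2 * n * n
    # Blocked: each tile-level multiplication loads three block x block
    # tiles, giving roughly 3 n^3 / block words of traffic overall.
    return 3 * n ** 3 // block + 2 * n * n

n, block = 4096, 256
naive, blocked = traffic_words(n), traffic_words(n, block)
flops = 2 * n ** 3
print(f"flops: {flops:.3e}")
print(f"naive traffic:   {naive:.3e} words (arithmetic intensity {flops/naive:.1f})")
print(f"blocked traffic: {blocked:.3e} words (arithmetic intensity {flops/blocked:.1f})")
```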

Programming Models
- Programming languages are a reflection of the underlying hardware
- That model is rapidly diverging from our current machine model
- If the programming model doesn't reflect the costs on the underlying machine, then we cannot fix this problem
- Important language features for the future:
  - Hierarchical data layout statements
  - Loops that execute based on data locality (not thread number)
  - Support for non-SPMD programming models
  - Annotations to make functional guarantees
  - Abstractions for disjoint memory spaces
Slide courtesy of John Shalf, LBNL

Predictive Computational Science
- Accuracy + Uncertainty Quantification = Predictive
- High-fidelity algorithms, high-resolution models
- Quantification of uncertainty due to mathematical models, input parameters, algorithms, and codes (a toy sampling sketch follows this list)
- This is the first necessary step in validating simulations
- Need to develop capability in UQ and Verification & Validation, and tools to facilitate the workflows
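As a toy illustration of quantifying uncertainty due to input parameters: sample the uncertain inputs, push each sample through the model, and report the spread of the output. The "model" below is a deliberately trivial stand-in (a damped oscillation with two uncertain parameters); real UQ work would wrap the actual simulation code and likely use smarter sampling or surrogate models.

```python
# Minimal sketch: forward uncertainty propagation by Monte Carlo sampling.
# The "model" is a trivial stand-in (damped oscillation with two uncertain
# parameters); a real study would wrap the actual simulation and use more
# careful sampling or surrogates instead of brute force.
import numpy as np

def model(decay_rate, frequency, t=10.0):
    return np.exp(-decay_rate * t) * np.cos(frequency * t)

rng = np.random.default_rng(7)
n_samples = 10_000
decay = rng.normal(0.10, 0.01, n_samples)   # uncertain input 1
freq = rng.normal(2.00, 0.05, n_samples)    # uncertain input 2

outputs = model(decay, freq)
lo, hi = np.percentile(outputs, [2.5, 97.5])
print(f"mean output {outputs.mean():+.4f}, 95% interval [{lo:+.4f}, {hi:+.4f}]")
```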

Data-Enabled Science
- Many scientific problems are now rate-limited by access to and analysis of data
- Climate models must integrate petascale data sets
- Genome sequencing is outpacing computing and algorithms
- Observational data from instruments will also outpace current methods
- Data mining and analysis methods will not scale with the growth of current data sets
- Need to develop capabilities to rapidly access, search, and mine datasets

Summary of Challenges to Exascale
1. System power is the primary constraint
2. Concurrency (1000x today)
3. Memory bandwidth and capacity are not keeping pace
4. Processor architecture is an open question
5. Software needs to change to match the architecture
6. Algorithms need to minimize data movement, not flops
7. I/O and data analytics also need to keep pace
8. Reliability and resiliency will be critical at this scale
Unlike the last 20 years, most of these (1-7) are equally important across scales, e.g., for 100 10-PF machines

Final thoughts
"Nuclear physicists are benefitting greatly from interactions and collaborations with computer scientists, and such collaborations must be further embraced and strengthened as researchers move toward the era of extreme computing."
Scientific Grand Challenges: Forefront Questions in Nuclear Science and the Role of Computing at the Extreme Scale, 2009

Questions to jump-start discussion
- How far can we go by just fine-tuning our existing algorithms for new architectures?
- Will we have to change the way we program large simulations?
- Can we reproduce the results of simulations that use 1, 10, or 100 million cores? And if we can reproduce them, how do we quantify the uncertainty in the results? (A small floating-point example follows.)
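On the reproducibility question: even a simple global sum is not bit-wise reproducible once the reduction order changes with the number of cores, because floating-point addition is not associative. The tiny illustration below only simulates different reduction orders in serial; no parallel run is involved, and the chunk counts are arbitrary.

```python
# Tiny illustration of why bit-wise reproducibility is hard: floating-point
# addition is not associative, so the same global sum computed with different
# reduction orders (as happens when the core count changes) can differ in the
# last digits.  This simulates the reordering serially.
import numpy as np

rng = np.random.default_rng(3)
values = rng.standard_normal(1_000_000) * 1e6

def tree_sum(x, n_chunks):
    """Sum via n_chunks partial sums, mimicking a reduction over n_chunks ranks."""
    partials = [chunk.sum() for chunk in np.array_split(x, n_chunks)]
    return float(np.sum(partials))

for ranks in (1, 10, 1000, 100_000):
    print(f"{ranks:>7d} 'ranks': {tree_sum(values, ranks):.6f}")
```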

Backup Slides