Piz Daint & Piz Kesch: from general purpose supercomputing to an appliance for weather forecasting. Thomas C. Schulthess


Piz Daint: a Cray XC30 with 5,272 hybrid, GPU-accelerated compute nodes.
Compute node:
- Host: Intel Xeon E5-2670 (Sandy Bridge, 8 cores)
- Accelerator: NVIDIA K20X GPU (GK110)

Piz Kesch in the press: "Today's Outlook: GPU-accelerated Weather Forecasting", John Russell, September 15, 2015.

Swiss High-Performance Computing & Networking (HPCN) Initiative

Three-pronged approach of the HPCN Initiative:
1. New, flexible, and efficient building
2. Efficient supercomputers
3. Efficient applications

High-risk & high-impact projects (www.hp2c.ch): application-driven co-design of a pre-exascale supercomputing ecosystem.

[Timeline figure, 2009 to 2017: begin construction of new building; new building complete; Monte Rosa, Cray XT5, 14,762 cores; hex-core upgrade, 22,128 cores; Cray XE6, 47,200 cores; development & procurement of petaflop/s-scale supercomputer(s); Phase I: Aries network & multi-core; upgrade to Phase II: K20X-based hybrid; Pascal-based hybrid upgrade.]

Platform for Advanced Scientific Computing (PASC)

Structuring project of the Swiss University Conference (swissuniversities):
- 5 domain science networks: Climate, Materials Simulations, Life Sciences, Physics, Solid Earth Dynamics
- Distributed application support
- More than 20 projects (see www.pasc-ch.org):
  1. ANSWERS
  2. Angiogenesis
  3. AV-FLOPW
  4. CodeWave
  5. Coupled Cardiac Simulations
  6. DIAPHANE
  7. Direct GPU to GPU com.
  8. Electronic Structure Calc.
  9. ENVIRON
  10. Genomic Data Processing
  11. GeoPC
  12. GeoScale
  13. Grid Tools
  14. Heterogen. Compiler Platform
  15. HPC-ABGEM
  16. MD-based drug design
  17. Multiscale applications
  18. Multiscale economical data
  19. Particles and fields
  20. Snowball sampling

Leutwyler, D., O. Fuhrer, X. Lapillonne, D. Lüthi, and C. Schär, 2015: Continental-Scale Climate Simulation at Kilometer Resolution. ETH Zurich Online Resource, DOI: http://dx.doi.org/10.3929/ethz-a-010483656, online video: http://vimeo.com/136588806

MeteoSwiss production suite until March 30, 2016:
- ECMWF: 2x per day, 16 km lateral grid, 91 layers
- COSMO-7: 3x per day, 72 h forecast, 6.6 km lateral grid, 60 layers
- COSMO-2: 8x per day, 24 h forecast, 2.2 km lateral grid, 60 layers

Some of the products generated from these simulations:
- Daily weather forecast on TV / radio
- Forecasting for air traffic control (Skyguide)
- Safety management in the event of nuclear incidents

Albis & Lema: CSCS production systems for MeteoSwiss until March 2016. Cray XE6, procured in spring 2012, based on 12-core AMD Opteron multi-core processors.

Improving simulation quality requires higher performance: what exactly, and by how much?

Resource-determining factors for MeteoSwiss simulations:

Current model (running through spring 2016):
- COSMO-2: 24 h forecast running in 30 min., 8x per day

New models (starting operation in spring 2016):
- COSMO-1: 24 h forecast running in 30 min., 8x per day (~10x COSMO-2)
- COSMO-2E: 21-member ensemble, 120 h forecast in 150 min., 2x per day (~26x COSMO-2)
- KENDA: 40-member ensemble, 1 h forecast in 15 min., 24x per day (~5x COSMO-2)

The new production system must deliver ~40x the simulation performance of Albis and Lema.
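
A rough way to see where the ~40x comes from (my reading of the factors quoted above, not a derivation from the slides): COSMO-1 halves the horizontal grid spacing of COSMO-2, which alone costs roughly a factor 8 (twice the grid points in each horizontal direction and about twice the number of time steps), consistent with the quoted ~10x; summing the per-system factors relative to a single COSMO-2 stream then gives approximately the stated total:

\[
\underbrace{10}_{\text{COSMO-1}} + \underbrace{26}_{\text{COSMO-2E}} + \underbrace{5}_{\text{KENDA}} \approx 41 \approx 40 \times \text{COSMO-2}
\]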

State-of-the-art implementation of a new system for MeteoSwiss:
- Albis & Lema: 3 cabinets of Cray XE6, installed Q2/2012
- The new system needs to be installed Q2-Q3/2015
- Assuming a 2x improvement in per-socket performance, ~20x more x86 sockets would be required, i.e. about 30 Cray XC cabinets
- That is what the new MeteoSwiss system would look like if built the way the German Weather Service (DWD), the UK Met Office, or ECMWF built theirs (~30 XC racks)
- The CSCS machine room hosting the current Cray XC30/XC40 platform has space for about 40 XC racks
- Thinking inside the box is not a good option!
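
The socket count quoted above is simple arithmetic, under the assumption that performance scales linearly with the number of sockets:

\[
\frac{\text{required speed-up}}{\text{per-socket gain}} \approx \frac{40}{2} = 20 \times \text{ more x86 sockets}
\]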

COSMO: old and new (refactored) code

Current code (used by most weather services, incl. MeteoSwiss until 3/2016, as well as most HPC centres):
- main (Fortran)
- dynamics (Fortran)
- physics (Fortran)
- shared infrastructure: boundary conditions & halo exchange over MPI
- target: x86 systems

New, refactored code (HP2C/PASC development; in production on Piz Daint since 01/2014 and for MeteoSwiss since 04/2016):
- main (Fortran)
- dynamics (C++) built on a stencil library with x86 and GPU back ends
- physics (Fortran) with OpenMP / OpenACC
- shared infrastructure: boundary conditions & halo exchange via a generic communication library (MPI or whatever the system provides)
- target: x86 and GPU systems

A sketch of the stencil abstraction that the rewritten dynamics relies on follows below.
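
To make the "stencil library" idea concrete, here is a minimal, self-contained sketch of a stencil written once and executed by an exchangeable backend. The names (Field, Laplacian, CpuBackend) are illustrative assumptions and do not reflect the actual STELLA/GridTools API used in COSMO; a real GPU backend would launch a CUDA kernel instead of the CPU loop nest shown here.

```cpp
// Minimal sketch of the stencil-library idea: the stencil is written once,
// and a backend decides how the loop nest is executed (plain CPU here;
// a real library would also provide a GPU backend).
// Names are illustrative, not the STELLA/GridTools API used in COSMO.
#include <cstddef>
#include <iostream>
#include <vector>

struct Field {
    std::size_t ni, nj;
    std::vector<double> data;
    Field(std::size_t ni_, std::size_t nj_) : ni(ni_), nj(nj_), data(ni_ * nj_, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * nj + j]; }
    double  operator()(std::size_t i, std::size_t j) const { return data[i * nj + j]; }
};

// The stencil itself: a 5-point Laplacian, independent of how it is executed.
struct Laplacian {
    void operator()(const Field& in, Field& out, std::size_t i, std::size_t j) const {
        out(i, j) = in(i - 1, j) + in(i + 1, j)
                  + in(i, j - 1) + in(i, j + 1) - 4.0 * in(i, j);
    }
};

// One possible backend: a straightforward CPU loop nest over interior points.
struct CpuBackend {
    template <class Stencil>
    void apply(const Stencil& st, const Field& in, Field& out) const {
        for (std::size_t i = 1; i + 1 < in.ni; ++i)
            for (std::size_t j = 1; j + 1 < in.nj; ++j)
                st(in, out, i, j);
    }
};

int main() {
    Field in(8, 8), out(8, 8);
    in(4, 4) = 1.0;  // a single "bump" in the middle of the domain
    CpuBackend{}.apply(Laplacian{}, in, out);
    std::cout << "laplacian at (3,4): " << out(3, 4) << "\n";  // expect  1
    std::cout << "laplacian at (4,4): " << out(4, 4) << "\n";  // expect -4
}
```

The value of this separation is that the dynamical core is written once against the stencil interface, while the choice of x86 or GPU execution is made by swapping the backend rather than touching the science code.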

Piz Kesch / Piz Escha: appliance for meteorology
- Water-cooled rack (48U)
- 12 compute nodes, each with:
  - 2 Intel Xeon E5-2690v3 (12 cores @ 2.6 GHz)
  - 256 GB of 2133 MHz DDR4 memory
  - 8 NVIDIA Tesla K80 GPUs
- 3 login nodes
- 5 post-processing nodes
- Mellanox FDR InfiniBand
- Cray CLFS Lustre storage
- Cray Programming Environment
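
Reading the node specification above literally (each compute node carrying 8 K80 boards, and each K80 being a dual-GPU board), the per-rack accelerator count works out to:

\[
12 \text{ nodes} \times 8 \text{ K80/node} = 96 \text{ K80 boards} = 192 \text{ GK210 GPUs per rack}
\]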

Origin of the factor ~40 performance improvement

Performance of COSMO running on the new Piz Kesch compared (in Sept. 2015) to:
(1) the previous production system, a Cray XE6 with AMD Barcelona (installed in 2012)
(2) Piz Dora, a Cray XC40 with Intel Haswell (E5-2690v3)
The new Piz Kesch / Piz Escha were installed in 2015.

Breakdown of the improvement (Moore's Law vs. software refactoring):
- Processor performance: 2.8x
- Improved system utilisation: 2.8x
- General software performance: 1.7x
- Port to GPU architecture: 2.3x
- Increase in number of processors: 1.3x
- Total performance improvement: ~40x

Bonus: the simulation running on GPUs is 3x more energy efficient than on conventional state-of-the-art CPUs.
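
As a quick sanity check (my arithmetic, treating the listed contributions as multiplicative), the individual factors do combine to roughly the quoted total:

\[
2.8 \times 2.8 \times 1.7 \times 2.3 \times 1.3 \approx 39.8 \approx 40
\]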

A factor-40 improvement with the same footprint. Current production system: Albis & Lema. New system: Piz Kesch & Piz Escha.

[Architecture roadmap figure, timeline 2011 to 2017+ (DARPA HPCS): upcoming systems grouped by architecture type: GPU-accelerated hybrid (Summit, Tsubame 3.0, MeteoSwiss 2016), Xeon Phi accelerated (Aurora), multi-core (Post-K, U. Tokyo).]

Both accelerated architectures (GPU and Xeon Phi) have heterogeneous memory!