Exascale challenges for Numerical Weather Prediction: the ESCAPE project

1 Exascale challenges for Numerical Weather Prediction: the ESCAPE project. Olivier Marsden. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No

2 European Centre for Medium-Range Weather Forecasts: an independent intergovernmental organisation established in 1975, with 19 Member States and 15 Co-operating States.

3 The success story of Numerical Weather Prediction: hurricanes. May be one of the best medium-range forecasts of all time!

4 NWP: Benefit of high resolution. [Figure: Hurricane Sandy, 28 Oct 2012. Mean sea-level pressure: analysis for 30 Oct alongside 5-day forecasts at T3999, T1279 and T639; precipitation: NEXRAD, 27 Oct; 3-day and 4-day forecasts of wave height, mean sea-level pressure and 10 m wind speed at T639, T1279 and T3999.]

5 What is the challenge?
Observations today: volume 20 million = 2 x 10^7; 98% from 60 different satellite instruments; physical parameters of atmosphere, waves, ocean.
Models today: 5 million grid points x 100 levels x 10 prognostic variables = 5 x 10^9 values per time step.
Observations tomorrow: volume 200 million = 2 x 10^8; 98% from 80 different satellite instruments; physical and chemical parameters of atmosphere, waves, ocean, ice, vegetation.
Models tomorrow: 500 million grid points x 200 levels x 100 prognostic variables = 1 x 10^13 values per time step.
Growth: a factor of 10 per day in observation volume, a factor of 2000 per time step in model data.
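A quick consistency check of the model figures above: 5 x 10^6 columns x 100 levels x 10 variables = 5 x 10^9 today, and 5 x 10^8 columns x 200 levels x 100 variables = 1 x 10^13 tomorrow, i.e. the quoted factor of 2000 per time step, while the observation volume grows from 2 x 10^7 to 2 x 10^8, the quoted factor of 10.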

6 AVEC forecast model intercomparison, 13 km case: speed normalized to the operational threshold (8.5 minutes per forecast day). [Figure: fraction of operational threshold versus number of Edison cores (CRAY XC-30) for IFS, NMM-UJ, FV3 (single precision), FV3 (double precision), NIM, MPAS and NEPTUNE, with the 13 km operational threshold marked.] [Michalakes et al. 2015: AVEC Report: NGGPS level-1 benchmarks and software evaluation]

7 AVEC forecast model intercomparison, 3 km case: speed normalized to the operational threshold (8.5 minutes per forecast day). [Figure: fraction of operational threshold versus number of Edison cores (CRAY XC-30) for IFS, NMM-UJ, FV3 (single precision), FV3 (double precision), NIM, NIM with improved MPI comms, MPAS and NEPTUNE, with the 3 km operational threshold marked.] The Advanced Computing Evaluation Committee (AVEC) was set up to evaluate the HPC performance of five Next Generation Global Prediction System candidates to meet operational forecast requirements at the National Weather Service.

8 Technology applied at ECMWF for the last 30 years: a spectral-transform, semi-Lagrangian, semi-implicit (compressible) hydrostatic model. How long can ECMWF continue to run such a model? IFS data assimilation and model must EACH run in under ONE HOUR for a 10-day global forecast.
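Purely for scale, with an assumed (illustrative, not operational) time step of 240 s: a 10-day forecast is 864,000 s / 240 s = 3,600 steps, so keeping the model under one hour of wall-clock time leaves roughly one second per time step for the whole globe, transforms, physics and communication included.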

9 IFS today (MPI + OpenMP parallel). IFS = Integrated Forecasting System.

10 Predicted 2.5 km model scaling on an XC-30. [Figure: predicted scaling against the operational requirement, with power estimates of 2 MW to 6 MW for a single HRES forecast.] ECMWF currently runs two XC-30 clusters, each with 85K cores, and requires system capacity for up to 20 simultaneous HRES forecasts.

11 Numerical methods, code adaptation, architecture. ESCAPE* (Energy-efficient SCalable Algorithms for weather Prediction at Exascale): next-generation IFS numerical building blocks and compute-intensive algorithms; compute/energy efficiency diagnostics; new approaches and implementation on novel architectures; testing in operational configurations. *Funded by the EC H2020 framework, Future and Emerging Technologies High-Performance Computing. Partners: ECMWF, Météo-France, RMI, DMI, MeteoSwiss, DWD, Loughborough University, PSNC, ICHEC, Bull, NVIDIA, Optalysys.

12 Schematic description of the spectral transform method in the ECMWF IFS model. Grid-point space (semi-Lagrangian advection, physical parametrizations, products of terms) --FFT--> Fourier space --LT--> spectral space (horizontal gradients, semi-implicit calculations, horizontal diffusion) --inverse LT--> Fourier space --inverse FFT--> grid-point space. FFT: Fast Fourier Transform, LT: Legendre Transform.

13 Schematic description of the spectral transform dwarf in ESCAPE: the same grid-point space --FFT--> Fourier space --LT--> spectral space round trip and its inverse, repeated over 100 iterations. Time-stepping loop in dwarf1-atlas.f90:
  DO JSTEP=1,ITERS
    call trans%invtrans(spfields,gpfields)
    call trans%dirtrans(gpfields,spfields)
  ENDDO
FFT: Fast Fourier Transform, LT: Legendre Transform.

14 GPU-related work on this dwarf (work carried out by George Mozdzynski, ECMWF). An OpenACC port of a spectral transform test (transform_test.f90), using 1D parallelisation over spectral waves, in contrast with IFS, which uses 2D parallelisation (waves, levels). About 30 routines ported, with 280 !$ACC directives. Major focus on FFTs, using the NVIDIA cuFFT library; the Legendre Transform uses DGEMM_ACC; the Fast Legendre Transform was not ported (needs a working deep copy). CRAY provided access to SWAN (6 NVIDIA K20X GPUs) with the latest 8.4 CRAY compilers. Larger runs were performed on TITAN (CRESTA INCITE14 access; each node has 16 AMD Interlagos cores and 1 NVIDIA K20X GPU with 6 GB), using the CRAY compiler, to compare the performance of an XK7/Titan node with an XC-30 node (24-core Ivy Bridge).
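The note above that the Legendre Transform uses DGEMM_ACC can be made concrete with a small, hedged sketch: for one zonal wavenumber the (inverse) Legendre transform is a dense matrix-matrix product, so it maps directly onto DGEMM. Routine, array names and dimensions below are invented for illustration, and plain BLAS DGEMM stands in for the accelerated version used in the port.

  ! Hedged sketch: the inverse Legendre transform for one zonal wavenumber m is a
  ! matrix-matrix product GP = PLEG * SPEC, hence a single DGEMM call.
  ! All names and dimensions are illustrative, not the dwarf's real interface.
  SUBROUTINE LTINV_ONE_M(NLAT, NSPEC, NFLD, PLEG, SPEC, GP)
    IMPLICIT NONE
    INTEGER, INTENT(IN)       :: NLAT, NSPEC, NFLD
    REAL(KIND=8), INTENT(IN)  :: PLEG(NLAT, NSPEC)  ! associated Legendre polynomials at the Gaussian latitudes
    REAL(KIND=8), INTENT(IN)  :: SPEC(NSPEC, NFLD)  ! spectral coefficients for this wavenumber, one column per field
    REAL(KIND=8), INTENT(OUT) :: GP(NLAT, NFLD)     ! Fourier coefficients per latitude, one column per field
    ! GP = 1.0 * PLEG * SPEC + 0.0 * GP
    CALL DGEMM('N', 'N', NLAT, NFLD, NSPEC, 1.0D0, PLEG, NLAT, SPEC, NSPEC, 0.0D0, GP, NLAT)
  END SUBROUTINE LTINV_ONE_M

On the GPU the same call shape goes to an accelerated DGEMM, which is why this part of the dwarf ports comparatively easily, whereas the Fast Legendre Transform was left out.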

15 Spectral transform compute cost, Tc… km model, 40 nodes, 800 fields. [Figure: msec per time-step on XC-30 versus TITAN, broken down into LTINV_CTL, LTDIR_CTL, FTDIR_CTL and FTINV_CTL.]

16 Spectral transform compute cost, Tc… km model, 120 nodes, 800 fields. [Figure: msec per time-step on XC-30 versus TITAN, broken down into LTINV_CTL, LTDIR_CTL, FTDIR_CTL and FTINV_CTL.]

17 Spectral transform compute cost, Tc… km model, 400 nodes, 800 fields. [Figure: msec per time-step on XC-30 versus TITAN, broken down into LTINV_CTL, LTDIR_CTL, FTDIR_CTL and FTINV_CTL.]

18 Relative FFT performance: NVIDIA K20X GPU (code version 2) versus a 24-core Ivy Bridge CRAY XC-30 node (FFT992). [Figure: relative performance across resolutions T95, T159, T399, T1023, T1279, T2047 and higher.] K20X GPU performance is up to 1.4 times faster than the 24-core Ivy Bridge XC-30 node.

19 Comparison of FFT cost for a fixed LOT size. [Figure: time versus FFT length (latitude points) for GPU version 1, GPU version 2 and CPU FFT implementations including FFTW.]

20 What about MPI communications? Their cost is very much greater than the compute cost for the spectral transform test; a Tc3999 example follows. XC-30 (Aries) is faster than XK7/Titan (Gemini), so a prediction was made for XC-30 communications combined with K20X GPU compute. There is potential for compute/communications overlap, i.e. GPU compute while MPI transfers are taking place, but this has not been done (yet); a minimal sketch of that pattern follows.
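A minimal sketch of that overlap pattern, assuming a simple ring exchange, made-up buffer names and sizes, and an arbitrary kernel body (this is not the dwarf's code, only an illustration of launching an asynchronous OpenACC region and progressing MPI while it runs):

  ! Hedged sketch: overlap GPU compute (asynchronous OpenACC) with an MPI exchange.
  ! Buffers, sizes and the ring pattern are illustrative only.
  PROGRAM overlap_sketch
    USE mpi
    IMPLICIT NONE
    INTEGER, PARAMETER :: N = 100000, NHALO = 1024
    REAL(KIND=8) :: a(N), halo_send(NHALO), halo_recv(NHALO)
    INTEGER :: ierr, irank, nrank, inext, iprev, ireq(2), j
    CALL MPI_INIT(ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, irank, ierr)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nrank, ierr)
    inext = MOD(irank + 1, nrank)
    iprev = MOD(irank - 1 + nrank, nrank)
    halo_send = REAL(irank, 8)
    a = 0.0D0
    !$acc data copy(a)
    ! Launch this step's GPU work without blocking the host ...
    !$acc parallel loop async(1)
    DO j = 1, N
       a(j) = SQRT(REAL(j, 8))
    END DO
    ! ... and let the MPI exchange of the previous step's results proceed meanwhile.
    CALL MPI_ISEND(halo_send, NHALO, MPI_DOUBLE_PRECISION, inext, 0, MPI_COMM_WORLD, ireq(1), ierr)
    CALL MPI_IRECV(halo_recv, NHALO, MPI_DOUBLE_PRECISION, iprev, 0, MPI_COMM_WORLD, ireq(2), ierr)
    CALL MPI_WAITALL(2, ireq, MPI_STATUSES_IGNORE, ierr)
    ! Only now synchronise with the asynchronous GPU work.
    !$acc wait(1)
    !$acc end data
    CALL MPI_FINALIZE(ierr)
  END PROGRAM overlap_sketch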

21 Tc3999, 400 nodes, 800 fields (ms per time-step). [Figure: breakdown for XC-30, TITAN and an XC-30+GPU prediction over the transform phases LTINV_CTL, LTDIR_CTL, FTDIR_CTL, FTINV_CTL, the communication phases MTOL, LTOM, LTOG, GTOL, and the HOST2GPU / GPU2HOST transfers; host-device transfer times are included in the communication times.]

22 Spectral transforms experience: OpenACC is not that difficult, but ~10 OpenMP directives (high-level parallelisation) were replaced by ~280 OpenACC directives (low-level parallelisation). Most of the porting time was spent on: a strategy for porting the IFS FFT992 interface (algor/fourier), which was replaced by calls to a new CUDA FFT993 interface calling NVIDIA cuFFT library routines; coding versions of FTDIR and FTINV where FFT992 and FFT993 both ran on the same data to compare results; writing several offline FFT tests to explore performance; and performance issues, investigated with nvprof and gstats.

23 Physics dwarf: CLOUDSC (work done by Sami Saarinen, ECMWF). Adaptation of the IFS physics cloud scheme (CLOUDSC) to new architectures as part of the ECMWF Scalability Programme, with the emphasis on GPU migration using OpenACC directives. CLOUDSC consumes about 10% of IFS forecast time and comprises some 3500 lines of Fortran 2003 before OpenACC directives. The focus is a performance comparison between the OpenMP version of CLOUDSC on Haswell and the OpenACC version of CLOUDSC on an NVIDIA K40.

24 Problem parameters: 160,000 grid-point columns (NGPTOT), each with 137 levels (NLEV); about 80,000 columns fit into one K40 GPU. Grid-point columns are independent of each other, so there are no horizontal dependencies here, but the level dependency prevents parallelization along the vertical dimension. Arrays are organized in blocks of grid-point columns: instead of ARRAY(NGPTOT, NLEV) we use ARRAY(NPROMA, NLEV, NBLKS), where NPROMA is a (runtime-)fixed blocking factor, and the arrays are OpenMP thread-safe over NBLKS (a small sketch of this layout follows).
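The sketch below illustrates this blocking; only NGPTOT, NLEV, NPROMA and the blocked-array shape come from the slide, while everything else, including the chosen NPROMA value, is made up for illustration. A global column index JG lives in block IBL at in-block position JL, which is exactly what the NPROMA driver loop on the next slide iterates over.

  ! Hedged sketch of the NPROMA blocking: ARRAY(NPROMA, NLEV, NBLKS) instead of
  ! ARRAY(NGPTOT, NLEV). The NPROMA value chosen here is arbitrary.
  PROGRAM nproma_blocking
    IMPLICIT NONE
    INTEGER, PARAMETER :: NGPTOT = 160000, NLEV = 137, NPROMA = 8000
    INTEGER, PARAMETER :: NBLKS = (NGPTOT + NPROMA - 1) / NPROMA
    REAL(KIND=8), ALLOCATABLE :: ARRAY(:,:,:)
    INTEGER :: JG, IBL, JL
    ALLOCATE(ARRAY(NPROMA, NLEV, NBLKS))
    ARRAY = 0.0D0
    DO JG = 1, NGPTOT
       IBL = (JG - 1) / NPROMA + 1           ! block holding global column JG
       JL  = JG - (IBL - 1) * NPROMA         ! position of JG inside that block
       ARRAY(JL, 1, IBL) = REAL(JG, 8)       ! e.g. tag the column's lowest level
    END DO
    PRINT *, 'blocks:', NBLKS, ', columns in last block:', NGPTOT - (NBLKS - 1) * NPROMA
  END PROGRAM nproma_blocking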

25 Details on hardware, compilers, NPROMA. Haswell node: 24 cores at 2.5 GHz, with 2 x NVIDIA K40c GPUs attached via PCIe, each GPU equipped with 12 GB of memory, running CUDA 7.0. PGI compiler 15.7 with OpenMP and OpenACC: -O4 -fast -mp=numa,allcores,bind -Mfprelaxed -tp haswell -Mvect=simd:256 (plus -acc for the OpenACC build). Environment variables: PGI_ACC_NOSHARED=1, PGI_ACC_BUFFERSIZE=4M. A typical good NPROMA value for Haswell is much smaller than for the GPUs, where NPROMA up to 80,000 is needed for maximum performance.

26 OpenMP loop around the CLOUDSC call (typical NPROMA values in the OpenMP implementation are much smaller than in the OpenACC version):
  REAL(kind=8) :: array(NPROMA, NLEV, NGPBLKS)
!$OMP PARALLEL PRIVATE(JKGLO,IBL,ICEND)
!$OMP DO SCHEDULE(DYNAMIC,1)
  DO JKGLO=1,NGPTOT,NPROMA            ! So-called NPROMA-loop
     IBL=(JKGLO-1)/NPROMA+1           ! Current block number
     ICEND=MIN(NPROMA,NGPTOT-JKGLO+1) ! Block length <= NPROMA
     CALL CLOUDSC( 1, ICEND, NPROMA, KLEV, &
          &        array(1,1,IBL), &  ! ~65 arrays like this
          &        ... )
  END DO
!$OMP END DO
!$OMP END PARALLEL

27 OpenMP scaling of CLOUDSC (Haswell, in GFlops). [Figure: GFlops as a function of OpenMP thread count.]

28 Development of the OpenACC/GPU version. The driver code with the OpenMP loop was kept roughly unchanged; GPU-to-host data mapping (ACC DATA) was added. OpenACC can (in most cases) co-exist with OpenMP, which allows an elegant multi-GPU implementation. CLOUDSC was pre-processed with the acc_insert Perl script, which allowed automatic creation of ACC KERNELS and ACC DATA PRESENT / CREATE clauses in CLOUDSC, plus some minimal manual source-code clean-up (the sketch after this slide shows the general shape of such directives). CLOUDSC performance on the GPU needs a very large NPROMA because of the lack of multilevel parallelism (only across NPROMA, not NLEV).
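For orientation, the general shape of the directives such a preprocessing step inserts looks roughly like the sketch below; this is not acc_insert output, and the routine and array names are invented. ZA is assumed to be on the device already (hence PRESENT), while ZTMP is device-only scratch (CREATE).

  ! Hedged sketch only: the shape of ACC DATA PRESENT/CREATE plus ACC KERNELS
  ! directives around a typical CLOUDSC-style loop nest. Names are illustrative.
  SUBROUTINE DEMO_KERNEL(KIDIA, KFDIA, KLON, KLEV, ZA)
    IMPLICIT NONE
    INTEGER, INTENT(IN)         :: KIDIA, KFDIA, KLON, KLEV
    REAL(KIND=8), INTENT(INOUT) :: ZA(KLON, KLEV)
    REAL(KIND=8) :: ZTMP(KLON, KLEV)
    INTEGER :: JL, JK
!$ACC DATA PRESENT(ZA) CREATE(ZTMP)
!$ACC KERNELS
!$ACC LOOP COLLAPSE(2)
    DO JK = 1, KLEV
       DO JL = KIDIA, KFDIA
          ZTMP(JL, JK) = 0.5D0 * ZA(JL, JK)
          ZA(JL, JK)   = ZA(JL, JK) + ZTMP(JL, JK)
       END DO
    END DO
!$ACC END KERNELS
!$ACC END DATA
  END SUBROUTINE DEMO_KERNEL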

29 Driving OpenACC CLOUDSC with OpenMP (typical values for NPROMA in the OpenACC implementation: > 10,000):
!$OMP PARALLEL PRIVATE(JKGLO,IBL,ICEND) &
!$OMP&         PRIVATE(tid, idgpu) num_threads(NumGPUs)
  tid   = omp_get_thread_num()        ! OpenMP thread number
  idgpu = mod(tid, NumGPUs)           ! Effective GPU# for this thread
  CALL acc_set_device_num(idgpu, acc_get_device_type())
!$OMP DO SCHEDULE(STATIC)
  DO JKGLO=1,NGPTOT,NPROMA            ! NPROMA-loop
     IBL=(JKGLO-1)/NPROMA+1           ! Current block number
     ICEND=MIN(NPROMA,NGPTOT-JKGLO+1) ! Block length <= NPROMA
!$acc data copyout(array(:,:,IBL), ...) & ! ~22 : GPU to Host
!$acc&     copyin (array(:,:,IBL))        ! ~43 : Host to GPU
     CALL CLOUDSC( ... array(1,1,IBL) ... )  ! Runs on GPU# idgpu
!$acc end data
  END DO
!$OMP END DO
!$OMP END PARALLEL

30 Sample OpenACC coding of CLOUDSC (the ASYNC clause removes CUDA-thread syncs):
!$ACC KERNELS LOOP COLLAPSE(2) PRIVATE(ZTMP_Q,ZTMP) ASYNC(IBL)
  DO JK=1,KLEV
    DO JL=KIDIA,KFDIA
      ztmp_q = 0.0_JPRB
      ztmp   = 0.0_JPRB
!$ACC LOOP PRIVATE(ZQADJ) REDUCTION(+:ZTMP_Q,ZTMP)
      DO JM=1,NCLV-1
        IF (ZQX(JL,JK,JM) < RLMIN) THEN
          ZLNEG(JL,JK,JM) = ZLNEG(JL,JK,JM) + ZQX(JL,JK,JM)
          ZQADJ  = ZQX(JL,JK,JM)*ZQTMST
          ztmp_q = ztmp_q + ZQADJ
          ztmp   = ztmp + ZQX(JL,JK,JM)
          ZQX(JL,JK,JM) = 0.0_JPRB
        ENDIF
      ENDDO
      PSTATE_q_loc(JL,JK) = PSTATE_q_loc(JL,JK) + ztmp_q
      ZQX(JL,JK,NCLDQV)   = ZQX(JL,JK,NCLDQV) + ztmp
    ENDDO
  ENDDO
!$ACC END KERNELS

31 OpenACC scaling of CLOUDSC (K40c, in GFlops). [Figure: GFlops versus NPROMA for 1 GPU and 2 GPUs.]

32 Timing breakdown (ms), single GPU. [Figure: computation, communication and other overhead versus NPROMA, with Haswell shown for comparison.]

33 Saturating GPUs with more work: the driver is the same OpenMP loop around CLOUDSC as on slide 29, except that more host threads are launched per GPU ("more threads here"), i.e. num_threads(NumGPUs * 4) instead of num_threads(NumGPUs).

34 Saturating GPUs with more work. Consider a few performance-degradation facts at present: parallelism only in the NPROMA dimension in CLOUDSC; 60-odd arrays updated back and forth every time step; OpenACC overhead related to data transfers and ACC DATA. Can we do better? Yes: we can enable concurrently executed kernels through OpenMP, by time-sharing the GPU(s) across multiple OpenMP threads. About 4 simultaneous OpenMP host threads can saturate a single GPU in our CLOUDSC case. Extra care must be taken to avoid running out of memory on the GPU; this needs a ~4x smaller NPROMA: 20,000 instead of 80,000.

35 Multiple copies of CLOUDSC per GPU (GFlops). [Figure: GFlops versus number of copies for 1 GPU and 2 GPUs.]

36 The nvvp profiler shows the time-sharing impact. [Figure: profiler timelines for a GPU fed with work by one OpenMP thread only versus a GPU that is 4-way time-shared.]

37 Timing (ms): 4-way time-shared versus not time-shared. [Figure: computation, communication and other overhead versus NPROMA for a GPU that is not time-shared and for a GPU that is 4-way time-shared, with Haswell shown for comparison.]

38 24-core Haswell 2.5 GHz versus K40c GPU(s) (GFlops). [Figure: GFlops for Haswell, 1 GPU, 1 GPU time-shared, 2 GPUs and 2 GPUs time-shared; T/S = GPUs time-shared.]

39 Conclusions. The CLOUDSC OpenACC prototype from 3Q/2014 was ported to ECMWF's tiny GPU cluster in 3Q/2015. Since last time, the PGI compiler has improved and OpenACC overheads have been greatly reduced (PGI 14.7 vs. 15.7). With CUDA 7.0 and concurrent kernels, time-sharing (oversubscribing) GPUs with more work seems to pay off; saturation of the GPUs can be achieved, not surprisingly, with the help of the multi-core host launching more data blocks onto the GPUs. The outcome is not bad considering we seem to be under-utilizing the GPUs (parallelism along NPROMA only).

40 Thank You! This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No
