Exascale challenges for Numerical Weather Prediction : the ESCAPE project
1 Exascale challenges for Numerical Weather Prediction: the ESCAPE project. Olivier Marsden. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No
2 European Centre for Medium-Range Weather Forecasts: independent intergovernmental organisation established in 1975, with 19 Member States and 15 Co-operating States
3 The success story of Numerical Weather Prediction: Hurricanes. May be one of the best medium-range forecasts of all time!
4 NWP: Benefit of high resolution
[Figure: Hurricane Sandy, 28 Oct 2012. Panels: mean sea-level pressure analysis (30 Oct) with 5-day forecasts at T3999, T1279 and T639; precipitation (NEXRAD, 27 Oct) with 3-day forecast; 4-day forecasts of wave height, mean sea-level pressure and wind speed at T639, T1279 and a higher resolution.]
5 What is the challenge?
Today:
- Observations, volume: 20 million = 2 x 10^7
- Observations, type: 98% from 60 different satellite instruments
- Models, volume: [...] million grid points x 100 levels x 10 prognostic variables = 5 x 10^9
- Models, type: physical parameters of atmosphere, waves, ocean
Tomorrow:
- Observations, volume: 200 million = 2 x 10^8
- Observations, type: 98% from 80 different satellite instruments
- Models, volume: [...] million grid points x 200 levels x 100 prognostic variables = 1 x 10^13
- Models, type: physical and chemical parameters of atmosphere, waves, ocean, ice, vegetation
Growth: factor 10 per day (observations), factor 2000 per time step (models)
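The growth factors on this slide follow from simple arithmetic, which the Python sketch below recomputes. The grid-point counts did not survive in this transcript, so the values named `today_points` and `tomorrow_points` are back-computed from the stated totals; they are implied values, not figures quoted from the slide.

```python
# Model state size = grid points x levels x prognostic variables.
today_total = 5 * 10**9                   # stated on the slide
today_levels, today_vars = 100, 10        # stated on the slide
tomorrow_levels, tomorrow_vars = 200, 100 # stated on the slide
model_factor = 2000                       # stated growth factor per time step

tomorrow_total = today_total * model_factor

# Implied horizontal grid sizes (assumption: derived here, not read off the slide).
today_points = today_total // (today_levels * today_vars)
tomorrow_points = tomorrow_total // (tomorrow_levels * tomorrow_vars)

# Observation volume per day, as stated.
obs_today, obs_tomorrow = 20 * 10**6, 200 * 10**6
obs_factor = obs_tomorrow // obs_today

print("model state tomorrow:", tomorrow_total)
print("implied grid points today / tomorrow:", today_points, tomorrow_points)
print("observation growth factor:", obs_factor)
```

Under these assumptions the horizontal grid grows by a factor of 100, the number of levels by 2 and the number of prognostic variables by 10, which together give the stated factor of 2000.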
6 AVEC forecast model intercomparison: 13 km case
[Figure: speed normalized to operational threshold (8.5 mins per day), shown as fraction of operational threshold against number of Edison cores (CRAY XC-30). Models: IFS, NMM-UJ, FV3 single precision, FV3 double precision, NIM, MPAS, NEPTUNE.]
[Michalakes et al. 2015: AVEC-Report: NGGPS level-1 benchmarks and software evaluation]
7 AVEC forecast model intercomparison: 3 km case
[Figure: speed normalized to operational threshold (8.5 mins per day), shown as fraction of operational threshold against number of Edison cores (CRAY XC-30). Models: IFS, NMM-UJ, FV3 single precision, FV3 double precision, NIM, NIM with improved MPI comms, MPAS, NEPTUNE.]
Advanced Computing Evaluation Committee (AVEC) to evaluate HPC performance of five Next Generation Global Prediction System candidates to meet operational forecast requirements at the National Weather Service through [...]
8 Technology applied at ECMWF for the last 30 years
- A spectral transform, semi-Lagrangian, semi-implicit (compressible) hydrostatic model
- How long can ECMWF continue to run such a model?
- IFS data assimilation and model must EACH run in under ONE HOUR for a 10-day global forecast
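To put the one-hour constraint in perspective, the sketch below converts it into forecast days produced per wall-clock day and does the same for the 8.5 minutes per forecast day threshold quoted on the AVEC slides. The function name is illustrative only, and since "under one hour" is an upper bound the first figure is a minimum rate.

```python
MINUTES_PER_DAY = 24 * 60

def forecast_days_per_wallclock_day(forecast_days, wallclock_minutes):
    """How many simulated days are produced per day of wall-clock time."""
    return forecast_days * MINUTES_PER_DAY / wallclock_minutes

# ECMWF constraint from the slide: 10-day forecast in (at most) 60 minutes.
ecmwf_rate = forecast_days_per_wallclock_day(10, 60)

# AVEC operational threshold from the earlier slides: 8.5 minutes per forecast day.
avec_rate = forecast_days_per_wallclock_day(1, 8.5)

print(f"ECMWF requirement: at least {ecmwf_rate:.0f}x real time")
print(f"AVEC threshold:    about {avec_rate:.0f}x real time")
```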
9 IFS today (MPI + OpenMP parallel). IFS = Integrated Forecasting System
10 Predicted 2.5 km model scaling on an XC-30
[Figure: predicted scaling against the operational requirement. Annotations: 2 MW, 6 MW (for a single HRES forecast); two XC-30 clusters each with 85K cores.]
ECMWF requires system capacity for [...] to 20 simultaneous HRES forecasts
11 ESCAPE*: Energy-efficient SCalable Algorithms for weather Prediction at Exascale
Numerical methods - Code adaptation - Architecture
- Next-generation IFS numerical building blocks and compute-intensive algorithms
- Compute/energy efficiency diagnostics
- New approaches and implementation on novel architectures
- Testing in operational configurations
*Funded by EC H2020 framework, Future and Emerging Technologies, High-Performance Computing
Partners: ECMWF, Météo-France, RMI, DMI, MeteoSwiss, DWD, U Loughborough, PSNC, ICHEC, Bull, NVIDIA, Optalysys
12 Schematic description of the spectral transform method in the ECMWF IFS model
- Grid-point space: semi-Lagrangian advection, physical parametrizations, products of terms
- FFT to Fourier space, then LT to spectral space
- Spectral space: horizontal gradients, semi-implicit calculations, horizontal diffusion
- Inverse LT to Fourier space, then inverse FFT back to grid-point space
FFT: Fast Fourier Transform, LT: Legendre Transform
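The reason horizontal gradients are computed in spectral space is that differentiation becomes a multiplication by the wavenumber. The Python sketch below illustrates this for the Fourier part only, on a single latitude circle, using a naive O(n^2) transform. It is a simplified illustration, not the IFS implementation: the Legendre transform is omitted and all function names are made up for this example.

```python
import cmath
import math

def direct_fourier(grid_values):
    """Grid-point values on one latitude circle -> Fourier coefficients (naive DFT)."""
    n = len(grid_values)
    return [
        sum(grid_values[p] * cmath.exp(-2j * math.pi * m * p / n) for p in range(n)) / n
        for m in range(n)
    ]

def inverse_fourier(coeffs):
    """Fourier coefficients -> grid-point values (naive inverse DFT)."""
    n = len(coeffs)
    return [
        sum(coeffs[m] * cmath.exp(2j * math.pi * m * p / n) for m in range(n))
        for p in range(n)
    ]

def signed_wavenumber(index, n):
    """Map DFT index 0..n-1 to the signed zonal wavenumber."""
    if n % 2 == 0 and index == n // 2:
        return 0  # Nyquist mode: its derivative is set to zero
    return index if index < n / 2 else index - n

def zonal_derivative(grid_values):
    """d/d(longitude): transform, multiply by i*m, transform back."""
    n = len(grid_values)
    coeffs = direct_fourier(grid_values)
    d_coeffs = [1j * signed_wavenumber(m, n) * c for m, c in enumerate(coeffs)]
    return [value.real for value in inverse_fourier(d_coeffs)]

n_lon = 16
lons = [2 * math.pi * p / n_lon for p in range(n_lon)]
field = [math.sin(2 * lon) for lon in lons]
gradient = zonal_derivative(field)

# The exact derivative of sin(2*lon) is 2*cos(2*lon).
max_error = max(abs(g - 2 * math.cos(2 * lon)) for g, lon in zip(gradient, lons))
print("max error:", max_error)
```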
13 Schematic description of the spectral transform dwarf in ESCAPE
The dwarf cycles through the same spaces as the IFS schematic (grid-point space, FFT, Fourier space, LT, spectral space, inverse LT, Fourier space, inverse FFT), repeated for 100 iterations.
Time-stepping loop in dwarf1-atlas.f90:
    DO JSTEP=1,ITERS
      call trans%invtrans(spfields,gpfields)
      call trans%dirtrans(gpfields,spfields)
    ENDDO
FFT: Fast Fourier Transform, LT: Legendre Transform
14 GPU-related work on this dwarf
Work carried out by George Mozdzynski, ECMWF
- An OpenACC port of a spectral transform test (transform_test.f90)
  - Using 1D parallelisation over spectral waves
  - Contrast with IFS, which uses 2D parallelisation (waves, levels)
- About 30 routines ported, 280 !$ACC directives
- Major focus on FFTs, using the NVIDIA cuFFT library
- Legendre Transform uses DGEMM_ACC
- Fast Legendre Transform not ported (needs working deep copy)
- CRAY provided access to SWAN (6 NVIDIA K20X GPUs)
  - Latest 8.4 CRAY compilers
- Larger runs performed on TITAN
  - Each node has 16 AMD Interlagos cores & 1 NVIDIA K20X GPU (6GB)
  - CRESTA INCITE14 access
  - Used CRAY compiler
- Compare performance of XK7/Titan node with XC-30 node (24-core Ivybridge)
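A Legendre transform maps naturally onto a matrix-matrix multiply because, for a fixed zonal wavenumber, the same matrix of Legendre function values is applied to every field at once. The Python sketch below shows this for zonal wavenumber 0 only, with ordinary Legendre polynomials on Gaussian latitudes. It is a simplified stand-in, not the IFS or DGEMM_ACC code: `matmul` is a plain-Python placeholder for a DGEMM call, and all names are chosen for this example.

```python
import math

def legendre_values(max_degree, x):
    """P_0(x)..P_max_degree(x) from the three-term recurrence."""
    values = [1.0, x]
    for n in range(1, max_degree):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: max_degree + 1]

def gauss_legendre(n_lat):
    """Gaussian nodes x = sin(latitude) and quadrature weights."""
    nodes, weights = [], []
    for i in range(n_lat):
        x = math.cos(math.pi * (i + 0.75) / (n_lat + 0.5))  # initial guess
        for _ in range(50):  # Newton iterations on P_n_lat(x) = 0
            p = legendre_values(n_lat, x)
            dp = n_lat * (x * p[n_lat] - p[n_lat - 1]) / (x * x - 1.0)
            x -= p[n_lat] / dp
        p = legendre_values(n_lat, x)
        dp = n_lat * (x * p[n_lat] - p[n_lat - 1]) / (x * x - 1.0)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def matmul(a, b):
    """Plain-Python matrix product; stands in for a DGEMM call."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

n_lat, n_fields = 8, 3
nodes, weights = gauss_legendre(n_lat)
poly = [legendre_values(n_lat - 1, x) for x in nodes]  # poly[j][n] = P_n(x_j)

# Direct transform matrix (grid -> spectral) and inverse matrix (spectral -> grid).
direct = [[(2 * n + 1) / 2 * weights[j] * poly[j][n] for j in range(n_lat)]
          for n in range(n_lat)]
inverse = poly

# Each column is one field sampled on the Gaussian latitudes.
grid = [[math.sin((f + 1) * x) for f in range(n_fields)] for x in nodes]

spectral = matmul(direct, grid)       # one product transforms all fields
round_trip = matmul(inverse, spectral)

max_diff = max(abs(round_trip[j][f] - grid[j][f])
               for j in range(n_lat) for f in range(n_fields))
print("round-trip error:", max_diff)
```

Batching the fields as columns is what makes the operation a matrix-matrix rather than a matrix-vector product, which is the shape that tuned DGEMM routines handle well.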
15 Spectral Transform Compute Cost (40 nodes, 800 fields)
[Figure: msec per time-step for a Tc[...] model, XC-30 vs TITAN, split into LTINV_CTL, LTDIR_CTL, FTDIR_CTL, FTINV_CTL.]
16 Spectral Transform Compute Cost (120 nodes, 800 fields)
[Figure: msec per time-step for a Tc[...] model, XC-30 vs TITAN, split into LTINV_CTL, LTDIR_CTL, FTDIR_CTL, FTINV_CTL.]
17 Spectral Transform Compute Cost (400 nodes, 800 fields)
[Figure: msec per time-step for a Tc[...] model, XC-30 vs TITAN, split into LTINV_CTL, LTDIR_CTL, FTDIR_CTL, FTINV_CTL.]
18 Relative FFT performance: NVIDIA K20X GPU (v2) vs 24-core Ivybridge CRAY XC-30 node (FFT992)
[Figure: relative performance at resolutions T95, T159, T399, T1023, T1279, T2047 and higher.]
K20X GPU performance up to 1.4 times faster than a 24-core Ivybridge XC-30 node
19 Comparison of FFT cost for LOT size [...]
[Figure: time against FFT length (latitude points). Series: GPU ver 1, GPU ver [...], FFT[...], FFTW.]
20 What about MPI communications?
- Cost very much greater than compute for the Spectral Transform test
  - Tc3999 example follows
- XC-30 (Aries) is faster than XK7/Titan (Gemini)
  - So made a prediction for XC-30 comms with K20X GPU
- Potential for compute / communications overlap
  - GPU compute while MPI transfers are taking place
  - Not done (yet)
21 Tc3999, 400 nodes, 800 fields (ms per time-step)
[Table: columns XC-30, TITAN, XC-30+GPU prediction; rows LTINV_CTL, LTDIR_CTL, FTDIR_CTL, FTINV_CTL, MTOL, LTOM, LTOG, GTOL, HOST2GPU**, GPU2HOST**. Values not preserved.]
** included in comms (red) times
22 Spectral transforms experience
- OpenACC not that difficult, but:
  - Replaced ~10 OpenMP directives (high-level parallelisation)
  - By ~280 OpenACC directives (low-level parallelisation)
- Most of the porting time spent on:
  - Strategy for porting
  - IFS FFT992 interface (algor/fourier): replaced by calls to new CUDA FFT993 interface, calling NVIDIA cuFFT library routines
  - Coding versions of FTDIR and FTINV where FFT992 and FFT993 both ran on the same data to compare results
  - Writing several offline FFT tests to explore performance
  - Performance issues: used nvprof, gstats
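Running an existing and a new FFT on the same data and comparing the results is a general validation pattern. The Python sketch below shows the idea with two stand-ins: a naive reference DFT plays the role of the trusted routine and a recursive radix-2 FFT plays the role of the new one. Neither is FFT992, FFT993 or cuFFT, and the tolerance is an assumption chosen for double precision.

```python
import cmath
import math
import random

def reference_dft(values):
    """Trusted but slow O(n^2) transform (stand-in for the existing routine)."""
    n = len(values)
    return [sum(values[p] * cmath.exp(-2j * math.pi * k * p / n) for p in range(n))
            for k in range(n)]

def radix2_fft(values):
    """Recursive Cooley-Tukey FFT (stand-in for the new routine); n must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = radix2_fft(values[0::2])
    odd = radix2_fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def max_abs_difference(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

random.seed(0)  # fixed seed so both transforms see reproducible input
data = [complex(random.uniform(-1.0, 1.0), 0.0) for _ in range(64)]

diff = max_abs_difference(reference_dft(data), radix2_fft(data))
TOLERANCE = 1e-9  # assumed tolerance for this illustration
print("max difference:", diff, "OK" if diff < TOLERANCE else "MISMATCH")
```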
23 Physics dwarf: CloudSC
Work done by Sami Saarinen, ECMWF
- Adaptation of the IFS physics cloud scheme (CLOUDSC) to new architectures as part of the ECMWF Scalability Programme
- Emphasis was on GPU migration by use of OpenACC directives
- CLOUDSC consumes about 10% of IFS forecast time
- Some 3500 lines of Fortran 2003 before OpenACC directives
- Focus on performance comparison between:
  - OpenMP version of CLOUDSC on Haswell
  - OpenACC version of CLOUDSC on NVIDIA K40
24 Problem parameters:
- Given 160,000 grid point columns (NGPTOT)
  - Each with 137 levels (NLEV)
- About 80,000 columns fit into one K40 GPU
- Grid point columns are independent of each other
  - So no horizontal dependencies here, but level dependency prevents parallelization along the vertical dimension
- Arrays are organized in blocks of grid point columns
  - Instead of using ARRAY(NGPTOT, NLEV) we use ARRAY(NPROMA, NLEV, NBLKS)
  - NPROMA is a (runtime) fixed blocking factor
  - Arrays are OpenMP thread safe over NBLKS
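The blocked layout can be illustrated with a few lines of index arithmetic. The Python sketch below splits NGPTOT columns into blocks of at most NPROMA columns and maps a global column index to its position inside a block. The loop bounds follow the usual 1-based Fortran convention; the function names and the NPROMA value of 30,000 are chosen only to show a partial last block.

```python
def block_layout(ngptot, nproma):
    """Return the number of blocks and a list of (block number, block length)."""
    nblks = (ngptot + nproma - 1) // nproma
    blocks = []
    for jkglo in range(1, ngptot + 1, nproma):   # 1-based start column of each block
        ibl = (jkglo - 1) // nproma + 1          # current block number
        icend = min(nproma, ngptot - jkglo + 1)  # block length <= nproma
        blocks.append((ibl, icend))
    return nblks, blocks

def column_to_block(jcol, nproma):
    """Global 1-based column index -> (index within block, block number), both 1-based."""
    return (jcol - 1) % nproma + 1, (jcol - 1) // nproma + 1

NGPTOT = 160_000   # from the slide
NPROMA = 30_000    # illustrative value, chosen so the last block is partial

nblks, blocks = block_layout(NGPTOT, NPROMA)
print("blocks:", nblks, "last block:", blocks[-1])
print("column 30001 lives at:", column_to_block(30_001, NPROMA))
```

Because each block is a separate slice along the last array dimension, different threads can work on different blocks without touching the same memory.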
25 Details on hardware, compilers, NPROMA:
- Haswell node: 2.5GHz
- 2 x NVIDIA K40c GPUs on each Haswell node via PCIe
  - Each GPU equipped with 12GB memory, with CUDA 7.0
- PGI Compiler 15.7 with OpenMP & OpenACC:
    -O4 -fast -mp=numa,allcores,bind -Mfprelaxed -tp haswell -Mvect=simd:256 [ -acc ]
- Environment variables: PGI_ACC_NOSHARED=1, PGI_ACC_BUFFERSIZE=4M
- Typical good NPROMA value for Haswell: ~ [...]
- For GPUs, NPROMA up to 80,000 for max performance
26 OpenMP loop around CLOUDSC call:
    REAL(kind=8) :: array(NPROMA, NLEV, NGPBLKS)
    !$OMP PARALLEL PRIVATE(JKGLO,IBL,ICEND)
    !$OMP DO SCHEDULE(DYNAMIC,1)
    DO JKGLO=1,NGPTOT,NPROMA              ! So-called NPROMA-loop
      IBL=(JKGLO-1)/NPROMA+1              ! Current block number
      ICEND=MIN(NPROMA,NGPTOT-JKGLO+1)    ! Block length <= NPROMA
      CALL CLOUDSC( 1, ICEND, NPROMA, KLEV, &
       &   array(1,1,ibl), &              ! ~ 65 arrays like this
       &   )
    END DO
    !$OMP END DO
    !$OMP END PARALLEL
Typical values for NPROMA in OpenMP implementation: [...]
27 OpenMP scaling (Haswell, in GFlops)
[Figure: OpenMP scaling of CLOUDSC on Haswell.]
28 Development of OpenACC/GPU version
- The driver code with the OpenMP loop kept roughly unchanged
  - GPU to HOST data mapping (ACC DATA) added
  - OpenACC can (in most cases) co-exist with OpenMP
  - Allows an elegant multi-GPU implementation
- CLOUDSC was pre-processed with the acc_insert Perl script
  - Allowed automatic creation of ACC KERNELS and ACC DATA PRESENT / CREATE clauses in CLOUDSC
  - In addition, some minimal manual source code clean-up
- CLOUDSC performance on GPU needs very large NPROMA
  - Lack of multilevel parallelism (only across NPROMA, not NLEV)
29 Driving OpenACC CLOUDSC with OpenMP
    !$OMP PARALLEL PRIVATE(JKGLO,IBL,ICEND) &
    !$OMP& PRIVATE(tid, idgpu) num_threads(NumGPUs)
    tid = omp_get_thread_num()             ! OpenMP thread number
    idgpu = mod(tid, NumGPUs)              ! Effective GPU# for this thread
    CALL acc_set_device_num(idgpu, acc_get_device_type())
    !$OMP DO SCHEDULE(STATIC)
    DO JKGLO=1,NGPTOT,NPROMA               ! NPROMA-loop
      IBL=(JKGLO-1)/NPROMA+1               ! Current block number
      ICEND=MIN(NPROMA,NGPTOT-JKGLO+1)     ! Block length <= NPROMA
      !$acc data copyout(array(:,:,ibl),...) &   ! ~22 : GPU to Host
      !$acc&     copyin(array(:,:,ibl))          ! ~43 : Host to GPU
      CALL CLOUDSC (... array(1,1,ibl)...)       ! Runs on GPU#<idgpu>
      !$acc end data
    END DO
    !$OMP END DO
    !$OMP END PARALLEL
Typical values for NPROMA in OpenACC implementation: > 10,000
30 Sample OpenACC coding of CLOUDSC
    !$ACC KERNELS LOOP COLLAPSE(2) PRIVATE(ZTMP_Q,ZTMP)
    DO JK=1,KLEV
      DO JL=KIDIA,KFDIA
        ztmp_q = 0.0_JPRB
        ztmp = 0.0_JPRB
        !$ACC LOOP PRIVATE(ZQADJ) REDUCTION(+:ZTMP_Q, +:ZTMP)
        DO JM=1,NCLV-1
          IF (ZQX(JL,JK,JM)<RLMIN) THEN
            ZLNEG(JL,JK,JM) = ZLNEG(JL,JK,JM)+ZQX(JL,JK,JM)
            ZQADJ = ZQX(JL,JK,JM)*ZQTMST
            ztmp_q = ztmp_q + ZQADJ
            ztmp = ztmp + ZQX(JL,JK,JM)
            ZQX(JL,JK,JM) = 0.0_JPRB
          ENDIF
        ENDDO
        PSTATE_q_loc(JL,JK) = PSTATE_q_loc(JL,JK) + ztmp_q
        ZQX(JL,JK,NCLDQV) = ZQX(JL,JK,NCLDQV) + ztmp
      ENDDO
    ENDDO
    !$ACC END KERNELS ASYNC(IBL)
ASYNC removes CUDA-thread syncs
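For readers less familiar with Fortran, the loop nest above can be restated as a serial Python reference. Each (column, level) point zeroes every species value below RLMIN, accumulates the removed amounts in two scalars (the quantities named in the REDUCTION clause), and adds them to a tendency and to the NCLDQV species. This is a sketch of the slide's logic only: it assumes NCLDQV is the last species index, uses nested lists in place of Fortran arrays, and the sample values are invented.

```python
def clip_small_species(zqx, zlneg, pstate_q_loc, rlmin, zqtmst):
    """Serial reference of the slide's loop nest; arrays are indexed [jl][jk][jm]."""
    for jl in range(len(zqx)):
        for jk in range(len(zqx[jl])):
            nclv = len(zqx[jl][jk])
            ncldqv = nclv - 1          # assumption: last species slot is NCLDQV
            ztmp_q = 0.0               # reduction variable 1
            ztmp = 0.0                 # reduction variable 2
            for jm in range(nclv - 1):
                value = zqx[jl][jk][jm]
                if value < rlmin:
                    zlneg[jl][jk][jm] += value
                    ztmp_q += value * zqtmst
                    ztmp += value
                    zqx[jl][jk][jm] = 0.0
            pstate_q_loc[jl][jk] += ztmp_q
            zqx[jl][jk][ncldqv] += ztmp

# One column, one level, five species (invented sample values).
zqx = [[[1e-3, -2e-6, 5e-4, 0.0, 8e-3]]]
zlneg = [[[0.0] * 5]]
pstate_q_loc = [[0.0]]
total_before = sum(zqx[0][0])

clip_small_species(zqx, zlneg, pstate_q_loc, rlmin=1e-8, zqtmst=1.0 / 3600.0)

total_after = sum(zqx[0][0])
print("species after clipping:", zqx[0][0])
print("total conserved:", abs(total_after - total_before) < 1e-15)
```

The two scalar accumulations are the only cross-iteration dependency in the inner loop, which is why the directive on the slide declares them as reductions.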
31 OpenACC scaling (K40c, in GFlops)
[Figure: GFlops against NPROMA for one and two GPUs.]
32 Timing (ms) breakdown: single GPU
[Figure: other overhead, communication and computation against NPROMA, with Haswell for reference.]
33 Saturating GPUs with more work
    !$OMP PARALLEL PRIVATE(JKGLO,IBL,ICEND) &
    !$OMP& PRIVATE(tid, idgpu) num_threads(NumGPUs * 4)   ! More threads here
    tid = omp_get_thread_num()             ! OpenMP thread number
    idgpu = mod(tid, NumGPUs)              ! Effective GPU# for this thread
    CALL acc_set_device_num(idgpu, acc_get_device_type())
    !$OMP DO SCHEDULE(STATIC)
    DO JKGLO=1,NGPTOT,NPROMA               ! NPROMA-loop
      IBL=(JKGLO-1)/NPROMA+1               ! Current block number
      ICEND=MIN(NPROMA,NGPTOT-JKGLO+1)     ! Block length <= NPROMA
      !$acc data copyout(array(:,:,ibl),...) &   ! ~22 : GPU to Host
      !$acc&     copyin(array(:,:,ibl))          ! ~43 : Host to GPU
      CALL CLOUDSC (... array(1,1,ibl)...)       ! Runs on GPU#<idgpu>
      !$acc end data
    END DO
    !$OMP END DO
    !$OMP END PARALLEL
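The effect of `num_threads(NumGPUs * 4)` combined with `idgpu = mod(tid, NumGPUs)` is that several host threads feed each GPU. The Python sketch below tabulates which blocks land on which GPU for 2 GPUs and NPROMA = 20,000. It assumes that SCHEDULE(STATIC) hands out contiguous, near-equal chunks of iterations in thread order, which is the common behaviour when no chunk size is given; the helper names are illustrative.

```python
def static_schedule(n_blocks, n_threads):
    """Assumed SCHEDULE(STATIC) behaviour: contiguous, near-equal chunks per thread."""
    base, extra = divmod(n_blocks, n_threads)
    assignment, start = {}, 0
    for tid in range(n_threads):
        count = base + (1 if tid < extra else 0)
        assignment[tid] = list(range(start + 1, start + count + 1))  # 1-based block numbers
        start += count
    return assignment

def gpu_for_thread(tid, num_gpus):
    """Same mapping as idgpu = mod(tid, NumGPUs) on the slide."""
    return tid % num_gpus

num_gpus, threads_per_gpu = 2, 4
ngptot, nproma = 160_000, 20_000
n_blocks = (ngptot + nproma - 1) // nproma
n_threads = num_gpus * threads_per_gpu

threads_on_gpu = {gpu: [] for gpu in range(num_gpus)}
blocks_on_gpu = {gpu: [] for gpu in range(num_gpus)}
for tid, blocks in static_schedule(n_blocks, n_threads).items():
    gpu = gpu_for_thread(tid, num_gpus)
    threads_on_gpu[gpu].append(tid)
    blocks_on_gpu[gpu].extend(blocks)

for gpu in range(num_gpus):
    print(f"GPU {gpu}: threads {threads_on_gpu[gpu]}, blocks {blocks_on_gpu[gpu]}")
```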
34 Saturating GPUs with more work
- Consider a few factors that currently degrade performance:
  - Parallelism only in the NPROMA dimension in CLOUDSC
  - Updating 60-odd arrays back and forth every time step
  - OpenACC overhead related to data transfers & ACC DATA
- Can we do better? YES!
  - We can enable concurrently executed kernels through OpenMP
  - Time-sharing GPU(s) across multiple OpenMP threads
- About 4 simultaneous OpenMP host threads can saturate a single GPU in our CLOUDSC case
- Extra care must be taken to avoid running out of memory on the GPU
  - Needs ~4x smaller NPROMA: 20,000 instead of 80,000
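A rough memory estimate shows why NPROMA has to shrink when four threads share one GPU. The Python sketch below assumes, purely for illustration, that all of the roughly 65 argument arrays are full NPROMA x NLEV fields in 8-byte reals, ignores CLOUDSC's local temporaries, and treats 12 GB as 12 x 10^9 bytes. The real footprint will differ, so the numbers are indicative only.

```python
BYTES_PER_REAL8 = 8
NLEV = 137                 # from the slides
N_ARRAYS = 65              # "~65 arrays" from the slides, all assumed to be NPROMA x NLEV
GPU_BYTES = 12 * 10**9     # 12 GB per K40c, taken as decimal gigabytes (assumption)

def block_bytes(nproma, nlev=NLEV, n_arrays=N_ARRAYS):
    """Bytes needed on the GPU for the argument arrays of one block."""
    return nproma * nlev * n_arrays * BYTES_PER_REAL8

one_block_80k = block_bytes(80_000)
four_blocks_80k = 4 * block_bytes(80_000)
four_blocks_20k = 4 * block_bytes(20_000)

print(f"1 x NPROMA=80,000: {one_block_80k / 1e9:.1f} GB")
print(f"4 x NPROMA=80,000: {four_blocks_80k / 1e9:.1f} GB (exceeds {GPU_BYTES / 1e9:.0f} GB)")
print(f"4 x NPROMA=20,000: {four_blocks_20k / 1e9:.1f} GB")
```

Under these assumptions, four concurrent blocks of 20,000 columns occupy the same memory as one block of 80,000, which is consistent with the slide's advice to reduce NPROMA by about a factor of four.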
35 Multiple copies of CLOUDSC per GPU (GFlops)
[Figure: GFlops against number of copies for one and two GPUs.]
36 nvvp profiler shows time-sharing impact
[Figure: two profiler timelines, one where the GPU is fed with work by one OpenMP thread only and one where the GPU is 4-way time-shared.]
37 Timing (ms): 4-way time-shared vs. no T/S
[Figure: other overhead, communication and computation against NPROMA, for a GPU that is not time-shared and a GPU that is 4-way time-shared, with Haswell for reference.]
38 24-core Haswell 2.5GHz vs. K40c GPU(s) (GFlops)
[Figure: GFlops for Haswell, 2 GPUs (T/S), 2 GPUs, 1 GPU (T/S), 1 GPU. T/S = GPUs time-shared.]
39 Conclusions
- The CLOUDSC OpenACC prototype from 3Q/2014 was ported to ECMWF's tiny GPU cluster in 3Q/2015
- Since then the PGI compiler has improved and OpenACC overheads have been greatly reduced (PGI 14.7 vs. 15.7)
- With CUDA 7.0 and concurrent kernels, time-sharing (oversubscribing) GPUs with more work seems to pay off
- Saturation of GPUs can be achieved, not surprisingly, with the help of a multi-core host launching more data blocks onto the GPUs
- The outcome is not bad considering we seem to be under-utilizing the GPUs (parallelism just along NPROMA)
40 Thank You!
More informationJOINT WMO TECHNICAL PROGRESS REPORT ON THE GLOBAL DATA PROCESSING AND FORECASTING SYSTEM AND NUMERICAL WEATHER PREDICTION RESEARCH ACTIVITIES FOR 2007
JOINT WMO TECHNICAL PROGRESS REPORT ON THE GLOBAL DATA PROCESSING AND FORECASTING SYSTEM AND NUMERICAL WEATHER PREDICTION RESEARCH ACTIVITIES FOR 2007 [TURKEY/Turkish State Meteorological Service] 1. Summary
More informationHigh-performance processing and development with Madagascar. July 24, 2010 Madagascar development team
High-performance processing and development with Madagascar July 24, 2010 Madagascar development team Outline 1 HPC terminology and frameworks 2 Utilizing data parallelism 3 HPC development with Madagascar
More informationECMWF Overview. The European Centre for Medium-Range Weather Forecasts is an international. organisation supported by 23 European States.
ECMWF Overview The European Centre for Medium-Range Weather Forecasts is an international organisation supported by 3 European States. The center was established in 1973 by a Convention and the real-time
More informationEUMETSAT SAF NETWORK. Lothar Schüller, EUMETSAT SAF Network Manager
1 EUMETSAT SAF NETWORK Lothar Schüller, EUMETSAT SAF Network Manager EUMETSAT ground segment overview METEOSAT JASON-2 INITIAL JOINT POLAR SYSTEM METOP NOAA SATELLITES CONTROL AND DATA ACQUISITION FLIGHT
More informationPerformance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6
Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6 Sangwon Joo, Yoonjae Kim, Hyuncheol Shin, Eunhee Lee, Eunjung Kim (Korea Meteorological Administration) Tae-Hun
More informationLightweight Superscalar Task Execution in Distributed Memory
Lightweight Superscalar Task Execution in Distributed Memory Asim YarKhan 1 and Jack Dongarra 1,2,3 1 Innovative Computing Lab, University of Tennessee, Knoxville, TN 2 Oak Ridge National Lab, Oak Ridge,
More informationTips Geared Towards R. Adam J. Suarez. Arpil 10, 2015
Tips Geared Towards R Departments of Statistics North Carolina State University Arpil 10, 2015 1 / 30 Advantages of R As an interpretive and interactive language, developing an algorithm in R can be done
More informationGPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications
GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications Christopher Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, Wen-Mei W. Hwu University of Illinois at Urbana-Champaign
More informationPerformance Evaluation of MPI on Weather and Hydrological Models
NCAR/RAL Performance Evaluation of MPI on Weather and Hydrological Models Alessandro Fanfarillo elfanfa@ucar.edu August 8th 2018 Cheyenne - NCAR Supercomputer Cheyenne is a 5.34-petaflops, high-performance
More informationSupercomputer Programme
Supercomputer Programme A seven-year programme to enhance the computational and numerical prediction capabilities of the Bureau s forecast and warning services. Tim Pugh, Lesley Seebeck, Tennessee Leeuwenburg,
More informationCEE 618 Scientific Parallel Computing (Lecture 7): OpenMP (con td) and Matrix Multiplication
1 / 26 CEE 618 Scientific Parallel Computing (Lecture 7): OpenMP (con td) and Matrix Multiplication Albert S. Kim Department of Civil and Environmental Engineering University of Hawai i at Manoa 2540 Dole
More informationWRF Modeling System Overview
WRF Modeling System Overview Wei Wang & Jimy Dudhia Nansha, Guangdong, China December 2015 What is WRF? WRF: Weather Research and Forecasting Model Used for both research and operational forecasting It
More informationParallelization of the Dirac operator. Pushan Majumdar. Indian Association for the Cultivation of Sciences, Jadavpur, Kolkata
Parallelization of the Dirac operator Pushan Majumdar Indian Association for the Cultivation of Sciences, Jadavpur, Kolkata Outline Introduction Algorithms Parallelization Comparison of performances Conclusions
More informationHistory of the partnership between SMHI and NSC. Per Undén
History of the partnership between SMHI and NSC Per Undén Outline Pre-history and NWP Preparations parallelisation HPD Council Decision and early developments Climate modelling Other applications HPD Project
More informationERLANGEN REGIONAL COMPUTING CENTER
ERLANGEN REGIONAL COMPUTING CENTER Making Sense of Performance Numbers Georg Hager Erlangen Regional Computing Center (RRZE) Friedrich-Alexander-Universität Erlangen-Nürnberg OpenMPCon 2018 Barcelona,
More informationWRF Modeling System Overview
WRF Modeling System Overview Jimy Dudhia What is WRF? WRF: Weather Research and Forecasting Model Used for both research and operational forecasting It is a supported community model, i.e. a free and shared
More informationEmpowering Scientists with Domain Specific Languages
Empowering Scientists with Domain Specific Languages Julian Kunkel, Nabeeh Jum ah Scientific Computing Department of Informatics University of Hamburg SciCADE2017 2017-09-13 Outline 1 Developing Scientific
More informationMAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors
MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors J. Dongarra, M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov University of Tennessee, Knoxville 05 / 03 / 2013 MAGMA:
More informationOn the Paths to Exascale: Will We be Hungry?
On the Paths to Exascale: Will We be Hungry? Presentation by Mike Rezny, Monash University, Australia michael.rezny@monash.edu 4th ENES Workshop High Performance Computing for Climate and Weather Toulouse,
More informationA model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization)
A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) Schodinger equation: Hψ = Eψ Choose a basis set of wave functions Two cases: Orthonormal
More informationA framework for detailed multiphase cloud modeling on HPC systems
Center for Information Services and High Performance Computing (ZIH) A framework for detailed multiphase cloud modeling on HPC systems ParCo 2009, 3. September 2009, ENS Lyon, France Matthias Lieber a,
More informationApplication and verification of ECMWF products 2016
Application and verification of ECMWF products 2016 RHMS of Serbia 1 Summary of major highlights ECMWF forecast products became the backbone in operational work during last several years. Starting from
More informationPerformance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures
Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures José I. Aliaga Performance and Energy Analysis of the Iterative Solution of Sparse
More informationFigure 1 - Resources trade-off. Image of Jim Kinter (COLA)
CLIMATE CHANGE RESEARCH AT THE EXASCALE Giovanni Aloisio *,, Italo Epicoco *,, Silvia Mocavero and Mark Taylor^ (*) University of Salento, Lecce, Italy ( ) Euro-Mediterranean Centre for Climate Change
More informationN-body Simulations. On GPU Clusters
N-body Simulations On GPU Clusters Laxmikant Kale Filippo Gioachin Pritish Jetley Thomas Quinn Celso Mendes Graeme Lufkin Amit Sharma Joachim Stadel Lukasz Wesolowski James Wadsley Edgar Solomonik Fabio
More informationOn Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code
On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code E Calore, S F Schifano, R Tripiccione Enrico Calore INFN Ferrara, Italy 7 th Workshop on UnConventional High Performance
More informationHow to shape future met-services: a seamless perspective
How to shape future met-services: a seamless perspective Paolo Ruti, Chief World Weather Research Division Sarah Jones, Chair Scientific Steering Committee Improving the skill big resources ECMWF s forecast
More information