Analysis of the Weather Research and Forecasting (WRF) Model on Large-Scale Systems


John von Neumann Institute for Computing

Analysis of the Weather Research and Forecasting (WRF) Model on Large-Scale Systems

Darren J. Kerbyson, Kevin J. Barker, Kei Davis

published in: Parallel Computing: Architectures, Algorithms and Applications, C. Bischof, M. Bücker, P. Gibbon, G.R. Joubert, T. Lippert, B. Mohr, F. Peters (Eds.), John von Neumann Institute for Computing, Jülich, NIC Series, Vol. 38, ISBN , pp. 89-98. Reprinted in: Advances in Parallel Computing, Volume 15, ISSN , ISBN (IOS Press).

© 2007 by John von Neumann Institute for Computing

Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher mentioned above.

Analysis of the Weather Research and Forecasting (WRF) Model on Large-Scale Systems

Darren J. Kerbyson, Kevin J. Barker, and Kei Davis
Performance and Architecture Lab, Los Alamos National Laboratory, Los Alamos, NM, USA
{djk, kjbarker,

In this work we analyze the performance of the Weather Research and Forecasting (WRF) model using both empirical data and an accurate analytic performance model. WRF is a large-scale mesoscale numerical weather prediction system designed for both operational forecasting and atmospheric research. It is in active development at the National Center for Atmospheric Research (NCAR) and can use thousands of processors in parallel. In this work we compare the performance of WRF on a cluster-based system (AMD Opteron processors interconnected with 4x SDR Infiniband) to that on a mesh-based system (IBM Blue Gene/L, interconnected with a proprietary 3-D torus). In addition, we develop a performance model of WRF that is validated against these two systems and that exhibits high prediction accuracy. The model is then used to examine the performance of a near-term future-generation supercomputer.

1 Introduction

The Weather Research and Forecasting (WRF) model is a community mesoscale numerical weather prediction system with nearly 5,000 users, developed by a consortium of government agencies together with the research community. It is used for both operational forecasting and atmospheric research, particularly at the 1-10 km scale, and is capable of modelling events such as storm systems and hurricanes [10]. It is also being used for regional climate modelling, chemistry and air-quality research and prediction, large eddy simulations, cloud and storm modelling, and data assimilation. Features of WRF include dynamical cores based on finite-difference methods and many options for physical parameterizations (microphysics, cumulus parameterization, planetary boundary layers, turbulence, radiation, and surface models) that are being developed by various groups. It includes two-way moving nests and can be coupled with other models including hydrology, land-surface, and ocean models. WRF has been ported to various platforms and can utilize thousands of processors in parallel. Future computational requirements are expected to increase as a consequence of both increased resolution and the use of increasingly sophisticated physics models. Our performance model of WRF allows accurate prediction of the performance of WRF on near-future large-scale systems that may contain many hundreds of thousands of processors.

In this work we analyze the performance of WRF (version 2.2) on two very different large-scale systems: a cluster of 256 Opteron nodes (1,024 processing cores) interconnected using Infiniband, and a small Blue Gene/L system containing 1,024 nodes (2,048 processing cores) interconnected by a proprietary 3-D torus [1]. This comparison allows us to draw conclusions concerning the system sizes required to achieve an equivalent level of performance on WRF.

An important aspect of this work is the capture of key performance characteristics into an analytical performance model. This model is parameterized in terms of the main application inputs (iteration count, number of grid points in each dimension, the computational load per grid point, etc.) as well as system parameters (processor count, communication topology, latencies and bandwidths, etc.). The model also takes as input the time per cell when using all processing cores in a single node; this can be measured on an available system, or determined for a future system using a processor simulator. The utility of the model is its capability of predicting performance for larger-scale systems that are not available for measurement, and for future or hypothetical systems.

In Section 2 we provide an overview of WRF, two commonly used input decks, and the measured performance on the two systems. In Section 3 we detail the performance model and show that it has high prediction accuracy when compared with the measured data. In Section 4 we compare the performance of the two systems and quantify the system sizes required to achieve equivalent performance. We then extend this work to a future large-scale system architecture using the analytic performance model we develop.

2 Overview of WRF

WRF uses a three-dimensional grid to represent the atmosphere at scales ranging from meters to thousands of kilometers, topographical land information, and observational data to define initial conditions for a forecasting simulation. It features multiple dynamical cores, of which one is chosen for a particular simulation. In this work we have analyzed two simulations that are defined by separate input decks, standard.input and large.input, as used in the Department of Defense technology insertion benchmark suite (TI-06). Both input decks are commonly used in the performance assessment of WRF and are representative of real-world forecasting scenarios. The main characteristics of these input decks are listed in Table 1. Both inputs define a weather forecast for the continental United States but at different resolutions, resulting in differences in the global problem size (number of cells) processed; as a consequence a smaller time step is necessary for large.input. Different physics are also used in the two forecasts, the details of which are not of concern to this work but nevertheless impact the processing requirements (and so the processing time per cell).

The global grid is partitioned in the two horizontal dimensions across a logical 2-D processor array. Each processor is assigned a subgrid of approximately equal size. Strong scaling is used to achieve faster time to solution; thus the subgrid on each processor becomes increasingly smaller with increased processor count, and the proportion of time spent in parallel activities increases. The main parallel activities in WRF consist of boundary exchanges that occur in all four logical directions: 39 such exchanges occur in each direction in each iteration when using large.input, and 35 when using standard.input. All message sizes are a function of the two horizontal dimensions of the subgrid as well as the subgrid depth, and range from tens to hundreds of KB at a 512-processor scale. Each iteration on large.input advances the simulation time by 30 seconds; a total of 5,760 iterations performs a 48-hour weather forecast.
In addition, every 10 minutes of simulation time (every 20th iteration) a radiation physics step occurs, and every simulation hour (every 120th iteration) a history file is generated. These latter iterations involve large I/O operations and are excluded from our analysis. An example of the measured iteration time, for the first 720 iterations of a 512-processor run using large.input on the Opteron cluster, is shown in Fig. 1. Large peaks in the iteration time can clearly be seen every 120 iterations because of the history file generation; note that the range of the vertical axis in Fig. 1 is 0 to 2 s, while the history file generation (off scale) takes 160 s. The smaller peaks at every 20th iteration result from the extra processing by the radiation physics. It should also be noted that the time for a typical iteration is nearly constant.
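To make the iteration schedule concrete, the following Python sketch (our own illustration, not code from WRF or from the paper) enumerates the large.input forecast using only the values quoted above and in Table 1, and classifies each iteration as a normal, radiation-physics, or history-output step.

```python
# Iteration schedule for large.input as described in the text: a 30 s time
# step over a 48-hour forecast, radiation physics every 20th iteration, and
# a history file written every 120th iteration (every simulated hour).
TIME_STEP_S = 30
FORECAST_HOURS = 48
RADIATION_EVERY = 20
HISTORY_EVERY = 120

def classify_iterations():
    n_iter = FORECAST_HOURS * 3600 // TIME_STEP_S   # 5,760 iterations in total
    counts = {"normal": 0, "radiation": 0, "history": 0}
    for i in range(1, n_iter + 1):
        if i % HISTORY_EVERY == 0:
            counts["history"] += 1     # large I/O; excluded from the analysis
        elif i % RADIATION_EVERY == 0:
            counts["radiation"] += 1   # extra radiation-physics processing
        else:
            counts["normal"] += 1
    return n_iter, counts

if __name__ == "__main__":
    total, counts = classify_iterations()
    print(total, counts)   # 5760 {'normal': 5472, 'radiation': 240, 'history': 48}
```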

Table 1. Characteristics of the two commonly used WRF input decks.

                                standard.input       large.input
  Simulation
    Resolution                  12 km                5 km
    Duration                    2 days               2 days
    Iteration time-step         72 s                 30 s
    Total no. iterations        2,400                5,760
  Grid dimensions
    East-west
    North-south
    Vertical
    Total grid points           4.5 M                27.6 M
  Physics
    Microphysics                WSM 3-class simple   WSM 5-class
    Land surface                Thermal diffusion    Noah model
    Radiation physics           10 min               10 min
    Lateral boundary updates    6 hr                 3 hr
    History file gen.           1 hr                 1 hr

Figure 1. Variation in the iteration time on a 512-processor run for large.input.

3 Performance Model Overview

The performance model we have developed is described below, followed by a validation of its predictive capability.

3.1 Model Description

The runtime of WRF is modeled as

    T_{RunTime} = N_{Iter} T_{CompIter} + N_{IterRP} T_{CompIterRP} + (N_{Iter} + N_{IterRP}) T_{CommIter}    (1)

where N_{Iter} is the number of normal iterations, N_{IterRP} is the number of iterations with radiation physics, T_{CompIter} and T_{CompIterRP} are the modeled computation times per iteration for the two types of iterations, respectively, and T_{CommIter} is the communication time per iteration. The computation times are a function of the number of grid points assigned to each processor and the measured time per cell on a processing node of the system:

    T_{CompIter}   = \lceil N_x / P_x \rceil \lceil N_y / P_y \rceil N_z T_{CompPerCell}    (2)
    T_{CompIterRP} = \lceil N_x / P_x \rceil \lceil N_y / P_y \rceil N_z T_{CompRPPerCell}

Note that the computation time per cell for both types of iteration may also be a function of the number of cells in a subgrid.

The communication time consists of two components: one for the east-west (horizontal x dimension) and one for the north-south (horizontal y dimension) boundary exchanges. Each exchange is done by two calls to MPI_Isend and two calls to MPI_Irecv followed by an MPI_Waitall. From an analysis of an execution of WRF, the number of boundary exchanges, NumBX, was found to be 35 and 39 for standard.input and large.input, respectively. The communication time per iteration is modeled as

    T_{CommIter} = \sum_{i=1}^{NumBX} ( T_{comm}(Size_{x_i}, C_x) + T_{comm}(Size_{y_i}, C_y) )    (3)

where the message sizes, Size_{x_i} and Size_{y_i}, vary over the NumBX boundary exchanges. A piece-wise linear model for the communication time is assumed, which uses the latency, L_c, and bandwidth, B_c, of the communication network in the system. The effective communication latency and bandwidth vary with the size of a message and also with the number of processors used (for example, in the cases of intra-node and inter-node communications on an SMP-based machine):

    T_{comm}(S, C) = L_c(S) + C \frac{S}{B_c(S)}    (4)

The communication model uses the bandwidths and latencies of the communication network observed in a single direction when performing bi-directional communications, as is the case in WRF for the boundary exchanges. They are obtained from a ping-pong type communication micro-benchmark that is independent of the application and in which the round-trip time required for bi-directional communications is measured as a function of the message size. This should not be confused with the peak uni-directional performance of the network or with peak measured bandwidths from a performance evaluation exercise.

The contention that occurs during inter-node communication depends on the communication direction (x or y dimension), the arrangement of subgrids across processing nodes, the network topology and routing mechanism, and the number of processing cores per node. The contention within the network is denoted by the parameters C_x and C_y in Eq. (4). For example, a logical 2-D array of subgrids can be folded into the 3-D topology of the Blue Gene interconnection network, but at certain scales this will result in more than one message requiring the use of the same communication channel, resulting in contention [3, 8]. Contention can also occur on a fat-tree network due to static routing (as used in Infiniband, for example) but can be eliminated through routing table optimization [6]. The node size also determines the number of processing cores that share the connections to the network and hence also impacts the contention.
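To show how Eqs. (1)-(4) combine, the following Python sketch (our own illustration, not code from the paper) evaluates the runtime model for a given domain decomposition. The per-cell times, boundary-exchange message sizes, contention factors, latency, and bandwidth are all inputs that would be measured or estimated as described above; for simplicity the latency and bandwidth are treated as constants rather than as functions of message size.

```python
import math

def t_comm(size_bytes, contention, latency_s, bandwidth_bytes_per_s):
    """Eq. (4): piece-wise linear communication model, with the latency and
    bandwidth supplied as scalars instead of functions of message size."""
    return latency_s + contention * size_bytes / bandwidth_bytes_per_s

def wrf_runtime(n_iter, n_iter_rp, nx, ny, nz, px, py,
                t_cell, t_cell_rp, exchange_sizes, cx, cy,
                latency_s, bandwidth_bytes_per_s):
    """Eqs. (1)-(3): total modeled runtime.

    exchange_sizes is a list of (Size_x, Size_y) byte counts, one pair per
    boundary exchange (NumBX = 35 for standard.input, 39 for large.input).
    """
    # Eq. (2): per-iteration computation time from the local subgrid size.
    cells = math.ceil(nx / px) * math.ceil(ny / py) * nz
    t_comp_iter = cells * t_cell
    t_comp_iter_rp = cells * t_cell_rp
    # Eq. (3): sum of the east-west and north-south boundary exchanges.
    t_comm_iter = sum(
        t_comm(sx, cx, latency_s, bandwidth_bytes_per_s) +
        t_comm(sy, cy, latency_s, bandwidth_bytes_per_s)
        for sx, sy in exchange_sizes)
    # Eq. (1): normal plus radiation-physics iterations, each with communication.
    return (n_iter * t_comp_iter + n_iter_rp * t_comp_iter_rp +
            (n_iter + n_iter_rp) * t_comm_iter)
```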

3.2 Model Validation

Two systems were used to validate the performance model of WRF. The first contains 256 nodes, each containing two dual-core AMD Opteron processors running at 2.0 GHz. Each node contains a single Mellanox Infiniband HCA with a single 4x SDR connection to a 288-port Voltaire ISR9288 switch, in which 264 ports are populated for the 256 compute nodes and also for a single head node. The Voltaire ISR9288 implements a two-level 12-ary fat tree. All communication channels have a peak of 10 Gb/s per direction. This system is physically located at Los Alamos National Laboratory.

The second system is a relatively small-scale Blue Gene/L system located at Lawrence Livermore National Laboratory (a sister to the 64K-node system used for classified computing). It consists of two mid-planes, each containing 512 dual-core embedded PowerPC 440 processors running at 700 MHz. The main communication network arranges these nodes in a 3-D torus, and a further network is available for some collective operations and for global interrupts.

The characteristics of both of these systems are listed in Table 2. Note that the stated MPI performance is for near-neighbour uni-directional communication [4]. The quantity n_{1/2} is the message size that achieves half of the peak bandwidth; it effectively indicates whether a message is latency bound (when its size is less than n_{1/2}) or bandwidth bound.

Table 2. Characteristics of the two systems used in the validation of the WRF model.

                                Opteron cluster     Blue Gene/L
  System
    Peak                        4.1 Tflops          5.7 Tflops
    Node count                  256                 1,024
    Core count                  1,024               2,048
    Core speed                  2.0 GHz             0.7 GHz
  Node
    Peak                        16 Gflops           5.6 Gflops
    Cores                       4                   2
    Memory                      8 GB                512 MB
  Network
    Topology                    12-ary fat tree     3-D torus
    MPI (zero-byte) latency     4.0 µs              2.8 µs
    MPI (1 MB) bandwidth        950 MB/s            154 MB/s
    n_{1/2}                     15,000 B            1,400 B
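The latency, bandwidth, and n_{1/2} values in Table 2 are derived from a ping-pong style micro-benchmark of the kind described in Section 3.1. The sketch below (our own illustration, assuming the mpi4py and numpy packages and a run with two MPI ranks placed on neighbouring nodes) shows one way such a measurement can be made; half the round-trip time at each message size gives the one-way time from which effective latency and bandwidth can be fitted.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
REPS = 100

# Message sizes from a few bytes up to 1 MB, spanning the latency-bound
# and bandwidth-bound regimes on either side of n_1/2.
for size in [8, 1 << 10, 1 << 14, 1 << 17, 1 << 20]:
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()
    start = MPI.Wtime()
    for _ in range(REPS):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=1)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=1)
    elapsed = MPI.Wtime() - start
    if rank == 0:
        one_way_s = elapsed / (2 * REPS)   # half of the mean round-trip time
        print(f"{size:8d} B  {one_way_s * 1e6:8.2f} us  "
              f"{size / one_way_s / 1e6:8.1f} MB/s")
```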

Figure 2. Measured and modeled performance of WRF for both typical and radiation physics iterations: (a) Opteron cluster, (b) Blue Gene/L.

4 Performance Comparison of Current Systems

An interesting aspect of this work is the direct performance comparison of Blue Gene/L with the Opteron cluster on the WRF workload. We consider this in two steps: the first based on measured data alone, and the second comparing larger-scale systems using the performance model described in Section 3. In this way we show the utility of the performance model by exploring the performance of systems that could not be measured.

The time for the typical iteration of WRF on standard.input is shown in Fig. 3(a) for both systems, using the measured performance up to 256 nodes of the Opteron system and up to 1,024 nodes of the Blue Gene/L system. The model is used to predict the performance up to 16K nodes of an Opteron system and up to 64K nodes of the Blue Gene/L system. It is clear from these data that the Blue Gene/L system has much lower performance (a longer run-time) when using the same number of nodes. It should also be noted that the performance of WRF is expected to improve only up to 32K nodes of Blue Gene/L; this limitation is due to the small subgrid sizes that occur at this scale for standard.input, and the resulting high communication-to-computation ratio.

The relative performance between the two systems is shown in Fig. 3(b). When comparing performance based on an equal number of processors, the Opteron cluster is between 3 and 6 times faster than Blue Gene/L. When comparing performance based on an equal node count, the Opteron cluster is between 5 and 6 times faster than Blue Gene/L. Note that for larger problem sizes we would expect the runtime on Blue Gene to continue to decrease at larger scales, so the additional parallelism available in the system would improve performance.
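One convenience of having the runtime model in code is that equal-performance system sizes, such as those discussed above, can be read off directly. The sketch below (our own illustration, reusing the hypothetical wrf_runtime function from the Section 3.1 sketch via system-specific wrappers) finds the smallest node count of one system whose predicted iteration time matches a target time from another.

```python
def equivalent_nodes(t_target_s, predict_runtime, candidate_node_counts):
    """Return the smallest node count whose predicted runtime is no worse
    than t_target_s, or None if no candidate reaches that performance.

    predict_runtime(nodes) is expected to wrap the wrf_runtime model with
    the per-cell times, message sizes, latency, bandwidth, and contention
    factors of the machine being sized."""
    for nodes in sorted(candidate_node_counts):
        if predict_runtime(nodes) <= t_target_s:
            return nodes
    return None

# Hypothetical usage, with model wrappers for the two systems:
#   t_opteron = predict_opteron(256)   # predicted time on the 256-node cluster
#   n_bgl = equivalent_nodes(t_opteron, predict_bgl, [2 ** k for k in range(8, 17)])
```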

Figure 3. Predicted performance of WRF on large-scale Blue Gene/L and Opteron/Infiniband systems: (a) time for typical iteration, (b) relative performance (Opteron to BG/L).

5 Performance of Possible Future Blue Gene Systems

The utility of the performance model lies in its ability to explore the performance of systems that cannot be directly measured. To illustrate this we consider a potential next-generation configuration of Blue Gene (Blue Gene/P). The characteristics of Blue Gene/P that are used in the following analysis are listed in Table 3. It should be noted that this analysis was undertaken prior to any actual Blue Gene/P hardware being available for measurement. We also assume the same logical arrangement as the largest Blue Gene/L system presently installed (a 32x32x64-node 3-D torus), but containing quad-core processors with an 850 MHz clock speed and increased communication performance. Note that the peak of this system is 891 Tflops.

Table 3. Characteristics of the potential BG/P system.

  System
    Peak                        891 Tflops
    Node count                  65,536
    Core count                  262,144
    Core speed                  850 MHz
  Node
    Peak                        13.6 Gflops
    Cores                       4
    Memory                      4 GB
  Network
    Topology                    3-D torus
    MPI (zero-byte) latency     1.5 µs
    MPI (1 MB) bandwidth        500 MB/s
    n_{1/2}                     750 B

The predicted performance of WRF using standard.input is shown in Fig. 4 and compared with that of Blue Gene/L (as presented earlier in Fig. 3). It was assumed that the processing rate per cell on a Blue Gene/P processing core would be the same as that on a Blue Gene/L processing core. It can be seen that we expect Blue Gene/P to result in improved performance (reduced processing time) when using up to approximately 10,000 nodes. For larger input decks the performance should improve to an even larger scale.
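For reference, the Table 3 values can be captured directly as inputs to the wrf_runtime sketch from Section 3.1 (again our own illustration; the per-cell compute time is assumed equal to the measured Blue Gene/L value, as stated above, and appears here only as a placeholder).

```python
# Hypothetical Blue Gene/P model inputs taken from Table 3. The per-cell
# time is assumed equal to the measured Blue Gene/L value, represented by
# a placeholder that would be filled in from measurement.
T_CELL_BGL_S = None                   # measured Blue Gene/L time per cell (s)

BGP_TORUS_DIMS = (32, 32, 64)         # assumed 32x32x64-node 3-D torus
BGP_PARAMS = {
    "nodes": 65_536,
    "cores_per_node": 4,
    "core_speed_hz": 850e6,
    "latency_s": 1.5e-6,              # MPI zero-byte latency
    "bandwidth_bytes_per_s": 500e6,   # MPI 1 MB bandwidth
    "n_half_bytes": 750,              # message size at half of peak bandwidth
    "t_cell_s": T_CELL_BGL_S,
}
```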

Figure 4. Comparison of Blue Gene/P and Blue Gene/L performance.

Table 4. Performance characteristics used in the sensitivity analysis of Blue Gene/P: the processing time per cell (µs), the MPI latency (µs), and the MPI bandwidth (MB/s), each varied from -20% to +20% of the baseline configuration.

The expected performance of WRF on Blue Gene/P is also analyzed in terms of its sensitivity to the compute speed of a single node, the MPI latency, and the MPI bandwidth. The expected performance has a degree of uncertainty because the inputs to the performance model are assumed rather than measured for this possible configuration of Blue Gene/P. A range of values for each of the compute performance, the MPI latency, and the MPI bandwidth is used, as listed in Table 4. Each value is varied from -20% to +20% of the baseline configuration, and each of the three values is varied independently; that is, one quantity is varied while the other two are fixed at their baseline values. Two sets of results are presented: the range in computational processing rates in Fig. 5(a), and the range in MPI bandwidths in Fig. 5(b). The sensitivity to MPI latency is not shown since the performance varied by at most 0.1%; i.e., WRF is not sensitive to latency. It can be seen that WRF is mostly sensitive to the processing rate up to 4K nodes (i.e., it is compute bound in this range), and mostly sensitive to the communication bandwidth at higher scales (i.e., it is bandwidth bound). Improvements in processing performance would be most beneficial for coarsely partitioned jobs (large cell counts per processor), whereas increased network bandwidth would be most beneficial for jobs executing on larger processor counts.
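The one-at-a-time ±20% sweep described above is straightforward to reproduce once the runtime model is in code. The sketch below (our own illustration) perturbs each of the three parameters around a baseline while holding the other two fixed; the baseline latency and bandwidth follow Table 3, the baseline time per cell is a placeholder since its value is not given in the text, and the intermediate step sizes are illustrative.

```python
# Baseline Blue Gene/P parameters: latency and bandwidth from Table 3; the
# per-cell compute time is a hypothetical placeholder, not a value from the paper.
BASELINE = {
    "t_cell_s": 1.0e-6,
    "latency_s": 1.5e-6,
    "bandwidth_bytes_per_s": 500e6,
}

def sensitivity_sweep(model, steps=(-0.2, -0.1, 0.0, 0.1, 0.2)):
    """Vary one parameter at a time by the given fractions of its baseline
    value, holding the other two at baseline; return the predicted runtimes."""
    results = {}
    for name in BASELINE:
        results[name] = []
        for frac in steps:
            params = dict(BASELINE)
            params[name] = BASELINE[name] * (1.0 + frac)
            results[name].append((frac, model(**params)))
    return results

# model(t_cell_s, latency_s, bandwidth_bytes_per_s) would wrap wrf_runtime for
# a chosen node count, e.g. 4K or 16K nodes of the hypothetical BG/P torus.
```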

Figure 5. WRF sensitivity analysis on Blue Gene/P: (a) sensitivity to compute performance, (b) sensitivity to MPI bandwidth.

6 Conclusions

We have developed and validated an analytic performance model for the Weather Research and Forecasting (WRF) application and used it, in conjunction with empirical data, to quantitatively study application performance on two current-generation supercomputers and one near-term future-generation supercomputer. Our analytic performance model was developed through careful study of the dynamic execution behaviour of the WRF application and subsequently validated using performance measurements on two current systems: a 256-node (1,024-core) AMD Opteron cluster using a 4x SDR Infiniband interconnection network, and a 1,024-node (2,048-core) IBM Blue Gene/L system utilizing a custom 3-D torus network. In each case the average performance prediction error was less than 5%.

With a validated performance model in place, we are able to extend our analysis to larger-scale current systems and near-term future machines. At small node counts, overall application performance is tied most closely to single-processor performance; at this scale, roughly four times as many Blue Gene/L nodes are required to match the performance of the Opteron/Infiniband cluster. At larger scale, communication performance becomes critical; in fact, WRF performance on Blue Gene/L improves very slowly beyond roughly 10K nodes due to the communication contention caused by folding the logical 2-D processor array onto the physical 3-D network. This work is part of an ongoing project at Los Alamos to develop modelling techniques which facilitate the analysis of workloads of interest to the scientific computing community on large-scale parallel systems [5].

Acknowledgements

This work was funded in part by the Department of Energy Accelerated Strategic Computing (ASC) program and by the Office of Science. Los Alamos National Laboratory is operated by Los Alamos National Security LLC for the US Department of Energy under contract DE-AC52-06NA.

References

1. N. R. Adiga et al., An Overview of the Blue Gene/L Supercomputer, in: Proc. IEEE/ACM Supercomputing (SC'02), Baltimore, MD, (2002).
2. K. J. Barker and D. J. Kerbyson, A Performance Model and Scalability Analysis of the HYCOM Ocean Simulation Application, in: Proc. IASTED Int. Conf. on Parallel and Distributed Computing (PDCS), Las Vegas, NV, (2005).
3. G. Bhanot, A. Gara, P. Heidelberger, E. Lawless, J. C. Sexton, and R. Walkup, Optimizing Task Layout on the Blue Gene/L Supercomputer, IBM J. Research and Development, 49, (2005).
4. K. Davis, A. Hoisie, G. Johnson, D. J. Kerbyson, M. Lang, S. Pakin, and F. Petrini, A Performance and Scalability Analysis of the Blue Gene/L Architecture, in: Proc. IEEE/ACM Supercomputing (SC'04), Pittsburgh, PA, (2004).
5. A. Hoisie, G. Johnson, D. J. Kerbyson, M. Lang, and S. Pakin, A Performance Comparison through Benchmarking and Modeling of Three Leading Supercomputers: Blue Gene/L, Red Storm, and Purple, in: Proc. IEEE/ACM Supercomputing (SC'06), Tampa, FL, (2006).
6. G. Johnson, D. J. Kerbyson, and M. Lang, Application Specific Optimization of Infiniband Networks, Los Alamos Unclassified Report, LA-UR, (2006).
7. D. J. Kerbyson and A. Hoisie, Performance Modeling of the Blue Gene Architecture, in: Proc. IEEE John Atanasoff Conf. on Modern Computing, Sofia, Bulgaria, (2006).
8. D. J. Kerbyson and P. W. Jones, A Performance Model of the Parallel Ocean Program, Int. J. of High Performance Computing Applications, 19, 1-16, (2005).
9. V. Salapura, R. Walkup, and A. Gara, Exploiting Workload Parallelism for Performance and Power Optimization in Blue Gene, IEEE Micro, 26, 67-81, (2006).
10. Weather Research and Forecasting (WRF) model.
