WRF performance tuning for the Intel Woodcrest Processor

1 WRF performance tuning for the Intel Woodcrest Processor
A. Semenov, T. Kashevarova, P. Mankevich, D. Shkurko, K. Arturov, N. Panov
Intel Corp., pr. ak. Lavrentieva 6/1, Novosibirsk, Russia
{alexander.l.semenov,tamara.p.kashevarova,pavel.v.mankevich,

2 Notations and abbreviations
Woodcrest processor: Dual Core Intel Xeon Processor model 5160
ppn: processes per node (the number of cores used on a node)
MVAPICH: an MPI implementation from the Ohio State University
WRF CONUS 12km: 48-hour, 12km resolution case over the Continental U.S. (CONUS) domain, October 24, 2001, that uses the Eulerian Mass dynamics
WS8, WS9: benchmarks based on WRF 1.2 and correspondingly and described on

3 WRF based codes that we worked on
WRF v
NOAA WRF based benchmark workstreams
WRF v WS8: WRF chemistry module
WS9: WRF 2.0.2

4 Hardware and software
256 Woodcrest nodes, 2 sockets with 2 cores each, 3.0 GHz; 4 MB L2; 8 GB RAM; InfiniBand interconnect
Red Hat Enterprise Linux 3.0
Intel Fortran and C compilers v9.1 for Linux
Intel MPI 2.0, MVAPICH
WRF v
Workload: CONUS 12km; 48-hour forecast (2400 time steps)

The results of running WRF CONUS 12km on 32 cores with ppn = 4:

Options:              -O3 | -O3 -ip | -O3 -xT | -O3 -ip -xT | -O3 -ip -xT -no-prec-div -no-prec-sqrt
Compile time (secs):
Time (secs):
Speed-up over base:   base | 1.3% | 4.6% | 6.8% | 10.3%
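The workload and cluster figures quoted on this slide imply a few derived numbers. The sketch below works them out; the variable names are illustrative, and only the inputs (48 hours, 2400 steps, 32 cores, ppn = 4, 256 nodes with 2 sockets of 2 cores) come from the slide.

```python
# Inputs taken from the slide; everything below them is derived arithmetic.
FORECAST_HOURS = 48
TIME_STEPS = 2400
CORES_USED = 32
PPN = 4                  # processes per node
NODES = 256
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 2

# Model time step implied by the forecast length and the step count.
step_seconds = FORECAST_HOURS * 3600 / TIME_STEPS

# Nodes occupied by the 32-core run when every node hosts PPN processes.
nodes_used = CORES_USED // PPN

# Total cores available on the cluster.
total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET

print(step_seconds, nodes_used, total_cores)
```

With ppn = 4 every core of a node is in use, so the 32-core compiler-option comparison runs on fully populated nodes.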

5 Why use Intel Compilers?
Efficiency: inherent ability to highly optimize codes for all Intel processors
Ease of use: automatic optimization features make it easier to obtain highly optimized target code
Intel Premier Support: training, best known methods, problem fixes and workarounds
Intel compilers use:
  Speculative memory accesses
  Advanced branch prediction
  Software pipelining for Intel Itanium
There are other useful Intel software tools:
  Performance Analyzer VTune
  Threading Tools
  Cluster Tools
Specific optimization for Woodcrest

6 Useful Intel Compiler Options for Woodcrest
-O2: turns on default optimizations for speed
-O3: enables the -O2 optimization level and performs more aggressive optimizations, in particular loop transformations
-ip / -ipo: enables single-file / multi-file interprocedural optimizations
-no-prec-div, -no-prec-sqrt: enable faster but slightly less accurate algorithms for division and square root (may affect floating-point accuracy)
-xT: enables optimizations specific to Woodcrest
-unroll0: disables loop unrolling in the file
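To give a feel for the accuracy trade-off behind -no-prec-div, the sketch below compares exact division with multiplication by a reciprocal, which is one kind of rewrite a relaxed-precision division option permits. This is an illustration in Python, not the code the compiler actually generates; the function names are hypothetical.

```python
# Compare exact division a / b with the reciprocal form a * (1 / b).
# The reciprocal form rounds twice, so it can differ in the last bit.

def reciprocal_divide(a: float, b: float) -> float:
    """Divide by multiplying with the reciprocal (two roundings)."""
    return a * (1.0 / b)

def compare(limit: int = 100):
    """Return (mismatch_count, max_relative_difference) for a, b in 1..limit."""
    mismatches = 0
    max_rel = 0.0
    for a in range(1, limit + 1):
        for b in range(1, limit + 1):
            exact = a / b
            approx = reciprocal_divide(float(a), float(b))
            if exact != approx:
                mismatches += 1
                max_rel = max(max_rel, abs(exact - approx) / exact)
    return mismatches, max_rel

mismatches, max_rel = compare()
print(f"{mismatches} of 10000 quotients differ; "
      f"max relative difference {max_rel:.2e}")
```

The differences stay at the level of the last bit of a double, which is why such options are usually acceptable for a forecast model but still call for validating the output, as the conclusions of this talk note.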

7 Profiling with Intel Performance Analyzer VTune
Performs exhaustive data collection
Has multiple useful display options that help a developer quickly locate hotspots in the code and decide on a strategy for performance improvement
Multiple data views
Very intuitive user interface
Easy switching to assembly view and assembly instruction events

8 Decompositions of WRF 2.1.1 CONUS 12km
Options: -O3 -ip -xT; ppn = 4; 0 I/O servers

Number of cores   Decomposition       Wall time (sec)   Speed-up over default decomposition
16                Default (4x4)
32                Default (4x8)
64                Default (8x8)
128               Default (8x16)
256               Default (16x16)
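The default decompositions listed for each core count all match the most nearly square factorization of the process count. The sketch below reproduces that pattern; the function name is hypothetical and this is not WRF's own code. How the authors requested the non-default decompositions is not stated on the slide; in WRF releases that support them, the `nproc_x` and `nproc_y` namelist entries are the usual mechanism, which is an assumption here.

```python
import math

def near_square_decomposition(nprocs: int) -> tuple[int, int]:
    """Return (nx, ny) with nx * ny == nprocs, nx <= ny, and nx as large as possible."""
    nx = math.isqrt(nprocs)
    # Walk down from the integer square root to the nearest divisor.
    while nprocs % nx != 0:
        nx -= 1
    return nx, nprocs // nx

for cores in (16, 32, 64, 128, 256):
    nx, ny = near_square_decomposition(cores)
    print(f"{cores:4d} cores -> {nx}x{ny}")
```

Near-square patches keep the halo perimeter small relative to the patch area, which is a common reason for this default; elongated patches can still win when they suit the memory access pattern better, which is what a comparison like this slide measures.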

9 Timings for WRF CONUS 12km, 2400 time steps
[Figure: WRF CONUS 12km, time of full run (secs) vs. number of cores]

10 Scalability for WRF CONUS 12km, 2400 time steps
[Figure: WRF CONUS 12km, scalability for average time per step vs. number of cores]
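Scalability for average time per step is usually summarized as speed-up and parallel efficiency relative to the smallest run. A minimal sketch of that calculation follows; the function name is hypothetical, the definitions are the conventional ones rather than anything stated on the slide, and the timings are placeholders, not measurements from this talk.

```python
def speedup_and_efficiency(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map core count -> (speed-up, efficiency) relative to the smallest core count."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    result = {}
    for cores, seconds in sorted(timings.items()):
        speedup = base_time / seconds
        # Efficiency is the speed-up divided by the increase in core count.
        efficiency = speedup * base_cores / cores
        result[cores] = (speedup, efficiency)
    return result

# Placeholder timings (seconds per step), NOT taken from the slides.
example = {16: 8.0, 32: 4.0, 64: 2.5}
for cores, (s, e) in speedup_and_efficiency(example).items():
    print(f"{cores:4d} cores: speed-up {s:.2f}, efficiency {e:.0%}")
```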

11 Comparison of Woodcrest and a previous Xeon processor (Irwindale)
[Figure: average time per step (secs) vs. number of CPUs; series: Woodcrest 3.0GHz and Xeon DP 3.6GHz, 2MB L2, 8GB RAM]

12 NOAA WRF based codes
Workstream 8: WRF 5KM CHEM. This benchmark uses the WRF version under development in cooperation with multiple government and academic agencies. This version of WRF is based on the Advanced Research WRF Eulerian mass coordinate. The benchmark includes code to produce chemical tracers and incorporates cloud chemistry code to predict chemical interaction and dispersion.
Workstream 9: WRF 5KM SI. This benchmark is a test of the WRF Advanced Research version (ARW). The test contains six individual WRF tests with sample output and results for each. These six tests are: squall2d_x, squall2d_y, 3D quarter-circle shear supercell simulation, 2D flow over a bell-shaped hill, 3D baroclinic wave, and 2D gravity current.

13 Timings for NOAA WS8 & WS9 benchmarks
[Figure: WS8, 36-hour simulation runs; time (secs) vs. number of cores]
[Figure: WS9, 6-hour simulation runs; time (secs) vs. number of cores]

14 Conclusions
Running WRF and WRF-based applications on the Woodcrest processor showed very high efficiency of the processor in both computation and scalability.
On WRF, Woodcrest beats previous Intel Xeon processors.
All benchmarks passed validation without any special effort.
A number of hot spots have not been processed yet.
We are going to explore Intel MKL more intensively for WRF optimization: to use not only its vectorized math functions but also more complex routines and solvers.


Lenstool-HPC. From scratch to supercomputers: building a large-scale strong lensing computational software bottom-up. HPC Advisory Council, April 2018 LenstoolHPC From scratch to supercomputers: building a largescale strong lensing computational software bottomup HPC Advisory Council, April 2018 Christoph Schäfer and Markus Rexroth (LASTRO) Gilles Fourestey

More information

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU Khramtsov D.P., Nekrasov D.A., Pokusaev B.G. Department of Thermodynamics, Thermal Engineering and Energy Saving Technologies,

More information

ArcGIS Deployment Pattern. Azlina Mahad

ArcGIS Deployment Pattern. Azlina Mahad ArcGIS Deployment Pattern Azlina Mahad Agenda Deployment Options Cloud Portal ArcGIS Server Data Publication Mobile System Management Desktop Web Device ArcGIS An Integrated Web GIS Platform Portal Providing

More information

B629 project - StreamIt MPI Backend. Nilesh Mahajan

B629 project - StreamIt MPI Backend. Nilesh Mahajan B629 project - StreamIt MPI Backend Nilesh Mahajan March 26, 2013 Abstract StreamIt is a language based on the dataflow model of computation. StreamIt consists of computation units called filters connected

More information

Introduction The Nature of High-Performance Computation

Introduction The Nature of High-Performance Computation 1 Introduction The Nature of High-Performance Computation The need for speed. Since the beginning of the era of the modern digital computer in the early 1940s, computing power has increased at an exponential

More information

Highly-scalable branch and bound for maximum monomial agreement

Highly-scalable branch and bound for maximum monomial agreement Highly-scalable branch and bound for maximum monomial agreement Jonathan Eckstein (Rutgers) William Hart Cynthia A. Phillips Sandia National Laboratories Sandia National Laboratories is a multi-program

More information
