Efficient multigrid solvers for mixed finite element discretisations in NWP models

Size: px
Start display at page:

Download "Efficient multigrid solvers for mixed finite element discretisations in NWP models"

Transcription

1 1/20 Efficient multigrid solvers for mixed finite element discretisations in NWP models Colin Cotter, David Ham, Lawrence Mitchell, Eike Hermann Müller *, Robert Scheichl * * University of Bath, Imperial College London 16th ECMWF Workshop on High Performance Computing in Meteorology Oct 2014

2 2/20 Motivation Separation of concerns and high-level abstractions Equations, Algorithms discretisation Parallel Optimised Code Domain specialist Computational Scientist Equations, Algorithms High level code PETSc firedrake PyOP2 * Parallel Optimised Code Use case: complex iterative solver for pressure correction eqn. * All credit for developing firedrake/pyop2 goes to David Ham and his group at Imperial, I m giving this talk as a user

3 3/20 Motivation Computational challenge in numerical forecasts: Solver for elliptic pressure correction equation (algorithmic + parallel) scalability & performance Number of CPU cores Titan theoretical peak PetaFLOP Total solution time [s] 1 FLOPs single GPU peak TeraFLOP CG Titan [nvidia Kepler] 512 Multigrid Hector [AMD Opteron] Number of GPUs Weak scaling, total solution time CG 512 Multigrid 10 9 GigaFLOP Number of GPUs Absolute performance dofs on GPUs of TITAN, 0.78 PFLOPs, 20%-50% BW Finite volume discretisation, simpler geometry

4 4/20 Motivation Challenges Finite element discretisation (Met Office next generation dyn. core), unstructured grids more complicated Duplicate effort for CPU, GPU (Xeon Phi,... ) implementation & optimisation, reinvent the wheel? Mix two issues: algorithmic (domain specialist/numerical analyst) parallelisation & low level optimisation (computational scientist) Goal Separation of concerns (algorithmic computational) rapid algorithm development and testing at high abstraction level Performance portability Reuse existing, tested and optimised tools and libraries Choice of correct abstraction(s)

5 5/20 This talk Iterative solver for Helmholtz equation: φ + ω (φ u) = r φ u ω φ = r u Mixed finite element discretisation, icosahedral grid Matrix-free multigrid preconditioner for pressure correction Performance portable firedrake/pyop2 toolchain [Rathgeber et al., (2012 & 2014)] Python automatic generation of optimised C code Collaboration between Numerical algorithm developers (Colin, Rob, Eike) Computational scientists, Library developers (David, Lawrence, {PETSc,PyOP2,firedrake} teams)

6 6/20 Shallow water equations Model system: shallow water equations u t φ t + (φu) = 0 + (u ) u + g φ = 0 Implicit timestepping linear system for velocity u and height perturbation φ φ + ω (φ u) = r φ u ω φ = r u reference state φ 1 ω = c h t = gφ t

7 7/20 Mixed system Finite element discretisation: φ V 2, u V 1 Matrix system for dof-vectors Φ R n V 2, U R n V 1 A ( ) ( ) ( ) Φ Mφ ωb Φ = U ωb T = M u U ( Rφ R u ) Mixed finite elements DG 0 + RT 1 (lowest order) DG 1 + BDFM 1 (higher order) (M φ ) ij β i (x)β j (x) dx, Ω B ie v e (x)β i (x) dx Ω (M u ) ef v e (x) v f (x) dx Ω (derivative matrix) (mass matrix)

8 8/20 Iterative solver Iterative solver 1 Outer iteration (mixed system): GMRES, Richardson, BiCGStab,... Preconditioner: ( ) Mφ ωb P ωb T Mu lumped velocity mass matrix ( ) ( ) ( ) P H ωb(m = u ) 1 (Mu) 1 ωb T 1 0 (Mu) Elliptic positive definite Helmholtz operator H = M φ + ω 2 B (M u ) 1 B T 2 Inner iteration (pressure system Hφ = R φ ): CG with geometric multigrid preconditioner

9 9/20 Multigrid Multigrid V-cycle smooth smooth smooth x 1/4 smooth smooth smooth smooth x1 x 1/16 x 1/64 Computational bottleneck Smoother and operator application φ0 7 φ0 + ρrelax DH 1 Rφ0 H φ0 H = Mφ + ω2 B (Mu ) 1 BT Intrinsic length scale Λ/ x = cs t / x νcfl = O(10) grid spacings ( resolutions x) small number of multigrid levels/coarse smoother sufficient

10 10/20 Grid iteration in FEM codes Computational bottlenecks in finite element codes u m e (1) u m e (3) ɸ e um (2) e Example: Weak divergence φ = u Φ = BU φ e φ e + b (e) 1 u me(1) + b (e) 2 u me(2) + b (e) 3 u me(3) Loop over topological entities (e.g. cells). In each cell Access local dofs (e.g. pressure and velocity) Execute local kernel Write data back Manage halo exchanges, thread parallelism

11 11/20 Firedrake/PyOP2 framework Algorithms developers, num. analysts Eike, Rob, Colin Library developers, comp. scientists Lawrence, David, {PyOP2 firedrake, PETSc}developers Firedrake Interface Geometry, (non)linear solves assembly, compiled expressions PETSc4py (KSP, SNES, DMPlex) Meshes, matrices, vectors data structures (Set, Map, Dat) Parallel loops: kernels executed over mesh Unified Form Language (UFL) modified FFC PyOP2 Interface Parallel scheduling, code generation MPI parallel loop CPU GPU (OpenMP/ (PyCUDA / OpenCL) PyOpenCL) parallel loop Future arch. Problem definition in FEM weak form FIAT Local assembly kernels (AST) COFFEE AST optimizer Explicitly parallel hardwarespecific implementation Firedrake Performanceportable Finite-element computation framework PyOP2 Parallel unstructured mesh computation framework [Florian Rathgeber, PhD thesis, Imperical College (2014)]

12 12/20 UFL Unified Form Language [Alnæs et al. (2014)] Example: Bilinear form a(u, v) = u(x)v(x) dx Python code from firedrake import * mesh = UnitSquareMesh(8,8) V = FunctionSpace(mesh, CG,1) u = Function(V) v = Function(V) a_uv = assemble(u*v*dx) Ω

13 13/20 Generated code C-Code generated by Firedrake/PyOP2 Kernel (below) execution over grid (right) OpenMP backend

14 14/20 Putting it all together 1, 600 lines of highly modular python code (incl. comments) 1 Use UFL wherever possible φ( u) dx assemble(phi*div(u)*dx) (weak divergence operator) 2 Use PETSc for iterative solvers provide operators/preconditioners as callbacks 3 Escape hatch: Write direct PyOP2 parallel loops lumped mass matrix division: U (M u ) 1 U kernel_code = void lumped_mass_divide(double **m,double **U) { for (int i=0; i<4; ++i) U[i][0] /= m[0][i]; } kernel = op2.kernel(kernel_code,"lumped_mass_divide") op2.par_loop(kernel,facetset, m.dat(op2.read,self.facet2dof_map_facets), u.dat(op2.rw,self.facet2dof_map_bdfm1))

15 15/20 Helmholtz operator Helmholtz operator Hφ = M φ φ + ω 2 B(M u ) 1 B T φ Operator application class Operator(object): [...] def apply(self,phi): psi = TestFunction(V_pressure) w = TestFunction(V_velocity) B_phi = assemble(div(w)*phi*dx) self.velocity_mass_matrix.divide(b_phi) BT_B_phi = assemble(psi*div(b_phi)*dx) M_phi = assemble(psi*phi*dx) return assemble(m_phi + self.omega**2*bt_b_phi) Interface with PETSc petsc_op = PETSc.Mat().create() petsc_op.setpythoncontext(operator(omega)) petsc_op.setup() ksp = PETSc.KSP() ksp.create() ksp.setoperators(petsc_op)

16 16/20 Algorithmic performance Convergence history Inner Solve: 3 CG iterations with multigrid preconditioner Icosahedral grid, 327,680 cells 4 multigrid levels relative residual r 2 / r DG 0 +RT 0 Richardson DG 0 +RT 0 GMRES DG 1 +BDFM 1 Richardson DG 1 +BDFM 1 GMRES iteration

17 17/20 Efficiency Single node run on ARCHER, lowest order, cells [preliminary] Total solve Pressure solve Helmholtz operator 17.1GB/s 1.3GB/s 4.7GB/s 10.6GB/s time [s] STREAM triad: 74.1GB/s per node up to 23% of peak BW

18 18/20 Parallel scalability Weak scaling on ARCHER [preliminary] 10 2 Number of cells hypre BoomerAMG matrix-free MG DG 1 +BDFM 1 Solution time [s] mio 5.2mio DG 0 +RT mio 83.9mio 335.5mio Number of cores

19 19/20 Conclusion Summary Outlook Matrix-free multigrid-preconditioner for pressure correction equation Implementation in firedrake/pyop2/petsc framework: Performance portability Correct abstractions Separation of concerns Test on GPU backend Extend to 3d: Regular vertical grid Improved caching Improved memory BW Improve and extend parallel scaling Full 3d dynamical core (Colin Cotter)

20 20/20 References The firedrake project: F. Rathgeber et al.: Firedrake: automating the finite element method by composing abstractions. in preparation F. Rathgeber et al.: PyOP2: A High-Level Framework for Performance-Portable Simulations on Unstructured Meshes. HPC, Networking Storage and Analysis, SC Companion, p , Los Alamitos, CA, USA, IEEE Computer Society E. Müller et al.: Petascale elliptic solvers for anisotropic PDEs on GPU clusters. Submitted to Parallel Computing (2014) [arxiv: ] E. Müller, R. Scheichl: Massively parallel solvers for elliptic PDEs in Numerical Weather- and Climate Prediction. QJRMS (2013) [arxiv: ] Helmholtz solver code on github

A primal-dual mixed finite element method. for accurate and efficient atmospheric. modelling on massively parallel computers

A primal-dual mixed finite element method. for accurate and efficient atmospheric. modelling on massively parallel computers A primal-dual mixed finite element method for accurate and efficient atmospheric modelling on massively parallel computers John Thuburn (University of Exeter, UK) Colin Cotter (Imperial College, UK) AMMW03,

More information

Scalable Domain Decomposition Preconditioners For Heterogeneous Elliptic Problems

Scalable Domain Decomposition Preconditioners For Heterogeneous Elliptic Problems Scalable Domain Decomposition Preconditioners For Heterogeneous Elliptic Problems Pierre Jolivet, F. Hecht, F. Nataf, C. Prud homme Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann INRIA Rocquencourt

More information

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts

More information

Preliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools

Preliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools Preliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools Xiangjun Wu (1),Lilun Zhang (2),Junqiang Song (2) and Dehui Chen (1) (1) Center for Numerical Weather Prediction, CMA (2) School

More information

An Overview of HPC at the Met Office

An Overview of HPC at the Met Office An Overview of HPC at the Met Office Paul Selwood Crown copyright 2006 Page 1 Introduction The Met Office National Weather Service for the UK Climate Prediction (Hadley Centre) Operational and Research

More information

Universität Dortmund UCHPC. Performance. Computing for Finite Element Simulations

Universität Dortmund UCHPC. Performance. Computing for Finite Element Simulations technische universität dortmund Universität Dortmund fakultät für mathematik LS III (IAM) UCHPC UnConventional High Performance Computing for Finite Element Simulations S. Turek, Chr. Becker, S. Buijssen,

More information

RWTH Aachen University

RWTH Aachen University IPCC @ RWTH Aachen University Optimization of multibody and long-range solvers in LAMMPS Rodrigo Canales William McDoniel Markus Höhnerbach Ahmed E. Ismail Paolo Bientinesi IPCC Showcase November 2016

More information

A Hybrid Method for the Wave Equation. beilina

A Hybrid Method for the Wave Equation.   beilina A Hybrid Method for the Wave Equation http://www.math.unibas.ch/ beilina 1 The mathematical model The model problem is the wave equation 2 u t 2 = (a 2 u) + f, x Ω R 3, t > 0, (1) u(x, 0) = 0, x Ω, (2)

More information

Progress in Numerical Methods at ECMWF

Progress in Numerical Methods at ECMWF Progress in Numerical Methods at ECMWF EWGLAM / SRNWP October 2016 W. Deconinck, G. Mengaldo, C. Kühnlein, P.K. Smolarkiewicz, N.P. Wedi, P. Bauer willem.deconinck@ecmwf.int ECMWF November 7, 2016 2 The

More information

Distributed Memory Parallelization in NGSolve

Distributed Memory Parallelization in NGSolve Distributed Memory Parallelization in NGSolve Lukas Kogler June, 2017 Inst. for Analysis and Scientific Computing, TU Wien From Shared to Distributed Memory Shared Memory Parallelization via threads (

More information

Introduction to Benchmark Test for Multi-scale Computational Materials Software

Introduction to Benchmark Test for Multi-scale Computational Materials Software Introduction to Benchmark Test for Multi-scale Computational Materials Software Shun Xu*, Jian Zhang, Zhong Jin xushun@sccas.cn Computer Network Information Center Chinese Academy of Sciences (IPCC member)

More information

FEniCS Course. Lecture 0: Introduction to FEM. Contributors Anders Logg, Kent-Andre Mardal

FEniCS Course. Lecture 0: Introduction to FEM. Contributors Anders Logg, Kent-Andre Mardal FEniCS Course Lecture 0: Introduction to FEM Contributors Anders Logg, Kent-Andre Mardal 1 / 46 What is FEM? The finite element method is a framework and a recipe for discretization of mathematical problems

More information

Petascale Quantum Simulations of Nano Systems and Biomolecules

Petascale Quantum Simulations of Nano Systems and Biomolecules Petascale Quantum Simulations of Nano Systems and Biomolecules Emil Briggs North Carolina State University 1. Outline of real-space Multigrid (RMG) 2. Scalability and hybrid/threaded models 3. GPU acceleration

More information

Crossing the Chasm. On the Paths to Exascale: Presented by Mike Rezny, Monash University, Australia

Crossing the Chasm. On the Paths to Exascale: Presented by Mike Rezny, Monash University, Australia On the Paths to Exascale: Crossing the Chasm Presented by Mike Rezny, Monash University, Australia michael.rezny@monash.edu Crossing the Chasm meeting Reading, 24 th October 2016 Version 0.1 In collaboration

More information

Discretization of PDEs and Tools for the Parallel Solution of the Resulting Systems

Discretization of PDEs and Tools for the Parallel Solution of the Resulting Systems Discretization of PDEs and Tools for the Parallel Solution of the Resulting Systems Stan Tomov Innovative Computing Laboratory Computer Science Department The University of Tennessee Wednesday April 4,

More information

Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer and GPU-Clusters --

Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer and GPU-Clusters -- Parallel Processing for Energy Efficiency October 3, 2013 NTNU, Trondheim, Norway Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer

More information

Implicit Solution of Viscous Aerodynamic Flows using the Discontinuous Galerkin Method

Implicit Solution of Viscous Aerodynamic Flows using the Discontinuous Galerkin Method Implicit Solution of Viscous Aerodynamic Flows using the Discontinuous Galerkin Method Per-Olof Persson and Jaime Peraire Massachusetts Institute of Technology 7th World Congress on Computational Mechanics

More information

Toward black-box adaptive domain decomposition methods

Toward black-box adaptive domain decomposition methods Toward black-box adaptive domain decomposition methods Frédéric Nataf Laboratory J.L. Lions (LJLL), CNRS, Alpines Inria and Univ. Paris VI joint work with Victorita Dolean (Univ. Nice Sophia-Antipolis)

More information

Kasetsart University Workshop. Multigrid methods: An introduction

Kasetsart University Workshop. Multigrid methods: An introduction Kasetsart University Workshop Multigrid methods: An introduction Dr. Anand Pardhanani Mathematics Department Earlham College Richmond, Indiana USA pardhan@earlham.edu A copy of these slides is available

More information

- Part 4 - Multicore and Manycore Technology: Chances and Challenges. Vincent Heuveline

- Part 4 - Multicore and Manycore Technology: Chances and Challenges. Vincent Heuveline - Part 4 - Multicore and Manycore Technology: Chances and Challenges Vincent Heuveline 1 Numerical Simulation of Tropical Cyclones Goal oriented adaptivity for tropical cyclones ~10⁴km ~1500km ~100km 2

More information

Open-source finite element solver for domain decomposition problems

Open-source finite element solver for domain decomposition problems 1/29 Open-source finite element solver for domain decomposition problems C. Geuzaine 1, X. Antoine 2,3, D. Colignon 1, M. El Bouajaji 3,2 and B. Thierry 4 1 - University of Liège, Belgium 2 - University

More information

Cactus Tools for Petascale Computing

Cactus Tools for Petascale Computing Cactus Tools for Petascale Computing Erik Schnetter Reno, November 2007 Gamma Ray Bursts ~10 7 km He Protoneutron Star Accretion Collapse to a Black Hole Jet Formation and Sustainment Fe-group nuclei Si

More information

Algebraic Multigrid as Solvers and as Preconditioner

Algebraic Multigrid as Solvers and as Preconditioner Ò Algebraic Multigrid as Solvers and as Preconditioner Domenico Lahaye domenico.lahaye@cs.kuleuven.ac.be http://www.cs.kuleuven.ac.be/ domenico/ Department of Computer Science Katholieke Universiteit Leuven

More information

Finite element exterior calculus framework for geophysical fluid dynamics

Finite element exterior calculus framework for geophysical fluid dynamics Finite element exterior calculus framework for geophysical fluid dynamics Colin Cotter Department of Aeronautics Imperial College London Part of ongoing work on UK Gung-Ho Dynamical Core Project funded

More information

A diagnostic interface for ICON Coping with the challenges of high-resolution climate simulations

A diagnostic interface for ICON Coping with the challenges of high-resolution climate simulations DLR.de Chart 1 > Challenges of high-resolution climate simulations > Bastian Kern HPCN Workshop, Göttingen > 10.05.2016 A diagnostic interface for ICON Coping with the challenges of high-resolution climate

More information

Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting. Thomas C. Schulthess

Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting. Thomas C. Schulthess Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting Thomas C. Schulthess 1 Cray XC30 with 5272 hybrid, GPU accelerated compute nodes Piz Daint Compute node:

More information

An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems

An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems P.-O. Persson and J. Peraire Massachusetts Institute of Technology 2006 AIAA Aerospace Sciences Meeting, Reno, Nevada January 9,

More information

Large-scale Electronic Structure Simulations with MVAPICH2 on Intel Knights Landing Manycore Processors

Large-scale Electronic Structure Simulations with MVAPICH2 on Intel Knights Landing Manycore Processors Large-scale Electronic Structure Simulations with MVAPICH2 on Intel Knights Landing Manycore Processors Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr) Principal Researcher / Korea Institute of Science and Technology

More information

Using AmgX to accelerate a PETSc-based immersed-boundary method code

Using AmgX to accelerate a PETSc-based immersed-boundary method code 29th International Conference on Parallel Computational Fluid Dynamics May 15-17, 2017; Glasgow, Scotland Using AmgX to accelerate a PETSc-based immersed-boundary method code Olivier Mesnard, Pi-Yueh Chuang,

More information

Multigrid Methods and their application in CFD

Multigrid Methods and their application in CFD Multigrid Methods and their application in CFD Michael Wurst TU München 16.06.2009 1 Multigrid Methods Definition Multigrid (MG) methods in numerical analysis are a group of algorithms for solving differential

More information

On the Paths to Exascale: Will We be Hungry?

On the Paths to Exascale: Will We be Hungry? On the Paths to Exascale: Will We be Hungry? Presentation by Mike Rezny, Monash University, Australia michael.rezny@monash.edu 4th ENES Workshop High Performance Computing for Climate and Weather Toulouse,

More information

The Fast Multipole Method in molecular dynamics

The Fast Multipole Method in molecular dynamics The Fast Multipole Method in molecular dynamics Berk Hess KTH Royal Institute of Technology, Stockholm, Sweden ADAC6 workshop Zurich, 20-06-2018 Slide BioExcel Slide Molecular Dynamics of biomolecules

More information

ECMWF Scalability Programme

ECMWF Scalability Programme ECMWF Scalability Programme Picture: Stan Tomov, ICL, University of Tennessee, Knoxville Peter Bauer, Mike Hawkins, Deborah Salmond, Stephan Siemen, Yannick Trémolet, and Nils Wedi Next generation science

More information

Targeting Extreme Scale Computational Challenges with Heterogeneous Systems

Targeting Extreme Scale Computational Challenges with Heterogeneous Systems Targeting Extreme Scale Computational Challenges with Heterogeneous Systems Oreste Villa, Antonino Tumeo Pacific Northwest Na/onal Laboratory (PNNL) 1 Introduction! PNNL Laboratory Directed Research &

More information

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique Claude Tadonki MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Monthly CRI Seminar MINES ParisTech - CRI June 06, 2016, Fontainebleau (France)

More information

Optimal Left and Right Additive Schwarz Preconditioning for Minimal Residual Methods with Euclidean and Energy Norms

Optimal Left and Right Additive Schwarz Preconditioning for Minimal Residual Methods with Euclidean and Energy Norms Optimal Left and Right Additive Schwarz Preconditioning for Minimal Residual Methods with Euclidean and Energy Norms Marcus Sarkis Worcester Polytechnic Inst., Mass. and IMPA, Rio de Janeiro and Daniel

More information

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Welcome to MCS 572. content and organization expectations of the course. definition and classification Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson

More information

Multigrid solvers for equations arising in implicit MHD simulations

Multigrid solvers for equations arising in implicit MHD simulations Multigrid solvers for equations arising in implicit MHD simulations smoothing Finest Grid Mark F. Adams Department of Applied Physics & Applied Mathematics Columbia University Ravi Samtaney PPPL Achi Brandt

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

A User Friendly Toolbox for Parallel PDE-Solvers

A User Friendly Toolbox for Parallel PDE-Solvers A User Friendly Toolbox for Parallel PDE-Solvers Gundolf Haase Institut for Mathematics and Scientific Computing Karl-Franzens University of Graz Manfred Liebmann Mathematics in Sciences Max-Planck-Institute

More information

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017 HYCOM and Navy ESPC Future High Performance Computing Needs Alan J. Wallcraft COAPS Short Seminar November 6, 2017 Forecasting Architectural Trends 3 NAVY OPERATIONAL GLOBAL OCEAN PREDICTION Trend is higher

More information

Julian Merten. GPU Computing and Alternative Architecture

Julian Merten. GPU Computing and Alternative Architecture Future Directions of Cosmological Simulations / Edinburgh 1 / 16 Julian Merten GPU Computing and Alternative Architecture Institut für Theoretische Astrophysik Zentrum für Astronomie Universität Heidelberg

More information

B. Reps 1, P. Ghysels 2, O. Schenk 4, K. Meerbergen 3,5, W. Vanroose 1,3. IHP 2015, Paris FRx

B. Reps 1, P. Ghysels 2, O. Schenk 4, K. Meerbergen 3,5, W. Vanroose 1,3. IHP 2015, Paris FRx Communication Avoiding and Hiding in preconditioned Krylov solvers 1 Applied Mathematics, University of Antwerp, Belgium 2 Future Technology Group, Berkeley Lab, USA 3 Intel ExaScience Lab, Belgium 4 USI,

More information

Empowering Scientists with Domain Specific Languages

Empowering Scientists with Domain Specific Languages Empowering Scientists with Domain Specific Languages Julian Kunkel, Nabeeh Jum ah Scientific Computing Department of Informatics University of Hamburg SciCADE2017 2017-09-13 Outline 1 Developing Scientific

More information

From Piz Daint to Piz Kesch : the making of a GPU-based weather forecasting system. Oliver Fuhrer and Thomas C. Schulthess

From Piz Daint to Piz Kesch : the making of a GPU-based weather forecasting system. Oliver Fuhrer and Thomas C. Schulthess From Piz Daint to Piz Kesch : the making of a GPU-based weather forecasting system Oliver Fuhrer and Thomas C. Schulthess 1 Piz Daint Cray XC30 with 5272 hybrid, GPU accelerated compute nodes Compute node:

More information

Solving PDEs: the Poisson problem TMA4280 Introduction to Supercomputing

Solving PDEs: the Poisson problem TMA4280 Introduction to Supercomputing Solving PDEs: the Poisson problem TMA4280 Introduction to Supercomputing Based on 2016v slides by Eivind Fonn NTNU, IMF February 27. 2017 1 The Poisson problem The Poisson equation is an elliptic partial

More information

PETSc for Python. Lisandro Dalcin

PETSc for Python.  Lisandro Dalcin PETSc for Python http://petsc4py.googlecode.com Lisandro Dalcin dalcinl@gmail.com Centro Internacional de Métodos Computacionales en Ingeniería Consejo Nacional de Investigaciones Científicas y Técnicas

More information

Expressive Environments and Code Generation for High Performance Computing

Expressive Environments and Code Generation for High Performance Computing Expressive Environments and Code Generation for High Performance Computing Garth N. Wells University of Cambridge SIAM Parallel Processing, 20 February 2014 Collaborators Martin Alnæs, Johan Hake, Richard

More information

Recent advances in HPC with FreeFem++

Recent advances in HPC with FreeFem++ Recent advances in HPC with FreeFem++ Pierre Jolivet Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann Fourth workshop on FreeFem++ December 6, 2012 With F. Hecht, F. Nataf, C. Prud homme. Outline

More information

Towards effective shell modelling with the FEniCS project

Towards effective shell modelling with the FEniCS project Towards effective shell modelling with the FEniCS project *, P. M. Baiz Department of Aeronautics 19th March 2013 Shells and FEniCS - FEniCS Workshop 13 1 Outline Introduction Shells: chart shear-membrane-bending

More information

Lazy matrices for contraction-based algorithms

Lazy matrices for contraction-based algorithms Lazy matrices for contraction-based algorithms Michael F. Herbst michael.herbst@iwr.uni-heidelberg.de https://michael-herbst.com Interdisziplinäres Zentrum für wissenschaftliches Rechnen Ruprecht-Karls-Universität

More information

Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6

Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6 Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6 Sangwon Joo, Yoonjae Kim, Hyuncheol Shin, Eunhee Lee, Eunjung Kim (Korea Meteorological Administration) Tae-Hun

More information

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013

More information

An Algebraic Multigrid Method for Eigenvalue Problems

An Algebraic Multigrid Method for Eigenvalue Problems An Algebraic Multigrid Method for Eigenvalue Problems arxiv:1503.08462v1 [math.na] 29 Mar 2015 Xiaole Han, Yunhui He, Hehu Xie and Chunguang You Abstract An algebraic multigrid method is proposed to solve

More information

Scalability Programme at ECMWF

Scalability Programme at ECMWF Scalability Programme at ECMWF Picture: Stan Tomov, ICL, University of Tennessee, Knoxville Peter Bauer, Mike Hawkins, George Mozdzynski, Tiago Quintino, Deborah Salmond, Stephan Siemen, Yannick Trémolet

More information

Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters

Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters HIM - Workshop on Sparse Grids and Applications Alexander Heinecke Chair of Scientific Computing May 18 th 2011 HIM

More information

Weather and Climate Modeling on GPU and Xeon Phi Accelerated Systems

Weather and Climate Modeling on GPU and Xeon Phi Accelerated Systems Weather and Climate Modeling on GPU and Xeon Phi Accelerated Systems Mike Ashworth, Rupert Ford, Graham Riley, Stephen Pickles Scientific Computing Department & STFC Hartree Centre STFC Daresbury Laboratory

More information

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Sherry Li Lawrence Berkeley National Laboratory Piyush Sao Rich Vuduc Georgia Institute of Technology CUG 14, May 4-8, 14, Lugano,

More information

Finite Element Solver for Flux-Source Equations

Finite Element Solver for Flux-Source Equations Finite Element Solver for Flux-Source Equations Weston B. Lowrie A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Aeronautics Astronautics University

More information

Solving PDEs with Multigrid Methods p.1

Solving PDEs with Multigrid Methods p.1 Solving PDEs with Multigrid Methods Scott MacLachlan maclachl@colorado.edu Department of Applied Mathematics, University of Colorado at Boulder Solving PDEs with Multigrid Methods p.1 Support and Collaboration

More information

FAS and Solver Performance

FAS and Solver Performance FAS and Solver Performance Matthew Knepley Mathematics and Computer Science Division Argonne National Laboratory Fall AMS Central Section Meeting Chicago, IL Oct 05 06, 2007 M. Knepley (ANL) FAS AMS 07

More information

MARCH 24-27, 2014 SAN JOSE, CA

MARCH 24-27, 2014 SAN JOSE, CA MARCH 24-27, 2014 SAN JOSE, CA Sparse HPC on modern architectures Important scientific applications rely on sparse linear algebra HPCG a new benchmark proposal to complement Top500 (HPL) To solve A x =

More information

The Green Index (TGI): A Metric for Evalua:ng Energy Efficiency in HPC Systems

The Green Index (TGI): A Metric for Evalua:ng Energy Efficiency in HPC Systems The Green Index (TGI): A Metric for Evalua:ng Energy Efficiency in HPC Systems Wu Feng and Balaji Subramaniam Metrics for Energy Efficiency Energy- Delay Product (EDP) Used primarily in circuit design

More information

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Katharina Kormann 1 Klaus Reuter 2 Markus Rampp 2 Eric Sonnendrücker 1 1 Max Planck Institut für Plasmaphysik 2 Max Planck Computing

More information

Multipole-Based Preconditioners for Sparse Linear Systems.

Multipole-Based Preconditioners for Sparse Linear Systems. Multipole-Based Preconditioners for Sparse Linear Systems. Ananth Grama Purdue University. Supported by the National Science Foundation. Overview Summary of Contributions Generalized Stokes Problem Solenoidal

More information

Integration of PETSc for Nonlinear Solves

Integration of PETSc for Nonlinear Solves Integration of PETSc for Nonlinear Solves Ben Jamroz, Travis Austin, Srinath Vadlamani, Scott Kruger Tech-X Corporation jamroz@txcorp.com http://www.txcorp.com NIMROD Meeting: Aug 10, 2010 Boulder, CO

More information

Performance of WRF using UPC

Performance of WRF using UPC Performance of WRF using UPC Hee-Sik Kim and Jong-Gwan Do * Cray Korea ABSTRACT: The Weather Research and Forecasting (WRF) model is a next-generation mesoscale numerical weather prediction system. We

More information

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric

More information

ERLANGEN REGIONAL COMPUTING CENTER

ERLANGEN REGIONAL COMPUTING CENTER ERLANGEN REGIONAL COMPUTING CENTER Making Sense of Performance Numbers Georg Hager Erlangen Regional Computing Center (RRZE) Friedrich-Alexander-Universität Erlangen-Nürnberg OpenMPCon 2018 Barcelona,

More information

A Finite-Element based Navier-Stokes Solver for LES

A Finite-Element based Navier-Stokes Solver for LES A Finite-Element based Navier-Stokes Solver for LES W. Wienken a, J. Stiller b and U. Fladrich c. a Technische Universität Dresden, Institute of Fluid Mechanics (ISM) b Technische Universität Dresden,

More information

Efficient Multigrid Preconditioners for Atmospheric Flow Simulations at High Aspect Ratio. February 11, 2015

Efficient Multigrid Preconditioners for Atmospheric Flow Simulations at High Aspect Ratio. February 11, 2015 Efficient Multigrid Preconditioners for Atmospheric Flow Simulations at High Aspect Ratio Andreas Dedner 1, Eike Müller *,2 and Robert Scheichl 2 1 Mathematics Institute, Zeeman Building, University of

More information

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work

More information

S3D Direct Numerical Simulation: Preparation for the PF Era

S3D Direct Numerical Simulation: Preparation for the PF Era S3D Direct Numerical Simulation: Preparation for the 10 100 PF Era Ray W. Grout, Scientific Computing SC 12 Ramanan Sankaran ORNL John Levesque Cray Cliff Woolley, Stan Posey nvidia J.H. Chen SNL NREL

More information

The FEniCS Project Philosophy, current status and future plans

The FEniCS Project Philosophy, current status and future plans The FEniCS Project Philosophy, current status and future plans Anders Logg logg@simula.no Simula Research Laboratory FEniCS 06 in Delft, November 8-9 2006 Outline Philosophy Recent updates Future plans

More information

A Robust Preconditioned Iterative Method for the Navier-Stokes Equations with High Reynolds Numbers

A Robust Preconditioned Iterative Method for the Navier-Stokes Equations with High Reynolds Numbers Applied and Computational Mathematics 2017; 6(4): 202-207 http://www.sciencepublishinggroup.com/j/acm doi: 10.11648/j.acm.20170604.18 ISSN: 2328-5605 (Print); ISSN: 2328-5613 (Online) A Robust Preconditioned

More information

Solving PDEs on Supercomputers I: modern supercomputer architecture

Solving PDEs on Supercomputers I: modern supercomputer architecture Supercomputer architecture Solving PDEs on Supercomputers I: modern supercomputer architecture Patrick Farrell MMSC: Python in Scientific Computing May 17, 2015 P. E. Farrell (Oxford) SPS I May 17, 2015

More information

Case Study: Quantum Chromodynamics

Case Study: Quantum Chromodynamics Case Study: Quantum Chromodynamics Michael Clark Harvard University with R. Babich, K. Barros, R. Brower, J. Chen and C. Rebbi Outline Primer to QCD QCD on a GPU Mixed Precision Solvers Multigrid solver

More information

hypre MG for LQFT Chris Schroeder LLNL - Physics Division

hypre MG for LQFT Chris Schroeder LLNL - Physics Division hypre MG for LQFT Chris Schroeder LLNL - Physics Division This work performed under the auspices of the U.S. Department of Energy by under Contract DE-??? Contributors hypre Team! Rob Falgout (project

More information

GloMAP Mode on HECToR Phase2b (Cray XT6) Mark Richardson Numerical Algorithms Group

GloMAP Mode on HECToR Phase2b (Cray XT6) Mark Richardson Numerical Algorithms Group GloMAP Mode on HECToR Phase2b (Cray XT6) Mark Richardson Numerical Algorithms Group 1 Acknowledgements NERC, NCAS Research Councils UK, HECToR Resource University of Leeds School of Earth and Environment

More information

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling 2019 Intel extreme Performance Users Group (IXPUG) meeting Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr)

More information

Parallel Preconditioning Methods for Ill-conditioned Problems

Parallel Preconditioning Methods for Ill-conditioned Problems Parallel Preconditioning Methods for Ill-conditioned Problems Kengo Nakajima Information Technology Center, The University of Tokyo 2014 Conference on Advanced Topics and Auto Tuning in High Performance

More information

c 2013 Society for Industrial and Applied Mathematics

c 2013 Society for Industrial and Applied Mathematics SIAM J. SCI. COMPUT. Vol. 35, No. 4, pp. C369 C393 c 2013 Society for Industrial and Applied Mathematics AUTOMATED DERIVATION OF THE ADJOINT OF HIGH-LEVEL TRANSIENT FINITE ELEMENT PROGRAMS P. E. FARRELL,

More information

Multigrid and Iterative Strategies for Optimal Control Problems

Multigrid and Iterative Strategies for Optimal Control Problems Multigrid and Iterative Strategies for Optimal Control Problems John Pearson 1, Stefan Takacs 1 1 Mathematical Institute, 24 29 St. Giles, Oxford, OX1 3LB e-mail: john.pearson@worc.ox.ac.uk, takacs@maths.ox.ac.uk

More information

From GungHo to LFRic. Replacing the Met Office Unified Model Steve Mullerworth. Crown copyright Met Office

From GungHo to LFRic. Replacing the Met Office Unified Model Steve Mullerworth. Crown copyright Met Office From GungHo to LFRic Replacing the Met Office Unified Model Steve Mullerworth Summary the perspective from a Computational Scientist point of view Where are we now What are Gung Ho and LFRic LFRic plan

More information

High Performance Python Libraries

High Performance Python Libraries High Performance Python Libraries Matthew Knepley Computation Institute University of Chicago Department of Mathematics and Computer Science Argonne National Laboratory 4th Workshop on Python for High

More information

From here to there. peta exa-scale transient simulation. Lawrence Mitchell 1, 27th March

From here to there. peta exa-scale transient simulation. Lawrence Mitchell 1, 27th March From here to there challenges for peta exa-scale transient simulation Lawrence Mitchell 1, 27th March 2017 1 Departments of Computing and Mathematics, Imperial College London lawrence.mitchell@imperial.ac.uk

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

Computers and Mathematics with Applications

Computers and Mathematics with Applications Computers and Mathematics with Applications 68 (2014) 1151 1160 Contents lists available at ScienceDirect Computers and Mathematics with Applications journal homepage: www.elsevier.com/locate/camwa A GPU

More information

Some Geometric and Algebraic Aspects of Domain Decomposition Methods

Some Geometric and Algebraic Aspects of Domain Decomposition Methods Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition

More information

Introduction to numerical computations on the GPU

Introduction to numerical computations on the GPU Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming

More information

A note on accurate and efficient higher order Galerkin time stepping schemes for the nonstationary Stokes equations

A note on accurate and efficient higher order Galerkin time stepping schemes for the nonstationary Stokes equations A note on accurate and efficient higher order Galerkin time stepping schemes for the nonstationary Stokes equations S. Hussain, F. Schieweck, S. Turek Abstract In this note, we extend our recent work for

More information

Multilevel spectral coarse space methods in FreeFem++ on parallel architectures

Multilevel spectral coarse space methods in FreeFem++ on parallel architectures Multilevel spectral coarse space methods in FreeFem++ on parallel architectures Pierre Jolivet Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann DD 21, Rennes. June 29, 2012 In collaboration with

More information

ACCELERATING WEATHER PREDICTION WITH NVIDIA GPUS

ACCELERATING WEATHER PREDICTION WITH NVIDIA GPUS ACCELERATING WEATHER PREDICTION WITH NVIDIA GPUS Alan Gray, Developer Technology Engineer, NVIDIA ECMWF 18th Workshop on high performance computing in meteorology, 28 th September 2018 ESCAPE NVIDIA s

More information

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method NUCLEAR SCIENCE AND TECHNIQUES 25, 0501 (14) Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method XU Qi ( 徐琪 ), 1, YU Gang-Lin ( 余纲林 ), 1 WANG Kan ( 王侃 ),

More information

A Strategy for the Development of Coupled Ocean- Atmosphere Discontinuous Galerkin Models

A Strategy for the Development of Coupled Ocean- Atmosphere Discontinuous Galerkin Models A Strategy for the Development of Coupled Ocean- Atmosphere Discontinuous Galerkin Models Frank Giraldo Department of Applied Math, Naval Postgraduate School, Monterey CA 93943 Collaborators: Jim Kelly

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures

A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,

More information

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment Emmanuel AGULLO (INRIA / LaBRI) Camille COTI (Iowa State University) Jack DONGARRA (University of Tennessee) Thomas HÉRAULT

More information

Arches Part 1: Introduction to the Uintah Computational Framework. Charles Reid Scientific Computing Summer Workshop July 14, 2010

Arches Part 1: Introduction to the Uintah Computational Framework. Charles Reid Scientific Computing Summer Workshop July 14, 2010 Arches Part 1: Introduction to the Uintah Computational Framework Charles Reid Scientific Computing Summer Workshop July 14, 2010 Arches Uintah Computational Framework Cluster Node Node Node Node Node

More information

Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners

Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners José I. Aliaga Leveraging task-parallelism in energy-efficient ILU preconditioners Universidad Jaime I (Castellón, Spain) José I. Aliaga

More information