Efficient multigrid solvers for mixed finite element discretisations in NWP models
|
|
- Melina Welch
- 5 years ago
- Views:
Transcription
1 1/20 Efficient multigrid solvers for mixed finite element discretisations in NWP models Colin Cotter, David Ham, Lawrence Mitchell, Eike Hermann Müller *, Robert Scheichl * * University of Bath, Imperial College London 16th ECMWF Workshop on High Performance Computing in Meteorology Oct 2014
2 2/20 Motivation Separation of concerns and high-level abstractions Equations, Algorithms discretisation Parallel Optimised Code Domain specialist Computational Scientist Equations, Algorithms High level code PETSc firedrake PyOP2 * Parallel Optimised Code Use case: complex iterative solver for pressure correction eqn. * All credit for developing firedrake/pyop2 goes to David Ham and his group at Imperial, I m giving this talk as a user
3 3/20 Motivation Computational challenge in numerical forecasts: Solver for elliptic pressure correction equation (algorithmic + parallel) scalability & performance Number of CPU cores Titan theoretical peak PetaFLOP Total solution time [s] 1 FLOPs single GPU peak TeraFLOP CG Titan [nvidia Kepler] 512 Multigrid Hector [AMD Opteron] Number of GPUs Weak scaling, total solution time CG 512 Multigrid 10 9 GigaFLOP Number of GPUs Absolute performance dofs on GPUs of TITAN, 0.78 PFLOPs, 20%-50% BW Finite volume discretisation, simpler geometry
4 4/20 Motivation Challenges Finite element discretisation (Met Office next generation dyn. core), unstructured grids more complicated Duplicate effort for CPU, GPU (Xeon Phi,... ) implementation & optimisation, reinvent the wheel? Mix two issues: algorithmic (domain specialist/numerical analyst) parallelisation & low level optimisation (computational scientist) Goal Separation of concerns (algorithmic computational) rapid algorithm development and testing at high abstraction level Performance portability Reuse existing, tested and optimised tools and libraries Choice of correct abstraction(s)
5 5/20 This talk Iterative solver for Helmholtz equation: φ + ω (φ u) = r φ u ω φ = r u Mixed finite element discretisation, icosahedral grid Matrix-free multigrid preconditioner for pressure correction Performance portable firedrake/pyop2 toolchain [Rathgeber et al., (2012 & 2014)] Python automatic generation of optimised C code Collaboration between Numerical algorithm developers (Colin, Rob, Eike) Computational scientists, Library developers (David, Lawrence, {PETSc,PyOP2,firedrake} teams)
6 6/20 Shallow water equations Model system: shallow water equations u t φ t + (φu) = 0 + (u ) u + g φ = 0 Implicit timestepping linear system for velocity u and height perturbation φ φ + ω (φ u) = r φ u ω φ = r u reference state φ 1 ω = c h t = gφ t
7 7/20 Mixed system Finite element discretisation: φ V 2, u V 1 Matrix system for dof-vectors Φ R n V 2, U R n V 1 A ( ) ( ) ( ) Φ Mφ ωb Φ = U ωb T = M u U ( Rφ R u ) Mixed finite elements DG 0 + RT 1 (lowest order) DG 1 + BDFM 1 (higher order) (M φ ) ij β i (x)β j (x) dx, Ω B ie v e (x)β i (x) dx Ω (M u ) ef v e (x) v f (x) dx Ω (derivative matrix) (mass matrix)
8 8/20 Iterative solver Iterative solver 1 Outer iteration (mixed system): GMRES, Richardson, BiCGStab,... Preconditioner: ( ) Mφ ωb P ωb T Mu lumped velocity mass matrix ( ) ( ) ( ) P H ωb(m = u ) 1 (Mu) 1 ωb T 1 0 (Mu) Elliptic positive definite Helmholtz operator H = M φ + ω 2 B (M u ) 1 B T 2 Inner iteration (pressure system Hφ = R φ ): CG with geometric multigrid preconditioner
9 9/20 Multigrid Multigrid V-cycle smooth smooth smooth x 1/4 smooth smooth smooth smooth x1 x 1/16 x 1/64 Computational bottleneck Smoother and operator application φ0 7 φ0 + ρrelax DH 1 Rφ0 H φ0 H = Mφ + ω2 B (Mu ) 1 BT Intrinsic length scale Λ/ x = cs t / x νcfl = O(10) grid spacings ( resolutions x) small number of multigrid levels/coarse smoother sufficient
10 10/20 Grid iteration in FEM codes Computational bottlenecks in finite element codes u m e (1) u m e (3) ɸ e um (2) e Example: Weak divergence φ = u Φ = BU φ e φ e + b (e) 1 u me(1) + b (e) 2 u me(2) + b (e) 3 u me(3) Loop over topological entities (e.g. cells). In each cell Access local dofs (e.g. pressure and velocity) Execute local kernel Write data back Manage halo exchanges, thread parallelism
11 11/20 Firedrake/PyOP2 framework Algorithms developers, num. analysts Eike, Rob, Colin Library developers, comp. scientists Lawrence, David, {PyOP2 firedrake, PETSc}developers Firedrake Interface Geometry, (non)linear solves assembly, compiled expressions PETSc4py (KSP, SNES, DMPlex) Meshes, matrices, vectors data structures (Set, Map, Dat) Parallel loops: kernels executed over mesh Unified Form Language (UFL) modified FFC PyOP2 Interface Parallel scheduling, code generation MPI parallel loop CPU GPU (OpenMP/ (PyCUDA / OpenCL) PyOpenCL) parallel loop Future arch. Problem definition in FEM weak form FIAT Local assembly kernels (AST) COFFEE AST optimizer Explicitly parallel hardwarespecific implementation Firedrake Performanceportable Finite-element computation framework PyOP2 Parallel unstructured mesh computation framework [Florian Rathgeber, PhD thesis, Imperical College (2014)]
12 12/20 UFL Unified Form Language [Alnæs et al. (2014)] Example: Bilinear form a(u, v) = u(x)v(x) dx Python code from firedrake import * mesh = UnitSquareMesh(8,8) V = FunctionSpace(mesh, CG,1) u = Function(V) v = Function(V) a_uv = assemble(u*v*dx) Ω
13 13/20 Generated code C-Code generated by Firedrake/PyOP2 Kernel (below) execution over grid (right) OpenMP backend
14 14/20 Putting it all together 1, 600 lines of highly modular python code (incl. comments) 1 Use UFL wherever possible φ( u) dx assemble(phi*div(u)*dx) (weak divergence operator) 2 Use PETSc for iterative solvers provide operators/preconditioners as callbacks 3 Escape hatch: Write direct PyOP2 parallel loops lumped mass matrix division: U (M u ) 1 U kernel_code = void lumped_mass_divide(double **m,double **U) { for (int i=0; i<4; ++i) U[i][0] /= m[0][i]; } kernel = op2.kernel(kernel_code,"lumped_mass_divide") op2.par_loop(kernel,facetset, m.dat(op2.read,self.facet2dof_map_facets), u.dat(op2.rw,self.facet2dof_map_bdfm1))
15 15/20 Helmholtz operator Helmholtz operator Hφ = M φ φ + ω 2 B(M u ) 1 B T φ Operator application class Operator(object): [...] def apply(self,phi): psi = TestFunction(V_pressure) w = TestFunction(V_velocity) B_phi = assemble(div(w)*phi*dx) self.velocity_mass_matrix.divide(b_phi) BT_B_phi = assemble(psi*div(b_phi)*dx) M_phi = assemble(psi*phi*dx) return assemble(m_phi + self.omega**2*bt_b_phi) Interface with PETSc petsc_op = PETSc.Mat().create() petsc_op.setpythoncontext(operator(omega)) petsc_op.setup() ksp = PETSc.KSP() ksp.create() ksp.setoperators(petsc_op)
16 16/20 Algorithmic performance Convergence history Inner Solve: 3 CG iterations with multigrid preconditioner Icosahedral grid, 327,680 cells 4 multigrid levels relative residual r 2 / r DG 0 +RT 0 Richardson DG 0 +RT 0 GMRES DG 1 +BDFM 1 Richardson DG 1 +BDFM 1 GMRES iteration
17 17/20 Efficiency Single node run on ARCHER, lowest order, cells [preliminary] Total solve Pressure solve Helmholtz operator 17.1GB/s 1.3GB/s 4.7GB/s 10.6GB/s time [s] STREAM triad: 74.1GB/s per node up to 23% of peak BW
18 18/20 Parallel scalability Weak scaling on ARCHER [preliminary] 10 2 Number of cells hypre BoomerAMG matrix-free MG DG 1 +BDFM 1 Solution time [s] mio 5.2mio DG 0 +RT mio 83.9mio 335.5mio Number of cores
19 19/20 Conclusion Summary Outlook Matrix-free multigrid-preconditioner for pressure correction equation Implementation in firedrake/pyop2/petsc framework: Performance portability Correct abstractions Separation of concerns Test on GPU backend Extend to 3d: Regular vertical grid Improved caching Improved memory BW Improve and extend parallel scaling Full 3d dynamical core (Colin Cotter)
20 20/20 References The firedrake project: F. Rathgeber et al.: Firedrake: automating the finite element method by composing abstractions. in preparation F. Rathgeber et al.: PyOP2: A High-Level Framework for Performance-Portable Simulations on Unstructured Meshes. HPC, Networking Storage and Analysis, SC Companion, p , Los Alamitos, CA, USA, IEEE Computer Society E. Müller et al.: Petascale elliptic solvers for anisotropic PDEs on GPU clusters. Submitted to Parallel Computing (2014) [arxiv: ] E. Müller, R. Scheichl: Massively parallel solvers for elliptic PDEs in Numerical Weather- and Climate Prediction. QJRMS (2013) [arxiv: ] Helmholtz solver code on github
A primal-dual mixed finite element method. for accurate and efficient atmospheric. modelling on massively parallel computers
A primal-dual mixed finite element method for accurate and efficient atmospheric modelling on massively parallel computers John Thuburn (University of Exeter, UK) Colin Cotter (Imperial College, UK) AMMW03,
More informationScalable Domain Decomposition Preconditioners For Heterogeneous Elliptic Problems
Scalable Domain Decomposition Preconditioners For Heterogeneous Elliptic Problems Pierre Jolivet, F. Hecht, F. Nataf, C. Prud homme Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann INRIA Rocquencourt
More informationLattice Boltzmann simulations on heterogeneous CPU-GPU clusters
Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts
More informationPreliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools
Preliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools Xiangjun Wu (1),Lilun Zhang (2),Junqiang Song (2) and Dehui Chen (1) (1) Center for Numerical Weather Prediction, CMA (2) School
More informationAn Overview of HPC at the Met Office
An Overview of HPC at the Met Office Paul Selwood Crown copyright 2006 Page 1 Introduction The Met Office National Weather Service for the UK Climate Prediction (Hadley Centre) Operational and Research
More informationUniversität Dortmund UCHPC. Performance. Computing for Finite Element Simulations
technische universität dortmund Universität Dortmund fakultät für mathematik LS III (IAM) UCHPC UnConventional High Performance Computing for Finite Element Simulations S. Turek, Chr. Becker, S. Buijssen,
More informationRWTH Aachen University
IPCC @ RWTH Aachen University Optimization of multibody and long-range solvers in LAMMPS Rodrigo Canales William McDoniel Markus Höhnerbach Ahmed E. Ismail Paolo Bientinesi IPCC Showcase November 2016
More informationA Hybrid Method for the Wave Equation. beilina
A Hybrid Method for the Wave Equation http://www.math.unibas.ch/ beilina 1 The mathematical model The model problem is the wave equation 2 u t 2 = (a 2 u) + f, x Ω R 3, t > 0, (1) u(x, 0) = 0, x Ω, (2)
More informationProgress in Numerical Methods at ECMWF
Progress in Numerical Methods at ECMWF EWGLAM / SRNWP October 2016 W. Deconinck, G. Mengaldo, C. Kühnlein, P.K. Smolarkiewicz, N.P. Wedi, P. Bauer willem.deconinck@ecmwf.int ECMWF November 7, 2016 2 The
More informationDistributed Memory Parallelization in NGSolve
Distributed Memory Parallelization in NGSolve Lukas Kogler June, 2017 Inst. for Analysis and Scientific Computing, TU Wien From Shared to Distributed Memory Shared Memory Parallelization via threads (
More informationIntroduction to Benchmark Test for Multi-scale Computational Materials Software
Introduction to Benchmark Test for Multi-scale Computational Materials Software Shun Xu*, Jian Zhang, Zhong Jin xushun@sccas.cn Computer Network Information Center Chinese Academy of Sciences (IPCC member)
More informationFEniCS Course. Lecture 0: Introduction to FEM. Contributors Anders Logg, Kent-Andre Mardal
FEniCS Course Lecture 0: Introduction to FEM Contributors Anders Logg, Kent-Andre Mardal 1 / 46 What is FEM? The finite element method is a framework and a recipe for discretization of mathematical problems
More informationPetascale Quantum Simulations of Nano Systems and Biomolecules
Petascale Quantum Simulations of Nano Systems and Biomolecules Emil Briggs North Carolina State University 1. Outline of real-space Multigrid (RMG) 2. Scalability and hybrid/threaded models 3. GPU acceleration
More informationCrossing the Chasm. On the Paths to Exascale: Presented by Mike Rezny, Monash University, Australia
On the Paths to Exascale: Crossing the Chasm Presented by Mike Rezny, Monash University, Australia michael.rezny@monash.edu Crossing the Chasm meeting Reading, 24 th October 2016 Version 0.1 In collaboration
More informationDiscretization of PDEs and Tools for the Parallel Solution of the Resulting Systems
Discretization of PDEs and Tools for the Parallel Solution of the Resulting Systems Stan Tomov Innovative Computing Laboratory Computer Science Department The University of Tennessee Wednesday April 4,
More informationOpen-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer and GPU-Clusters --
Parallel Processing for Energy Efficiency October 3, 2013 NTNU, Trondheim, Norway Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer
More informationImplicit Solution of Viscous Aerodynamic Flows using the Discontinuous Galerkin Method
Implicit Solution of Viscous Aerodynamic Flows using the Discontinuous Galerkin Method Per-Olof Persson and Jaime Peraire Massachusetts Institute of Technology 7th World Congress on Computational Mechanics
More informationToward black-box adaptive domain decomposition methods
Toward black-box adaptive domain decomposition methods Frédéric Nataf Laboratory J.L. Lions (LJLL), CNRS, Alpines Inria and Univ. Paris VI joint work with Victorita Dolean (Univ. Nice Sophia-Antipolis)
More informationKasetsart University Workshop. Multigrid methods: An introduction
Kasetsart University Workshop Multigrid methods: An introduction Dr. Anand Pardhanani Mathematics Department Earlham College Richmond, Indiana USA pardhan@earlham.edu A copy of these slides is available
More information- Part 4 - Multicore and Manycore Technology: Chances and Challenges. Vincent Heuveline
- Part 4 - Multicore and Manycore Technology: Chances and Challenges Vincent Heuveline 1 Numerical Simulation of Tropical Cyclones Goal oriented adaptivity for tropical cyclones ~10⁴km ~1500km ~100km 2
More informationOpen-source finite element solver for domain decomposition problems
1/29 Open-source finite element solver for domain decomposition problems C. Geuzaine 1, X. Antoine 2,3, D. Colignon 1, M. El Bouajaji 3,2 and B. Thierry 4 1 - University of Liège, Belgium 2 - University
More informationCactus Tools for Petascale Computing
Cactus Tools for Petascale Computing Erik Schnetter Reno, November 2007 Gamma Ray Bursts ~10 7 km He Protoneutron Star Accretion Collapse to a Black Hole Jet Formation and Sustainment Fe-group nuclei Si
More informationAlgebraic Multigrid as Solvers and as Preconditioner
Ò Algebraic Multigrid as Solvers and as Preconditioner Domenico Lahaye domenico.lahaye@cs.kuleuven.ac.be http://www.cs.kuleuven.ac.be/ domenico/ Department of Computer Science Katholieke Universiteit Leuven
More informationFinite element exterior calculus framework for geophysical fluid dynamics
Finite element exterior calculus framework for geophysical fluid dynamics Colin Cotter Department of Aeronautics Imperial College London Part of ongoing work on UK Gung-Ho Dynamical Core Project funded
More informationA diagnostic interface for ICON Coping with the challenges of high-resolution climate simulations
DLR.de Chart 1 > Challenges of high-resolution climate simulations > Bastian Kern HPCN Workshop, Göttingen > 10.05.2016 A diagnostic interface for ICON Coping with the challenges of high-resolution climate
More informationPiz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting. Thomas C. Schulthess
Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting Thomas C. Schulthess 1 Cray XC30 with 5272 hybrid, GPU accelerated compute nodes Piz Daint Compute node:
More informationAn Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems
An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems P.-O. Persson and J. Peraire Massachusetts Institute of Technology 2006 AIAA Aerospace Sciences Meeting, Reno, Nevada January 9,
More informationLarge-scale Electronic Structure Simulations with MVAPICH2 on Intel Knights Landing Manycore Processors
Large-scale Electronic Structure Simulations with MVAPICH2 on Intel Knights Landing Manycore Processors Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr) Principal Researcher / Korea Institute of Science and Technology
More informationUsing AmgX to accelerate a PETSc-based immersed-boundary method code
29th International Conference on Parallel Computational Fluid Dynamics May 15-17, 2017; Glasgow, Scotland Using AmgX to accelerate a PETSc-based immersed-boundary method code Olivier Mesnard, Pi-Yueh Chuang,
More informationMultigrid Methods and their application in CFD
Multigrid Methods and their application in CFD Michael Wurst TU München 16.06.2009 1 Multigrid Methods Definition Multigrid (MG) methods in numerical analysis are a group of algorithms for solving differential
More informationOn the Paths to Exascale: Will We be Hungry?
On the Paths to Exascale: Will We be Hungry? Presentation by Mike Rezny, Monash University, Australia michael.rezny@monash.edu 4th ENES Workshop High Performance Computing for Climate and Weather Toulouse,
More informationThe Fast Multipole Method in molecular dynamics
The Fast Multipole Method in molecular dynamics Berk Hess KTH Royal Institute of Technology, Stockholm, Sweden ADAC6 workshop Zurich, 20-06-2018 Slide BioExcel Slide Molecular Dynamics of biomolecules
More informationECMWF Scalability Programme
ECMWF Scalability Programme Picture: Stan Tomov, ICL, University of Tennessee, Knoxville Peter Bauer, Mike Hawkins, Deborah Salmond, Stephan Siemen, Yannick Trémolet, and Nils Wedi Next generation science
More informationTargeting Extreme Scale Computational Challenges with Heterogeneous Systems
Targeting Extreme Scale Computational Challenges with Heterogeneous Systems Oreste Villa, Antonino Tumeo Pacific Northwest Na/onal Laboratory (PNNL) 1 Introduction! PNNL Laboratory Directed Research &
More informationClaude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique
Claude Tadonki MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Monthly CRI Seminar MINES ParisTech - CRI June 06, 2016, Fontainebleau (France)
More informationOptimal Left and Right Additive Schwarz Preconditioning for Minimal Residual Methods with Euclidean and Energy Norms
Optimal Left and Right Additive Schwarz Preconditioning for Minimal Residual Methods with Euclidean and Energy Norms Marcus Sarkis Worcester Polytechnic Inst., Mass. and IMPA, Rio de Janeiro and Daniel
More informationWelcome to MCS 572. content and organization expectations of the course. definition and classification
Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson
More informationMultigrid solvers for equations arising in implicit MHD simulations
Multigrid solvers for equations arising in implicit MHD simulations smoothing Finest Grid Mark F. Adams Department of Applied Physics & Applied Mathematics Columbia University Ravi Samtaney PPPL Achi Brandt
More informationSPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics
SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS
More informationA User Friendly Toolbox for Parallel PDE-Solvers
A User Friendly Toolbox for Parallel PDE-Solvers Gundolf Haase Institut for Mathematics and Scientific Computing Karl-Franzens University of Graz Manfred Liebmann Mathematics in Sciences Max-Planck-Institute
More informationHYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017
HYCOM and Navy ESPC Future High Performance Computing Needs Alan J. Wallcraft COAPS Short Seminar November 6, 2017 Forecasting Architectural Trends 3 NAVY OPERATIONAL GLOBAL OCEAN PREDICTION Trend is higher
More informationJulian Merten. GPU Computing and Alternative Architecture
Future Directions of Cosmological Simulations / Edinburgh 1 / 16 Julian Merten GPU Computing and Alternative Architecture Institut für Theoretische Astrophysik Zentrum für Astronomie Universität Heidelberg
More informationB. Reps 1, P. Ghysels 2, O. Schenk 4, K. Meerbergen 3,5, W. Vanroose 1,3. IHP 2015, Paris FRx
Communication Avoiding and Hiding in preconditioned Krylov solvers 1 Applied Mathematics, University of Antwerp, Belgium 2 Future Technology Group, Berkeley Lab, USA 3 Intel ExaScience Lab, Belgium 4 USI,
More informationEmpowering Scientists with Domain Specific Languages
Empowering Scientists with Domain Specific Languages Julian Kunkel, Nabeeh Jum ah Scientific Computing Department of Informatics University of Hamburg SciCADE2017 2017-09-13 Outline 1 Developing Scientific
More informationFrom Piz Daint to Piz Kesch : the making of a GPU-based weather forecasting system. Oliver Fuhrer and Thomas C. Schulthess
From Piz Daint to Piz Kesch : the making of a GPU-based weather forecasting system Oliver Fuhrer and Thomas C. Schulthess 1 Piz Daint Cray XC30 with 5272 hybrid, GPU accelerated compute nodes Compute node:
More informationSolving PDEs: the Poisson problem TMA4280 Introduction to Supercomputing
Solving PDEs: the Poisson problem TMA4280 Introduction to Supercomputing Based on 2016v slides by Eivind Fonn NTNU, IMF February 27. 2017 1 The Poisson problem The Poisson equation is an elliptic partial
More informationPETSc for Python. Lisandro Dalcin
PETSc for Python http://petsc4py.googlecode.com Lisandro Dalcin dalcinl@gmail.com Centro Internacional de Métodos Computacionales en Ingeniería Consejo Nacional de Investigaciones Científicas y Técnicas
More informationExpressive Environments and Code Generation for High Performance Computing
Expressive Environments and Code Generation for High Performance Computing Garth N. Wells University of Cambridge SIAM Parallel Processing, 20 February 2014 Collaborators Martin Alnæs, Johan Hake, Richard
More informationRecent advances in HPC with FreeFem++
Recent advances in HPC with FreeFem++ Pierre Jolivet Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann Fourth workshop on FreeFem++ December 6, 2012 With F. Hecht, F. Nataf, C. Prud homme. Outline
More informationTowards effective shell modelling with the FEniCS project
Towards effective shell modelling with the FEniCS project *, P. M. Baiz Department of Aeronautics 19th March 2013 Shells and FEniCS - FEniCS Workshop 13 1 Outline Introduction Shells: chart shear-membrane-bending
More informationLazy matrices for contraction-based algorithms
Lazy matrices for contraction-based algorithms Michael F. Herbst michael.herbst@iwr.uni-heidelberg.de https://michael-herbst.com Interdisziplinäres Zentrum für wissenschaftliches Rechnen Ruprecht-Karls-Universität
More informationPerformance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6
Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6 Sangwon Joo, Yoonjae Kim, Hyuncheol Shin, Eunhee Lee, Eunjung Kim (Korea Meteorological Administration) Tae-Hun
More informationParallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2
1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013
More informationAn Algebraic Multigrid Method for Eigenvalue Problems
An Algebraic Multigrid Method for Eigenvalue Problems arxiv:1503.08462v1 [math.na] 29 Mar 2015 Xiaole Han, Yunhui He, Hehu Xie and Chunguang You Abstract An algebraic multigrid method is proposed to solve
More informationScalability Programme at ECMWF
Scalability Programme at ECMWF Picture: Stan Tomov, ICL, University of Tennessee, Knoxville Peter Bauer, Mike Hawkins, George Mozdzynski, Tiago Quintino, Deborah Salmond, Stephan Siemen, Yannick Trémolet
More informationTowards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters
Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters HIM - Workshop on Sparse Grids and Applications Alexander Heinecke Chair of Scientific Computing May 18 th 2011 HIM
More informationWeather and Climate Modeling on GPU and Xeon Phi Accelerated Systems
Weather and Climate Modeling on GPU and Xeon Phi Accelerated Systems Mike Ashworth, Rupert Ford, Graham Riley, Stephen Pickles Scientific Computing Department & STFC Hartree Centre STFC Daresbury Laboratory
More informationScalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver
Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Sherry Li Lawrence Berkeley National Laboratory Piyush Sao Rich Vuduc Georgia Institute of Technology CUG 14, May 4-8, 14, Lugano,
More informationFinite Element Solver for Flux-Source Equations
Finite Element Solver for Flux-Source Equations Weston B. Lowrie A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Aeronautics Astronautics University
More informationSolving PDEs with Multigrid Methods p.1
Solving PDEs with Multigrid Methods Scott MacLachlan maclachl@colorado.edu Department of Applied Mathematics, University of Colorado at Boulder Solving PDEs with Multigrid Methods p.1 Support and Collaboration
More informationFAS and Solver Performance
FAS and Solver Performance Matthew Knepley Mathematics and Computer Science Division Argonne National Laboratory Fall AMS Central Section Meeting Chicago, IL Oct 05 06, 2007 M. Knepley (ANL) FAS AMS 07
More informationMARCH 24-27, 2014 SAN JOSE, CA
MARCH 24-27, 2014 SAN JOSE, CA Sparse HPC on modern architectures Important scientific applications rely on sparse linear algebra HPCG a new benchmark proposal to complement Top500 (HPL) To solve A x =
More informationThe Green Index (TGI): A Metric for Evalua:ng Energy Efficiency in HPC Systems
The Green Index (TGI): A Metric for Evalua:ng Energy Efficiency in HPC Systems Wu Feng and Balaji Subramaniam Metrics for Energy Efficiency Energy- Delay Product (EDP) Used primarily in circuit design
More informationMassively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem
Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Katharina Kormann 1 Klaus Reuter 2 Markus Rampp 2 Eric Sonnendrücker 1 1 Max Planck Institut für Plasmaphysik 2 Max Planck Computing
More informationMultipole-Based Preconditioners for Sparse Linear Systems.
Multipole-Based Preconditioners for Sparse Linear Systems. Ananth Grama Purdue University. Supported by the National Science Foundation. Overview Summary of Contributions Generalized Stokes Problem Solenoidal
More informationIntegration of PETSc for Nonlinear Solves
Integration of PETSc for Nonlinear Solves Ben Jamroz, Travis Austin, Srinath Vadlamani, Scott Kruger Tech-X Corporation jamroz@txcorp.com http://www.txcorp.com NIMROD Meeting: Aug 10, 2010 Boulder, CO
More informationPerformance of WRF using UPC
Performance of WRF using UPC Hee-Sik Kim and Jong-Gwan Do * Cray Korea ABSTRACT: The Weather Research and Forecasting (WRF) model is a next-generation mesoscale numerical weather prediction system. We
More informationAccelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers
UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric
More informationERLANGEN REGIONAL COMPUTING CENTER
ERLANGEN REGIONAL COMPUTING CENTER Making Sense of Performance Numbers Georg Hager Erlangen Regional Computing Center (RRZE) Friedrich-Alexander-Universität Erlangen-Nürnberg OpenMPCon 2018 Barcelona,
More informationA Finite-Element based Navier-Stokes Solver for LES
A Finite-Element based Navier-Stokes Solver for LES W. Wienken a, J. Stiller b and U. Fladrich c. a Technische Universität Dresden, Institute of Fluid Mechanics (ISM) b Technische Universität Dresden,
More informationEfficient Multigrid Preconditioners for Atmospheric Flow Simulations at High Aspect Ratio. February 11, 2015
Efficient Multigrid Preconditioners for Atmospheric Flow Simulations at High Aspect Ratio Andreas Dedner 1, Eike Müller *,2 and Robert Scheichl 2 1 Mathematics Institute, Zeeman Building, University of
More informationMultilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota
Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work
More informationS3D Direct Numerical Simulation: Preparation for the PF Era
S3D Direct Numerical Simulation: Preparation for the 10 100 PF Era Ray W. Grout, Scientific Computing SC 12 Ramanan Sankaran ORNL John Levesque Cray Cliff Woolley, Stan Posey nvidia J.H. Chen SNL NREL
More informationThe FEniCS Project Philosophy, current status and future plans
The FEniCS Project Philosophy, current status and future plans Anders Logg logg@simula.no Simula Research Laboratory FEniCS 06 in Delft, November 8-9 2006 Outline Philosophy Recent updates Future plans
More informationA Robust Preconditioned Iterative Method for the Navier-Stokes Equations with High Reynolds Numbers
Applied and Computational Mathematics 2017; 6(4): 202-207 http://www.sciencepublishinggroup.com/j/acm doi: 10.11648/j.acm.20170604.18 ISSN: 2328-5605 (Print); ISSN: 2328-5613 (Online) A Robust Preconditioned
More informationSolving PDEs on Supercomputers I: modern supercomputer architecture
Supercomputer architecture Solving PDEs on Supercomputers I: modern supercomputer architecture Patrick Farrell MMSC: Python in Scientific Computing May 17, 2015 P. E. Farrell (Oxford) SPS I May 17, 2015
More informationCase Study: Quantum Chromodynamics
Case Study: Quantum Chromodynamics Michael Clark Harvard University with R. Babich, K. Barros, R. Brower, J. Chen and C. Rebbi Outline Primer to QCD QCD on a GPU Mixed Precision Solvers Multigrid solver
More informationhypre MG for LQFT Chris Schroeder LLNL - Physics Division
hypre MG for LQFT Chris Schroeder LLNL - Physics Division This work performed under the auspices of the U.S. Department of Energy by under Contract DE-??? Contributors hypre Team! Rob Falgout (project
More informationGloMAP Mode on HECToR Phase2b (Cray XT6) Mark Richardson Numerical Algorithms Group
GloMAP Mode on HECToR Phase2b (Cray XT6) Mark Richardson Numerical Algorithms Group 1 Acknowledgements NERC, NCAS Research Councils UK, HECToR Resource University of Leeds School of Earth and Environment
More informationMassively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling
2019 Intel extreme Performance Users Group (IXPUG) meeting Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr)
More informationParallel Preconditioning Methods for Ill-conditioned Problems
Parallel Preconditioning Methods for Ill-conditioned Problems Kengo Nakajima Information Technology Center, The University of Tokyo 2014 Conference on Advanced Topics and Auto Tuning in High Performance
More informationc 2013 Society for Industrial and Applied Mathematics
SIAM J. SCI. COMPUT. Vol. 35, No. 4, pp. C369 C393 c 2013 Society for Industrial and Applied Mathematics AUTOMATED DERIVATION OF THE ADJOINT OF HIGH-LEVEL TRANSIENT FINITE ELEMENT PROGRAMS P. E. FARRELL,
More informationMultigrid and Iterative Strategies for Optimal Control Problems
Multigrid and Iterative Strategies for Optimal Control Problems John Pearson 1, Stefan Takacs 1 1 Mathematical Institute, 24 29 St. Giles, Oxford, OX1 3LB e-mail: john.pearson@worc.ox.ac.uk, takacs@maths.ox.ac.uk
More informationFrom GungHo to LFRic. Replacing the Met Office Unified Model Steve Mullerworth. Crown copyright Met Office
From GungHo to LFRic Replacing the Met Office Unified Model Steve Mullerworth Summary the perspective from a Computational Scientist point of view Where are we now What are Gung Ho and LFRic LFRic plan
More informationHigh Performance Python Libraries
High Performance Python Libraries Matthew Knepley Computation Institute University of Chicago Department of Mathematics and Computer Science Argonne National Laboratory 4th Workshop on Python for High
More informationFrom here to there. peta exa-scale transient simulation. Lawrence Mitchell 1, 27th March
From here to there challenges for peta exa-scale transient simulation Lawrence Mitchell 1, 27th March 2017 1 Departments of Computing and Mathematics, Imperial College London lawrence.mitchell@imperial.ac.uk
More informationTR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems
TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a
More informationComputers and Mathematics with Applications
Computers and Mathematics with Applications 68 (2014) 1151 1160 Contents lists available at ScienceDirect Computers and Mathematics with Applications journal homepage: www.elsevier.com/locate/camwa A GPU
More informationSome Geometric and Algebraic Aspects of Domain Decomposition Methods
Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition
More informationIntroduction to numerical computations on the GPU
Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming
More informationA note on accurate and efficient higher order Galerkin time stepping schemes for the nonstationary Stokes equations
A note on accurate and efficient higher order Galerkin time stepping schemes for the nonstationary Stokes equations S. Hussain, F. Schieweck, S. Turek Abstract In this note, we extend our recent work for
More informationMultilevel spectral coarse space methods in FreeFem++ on parallel architectures
Multilevel spectral coarse space methods in FreeFem++ on parallel architectures Pierre Jolivet Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann DD 21, Rennes. June 29, 2012 In collaboration with
More informationACCELERATING WEATHER PREDICTION WITH NVIDIA GPUS
ACCELERATING WEATHER PREDICTION WITH NVIDIA GPUS Alan Gray, Developer Technology Engineer, NVIDIA ECMWF 18th Workshop on high performance computing in meteorology, 28 th September 2018 ESCAPE NVIDIA s
More informationResearch on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method
NUCLEAR SCIENCE AND TECHNIQUES 25, 0501 (14) Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method XU Qi ( 徐琪 ), 1, YU Gang-Lin ( 余纲林 ), 1 WANG Kan ( 王侃 ),
More informationA Strategy for the Development of Coupled Ocean- Atmosphere Discontinuous Galerkin Models
A Strategy for the Development of Coupled Ocean- Atmosphere Discontinuous Galerkin Models Frank Giraldo Department of Applied Math, Naval Postgraduate School, Monterey CA 93943 Collaborators: Jim Kelly
More informationLinear Solvers. Andrew Hazel
Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction
More informationA Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures
A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,
More informationQR Factorization of Tall and Skinny Matrices in a Grid Computing Environment
QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment Emmanuel AGULLO (INRIA / LaBRI) Camille COTI (Iowa State University) Jack DONGARRA (University of Tennessee) Thomas HÉRAULT
More informationArches Part 1: Introduction to the Uintah Computational Framework. Charles Reid Scientific Computing Summer Workshop July 14, 2010
Arches Part 1: Introduction to the Uintah Computational Framework Charles Reid Scientific Computing Summer Workshop July 14, 2010 Arches Uintah Computational Framework Cluster Node Node Node Node Node
More informationLeveraging Task-Parallelism in Energy-Efficient ILU Preconditioners
Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners José I. Aliaga Leveraging task-parallelism in energy-efficient ILU preconditioners Universidad Jaime I (Castellón, Spain) José I. Aliaga
More information