Scalable Non-Linear Compact Schemes

Debojyoti Ghosh, Emil M. Constantinescu, Jed Brown
Mathematics and Computer Science, Argonne National Laboratory
International Conference on Spectral and High Order Methods, Salt Lake City, Utah, June 23-27, 2014

Motivation
Numerical solution of compressible turbulent flows:
- Atmospheric flows; aircraft and rotorcraft wake flows
- Characterized by a large range of length scales
- Convection and interaction of eddies
- Compressibility → shock waves and shocklets
- Thin shear layers → high gradients in the flow
High-order accurate finite-difference solver:
- High spectral resolution for accurate capturing of smaller length scales
- Non-oscillatory solution across shock waves and shear layers
- Low dissipation errors for preservation of flow structures over large distances
Figures: shock-turbulence interaction (Stanford University); shock-turbulent boundary layer interaction (Brandon Morgan, Stanford University, and Nagi Mansour, NASA); rising thermal bubble in a hydrostatically balanced atmosphere.

Background
Grid (figure): cell centers $j = 0, \dots, N$ with spacing $\Delta x$; cell interfaces at $j-1/2$ and $j+1/2$.
Conservative finite-difference discretization of a hyperbolic conservation law:
\[ u_t + f(u)_x = 0, \qquad f'(u) \in \mathbb{R} \]
\[ \frac{du_j}{dt} + \frac{1}{\Delta x}\left[ f(x_{j+1/2}, t) - f(x_{j-1/2}, t) \right] = 0 \]
Schemes for compressible turbulent flows draw on two families: Weighted Essentially Non-Oscillatory (WENO) schemes and compact finite-difference schemes. Ways of combining them:
- Hybrid compact-WENO schemes. Disadvantage: loss of spectral resolution near discontinuities.
- Weighted compact non-linear schemes. Disadvantage: relatively poor spectral resolution.
- Compact-Reconstruction WENO (CRWENO) schemes.

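Compact-Reconstruction WENO (CRWENO) Schemes
Grid (figure): cell centers $0, 1, \dots, j-1, j, j+1, \dots, N-1, N, N+1$; interfaces $1/2, 3/2, \dots, j-1/2, j+1/2, j+3/2, \dots, N-1/2, N+1/2$.
General form of a conservative compact scheme:
\[ A\left(\hat{f}_{j+1/2-m}, \dots, \hat{f}_{j+1/2}, \dots, \hat{f}_{j+1/2+m}\right) = B\left(f_{j-n}, \dots, f_j, \dots, f_{j+n}\right), \qquad [A]\,\hat{\mathbf{f}} = [B]\,\mathbf{f} \]
At each interface there are $r$ possible $r$-th order compact interpolations; combining them with optimal weights $c_k$ yields a $(2r-1)$-th order compact interpolation scheme:
\[ \sum_{k=1}^{r} c_k A_k^{r}\left(\hat{f}_{j+1/2-m}, \dots, \hat{f}_{j+1/2+m}\right) = \sum_{k=1}^{r} c_k B_k^{r}\left(f_{j-n}, \dots, f_{j+n}\right) \]
\[ \Rightarrow \quad A^{2r-1}\left(\hat{f}_{j+1/2-m}, \dots, \hat{f}_{j+1/2+m}\right) = B^{2r-1}\left(f_{j-n}, \dots, f_{j+n}\right) \]
Apply the WENO algorithm to the optimal weights $c_k$, scaling them according to local smoothness:
\[ \sum_{k=1}^{r} \omega_k A_k^{r}\left(\hat{f}_{j+1/2-m}, \dots, \hat{f}_{j+1/2+m}\right) = \sum_{k=1}^{r} \omega_k B_k^{r}\left(f_{j-n}, \dots, f_{j+n}\right) \]
\[ \alpha_k = \frac{c_k}{(\beta_k + \varepsilon)^p}, \qquad \omega_k = \frac{\alpha_k}{\sum_{k} \alpha_k} \]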
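
As a rough illustration of the weighting step, the sketch below evaluates $\alpha_k$ and $\omega_k$ for given smoothness indicators $\beta_k$. The function name `weno_weights` and the defaults `eps=1e-6` and `p=2` are assumptions chosen for illustration (they are common choices in the WENO literature, not values stated on the slide), and the sample $\beta_k$ values are made up. The optimal weights are the CRWENO5 values 2/10, 5/10, 3/10.

```python
def weno_weights(c, beta, eps=1e-6, p=2):
    """Scale optimal weights c_k by local smoothness.

    alpha_k = c_k / (beta_k + eps)^p,  omega_k = alpha_k / sum(alpha).
    eps and p are illustrative defaults, not values from the slides.
    """
    alpha = [ck / (bk + eps) ** p for ck, bk in zip(c, beta)]
    total = sum(alpha)
    return [ak / total for ak in alpha]


c = [0.2, 0.5, 0.3]  # CRWENO5 optimal weights

# Smooth data: all indicators equal, so the nonlinear weights equal c_k.
smooth = weno_weights(c, [1e-8, 1e-8, 1e-8])

# Made-up case: the first candidate stencil sees a jump (large beta_1),
# so its weight collapses towards zero.
shock = weno_weights(c, [1.0, 1e-8, 1e-8])

print(smooth)
print(shock)
```

When all $\beta_k$ are equal the weights reduce to $c_k$, which recovers the $(2r-1)$-th order scheme; a large $\beta_k$ on one candidate effectively removes it from the combination.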

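5th-Order CRWENO Scheme (CRWENO5)
Three third-order compact interpolations for the interface $j+1/2$, with their optimal weights:
\[ \frac{2}{3}\hat{f}_{j-1/2} + \frac{1}{3}\hat{f}_{j+1/2} = \frac{1}{6}f_{j-1} + \frac{5}{6}f_{j}, \qquad c_1 = \frac{2}{10} \]
\[ \frac{1}{3}\hat{f}_{j-1/2} + \frac{2}{3}\hat{f}_{j+1/2} = \frac{5}{6}f_{j} + \frac{1}{6}f_{j+1}, \qquad c_2 = \frac{5}{10} \]
\[ \frac{2}{3}\hat{f}_{j+1/2} + \frac{1}{3}\hat{f}_{j+3/2} = \frac{1}{6}f_{j} + \frac{5}{6}f_{j+1}, \qquad c_3 = \frac{3}{10} \]
Combined fifth-order compact scheme:
\[ \frac{3}{10}\hat{f}_{j-1/2} + \frac{6}{10}\hat{f}_{j+1/2} + \frac{1}{10}\hat{f}_{j+3/2} = \frac{1}{30}f_{j-1} + \frac{19}{30}f_{j} + \frac{10}{30}f_{j+1} \]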
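
One way to check the optimal weights is to combine the three third-order interpolations in exact rational arithmetic and compare the result with the fifth-order coefficients. The sketch below does this with Python's `fractions` module; the names `candidates` and `combine` are hypothetical helpers introduced for this check.

```python
from fractions import Fraction as F

# Each candidate is (LHS coefficients, RHS coefficients):
#   LHS acts on (fhat_{j-1/2}, fhat_{j+1/2}, fhat_{j+3/2})
#   RHS acts on (f_{j-1},      f_j,          f_{j+1})
candidates = [
    ((F(2, 3), F(1, 3), F(0)), (F(1, 6), F(5, 6), F(0))),
    ((F(1, 3), F(2, 3), F(0)), (F(0), F(5, 6), F(1, 6))),
    ((F(0), F(2, 3), F(1, 3)), (F(0), F(1, 6), F(5, 6))),
]

c = [F(2, 10), F(5, 10), F(3, 10)]  # optimal weights c_1, c_2, c_3


def combine(weights):
    """Weighted sum of the three candidate schemes, coefficient by coefficient."""
    lhs = [sum(w * cand[0][i] for w, cand in zip(weights, candidates))
           for i in range(3)]
    rhs = [sum(w * cand[1][i] for w, cand in zip(weights, candidates))
           for i in range(3)]
    return lhs, rhs


lhs, rhs = combine(c)
print(lhs)  # coefficients on fhat_{j-1/2}, fhat_{j+1/2}, fhat_{j+3/2}
print(rhs)  # coefficients on f_{j-1}, f_j, f_{j+1}
```

The same `combine` function accepts any set of weights, so passing nonlinear weights $\omega_k$ instead of $c_k$ gives the coefficients of one row of the CRWENO5 tridiagonal system.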

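5th-Order CRWENO Scheme (CRWENO5)
The three third-order compact interpolations for the interface $j+1/2$ are combined with the nonlinear weights $\omega_1, \omega_2, \omega_3$ in place of the optimal weights $c_1 = 2/10$, $c_2 = 5/10$, $c_3 = 3/10$:
\[ \left(\frac{2}{3}\omega_1 + \frac{1}{3}\omega_2\right)\hat{f}_{j-1/2} + \left[\frac{1}{3}\omega_1 + \frac{2}{3}(\omega_2 + \omega_3)\right]\hat{f}_{j+1/2} + \frac{1}{3}\omega_3\,\hat{f}_{j+3/2} = \frac{\omega_1}{6}f_{j-1} + \frac{5(\omega_1 + \omega_2) + \omega_3}{6}f_{j} + \frac{\omega_2 + 5\omega_3}{6}f_{j+1} \]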

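Numerical Analysis
Taylor series: leading-order error terms
\[ \text{WENO5:}\quad \frac{1}{60}\left.\frac{\partial^6 f}{\partial x^6}\right|_j \Delta x^5, \qquad \text{CRWENO5:}\quad \frac{1}{600}\left.\frac{\partial^6 f}{\partial x^6}\right|_j \Delta x^5 \]
Fourier analysis (figure).
→ WENO5 requires ~1.5 times more grid points per dimension to yield a solution of accuracy comparable to that of the CRWENO5 scheme.
→ The time step size limit is ~1.6 times smaller for CRWENO5 than for WENO5.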
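
The factor of ~1.5 is consistent with the ratio of the two leading error coefficients. As a rough estimate, assuming the leading-order term dominates the error and writing $\Delta x_W$ and $\Delta x_C$ (symbols introduced here, not on the slide) for the WENO5 and CRWENO5 grid spacings that give equal error:

```latex
\[
\frac{1}{60}\,\Delta x_W^{5} = \frac{1}{600}\,\Delta x_C^{5}
\quad\Longrightarrow\quad
\frac{\Delta x_C}{\Delta x_W} = 10^{1/5} \approx 1.58
\]
```

So WENO5 needs a grid roughly 1.5 to 1.6 times finer per dimension, in line with the figure quoted on the slide.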

Preliminary Results
References: doi:10.1137/110857659, doi:10.1007/s10915-014-9818-0, doi:10.2514/6.2012-2832
Test cases (figures compare WENO5 and CRWENO5): periodic density wave advection; Shu-Osher problem; flow around the Harrington rotor; vortex shedding from a flapping wing.
Observations:
- Lower absolute errors for smooth problems for the same order of convergence
- Improved resolution of small length scales
- Better preservation of flow features (shed vortices)

Scalable Parallel Implementation
CRWENO5 needs a tridiagonal solution at each time-integration stage/step. Existing approaches:
- Treat the sub-domain boundary as a physical boundary: decouple the system of equations across processors (biased compact or non-compact schemes at MPI boundaries). Drawback: the numerical properties of the scheme become a function of the number of processors. Good for a small number of processors; numerical errors grow or spectral resolution falls as the number of processors increases for the same problem size.
- Data transposition: transpose pencils of data such that the entire system of equations is collected on one processor. Huge communication cost.
- Parallel implementations of the Thomas algorithm:
  - Pipelined Thomas Algorithm (PTA) (Povitsky & Morris, JCP, 2000): uses a complicated static schedule to use idle times of processors to carry out computations; trade-off between computation and communication efficiencies.
  - Parallel Diagonally Dominant (PDD) (Sun & Moitra, NASA Tech. Rep., 1996): solves a perturbed linear system that introduces an error due to the assumption of diagonal dominance.
- Other implementations of tridiagonal solvers, not applied to compact schemes: increased mathematical complexity compared to the serial Thomas algorithm.
Existing approaches do not scale well!
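
For reference, the serial Thomas algorithm that these parallel approaches are measured against can be sketched as below. This is a textbook formulation without pivoting, so it assumes a well-conditioned (for example diagonally dominant) system. The test system reuses the coefficients 3/10, 6/10, 1/10 on a small non-periodic grid, which is an assumption made only to keep the example short.

```python
def thomas(a, b, c, r):
    """Solve a tridiagonal system by forward elimination and back substitution.

    a: sub-diagonal (a[0] is ignored), b: diagonal,
    c: super-diagonal (c[-1] is ignored), r: right-hand side.
    """
    n = len(b)
    cp = [0.0] * n
    rp = [0.0] * n
    cp[0] = c[0] / b[0]
    rp[0] = r[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        rp[i] = (r[i] - a[i] * rp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = rp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = rp[i] - cp[i] * x[i + 1]
    return x


# Illustrative system: rows (0.3, 0.6, 0.1), manufactured right-hand side.
n = 8
a = [0.0] + [0.3] * (n - 1)
b = [0.6] * n
c = [0.1] * (n - 1) + [0.0]
x_true = [float(i) for i in range(n)]
r = [(a[i] * x_true[i - 1] if i > 0 else 0.0)
     + b[i] * x_true[i]
     + (c[i] * x_true[i + 1] if i < n - 1 else 0.0)
     for i in range(n)]

x = thomas(a, b, c, r)
print(max(abs(xi - ti) for xi, ti in zip(x, x_true)))
```

Both sweeps are sequential recurrences over the whole system, which is why a straightforward domain decomposition leaves most processors idle.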

Parallel Tridiagonal Solver
Example (figure): a tridiagonal system with 21 unknowns (global rows 0 to 20) partitioned over 5 processors. Processors 0 to 3 hold 4 rows each (local indices 0 to 3); processor 4 holds 5 rows (local indices 0 to 4).
Reordering: the first row of every subdomain except the first (rows 4, 8, 12, 16) is moved to the end of the system.
- Row order after reordering: processor 0: 0, 1, 2, 3; processor 1: 5, 6, 7; processor 2: 9, 10, 11; processor 3: 13, 14, 15; processor 4: 17, 18, 19, 20; then rows 4, 8, 12, 16.
- Row $i$ keeps its entries $a_i, b_i, c_i$; the unknowns $x_i$ and the right-hand side $r_i$ are reordered in the same way.
Elimination: each processor eliminates its interior rows, which produces modified entries $\hat{a}_i, \hat{b}_i$ in those rows and a reduced tridiagonal system with entries $\tilde{a}_i, \tilde{b}_i, \tilde{c}_i$ for rows 4, 8, 12, 16.
Reduced tridiagonal system:
- Solution of the reduced system is critical to scalability.
- One row on each processor → communication intensive.
- Direct/exact solutions to the reduced system (cyclic reduction / recursive-doubling algorithm, gather-and-solve) do not scale well!
- Alternative: iterative substructuring method.
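
The reordering and elimination can be imitated serially to see where the reduced system comes from. The sketch below is a plain-Python illustration, not the authors' implementation: the loop over blocks stands in for the processors, no MPI is involved, and the reduced system is solved directly rather than iteratively. The names `substructured_solve` and `thomas` and the test coefficients are assumptions; it requires at least two subdomains.

```python
def thomas(a, b, c, r):
    """Serial tridiagonal solve; a[0] and c[-1] are ignored."""
    n = len(b)
    cp, rp = [0.0] * n, [0.0] * n
    cp[0], rp[0] = c[0] / b[0], r[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        rp[i] = (r[i] - a[i] * rp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = rp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = rp[i] - cp[i] * x[i + 1]
    return x


def substructured_solve(a, b, c, r, nproc):
    """Eliminate subdomain interiors, solve the reduced system, back-substitute."""
    n = len(b)
    size = n // nproc
    iface = [p * size for p in range(1, nproc)]  # e.g. rows 4, 8, 12, 16
    bounds = [0] + iface + [n]
    blocks = [(bounds[p] + (1 if p > 0 else 0), bounds[p + 1] - 1)
              for p in range(nproc)]             # interior rows lo..hi

    # Per block: y = T^-1 r, u = T^-1 (left coupling), v = T^-1 (right coupling)
    local = []
    for p, (lo, hi) in enumerate(blocks):
        m = hi - lo + 1
        sl = slice(lo, hi + 1)
        left = [a[lo] if p > 0 else 0.0] + [0.0] * (m - 1)
        right = [0.0] * (m - 1) + [c[hi] if p < nproc - 1 else 0.0]
        local.append((thomas(a[sl], b[sl], c[sl], r[sl]),
                      thomas(a[sl], b[sl], c[sl], left),
                      thomas(a[sl], b[sl], c[sl], right)))

    # Reduced tridiagonal system: one row per interface row.
    ra, rb, rc, rr = [], [], [], []
    for k, s in enumerate(iface):
        yl, ul, vl = local[k]       # block to the left of row s
        yr, ur, vr = local[k + 1]   # block to the right of row s
        ra.append(-a[s] * ul[-1])
        rb.append(b[s] - a[s] * vl[-1] - c[s] * ur[0])
        rc.append(-c[s] * vr[0])
        rr.append(r[s] - a[s] * yl[-1] - c[s] * yr[0])
    xb = thomas(ra, rb, rc, rr)     # direct solve, for illustration only

    # Back-substitute interface values into each block.
    x = [0.0] * n
    for k, s in enumerate(iface):
        x[s] = xb[k]
    for p, (lo, hi) in enumerate(blocks):
        y, u, v = local[p]
        xl = xb[p - 1] if p > 0 else 0.0
        xr = xb[p] if p < nproc - 1 else 0.0
        for i in range(lo, hi + 1):
            x[i] = y[i - lo] - xl * u[i - lo] - xr * v[i - lo]
    return x, (ra, rb, rc, rr)


n, nproc = 21, 5
a = [0.0] + [0.3] * (n - 1)
b = [0.6] * n
c = [0.1] * (n - 1) + [0.0]
x_true = [1.0 + 0.1 * i for i in range(n)]
r = [(a[i] * x_true[i - 1] if i > 0 else 0.0) + b[i] * x_true[i]
     + (c[i] * x_true[i + 1] if i < n - 1 else 0.0) for i in range(n)]
x, reduced = substructured_solve(a, b, c, r, nproc)
print(max(abs(xi - ti) for xi, ti in zip(x, x_true)))
```

The block solves are independent of one another, so only the small reduced system (four rows in this example) requires communication between subdomains.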

Iterative Solution of the Reduced System
- The reduced system represents the coupling between the first interface of every subdomain through an approximation to a hyperbolic flux.
- For large subdomain sizes, it is strongly diagonally dominant.
- Diagonal dominance decreases as the subdomain size shrinks (the number of processors increases for the same problem size).
- The number of Jacobi iterations required for an exact solution increases as the subdomain size shrinks → the cost of the tridiagonal solver increases.
Figures: non-machine-zero elements of an arbitrarily chosen column of the inverse of the reduced system (N = 1024; panels labeled 16, 64, 128, 256); Jacobi iteration counts for N = 65,536 with 32 to 16,384 processors.
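
The link between diagonal dominance and iteration count can be seen on a toy problem. The sketch below applies Jacobi iterations to a small tridiagonal system with unit diagonal and a constant off-diagonal entry; the system size, the tolerance, the off-diagonal values and the initial guess $x_i = r_i / b_i$ are all assumptions chosen for illustration and are not taken from the slides.

```python
def jacobi_sweep(a, b, c, r, x):
    """One Jacobi iteration for a tridiagonal system."""
    n = len(b)
    x_new = []
    for i in range(n):
        s = r[i]
        if i > 0:
            s -= a[i] * x[i - 1]
        if i < n - 1:
            s -= c[i] * x[i + 1]
        x_new.append(s / b[i])
    return x_new


def residual_norm(a, b, c, r, x):
    """Max-norm of r - A x."""
    n = len(b)
    worst = 0.0
    for i in range(n):
        ax = b[i] * x[i]
        if i > 0:
            ax += a[i] * x[i - 1]
        if i < n - 1:
            ax += c[i] * x[i + 1]
        worst = max(worst, abs(r[i] - ax))
    return worst


def iterations_needed(offdiag, n=16, tol=1e-12, max_iters=1000):
    """Count Jacobi sweeps until the residual drops below tol."""
    a = [offdiag] * n
    b = [1.0] * n
    c = [offdiag] * n
    r = [1.0] * n
    x = [r[i] / b[i] for i in range(n)]  # start from the diagonal solve
    for k in range(max_iters):
        if residual_norm(a, b, c, r, x) < tol:
            return k
        x = jacobi_sweep(a, b, c, r, x)
    return max_iters


strong = iterations_needed(0.01)  # strongly diagonally dominant
weak = iterations_needed(0.2)     # weaker diagonal dominance
print(strong, weak)
```

Each sweep needs only nearest-neighbour values, which is what makes the method attractive when every processor owns one row of the reduced system; the price is the growing number of sweeps as the off-diagonal entries grow.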

Performance Analysis
Governing equations: inviscid Euler equations. Platform: ALCF Vesta (IBM BG/Q).
CRWENO5 scheme on N points per dimension:
- On a single processor, CRWENO5 is more efficient (error vs. wall time).
- The cost of the tridiagonal solver increases as the number of processors increases.
- At a critical sub-domain size, CRWENO5 becomes less efficient than the WENO5 scheme.
WENO5 scheme on fN points per dimension:
- f > 1 (f ~ 1.5 for smooth solutions): WENO5 yields solutions of comparable accuracy/resolution with f times more points.
- Non-compact scheme, so almost ideal scalability is expected.
The same number of processors is used for the CRWENO5 scheme (on N points) and the WENO5 scheme (on fN points).
Question: given p processors, is it faster to obtain a solution of a given accuracy/resolution with the WENO5 or the CRWENO5 scheme?

Performance Comparison for a Smooth Problem
Periodic advection of a sinusoidal density wave:
- 1D: WENO5 yields solutions of comparable accuracy with ~1.5 times as many points.
- 2D: WENO5 yields solutions of comparable accuracy with ~$1.5^2 = 2.25$ times as many points.
- The time step size Δt is taken to be the same for the CRWENO5 scheme on N points and the WENO5 scheme on fN points because of the linear stability limit.
- Solutions are obtained after one cycle over the periodic domain with the SSP-RK3 scheme.
- It is verified that the errors are exactly the same for the various numbers of processors considered → there are no parallelization errors (the number of Jacobi iterations is specified a priori to ensure this).
Effect of dimensionality:
- The factor by which the WENO5 grid needs to be refined to yield solutions comparable to CRWENO5 is $f^D$ (f times per dimension).
- Several tridiagonal systems are solved for multi-dimensional problems → higher arithmetic density → cost increases sub-linearly with the number of systems.

Performance Analysis for a Smooth Problem
1D cases: number of grid points for CRWENO5 (WENO5 in parentheses) and numbers of processors considered
  64 (96):     1, 2, 4, 8
  128 (192):   1, 2, 4, 8, 16
  256 (384):   1, 2, 4, 8, 16, 32
2D cases: number of grid points for CRWENO5 (WENO5 in parentheses) and numbers of processors considered
  $64^2$ ($96^2$):     4, 16, 64, 256
  $128^2$ ($192^2$):   16, 64, 256, 1024
  $256^2$ ($384^2$):   64, 256, 1024, 4096
Note: the critical sub-domain size is insensitive to the global problem size.

Scalability Results for Benchmark Flow Problems
Isentropic vortex convection:
- The vortex convects 1000x its core radius (figure: density).
- Verified that WENO5 yields a solution of comparable accuracy on a grid with ~$1.5^3$x (~3.4x) more grid points.
- Platform: ALCF/Mira (IBM BG/Q), ~8k to 500k cores.
- Grids: CRWENO5 on 8192 x 64 x 64 points; WENO5 on 12288 x 96 x 96 points.
Strong scaling: at very small subdomain sizes, CRWENO5 does not scale as well, yet is more efficient (lower absolute wall time).
Weak scaling: CRWENO5 shows excellent weak scaling; 64 ($4^3$) points per processor.

Application: Atmospheric Modeling
- Non-hydrostatic Unified Model of the Atmosphere (NUMA): Giraldo, Restelli, Lauter, SIAM J. Sci. Comp. (2010)
- Terms are split into non-stiff and stiff parts
- Explicit time-integration schemes (SSPRK3, RK4)
- Implicit-Explicit (IMEX) time-integration schemes (ARKIMEX)

IMEX Formulation with WENO5/CRWENO5
- Semi-discrete ODE
- Even though L(q) is linear, the spatial discretization scheme is non-linear
- Linearizing the WENO5/CRWENO5 operator
- Consistency of L and R
- Example: 2-stage ARKIMEX method (PETSc TSARKIMEX)
- Non-oscillatory? (for large time steps)
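
To make the implicit-explicit idea concrete, the sketch below applies the simplest first-order IMEX Euler step to a scalar linear test equation $q' = \lambda_s q + \lambda_n q$, treating the stiff part $\lambda_s$ implicitly and the non-stiff part $\lambda_n$ explicitly. This is a stand-in for illustration only: it is not the ARKIMEX method used in the talk, and the function names, eigenvalues and step size are assumptions.

```python
def imex_euler(q0, lam_stiff, lam_nonstiff, dt, steps):
    """First-order IMEX Euler: explicit in the non-stiff term, implicit in the stiff term.

    q_new = q + dt * lam_nonstiff * q + dt * lam_stiff * q_new
    """
    q = q0
    for _ in range(steps):
        q = q * (1.0 + dt * lam_nonstiff) / (1.0 - dt * lam_stiff)
    return q


def explicit_euler(q0, lam_stiff, lam_nonstiff, dt, steps):
    """Fully explicit Euler on the same equation, for comparison."""
    q = q0
    for _ in range(steps):
        q = q * (1.0 + dt * (lam_stiff + lam_nonstiff))
    return q


# Hypothetical stiff/non-stiff split; dt is far above the explicit stability limit.
lam_stiff, lam_nonstiff, dt, steps = -1000.0, -1.0, 0.01, 100
q_imex = imex_euler(1.0, lam_stiff, lam_nonstiff, dt, steps)
q_expl = explicit_euler(1.0, lam_stiff, lam_nonstiff, dt, steps)
print(q_imex, q_expl)
```

With these values the explicit update amplifies the solution every step while the IMEX update decays, which is the property that lets IMEX schemes run at much larger CFL numbers than explicit schemes.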

Inertia Gravity Waves
Problem setup:
- Periodic channel, 300 km x 10 km
- No-flux boundary conditions at top and bottom boundaries
- Mean horizontal velocity of 20 m/s in a uniformly stratified atmosphere
- Solutions obtained at 3000 s
Numerical setup:
- Grid: 1200 x 50 points
- Explicit time-integration schemes: SSPRK3 and RK4 (at CFL ~ 0.4)
- IMEX time-integration schemes: ARKIMEX 2C, 3, and 4 (at CFL ~ 8.5)
- Spatial schemes: WENO5 and CRWENO5
Reference solution: Giraldo, Restelli, J. Comput. Phys. (2008); spectral element method (10th-order polynomials and 250 m resolution).
Figures: initial solution (potential temperature perturbation); solution obtained with CRWENO5 and ARKIMEX 2C (CFL ~ 8.5); cross-sectional potential temperature perturbation (y = 5000).
Results:
- Excellent agreement with the reference solution
- CRWENO5 and WENO5 give similar resolution of perturbations

Inertia Gravity Wave: Error Analysis
- Grid: 8192 x 256 (to minimize spatial discretization errors)
- Reference solution: SSPRK3 with time step size 0.0005
- Optimal orders of convergence observed for IMEX schemes
- IMEX methods stable over a large range of time step sizes
- IMEX methods conserve mass for all time steps considered
Figures: plots for WENO5 and CRWENO5 with 2nd-, 3rd-, and 4th-order reference slopes; plot annotation $O(10^{-11})$.

Inertia Gravity Waves: Computational Efficiency
Numerical cost vs. errors for IMEX and explicit schemes:
- The current implementation has no preconditioning; wall times stagnate for large time steps.
- Optimize the implementation: evaluation of the Jacobian, calculation of WENO weights.
Giraldo, Kelly, Constantinescu, SIAM J. Sci. Comp. (2013):
- Efficiency demonstrated for a spectral element code in 2D and 3D
- ARKIMEX4 shown to be more efficient than RK35
Figure: rising thermal bubble; axes x (m) and z (m), each from 0 to 1000; color scale from 0.05 to 0.4.

Scalability and Adaptive Time-Stepping (Preliminary)
Strong scaling (ALCF/Mira):
- Grid: 8192 x 256 points
- RK4 at CFL ~ 0.4; ARKIMEX2c at CFL ~ 8.5
- Number of cores: 512 to 32,768
- 64 ($8^2$) points per processor
Adaptive time stepping (grid: 1200 x 50 points; tolerance: 1e-10):
                    Constant time step (Δt = 0.006)   Adaptive time step   Ratio
  Error             1.245E-07                         6.727E-08            ~0.5x
  Time steps        1653                              1653
  Function calls    36366                             38896                ~1.1x
  GMRES iterations  79674                             87983                ~1.1x
Figure: time step size vs. time; labeled values 0.033, Δt = 0.006, and 0.0026.

Conclusions
Compressible turbulent flows call for non-oscillatory behavior (WENO schemes) and high spectral resolution (compact schemes).
Compact-Reconstruction WENO schemes:
- High spectral resolution: improved resolution of small length scales
- Non-oscillatory interpolation across discontinuities and steep gradients
- No parallelization-induced errors (however, an a priori estimate of the number of Jacobi iterations for the reduced system is needed)
- Excellent strong and weak scaling compared to a non-compact scheme; at very small subdomain sizes, retains higher parallel efficiency despite relatively poorer scaling
WENO5/CRWENO5 + implicit-explicit time integration:
- Achieves theoretical orders of convergence; stable for a large range of time step sizes
- Need to optimize the implementation (evaluation of the Jacobian, application of a preconditioner)
- Preliminary scaling results are encouraging: implicit in space and time scales well
- Error control through adaptive time stepping: preliminary results show promise

Thank you!
Acknowledgements:
- U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (at Argonne National Laboratory)
- Argonne Leadership Computing Facility
- Dr. Paul Fischer (Argonne National Laboratory)
- Dr. Frank Giraldo (Naval Postgraduate School)