Newton-Krylov-Schwarz Method for a Spherical Shallow Water Model

Similar documents
UNCORRECTED PROOF. A Fully Implicit Compressible Euler Solver for 2 Atmospheric Flows * 3. 1 Introduction 9. Chao Yang 1,2 and Xiao-Chuan Cai 2 4

NKS Method for the Implicit Solution of a Coupled Allen-Cahn/Cahn-Hilliard System

2 CAI, KEYES AND MARCINKOWSKI proportional to the relative nonlinearity of the function; i.e., as the relative nonlinearity increases the domain of co

Domain-decomposed Fully Coupled Implicit Methods for a Magnetohydrodynamics Problem

A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation

Two-level multiplicative domain decomposition algorithm for recovering the Lamé coefficient in biological tissues

Downloaded 10/31/15 to Redistribution subject to SIAM license or copyright; see

Convergence Behavior of a Two-Level Optimized Schwarz Preconditioner

An Accelerated Block-Parallel Newton Method via Overlapped Partitioning

Newton s Method and Efficient, Robust Variants

A Discontinuous Galerkin Global Shallow Water Model

Additive Schwarz Methods for Hyperbolic Equations

Contents. Preface... xi. Introduction...

One-Shot Domain Decomposition Methods for Shape Optimization Problems 1 Introduction Page 565

Linear Solvers. Andrew Hazel

A Robust Preconditioner for the Hessian System in Elliptic Optimal Control Problems

Solving Large Nonlinear Sparse Systems

The hybridized DG methods for WS1, WS2, and CS2 test cases

Semi-implicit Krylov Deferred Correction Methods for Ordinary Differential Equations

A Fast Spherical Filter with Uniform Resolution

Lecture 8: Fast Linear Solvers (Part 7)

A Nodal High-Order Discontinuous Galerkin Dynamical Core for Climate Simulations

WHEN studying distributed simulations of power systems,

High Performance Nonlinear Solvers

A Robust Preconditioned Iterative Method for the Navier-Stokes Equations with High Reynolds Numbers

Implicit Solution of Viscous Aerodynamic Flows using the Discontinuous Galerkin Method

Scalable Non-Linear Compact Schemes

COMPARISON OF FINITE DIFFERENCE- AND PSEUDOSPECTRAL METHODS FOR CONVECTIVE FLOW OVER A SPHERE

Domain Decomposition Algorithms for an Indefinite Hypersingular Integral Equation in Three Dimensions

THE solution of the absolute value equation (AVE) of

Deflated Krylov Iterations in Domain Decomposition Methods

Preliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools

A class of the fourth order finite volume Hermite weighted essentially non-oscillatory schemes

33 RASHO: A Restricted Additive Schwarz Preconditioner with Harmonic Overlap

Some Geometric and Algebraic Aspects of Domain Decomposition Methods

ADAPTIVE ACCURACY CONTROL OF NONLINEAR NEWTON-KRYLOV METHODS FOR MULTISCALE INTEGRATED HYDROLOGIC MODELS

RESEARCH ARTICLE. A strategy of finding an initial active set for inequality constrained quadratic programming problems

A Multiscale Mortar Method And Two-Stage Preconditioner For Multiphase Flow Using A Global Jacobian Approach

High Order Accurate Runge Kutta Nodal Discontinuous Galerkin Method for Numerical Solution of Linear Convection Equation

Schur Complement Technique for Advection-Diffusion Equation using Matching Structured Finite Volumes

AMS Mathematics Subject Classification : 65F10,65F50. Key words and phrases: ILUS factorization, preconditioning, Schur complement, 1.

Newton additive and multiplicative Schwarz iterative methods

Overlapping Schwarz preconditioners for Fekete spectral elements

The solution of the discretized incompressible Navier-Stokes equations with iterative methods

A Parallel Domain Decomposition Method for 3D Unsteady Incompressible Flows at High Reynolds Number

Indefinite and physics-based preconditioning

A Numerical Study of Some Parallel Algebraic Preconditioners

XIAO-CHUAN CAI AND MAKSYMILIAN DRYJA. strongly elliptic equations discretized by the nite element methods.

514 YUNHAI WU ET AL. rate is shown to be asymptotically independent of the time and space mesh parameters and the number of subdomains, provided that

Acceleration of Time Integration

An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems

Inexactness and flexibility in linear Krylov solvers

Residual iterative schemes for largescale linear systems

Finding Rightmost Eigenvalues of Large, Sparse, Nonsymmetric Parameterized Eigenvalue Problems

A high order adaptive finite element method for solving nonlinear hyperbolic conservation laws

On the Preconditioning of the Block Tridiagonal Linear System of Equations

On the performance of the Algebraic Optimized Schwarz Methods with applications

New Streamfunction Approach for Magnetohydrodynamics

Interior-Point Methods as Inexact Newton Methods. Silvia Bonettini Università di Modena e Reggio Emilia Italy

A numerical study of SSP time integration methods for hyperbolic conservation laws

Atmospheric Dynamics with Polyharmonic Spline RBFs

Optimized Schwarz Methods with the Yin-Yang Grid for Shallow Water Equations

Downloaded 08/04/16 to Redistribution subject to SIAM license or copyright; see

Machine Learning Applied to 3-D Reservoir Simulation

Numerical Methods for Large-Scale Nonlinear Equations

Short title: Total FETI. Corresponding author: Zdenek Dostal, VŠB-Technical University of Ostrava, 17 listopadu 15, CZ Ostrava, Czech Republic

Optimal Left and Right Additive Schwarz Preconditioning for Minimal Residual Methods with Euclidean and Energy Norms

A spectral finite volume transport scheme on the cubed-sphere

Multigrid solvers for equations arising in implicit MHD simulations

M.A. Botchev. September 5, 2014

arxiv: v1 [hep-lat] 19 Jul 2009

Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices

The Deflation Accelerated Schwarz Method for CFD

A primal-dual mixed finite element method. for accurate and efficient atmospheric. modelling on massively parallel computers

Additive Average Schwarz Method for a Crouzeix-Raviart Finite Volume Element Discretization of Elliptic Problems

Development of Yin-Yang Grid Global Model Using a New Dynamical Core ASUCA.

Integration of Sequential Quadratic Programming and Domain Decomposition Methods for Nonlinear Optimal Control Problems

A Fully Implicit Domain Decomposition Based ALE Framework for Three-dimensional Fluid-structure Interaction with Application in Blood Flow Computation

Preface to the Second Edition. Preface to the First Edition

2 integration methods estimate and control the temporal error through adapting both the temporal step size and the order of the temporal approximation

Dedicated to the 70th birthday of Professor Lin Qun

A Parallel Scalable PETSc-Based Jacobi-Davidson Polynomial Eigensolver with Application in Quantum Dot Simulation

A High-Performance Parallel Hybrid Method for Large Sparse Linear Systems

Progress in Parallel Implicit Methods For Tokamak Edge Plasma Modeling

Domain Decomposition Preconditioners for Spectral Nédélec Elements in Two and Three Dimensions

20. A Dual-Primal FETI Method for solving Stokes/Navier-Stokes Equations

Fine-grained Parallel Incomplete LU Factorization

Preconditioned inverse iteration and shift-invert Arnoldi method

Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning

Implementation of Implicit Solution Techniques for Non-equilibrium Hypersonic Flows

Structure preserving preconditioner for the incompressible Navier-Stokes equations

NEWTON-GMRES PRECONDITIONING FOR DISCONTINUOUS GALERKIN DISCRETIZATIONS OF THE NAVIER-STOKES EQUATIONS

A Proposed Test Suite for Atmospheric Model Dynamical Cores

Iterative Methods for Solving A x = b

Compressible Navier-Stokes (Euler) Solver based on Deal.II Library

A Novel Approach for Solving the Power Flow Equations

Department of Computer Science, University of Illinois at Urbana-Champaign

Is there life after the Lanczos method? What is LOBPCG?

An Iterative Substructuring Method for Mortar Nonconforming Discretization of a Fourth-Order Elliptic Problem in two dimensions

An Efficient Graph Sparsification Approach to Scalable Harmonic Balance (HB) Analysis of Strongly Nonlinear RF Circuits

Transcription:

Newton-Krylov-Schwarz Method for a Spherical Shallow Water Model Chao Yang 1 and Xiao-Chuan Cai 2 1 Institute of Software, Chinese Academy of Sciences, Beijing 100190, P. R. China, yang@mail.rdcps.ac.cn 2 Department of Computer Science, University of Colorado at Boulder, Boulder, CO 80309, USA, cai@cs.colorado.edu 1 Introduction In this paper we study the application of Newton-Krylov-Schwarz method to fully implicit, fully coupled solution of a global shallow water model. In particular, we are interested in developing a scalable parallel solver when the shallow water equations (SWEs) are discretized on the cubed-sphere grid using a second-order finite volume method. 2 Governing Equations The cubed-sphere grid of gnomonic type [ 7, 8] is used in this study. The grid is generated by mapping the six faces of an inscribed cube to the sphere surface using gnomonic projection. The six expanded patches are continuously attached together with proper boundary conditions. On each patch, the expressions of the SWEs in local curvilinear coordinates (x, y) [ π/4,π/4] 2 are identical. When no bottom topography is involved, the SWEs can be written in the following conservative form: Q t + 1 (ΛF ) + 1 (ΛG) + S =0, (1) Λ x Λ y with Q = h hu hv hu,f= huu + 1 2 gg11 h 2,G= huv + 1 hv huv + 1 2 gg12 h 2 2 gg12 h 2,S= 0 S 1, hvv + 1 2 gg22 h 2 The first author was supported in part by NSFC grant 10801125, in part by 973 China grant 2005CB321702, and in part by 863 China grants 2006AA01A125. The second author was supported in part by DOE under DE-FC-02-06ER25784, and in part by NSF under grants CCF-0634894 and DMS 0913089. S 2

150 C. Yang and X.-C. Cai and S 1 = Γ11(huu)+2Γ 1 12(huv)+fΛ 1 ( g 12 hu g 11 hv ), S 2 =2Γ12 2 (huv)+γ 22 2 (hvv)+fλ( g 22 hu g 12 hv ). Here h is the fluid thickness, (u, v) are contravariant components of the fluid velocity, g is the gravitational constant and f is the Coriolis parameter due to the rotation of the sphere. The variable coefficients g mn, Λ and Γmn l are only dependent on the curvilinear coordinates [12]. 3 Discretizations A uniform rectangular N N grid is used on each patch. Grid cell C ij is centered in (x i,y j ), i, j =1,,N, with grid size Δx = Δy = π/2n. The approximate solution in cell C ij at time t is defined as Q ij 1 Λ ij ΔxΔy yj+δy/2 xi+δx/2 y j Δy/2 x i Δx/2 Λ(x, y)q(x, y, t) d x d y, where Λ ij is evaluated at the cell center of C ij. Then we have the following semidiscrete system of the SWEs: Q ij t + (ΛF ) i+ 1 2,j (ΛF ) i 1 2,j + (ΛG) i,j+ 1 (ΛG) 2 i,j 1 2 Λ ij Λ ij + S ij =0. (2) Here the numerical fluxes are approximated using the Osher s Riemann solver [ 5, 6], i.e., (ΛF ) i+ 1 2,j = Λ i+ 1 2,j F (o) (Q i+ 1 2,j,Q+ i+ 1,j) =Λ i+ 1 2 2,j F (Q i+ 1,j), 2 with h = 1 [ 1 ( u 4gg 11 u +) + gg 2 11 h + ] 2 gg 11 h +, u = 1 ( u + u +) + gg 2 11 h gg 11 h +, v + g12 v g (u u ), if u 0 = 11 v + + g12 g (u u + ), otherwise, 11 where we assume u < gg 11 h. The calculation of G follows an analogous way, see [12] for details. The following two reconstruction methods for constant states are considered in this study: Piecewise constant method (first order): Q i+ 1 2,j = Q+ i 1 2,j = Q ij. (3)

Newton-Krylov-Schwarz Method for a Spherical Shallow Water Model 151 Piecewise linear method (second order): Q = Q i+ 1 2,j ij + Q i+1,j Q i 1,j, Q + 4 = Q i 1 2,j ij Q i+1,j Q i 1,j 4 (4) On each patch interface, one layer of ghost cells is needed and the numerical fluxes are calculated symmetrically across the interface to insure the numerical conservation of total mass, see [11] for details. Given a semi-discrete system Q + L(Q) =0, t we use the following second-order backward differentiation formula (BDF-2) for the temporal integration: 1 ( 3Q (m) 4Q (m 1) + Q (m 2)) + L(Q (m) )=0. (5) 2Δt Here Q (m) denotes Q evaluated at m-th time step with a fixed time step size Δt. Only at the first time step, a first-order backward Euler (BDF-1) is used. 4 Nonlinear Solver Fully implicit method enjoys an advantage that the time-step size is no longer constrained by the CFL condition. The price to pay is that a large sparse nonlinear algebraic system has to be solved at each time step. In this study, we use Newton-Krylov- Schwarz (NKS) algorithm as the nonlinear solver. In the NKS algorithm, to solve a nonlinear system F(X) =0, an inexact Newton s method is used in the outer loop. Let X n be the approximate solution for the n-th Newton iterate, we find the next solution X n+1 as X n+1 = X n + λ n s n, n =0, 1,... (6) where λ n is the steplength decided by a linesearch procedure and s n is the Newton correction. We then use the right-preconditioned GMRES (restarted every 30 iterations) method to solve the Jacobian system J n M 1 (Ms n )= F(X n ), J n = F (X n ) until the linear residual r n = J n s n + F(X n ) satisfies r n η F(X n ). We implement a hand-coded analytic method to generate the Jacobian J n in the calculation. The accuracy (relative tolerance) of the Jacobian solver is determined uniformly by the nonlinear forcing terms η =10 3. Some more flexible methods

152 C. Yang and X.-C. Cai such as that of [2] may be used to get more efficient or more robust solutions. The Newton iteration (6) ends when the following stopping condition is satisfied F(X n+1 ) max{ε r F(X 0 ),ε a }, where ε r,ε a 0 are nonlinear tolerances. To achieve uniform residual error at each time step, we use adaptive stopping conditions with both lower and upper adjustments in the NKS method. To do a lower adjustment, we do not let the iteration stop until F(X n+1 ) 1.0 10 5 even when the relative tolerance of ε r =10 7 is satisfied. An upper adjustment can be done by setting the absolute tolerance to be ε (0) a =10 8 at the first time step and then lettting it adaptively be decided by ε (m) a max{ε (m 1) a, F(X (m 1) ) }, where X (m 1) is the converged solution of previous time step. The preconditioner M 1 is obtained by using the restricted additive Schwarz (RAS, [1, 9]) method based on the domain decomposition of the cubed-sphere described briefly here. The six patches of the cubed-sphere can be either simultaneously [12] or independently divided into non-overlapping subdomains. In this study, the six patches are treated in a separated way, i.e., the six patches are respectively decomposed into p non-overlapping rectangular subdomains. Each subdomain is then mapped onto one processor. Thus 6p is the total number of processors and subdomains as well. An overlapping decomposition can be obtained by extending each subdomain with δ layers of grid points in all directions. It should be noted that the overlapping area might lie on other patches and directions might also change. In practice we use a point-wise ordering for both unknowns and the nonlinear equations, resulting in Jacobian matrices with 3 3-block entries. Subdomain solves are done by LU factorizations or incomplete LU (ILU) factorizations with fill-in level l. Here the LU and the ILU factorizations are done in a point-wise manner, i.e., fill-ins are always 3 3 blocks rather than scalars. 5 Numerical Results Our numerical tests are carried out on an IBM BlueGene/L supercomputer with 4,096 nodes. Each node has a dual-core IBM PowerPC 440 processor running at 700 MHz and with 512 MB of memory. We use the 4-wave Rossby Haurwitz problem in [10] as the test case in this study. The characteristic time and length scale is one day and the Earth s radius. The result on day 14 is provided in Fig. 1, consistent with the reference solutions in [3, 4]. To test the performance of the preconditioner, we use 192 processor cores to run a fixed size problem on a 512 512 6 grid for 10 time steps with Δt =0.1 days

Newton-Krylov-Schwarz Method for a Spherical Shallow Water Model 153 Fig. 1. Height field of Rossby Haurwitz problem on day 14, grid size 128 128 6, time step size Δt =0.1 days. The contour levels are from 8,300 to 10,500 m with an interval of 100 m. The four innermost lines near to the equators are at 10,500 m. repeatedly with various levels of overlaps and fill-in ratios. First we try to use RAS preconditioner obtained directly from the Jacobian matrix J n. In this case the ILU factorizations of the subdomain problems results in many GMRES iterations. If we use LU factorization instead, however, the factorization may fail due to insufficient memories and the performance is very poor even when the factorization succeeds. Thus we use Jacobian matrix with first-order spatial discretization to construct the RAS preconditioner even when a high order scheme is used in the nonlinear function evaluation. This is based on the fact that Jacobian matrices are all related to the original SWEs no matter what spatial discretization is used. As it can be seen in Tables 1 and 2, the RAS precoditioner works in the NKS algorithm. Larger overlaps or subdomain ILU fill-ins help in reducing the number of GMRES iteration. However, the per-iteration work increases at the same time. The optimal choice in terms of computing time for this test is ILU(3) subdomain solvers with overlapping factor 2. Using the optimal parameters, we run a set of large-scale tests with the same problem on a 1,024 1,024 6 grid with gradually increased number of processor cores. As seen from Fig. 2, our solver scales up to 6,144 processor cores almost linearly with parallel efficiency 73.8%.

154 C. Yang and X.-C. Cai Table 1. The number of GMRES iterations per Newton iteration, averaged over the first 10 time steps. Overlap 0 1 2 3 4 ILU(0) 309.5 299.2 294.4 292.7 291.4 ILU(1) 199.6 178.0 171.7 168.4 166.0 ILU(2) 194.1 150.2 141.0 139.3 137.8 ILU(3) 188.1 134.5 125.3 122.3 120.4 ILU(4) 184.9 127.0 116.1 111.8 110.9 LU 139.9 87.8 78.7 76.2 75.4 Table 2. The averaged compute time (in seconds) over the first 10 time steps. Overlap 0 1 2 3 4 ILU(0) 10.13 11.13 11.38 11.65 11.90 ILU(1) 7.99 8.36 8.43 8.58 8.67 ILU(2) 8.13 7.84 7.79 7.94 8.09 ILU(3) 8.51 7.80 7.74 7.84 7.99 ILU(4) 8.90 7.96 7.83 7.87 8.04 LU 10.97 9.75 9.88 10.25 10.69 100 44.41 Actual time Ideal time 21.74 Time (seconds) 10 11.47 6.14 3.49 1.88 1 192 384 768 1536 3072 6144 Number of processor cores Fig. 2. Compute time curve on the Rossby Haurwitz problem. References 1. X.-C. Cai and M. Sarkis. A restricted additive Schwarz preconditioner for general sparse linear systems. SIAM J. Sci. Comput., 21:792 797, 1999. 2. S.C. Eisenstat and H.F. Walker. Choosing the forcing terms in an inexact Newton method. SIAM J. Sci. Comput., 17:1064 8275, 1996.

Newton-Krylov-Schwarz Method for a Spherical Shallow Water Model 155 3. C. Jablonowski. Adaptive Grids in Weather and Climate Modeling. PhD thesis, University of Michigan, Ann Arbor, MI, 2004. 4. R. Jakob-Chien, J.J. Hack, and D.L. Williamson. Spectral transform solutions to the shallow water test set. J. Comput. Phys., 119:164 187, 1995. 5. S. Osher and S. Chakravarthy. Upwind schemes and boundary conditions with applications to Euler equations in general geometries. J. Comput. Phys., 50:447 481, 1983. 6. S. Osher and F. Solomon. Upwind difference schemes for hyperbolic systems of conservation laws. Math. Comput., 38:339 374, 1982. 7. C. Ronchi, R. Iacono, and P. Paolucci. The cubed sphere: A new method for the solution of partial differential equations in spherical geometry. J. Comput. Phys., 124:93 114, 1996. 8. R. Sadourny. Conservative finite-difference approximations of the primitive equations on quasi-uniform spherical grids. Mon. Wea. Rev., 100:211 224, 1972. 9. A. Toselli and O. Widlund. Domain Decomposition Methods Algorithms and Theory. Springer, Berlin, 2005. 10. D.L. Williamson, J.B. Drake, J.J. Hack, R. Jakob, and P.N. Swarztrauber. A standard test set for numerical approximations to the shallow water equations in spherical geometry. J. Comput. Phys., 102:211 224, 1992. 11. C. Yang and X.-C. Cai. A parallel well-balanced finite volume method for shallow water equations with topography on the cubed-sphere. J. Comput. Appl. Math., 2010. to appear. 12. C. Yang, J. Cao, and X.-C. Cai. A fully implicit domain decomposition algorithm for shallow water equations on the cubed-sphere. SIAM J. Sci. Comput., 32:418 438, 2010.