An Adjoint Approach for Stabilizing the Parareal Method


An Adjoint Approach for Stabilizing the Parareal Method

Feng Chen (a), Jan S. Hesthaven (b), Yvon Maday (c), Allan Nielsen (b)

(a) Department of Mathematics, Baruch College, New York, NY 10010, USA
(b) Mathematics Institute of Computational Science and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
(c) Sorbonne Universités, UPMC Univ Paris 06, CNRS, UMR 7598, Laboratoire Jacques-Louis Lions, 4 place Jussieu, 75005 Paris, France; Institut Universitaire de France; and D.A.M., Brown University, Providence (RI), USA

Abstract

The parareal algorithm seeks to extract parallelism in the time-integration direction of time-dependent differential equations. While it has been applied with success to a wide range of problems, it suffers from stability issues when applied to non-dissipative problems. We express the method through an iteration matrix and show that the problematic behavior is related to the non-normal structure of this matrix. To enforce monotone convergence we propose an adjoint parareal algorithm, accelerated by the conjugate gradient method. Numerical experiments confirm the stability and suggest directions for further improving the performance.

To cite this article: F. Chen, J.S. Hesthaven, Y. Maday, A. Nielsen, C. R. Acad. Sci. Paris, Ser. I (2015).

Email addresses: feng.chen@baruch.cuny.edu (Feng Chen), jan.hesthaven@epfl.ch (Jan S. Hesthaven), maday@ann.jussieu.fr (Yvon Maday), allan.nielsen@epfl.ch (Allan Nielsen).

Preprint submitted to the Académie des sciences, August 8, 2015

1. Parareal in time algorithm

The increasing number of processors available in large computers presents a substantial challenge for the development of applications that scale well. One potential path to increased parallelism is the development of parallel time-integration methods. Once a standard method-of-lines approach has been applied, the problem is traditionally viewed as a strongly sequential process. Attempts to extract parallelism have nevertheless been explored: a simple example is parallel Runge-Kutta methods, where independent stages allow for the introduction of parallelism [8]. The parareal method, first proposed in [4], is another and more elaborate approach to achieve a similar goal. This algorithm borrows ideas from spatial domain decomposition to construct an iterative approach for solving the temporal problem in a parallel, global-in-time fashion.

To present the method, consider the problem
$$\frac{\partial u}{\partial t} + \mathcal{A}(t, u) = 0, \qquad u(T_0) = u_0, \qquad t \in [T_0, T], \qquad (1)$$
where $\mathcal{A} : \mathbb{R} \times V \to V'$ is a general operator depending on $u : \Omega \times \mathbb{R}^+ \to V$, with $V$ a Hilbert space and $V'$ its dual. Assume the existence of a unique solution $u(t)$ to (1) and decompose the time domain of interest into $N$ individual time slices $T_0 < T_1 < \dots < T_{N-1} < T_N = T$, where $T_n = n\,\Delta T$. Now, for any $n \le N$, we define the numerical solution operator $\mathcal{F}_{T_n}$ that advances the solution from $T_n$ to $T_{n+1}$ as
$$\mathcal{F}_{T_n}(u(T_n)) \approx u(T_{n+1}). \qquad (2)$$
This allows, starting from $U_0 = u_0$, to recursively define the approximations $U_1, \dots, U_N$ to $u(T_1), \dots, u(T_N)$ by setting, for $n = 0, \dots, N-1$: $U_{n+1} = \mathcal{F}_{T_n}(U_n)$. It is pertinent, for what follows, to realize that this numerical solution corresponds to the forward-substitution solution method applied to the matricial system $M_\mathcal{F} \bar{U} = \bar{U}_0$, where $M_\mathcal{F}$, $\bar{U}$ and $\bar{U}_0$ are defined as follows:
$$M_\mathcal{F} = \begin{pmatrix} 1 & & & \\ -\mathcal{F}_{T_0} & 1 & & \\ & \ddots & \ddots & \\ & & -\mathcal{F}_{T_{N-1}} & 1 \end{pmatrix}, \qquad \bar{U} = \begin{pmatrix} U_0 \\ \vdots \\ U_N \end{pmatrix}, \qquad \bar{U}_0 = \begin{pmatrix} u_0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \qquad (3)$$

If we instead solve the system using a point-iterative approach, i.e., seeking the solution in the form $\bar{U}^{k+1} = \bar{U}^k + (\bar{U}_0 - M_\mathcal{F} \bar{U}^k)$, we may observe that $\bar{U}^k$ is known at each iteration, allowing a completely decoupled computation of $\mathcal{F}_{T_0}, \dots, \mathcal{F}_{T_{N-1}}$ on all intervals. However, for any $k$, we clearly have to wait $k$ iterations in order to get $U^k_k = U_k$; hence we have to wait $N$ iterations of the above iterative approach to solve the problem. Consequently, the above approach does not provide any reduction in the solution time. In order to reduce the number of iterations $K$ needed to reach convergence, we need to find an appropriate preconditioner to accelerate the iteration; a typical approach is to utilize an approximation $M_\mathcal{G} \approx M_\mathcal{F}$, where $M_\mathcal{G}$ is cheap to apply. We can readily create such an $M_\mathcal{G}$ by defining another operator $\mathcal{G}$ that provides a coarser approximation
$$\mathcal{G}_{T_n}(u(T_n)) \approx u(T_{n+1}) \qquad (4)$$
by reducing accuracy and choosing a coarse-grained model or an entirely different numerical model. Solving the system using standard preconditioned Richardson iterations we may write
$$\bar{U}^{k+1} = \bar{U}^k + (M_\mathcal{G})^{-1}\left(\bar{U}_0 - M_\mathcal{F} \bar{U}^k\right). \qquad (5)$$
This recovers the parareal algorithm in its standard form
$$U^{k+1}_{n+1} = \mathcal{G}_{T_n}(U^{k+1}_n) + \mathcal{F}_{T_n}(U^k_n) - \mathcal{G}_{T_n}(U^k_n), \qquad \text{with } U^0_{n+1} = \mathcal{G}_{T_n}(U^0_n) \text{ and } U^k_0 = u(T_0). \qquad (6)$$
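To make the recursion concrete, the following sketch (not part of the original note; the parameters `N`, `dt_fine`, `R`, and the choice of forward Euler propagators are illustrative) applies the update (6) to the scalar test equation $u' = \lambda u$:

```python
import numpy as np

def euler_propagator(lam, dt, n_steps):
    """Forward Euler over one time slice: returns the scalar amplification factor."""
    return (1.0 + dt * lam) ** n_steps

def parareal(lam, T=10.0, N=10, dt_fine=1e-3, R=50, n_iters=5):
    """Sketch of the parareal update (6) for u' = lam*u, u(0) = 1 (illustrative sizes)."""
    dT = T / N                                                  # slice length Delta T
    F = euler_propagator(lam, dt_fine, int(round(dT / dt_fine)))            # fine F_{T_n}
    G = euler_propagator(lam, R * dt_fine, int(round(dT / (R * dt_fine))))  # coarse G_{T_n}
    U = np.empty(N + 1, dtype=complex)
    U[0] = 1.0
    for n in range(N):                 # initial coarse sweep: U^0_{n+1} = G(U^0_n)
        U[n + 1] = G * U[n]
    for k in range(n_iters):           # parareal iterations
        U_prev = U.copy()
        for n in range(N):             # U^{k+1}_{n+1} = G(U^{k+1}_n) + F(U^k_n) - G(U^k_n)
            U[n + 1] = G * U[n] + F * U_prev[n] - G * U_prev[n]
    return U

U = parareal(lam=-1.0)                 # dissipative case: rapid convergence
exact = np.exp(-1.0 * np.linspace(0.0, 10.0, 11))
print(np.max(np.abs(U - exact)))       # small: dominated by the fine-solver error
```

Note that after $k$ iterations the first $k$ slice values coincide with the serial fine solution, which is the exactness property mentioned above.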

This approach has no inherent limits to the amount of parallelism that can be extracted. Important contributions to the analysis of the method can be found in [6,1,3].

2. Instabilities

As the parareal algorithm is essentially a preconditioned point-iterative method, (5) can be written as
$$\bar{U}^{k+1} = \left[I - (M_\mathcal{G})^{-1} M_\mathcal{F}\right] \bar{U}^k + (M_\mathcal{G})^{-1} \bar{U}_0.$$
It is clear that if $\mathcal{G}_{T_n}$ is sufficiently accurate, then one essentially recovers the serial solution process as expected. However, if $\mathcal{G}_{T_n}$ is far from accurate, the iteration matrix $(M_\mathcal{G})^{-1} M_\mathcal{F}$ develops a degree of non-normality that increases as the quality of $\mathcal{G}_{T_n}$ decreases. When solving problems that have a natural dissipation, e.g., a parabolic problem, this does not pose a problem. However, when considering problems without dissipation, e.g., a wave-dominated problem, the impact of this non-normality becomes very apparent.

To demonstrate this problematic behavior, Figs. 1a and 1b illustrate convergence when the parareal algorithm is used to solve the test equation $\frac{d}{dt} u(t) = \lambda u(t)$ over $[0, 100]$, using a forward Euler method with a small time step $dt = 10^{-3}$ for the fine solver $\mathcal{F}$ and a varying time step $dt = R \cdot 10^{-3}$ for the coarse solver $\mathcal{G}$ (hence solved $R$ times faster than $\mathcal{F}$). One clearly observes the problematic transient behavior for problems with small dissipation and recognizes the intermittent behavior as characteristic of iteration with non-normal matrices [7]. It is also clear that the strong intermittent behavior can be controlled by improving the accuracy of $\mathcal{G}_{T_n}$, as one would expect. We refer to [2] for some ways to cure these problems based on conservation properties for hyperbolic equations.

Figure 1. Error $\sum_{n=0}^{N} \|U^k_n - u(T_n)\|_1$ as a function of iterations when solving $\frac{d}{dt}u(t) = \lambda u(t)$ with $\lambda = a\,i$, $dt = 0.001$, $\Delta T = 1$, for $a = 0.1, 0.2, 0.5, 1.0, 2.0$. (a) $R = 500$; (b) $R = 50$, where $R$ is the ratio between the timestep of $\mathcal{G}$ and the timestep $dt$ of $\mathcal{F}$.
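The non-normal mechanism can be illustrated directly on the iteration matrix. For scalar slice propagators, the error matrix $E = I - (M_\mathcal{G})^{-1} M_\mathcal{F}$ is strictly lower triangular, hence nilpotent with spectral radius zero, yet $\|E^k\|$ may grow by many orders of magnitude before collapsing. The sketch below (illustrative parameters, not from the note) demonstrates this transient growth:

```python
import numpy as np

def error_propagator(F, G, N):
    """Parareal error-iteration matrix E = I - M_G^{-1} M_F (strictly lower triangular)."""
    sub = np.eye(N + 1, k=-1, dtype=complex)
    M_F = np.eye(N + 1, dtype=complex) - F * sub
    M_G = np.eye(N + 1, dtype=complex) - G * sub
    return np.eye(N + 1) - np.linalg.solve(M_G, M_F)

# Non-dissipative test problem u' = i*a*u on slices of length dT = 1
a, dT = 1.0, 1.0
F = np.exp(1j * a * dT)        # (near-)exact fine propagator over one slice
G = 1.0 + 1j * a * dT          # one coarse forward Euler step: inaccurate, |G| > 1
E = error_propagator(F, G, N=50)

# E is nilpotent (spectral radius 0), so the error vanishes in at most N steps,
# yet non-normality produces large transient growth of ||E^k|| before the collapse.
norms = [np.linalg.norm(np.linalg.matrix_power(E, k), 2) for k in range(0, 51, 10)]
print(norms)                   # grows enormously before eventually dropping toward 0
```

This mirrors the intermittent convergence histories of Fig. 1: the spectral radius alone predicts convergence, but the transient of the non-normal powers governs the observed behavior.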
3. An Adjoint Approach

As observed above, the parareal method exhibits problematic behavior for non-dissipative problems due to the inherent non-normal nature of the iteration matrices, leading to an increasing number of iterations $k$ for the algorithm to converge. This adversely impacts the parallel efficiency, which scales as $1/k$. To overcome this, we consider a symmetrized version, as was done in [5] for the application to control problems, by constructing the adjoint operators $\mathcal{F}^*$ and $\mathcal{G}^*$ to (1). Consider the adjoint operator $M_\mathcal{F}^*$ and the normal equations
$$M_\mathcal{F}^* M_\mathcal{F} \bar{U} = M_\mathcal{F}^* \bar{U}_0 =: \tilde{U}_0. \qquad (7)$$
For simplicity, we assume that the adjoint operator is the conjugate transpose of $M_\mathcal{F}$, but this is not an essential assumption. Clearly, $M_\mathcal{F}^* M_\mathcal{F}$ is symmetric positive definite, and we can use the preconditioned conjugate gradient (PCG) method to solve this problem. Inspired by the parareal method, it is natural to consider $(M_\mathcal{G}^* M_\mathcal{G})^{-1}$ as a preconditioner. It is noteworthy that $M_\mathcal{G}^* M_\mathcal{G}$ is an approximation to the Cholesky factorization of $M_\mathcal{F}^* M_\mathcal{F}$, and it can be applied directly in the PCG method. The overall algorithm is outlined in Algorithm 1 below. We present a method developed for linear systems, but the algorithm itself can be applied to nonlinear systems without modifications.

Algorithm 1: Pseudo-code for PCG-based adjoint parareal

    U^0 = (M_G)^{-1} Ubar_0                 \\ Creation of initial iterate
    r_(0) = Utilde_0 - M_F* M_F U^0         \\ Parallel application of F_{T_n} and F*_{T_n} to U^0_n
    d_(0) = (M_G* M_G)^{-1} r_(0)           \\ Sequential application of G_{T_n} and G*_{T_n} on residual
    G_(0) = d_(0)                           \\ Iterate
    for k = 0 to K_max (<= N - 1) do
        F_(k) = M_F* M_F d_(k)              \\ Parallel application of F_{T_n} and F*_{T_n} to d_(k)
        alpha_(k) = r^T_(k) G_(k) / d^T_(k) F_(k)
        U^{k+1} = U^k + alpha_(k) d_(k)
        r_(k+1) = r_(k) - alpha_(k) F_(k)
        if ||U^{k+1}_n - U^k_n|| < eps for all n then BREAK   \\ Terminate loop if converged
        end if
        G_(k+1) = (M_G* M_G)^{-1} r_(k+1)   \\ Sequential application of G_{T_n} and G*_{T_n} on residual
        beta_(k+1) = r^T_(k+1) G_(k+1) / r^T_(k) G_(k)
        d_(k+1) = G_(k+1) + beta_(k+1) d_(k)
    end for

4. Numerical Experiments

We again consider the test equation
$$\frac{d}{dt} u(t) = \lambda u(t), \qquad u(0) = 1, \qquad t \in [0, 100], \qquad (8)$$
with a forward Euler discretization and $\lambda = i$, i.e., it is strictly non-dissipative. In Fig. 2a we show the error between the exact solution $u(t) = \exp(\lambda t)$ and the parareal iterative solution $U^k_n$ as a function of the iteration count, for different ratios between the coarse and the fine time-step. As the ratio $R$ between the timestep in $\mathcal{G}$ and $dt$ in $\mathcal{F}$ decreases, the coarse solver improves in accuracy and the level of non-normality decreases. In Fig. 2b we illustrate the convergence of the PCG-accelerated adjoint parareal algorithm for the same test case, and the expected rapid convergence is clear. The convergence rate of the PCG algorithm depends

on the condition number and the clustering of the eigenvalues of the preconditioned system. In Fig. 4a the eigenvalues of $(M_\mathcal{G}^* M_\mathcal{G})^{-1} M_\mathcal{F}^* M_\mathcal{F}$ for (8) are given for different values of $R$. It is clear that as $R$ increases, small eigenvalues are introduced, contributing to the reduced convergence rate. To improve the convergence rate, we define the projection $P = I + \varepsilon V_1 V_1^T$, where $V_1$ is the eigenvector of $(M_\mathcal{G}^* M_\mathcal{G})^{-1} M_\mathcal{F}^* M_\mathcal{F}$ associated with the smallest eigenvalue $\mu_1$ that we attempt to shift. $V_1$ is also an eigenvector of $P (M_\mathcal{G}^* M_\mathcal{G})^{-1} M_\mathcal{F}^* M_\mathcal{F}$, but now associated with a new eigenvalue $\lambda_1 = (1 + \varepsilon) \mu_1$. In Fig. 4a the modified preconditioned system is solved with $P = I + \frac{1 - \mu_1}{\mu_1} V_1 V_1^T$ to shift the smallest eigenvalue of the preconditioned system to 1, illustrating the potential for a substantially improved convergence rate.

Figure 2. Error $\sum_{n=0}^{N} \|U^k_n - u(T_n)\|_1$ as a function of parareal iterations $k$ for solving (8) with $dt = 10^{-3}$. $R$ is the ratio between the timestep in $\mathcal{G}$ and $dt$ in $\mathcal{F}$. (a) Original parareal, $R = 500, 200, 100, 50, 25$. (b) Adjoint parareal (dashed), modified adjoint (solid).

As a second test we consider
$$\frac{\partial u(x,t)}{\partial t} + \frac{\partial u(x,t)}{\partial x} = 0, \qquad (t, x) \in [0, 100] \times [0, 2],$$
$$u(x, 0) = \exp(\sin(2\pi x)), \qquad u(0, t) = u(2, t), \quad t \in [0, 100]. \qquad (9)$$
Using a spectral discretization in space and a forward Euler method in time, we show in Fig. 3a the convergence of the original parareal algorithm as a function of the ratio between the coarse and the fine time-step, clearly exposing the same transient effect of the non-normal iteration matrix identified above. In Fig. 3b the convergence of the PCG-based adjoint parareal algorithm is presented. While the convergence is monotone, the rate of convergence is prohibitively slow. Figure 4b shows the eigenvalues of the preconditioned system for (9), spanning down to $10^{-8}$ in the case of the coarse discretization.
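A minimal sketch of this rank-one shift, applied to an arbitrary symmetric positive definite stand-in for the preconditioned operator (the matrix `A` below is an illustrative choice, not the paper's operator):

```python
import numpy as np

# Toy SPD stand-in for the preconditioned operator (M_G* M_G)^{-1} M_F* M_F;
# A is an arbitrary illustrative matrix, not from the note.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B @ B.T + 0.01 * np.eye(6)          # symmetric positive definite

mu, V = np.linalg.eigh(A)               # eigenvalues in ascending order
mu1, v1 = mu[0], V[:, [0]]              # smallest eigenvalue and its eigenvector

# Rank-one shift P = I + ((1 - mu1)/mu1) v1 v1^T moves mu1 to 1, leaves the rest
P = np.eye(6) + ((1.0 - mu1) / mu1) * (v1 @ v1.T)
eigs = np.linalg.eigvals(P @ A).real
print(np.sort(mu), np.sort(eigs))       # only the smallest eigenvalue moved (to 1)
```

Since $P A v_1 = (1 + \varepsilon)\mu_1 v_1$ and $P$ acts as the identity on the orthogonal complement of $v_1$, the remaining spectrum is untouched, which is what makes the shift attractive when only a few small eigenvalues pollute the preconditioned system.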
While convergence can be improved by shifting the small eigenvalues as discussed above, this quickly becomes prohibitive due to the large number of eigenvalues that need to be shifted. During the numerical tests we observed that constructing a new preconditioner $LL^*$ as a modified incomplete Cholesky factorization of $M_\mathcal{G}^* M_\mathcal{G}$ improves the convergence rate dramatically. This is illustrated in Fig. 3b. The eigenvalues of the preconditioned system $(LL^*)^{-1} M_\mathcal{F}^* M_\mathcal{F}$ are also better clustered, cf. Fig. 4b. It is surprising that using a less accurate approximation of $M_\mathcal{G}^* M_\mathcal{G}$ leads to a substantially better preconditioner; we do not have a complete understanding of this observation.
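For reference, the PCG iteration of Algorithm 1 on the normal equations (7) can be sketched in dense-matrix form for the scalar test problem. The propagators, problem sizes, and tolerances below are illustrative choices, not the note's actual discretization:

```python
import numpy as np

def bidiagonal(prop, N):
    """Lower-bidiagonal parareal matrix: 1 on the diagonal, -prop on the subdiagonal."""
    return np.eye(N + 1, dtype=complex) - prop * np.eye(N + 1, k=-1, dtype=complex)

# Scalar non-dissipative test problem u' = i*u over N unit slices (illustrative sizes)
N = 20
M_F = bidiagonal(np.exp(1j), N)          # fine propagator: exact over one slice
M_G = bidiagonal((1.0 + 0.5j) ** 2, N)   # coarse: two forward Euler steps of size 1/2

A = M_F.conj().T @ M_F                   # normal-equations operator M_F* M_F (Hermitian PD)
b = M_F.conj().T @ np.eye(N + 1)[:, 0]   # right-hand side M_F* Ubar_0, with u_0 = 1
Pinv = np.linalg.inv(M_G.conj().T @ M_G) # preconditioner (M_G* M_G)^{-1}; dense for clarity

x = np.zeros(N + 1, dtype=complex)       # PCG loop, mirroring Algorithm 1
r = b - A @ x
d = z = Pinv @ r
for k in range(100):
    Ad = A @ d
    alpha = (r.conj() @ z) / (d.conj() @ Ad)
    x = x + alpha * d
    r_new = r - alpha * Ad
    if np.linalg.norm(r_new) < 1e-10:    # terminate when the residual is small
        break
    z_new = Pinv @ r_new
    beta = (r_new.conj() @ z_new) / (r.conj() @ z)
    d = z_new + beta * d
    r, z = r_new, z_new

exact = np.exp(1j) ** np.arange(N + 1)   # serial fine solution U_n = F^n
print(np.max(np.abs(x - exact)))         # agrees with the fine serial solution
```

In practice the dense inverses would of course be replaced by parallel applications of $\mathcal{F}_{T_n}$, $\mathcal{F}^*_{T_n}$ and sequential sweeps with $\mathcal{G}_{T_n}$, $\mathcal{G}^*_{T_n}$, as indicated in Algorithm 1.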

Figure 3. Error $\sum_{n=0}^{N} \|U^k_n - u(T_n)\|_1$ as a function of parareal iterations $k$ solving (9) with $dx = 2/49$ and $dt = 10^{-6}$. $R$ is the ratio between the timestep in $\mathcal{G}$ and $dt$ in $\mathcal{F}$. (a) Original parareal, $R = 100, 80, 60, 40, 20$. (b) Adjoint parareal (dashed), $LL^*$ preconditioner (solid).

Figure 4. Eigenvalues of the preconditioned systems $(M_\mathcal{G}^* M_\mathcal{G})^{-1} M_\mathcal{F}^* M_\mathcal{F}$, $P (M_\mathcal{G}^* M_\mathcal{G})^{-1} M_\mathcal{F}^* M_\mathcal{F}$, and $(LL^*)^{-1} M_\mathcal{F}^* M_\mathcal{F}$ as a function of $R$. (a) Equation (8). (b) Equation (9).

References

[1] G. Bal, 2005, On the convergence and the stability of the parareal algorithm to solve partial differential equations, Domain Decomposition Methods in Science and Engineering, Springer Berlin Heidelberg, 425-432.
[2] X. Dai and Y. Maday, 2013, Stable parareal in time method for first- and second-order hyperbolic systems, SIAM Journal on Scientific Computing, 35(1), A52-A78.
[3] M.J. Gander and S. Vandewalle, 2007, Analysis of the parareal time-parallel time-integration method, SIAM Journal on Scientific Computing, 29(2), 556-578.
[4] J.L. Lions, Y. Maday and G. Turinici, 2001, Résolution d'EDP par un schéma en temps pararéel, Comptes Rendus de l'Académie des Sciences - Mathematics, 332(7), 661-668.
[5] Y. Maday and G. Turinici, 2002, A parareal in time procedure for the control of partial differential equations, Comptes Rendus Mathematique, 335(4), 387-392.
[6] G.A. Staff and E.M. Rønquist, 2005, Stability of the parareal algorithm, Domain Decomposition Methods in Science and Engineering, Springer Berlin Heidelberg, 449-456.

[7] L.N. Trefethen and M. Embree, 2005, Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators, Princeton University Press.
[8] P.J. van der Houwen, B.P. Sommeijer and P.A. van Mourik, 1989, Note on explicit parallel multistep Runge-Kutta methods, Journal of Computational and Applied Mathematics, 27(3), 411-420.