
Convergence Acceleration for the Navier–Stokes Equations Using Optimal Semicirculant Approximations

Sverker Holmgren†, Henrik Brandén† and Erik Sterner†

Abstract. The iterative solution of systems of equations arising from partial differential equations (PDE) governing boundary layer flow for large Reynolds numbers is studied. We consider a convergence acceleration technique, where an optimal semicirculant approximation of the spatial difference operator is employed as preconditioner. A relevant model problem is derived, and the spectrum of the preconditioned coefficient matrix is analyzed. It is proved that, asymptotically, the time step for the forward Euler method could be chosen as a constant, which is independent of the number of gridpoints and the Reynolds number $Re$. The same type of result is also derived for finite size grids, where the solution fulfills a given accuracy requirement. By linearizing the Navier–Stokes equations around an approximate solution, we form a system of linear PDE with variable coefficients. When utilizing the semicirculant (SC) preconditioner for this problem, which has properties very similar to the full nonlinear equations, the results show that the favorable convergence properties hold also here. We compare the results for the SC method to those for a multigrid (MG) scheme. The number of iterations and the arithmetic complexities are considered, and it is clear that the SC method is much more efficient for problems where the Reynolds number is large. The number of iterations for the MG method grows like $\sqrt{Re}$, while the convergence rate for the SC method is independent of $Re$. Also, the MG scheme is very sensitive to the level of artificial dissipation, while the SC method is not.

1 Introduction

The following strategy for solving the Navier–Stokes equations governing compressible, stationary flow is frequently employed in computer codes used by the CFD community:

1. Generate a structured grid, or a multi-block grid where the partition grids are structured.
This work was supported by the Swedish National Board for Industrial and Technical Development (NUTEK).
† Department of Scientific Computing, Uppsala University, Sweden

2. Discretize the equations in space utilizing a finite volume or finite difference scheme.
3. Integrate the corresponding time-dependent problem in time by utilizing an explicit Runge–Kutta time-marching method.
4. Improve the convergence properties by employing some convergence acceleration technique.

In §2 of this paper, we review the semicirculant convergence acceleration, or preconditioning, technique, which is applicable to the type of computations described above, as well as to other problem settings. Then, in §3, we describe the boundary layer equations, which yield an approximate solution to the compressible Navier–Stokes equations for the two-dimensional flow over a flat plate. We employ the boundary layer equations for formulating a relevant scalar model problem, and for linearizing the Navier–Stokes equations, yielding a system of linear PDE which has very similar properties to the full nonlinear problem. In §§4 and 5, we apply the semicirculant preconditioning technique to the scalar model problem and analyze the convergence properties, and in §6 we present numerical experiments for computing the flow over a flat plate, governed by the linearized Navier–Stokes equations. The semicirculant preconditioning technique is easily adapted to the solution of the full nonlinear problem in three spatial dimensions. In §7, the solution procedure for such problems is described, and some implementation issues are discussed.

A system of linear, time-independent PDE may be written

\[ 0 = \mathcal{B}u - g. \tag{1.1} \]

Here, $u$ and $g$ denote the solution and forcing functions, and $\mathcal{B}$ is a linear differential operator in space. By introducing a discretization of the spatial derivatives in (1.1), the vector $u$, representing the approximate solution, may be computed by solving a large system of linear equations,

\[ 0 = Bu - g \equiv R(u). \tag{1.2} \]

Here, the matrix $B$ is a difference approximation of $\mathcal{B}$, the vector $g$ contains contributions from the forcing functions and from the boundary conditions, and $R(u)$ denotes the residual. We choose the sign such that, if $B$ corresponds to a stable discretization, its eigenvalues have non-negative real parts. The solution of (1.2) is often found by integrating the corresponding time-dependent problem in time until steady state is reached,

\[ \frac{\partial u}{\partial t} + R(u) = 0. \tag{1.3} \]

A standard approach for advancing the solution of (1.3) from time level $n$ to $n+1$ is to exploit an $s$-stage explicit Runge–Kutta method of the form

\[
\begin{aligned}
u^{(0)} &= u^{n},\\
u^{(1)} &= u^{(0)} - \alpha_1 \Delta t\, R(u^{(0)}),\\
&\;\;\vdots\\
u^{(s-1)} &= u^{(0)} - \alpha_{s-1} \Delta t\, R(u^{(s-2)}),\\
u^{(s)} &= u^{(0)} - \alpha_s \Delta t\, R(u^{(s-1)}),\\
u^{n+1} &= u^{(s)}.
\end{aligned}
\tag{1.4}
\]

Here, $\Delta t$ denotes the time step. The iteration must be consistent, yielding the condition $\alpha_s = 1$. The other parameters $\alpha_1, \ldots, \alpha_{s-1}$ may be tuned to obtain a favorable stability region.

For viscous flow problems, the boundary layers near solid walls must be resolved. This implies that, locally, the scale of the spatial discretization in the body-normal direction is very small. Due to the stability constraint for the explicit time-integration technique, the small cells in the boundary layer imply that very small time steps must be employed. The number of time steps required will increase with the number of cells in the grid, and the time-marching process becomes prohibitively slow for problems where the Reynolds number is large. To increase the convergence rate, convergence acceleration techniques such as implicit residual smoothing and multigrid are utilized, see, e.g., [7]. Employing such techniques corresponds to specifying a preconditioner $M$, and applying the time-integration method to

\[ \frac{\partial u}{\partial t} + M^{-1}(Bu - g) = 0. \tag{1.5} \]

Note that, if $M \neq I$, the time-accuracy is generally lost, and the time step $\Delta t$ has no physical interpretation. An almost trivial example of preconditioning is known as local time-stepping. Here, $M$ is chosen as a diagonal matrix, where the diagonal elements are determined by the local stability constraint, see, e.g., [].

In this paper, we will study a convergence acceleration technique, where $M$ is based on an optimal semicirculant approximation of $B$. Semicirculant approximations have earlier been successfully employed for computing time-accurate and steady-state solutions of hyperbolic PDE problems [] [] [] [5]. The eigenvalues and eigenvectors of $M^{-1}B$ have been analyzed, and convergence estimates show that the number of iterations required is independent of the number of unknowns.
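The Runge–Kutta iteration and the preconditioned problem above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `rk_steady_state`, the small 1D test matrix, and the diagonal (Jacobi-type) preconditioner are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def rk_steady_state(B, g, apply_M_inv, dt, alphas, n_iter):
    """Iterate u^(k) = u^(0) - alpha_k * dt * M^{-1}(B u^(k-1) - g) towards B u = g.

    alphas holds the stage coefficients; the last one must equal 1 (consistency).
    """
    u = np.zeros_like(g, dtype=float)
    for _ in range(n_iter):
        u0 = u
        stage = u0
        for alpha in alphas:
            stage = u0 - alpha * dt * apply_M_inv(B @ stage - g)
        u = stage
    return u

# Hypothetical test problem: 1D second-difference matrix, M = diag(B).
m = 10
B = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
g = np.ones(m)
d = np.diag(B)
jacobi = lambda r: r / d          # stands in for the preconditioner solve

u_euler = rk_steady_state(B, g, jacobi, dt=1.0, alphas=[1.0], n_iter=2000)
u_rk2 = rk_steady_state(B, g, jacobi, dt=0.9, alphas=[0.5, 1.0], n_iter=2000)
u_exact = np.linalg.solve(B, g)
```

With `alphas=[1.0]` the loop reduces to the forward Euler scheme; any other preconditioner solve can be passed in place of `jacobi`.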
In [7], a semicirculant preconditioner is utilized for solving the scalar advection-diffusion equation

\[ \frac{\partial u}{\partial x_1} + \frac{\partial u}{\partial x_2} = \varepsilon \left( \frac{\partial^2 u}{\partial x_1^2} + \frac{\partial^2 u}{\partial x_2^2} \right). \tag{1.6} \]

Also here, the eigenvalues and eigenvectors of $M^{-1}B$ are analyzed, and, under some conditions, convergence estimates are derived which yield the same type of favorable convergence properties as for hyperbolic PDE problems.

In [3], semicirculant approximations are employed for convergence acceleration when solving time-independent hyperbolic PDE, utilizing the technique described in this paper. Also here, the results are very encouraging. When the number of gridpoints is increased, $\Delta t$ can be kept $O(1)$. Furthermore, the results show that utilizing the simplest one-stage Runge–Kutta method, i.e., the forward Euler scheme, is favorable compared to methods with a larger number of stages.

Semicirculant approximations are included in the Polynomial Normal Block (PNB) and Normal Block (NB) frameworks [3] [5], which provide general methodologies for computing matrix approximations belonging to certain classes of normal matrices. In [4], it is described how PNB approximations are utilized for preconditioning when solving systems of equations arising from general systems of PDE, discretized on a structured grid using a finite difference or finite volume approximation of arbitrary order of accuracy. Also, high-order accurate discretizations of hyperbolic, time-dependent PDE are studied. Again, numerical results show that the arithmetic work per unknown and time step does not increase when the number of unknowns is increased.

During the last decade, many authors have contributed to the study of computational techniques where, in some generalized meaning, circulant approximations are employed for preconditioning Toeplitz matrices, see the review paper [4] and the references therein.

2 Semicirculant approximations

In this section, we will consider semicirculant approximations for systems of $n_c$ PDE in two space dimensions, discretized on an $m_1 \times m_2$ grid. Hence, the total number of unknown grid function values is $n \equiv n_c m_1 m_2$. The extension to 3D is straightforward, see, e.g., [5]. We will use the following notation:

\[
\mathrm{diag}_{j,m}(\alpha_j) =
\begin{pmatrix} \alpha_1 & & \\ & \ddots & \\ & & \alpha_m \end{pmatrix}
\quad\text{and}\quad
\mathrm{block}_{j,n_1,k,n_2}(\alpha_{j,k}) =
\begin{pmatrix} \alpha_{1,1} & \cdots & \alpha_{1,n_2} \\ \vdots & & \vdots \\ \alpha_{n_1,1} & \cdots & \alpha_{n_1,n_2} \end{pmatrix}.
\]

Also, the $m \times m$ circulant basis matrix $C_m$ is defined by

\[
C_m =
\begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \\ 1 & & & 0 \end{pmatrix},
\]

and $\mathrm{pband}^{(r,-q_1:q_2)}_{j,m}(\alpha_{j,r})$ denotes an $m \times m$ periodic band matrix with bandwidth $q_1 + q_2 + 1$,

\[
\mathrm{pband}^{(r,-q_1:q_2)}_{j,m}(\alpha_{j,r}) = \sum_{r=-q_1}^{q_2} \mathrm{diag}_{j,m}(\alpha_{j,r})\, C_m^{\,r}.
\]
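The notation above can be made concrete with a small NumPy sketch. It assumes that $C_m$ acts as the cyclic forward shift, $(C_m u)_j = u_{j+1}$ with indices taken modulo $m$; the helper names `circulant_power` and `pband` and the test vector are illustrative only. The example builds the periodic centered difference as a periodic band matrix and checks that it is diagonalized by the discrete Fourier transform.

```python
import numpy as np

def circulant_power(m, r):
    """Return C_m^r, where (C_m u)_j = u_{j+1} with indices modulo m (r may be negative)."""
    return np.roll(np.eye(m), r, axis=1)

def pband(alpha, q1, q2):
    """Periodic band matrix sum_{r=-q1}^{q2} diag(alpha[:, r + q1]) @ C_m^r."""
    m = alpha.shape[0]
    P = np.zeros((m, m))
    for r in range(-q1, q2 + 1):
        P += np.diag(alpha[:, r + q1]) @ circulant_power(m, r)
    return P

m = 8
alpha = np.tile([-1.0, 0.0, 1.0], (m, 1))   # stencil coefficients for r = -1, 0, 1
D = pband(alpha, 1, 1)                      # periodic centered difference u_{j+1} - u_{j-1}
C = circulant_power(m, 1)

# A circulant matrix is diagonalized by the DFT: its eigenvalues are the FFT of
# its first column, and a matrix-vector product is a circular convolution.
lam = np.fft.fft(D[:, 0])
u = np.arange(m, dtype=float) ** 2
Du_fft = np.fft.ifft(lam * np.fft.fft(u)).real
```

Because the coefficients in `alpha` are constant along the grid here, `D` is circulant; with coefficients varying in `j` the same `pband` call gives a general periodic band matrix.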

A finite volume or finite difference discretization may be specified by a computational stencil. The maximal extent of the stencil in the four grid directions is determined by the integers $q_W$, $q_E$, $q_S$, and $q_N$, see Fig. 2.1. For discretizations with local boundary conditions, these integers are all $O(1)$. In the formulas describing the arithmetic complexity and memory requirement, we will for simplicity assume that $q_W = q_E = q_S = q_N \equiv q$. For the discretization of the model problem studied in §5, $q = 1$, while for the linearized Navier–Stokes equations studied in §6, q =.

Figure 2.1: The maximal extent of the discretization stencil, given by $q_N$, $q_S$, $q_W$, and $q_E$.

The matrix $B$, corresponding to the difference operator in space, has the following two-level band structure

\[
\begin{aligned}
B &= \mathrm{pband}^{(r_2,-q_S:q_N)}_{k,m_2}(B_{k,r_2}),\\
B_{k,r_2} &= \mathrm{pband}^{(r_1,-q_W:q_E)}_{j,m_1}(B_{j,k,r_1,r_2}),\\
B_{j,k,r_1,r_2} &= \mathrm{block}_{j_c,n_c,k_c,n_c}(\beta_{j,k,r_1,r_2,j_c,k_c}).
\end{aligned}
\]

By defining $B$ as a periodic band matrix, periodic boundary conditions may be accounted for.

In the PNB approximation framework, a normal $m \times m$ basis matrix $R_m$ must be chosen. For all choices, it holds that

\[ R_m = Q_m \Lambda_m Q_m^{*}, \]

where $\Lambda_m = \mathrm{diag}_{j,m}(\lambda_j)$ and $Q_m$ is unitary. Normally, the basis matrix $R_m$ is chosen as a narrow-banded matrix, corresponding to a stencil operation on a grid function. Furthermore, $R_m$ is also chosen such that the unitary transformation associated with the eigenvector matrix $Q_m$ is computable using an algorithm with

$O(m \log m)$ arithmetic complexity, implying that the preconditioner solve may be implemented efficiently.

In [4], the choice of basis matrix is discussed. It is argued that, for PDE problems containing both first- and second-order derivatives, semicirculant approximations should be employed. Here, $R_m = C_m$ is chosen. The corresponding unitary transform is the discrete Fourier transform, which may be implemented using the FFT algorithm. In the sequel we will assume that semicirculant approximations are utilized. However, note that for other PDE settings, other basis matrices should preferably be employed. For example, in [6], the Helmholtz equation describing sound propagation in a duct is successfully solved utilizing a PNB preconditioner where $R_m$ represents a centered second difference operator with Dirichlet-Neumann boundary conditions.

The semicirculant approximation $M$ is defined by replacing the band structure at the inner level by a power series in $R_{m_1}$:

\[
\begin{aligned}
M &= \mathrm{pband}^{(r_2,-q_S:q_N)}_{k,m_2}(M_{k,r_2}),\\
M_{k,r_2} &= \sum_{r_1=-q_W}^{q_E} R_{m_1}^{\,r_1} \otimes M_{k,r_1,r_2},\\
M_{k,r_1,r_2} &= \mathrm{block}_{j_c,n_c,k_c,n_c}(\mu_{k,r_1,r_2,j_c,k_c}).
\end{aligned}
\]

One of the main results in [3] describes how the $(2q+1)^2 n_c^2 m_2$ parameters $\mu$ are computed such that the approximation $M$ is optimal in the sense that $\|B - M\|_F$ is minimal.

When a semicirculant approximation $M$ is utilized for preconditioning, preconditioner solves, $y = M^{-1}x$, are required. The computations are divided into three phases:

1. Perform $n_c m_2$ independent FFTs of length $m_1$.
2. Solve $m_1/2 + 1$ independent block-banded systems of $n_c m_2$ equations (block bandwidth $2q + 1$ and block size $n_c \times n_c$).
3. Perform $n_c m_2$ independent inverse FFTs of length $m_1$.

The arithmetic complexity per gridpoint required for computing $\mu$ is at most (q + ) n c. It is possible to take full advantage of the sparsity of $B$, i.e., the structure of the computational stencil and sparsity in the coefficient matrices in the PDE, to reduce the operation count, since the corresponding parameters $\mu$ will be zero by construction.
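The three phases of the preconditioner solve can be sketched for the scalar case $n_c = 1$. The sketch assumes a matrix of the separable form $M = I_{m_2} \otimes \tilde{C} + T \otimes I_{m_1}$, with $\tilde{C}$ circulant and $T$ banded; the function name `semicirculant_solve`, the particular matrices, and the use of dense solves in phase 2 (instead of a banded factorization) are simplifications made only for illustration.

```python
import numpy as np

def semicirculant_solve(c, T, x):
    """Solve (I (x) Ctilde + T (x) I) y = x, where Ctilde is circulant with first column c.

    Unknowns are ordered with the x1 index running fastest (row k = x2 index).
    """
    m1, m2 = c.size, T.shape[0]
    X = x.reshape(m2, m1)
    # Phase 1: m2 independent real FFTs of length m1.
    Xhat = np.fft.rfft(X, axis=1)
    lam = np.fft.rfft(c)                          # eigenvalues of Ctilde, m1/2 + 1 of them
    # Phase 2: one independent system (lam_j I + T) per retained frequency.
    A = T[None, :, :] + lam[:, None, None] * np.eye(m2)[None, :, :]
    Yhat = np.linalg.solve(A, Xhat.T[:, :, None])[:, :, 0].T
    # Phase 3: m2 independent inverse FFTs of length m1.
    return np.fft.irfft(Yhat, n=m1, axis=1).reshape(-1)

# Hypothetical data: Ctilde = C - C^{-1} (periodic centered difference), T tridiagonal.
m1, m2 = 16, 6
c = np.zeros(m1)
c[m1 - 1], c[1] = 1.0, -1.0
T = 3.0 * np.eye(m2) - 1.2 * np.eye(m2, k=1) - 0.8 * np.eye(m2, k=-1)

# Dense reference matrix, used only to check the fast solve.
idx = (np.arange(m1)[:, None] - np.arange(m1)[None, :]) % m1
M_dense = np.kron(np.eye(m2), c[idx]) + np.kron(T, np.eye(m1))
x = np.random.default_rng(0).standard_normal(m1 * m2)
y = semicirculant_solve(c, T, x)
```

For real data only $m_1/2 + 1$ frequencies need to be solved, which is why the real-to-complex transform is used. A production code would factorize the banded systems once and reuse the factors.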
For factorizing the block-banded systems occurring in phase 2 of the preconditioner solve, 4n c ((q + )n c? ) arithmetic operations (a.o.) per gridpoint are required. When the semicirculant preconditioner is employed for solving a linear PDE problem, the computation of the preconditioner parameters and the factorization of the block-banded systems only have to be performed once, while for nonlinear problems, these computations have to be performed at least every other iteration, see §§6 and 7. The arithmetic complexity for the preconditioner solve, including back substitution

for the block-banded systems in phase 2, is (5 log m + (q + )n c? 4)n c if m is a power of two. If m is not a power of two, (dlog m e + (q + )n c + )n c a.o. per gridpoint are required.

In the present implementation, the total memory requirement for the semicirculant solver is 3qn c n. At the expense of slightly larger arithmetic complexity, an implementation requiring only 3qn c m memory positions is possible. Note that the potential parallelism in the solver is large. Parallel implementations of the semicirculant solver are studied in [9].

It is interesting to note that the optimal semicirculant approximation $M$ corresponds to the discretization of a modified PDE problem, where the boundary conditions in the $x_1$-direction are altered to periodic, and where the coefficients are averaged over this direction. Hence, the semicirculant solver provides a fast, noniterative solution method for such modified PDE problems. This observation indicates that the optimal approximation should possibly be based on the modified matrix $B^{(x_1\text{-periodic})}$, where the boundary conditions in the $x_1$-direction are altered to periodic [4]. For linear hyperbolic model problems, it has been observed that altering the boundary conditions in the $x_1$-direction may improve the convergence properties slightly [4]. Also, utilizing optimal approximations of $B^{(x_1\text{-periodic})}$ implies that the analysis required for characterizing the convergence properties is manageable, see §5. Finally, utilizing optimal approximations of a modified coefficient matrix $B$ may facilitate the implementation of semicirculant preconditioners in existing CFD codes, see §7.

3 The Navier–Stokes equations

In two space dimensions, the Navier–Stokes equations for compressible flow may be written

\[ \frac{\partial u}{\partial t} + \frac{\partial F^c}{\partial x_1} + \frac{\partial G^c}{\partial x_2} = \frac{\partial F^v}{\partial x_1} + \frac{\partial G^v}{\partial x_2}, \tag{3.1} \]

where $u = (\rho, \rho u_1, \rho u_2, E)^T$, $\rho$ is the density, $u_1$ and $u_2$ are the Cartesian velocity components and $E$ is the total energy. The flux vectors are given by

\[
\begin{aligned}
F^c &= \left( \rho u_1,\; \rho u_1^2 + p,\; \rho u_1 u_2,\; u_1 E + u_1 p \right)^T,\\
G^c &= \left( \rho u_2,\; \rho u_1 u_2,\; \rho u_2^2 + p,\; u_2 E + u_2 p \right)^T,\\
F^v &= \left( 0,\; \tau_{x_1 x_1},\; \tau_{x_1 x_2},\; u_1 \tau_{x_1 x_1} + u_2 \tau_{x_1 x_2} \right)^T,\\
G^v &= \left( 0,\; \tau_{x_1 x_2},\; \tau_{x_2 x_2},\; u_1 \tau_{x_1 x_2} + u_2 \tau_{x_2 x_2} \right)^T,
\end{aligned}
\]

where the stress tensor is given by

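\[
\begin{aligned}
\tau_{x_1 x_1} &= \mu \left( 2 \frac{\partial u_1}{\partial x_1} - \frac{2}{3} \left( \frac{\partial u_1}{\partial x_1} + \frac{\partial u_2}{\partial x_2} \right) \right),\\
\tau_{x_2 x_2} &= \mu \left( 2 \frac{\partial u_2}{\partial x_2} - \frac{2}{3} \left( \frac{\partial u_1}{\partial x_1} + \frac{\partial u_2}{\partial x_2} \right) \right),\\
\tau_{x_1 x_2} &= \mu \left( \frac{\partial u_1}{\partial x_2} + \frac{\partial u_2}{\partial x_1} \right).
\end{aligned}
\]

Here, $\mu$ is the coefficient of viscosity and $p$ is the pressure. The heat flux is neglected, and the system is closed by an equation of state,

\[ p = (\gamma - 1)\left( E - \rho (u_1^2 + u_2^2)/2 \right), \]

where $\gamma$ is the ratio of specific heats.

A standard problem in fluid mechanics is the laminar, time-independent flow over a semi-infinite flat plate oriented along the positive $x_1$-axis. For this setting, (3.1) is well approximated by the boundary layer equations, see e.g. [7],

\[
\begin{aligned}
U_1 \frac{\partial U_1}{\partial x_1} + U_2 \frac{\partial U_1}{\partial x_2} &= \nu \frac{\partial^2 U_1}{\partial x_2^2},\\
\frac{\partial U_1}{\partial x_1} + \frac{\partial U_2}{\partial x_2} &= 0.
\end{aligned}
\tag{3.2}
\]

Here, the boundary conditions are $U_1 = U_2 = 0$ at $x_2 = 0$, and $U_1 = U_\infty$ at $x_2 = \infty$, where $U_\infty$ denotes the free stream velocity. The kinematic viscosity is defined by $\nu = \mu/\rho$, the density is given by $\rho = R$, and the Reynolds number is defined by $Re \equiv U_\infty L/\nu$, where $L$ is the length scale. The transition to turbulent flow occurs for $Re \approx 10^6$ [7].

Exploiting the transformation $\eta = x_2 (U_\infty/(\nu x_1))^{1/2}$, (3.2) is transformed into the Blasius equation

\[ 2f''' + f f'' = 0. \]

Here, the boundary conditions are $f = f' = 0$ at $\eta = 0$ and $f' = 1$ at $\eta = \infty$. This ODE can easily be solved numerically, yielding the Blasius solution,

\[ U_1 = U_\infty f'(\eta), \qquad U_2 = \frac{1}{2} \left( \frac{\nu U_\infty}{x_1} \right)^{1/2} \left( \eta f'(\eta) - f(\eta) \right). \tag{3.3} \]

In [7], an analysis of (3.3) shows that the boundary layer thickness, i.e. the distance from the wall where $U_1 \approx 0.99\, U_\infty$, is given by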

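\[ \delta \approx 5.0\, (\nu x_1 / U_\infty)^{1/2}. \tag{3.4} \]

For illustration, we in Fig. 3.1 plot the boundary layer and the velocity profiles for $\nu$ = : and $U_\infty$ = :.

Figure 3.1: Velocity profiles and growth of boundary layer thickness (two panels, $U_1$ profiles and $U_2$ profiles, with axes $x_1$ and $x_2$).

We now derive a scalar model problem, for which we in §5 perform a theoretical analysis of the semicirculant preconditioning technique. Let

\[ v(\eta) \equiv U_2(\eta)/U_1(\eta), \qquad \varepsilon(\eta) \equiv \nu/U_1(\eta). \tag{3.5} \]

Substituting (3.3) into (3.5) yields

\[
v(\eta) = \frac{\eta f'(\eta) - f(\eta)}{2 \sqrt{x_1 U_\infty}\, f'(\eta)} \sqrt{\nu} \equiv \hat{v}(\eta) \sqrt{\nu}, \qquad
\varepsilon(\eta) = \frac{\nu}{U_\infty f'(\eta)} \equiv \hat{\varepsilon}(\eta)\, \nu. \tag{3.6}
\]

The functions $f(\eta)$ and $f'(\eta)$ are analyzed in [7] and []. From these studies, it is clear that, if $\eta > 0$, then $0 < \hat{v}(\eta), \hat{\varepsilon}(\eta) < \infty$. By dividing the first equation in (3.2) by $U_1$, we arrive at the scalar advection-diffusion equation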

ERIK STERNER @u + v @u = " @ u : (3.7) @x @x From (3:6), we see that (3.7) is a relevant model problem for at plate ow if we choose v = O( p "). The PDE (3.7) is well posed if u(x ; ), u(x ; ), and u(; x ) are prescribed. In the numerical experiments presented in x5, we use u(x ; ) =, u(x ; ) =, and u(; x ) =? x. The solution forms a boundary layer of width O("=v) at the boundary x =. Finally, we also utilize (3.3) for deriving a system of linear PDE, which has properties very similar to the linear problems arising in the nal phase of a Newtontype iteration for the full nonlinear problem (3.), see x7. By linearizing (3.) around the Blasius solution (3.3), the following system of PDE is obtained B = @u @t + A @u @u @ u + A = B @x @x @ 4 3 A = A ; @ U R U R U @x B = @ 4 3 @x A ; A = @ u @ u + B + B @x 3 + F; (3.8) @x @x A ; B3 = @ 3 @ U U R R U 3 Here, u = (u ; u ; ) T. The pressure has been eliminated by the relation c = p=, where = is utilized, c.f. [9]. The function F is a sum of spatial derivatives of U and U. In the sequel we let F since it has no inuence on the error analysis nor the convergence analysis. We also employ c =, and thus = c = =. 4 Convergence analysis A stationary iterative method for (.5) is specied by the iteration matrix S, where A : A ; u i+ = Su i + T g: (4.) Here, u i is the approximation of u computed in iteration i, and T = (I? S)B?. For the Runge{Kutta method (.4), S is a polynomial of degree s, S = p s (tm? B). Let the error vector e i be dened by e i u? u i. It is easily proved that ke i k ksk i ke k: To derive convergence estimates for (4.), an eigendecomposition of M? B is often exploited. Assume that M? B is diagonalizable. Then, e i is bounded by

\[ \|e_i\| \leq \|e_0\|\, \mathrm{cond}(W_{M^{-1}B}) \left( \max_{1 \leq \ell \leq n} |p_s(\Delta t \lambda_\ell)| \right)^{i}. \tag{4.2} \]

Here, $W_{M^{-1}B}$ is the eigenvector matrix and $\lambda_\ell$ the eigenvalues of $M^{-1}B$. Consequently, a necessary condition for convergence is

\[ \max_{1 \leq \ell \leq n} |p_s(\Delta t \lambda_\ell)| < 1. \tag{4.3} \]

The common interpretation of (4.3) is that the points $\Delta t \lambda_\ell$ must lie strictly inside the stability region for the Runge–Kutta method.

We will employ the simplest possible explicit Runge–Kutta method, the one-stage forward Euler scheme. Here,

\[ u_{i+1} = \left( I - \Delta t M^{-1}B \right) u_i + \Delta t M^{-1} g. \]

The convergence criterion for this method is

\[ |1 - \Delta t \lambda_\ell| < 1, \qquad \ell = 1, \ldots, n. \tag{4.4} \]

A standard tool for analyzing finite difference approximations of flow problems is Fourier analysis. Here, the original boundary conditions are altered to periodic, and variable coefficients are approximated by constants. This yields a difference approximation matrix $\tilde{B}$, which is block-circulant with circulant blocks. If a preconditioner is utilized, $\tilde{M}$ is constructed such that it has the same circulant properties as $\tilde{B}$. Since the eigenvectors $W_{\tilde{M}^{-1}\tilde{B}}$ are orthogonal, $\mathrm{cond}(W_{\tilde{M}^{-1}\tilde{B}}) = 1$ and the matrix $\tilde{M}^{-1}\tilde{B}$ is characterized by its eigenvalues $\tilde{\lambda}_\ell$. An analysis of $p_s(\Delta t \tilde{\lambda}_\ell)$ yields what is often referred to as the von Neumann stability condition. For practical computations, this criterion is frequently used as a guideline for the choice of time step. It is often found that the results from Fourier analysis are applicable to the original problem, including variable coefficients and the correct boundary conditions.

In this context, it is interesting to note that the matrices $\tilde{B}$ are exactly invertible within the semicirculant framework, cf. §2. Let $\tilde{M}$ denote the optimal semicirculant preconditioner for $\tilde{B}$. Then, $\tilde{M}$ is block-circulant with circulant blocks, and $\tilde{M}^{-1}\tilde{B} = I$ []. This indicates that using semicirculant approximations for preconditioning may be a good idea.
Since Fourier analysis predicts convergence after one iteration, the boundary conditions for the original problem must be taken into account to describe the performance of an iteration utilizing a semicirculant preconditioner. Note that the matrix $M^{-1}B$ is not normal, and a complete analysis would involve a study of the eigenvectors $W_{M^{-1}B}$. However, for finite size problems, the asymptotic convergence rate is determined by the eigenvalues only. Also note that, when $M^{-1}B$ is non-normal, (4.2) often yields a huge overestimate of the convergence factor for all $e_0$. Hence, a result proving that $\mathrm{cond}(W_{M^{-1}B})$ is large does not imply that the iteration converges slowly. In the analysis of (3.7) in §5, we will mainly study the eigenvalues $\lambda_\ell$. We will also present results from numerical experiments, verifying that the convergence properties are described by the spectrum of $M^{-1}B$.
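The forward Euler criterion (4.4) is easy to evaluate once the eigenvalues of $M^{-1}B$ are known: for $\Delta t > 0$, $|1 - \Delta t \lambda| < 1$ is equivalent to $\Delta t < 2\,\mathrm{Re}(\lambda)/|\lambda|^2$. The sketch below applies this to a small, made-up set of eigenvalues; the function names and the sample spectrum are illustrative assumptions, not data from the paper.

```python
import numpy as np

def max_forward_euler_dt(eigs):
    """Supremum of dt > 0 with |1 - dt*lam| < 1 for every eigenvalue lam.

    Returns 0.0 if some eigenvalue has non-positive real part (no dt works).
    """
    eigs = np.asarray(eigs, dtype=complex)
    if np.any(eigs.real <= 0.0):
        return 0.0
    return float(np.min(2.0 * eigs.real / np.abs(eigs) ** 2))

def euler_converges(eigs, dt):
    """Check the eigenvalue criterion |1 - dt*lam| < 1 directly."""
    return bool(np.all(np.abs(1.0 - dt * np.asarray(eigs, dtype=complex)) < 1.0))

# Hypothetical spectrum in the right half-plane.
eigs = [1.0 + 1.0j, 2.0, 0.5 - 0.5j]
dt_max = max_forward_euler_dt(eigs)
```

As noted above, this criterion involves the eigenvalues only; for a non-normal $M^{-1}B$ it describes the asymptotic rate, not the transient behavior.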

5 Analysis of the model problem

In this section, we analyze the spectrum of $M^{-1}B$ for the model problem (3.7). The discretization is performed on a uniform grid with $(m_1 + 1) \times (m_2 + 2)$ grid points, where $h_1 \equiv 1/m_1$ and $h_2 \equiv 1/(m_2 + 1)$. The derivatives are approximated using second order accurate centered differences in the interior of the domain, and at the outflow boundary one-sided differences are utilized. Inside the domain, the order of accuracy of the discretization is two in both directions in space.

For application problems, employing a uniform grid would lead to a prohibitive problem size. Instead, a nonuniform grid where $h_2$ is small only inside the boundary layer should be utilized. Results from numerical experiments where (3.7) is solved show that employing a nonuniform grid leads to practically no degradation of the convergence properties described below. This is no surprise, since variable coefficients and boundary conditions in the $x_2$-direction are accounted for by the semicirculant preconditioner. However, a complete theoretical analysis of the spectrum of $M^{-1}B$ is only manageable for uniform grids, and the presentation in this section is limited to this setting. In §6, results for solving the more realistic problem (3.8) are presented, and here nonuniform grids are employed.

By introducing the discretizations described above in (3.7), we arrive at a system of equations of the type (1.2), with $m_1 m_2$ unknowns. Since the difference approximation is separable, the coefficient matrix $B$ may be written in Kronecker product form,

\[ B = I_{m_2} \otimes B_1 + \kappa\, B_2 \otimes I_{m_1}. \]

Here, $\kappa \equiv h_1/h_2$, and $B_d$ corresponds to the discretization in the $d$-direction. Note that separability of the difference approximations is only assumed for simplicity; semicirculant approximations are applicable to general PDE problems. The matrix $B_1$ is given by

\[
B_1 =
\begin{pmatrix}
0 & 1 & & & \\
-1 & 0 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 0 & 1 \\
 & & & -2 & 2
\end{pmatrix}.
\]

Also, $B_2$ is a tridiagonal Toeplitz matrix

\[
B_2 =
\begin{pmatrix}
\frac{4\varepsilon}{h_2} & v - \frac{2\varepsilon}{h_2} & & \\
-v - \frac{2\varepsilon}{h_2} & \frac{4\varepsilon}{h_2} & v - \frac{2\varepsilon}{h_2} & \\
 & \ddots & \ddots & \ddots \\
 & & -v - \frac{2\varepsilon}{h_2} & \frac{4\varepsilon}{h_2}
\end{pmatrix}.
\]

The eigenvalues of $B_2$ are given by []

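\[ \lambda_{k,2} = \frac{4\varepsilon}{h_2} + 4 \sqrt{\frac{\varepsilon^2}{h_2^2} - \frac{v^2}{4}}\, \cos\!\left( \frac{k\pi}{m_2 + 1} \right), \qquad k = 1, \ldots, m_2. \tag{5.1} \]

We prove the following lemma:

Lemma 5.1. The matrix $B_2$ is diagonalizable and nonsingular.

Proof. Since $\cos(\theta)$ is one-to-one onto $(-1, 1)$ for $\theta \in (0, \pi)$, it follows that all eigenvalues $\lambda_{k,2}$ are distinct. Hence, $B_2$ is diagonalizable. To prove that $B_2$ is nonsingular, we show that all eigenvalues are nonzero. If $\varepsilon/h_2 < v/2$, (5.1) yields

\[ \lambda_{k,2} = \frac{4\varepsilon}{h_2} + 4i \sqrt{\frac{v^2}{4} - \frac{\varepsilon^2}{h_2^2}}\, \cos\!\left( \frac{k\pi}{m_2 + 1} \right). \]

Since $\varepsilon > 0$, it is clear that $\mathrm{Re}(\lambda_{k,2}) > 0$, and hence $|\lambda_{k,2}| > 0$. If $\varepsilon/h_2 \geq v/2$, (5.1) yields

\[ \lambda_{k,2} = \frac{4\varepsilon}{h_2} + 4 \sqrt{\frac{\varepsilon^2}{h_2^2} - \frac{v^2}{4}}\, \cos\!\left( \frac{k\pi}{m_2 + 1} \right) \geq \frac{4\varepsilon}{h_2} - 4 \sqrt{\frac{\varepsilon^2}{h_2^2} - \frac{v^2}{4}} > \frac{4\varepsilon}{h_2} - \frac{4\varepsilon}{h_2} = 0. \]

The semicirculant approximation corresponding to (3.7) is formed by altering the boundary conditions in the $x_1$-direction to periodic. This yields

\[ M = I_{m_2} \otimes \tilde{B}_1 + \kappa\, B_2 \otimes I_{m_1}. \]

Here, $\tilde{B}_1$ is given by

\[
\tilde{B}_1 = C_{m_1} - C_{m_1}^{-1} =
\begin{pmatrix}
0 & 1 & & & -1 \\
-1 & 0 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 0 & 1 \\
1 & & & -1 & 0
\end{pmatrix}.
\]

Since $\tilde{B}_1$ is circulant, we have

\[ F_{m_1}^{*} \tilde{B}_1 F_{m_1} = \Lambda_1 = \mathrm{diag}_{j,m_1}(\lambda_{j,1}), \qquad \lambda_{j,1} = \omega^{j-1} - \omega^{-(j-1)} = 2i \sin\!\left( \frac{2\pi (j-1)}{m_1} \right). \]

Here, $F_{m_1}$ is the $m_1$:th order Fourier matrix and $\omega = \exp(2\pi i/m_1)$. We now prove that $M$ is nonsingular and diagonalizable.

Theorem 5.2. The matrix $M$ is nonsingular and diagonalizable.

Proof. From Lemma 5.1 we conclude that there exists a nonsingular matrix $V_2$ diagonalizing $B_2$, that is

\[ V_2^{-1} B_2 V_2 = \Lambda_2 = \mathrm{diag}_{k,m_2}(\lambda_{k,2}). \]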

4 ERIK STERNER This implies (V? F m )M(V F m ) = (V? F m )(I m ~ B + B I m )(V F m ) = I m F m ~ B F m + V? B V I m = I m + I m : Thus, M is diagonalizable, and the eigenvalues of M are j; + k;. Since <e( j; + k; ) = <e( k; ), it follows from the proof of Lemma 5. that all eigenvalues are nonzero. Hence, M is nonsingular. It is possible to derive analytic formulas for the eigenvalues of M? B. Dene the error matrix E B? M. Since M? B = M? (M + E) = I + M? E, examining the eigendecomposition of M? E will suce. Theorem 5.3. The matrix M? E has the eigenvalue zero with multiplicity at least m (m? ), and m eigenvalues given by p (;);k =? z k + R (z k )?3zk? 4z k? R (z k ) (z k + z k? ) ; k = ; : : : ; m ; where zk m R (z k ) = (? z k ) + ( + z?? zk m k ) (?z k ) m? (?z k ) ; m R (z k ) = (6zk + 8z k + 4 + 4z k? ) zk m +? z m and nally + (4z k + 4z k?? 4z? k ) (?z k ) m + (z k? )? zk m? z m k k? (?z k ) m + + (?z? k + (8zk + 4z k +? 4z k? + 8z k? ) zk m? zk q m z k =? k + + k ; k = k;:? )? (?z k ) m? (?z k ) m + (?z k ) m? (?z k ) m ; Proof. The proof is essentially identical to that of Theorem 7. in [4]. Here, a sparse, block-diagonal matrix T which is similar to M? B is constructed. From Lemma 5., it is clear that <e( k ) >. Employing this result, the matrix entries in the blocks T k are computable by utilizing the Poisson summation formula and the residue theorem. Solving the characteristic equations of T k yields the eigenvalues of M? B. We now briey discuss the eigenvectors W M? B. In [] and [4], it is proved that, under some conditions which are fullled for large values of m and m,

PAPER IV 5 cond (W M? B) cond (V ) max? k; : km Here, V is the eigenvector matrix of B, and k; > is the smallest root of a fourth degree polynomial, given in [4]. It is probably not possible to bound cond (W M? B) by a constant, independent of m. The matrix B is nonsymmetric and Toeplitz, and for such matrices the condition number of the eigenvector matrix V often grows very rapidly with the dimension. However, note that this does not imply that, e.g., the forward Euler scheme converges slowly, see x4. Let the condition number reduction be dened by cond (W M? B)=cond (W B ); where W B is the eigenvector matrix of B. It is easily derived that max km? k; : cond (V ) It may be possible to prove that, asymptotically,!, implying that the conditioning is \innitely improved" by preconditioning. For the advection-diusion equation (.6), a result of this type is, under some conditions, derived in [7]. So far, we have not pursued the analysis of for the problem studied here. We now utilize Theorem 5.3 for analyzing the spectrum of M? B. First, assume that we want to compute more and more accurate solutions to (3.7). Since the discretization is second order accurate in both spatial directions, this corresponds to studying a sequence of matrices where m,m! and is constant. The following theorem describes the asymptotic behavior of the eigenvalues of M? E. Theorem 5.4. Assume that < v <, < " <, and < <. Also assume that m is even. Then, asymptotically as m ; m!, (;);k satises where <e( (;);k ) coth(); p 7 j=m( (;);k )j < 4 coth(); v 8" + " : Proof. Let m m +. First study the case k = m, <. Utilizing that cos (m =(m + )) > cos() =? + for some >, a Taylor expansion of (5.) yields? k; > 4" m + O m? :

6 ERIK STERNER Next, study the case k = m?l +; : : : ; m, where l is a positive integer. A Taylor expansion of (5.) now yields? k; = 4 l m? + O m?3 ; where l v 8" + l " > : Hence, the eigenvalues k; lie on the positive real axis, and lim m! (min k k; ) =, lim m! (max k k; ) = +. It follows that z k is real, and lim m! (min k z k ) = ; lim m! (max k z k ) = : (5.) However, for nite values of m, < z k <. Now study R ; (z k ), dened in Theorem 5.3. For m even, these terms may be rewritten [4], R (z k ) =?z k + 4 + z k? ; z k?m? R (z k ) = z k + z k +? 4z k? + 4z k? + z k?m? + 7z k + 8z k +? 8z k? + 7z k? : (z k?m? ) For k = m? l + ; : : : ; m, a Taylor expansion yields (5.3)? z k =? l m? + O m? : Hence, lim m! z?m k = exp( l ) > ; which yields? lim z?m? m! k? = (exp(l )? )? = exp(? l) sinh? ( l ): (5.4)? We have proved that, in the limit m!, z k?m?? decays exponentionally with increasing l. By employing (5.) and (5.4) in (5.3), some algebra yields lim m! R (z k ) = exp(? l ) sinh? ( l ) exp(?) sinh? (); lim m! R (z k ) < 6 sinh? ( l ) 6 sinh? (): (5.5) Inserting (5.5) in Theorem 5.3, and utilizing (5.), yields

PAPER IV 7 <e( (;);k ) j=m( (;);k )j 4 < 4? + exp(?) sinh? () = coth(); q 7 + 6 sinh? () q 7 + 7 sinh? () = p 7 4 coth(): The condition that m is even facilitates the proof. Numerical experiments indicate that the result holds also for m odd. However, for this case it is signicantly more complicated to derive bounds for R ;. By employing Theorem 5.4, it is possible to prove the following asymptotic convergence criterion for the forward Euler method. Corollary 5.5. Under the assumptions in Theorem 5.4, the convergence criterion (4.4) is fullled if t < R? 4 p tanh(): Furthermore, if v = O( p "), then R? > R? >, where R is independent of ". Finally, if v = O() and " is small, then R? 4= p :6. Proof. In terms of (;);k, (4.4) could be reformulated as (;);k < (t)? : Theorem 5.4 implies that, asymptotically, (;);k are located in a box, centered at the origin. The size of the box is given by, and inscribing it in a circle with radius R yields that, if t < R?, the convergence criterion is fullled. If v = O( p "), the formula for in Theorem 5.4 yields that > v =(8") >, where is independent of ". This implies that R? > 4= p tanh( ) R? >. Also, if v = O(), the same formula for yields that, for small values of ",. Hence, R? 4= p. Theorem 5.4 and Corollary 5.5 are asymptotic results, derived in the limit of innite accuracy, i.e., when m ; m!. From Corollary 5.5, we see that, for the important case v = O( p "), the time step for the forward Euler method could be chosen as t t, where t > is independent of ", m and m. For applications, an approximate solution which fullls a given accuracy criterion is sucient. It is possible do derive theoretical results also for this setting. Recall that the thickness of the boundary layer is O("=v). A study of the truncation error of the dierence approximation yields that requiring a given accuracy corresponds to specifying that at least n gridpoints in the x -direction should be inside the boundary layer. Here, n > does not depend on " or v. 
This requirement corresponds to a criterion of the form h h ; = O(ε/v). If (3.7) is employed as a model for laminar boundary layer flow, h = O(√ε). For application problems, sufficient accuracy is normally obtained for n 5 [9], and (3.4) is employed for determining the space step. The discretization in the x -directions is independent of v and ε, and a given accuracy requirement is fulfilled if h h ; where h ; is constant.

It is interesting to note that, for problem settings other than the laminar boundary layer flow described in §3, requiring a given accuracy may impose other relations between h and ε. In many theoretical studies of scalar advection-diffusion equations, v = O(1), and consequently h = O(ε) should be employed in a boundary layer. Also, if a turbulent flow is modeled by employing the k-ε turbulence model, h = O(ε^{.9}) should hold in the viscous sublayer, i.e., very close to solid wall boundaries [].

We prove the following lemma.

Lemma 5.6. Assume that v = v ε? and h = h ε. Then, k in Theorem 5.3 is given by

k = 4? h + r? v h 4 cos! k m? m + ε? 4 k m? ε? :

Proof. By employing (5.) in Theorem 5.3, the result follows.

We first study the case corresponding to laminar boundary layer flow, i.e., = =. Analogously to [5], we define the cell Reynolds number Re_h by Re_h h v =. Recall that the boundary layer thickness is O(√ε) = O(h =Re_h ). Assuming that Re_h is constant is equivalent to assuming that n is constant, which yields a solution with a fixed accuracy. Numerical experiments for the case v = √ε show that Re_h = corresponds to n 5, which normally does not yield a sufficiently accurate solution. Hence, a criterion of the form Re_h could be considered as a minimal requirement for practical computations. The following theorem characterizes the spectrum of M^{-1}E for sufficiently large values of m.

Theorem 5.7. Let = = in Lemma 5.6, and assume that < v <, < h <. Then, (;);k is independent of ε. Furthermore, if m is sufficiently large and < Re_h, then (;);k satisfies

ℜe( (;);k )? coth() + O m? ; p 7 |ℑm( (;);k )| < 4 coth() + O m? ;

where

=? h? r? v h 4! :

Proof. By substituting = = in Lemma 5.6, it follows that k is independent of ε, which implies that (;);k does not depend on this parameter. Now choose m large, such that | k |. Then, a Taylor expansion yields

z k =? k m? + k m? + O(m?4 ): (5.6)

The assumption < Re_h implies that k > is real. Another Taylor expansion yields

ln(z?m k ) = k? 4 3 3 k m? + O(m?3 ):

Hence, there exists a constant > such that

z?m k > (? m? ) exp( k):

Since k >, we have, for sufficiently large values of m, that z k?m >. Some algebra yields

? z?m k?? < exp(? k) sinh? ( k ) + O(m? ):

By utilizing (5.3) and (5.6), this yields

R (z k ) < exp(? k ) sinh? ( k ) + O(m? ) exp(?) sinh? () + O(m? );
R (z k ) < 6 sinh? ( k ) + O(m? ) 6 sinh? () + O(m? ):

Finally, by employing these bounds and (5.6) in Theorem 5.3, the proof is completed.

Analogously to the asymptotic setting, we derive a convergence criterion for the forward Euler method.

Corollary 5.8. Under the assumptions in Theorem 5.7, the convergence criterion (4.4) is fulfilled if

Δt R? p 4 tanh() + O : m?

Proof. The proof is analogous to that for Corollary 5.5.

Now study the formula for in Theorem 5.7. For Re_h, corresponding to high accuracy, a Taylor expansion yields

v v =8 = 8ε :

Hence, for small values of ε,, and we have recovered the asymptotic results in Theorem 5.4 and Corollary 5.5. Also, a simple analysis shows that, if < Re_h, then v =8 < v =4. This implies that, for small values of ε, we have that R^{-1} < R^{-1} < R^{-1}. Hence, it is possible to employ slightly larger values of Δt for finite size problems than predicted by the asymptotic study. For example, assume that v = and that ε is small. Then, Theorem 5.4 and Corollary 5.5 yield =8, and R^{-1} 4 p tanh(=8) .5. In Fig. 5., R^{-1} is shown as a function of h. Here, h yields high accuracy, while h = corresponds to Re_h =, i.e., the coarsest discretization covered by Theorem 5.7.

[Figure 5.: R^{-1} as a function of κ h; v = ε =.]

From the proof of Theorem 5.4, it is clear that, in the limit m, m → ∞, the spectrum of M^{-1}E approaches two finite curve segments in the complex plane, which are uniformly separated from zero. In the upper left part of Fig. 5., these curves are shown for the case ε = 10^{-5}, v = √ε ≈ 0.003. In all subfigures, the dashed circle has radius R = 6.665, as given by Corollary 5.5. In the upper right part of Fig. 5., the spectrum for m = 5, m = 5 is shown. For this discretization, the boundary layer is resolved, and h .676, yielding Re_h .38. The dotted circle has radius R = 6.57, as given by Corollary 5.8. The spectrum for Re_h is shown in the lower left part of Fig. 5.. Here, m = 5, m = 58, and the dotted circle has radius R = 3.786. Note that, if Re_h = holds, the spectrum consists of two discrete points. Finally, in the lower right part of Fig. 5., the spectrum for Re_h is shown. Here, m = 5, m = 78, and the boundary layer is not resolved. This case is not covered by the theoretical results. The spectrum shows that a smaller Δt would be required for convergence here than for the other cases.
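The role of the enclosing circle in the choice of time step can be illustrated with a small numerical sketch. It assumes, as an interpretation that is not spelled out in this section, that the circle of radius R is centered at the point R on the real axis, so that it passes through the origin. The radius value, the sampled points and the function names are hypothetical and are not taken from the experiments. Under that assumption, every eigenvalue λ strictly inside the circle satisfies |1 − Δt λ| < 1 for Δt = 1/R, which is the contraction condition for a forward Euler iteration, while a clearly larger Δt violates it.

```python
import cmath
import math
import random


def euler_amplification(eigs, dt):
    """Return max |1 - dt*lam| over eigs: the worst-case forward Euler factor."""
    return max(abs(1.0 - dt * lam) for lam in eigs)


def sample_disk(radius, n, seed=0):
    """Sample n points strictly inside the disk |lam - radius| < radius.

    The disk is centered at `radius` on the real axis (an assumption of
    this sketch), so it touches the origin without containing it.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        r = 0.999 * radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        points.append(radius + r * cmath.exp(1j * theta))
    return points


R = 4.0  # hypothetical radius, not a value from the paper
eigs = sample_disk(R, 2000)

rho_ok = euler_amplification(eigs, 1.0 / R)   # dt = 1/R: contraction
rho_bad = euler_amplification(eigs, 3.0 / R)  # dt = 3/R: some modes grow

print("dt = 1/R contracts:", rho_ok < 1.0)
print("dt = 3/R diverges: ", rho_bad > 1.0)
```

For Δt = 1/R the factor |1 − λ/R| equals the distance from λ to the circle center divided by R, which is why any point strictly inside the circle is damped.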

[Figure 5.: Spectra of M^{-1}E. Panels: asymptotic spectrum; Re_h = .38; Re_h = .9944; Re_h = .4.]

We now present results from numerical experiments, which confirm that the convergence rate for the forward Euler method is described by the theoretical results. In Tables 5. and 5., numerical results from two grid refinement studies are presented. In Table 5., m = m is used, while in Table 5., m = 5 is employed. The optimal choice of Δt (determined up to the second decimal) and the corresponding number of forward Euler iterations required for reducing the initial residual by a factor of 10^{-6} are shown. Also, the cell Reynolds number Re_h and the bound R^{-1} are included. Here, ε = 10^{-5} and v = √ε ≈ 0.003, yielding the asymptotic bound R^{-1} .5.

Table 5.: Grid refinement study, ε = 10^{-5}, v = √ε ≈ 0.003.
m = m: 5 4
Re_h: 3..56.79.394.58
R^{-1}: - -.849.56.59
Δt_opt: .6..7.3.3
# iterations: 95 96 95 9

A comparison of the results in Tables 5. and 5. shows that there is practically no difference in convergence properties between the cases m = m and m = 5. For the coarse grids, the boundary layer is not resolved, and the theoretical results are not valid. However, the boundary layer must be resolved to get an accurate solution. For such discretizations, the results in Tables 5. and 5. confirm that the convergence properties are determined by the theoretical results describing the spectrum of M^{-1}B.

Table 5.: Grid refinement study, m = 5, ε = 10^{-5}, v = √ε ≈ 0.003.
m: 5 4
Re_h: 3..56.79.394.58
R^{-1}: - -.849.56.59
Δt_opt: .6..7.3.3
# iterations: 84 95 95 9

In Table 5.3, numerical results for different values of ε are presented. Here, v = √ε, m = 5, and m is increased when ε is decreased, such that Re_h = .5 for all experiments. Hence, R^{-1} .66.

Table 5.3: A study for different ε; v = √ε.
ε: ??3?4?5?6
Δt_opt: .6.5.3.4.3
# iterations: 45 8 3 34

From the results in Table 5.3, it is clear that Δt could be chosen independently of ε. The slow growth in the number of iterations required as ε is decreased indicates that the non-normality of W M^{-1}B may not be completely neglected.

Finally, we briefly study the case v = O(1), i.e., we let = in Lemma 5.6. In Table 5.4, numerical results for different values of ε are presented. Here, v =, and m = m = 4. The boundary layer thickness is now O(ε), and the discretization does not resolve the boundary layer for ε < 10^{-4}.

Table 5.4: A study for different ε; v =.
ε: ??3?4?5?6
Δt_opt: .65.7.84.3.
# iterations: 6 7 8 5 43

For this parameter setting, Corollary 5.5 predicts that, for small values of ε, it would be possible to employ values of Δt close to.. The results presented in Table 5.4 show that, if the boundary layer is resolved, this asymptotic result describes the convergence properties also for finite size problems. If the discretization is too coarse, the optimal choice of Δt is much smaller than ..

6 The linearized Navier–Stokes equations

In this section, we study the linearized Navier–Stokes equations (3.8). The PDE is solved on the domain [:; :] [; ], which is discretized utilizing a grid with m m cells, identified by the integers (j, k), j = , ..., m, k = , ..., m, see Fig. 6.. Cell (j, k) has length h ,j,k and height h ,j,k. The locations of the vertical gridlines are given by

x ( ) = (exp( k/m ) − 1) / (exp( ) − 1), (6.)

where is constant. This yields a constant stretching factor in the x -direction, i.e., h ,j,k+1 / h ,j,k = 1 + , where = exp( /m ) − 1. Correspondingly, the stretching in the x -direction is determined by the parameter. Note that, in the limit d →, d =, and the grid is uniform in the x d -direction.

The approximate solution and residual at gridpoint (j, k), located at the center of cell (j, k), are denoted by u j,k = (u ,j,k , u ,j,k , j,k )^T and R j,k, respectively. The spatial derivatives in (3.8) are approximated by centered differences, which are second order accurate on a grid where the stretching parameters and are constant. For application problems, employing a centered difference approximation normally implies that artificial dissipation is required to damp out high frequency oscillations and to avoid nonlinear instabilities. For the experiments presented here, the artificial dissipation operators are given by

D art,d = ε^{(4)} d,j,k h^3 d,j,k (D +,d D −,d ) : (6.)

Here, ε^{(4)} is constant, D +, u j,k = (u j+1,k − u j,k ) / h ,j,k, D −, u j,k = D +, u j−1,k, and d,j,k is computed according to an anisotropic dissipation model,

d,j,k = |u d,j,k | + c: (6.3)

Near the boundaries, (6.) must be modified, see [3]. The boundary conditions are introduced by using one layer of ghost cells around the computational domain, see Fig. 6.. At the solid wall, the no-slip condition is imposed by reflecting the velocity components, and the density is computed by zeroth order extrapolation from the interior, while at the open boundaries, characteristic variables for the subsonic case are used [8].
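The constant stretching factor can be checked with a short sketch. Since several symbols are lost in the text above, the names here are assumptions: `kappa` stands for the constant in the gridline formula, `alpha` for the resulting stretching parameter, and the gridlines are placed on the unit interval rather than on the domain used in the experiments. The values of `m` and `kappa` are hypothetical.

```python
import math


def stretched_gridlines(m, kappa):
    """Gridlines x_k = (exp(kappa*k/m) - 1) / (exp(kappa) - 1), k = 0..m.

    The unit interval and the name `kappa` are assumptions of this sketch.
    kappa = 0 is treated as the uniform-grid limit.
    """
    if kappa == 0.0:
        return [k / m for k in range(m + 1)]
    denom = math.expm1(kappa)
    return [math.expm1(kappa * k / m) / denom for k in range(m + 1)]


def cell_sizes(x):
    """Widths of the cells between consecutive gridlines."""
    return [right - left for left, right in zip(x, x[1:])]


m, kappa = 16, 1.5  # hypothetical values
x = stretched_gridlines(m, kappa)
h = cell_sizes(x)

# Ratio of consecutive cell widths; it should equal 1 + alpha for every cell.
ratios = [h[k + 1] / h[k] for k in range(m - 1)]
alpha = math.expm1(kappa / m)

print("1 + alpha      =", 1.0 + alpha)
print("min/max ratio  =", min(ratios), max(ratios))
```

Because consecutive gridlines differ by a factor exp(kappa/m) in the exponential term, the ratio of neighboring cell widths is the same everywhere, which is the property that keeps the centered differences second order accurate on this grid.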

[Figure 6.: Computational grid where m = 6; m = 8; = .3]

Utilizing the discretization of the PDE, the artificial dissipation, and the boundary conditions described above, we arrive at a linear system of equations of the form (.). This system is integrated in time by employing an explicit Runge–Kutta iteration of the form (.4), as described in §. In the sequel, we use the term SC method for the forward Euler method combined with the semicirculant preconditioning technique presented in §. Here, the parameter Δt is chosen such that the convergence rate is optimal.

We compare the convergence properties to those of an efficient standard method, where multigrid is employed for convergence acceleration. We will call this method the MG method. Here, a linear multigrid scheme given by Eq. (4..) in [6] is employed. We set = yielding a W-cycle. The prolongation operator and the restriction operator are based on bilinear interpolation, and the smoother is a three-stage explicit Runge–Kutta scheme with weights = = .6. We use one pre- and one post-smoothing iteration, and on the coarsest grid level we do not employ an exact solver, but instead a number of smoothing iterations. In the MG method, local time-stepping is also employed. The local time steps are given by

Δt j,k = min( Δt^hyperbolic j,k , Δt^parabolic j,k ), (6.4)

where

Δt^hyperbolic j,k = CFL @ ju ;j;kj h ;j;k + ju ;j;kj h ;j;k + c h ;j;k + h ;j;k! = A? ;
Δt^parabolic j,k = RK? 3 4 4 h ;j;k! + 4? : h ;j;k

Here, CFL = .5 and RK = .8 are used.

Finally, for comparison, we also present results for the three-stage Runge–Kutta scheme described above employed on a single grid. Also here, local time-stepping is utilized, corresponding to a diagonal preconditioner. This method will be denoted the ERK method.

The iterative methods are terminated when the norm

||r^n|| = Σ_{j,k} h ,j,k h ,j,k R^T j,k R j,k (6.5)

has been decreased by a factor of 10^6. As initial values we use u = U = :; v = and = =. In Experiments 1–4, the artificial dissipation parameter is given by ε^{(4)} = :, which yields good convergence properties for the MG method. Note that, as shown in Experiment 5, the SC method converges faster for smaller values of ε^{(4)}, for which the MG method does not converge at all. In Fig. 6., the solution components u and u are shown for a problem where Re =, solved on a grid of the type employed for Experiments 3–4.

[Figure 6.: The velocities u (x , x ) and u (x , x ).]

For a single residual evaluation, 87 a.o. per gridpoint are required. Hence, one iteration for the ERK method requires 79 a.o. per gridpoint. The arithmetic complexity for the MG method depends on the number of grid levels, and is

computed by summing up contributions from the smoother, the restriction operator, the prolongation operator, and the residual evaluations. For example, for three grid levels, 46 a.o. per gridpoint in the fine grid are required to complete one iteration. The computational work for the SC method is discussed in §. Since we aim at solving nonlinear problems, we compute the parameters describing the preconditioner and factor the block-banded systems in every iteration. For these computations, 885 a.o. are required. If m is a power of two, the preconditioner solve requires 5 log m + 3 a.o.; otherwise the complexity is slightly larger, see §. Hence, one iteration for the SC method requires 5 log m + 9 a.o. However, note that, since the problem is in fact linear, 5 log m + 45 a.o. would be sufficient.

Experiment 1. We first perform a grid refinement study for the case Re =, utilizing a grid where m = m. The grid stretching is given by = and = .563, i.e., the grid is stretched only in the x -direction. The results presented in Table 6. show that, for the ERK method, the number of iterations first increases linearly and then quadratically as the grid is refined. Studying the time step constraint (6.4), this can be explained by the hyperbolic and parabolic stability limits, respectively.

For the MG method, the number of grid levels is increased as the grid is refined, i.e., for m ; = 16, two levels are employed, while for m ; = 512, seven levels are utilized. Employing grids where m ; are powers of two implies that it is easy to introduce new grid levels as the problem size increases. The stretching factor decreases as m increases, and the coarsest grid is identical for all problem sizes. The coarse grids contribute to the convergence, so that the convergence rate is independent of the number of gridpoints. This is an example of what is normally referred to as optimal or grid-independent convergence for a multigrid scheme. In this sense, this type of experiment is "ideally suited" for the MG method. Note that, at least for m ≤ 256, the same type of favorable convergence properties is observed also for the SC method.

Table 6.: Grid refinement study, stretched grid in x -direction only.
m = m: 16 32 64 128 256 512
# ERK iterations: 469 889 33 765 838 3884
# MG iterations: 7 9 83 6 66 7
# SC iterations: 8 84 86 88 3 47

In Fig. 6.3, we show a log-log plot of the number of a.o. per gridpoint required for convergence as a function of the number of gridpoints. The results clearly show that some form of acceleration technique is necessary in order to speed up the convergence rate of a basic iterative method. Also, the results show that the MG method is indeed optimal in the sense that a fixed number of a.o. per gridpoint is sufficient for computing the solution of the discretized problem. For the SC method, the arithmetic complexity increases slightly for very large problems.

However, note that, for m ; ≤ 256, the SC method is faster than the MG method.

[Figure 6.3: Number of a.o. per gridpoint for the ERK (*), the MG (o) and the SC method (x). Axes: log (a.o. per gridpoint) versus log (m ).]

Experiment 2. For application problems, grid refinement studies are often impossible to perform because of the extreme demand on computer resources. A moderately accurate solution is sufficient, and the important issue is to reduce the arithmetic and memory requirements for computing such solutions. As mentioned in §5, a standard accuracy criterion is given by requiring that n is constant. In the experiment below, the Reynolds number, Re, is varied and m is increased with Re, so that n 5 at x = .6. The other parameters m = 48; = and = .5 are fixed. The grid is refined to give sufficient accuracy and not to enable the introduction of more grid levels in the MG method. From Table 6., we see that the maximum number of grid levels that may be employed is four. However, we employ only three levels, since this yields faster convergence. The reason that the coarse grids do not contribute to the convergence is probably that they are extremely stretched.

In Table 6., we see that, for both the ERK and the MG method, the number of iterations grows approximately as √Re, since the time step constraint (6.4) becomes more restrictive as the grid points cluster near the wall. This result is in agreement with the analysis in [8]. Hence, the grid-independent convergence for the MG method is lost. However, for the SC method, the results in Table 6. show that the convergence rate is independent of Re, and again Δt .5 is used in all experiments. For large Reynolds numbers, a significantly smaller number of iterations is required for the SC method than for the MG method.

In Fig. 6.4, the arithmetic complexity per gridpoint is shown. By employing multigrid convergence acceleration, we speed up the ERK method by a factor of

Table 6.: Convergence study for different Re, stretched grid in x -direction only.
Re: 10^3 10^4 10^5 10^6
# ERK iterations: 343 44 353 9746
# MG iterations: 4 448 669 56
# SC iterations 7 6 5, whereas the number of a.o. is decreased by more than two orders of magnitude by the SC method.

[Figure 6.4: Number of a.o. per gridpoint for the ERK (*), the MG (o) and the SC method (x). Axes: log (a.o. per gridpoint) versus log (Re).]

Experiment 3. We now repeat Experiment 1, but let = = .563. Hence, we stretch the grid also in the x -direction, yielding a concentration of grid points not only near the plate, but also near the leading edge, where the gradients in the solution are larger than downstream, cf. Fig. 6.. In the SC method, averages of the coefficients in the difference approximation are computed over the x -direction, and one might suspect that introducing a stretched grid, i.e., introducing more variability in the coefficients, would affect the convergence properties. In Table 6.3, we see that the results for the ERK and the MG method are very similar to those in Experiment 1. For the SC method, the grid-independent convergence rate is indeed lost. However, for moderate values of m ;, corresponding to a moderate accuracy requirement, the convergence rate is still reasonable.

Experiment 4. We now repeat Experiment 2, but as in Experiment 3, we employ a grid which is stretched also in the x -direction. In Table 6.4, we see that the