Preconditioning for modal discontinuous Galerkin methods for unsteady 3D Navier-Stokes equations


Philipp Birken (a), Gregor Gassner (b), Mark Haas (b), Claus-Dieter Munz (b)

(a) University of Kassel, Department of Mathematics, Heinrich-Plett-Str. 40, Kassel, Germany
(b) University of Stuttgart, Institute of Aerodynamics and Gas Dynamics, Pfaffenwaldring 21, Stuttgart, Germany

Abstract

We compare different block preconditioners in the context of parallel, time adaptive, higher order implicit time integration using Jacobian-free Newton-Krylov (JFNK) solvers for discontinuous Galerkin (DG) discretizations of the three dimensional time dependent Navier-Stokes equations. A special emphasis of this work is the performance for a relatively high number of processors, i.e. with a low number of elements per processor. For high order DG discretizations, a particular problem that needs to be addressed is the size of the blocks in the Jacobian. Thus, we propose a new class of preconditioners that exploits the hierarchy of modal basis functions and introduces a flexible order of the off-diagonal Jacobian blocks. While the standard preconditioners block Jacobi (no off-blocks) and full symmetric Gauss-Seidel (full off-blocks) are included as special cases, the reduction of the off-block order results in the new scheme ROBO-SGS. This allows us to investigate the impact of the preconditioner's sparsity pattern on the computational performance. Since the number of iterations is not well suited to judge the efficiency of a preconditioner, we additionally consider CPU time for the comparisons. We found that both block Jacobi and ROBO-SGS have good overall performance and good strong parallel scaling behavior.

Key words: Discontinuous Galerkin, Unsteady flows, Navier-Stokes, Implicit methods, Preconditioning, Three dimensional problems

Email addresses: birken@mathematik.uni-kassel.de (Philipp Birken), gassner@iag.uni-stuttgart.de (Gregor Gassner), haas@iag.uni-stuttgart.de (Mark Haas), munz@iag.uni-stuttgart.de (Claus-Dieter Munz)

Preprint submitted to Elsevier, 10 December 2012

1 Introduction

The solution of unsteady compressible viscous flows may lead to stiff problems, in particular for wall bounded flows and flows at low Mach numbers. This means that the time step size is driven by stability in explicit methods, but by accuracy alone in implicit methods. Therefore, implicit methods, which can be constructed to have unbounded stability regions, are attractive for a number of problems and are a standard part of solution procedures for finite volume methods. However, finding an efficient solver for the resulting linear and nonlinear equation systems has turned out to be a difficult problem in the DG case [42], in particular for three-dimensional problems.

The nonlinear systems to be solved have the form

$$u - \psi + \alpha \Delta t\, f(u) = 0,$$

where $\psi$ is a known vector, $\alpha$ is a method dependent scalar parameter, $u$ the vector of unknowns and $f$ a function representing the overall spatial discretization. Hence, the structure does not depend on the specific time integration method. A corresponding statement holds for the linear equation systems which result from an iterative solution process of the nonlinear system. Thus, we will use in this work the 4th order accurate diagonally implicit Runge-Kutta method ESDIRK4 [22]. We choose a 4th order accurate method in time, as we are interested in the simulation of unsteady problems.

Regarding the solvers for the algebraic systems, there are a number of requirements that have to be satisfied. Firstly, three-dimensional computations have strong memory demands that will actually increase in the future, with newer computer generations having less and less memory per core available. This problem is particularly pronounced for DG methods. These use a large number of degrees of freedom per cell, leading to large Jacobian blocks with more intercell connectivity. Already on today's supercomputers one may find oneself running out of memory faster than one would expect from experience with finite volume methods. Secondly, the solver has to scale in parallel to be feasible for supercomputing. Thirdly, it has to be reasonably fast, which is still a challenge in the DG context. Finally, the implementation cost should be as low as possible, where here, we are concerned with the additional coding needed to make an explicit DG method implicit. If all of these requirements are met (low storage requirements, parallel scaling, fast convergence and ease of implementation), the use of high order methods in the industrial context would become more feasible.

At this point it has to be noted that there is not yet a standard DG method and that in our experience, the question of an efficient solver depends on the specific discretization in a nontrivial way.

However, it seems that an efficient DG method makes use of what is called a nodal basis in some way [17]. Here, we will consider the mixed modal-nodal variant suggested by Gassner et al. [15], which is based on a modal basis but uses a nodal basis for integration. For the diffusive terms, the dGRP flux is used [13]. Furthermore, we did some preliminary tests of the methodology on a DG spectral element method (DGSEM), e.g. [25], which showed that the preconditioning procedure used here indeed has to be modified in that context.

Basic candidates for solvers are FAS multigrid and preconditioned Jacobian-free Newton-Krylov (JFNK) methods, whereby multigrid can also be used as a preconditioner. FAS multigrid is the method of choice for steady Euler flows when using finite volume schemes. In the context of unsteady flows, it is often used within a dual time stepping procedure, which seems to be a slow method, as reported by several authors, e.g. [20,6]. When looking at DG methods, the design of a fast multigrid solver is an open problem both for steady and unsteady flows [31,23,3,2,34,28]. We will not attack this problem and consider JFNK schemes in this work. There, the linear systems are solved using Krylov subspace methods, which do not need the system matrix explicitly, but only matrix vector products. Since the system matrix is a Jacobian, these products can be approximated using finite differences, circumventing in theory the construction and storage of the Jacobian. In practice, a preconditioner is needed, making the schemes not completely matrix-free. Regarding the specific Krylov subspace method, it turns out that GMRES is the best choice in this context [24]. The Newton method is an inexact Newton method, where a good strategy to control the termination criteria for the linear solver is necessary; namely, the strategy by Eisenstat and Walker is used [11].

As mentioned, a preconditioner is necessary to speed up GMRES. Here, multigrid can be used as a preconditioner, which was considered by several authors for the steady Euler equations in two dimensions [31,9,28]. Generally, multigrid preconditioners would satisfy the requirements mentioned, but they face the same problem as FAS multigrid: the lack of theory for DG discretizations leads to nonoptimal methods. Therefore, we will not consider these schemes here. Regarding other preconditioners, a number of authors have considered Newton-Krylov methods for two-dimensional flows, in particular Rasetarinera and Hussaini (steady NS) [36], Dolejsi and Feistauer (steady Euler) [10], Darmofal et al. (steady NS) [12,9], Kanevsky et al. (unsteady Euler, NS) [21], as well as Persson and Peraire (unsteady NS) [33]. Whereas all the work mentioned above considers two-dimensional flow problems, we will focus on the 3D unsteady case.

As opposed to the case of finite volume methods, where going from 2D to 3D increases the number of unknowns per cell from four to five and thus by just one, we have to multiply the number of unknowns per cell by a factor dependent on the polynomial degree, resulting in hundreds of unknowns per cell with a corresponding block structure. Thus, the two-dimensional case is in our opinion not representative of the three-dimensional one, and furthermore, the discontinuous Galerkin case is very different from the finite volume case. Therefore, a successful implicit DG scheme must take these huge blocks in the Jacobian into account.

In this work, we examine the performance of different preconditioners: block Jacobi, block symmetric Gauß-Seidel (SGS), block ILU and a multilevel block ILU suggested by Persson and Peraire [33]. Furthermore, we propose a new class of SGS-type preconditioners, which we call ROBO-SGS (Reduced Offdiagonal Block Order). This new class exploits the hierarchical basis of the mixed modal-nodal DG method to reduce the order of the off-diagonal Jacobian blocks and includes the block Jacobi and the full SGS preconditioner as special cases. An approach that is similar in spirit has been suggested by Renac et al. [37]. The variable sparsity pattern of the ROBO-SGS preconditioner class gives us the possibility to investigate the impact of the preconditioner's matrix structure on the overall performance.

The point about the comparison is that, first, it is done on three-dimensional test cases; second, it is done in a realistic setting of a time adaptive scheme with a smart choice of tolerances in Newton's method and a parallel solver; and third, we compare not only iteration numbers, but also CPU time. This is important, because iteration counts show the accuracy of a preconditioner but not its efficiency, since the cost of applying the preconditioner is neglected. As we are interested in high fidelity simulations (direct numerical simulation or large eddy simulation) of compressible turbulent flow problems, we are solely interested in unsteady computations on large parallel architectures. Thus, an important aspect of the investigations is the parallel scaling of the methods and the impact of the preconditioner on the parallel performance.

The outline of the paper is as follows: First we describe the governing equations and the DG methodology used. Then we briefly discuss the ESDIRK4 method, after which we describe the JFNK method and the different preconditioners. Finally, numerical results are presented where we compare the different preconditioners.

2 Governing equations

The Navier-Stokes equations are a second order system of conservation laws (mass, momentum, energy) modeling viscous compressible flow.

Written in the conservative variables density $\rho$, momentum $\mathbf{m}$ and energy per unit volume $\rho E$, they read:

$$\partial_t \rho + \nabla \cdot \mathbf{m} = 0,$$

$$\partial_t m_i + \sum_{j=1}^{d} \partial_{x_j} \left( m_i v_j + p\, \delta_{ij} \right) = \frac{1}{Re} \sum_{j=1}^{d} \partial_{x_j} S_{ij} + q_i, \qquad i = 1, \dots, d,$$

$$\partial_t (\rho E) + \nabla \cdot (H \mathbf{m}) = \frac{1}{Re} \sum_{j=1}^{d} \partial_{x_j} \left( \sum_{i=1}^{d} S_{ij} v_i - \frac{1}{Pr} W_j \right) + q_e.$$

Here, $d$ stands for the number of dimensions, $H$ for the enthalpy per unit mass, $S$ represents the viscous shear stress tensor and $W$ the heat flux. As the equations are dimensionless, the Reynolds number $Re$ and the Prandtl number $Pr$ appear. The equations are closed by the equation of state for the pressure, $p = (\gamma - 1)\rho e$, where we assume a perfect gas. Finally, $q_e$ denotes a possible source term in the energy equation, whereas $q = (q_1, \dots, q_d)^T$ is a source term in the momentum equation, for example due to external forces.

3 Spatial Discretization

We employ the mixed modal-nodal discontinuous Galerkin scheme suggested by Gassner et al. [15]. One of the main advantages of this method is that it allows the use of elements of arbitrary shape (i.e. tetrahedra, prisms, pyramids, hexahedra, ...) with high order of accuracy. In our experience, discretizations using hexahedra very often require fewer elements and thus fewer total degrees of freedom than ones that only use tetrahedra for approximately the same discretization error. This property becomes extremely important in the context of implicit methods, since the total number of degrees of freedom has a strong influence on the performance of the solver and an even greater impact on the memory consumption. This is crucial for DG methods, especially when doing real world 3D simulations. We will demonstrate these aspects in the following sections.

3.1 The Discontinuous Galerkin Method

We write the Navier-Stokes equations in the form

$$u_t + \nabla \cdot f(u) = q(t, \vec{x}, u), \qquad (1)$$

with suitable initial and boundary conditions in a domain $\Omega \times [0, T] \subset \mathbb{R}^d \times \mathbb{R}^+_0$. Here, $u = u(\vec{x}, t) \in \mathbb{R}^{d+2}$ is the state vector.

The physical flux is given by $f(u) = f_C(u) - f_D(u, \nabla u)$, where $f_C(u)$ is the convective (i.e. hyperbolic) and $f_D(u, \nabla u)$ the diffusive (i.e. parabolic) flux component. The possibly time and space dependent source term is given by $q(t, \vec{x}, u)$.

We derive the DG method by first subdividing the domain $\Omega$ into non-overlapping grid cells $Q$. In each grid cell we approximate the state vector using a local polynomial approximation of the form

$$u(\vec{x}, t) \approx u_Q(\vec{x}, t) = \sum_{j=1}^{N_Q} \hat{u}_j(t)\, \varphi_j^Q(\vec{x}), \qquad (2)$$

where, in our case, $\{\varphi_j^Q(\vec{x})\}_{j=1,\dots,N}$ are modal hierarchical orthonormal basis functions and $\hat{u}_j$ are the corresponding coefficients in the cell $Q$. The basis functions are constructed from a monomial basis with a simple Gram-Schmidt orthogonalization algorithm for arbitrary (reference) grid cell types. The dimension of the local approximation space depends on the spatial dimension $d$ and the polynomial degree $p$:

$$N = N(p, d) = \frac{(p + d)!}{p!\, d!}. \qquad (3)$$

The next step of our approximation is to define how the unknown degrees of freedom $\hat{u}_j(t)$ are determined. The basis of the considered discontinuous Galerkin method is a weak formulation. Neglecting the source term for now, we insert the approximate solution (2) into the conservation law (1), multiply by a smooth test function $\phi = \phi(\vec{x})$ and integrate over $Q$ to obtain

$$\langle \partial_t u_Q + \nabla \cdot f(u_Q), \phi \rangle_Q = 0, \qquad (4)$$

where $\langle \cdot, \cdot \rangle_Q$ denotes the $L^2(Q)$ scalar product over $Q$. We proceed with an integration by parts to obtain

$$\langle \partial_t u_Q, \phi \rangle_Q + \left( f(u) \cdot n, \phi \right)_{\partial Q} - \langle f(u_Q), \nabla \phi \rangle_Q = 0, \qquad (5)$$

where $(\cdot, \cdot)_{\partial Q}$ denotes the surface integral over the boundary of the element $Q$. As the approximate solution is in general discontinuous across grid cell interfaces, the trace of the flux normal component $f(u) \cdot n$ is not uniquely defined. To get a stable and accurate discretization, several choices for the numerical approximation are known. Here, we use the HLLC flux [40]. For a purely convective problem, inserting the trace approximation $f(u_Q) \cdot n \approx g_C(u^-, u^+, n)$ into equation (5) yields

$$\langle \partial_t u_Q, \phi \rangle_Q + \left( g_C(u^-, u^+, n), \phi \right)_{\partial Q} - \langle f_C(u_Q), \nabla \phi \rangle_Q = 0. \qquad (6)$$

We denote by $(\cdot)^-$ values at the inner side of a cell interface, i.e. values that depend on $u_Q$, and by $(\cdot)^+$ values that depend on the neighbor cells sharing the interface with the cell $Q$.

The handling of the diffusive part of the flux is a little more delicate for DG methods, because the jump in the gradients needs special treatment. Several authors have suggested solutions for this problem [32,7,1,4,5], and all of these have been used in conjunction with implicit temporal discretizations. In this work we apply the dGRP flux of Gassner, Lörcher and Munz [13,14,27]. The dGRP flux is an extension of the symmetric interior penalty (SIP) method to the compressible Navier-Stokes equations which guarantees optimal order of convergence. We choose this variant of the diffusion flux as it has been derived in a way that optimizes stability, i.e. minimizes the eigenvalues of the DG operator [13,14]. From a technical point of view, this flux introduces an approximation of the trace of the flux normal component,

$$f_D(u, \nabla u) \cdot n \approx g_{dGRP}(u^-, \nabla u^-, u^+, \nabla u^+, \eta, n),$$

where $\eta$ is a parameter that depends on the geometry of the cell $Q$ and its neighbor and on the local order of the polynomial approximation (2). To ensure adjoint consistency, an additional surface flux term $h(u^-, u^+, n)$ is introduced via two integrations by parts [14], yielding the final DG formulation

$$\langle \partial_t u_Q, \phi \rangle_Q + \left( g_C - g_{dGRP}, \phi \right)_{\partial Q} - \left( h, \nabla \phi \right)_{\partial Q} - \langle f_C - f_D, \nabla \phi \rangle_Q = 0. \qquad (7)$$

The coupling between elements, and thus the resulting fill-in of the Jacobian matrix, is comparable to the standard SIP method and the commonly used second method of Bassi and Rebay (BR2). We thus expect that the results shown below are directly applicable to those diffusive flux variants as well, whereas flux functions with different element coupling, such as the local DG (LDG) and its modification, the compact DG (CDG), may perform differently, although we expect the impact of the choice of the diffusive flux function not to be significant.

3.2 Nodal Integration

The computation of the volume and surface quadrature operators can be a very expensive task if standard methods such as Gaussian quadrature are used, which is caused by the high number of polynomial evaluations required for computing the fluxes. Based on the nodal DG scheme developed by Hesthaven and Warburton [17], Gassner et al. developed a way of constructing efficient quadrature operators that work on arbitrarily shaped elements; see [15] for further details. This has the advantage that the number of degrees of freedom does not depend on the element shape, as it would for a purely nodal scheme when using elements other than tetrahedra. As points to define the nodal basis, Legendre-Gauss-Lobatto (LGL) points are used on the edges, and then a method called LGL-type nesting is used to determine the interior points, which leads to a small Lebesgue constant.

The coexistence of modal and nodal elements is quite natural for a DG scheme, since the transformation from modal ($\hat{u}$) to nodal ($\tilde{u}$) degrees of freedom is nothing else but a polynomial evaluation of the modal polynomials at the nodal interpolation points, which can be expressed in the form of a matrix-vector multiplication:

$$\tilde{u} = V \hat{u}. \qquad (8)$$

Here, $V$ is a Vandermonde matrix containing the evaluations of the modal polynomials at the interpolation points. The back transformation can be implemented using the inverse of the Vandermonde matrix:

$$\hat{u} = V^{-1} \tilde{u}. \qquad (9)$$

If the number of nodal interpolation points is different from the number of modal degrees of freedom, as is the case for elements other than tetrahedra, the approximate inverse $V^{-1}$ is defined using a least squares procedure based on the singular value decomposition [15].

The nodal DG method can be conveniently formulated in terms of matrices representing the discrete integrals in (7):

$$M \tilde{u}_t + \underbrace{\sum_{i=1}^{nfaces} \left( M_i^S g_i - N_i h_i \right)}_{\text{surface integral}} - \underbrace{\sum_{k=1}^{d} S^k f_k}_{\text{volume integral}} = 0. \qquad (10)$$

In the case of nonlinear equations, such as the compressible Navier-Stokes equations, the nonlinearity is present in the evaluation of the fluxes. In Eq. (10), $f_k$, $k = 1, \dots, d$, are the vectors of flux evaluations at all nodal points, while $g_i$ and $h_i$ stand for the evaluations of the surface flux approximations at the nodal points of the element face $i$. The operators in Eq. (10) are designed to act on nodal input vectors and to produce a nodal output. Using Eq. (9), all the operators in Eq. (10) can be modified in order to produce a modal output, yielding the mixed modal-nodal DG method:

$$\hat{u}_t = V^{-1} M^{-1} \left( \sum_{k=1}^{d} S^k f_k - \sum_{i=1}^{nfaces} \left( M_i^S g_i - N_i h_i \right) \right). \qquad (11)$$
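As a concrete illustration of the transformation pair (8)-(9), the following NumPy sketch builds a Vandermonde matrix for a simple one-dimensional stand-in basis and recovers the modal coefficients with an SVD-based pseudoinverse, mirroring the least squares procedure mentioned above. The basis, the node placement and all names are illustrative assumptions, not the actual HALO operators.

```python
import numpy as np

def vandermonde(nodes, n_modes):
    # V[i, j] = phi_j(node_i); Legendre polynomials serve as a stand-in
    # for the modal hierarchical orthonormal basis.
    return np.polynomial.legendre.legvander(nodes, n_modes - 1)

n_modes, n_nodes = 4, 6                         # more nodes than modes, as for
nodes = np.cos(np.linspace(0, np.pi, n_nodes))  # non-tetrahedral elements
V = vandermonde(nodes, n_modes)                 # modal -> nodal map, Eq. (8)
V_inv = np.linalg.pinv(V)                       # SVD-based least squares inverse, Eq. (9)

u_modal = np.random.rand(n_modes)
u_nodal = V @ u_modal                           # evaluate the polynomial at the nodes
assert np.allclose(V_inv @ u_nodal, u_modal)    # back transformation is exact here
```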

4 Time integration scheme

Equation (11) represents a system of ordinary differential equations (ODEs) in the cell $Q$. If we combine the modal coefficient vectors in one vector $\underline{u} \in \mathbb{R}^m$, we obtain a large system of ODEs

$$\underline{u}_t(t) = \underline{f}(t, \underline{u}(t)), \qquad (12)$$

where $\underline{f}$ is a vector valued function corresponding to the right hand side in (11) for the whole grid. From now on, a vector with an underbar will denote a vector from $\mathbb{R}^m$. We denote the time step size by $\Delta t$, and $\underline{u}^n$ is the numerical approximation to $\underline{u}(t_n)$. Note that the explicit dependence of the right hand side in (12) on $t$ is relevant only for time dependent boundary conditions or time dependent source terms.

4.1 ESDIRK4

We will restrict ourselves here to the explicit step singly diagonally implicit Runge-Kutta method of fourth order (ESDIRK4), designed in [22]. Given coefficients $a_{ij}$ and $b_i$, such a method with $s$ stages can be written as

$$U_i = \underline{u}^n + \Delta t \sum_{j=1}^{i} a_{ij}\, \underline{f}(t_n + c_j \Delta t_n, U_j), \qquad i = 1, \dots, s, \qquad (13)$$

$$\underline{u}^{n+1} = \underline{u}^n + \Delta t \sum_{j=1}^{s} b_j\, \underline{f}(t_n + c_j \Delta t_n, U_j). \qquad (14)$$

Thus, all entries of the Butcher array in the strictly upper triangular part are zero. The coefficients can be obtained from Table 1.

Table 1. Butcher diagram for ESDIRK4 with $\gamma = 1/4$ (coefficients $a_{ij}$, weights $b_i$ and embedded weights $\hat{b}_i$; the diagonal entries $a_{ii}$ are equal to $\gamma$, see [22]).

This scheme is A-stable, L-stable and stiffly accurate. The point about DIRK schemes is that the computation of the stage vectors corresponds to the sequential application of several implicit Euler steps. With the starting vectors

$$s_i = \underline{u}^n + \Delta t \sum_{j=1}^{i-1} a_{ij}\, \underline{f}(t_n + c_j \Delta t_n, U_j), \qquad (15)$$

we can solve for the stage values via

$$U_i = s_i + \Delta t\, a_{ii}\, \underline{f}(t_n + c_i \Delta t_n, U_i). \qquad (16)$$

Equation (16) corresponds to a step of the implicit Euler method with starting vector $s_i$ and time step $a_{ii} \Delta t$. Note that because the method is stiffly accurate, $\underline{u}^{n+1} = U_s$, and thus we do not need to evaluate (14).
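The stage computation (15)-(16) can be summarized in a few lines. The sketch below advances one step of a stiffly accurate ESDIRK-type method; `f` stands for the spatial DG operator and `solve_stage` for the nonlinear solver of Section 5, and both, as well as the coefficient arrays, are assumed to be supplied by the caller.

```python
def esdirk_step(f, u, t, dt, A, c, solve_stage):
    """One step of a stiffly accurate ESDIRK method, Eqs. (13)-(16).

    A, c        -- Butcher coefficients; A lower triangular, A[i][i] = gamma
    solve_stage -- solves G(U) = 0 for an implicit Euler-type stage equation
    """
    s = len(c)
    k = [None] * s
    k[0] = f(t, u)       # explicit first stage; reusable from the previous step
    U = u
    for i in range(1, s):
        s_i = u + dt * sum(A[i][j] * k[j] for j in range(i))        # Eq. (15)
        # implicit Euler-type stage solve, Eq. (16)
        U = solve_stage(lambda v: v - s_i - dt * A[i][i] * f(t + c[i] * dt, v))
        k[i] = f(t + c[i] * dt, U)       # stage derivative for later stages
    return U             # stiffly accurate: u_{n+1} = U_s, Eq. (14) not needed
```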

The explicit first stage of these Runge-Kutta schemes allows a stage order of two, but also means that the methods cannot be algebraically stable. Furthermore, the explicit stage involving $\underline{f}(t_n, \underline{u}^n)$ allows reusing the last stage derivative from the previous time step, since $\underline{f}(t_{n+1}, \underline{u}^{n+1})$ from the previous step is the same quantity, thus avoiding one evaluation of the right hand side.

4.2 Adaptive time step size selection

For unsteady flows, we need to make sure that the time integration error can be controlled. To do this, we estimate the time integration error and select the time step size accordingly. This is done using embedded schemes of a lower order $\hat{p}$. For ESDIRK4, $\hat{p} = 3$. Comparing the local truncation errors of both schemes, we obtain the following estimate for the local error of the lower order scheme:

$$\underline{l} \approx \Delta t_n \sum_{j=1}^{s} (b_j - \hat{b}_j)\, \underline{f}(t_n + c_j \Delta t_n, U_j). \qquad (17)$$

To determine the new step size, we decide beforehand on a target error tolerance and use a common fixed resolution test [39]. This means that we define the error tolerance per component via

$$d_i = RTOL\, |\underline{u}^n_i| + ATOL, \qquad (18)$$

where $RTOL$ and $ATOL$ are the relative and absolute tolerances. We always choose $ATOL = RTOL =: TOL$. Then we compare this to the local error estimate by requiring $\|\underline{l}./\underline{d}\| \le 1$, where $./$ denotes pointwise division and we use the 2-norm throughout the text. The next question is how the time step has to be chosen such that the error can be controlled. The classical method is the following, also called EPS (error per step) control [16]:

$$\Delta t_{new} = \Delta t_n\, \|\underline{l}./\underline{d}\|^{-1/(\hat{p}+1)}. \qquad (19)$$

This is combined with two safety factors to avoid volatile increases or decreases of the time step size:

if $\|\underline{l}./\underline{d}\| > 1$: $\quad \Delta t_{n+1} = \Delta t_n \max(f_{min},\, f_{safety} \|\underline{l}./\underline{d}\|^{-1/(\hat{p}+1)})$,
else: $\quad \Delta t_{n+1} = \Delta t_n \min(f_{max},\, f_{safety} \|\underline{l}./\underline{d}\|^{-1/(\hat{p}+1)})$.

Here, we chose $f_{min} = 0.3$ and $f_{max} = 2.0$, together with a safety factor $f_{safety}$ slightly below one.
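A minimal sketch of this controller follows; the value $f_{safety} = 0.9$ used here is an assumed, typical choice, since the exact value is not preserved in the text above.

```python
import numpy as np

def new_time_step(l, u_n, dt, TOL, p_hat=3,
                  f_min=0.3, f_max=2.0, f_safety=0.9):
    # Weighted error test and EPS control, Eqs. (17)-(19); f_safety assumed.
    d = TOL * np.abs(u_n) + TOL              # Eq. (18) with ATOL = RTOL = TOL
    err = max(np.linalg.norm(l / d), 1e-14)  # ||l./d||, guarded against zero
    fac = f_safety * err ** (-1.0 / (p_hat + 1))
    if err > 1.0:                            # error too large: shrink the step
        return dt * max(f_min, fac)
    return dt * min(f_max, fac)              # otherwise allow bounded growth
```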

5 Solver for the nonlinear equation systems

The application of the implicit time discretizations used herein leads to a globally coupled nonlinear system of equations of the form

$$\underline{u}^{\nu+1} = a_{\nu+1} \Delta t\, \underline{f}(\underline{u}^{\nu+1}) + R_{fix}(\underline{u}^1, \dots, \underline{u}^\nu), \qquad (20)$$

where we assume that $\underline{u}^1, \dots, \underline{u}^\nu$ are given, $R_{fix}$ depends on the space and time discretization, and $a_{\nu+1}$ is a constant depending on the time discretization method.

5.1 Inexact Newton method

To solve the appearing nonlinear systems, we use an inexact Newton method, which is locally convergent and can exhibit quadratic convergence. This method solves the root equation

$$F(\underline{u}) := \underline{u} - a_{\nu+1} \Delta t\, \underline{f}(\underline{u}) - R_{fix}(\underline{u}^1, \dots, \underline{u}^\nu) = 0.$$

As termination criterion we use a residual based one, similar to (18), with a relative tolerance, resulting in

$$\|F(\underline{u}^{(k)})\| \le \epsilon := \epsilon_r\, \|F(\underline{u}^{(0)})\|.$$

If the iteration does not converge after a maximal number of iterations, the time step is repeated with half the time step size. In particular, we will use the inexact Newton method from [8], where the linear system in the $k$-th Newton step is solved only up to a relative tolerance given by a forcing term $\eta_k$. This can be written as

$$\left\| \frac{\partial F(\underline{u}^{(k)})}{\partial \underline{u}}\, \Delta \underline{u} + F(\underline{u}^{(k)}) \right\| \le \eta_k\, \|F(\underline{u}^{(k)})\|, \qquad \underline{u}^{(k+1)} = \underline{u}^{(k)} + \Delta \underline{u}, \quad k = 0, 1, \dots \qquad (21)$$

In [11], the choice of the sequence of forcing terms is discussed and it is proved that the inexact Newton iteration (21) converges linearly. Moreover, if $\eta_k \to 0$, the convergence is superlinear, and if $\eta_k \le K_\eta \|F(\underline{u}^{(k)})\|^p$ for some $K_\eta > 0$ and $p \in [0, 1]$, the convergence is superlinear with order $1 + p$.

In particular, this means that for a properly chosen sequence of forcing terms, the convergence can be quadratic. A way of achieving this (as proved in [11]) is the following:

$$\eta_k^A = \gamma\, \frac{\|F(\underline{u}^{(k)})\|^2}{\|F(\underline{u}^{(k-1)})\|^2}$$

with a parameter $\gamma \in (0, 1]$. The theorem says that convergence is quadratic if this sequence is bounded away from one uniformly. Therefore, we set $\eta_0 = \eta_{max}$ for some $\eta_{max} < 1$ and for $k > 0$:

$$\eta_k = \min(\eta_{max}, \eta_k^A).$$

Eisenstat and Walker furthermore suggest safeguards to avoid volatile decreases in $\eta_k$. To this end, $\gamma \eta_{k-1}^2 > 0.1$ is used as a condition to determine if $\eta_{k-1}$ is rather large, and the definition of $\eta_k$ is refined to

$$\eta_k^B = \min(\eta_{max}, \max(\eta_k^A, \gamma \eta_{k-1}^2)).$$

Finally, to avoid oversolving in the final stages, they use

$$\eta_k = \min(\eta_{max}, \max(\eta_k^B, 0.5\, \epsilon / \|F(\underline{u}^{(k)})\|)),$$

where $\epsilon$ is the tolerance at which the nonlinear iteration would terminate.
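Collecting these formulas, a sketch of the forcing term update looks as follows; the concrete values of $\gamma$ and $\eta_{max}$ are assumptions in the spirit of [11], not the exact settings used in this work.

```python
def forcing_term(res_k, res_km1, eta_prev, eps_newton,
                 gamma=0.9, eta_max=0.9):
    """Eisenstat-Walker forcing term with both safeguards (Section 5.1).

    res_k, res_km1 -- ||F(u_k)|| and ||F(u_{k-1})||
    eps_newton     -- termination tolerance of the nonlinear iteration
    gamma, eta_max -- assumed parameter values
    """
    eta_A = gamma * res_k**2 / res_km1**2
    if gamma * eta_prev**2 > 0.1:        # previous eta was large: keep memory
        eta_B = min(eta_max, max(eta_A, gamma * eta_prev**2))
    else:
        eta_B = min(eta_max, eta_A)
    # avoid oversolving once the nonlinear tolerance is nearly reached
    return min(eta_max, max(eta_B, 0.5 * eps_newton / res_k))
```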

5.2 Linear Solver

In each iteration of the inexact Newton method, a linear equation system of the form $Ax = b$, $A \in \mathbb{R}^{m \times m}$, has to be solved with

$$A = I - a_{\nu+1} \Delta t \left. \frac{\partial \underline{f}}{\partial \underline{u}} \right|_{\bar{u}}.$$

In general, $A$ consists of $n_E$ block rows, where $n_E$ is the number of elements in our computational domain. Each row $i$ consists of a dense main diagonal block of size $(N_i \cdot n_{Var}) \times (N_i \cdot n_{Var})$, where $N_i$ is the number of degrees of freedom of cell $i$ given by Eq. (3) and $n_{Var}$ is the number of unknowns in the equations that are being solved. For the compressible Navier-Stokes equations, $n_{Var} = 4$ in the two-dimensional case and $n_{Var} = 5$ in the three-dimensional case. In addition to the main block, each Neumann neighbor $j$ of $i$, i.e. a neighbor $j$ sharing a common face with $i$, contributes a block of size $(N_i \cdot n_{Var}) \times (N_j \cdot n_{Var})$, which leads to a quickly rising number of entries when using higher order DG methods.

Fig. 1. Memory considerations for the Jacobian matrix: (a) memory required per element [MB] over the order of approximation, for triangular, quadrilateral, tetrahedral and hexahedral elements; (b) number of degrees of freedom that can be stored in the Jacobian per GB of memory.

Figure 1a shows the memory requirements per block row for different element types and orders of approximation in 2D and 3D, where we assumed a uniform order in the computational domain, i.e. $N_j = N_i$. It becomes clear that while memory usually is not an issue for finite volume methods (equivalent to the first order case), it is a critical aspect for DG schemes, especially in the 3D case. In order to illustrate how restrictive this may become, Fig. 1b shows the maximum number of cells a computational domain may contain if only one gigabyte of memory is available for the storage of the Jacobian. It should also be clear now that while a 2D scheme is not a problem for today's computers, this is absolutely not the case for a 3D scheme, as the memory requirements are an order of magnitude more restrictive than in the 2D case.

Now, for real world problems, the number of unknowns is in the magnitude of tens of millions, so direct solvers are infeasible, which leads us to iterative methods. In particular, Krylov subspace methods such as GMRES or BiCGSTAB have been shown to perform well in this context. In their basic version, these schemes need the Jacobian and, in addition, a preconditioning matrix to improve the convergence speed, which is a huge problem storage wise. Furthermore, the use of approximate Jacobians to save storage and CPU time leads to a decrease of the convergence speed of the Newton method. Therefore, we will use here Jacobian-free Newton-Krylov methods [24], which do not need the Jacobian (but still a preconditioner). The idea is that in Krylov subspace methods, the Jacobian appears only in the form of matrix vector products $Av_i$, which can be approximated by a difference quotient

$$A v_i \approx \frac{F(\bar{u} + \epsilon v_i) - F(\bar{u})}{\epsilon} = v_i - a_{\nu+1} \Delta t\, \frac{\underline{f}(\bar{u} + \epsilon v_i) - \underline{f}(\bar{u})}{\epsilon}. \qquad (22)$$

The parameter $\epsilon$ is a scalar, where smaller values lead to a better approximation but may lead to cancellation errors. A simple choice for the parameter that avoids cancellation but still is moderately small is given by Quin, Ludlow and Shaw [35] as

$$\epsilon = \frac{\sqrt{eps}}{\|u\|_2},$$

where $eps$ is the machine accuracy.
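A matrix-free operator built on Eq. (22) and the above choice of $\epsilon$ can be passed directly to a Krylov solver. The sketch below uses SciPy's GMRES for illustration; the toy right hand side and the fixed tolerance are placeholders, the latter of which would in practice come from the Eisenstat-Walker strategy of Section 5.1.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def jfnk_operator(f, u_bar, a_dt):
    """Matrix-free A = I - a*dt*(df/du)|_{u_bar} via the quotient (22)."""
    f_bar = f(u_bar)
    eps = np.sqrt(np.finfo(float).eps) / np.linalg.norm(u_bar)  # Quin/Ludlow/Shaw

    def matvec(v):
        return v - a_dt * (f(u_bar + eps * v) - f_bar) / eps

    m = u_bar.size
    return LinearOperator((m, m), matvec=matvec)

f = lambda u: -u**3                          # toy right hand side
u_bar = np.ones(100)
A = jfnk_operator(f, u_bar, a_dt=0.1)
x, info = gmres(A, np.ones(100), rtol=1e-4)  # rtol is named tol in older SciPy
```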

Second order convergence is obtained up to $\epsilon$-accuracy if proper forcing terms are employed, since it is possible to view the errors coming from the finite difference approximation as arising from inexact solves. Of the Krylov subspace methods suitable for the solution of unsymmetric linear systems, the GMRES method of Saad and Schultz [38] was shown by McHugh and Knoll [29] to perform better than others in the matrix-free context. The reason for this is that the vectors in the matrix vector multiplications in GMRES are normalized, as opposed to those in other methods.

6 Preconditioning

It is well known that the speed of convergence of Krylov subspace methods depends strongly on the matrix. Therefore, right preconditioning is used to transform the linear equation system appropriately:

$$A P^{-1} x_P = b, \qquad x = P^{-1} x_P.$$

Here, $P$ is an invertible matrix, called a right preconditioner, that approximates the system matrix in a cheap way. Every time a matrix vector product $Av_j$ appears in a Krylov subspace method, the right preconditioned method is obtained by applying the preconditioner to the vector in advance and then computing the matrix vector product with $A$. Right preconditioning does not change the initial residual, because

$$r_0 = b - A x_0 = b - A P^{-1} x_0^P.$$

This also means that, in contrast to left preconditioning, right preconditioning does not interfere with the Eisenstat-Walker strategy, which is the main reason we use right preconditioning only. Once the termination criterion is fulfilled, the right preconditioner has to be applied one last time to map the preconditioned approximation back to the unpreconditioned one. Often, the preconditioner is not given directly, but implicitly via its inverse; its application then corresponds to the solution of a linear equation system. If chosen well, the speedup of the Krylov subspace method is significant, and therefore the choice of the preconditioner is more important than the specific Krylov subspace method used.
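In operator form, the whole procedure is a simple composition, as the following sketch indicates; `gmres_solve`, `matvec` and `psolve` are generic placeholders for the Krylov solver, the (matrix-free) product with A, and the application of P⁻¹.

```python
def right_preconditioned_solve(gmres_solve, matvec, psolve, b):
    """Right preconditioning, Section 6: solve A P^{-1} x_P = b, x = P^{-1} x_P."""
    # GMRES only ever sees the composed operator A P^{-1}; the residual
    # b - A x is unchanged, so the Eisenstat-Walker tolerances stay valid.
    x_P = gmres_solve(lambda v: matvec(psolve(v)), b)
    return psolve(x_P)  # one final application maps back to the unpreconditioned x
```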

For non-normal matrices as we have here, the existing theory is not sufficient to determine optimal preconditioners in any sense. Therefore, we have to resort to numerical experiments and heuristics. An overview of preconditioners with special emphasis on applications in flow problems can be found in [30]. For the DG case, Persson and Peraire have conducted a survey of several preconditioning methods in [33]. Several methods are interesting in this context.

6.1 Jacobi, SGS and ROBO-SGS

An important class of preconditioners are splitting methods, which are based on decomposing $A$ into a (block) diagonal part $D$, an upper diagonal part $U$ and a lower diagonal part $L$ such that $A = L + D + U$. These parts are then used to obtain simple approximations to $A^{-1}$. The simplest method here is block Jacobi, where the off diagonal blocks are neglected, which leads to

$$P = D. \qquad (23)$$

A much more sophisticated method, which is a very good preconditioner for compressible flow problems, is the symmetric block Gauss-Seidel method (SGS), whose application corresponds to solving the equation system

$$(D + L)\, D^{-1}\, (D + U)\, x = x_P. \qquad (24)$$

As mentioned before, the major issue here is that the Jacobian consists of blocks with hundreds of unknowns for DG methods in 3D. In FV schemes, where the blocks have size 5 in 3D, the off diagonal blocks are sometimes computed on the fly, whereas only the diagonal is stored. In the DG case, the high construction cost of the element Jacobians makes this infeasible. Therefore, using the full Jacobian $A$ leads to significantly higher memory requirements compared to storing the diagonal $D$ only, e.g. a factor of five for tetrahedra and a factor of seven for hexahedra. Thus, SGS needs a huge amount of storage and leads to rather high application costs. On the other hand, both the storage requirements and the application cost of Jacobi are reasonable, but the resulting decrease in iteration numbers is much smaller than for SGS.
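For a block-structured A, both preconditioners can be applied with dense solves on the element blocks. The following sketch assumes precomputed LU factorizations of the diagonal blocks and placeholder callables `L[e]`/`U[e]` that return the coupling of element e to its already computed neighbor values; it is an illustration of Eqs. (23)-(24), not the HALO implementation.

```python
import numpy as np
from scipy.linalg import lu_solve  # D_lu[e] = scipy.linalg.lu_factor(D_e)

def jacobi_apply(D_lu, x, blocks):
    """Block Jacobi, Eq. (23): y = D^{-1} x, one dense solve per element."""
    y = np.empty_like(x)
    for e, idx in enumerate(blocks):     # blocks[e]: index slice of element e
        y[idx] = lu_solve(D_lu[e], x[idx])
    return y

def sgs_apply(D_mat, D_lu, L, U, x, blocks):
    """Block SGS, Eq. (24): solve (D+L) D^{-1} (D+U) y = x by two sweeps."""
    w = np.zeros_like(x)
    for e, idx in enumerate(blocks):                  # forward: (D + L) w = x
        w[idx] = lu_solve(D_lu[e], x[idx] - L[e](w))
    y = np.zeros_like(x)
    for e, idx in reversed(list(enumerate(blocks))):  # backward: (D + U) y = D w
        y[idx] = lu_solve(D_lu[e], D_mat[e] @ w[idx] - U[e](y))
    return y
```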

Based on this observation, we propose a new class of SGS-like preconditioners between SGS and Jacobi, where a varying amount of the entries of the off diagonal blocks $L$ and $U$ is neglected. In this way, we get a trade-off between memory and application cost on the one hand and efficiency of the preconditioner on the other hand. In particular, we make use of the fact that the modal DG scheme we employ has a hierarchical basis, meaning that the basis functions can be grouped by their degree:

$$u_Q = \sum_{j=0}^{p} \sum_{|\alpha| = j} \hat{u}_\alpha\, \varphi_\alpha^Q.$$

Here, $\alpha$ is a multi-index, $\varphi_\alpha^Q$ is the unique hierarchical modal basis function corresponding to that multi-index and $\hat{u}_\alpha$ the vector of coefficients of the solution in this decomposition. A block of the Jacobian consists of subblocks, where each subblock contains the derivatives of the values corresponding to one multi-index with respect to the coefficient vector $\hat{u}_\alpha$ corresponding to one basis function, and thus to a possibly different multi-index. The idea of ROBO-SGS is now to reduce the interelement coupling in the Jacobian by neglecting all derivatives with respect to higher order degrees of freedom of the neighboring cells. Thus, we neglect in the off diagonal Jacobian blocks all derivatives with respect to degrees $p > k$, with $k$ user defined. We call this preconditioner ROBO-SGS-k, for Reduced Offdiagonal Block Order, where $k$ is the degree of the polynomial basis functions taken into account. Note that this idea requires the hierarchical (modal) basis and does not work with a purely nodal implementation. A similar idea is often used for finite volume discretizations, where an approximate Jacobian is computed based on the first order discretization, neglecting the impact of the reconstruction [41]. However, this does not change the amount of storage needed, but only the computational complexity of the Jacobian construction.

For example, in the case $k = 0$ we take into account only those DOFs of the neighbors which correspond to the integral mean values of the conserved quantities. However, we keep not only the derivatives of the remaining degrees of freedom with respect to themselves, but with respect to all degrees of freedom, resulting in a rectangular structure in the off diagonal blocks of the preconditioner. This is illustrated in Fig. 2. For $k = p$, we keep everything, thus recovering the original block SGS preconditioner. If we formally set $k = -1$, we neglect all off diagonal block entries, thus recovering block Jacobi. The number of entries of an off diagonal block in the preconditioner is then $(N_i \cdot n_{Var}) \times (\hat{N}_j \cdot n_{Var})$, where $\hat{N}_j$ depends on the user-defined parameter $k$. While this results in a decreased accuracy of the preconditioner, the memory requirements and the computational cost of the application become smaller, the fewer degrees of freedom of the neighboring cells we take into account.

Fig. 2. Reduced versions of the off-diagonal blocks of the Jacobian: (a) k = 0 variant, (b) k = 1 variant.

Note that even in the case $k = 0$, the mean values of the neighbors are taken into account, thus leading to reduced off diagonal blocks that still have a physical meaning. Furthermore, the effect of this strategy becomes more pronounced the larger the order is, since the number of basis functions corresponding to a certain degree increases with $k$.

To estimate the memory savings, we neglect boundary elements. For the Jacobi preconditioner, we get the overall amount of memory

$$Memory(Jacobi) = nElems \cdot Memory(FullBlock), \qquad (25)$$

where the memory of the full block is given by

$$Memory(FullBlock) = (N_i \cdot n_{Var}) \cdot (N_i \cdot n_{Var}), \qquad (26)$$

which scales like $p^6/36$ with respect to the polynomial degree $p$. This means that even the simple block Jacobi preconditioner can become prohibitive with respect to memory requirements for a three-dimensional computation with large polynomial degrees $p$. The full SGS preconditioner drastically amplifies this even further, as we need to store the blocks for each neighbor. Thus, depending on the considered element type, we get the overall memory storage for the SGS preconditioner (assuming a large total number of elements compared to the number of boundary elements) as

$$Memory(SGS) = nElems \cdot Memory(FullBlock) \cdot (1 + nSides), \qquad (27)$$

where $nSides$ is the number of sides of the element type (e.g. $nSides = 6$ for hexahedra). The memory reduction of ROBO-SGS can be expressed when we introduce the amount of memory needed by the reduced blocks,

$$Memory(ReducedBlock) = (N_i \cdot n_{Var}) \cdot (\hat{N}_j \cdot n_{Var}), \qquad (28)$$

which scales like $p^3 k^3 / 36$ for a given off block order $k$.

The total amount of memory needed by the ROBO-SGS preconditioner is given by

$$Memory(ROBO\text{-}SGS) = nElems \cdot Memory(FullBlock) + nElems \cdot Memory(ReducedBlock) \cdot nSides. \qquad (29)$$

Thus, the ratio of the ROBO-SGS memory consumption to the memory needed by the Jacobi preconditioner is given by

$$\frac{Memory(ROBO\text{-}SGS)}{Memory(Jacobi)} = 1 + \frac{nElems \cdot Memory(ReducedBlock) \cdot nSides}{nElems \cdot Memory(FullBlock)} = 1 + nSides\, \frac{\hat{N}_j}{N_i}. \qquad (30)$$

Consider the example of hexahedral elements ($nSides = 6$) and the polynomial degree $p = 5$, as used in the results section: we obtain for the memory ratio the values 1, 1.1, 1.4, 2.1, 3.1, 4.8 and 7 for the off block orders $k = \{-1, 0, 1, 2, 3, 4, 5\}$, respectively, where $k = -1$ yields the Jacobi preconditioner and $k = 5$ the full SGS preconditioner.
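The ratio (30) is easy to verify numerically. A short sketch, using Eq. (3) for the dimension of the polynomial space, reproduces the values quoted above for p = 5 hexahedra:

```python
from math import comb

def N(p, d=3):
    # Eq. (3): dimension of the polynomial approximation space
    return comb(p + d, d)

p, nsides = 5, 6                    # degree 5 on hexahedra, as in Section 7
for k in range(-1, p + 1):          # k = -1: block Jacobi, k = p: full SGS
    N_hat = N(k) if k >= 0 else 0   # reduced off-block dimension
    ratio = 1 + nsides * N_hat / N(p)        # Eq. (30)
    print(f"k = {k:2d}: memory ratio = {ratio:.1f}")
# -> 1.0, 1.1, 1.4, 2.1, 3.1, 4.8, 7.0
```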

6.2 ILU preconditioning

Another important class of preconditioners are block incomplete LU (ILU) decompositions, where the blocks correspond to the units the Jacobian consists of. The computation of a complete LU decomposition is quite expensive and in general requires full storage even for sparse matrices. By prescribing a sparsity pattern, incomplete LU decompositions can be defined. The application of such a decomposition as a preconditioner then corresponds to solving the appropriate linear equation system by forward-backward substitution. The sparsity pattern can, for example, be controlled by the level of fill, which is, in short, a measure of how much fill-in beyond the original sparsity pattern is allowed. Decompositions with higher levels of fill are very good black box preconditioners for flow problems [30]; however, they are not in line with the philosophy of matrix-free methods. This leaves ILU(0), which has no additional fill-in beyond the sparsity pattern of the original matrix A. We use the ILU(0) preconditioner in the form proposed by Persson and Peraire [33], with the in-place factorization suggested by Diosady and Darmofal [9]. While this preconditioner usually performs better than the ones based on splitting, it has the drawback that it has to act on the full Jacobian matrix, which makes it less attractive in computational environments with limited memory.

6.3 Multilevel preconditioners

Another possibility is to use multilevel schemes as preconditioners. A number of approaches have been tried in the context of DG methods, e.g. multigrid methods and multi-p methods with different smoothers, as well as a variant by Persson and Peraire [33], where a two-level multi-p method is used with ILU(0) as a presmoother and Jacobi as a postsmoother. We employ a variant without postsmoothing, as we have not experienced any convergence acceleration from the postsmoother, and name it ILU-CSC, for ILU with coarse scale correction. Since the computation of the residual requires a matrix-vector multiplication, the cost of applying a multilevel variant of one of the above preconditioners is approximately double that of a standard single level variant.

6.4 Parallelization

Regarding parallelization, we use the MPI paradigm. The physical domain is decomposed into several domains, each of which is assigned to a processor. The matrix-free approach allows us to use the parallelization scheme described in the PhD thesis of Lörcher [26] for the function evaluation in the matrix-vector multiplication, which is shown to scale very well. The use of GMRES, however, is a drawback here, since it requires k scalar products in the k-th iteration, which do not scale perfectly in parallel. However, the alternatives have other drawbacks, as discussed earlier.

As for the preconditioner, with the exception of Jacobi, most schemes would actually require excessive communication at domain boundaries, due to the fact that the off-block entries of the system matrix belonging to Neumann neighbors may be located on different CPUs. In order to avoid adding this overhead to our scheme, we neglect these parts of the system matrix. This way, as the number of cells per CPU decreases, all preconditioners used herein ultimately converge to the Jacobi scheme. This reduces the overall parallel efficiency, not through communication overhead, as is the case for the scalar products, but through a degradation of the numerical scheme itself, resulting in a higher number of iterations for the solution process. However, as we will demonstrate in the results section, the schemes scale very well.

7 Numerical results

In this section, we examine the performance of the different preconditioners. To focus on the effect of the preconditioner, we freeze the numerical discretization in space as well as in time for all tests: we choose the ESDIRK4 time integrator with adaptive time stepping, controlled by the tolerance TOL. We then compare the total number of GMRES iterations needed for a complete run of the solver, as a measure of preconditioner accuracy, and the total CPU time needed. While the latter is implementation dependent, it is a necessary additional piece of information, since a more powerful preconditioner can be much more costly and less efficient overall than a less powerful one.

All computations were carried out using the Fortran code HALO, developed at the Institute of Aerodynamics and Gas Dynamics, and all test runs were carried out in parallel using MPI with double precision arithmetic. The partitioning of the grid is achieved with a space-filling curve approach, which guarantees that the number of grid cells per processor core is balanced and thus the memory requirement for each processor core is roughly the same.

As we are interested in three-dimensional unsteady simulations, the focus of the numerical model is on the high performance computing aspect. This means that we are especially interested in the efficiency of the preconditioner for a large number of cores, i.e. for parallel simulations with low computational load on each processor core. Explicit in time DG discretizations are known for their excellent strong parallel scaling, e.g. [18], and it is a priori not clear whether implicit time integrators can sustain this property.

As a first test case, we choose the flow past a circular cylinder with free stream Mach number M = 0.3 and Reynolds number Re = 1,000. The computational grid consists of 10,400 hexahedral cells, as illustrated in figure 3, with curved grid cells at the cylinder boundary. For the discontinuous Galerkin discretization we choose polynomials of degree five, resulting in 582,400 degrees of freedom per conservative variable or 2,912,000 unknowns in total. To obtain an initial condition, we use an explicit time integration scheme, where we suppose that the time integration errors are negligible due to the small stability driven time step. The test interval is 1 s, and the initial time step size was chosen such that the resulting error estimate is between 0.9 and 1, leading to an accepted time step. The distribution of the velocity magnitude at the initial time and at the end time of the test run is shown in figure 4.

The following computations are all performed on the CRAY XE6 cluster (Hermit) of the computing center HLRS. All preconditioners are tested for computations with 64, 128, 256 and 512 processor cores (threads), resulting in an average of 9100, 4550, 2275 and 1137 DOF per conservative variable on a processor core, which would be low even for an explicit in time discontinuous Galerkin discretization.

Fig. 3. Grid for the cylinder test case. The grid is extended in three dimensions with 8 regular grid cells.

Fig. 4. Initial (top) and final (bottom) solution of the cylinder problem. Distribution of the velocity magnitude.

Table 2 shows the results for the simulations with 64 cores. We list the number of GMRES iterations, the overall wallclock time and a comparison of the CPU times with respect to the standard block Jacobi preconditioner. We clearly see that the number of iterations decreases the better the preconditioner, with ROBO-SGS-5 and ILU(0)-CSC being the most powerful. However, the wallclock time gives a very different picture. Here, the most powerful preconditioners are among the slowest in the end, because the preconditioner matrix has a larger sparsity pattern and is thus more expensive to apply. The most efficient preconditioner in this case is ROBO-SGS-1, with a CPU time about 11% lower than that of the Jacobi preconditioner.

Preconditioner      Iter.   CPU [s]   Comparison to Jacobi [%]
No preconditioner   8,797   2,…       …
Jacobi              3,712   1,…       …
ROBO-SGS-0          3,338   1,…       …
ROBO-SGS-1          2,824   1,…       …
ROBO-SGS-2          2,656   1,…       …
ROBO-SGS-3          2,641   1,…       …
ROBO-SGS-4          2,645   1,…       …
ROBO-SGS-5          2,640   2,…       …
ILU(0)              2,641   2,…       …
ILU(0)-CSC          2,640   2,…       …

Table 2. Number of iterations and wallclock time of the test computations on 64 cores of the CRAY XE6 cluster Hermit for all preconditioners, with comparison of the CPU time to the Jacobi preconditioner computation.

The following tables 3-5 show the results for 128, 256 and 512 processor cores (threads). The third column of each table shows the parallel scaling of the method with respect to the computation on 64 processor cores. It is important to note that the case without preconditioner basically gives us a scaling similar to an explicit time discretization, as we only use the spatial DG operator to approximate the matrix-vector product. The only difference to an explicit method is that the GMRES algorithm needs an all-to-all communication for the vector norm in each iteration. The strong scaling of over 85% in this case demonstrates again how well suited discontinuous Galerkin discretizations are for parallel computations.

The results show that the scaling of the Jacobian-free implicit method is as good as that of the explicit method for this example. We even get better scaling results with preconditioner than without. The reason for this is that the computational load on a processor increases due to the additional work of applying the preconditioner. Thus, the ratio of computation to communication increases, yielding a better parallel scaling for the preconditioned schemes. An additional sign of this is that the most expensive preconditioners (those with the largest sparsity pattern) scale best. The parallel implementation is such that we only consider the preconditioner for the local MPI domain, without communication of the preconditioner.

Preconditioner      Iter.   CPU [s]   Scaling [%]   Comparison to Jacobi [%]
No preconditioner   8,797   1,…       …             …
Jacobi              3,…     …         …             …
ROBO-SGS-0          3,…     …         …             …
ROBO-SGS-1          2,…     …         …             …
ROBO-SGS-2          2,…     …         …             …
ROBO-SGS-3          2,…     …         …             …
ROBO-SGS-4          2,…     …         …             …
ROBO-SGS-5          2,641   1,…       …             …
ILU(0)              2,641   1,…       …             …
ILU(0)-CSC          2,640   1,…       …             …

Table 3. Number of iterations and wallclock time of the test computations on 128 cores of the CRAY XE6 cluster Hermit for all preconditioners. Strong scaling results compared to the 64 core computations and comparison of the CPU time to the Jacobi preconditioner computation.

Preconditioner      Iter.   CPU [s]   Scaling [%]   Comparison to Jacobi [%]
No preconditioner   8,…     …         …             …
Jacobi              3,…     …         …             …
ROBO-SGS-0          3,…     …         …             …
ROBO-SGS-1          3,…     …         …             …
ROBO-SGS-2          2,…     …         …             …
ROBO-SGS-3          2,…     …         …             …
ROBO-SGS-4          2,…     …         …             …
ROBO-SGS-5          2,…     …         …             …
ILU(0)              2,…     …         …             …
ILU(0)-CSC          …       …         …             …

Table 4. Number of iterations and wallclock time of the test computations on 256 cores of the CRAY XE6 cluster Hermit for all preconditioners. Strong scaling results compared to the 64 core computations and comparison of the CPU time to the Jacobi preconditioner computation.

By increasing the number of processors we decrease the load (number of grid cells) per processor, and thus all preconditioners become more similar to block Jacobi. In the extreme case of only one element per processor core, all preconditioners would reduce to block Jacobi, due to the parallelization we use.

Preconditioner      Iter.   CPU [s]   Scaling [%]   Comparison to Jacobi [%]
No preconditioner   8,…     …         …             …
Jacobi              3,…     …         …             …
ROBO-SGS-0          3,…     …         …             …
ROBO-SGS-1          3,…     …         …             …
ROBO-SGS-2          3,…     …         …             …
ROBO-SGS-3          2,…     …         …             …
ROBO-SGS-4          2,…     …         …             …
ROBO-SGS-5          2,…     …         …             …
ILU(0)              2,…     …         …             …
ILU(0)-CSC          2,…     …         …             …

Table 5. Number of iterations and wallclock time of the test computations on 512 cores of the CRAY XE6 cluster Hermit for all preconditioners. Strong scaling results compared to the 64 core computations and comparison of the CPU time to the Jacobi preconditioner computation.

Thus, we can observe that the number of iterations slightly increases with the number of processors for the more powerful preconditioning techniques. This furthermore has the effect that the difference to block Jacobi with respect to CPU time decreases, since Jacobi is not affected by the parallelization due to its element local nature. We see that the most efficient preconditioners are the ROBO-SGS variants with low off block order. But using 512 processor cores (threads), the difference is only about 7% in favor of the best preconditioner, ROBO-SGS-2. It is clear that the more processors we use for the computation, the more efficient block Jacobi becomes in comparison to the more sophisticated preconditioners. This suggests that the more powerful preconditioners are more effective compared to Jacobi when we have a large number of elements per process, as their effect of decreasing the iteration numbers is then more pronounced.

To demonstrate this, we consider a second test case, which we compute with only eight cores on a machine with four quad-core AMD Opteron processors. We consider for this the flow past a sphere at a Mach number of M = 0.3 and a Reynolds number of Re = 1,000. The unstructured grid is larger than the cylinder grid and consists of 21,128 hexahedral grid cells, and the polynomial degree is chosen equal to four, resulting in 739,480 DOF per conservative variable and a total of 3,697,400 unknowns. As before, we use an explicit time integrator to generate a time-error-free initial flow field for our tests. We perform the computations with all preconditioner variants for a time interval of 30 seconds.

The initial solution and the result at the end of the run, using ESDIRK4 time integration with a tolerance of TOL = 10⁻³, are shown in figure 5, where we show isosurfaces of $\lambda_2 = -10^{-4}$, a common vortex identifier [19]. Note that there is no visual difference between the results for ESDIRK4 and those obtained using an explicit Runge-Kutta scheme.

Fig. 5. Isosurfaces of $\lambda_2 = -10^{-4}$ for the initial (left) and final (right) solution of the sphere problem.

Table 6 shows the results of this simulation. Since only 8 processes are used to compute this test case, we have an average of 2,641 grid cells per core. The results show that the most efficient preconditioner, ROBO-SGS-1, is about 30% faster than the Jacobi preconditioner. Based on this result alone, obtained with a low number of processors, one could argue that the new intermediate class of preconditioners, here ROBO-SGS-1, is the most effective among all preconditioning techniques. Again, the more powerful preconditioners like ILU(0) and ILU(0)-CSC are not more computationally efficient. If we compare this to the results with high processor numbers, where the maximum gain of the fastest preconditioner was only about 7%, we arrive at the conclusion that the standard Jacobi preconditioner is a viable candidate with good efficiency and scaling. This shows that for a meaningful comparison of preconditioning techniques for the simulation of three-dimensional unsteady compressible flows, test runs with a high number of processors, i.e. a low number of grid cells per processor core (thread), are necessary to get the right picture for practical applications.


More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

A Robust Preconditioned Iterative Method for the Navier-Stokes Equations with High Reynolds Numbers

A Robust Preconditioned Iterative Method for the Navier-Stokes Equations with High Reynolds Numbers Applied and Computational Mathematics 2017; 6(4): 202-207 http://www.sciencepublishinggroup.com/j/acm doi: 10.11648/j.acm.20170604.18 ISSN: 2328-5605 (Print); ISSN: 2328-5613 (Online) A Robust Preconditioned

More information

Performance tuning of Newton-GMRES methods for discontinuous Galerkin discretizations of the Navier-Stokes equations

Performance tuning of Newton-GMRES methods for discontinuous Galerkin discretizations of the Navier-Stokes equations Fluid Dynamics and Co-located Conferences June 24-27, 2013, San Diego, CA 21st AIAA Computational Fluid Dynamics Conference AIAA 2013-2685 Performance tuning of Newton-GMRES methods for discontinuous Galerkin

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

High Performance Nonlinear Solvers

High Performance Nonlinear Solvers What is a nonlinear system? High Performance Nonlinear Solvers Michael McCourt Division Argonne National Laboratory IIT Meshfree Seminar September 19, 2011 Every nonlinear system of equations can be described

More information

Solving Large Nonlinear Sparse Systems

Solving Large Nonlinear Sparse Systems Solving Large Nonlinear Sparse Systems Fred W. Wubs and Jonas Thies Computational Mechanics & Numerical Mathematics University of Groningen, the Netherlands f.w.wubs@rug.nl Centre for Interdisciplinary

More information

An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems

An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems Per-Olof Persson and Jaime Peraire Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A. We present an efficient implicit

More information

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization

More information

Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White

Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White Introduction to Simulation - Lecture 2 Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy Outline Reminder about

More information

Preface to the Second Edition. Preface to the First Edition

Preface to the Second Edition. Preface to the First Edition n page v Preface to the Second Edition Preface to the First Edition xiii xvii 1 Background in Linear Algebra 1 1.1 Matrices................................. 1 1.2 Square Matrices and Eigenvalues....................

More information

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication.

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication. CME342 Parallel Methods in Numerical Analysis Matrix Computation: Iterative Methods II Outline: CG & its parallelization. Sparse Matrix-vector Multiplication. 1 Basic iterative methods: Ax = b r = b Ax

More information

Approximate tensor-product preconditioners for very high order discontinuous Galerkin methods

Approximate tensor-product preconditioners for very high order discontinuous Galerkin methods Approximate tensor-product preconditioners for very high order discontinuous Galerkin methods Will Pazner 1 and Per-Olof Persson 2 1 Division of Applied Mathematics, Brown University, Providence, RI, 02912

More information

1. Fast Iterative Solvers of SLE

1. Fast Iterative Solvers of SLE 1. Fast Iterative Solvers of crucial drawback of solvers discussed so far: they become slower if we discretize more accurate! now: look for possible remedies relaxation: explicit application of the multigrid

More information

Solving PDEs with Multigrid Methods p.1

Solving PDEs with Multigrid Methods p.1 Solving PDEs with Multigrid Methods Scott MacLachlan maclachl@colorado.edu Department of Applied Mathematics, University of Colorado at Boulder Solving PDEs with Multigrid Methods p.1 Support and Collaboration

More information

Numerical Mathematics

Numerical Mathematics Alfio Quarteroni Riccardo Sacco Fausto Saleri Numerical Mathematics Second Edition With 135 Figures and 45 Tables 421 Springer Contents Part I Getting Started 1 Foundations of Matrix Analysis 3 1.1 Vector

More information

A high-order discontinuous Galerkin solver for 3D aerodynamic turbulent flows

A high-order discontinuous Galerkin solver for 3D aerodynamic turbulent flows A high-order discontinuous Galerkin solver for 3D aerodynamic turbulent flows F. Bassi, A. Crivellini, D. A. Di Pietro, S. Rebay Dipartimento di Ingegneria Industriale, Università di Bergamo CERMICS-ENPC

More information

9.1 Preconditioned Krylov Subspace Methods

9.1 Preconditioned Krylov Subspace Methods Chapter 9 PRECONDITIONING 9.1 Preconditioned Krylov Subspace Methods 9.2 Preconditioned Conjugate Gradient 9.3 Preconditioned Generalized Minimal Residual 9.4 Relaxation Method Preconditioners 9.5 Incomplete

More information

Solving Ax = b, an overview. Program

Solving Ax = b, an overview. Program Numerical Linear Algebra Improving iterative solvers: preconditioning, deflation, numerical software and parallelisation Gerard Sleijpen and Martin van Gijzen November 29, 27 Solving Ax = b, an overview

More information

Discontinuous Galerkin methods for nonlinear elasticity

Discontinuous Galerkin methods for nonlinear elasticity Discontinuous Galerkin methods for nonlinear elasticity Preprint submitted to lsevier Science 8 January 2008 The goal of this paper is to introduce Discontinuous Galerkin (DG) methods for nonlinear elasticity

More information

High Order Discontinuous Galerkin Methods for Aerodynamics

High Order Discontinuous Galerkin Methods for Aerodynamics High Order Discontinuous Galerkin Methods for Aerodynamics Per-Olof Persson Massachusetts Institute of Technology Collaborators: J. Peraire, J. Bonet, N. C. Nguyen, A. Mohnot Symposium on Recent Developments

More information

Rosenbrock time integration combined with Krylov subspace enrichment for unsteady flow simulations

Rosenbrock time integration combined with Krylov subspace enrichment for unsteady flow simulations Master of Science Thesis Rosenbrock time integration combined with Krylov subspace enrichment for unsteady flow simulations Unsteady aerodynamics David Blom January 11, 2013 Ad Rosenbrock time integration

More information

Semi-implicit Krylov Deferred Correction Methods for Ordinary Differential Equations

Semi-implicit Krylov Deferred Correction Methods for Ordinary Differential Equations Semi-implicit Krylov Deferred Correction Methods for Ordinary Differential Equations Sunyoung Bu University of North Carolina Department of Mathematics CB # 325, Chapel Hill USA agatha@email.unc.edu Jingfang

More information

Application of Dual Time Stepping to Fully Implicit Runge Kutta Schemes for Unsteady Flow Calculations

Application of Dual Time Stepping to Fully Implicit Runge Kutta Schemes for Unsteady Flow Calculations Application of Dual Time Stepping to Fully Implicit Runge Kutta Schemes for Unsteady Flow Calculations Antony Jameson Department of Aeronautics and Astronautics, Stanford University, Stanford, CA, 94305

More information

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations Sparse Linear Systems Iterative Methods for Sparse Linear Systems Matrix Computations and Applications, Lecture C11 Fredrik Bengzon, Robert Söderlund We consider the problem of solving the linear system

More information

Domain decomposition on different levels of the Jacobi-Davidson method

Domain decomposition on different levels of the Jacobi-Davidson method hapter 5 Domain decomposition on different levels of the Jacobi-Davidson method Abstract Most computational work of Jacobi-Davidson [46], an iterative method suitable for computing solutions of large dimensional

More information

Iterative Methods for Sparse Linear Systems

Iterative Methods for Sparse Linear Systems Iterative Methods for Sparse Linear Systems Luca Bergamaschi e-mail: berga@dmsa.unipd.it - http://www.dmsa.unipd.it/ berga Department of Mathematical Methods and Models for Scientific Applications University

More information

The Discontinuous Galerkin Finite Element Method

The Discontinuous Galerkin Finite Element Method The Discontinuous Galerkin Finite Element Method Michael A. Saum msaum@math.utk.edu Department of Mathematics University of Tennessee, Knoxville The Discontinuous Galerkin Finite Element Method p.1/41

More information

Numerical Methods for Large-Scale Nonlinear Equations

Numerical Methods for Large-Scale Nonlinear Equations Slide 1 Numerical Methods for Large-Scale Nonlinear Equations Homer Walker MA 512 April 28, 2005 Inexact Newton and Newton Krylov Methods a. Newton-iterative and inexact Newton methods. Slide 2 i. Formulation

More information

A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation

A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation Tao Zhao 1, Feng-Nan Hwang 2 and Xiao-Chuan Cai 3 Abstract In this paper, we develop an overlapping domain decomposition

More information

Lab 1: Iterative Methods for Solving Linear Systems

Lab 1: Iterative Methods for Solving Linear Systems Lab 1: Iterative Methods for Solving Linear Systems January 22, 2017 Introduction Many real world applications require the solution to very large and sparse linear systems where direct methods such as

More information

Discontinuous Galerkin Methods

Discontinuous Galerkin Methods Discontinuous Galerkin Methods Joachim Schöberl May 20, 206 Discontinuous Galerkin (DG) methods approximate the solution with piecewise functions (polynomials), which are discontinuous across element interfaces.

More information

Parallel Discontinuous Galerkin Method

Parallel Discontinuous Galerkin Method Parallel Discontinuous Galerkin Method Yin Ki, NG The Chinese University of Hong Kong Aug 5, 2015 Mentors: Dr. Ohannes Karakashian, Dr. Kwai Wong Overview Project Goal Implement parallelization on Discontinuous

More information

OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative methods ffl Krylov subspace methods ffl Preconditioning techniques: Iterative methods ILU

OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative methods ffl Krylov subspace methods ffl Preconditioning techniques: Iterative methods ILU Preconditioning Techniques for Solving Large Sparse Linear Systems Arnold Reusken Institut für Geometrie und Praktische Mathematik RWTH-Aachen OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative

More information

Lecture 9 Approximations of Laplace s Equation, Finite Element Method. Mathématiques appliquées (MATH0504-1) B. Dewals, C.

Lecture 9 Approximations of Laplace s Equation, Finite Element Method. Mathématiques appliquées (MATH0504-1) B. Dewals, C. Lecture 9 Approximations of Laplace s Equation, Finite Element Method Mathématiques appliquées (MATH54-1) B. Dewals, C. Geuzaine V1.2 23/11/218 1 Learning objectives of this lecture Apply the finite difference

More information

6. Iterative Methods for Linear Systems. The stepwise approach to the solution...

6. Iterative Methods for Linear Systems. The stepwise approach to the solution... 6 Iterative Methods for Linear Systems The stepwise approach to the solution Miriam Mehl: 6 Iterative Methods for Linear Systems The stepwise approach to the solution, January 18, 2013 1 61 Large Sparse

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Jacobian-Free Newton Krylov Discontinuous Galerkin Method and Physics-Based Preconditioning for Nuclear Reactor Simulations

Jacobian-Free Newton Krylov Discontinuous Galerkin Method and Physics-Based Preconditioning for Nuclear Reactor Simulations INL/CON-08-14243 PREPRINT Jacobian-Free Newton Krylov Discontinuous Galerkin Method and Physics-Based Preconditioning for Nuclear Reactor Simulations International Conference on Reactor Physics, Nuclear

More information

Algebraic Multigrid as Solvers and as Preconditioner

Algebraic Multigrid as Solvers and as Preconditioner Ò Algebraic Multigrid as Solvers and as Preconditioner Domenico Lahaye domenico.lahaye@cs.kuleuven.ac.be http://www.cs.kuleuven.ac.be/ domenico/ Department of Computer Science Katholieke Universiteit Leuven

More information

An Introduction to the Discontinuous Galerkin Method

An Introduction to the Discontinuous Galerkin Method An Introduction to the Discontinuous Galerkin Method Krzysztof J. Fidkowski Aerospace Computational Design Lab Massachusetts Institute of Technology March 16, 2005 Computational Prototyping Group Seminar

More information

Parallel Methods for ODEs

Parallel Methods for ODEs Parallel Methods for ODEs Levels of parallelism There are a number of levels of parallelism that are possible within a program to numerically solve ODEs. An obvious place to start is with manual code restructuring

More information

ON THE BENEFIT OF THE SUMMATION-BY-PARTS PROPERTY ON INTERIOR NODAL SETS

ON THE BENEFIT OF THE SUMMATION-BY-PARTS PROPERTY ON INTERIOR NODAL SETS 6th European Conference on Computational Mechanics (ECCM 6 7th European Conference on Computational Fluid Dynamics (ECFD 7 11 15 June 018, Glasgow, UK ON THE BENEFIT OF THE SUMMATION-BY-PARTS PROPERTY

More information

TAU Solver Improvement [Implicit methods]

TAU Solver Improvement [Implicit methods] TAU Solver Improvement [Implicit methods] Richard Dwight Megadesign 23-24 May 2007 Folie 1 > Vortrag > Autor Outline Motivation (convergence acceleration to steady state, fast unsteady) Implicit methods

More information

Numerical methods for the Navier- Stokes equations

Numerical methods for the Navier- Stokes equations Numerical methods for the Navier- Stokes equations Hans Petter Langtangen 1,2 1 Center for Biomedical Computing, Simula Research Laboratory 2 Department of Informatics, University of Oslo Dec 6, 2012 Note:

More information

The solution of the discretized incompressible Navier-Stokes equations with iterative methods

The solution of the discretized incompressible Navier-Stokes equations with iterative methods The solution of the discretized incompressible Navier-Stokes equations with iterative methods Report 93-54 C. Vuik Technische Universiteit Delft Delft University of Technology Faculteit der Technische

More information

Chapter 9 Implicit integration, incompressible flows

Chapter 9 Implicit integration, incompressible flows Chapter 9 Implicit integration, incompressible flows The methods we discussed so far work well for problems of hydrodynamics in which the flow speeds of interest are not orders of magnitude smaller than

More information

An advanced ILU preconditioner for the incompressible Navier-Stokes equations

An advanced ILU preconditioner for the incompressible Navier-Stokes equations An advanced ILU preconditioner for the incompressible Navier-Stokes equations M. ur Rehman C. Vuik A. Segal Delft Institute of Applied Mathematics, TU delft The Netherlands Computational Methods with Applications,

More information

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A.

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A. AMSC/CMSC 661 Scientific Computing II Spring 2005 Solution of Sparse Linear Systems Part 2: Iterative methods Dianne P. O Leary c 2005 Solving Sparse Linear Systems: Iterative methods The plan: Iterative

More information

Newton-Krylov-Schwarz Method for a Spherical Shallow Water Model

Newton-Krylov-Schwarz Method for a Spherical Shallow Water Model Newton-Krylov-Schwarz Method for a Spherical Shallow Water Model Chao Yang 1 and Xiao-Chuan Cai 2 1 Institute of Software, Chinese Academy of Sciences, Beijing 100190, P. R. China, yang@mail.rdcps.ac.cn

More information

Multigrid Methods and their application in CFD

Multigrid Methods and their application in CFD Multigrid Methods and their application in CFD Michael Wurst TU München 16.06.2009 1 Multigrid Methods Definition Multigrid (MG) methods in numerical analysis are a group of algorithms for solving differential

More information

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros

More information

Robust Preconditioned Conjugate Gradient for the GPU and Parallel Implementations

Robust Preconditioned Conjugate Gradient for the GPU and Parallel Implementations Robust Preconditioned Conjugate Gradient for the GPU and Parallel Implementations Rohit Gupta, Martin van Gijzen, Kees Vuik GPU Technology Conference 2012, San Jose CA. GPU Technology Conference 2012,

More information

A P-ADAPTIVE IMPLICIT DISCONTINUOUS GALERKIN METHOD FOR THE UNDER-RESOLVED SIMULATION OF COMPRESSIBLE TURBULENT FLOWS

A P-ADAPTIVE IMPLICIT DISCONTINUOUS GALERKIN METHOD FOR THE UNDER-RESOLVED SIMULATION OF COMPRESSIBLE TURBULENT FLOWS 6th European Conference on Computational Mechanics (ECCM 6) 7th European Conference on Computational Fluid Dynamics (ECFD 7) 11-15 June 2018, Glasgow, UK A P-ADAPTIVE IMPLICIT DISCONTINUOUS GALERKIN METHOD

More information

2.29 Numerical Fluid Mechanics Spring 2015 Lecture 9

2.29 Numerical Fluid Mechanics Spring 2015 Lecture 9 Spring 2015 Lecture 9 REVIEW Lecture 8: Direct Methods for solving (linear) algebraic equations Gauss Elimination LU decomposition/factorization Error Analysis for Linear Systems and Condition Numbers

More information

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros

More information

M.A. Botchev. September 5, 2014

M.A. Botchev. September 5, 2014 Rome-Moscow school of Matrix Methods and Applied Linear Algebra 2014 A short introduction to Krylov subspaces for linear systems, matrix functions and inexact Newton methods. Plan and exercises. M.A. Botchev

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 16-02 The Induced Dimension Reduction method applied to convection-diffusion-reaction problems R. Astudillo and M. B. van Gijzen ISSN 1389-6520 Reports of the Delft

More information

High Order Accurate Runge Kutta Nodal Discontinuous Galerkin Method for Numerical Solution of Linear Convection Equation

High Order Accurate Runge Kutta Nodal Discontinuous Galerkin Method for Numerical Solution of Linear Convection Equation High Order Accurate Runge Kutta Nodal Discontinuous Galerkin Method for Numerical Solution of Linear Convection Equation Faheem Ahmed, Fareed Ahmed, Yongheng Guo, Yong Yang Abstract This paper deals with

More information

Constrained Minimization and Multigrid

Constrained Minimization and Multigrid Constrained Minimization and Multigrid C. Gräser (FU Berlin), R. Kornhuber (FU Berlin), and O. Sander (FU Berlin) Workshop on PDE Constrained Optimization Hamburg, March 27-29, 2008 Matheon Outline Successive

More information

6.4 Krylov Subspaces and Conjugate Gradients

6.4 Krylov Subspaces and Conjugate Gradients 6.4 Krylov Subspaces and Conjugate Gradients Our original equation is Ax = b. The preconditioned equation is P Ax = P b. When we write P, we never intend that an inverse will be explicitly computed. P

More information

RESIDUAL BASED ERROR ESTIMATES FOR THE SPACE-TIME DISCONTINUOUS GALERKIN METHOD APPLIED TO NONLINEAR HYPERBOLIC EQUATIONS

RESIDUAL BASED ERROR ESTIMATES FOR THE SPACE-TIME DISCONTINUOUS GALERKIN METHOD APPLIED TO NONLINEAR HYPERBOLIC EQUATIONS Proceedings of ALGORITMY 2016 pp. 113 124 RESIDUAL BASED ERROR ESTIMATES FOR THE SPACE-TIME DISCONTINUOUS GALERKIN METHOD APPLIED TO NONLINEAR HYPERBOLIC EQUATIONS VÍT DOLEJŠÍ AND FILIP ROSKOVEC Abstract.

More information

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1 Parallel Numerics, WT 2016/2017 5 Iterative Methods for Sparse Linear Systems of Equations page 1 of 1 Contents 1 Introduction 1.1 Computer Science Aspects 1.2 Numerical Problems 1.3 Graphs 1.4 Loop Manipulations

More information

HYPERSONIC AERO-THERMO-DYNAMIC HEATING PREDICTION WITH HIGH-ORDER DISCONTINOUS GALERKIN SPECTRAL ELEMENT METHODS

HYPERSONIC AERO-THERMO-DYNAMIC HEATING PREDICTION WITH HIGH-ORDER DISCONTINOUS GALERKIN SPECTRAL ELEMENT METHODS 1 / 36 HYPERSONIC AERO-THERMO-DYNAMIC HEATING PREDICTION WITH HIGH-ORDER DISCONTINOUS GALERKIN SPECTRAL ELEMENT METHODS Jesús Garicano Mena, E. Valero Sánchez, G. Rubio Calzado, E. Ferrer Vaccarezza Universidad

More information

Newton-Multigrid Least-Squares FEM for S-V-P Formulation of the Navier-Stokes Equations

Newton-Multigrid Least-Squares FEM for S-V-P Formulation of the Navier-Stokes Equations Newton-Multigrid Least-Squares FEM for S-V-P Formulation of the Navier-Stokes Equations A. Ouazzi, M. Nickaeen, S. Turek, and M. Waseem Institut für Angewandte Mathematik, LSIII, TU Dortmund, Vogelpothsweg

More information

Space-time Discontinuous Galerkin Methods for Compressible Flows

Space-time Discontinuous Galerkin Methods for Compressible Flows Space-time Discontinuous Galerkin Methods for Compressible Flows Jaap van der Vegt Numerical Analysis and Computational Mechanics Group Department of Applied Mathematics University of Twente Joint Work

More information

Concepts. 3.1 Numerical Analysis. Chapter Numerical Analysis Scheme

Concepts. 3.1 Numerical Analysis. Chapter Numerical Analysis Scheme Chapter 3 Concepts The objective of this work is to create a framework to implement multi-disciplinary finite element applications. Before starting, it is necessary to explain some basic concepts of the

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 11 Partial Differential Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002.

More information

Multigrid absolute value preconditioning

Multigrid absolute value preconditioning Multigrid absolute value preconditioning Eugene Vecharynski 1 Andrew Knyazev 2 (speaker) 1 Department of Computer Science and Engineering University of Minnesota 2 Department of Mathematical and Statistical

More information

Preconditioning Techniques Analysis for CG Method

Preconditioning Techniques Analysis for CG Method Preconditioning Techniques Analysis for CG Method Huaguang Song Department of Computer Science University of California, Davis hso@ucdavis.edu Abstract Matrix computation issue for solve linear system

More information

Next topics: Solving systems of linear equations

Next topics: Solving systems of linear equations Next topics: Solving systems of linear equations 1 Gaussian elimination (today) 2 Gaussian elimination with partial pivoting (Week 9) 3 The method of LU-decomposition (Week 10) 4 Iterative techniques:

More information

Preconditioned Smoothers for the Full Approximation Scheme for the RANS Equations

Preconditioned Smoothers for the Full Approximation Scheme for the RANS Equations https://doi.org/10.1007/s10915-018-0792-9 Preconditioned Smoothers for the Full Approximation Scheme for the RANS Equations Philipp Birken 1 Jonathan Bull 2 Antony Jameson 3 Received: 14 October 2017 /

More information

Kasetsart University Workshop. Multigrid methods: An introduction

Kasetsart University Workshop. Multigrid methods: An introduction Kasetsart University Workshop Multigrid methods: An introduction Dr. Anand Pardhanani Mathematics Department Earlham College Richmond, Indiana USA pardhan@earlham.edu A copy of these slides is available

More information

Iterative Methods for Linear Systems of Equations

Iterative Methods for Linear Systems of Equations Iterative Methods for Linear Systems of Equations Projection methods (3) ITMAN PhD-course DTU 20-10-08 till 24-10-08 Martin van Gijzen 1 Delft University of Technology Overview day 4 Bi-Lanczos method

More information

Department of Applied Mathematics and Theoretical Physics. AMA 204 Numerical analysis. Exam Winter 2004

Department of Applied Mathematics and Theoretical Physics. AMA 204 Numerical analysis. Exam Winter 2004 Department of Applied Mathematics and Theoretical Physics AMA 204 Numerical analysis Exam Winter 2004 The best six answers will be credited All questions carry equal marks Answer all parts of each question

More information

Iterative solvers within sequences of large linear systems in non-linear structural mechanics

Iterative solvers within sequences of large linear systems in non-linear structural mechanics Zeitschrift für Angewandte Mathematik und Mechanik, 9 December 2008 Iterative solvers within sequences of large linear systems in non-linear structural mechanics Stefan Hartmann 1,, Jurjen Duintjer Tebbens

More information

Preconditioning for Nonsymmetry and Time-dependence

Preconditioning for Nonsymmetry and Time-dependence Preconditioning for Nonsymmetry and Time-dependence Andy Wathen Oxford University, UK joint work with Jen Pestana and Elle McDonald Jeju, Korea, 2015 p.1/24 Iterative methods For self-adjoint problems/symmetric

More information

Review of matrices. Let m, n IN. A rectangle of numbers written like A =

Review of matrices. Let m, n IN. A rectangle of numbers written like A = Review of matrices Let m, n IN. A rectangle of numbers written like a 11 a 12... a 1n a 21 a 22... a 2n A =...... a m1 a m2... a mn where each a ij IR is called a matrix with m rows and n columns or an

More information

A Space-Time Expansion Discontinuous Galerkin Scheme with Local Time-Stepping for the Ideal and Viscous MHD Equations

A Space-Time Expansion Discontinuous Galerkin Scheme with Local Time-Stepping for the Ideal and Viscous MHD Equations A Space-Time Expansion Discontinuous Galerkin Scheme with Local Time-Stepping for the Ideal and Viscous MHD Equations Ch. Altmann, G. Gassner, F. Lörcher, C.-D. Munz Numerical Flow Models for Controlled

More information