A Block Red-Black SOR Method for a Two-Dimensional Parabolic Equation Using Hermite Collocation

Stephen H. Brill 1 and George F. Pinder 2

1 Department of Mathematics and Statistics, University of Vermont, Burlington, Vermont, U.S.A.
2 Department of Civil and Environmental Engineering, University of Vermont, Burlington, Vermont, U.S.A.

1 Introduction

In [LHH9], Lai et al. study a block Jacobi method to solve the two-dimensional Poisson equation

$$\nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = H(x, y) \tag{1}$$

defined on the interior of the unit square $S = [0,1] \times [0,1]$, discretized by the collocation method with a uniform mesh, given Dirichlet boundary conditions

$$u(x, y) = C(x, y), \qquad (x, y) \in \partial S. \tag{2}$$

They determine eigenvalues for the iteration matrix of their block Jacobi method and then use the theory in [You1] to determine a formula for $\omega_{opt}$, the optimal relaxation factor $\omega$ for the block SOR method associated with their block Jacobi scheme. In this paper, we explain how to extend their work to ensure that the optimal SOR method is parallelizable by using a red-black ordering scheme. We then use these ideas to efficiently solve the two-dimensional parabolic equation

$$\frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2} = -H(x, y, t)$$

with Dirichlet boundary conditions.

2 Hermite Cubic Polynomials

2.1 One-dimensional formulation

Let $u(x)$ be a function defined on the interval $[0,1]$. Partition the interval using $n$ equally spaced nodes $0 = x_0 < x_1 < \cdots < x_m = 1$, where $m = n - 1$ is the number of elements. Let $h = 1/m$. Consider the functions (cf. [Pic9])

$$f_j(x) = \begin{cases} \dfrac{(x - x_{j-1})^2}{h^3}\,[2(x_j - x) + h], & x_{j-1} \le x \le x_j \\[2mm] \dfrac{(x_{j+1} - x)^2}{h^3}\,[3h - 2(x_{j+1} - x)], & x_j \le x \le x_{j+1} \\[2mm] 0, & \text{otherwise} \end{cases}$$

and

$$g_j(x) = \begin{cases} \dfrac{(x - x_{j-1})^2 (x - x_j)}{h^2}, & x_{j-1} \le x \le x_j \\[2mm] \dfrac{(x_{j+1} - x)^2 (x - x_j)}{h^2}, & x_j \le x \le x_{j+1} \\[2mm] 0, & \text{otherwise} \end{cases}$$

These are the Hermite cubic polynomials that we use as basis functions in our collocation approach. Notice that

$$f_j(x_i) = \delta_{ij}, \qquad \frac{df_j}{dx}(x_i) = f_j'(x_i) = 0, \qquad g_j(x_i) = 0, \qquad \frac{dg_j}{dx}(x_i) = g_j'(x_i) = \delta_{ij} \qquad \forall\, i, j,$$

where $\delta_{ij}$ is the Kronecker symbol. Let $u_j = u(x_j)$ and let $u_j' = u'(x_j) = \frac{du}{dx}(x_j)$ for $j = 0, 1, \ldots, m$. Then the cubic polynomial interpolating the $u_j$'s and the $u_j'$'s is

$$\hat u(x) = \sum_{j=0}^{m} \left( u_j f_j(x) + u_j' g_j(x) \right) \tag{3}$$

In [Pap8], Papatheodorou uses $g_j^{\ast}(x) = g_j(x)/h$ (in place of $g_j(x)$) when forming (3). He makes this choice (also used in [LHH9]) because eigenvalue analysis is much easier using $g_j^{\ast}(x)$ instead of $g_j(x)$. It is easily seen that the iteration matrices studied herein that one obtains using $g_j(x)$ and $g_j^{\ast}(x)$ are identical. In this paper, we use $g_j(x)$ in the computer code that generates the numerical results and employ $g_j^{\ast}(x)$ for our analysis.

2.2 Two-dimensional formulation

Let $u(x, y)$ be a function defined on $S$. Partition $\partial S$ by using $n$ equally spaced nodes in both the $x$- and $y$-directions. Letting $m = n - 1$ and $h = 1/m$, we partition $S$ into $m^2$ square elements, where the dimensions of each element are $h \times h$. If we consider two-dimensional bi-cubic Hermite basis polynomials, we obtain, by analogy to (3),

$$\hat u(x, y) = \sum_{q=0}^{m} \sum_{r=0}^{m} \left[ u_{qr} f_q(x) f_r(y) + u^{x}_{qr} g_q(x) f_r(y) + u^{y}_{qr} f_q(x) g_r(y) + u^{xy}_{qr} g_q(x) g_r(y) \right] \tag{4}$$

where

$$u_{qr} = u(x_q, y_r), \qquad u^{x}_{qr} = \frac{\partial u}{\partial x}(x_q, y_r), \qquad u^{y}_{qr} = \frac{\partial u}{\partial y}(x_q, y_r), \qquad u^{xy}_{qr} = \frac{\partial^2 u}{\partial x \partial y}(x_q, y_r).$$

We see that $\hat u(x, y)$ interpolates the functions $u$, $\frac{\partial u}{\partial x}$, $\frac{\partial u}{\partial y}$, and $\frac{\partial^2 u}{\partial x \partial y}$ at the grid points $(x_q, y_r)$, for $q, r = 0, \ldots, m$.

3 Collocation Discretization of the PDE

If the interpolating polynomial (4) is introduced into the governing equation (1), we obtain

$$\frac{\partial^2 \hat u}{\partial x^2} + \frac{\partial^2 \hat u}{\partial y^2} - H(x, y) = E(x, y)$$

where $E(x, y)$ is an error function. We see that at each of the $n^2$ grid points $(x_q, y_r)$, we have four degrees of freedom, namely $u_{qr}$, $u^{x}_{qr}$, $u^{y}_{qr}$, and $u^{xy}_{qr}$. However, on the boundary $\partial S$, many of these values are known. In particular, we know (from (2))

$$u_{qr} = u(x_q, y_r) = C(x_q, y_r)$$

for all nodes (grid points) on $\partial S$. In addition, we can calculate

$$u^{x}_{qr} = \frac{\partial u}{\partial x}(x_q, y_r) = \frac{\partial C}{\partial x}(x_q, y_r)$$

on the north and south boundaries and

$$u^{y}_{qr} = \frac{\partial u}{\partial y}(x_q, y_r) = \frac{\partial C}{\partial y}(x_q, y_r)$$

on the east and west boundaries. We therefore know the values of a total of $8n - 4$ degrees of freedom and do not know the values of $4n^2 - (8n - 4) = 4m^2$ degrees of freedom. Therefore, to uniquely determine these $4m^2$ degrees of freedom we require $4m^2$ equations, or 4 equations per element. To achieve this, we choose four points $(x_k, y_\ell)$ in the interior of each element and enforce $E(x_k, y_\ell) = 0$ at each of these $4m^2$ "collocation points". It is known (from [Cel8]) that the optimal choices for the collocation points for the symmetric differential operator given in (1) are the so-called "Gauss points". On the interval $[-1, 1]$, the Gauss points are $\pm z$, where $z = 3^{-1/2}$. On the square element $[-1, 1] \times [-1, 1]$, the Gauss points are $(-z, -z)$, $(-z, z)$, $(z, -z)$, and $(z, z)$. Transforming these four Gauss points into each of the $m^2$ elements of our mesh defines the full set of $4m^2$ "collocation" equations. These can be written

$$\sum_{q=0}^{m} \sum_{r=0}^{m} \Big\{ [f_q''(x_k) f_r(y_\ell) + f_q(x_k) f_r''(y_\ell)]\, u_{qr} + [g_q''(x_k) f_r(y_\ell) + g_q(x_k) f_r''(y_\ell)]\, u^{x}_{qr} + [f_q''(x_k) g_r(y_\ell) + f_q(x_k) g_r''(y_\ell)]\, u^{y}_{qr} + [g_q''(x_k) g_r(y_\ell) + g_q(x_k) g_r''(y_\ell)]\, u^{xy}_{qr} \Big\} = H(x_k, y_\ell) \tag{5}$$

where $(x_k, y_\ell)$ varies over all $4m^2$ collocation points.
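The collocation equations above can be checked numerically on a single element. The following is a minimal sketch, not the authors' code: it writes the element-local pieces of the Hermite cubics in the local coordinate s = (x - x0)/h, builds the coefficients multiplying the 16 local degrees of freedom at each Gauss point, and confirms that the discrete Laplacian is exact for a bicubic test function. The function names (`hermite_1d`, `gauss_points`, `laplacian_row`), the element location, and the test polynomial are assumptions made for illustration.

```python
import numpy as np

Z = 1.0 / np.sqrt(3.0)  # Gauss point on [-1, 1]

def hermite_1d(s, h):
    """Values and second x-derivatives of the four local Hermite cubics at
    local coordinate s in [0, 1] on an element of width h.
    Order: f_left, g_left, f_right, g_right."""
    val = np.array([1 - 3 * s**2 + 2 * s**3,
                    h * (s - 2 * s**2 + s**3),
                    3 * s**2 - 2 * s**3,
                    h * (s**3 - s**2)])
    d2 = np.array([(-6 + 12 * s) / h**2,
                   (-4 + 6 * s) / h,
                   (6 - 12 * s) / h**2,
                   (-2 + 6 * s) / h])
    return val, d2

def gauss_points(x0, y0, h):
    """The four Gauss points mapped into the element [x0, x0+h] x [y0, y0+h]."""
    s = np.array([(1 - Z) / 2, (1 + Z) / 2])
    return [(x0 + h * sx, y0 + h * sy) for sy in s for sx in s]

def laplacian_row(x, y, x0, y0, h):
    """Coefficients of the 16 local unknowns in the collocation equation at (x, y).
    Shape (2, 2, 4): [x-node, y-node, (u, u^x, u^y, u^xy)]."""
    fx, fx2 = hermite_1d((x - x0) / h, h)
    fy, fy2 = hermite_1d((y - y0) / h, h)
    row = np.zeros((2, 2, 4))
    for q in range(2):          # left / right node in x
        for r in range(2):      # lower / upper node in y
            f_q, g_q, f2_q, g2_q = fx[2*q], fx[2*q + 1], fx2[2*q], fx2[2*q + 1]
            f_r, g_r, f2_r, g2_r = fy[2*r], fy[2*r + 1], fy2[2*r], fy2[2*r + 1]
            row[q, r] = [f2_q * f_r + f_q * f2_r,
                         g2_q * f_r + g_q * f2_r,
                         f2_q * g_r + f_q * g2_r,
                         g2_q * g_r + g_q * g2_r]
    return row

def u_dofs(x, y):
    """u, u_x, u_y, u_xy for the test function u = x^3 y^2 - 2 x y^3 + y."""
    return np.array([x**3 * y**2 - 2 * x * y**3 + y,
                     3 * x**2 * y**2 - 2 * y**3,
                     2 * x**3 * y - 6 * x * y**2 + 1,
                     6 * x**2 * y - 6 * y**2])

def lap_exact(x, y):
    return 6 * x * y**2 + 2 * x**3 - 12 * x * y

x0, y0, h = 0.25, 0.5, 0.25   # an arbitrary element of a mesh with m = 4
dofs = np.array([[u_dofs(x0 + q * h, y0 + r * h) for r in range(2)]
                 for q in range(2)])
max_err = max(abs(np.sum(laplacian_row(x, y, x0, y0, h) * dofs) - lap_exact(x, y))
              for x, y in gauss_points(x0, y0, h))
print("max collocation error on the element:", max_err)
```

Because the bi-cubic Hermite interpolant reproduces any polynomial of degree at most three in each variable, the error printed here is at rounding level; for a general right-hand side the same rows would simply be assembled into the global system.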

Figure I: numbering of equations and unknowns

There are many ways in which to number the unknowns and equations. Each numbering system will define a different structure for the matrix arising from the system of linear equations (5) that we must solve. We use a numbering proposed by [Cel8] and by [LHH9], which is depicted pictorially in Figure I for the case of $n = 5$. In the figure, $h_{ij}$ indicates the approximate location of collocation point $(x_j, y_i)$. It is seen that the matrix equation that arises from this numbering for $n = 5$ is

$$\tilde M \tilde v = \tilde k \tag{6}$$

where

$\tilde M$ = A A,A A A 1,A A 1 A A,A A A A 1,A A 1 A A,A A A A 1,A A 1 A,A A A,A

$$\tilde v = \begin{bmatrix} v_0^T & v_1^T & v_2^T & v_3^T & v_4^T & v_5^T & v_6^T & v_7^T \end{bmatrix}^T, \qquad \tilde k = \begin{bmatrix} k_0^T & k_1^T & k_2^T & k_3^T & k_4^T & k_5^T & k_6^T & k_7^T \end{bmatrix}^T.$$

The vectors $v_i$ and $k_i$ are given by

$$v_i = \begin{bmatrix} v_{i0} & v_{i1} & v_{i2} & v_{i3} & v_{i4} & v_{i5} & v_{i6} & v_{i7} \end{bmatrix}^T, \qquad k_i = \begin{bmatrix} k_{i0} & k_{i1} & k_{i2} & k_{i3} & k_{i4} & k_{i5} & k_{i6} & k_{i7} \end{bmatrix}^T,$$

where $k_{ij} = H(x_j, y_i) - (V_{ij})$. Here $V_{ij}$ indicates any known boundary value information that appears on the left side of (5) that is pertinent to the equation defined at collocation point $(x_j, y_i)$. It is clear that $V_{ij}$ may be non-zero only when $(x_j, y_i)$ is in a boundary element. The submatrices $A_i$, $i = 1, 2, 3, 4$, all have the structure

$A_i$ = a i a i,a i a i a i1,a i a i1 a i a i,a i a i a i a i1,a i a i1 a i a i,a i a i a i a i1,a i a i1 a i,a i a i a i,a i

Although the above example is for $n = 5$, it should be clear how the corresponding matrices and vectors would appear for different values of $n$. It is seen in [LHH9] and [Pap8] that a ij = a ij 9h, where

a 11 =,, 18 p a 1 =,1, 8 p a 1 = a 1 =+ p a 1 =,1, 8 p a =,, p a =, p a =0 a 1 = a =, p a =, + 18 p a =,1 + 8 p a 1 =+ p a =0 a =,1 + 8 p a =,+ p (7)

4 Block Jacobi Method for Poisson's Equation

To begin, the matrix $\tilde M$ is partitioned into blocks, which we write more concisely as

$$\tilde M = \begin{bmatrix} A_F & F & & & \\ C_F & A & B & & \\ & C & A & B & \\ & & C & A & L \\ & & & C_L & A_L \end{bmatrix} \tag{8}$$

The block Jacobi method is then defined by

$$\tilde D \tilde v^{(p+1)} = (\tilde L + \tilde U)\, \tilde v^{(p)} + \tilde k \tag{9}$$

where $\tilde v^{(p)}$ is the approximation to $\tilde v$ after $p$ iterations and where $\tilde M$ is split into $\tilde M = \tilde D - \tilde L - \tilde U$, where

$$\tilde D = \begin{bmatrix} A_F & & & & \\ & A & & & \\ & & A & & \\ & & & A & \\ & & & & A_L \end{bmatrix}, \qquad -\tilde L = \begin{bmatrix} & & & & \\ C_F & & & & \\ & C & & & \\ & & C & & \\ & & & C_L & \end{bmatrix}, \qquad -\tilde U = \begin{bmatrix} & F & & & \\ & & B & & \\ & & & B & \\ & & & & L \\ & & & & \end{bmatrix}.$$

We solve (9) for $\tilde v^{(p+1)}$ as follows. First, we note that each of the $m + 1$ block rows of $\tilde D$ in (9) defines a matrix equation, each of which is entirely decoupled from the rest. Hence, these $m + 1$ matrix equations may be solved simultaneously in parallel. Each of these equations is of the form $A v = k$, where $A$ is one of $A_F$, $A_L$, or the interior diagonal block $A$, $v$ is the corresponding vector of unknowns, and $k$ is the corresponding right-hand-side vector of known values. For the case where $A = A_F$ or $A_L$,

it is clear that $A$ is block tridiagonal, with the blocks being matrices. For example, consider

$A = A_F$ = A = a a,a a a 1,a a 1 a a,a a a a 1,a a 1 a a,a a a a 1,a a 1 a,a a a,a

We employ a direct block tridiagonal solver to obtain $v$. The case of the interior diagonal block $A$ is just slightly more complicated. Here we see that $A$ has the structure

A = A1,A A = A 1 A

Permuting the rows and columns of $A$ via a similarity transformation (see [LHH9]) gives $A'$ =

which is clearly block tridiagonal, with the blocks being matrices. Obviously, we must also permute correspondingly the entries of $v$ (giving $v'$) and those of $k$ (giving $k'$). We then employ a direct block tridiagonal solver on $A' v' = k'$ to obtain $v'$.

5 Red-Black SOR for Poisson's Equation

While the equations in (9) may be solved simultaneously in parallel, we find the rate at which the sequence $\{\tilde v^{(p)}\}$ converges to $\tilde v$ to be unacceptably slow. This motivates us to seek a method with a faster convergence rate that can still take advantage of parallelism. We recall (8)

$$\tilde M = \begin{bmatrix} A_F & F & & & \\ C_F & A & B & & \\ & C & A & B & \\ & & C & A & L \\ & & & C_L & A_L \end{bmatrix} \begin{matrix} 0 \\ 1 \\ 2 \\ 3 \\ 4 \end{matrix} \tag{10}$$

where the last column gives the block row number of $\tilde M$. Via a similarity transformation, we permute the rows and columns of $\tilde M$ (and correspondingly the entries of $\tilde v$ and $\tilde k$) in (6) and (10) to obtain

$$M v = k \tag{11}$$

where

$$M = \begin{bmatrix} A_F & & & F & \\ & A & & C & B \\ & & A_L & & C_L \\ C_F & B & & A & \\ & C & L & & A \end{bmatrix}$$

More precisely, $M$ is obtained from $\tilde M$ by writing from top to bottom all the even numbered block rows of $\tilde M$ (in ascending order), followed by all the odd numbered block rows of $\tilde M$ (in ascending order), with the block columns reordered in the same way. We abbreviate $M$ as

$$M = \begin{bmatrix} D_R & M_U \\ M_L & D_B \end{bmatrix}$$

Correspondingly, we write

$$v = \begin{bmatrix} v_R \\ v_B \end{bmatrix} \qquad \text{and} \qquad k = \begin{bmatrix} k_R \\ k_B \end{bmatrix}.$$

Analogously to (9), we split $M$ into $M = D - L - U$, where

$$D = \begin{bmatrix} D_R & \\ & D_B \end{bmatrix}, \qquad -L = \begin{bmatrix} & \\ M_L & \end{bmatrix}, \qquad \text{and} \qquad -U = \begin{bmatrix} & M_U \\ & \end{bmatrix}.$$

Then the standard block SOR formulation is

$$(D - \omega L)\, v^{(p+1)} = [(1 - \omega) D + \omega U]\, v^{(p)} + \omega k \tag{12}$$

where the relaxation factor $\omega$ is chosen such that $1 < \omega < 2$. Dividing (12) into its red (top) and black (bottom) parts, we obtain

$$D_R v_R^{(p+1)} = (1 - \omega) D_R v_R^{(p)} - \omega M_U v_B^{(p)} + \omega k_R \tag{13}$$

and

$$\omega M_L v_R^{(p+1)} + D_B v_B^{(p+1)} = (1 - \omega) D_B v_B^{(p)} + \omega k_B \tag{14}$$

We now introduce the vectors

$$z_c^{(p+1)} = v_c^{(p+1)} - v_c^{(p)}$$

where the color subscript $c = R$ or $B$. We also introduce the color dependent residual vectors $r_c^{(a,b)}$, defined as

$$r_R^{(a,b)} = k_R - \left( D_R v_R^{(a)} + M_U v_B^{(b)} \right)$$

and

$$r_B^{(a,b)} = k_B - \left( M_L v_R^{(a)} + D_B v_B^{(b)} \right)$$

where the superscripts $(a)$ and $(b)$ denote iteration level. By considering (11), it is clear that these residual vectors measure how close the approximants $v_R^{(a)}$ and $v_B^{(b)}$ are to $v_R$ and $v_B$, components of the true solution of (11). Algebraically manipulating (13) and (14) and using the notation introduced above yields

$$D_R z_R^{(p+1)} = \omega\, r_R^{(p,p)} \tag{15}$$

and

$$D_B z_B^{(p+1)} = \omega\, r_B^{(p+1,p)} \tag{16}$$

which are of a form and structure very similar to that of (9). In the SOR algorithm, we compute $v^{(p+1)}$ using (15) and (16). It is clear that we have still maintained a high degree of parallelism by using this red-black SOR scheme. Evidently, all of the red equations in (15) may be solved simultaneously in parallel. Once we have obtained $v_R^{(p+1)}$ from (15), we may solve all the black equations in (16) simultaneously in parallel, obtaining $v_B^{(p+1)}$.

Numerical results are illustrated in Figure II. We ran our version of the algorithm for both the Jacobi method and the red-black SOR method using various values of $\omega$. We chose m =10 and chose the boundary conditions and the function $H(x, y)$ such that u = x sin y. Letting $r^{(p,p)} = \left[ r_R^{(p,p)T} \;\; r_B^{(p,p)T} \right]^T$, our convergence criterion was that $\| r^{(p,p)} \|_\infty$ < 0001. The Jacobi method needed 11 iterations to converge, which is indicated by an asterisk in the middle of the graph. By comparison, for $\omega$ =1, the SOR method required only 19 iterations to converge. Indeed, according to the theory in [LHH9] and [You1], the optimal $\omega$ for this problem is $\omega_{opt} \approx$ 108, which agrees well with our numerical investigation.
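The red and black correction equations above translate almost directly into code. The sketch below is illustrative only: it does not use the Hermite collocation matrices, but substitutes a small symmetric model problem (the five-point Laplacian with one block per grid line, an assumption made here) that has the same even/odd block-row structure. Dense `numpy.linalg.solve` calls stand in for the direct block tridiagonal solver, and the relaxation factor comes from the classical formula $\omega_{opt} = 2 / (1 + \sqrt{1 - \rho^2})$, which holds when the block Jacobi eigenvalues are real with spectral radius $\rho < 1$; that is true for this model problem but is not claimed for the collocation matrices. All function names are hypothetical.

```python
import numpy as np

def omega_opt(rho):
    """Classical optimal SOR factor for real Jacobi eigenvalues, rho < 1."""
    return 2.0 / (1.0 + np.sqrt(1.0 - rho**2))

def red_black_sor(D_R, D_B, M_U, M_L, k_R, k_B, omega, tol=1e-10, max_iter=10_000):
    """Block red-black SOR for [[D_R, M_U], [M_L, D_B]] [v_R; v_B] = [k_R; k_B]."""
    v_R = np.zeros_like(k_R)
    v_B = np.zeros_like(k_B)
    for p in range(1, max_iter + 1):
        # Red half-sweep: residual uses the old red and old black values.
        r_R = k_R - (D_R @ v_R + M_U @ v_B)
        v_R = v_R + omega * np.linalg.solve(D_R, r_R)
        # Black half-sweep: residual uses the new red and old black values.
        r_B = k_B - (M_L @ v_R + D_B @ v_B)
        v_B = v_B + omega * np.linalg.solve(D_B, r_B)
        # Convergence check on the infinity norm of the full residual.
        res = np.concatenate([k_R - (D_R @ v_R + M_U @ v_B),
                              k_B - (M_L @ v_R + D_B @ v_B)])
        if np.max(np.abs(res)) < tol:
            return v_R, v_B, p
    raise RuntimeError("red-black SOR did not converge")

def model_blocks(nb=5, s=6):
    """Model block tridiagonal matrix, reordered even block rows first."""
    T = 4 * np.eye(s) - np.eye(s, k=1) - np.eye(s, k=-1)
    E = np.eye(nb, k=1) + np.eye(nb, k=-1)
    M_tilde = np.kron(np.eye(nb), T) - np.kron(E, np.eye(s))
    red = [i for b in range(0, nb, 2) for i in range(b * s, (b + 1) * s)]
    black = [i for b in range(1, nb, 2) for i in range(b * s, (b + 1) * s)]
    return (M_tilde[np.ix_(red, red)], M_tilde[np.ix_(black, black)],
            M_tilde[np.ix_(red, black)], M_tilde[np.ix_(black, red)])

D_R, D_B, M_U, M_L = model_blocks()
k_R = np.ones(D_R.shape[0])
k_B = np.ones(D_B.shape[0])

# Squared block Jacobi eigenvalues are the eigenvalues of D_R^-1 M_U D_B^-1 M_L.
mu2 = np.linalg.eigvals(np.linalg.solve(D_R, M_U) @ np.linalg.solve(D_B, M_L))
rho = np.sqrt(np.max(np.abs(mu2)))
w = omega_opt(rho)

_, _, it_gs = red_black_sor(D_R, D_B, M_U, M_L, k_R, k_B, omega=1.0)
v_R, v_B, it_opt = red_black_sor(D_R, D_B, M_U, M_L, k_R, k_B, omega=w)
print(f"rho = {rho:.4f}, omega_opt = {w:.4f}, "
      f"iterations: omega=1 -> {it_gs}, omega_opt -> {it_opt}")
```

Within each half-sweep the diagonal blocks are independent, so the single dense solve per color could be replaced by one solve per block row executed in parallel, which is the property the red-black ordering is designed to preserve.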
6 The Parabolic Equation

We now seek to solve the parabolic equation

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} - H(x, y, t) \tag{17}$$

defined on the interior of $S$, discretized by the collocation method with a uniform mesh, given Dirichlet boundary conditions

$$u(x, y, t) = C(x, y, t), \qquad (x, y) \in \partial S.$$

Figure II: results using red-black SOR to solve Poisson's equation (iterations until convergence versus omega)

We approximate the time derivative by

$$\frac{\partial u}{\partial t} \approx \frac{u^{(q+1)} - u^{(q)}}{\Delta t} \tag{18}$$

where the superscript $(q)$ indicates the value of $u$ after $q$ time steps. Recalling (5), we see that matrix $\tilde M$ was formed by evaluating $\frac{\partial^2 \hat u}{\partial x^2} + \frac{\partial^2 \hat u}{\partial y^2}$ at the collocation points. Correspondingly, we form matrix $\tilde P$ by evaluating $\hat u$ at the collocation points. Clearly, $\tilde P$ has precisely the same structure as that of $\tilde M$. Letting $p_{ij}$ be the non-trivial entries of $\tilde P$ (just as the $a_{ij}$'s in (7) are the non-trivial entries of $\tilde M$), we see that the numbers $p_{ij}$ are given by

p 11 =8+8 p p 1 =1+ p p 1 = p 1 =+ p p 1 =1+ p p =+ p p =, p p =1 p 1 = p =, p p =8, 8 p p =1, p p 1 =+ p p =1 p =1, p p =, p

1 The interpolating polynomial (4) and forcing function now have time dependence, i.e. $u_{qr}$, $u^{x}_{qr}$, $u^{y}_{qr}$, $u^{xy}_{qr}$, and $H$ are now functions also of $t$.

If we now introduce (18) and the interpolating polynomial (4) (see footnote 1) into (17) and evaluate the right side of (17) at the collocation (Gauss) points at time $\theta t^{(q+1)} + (1 - \theta) t^{(q)}$, where

$0 \le \theta \le 1$, then we obtain the matrix form of the collocation discretization of the parabolic PDE

$$\frac{\tilde P \tilde v^{(q+1)} - \tilde P \tilde v^{(q)}}{\Delta t} = \theta \left[ \tilde M \tilde v^{(q+1)} - \tilde k^{(q+1)} \right] + (1 - \theta) \left[ \tilde M \tilde v^{(q)} - \tilde k^{(q)} \right] \tag{19}$$

Multiplying (19) by $\Delta t$ and collecting the terms at time step $(q+1)$ on the left, we may express (19) as

$$(\tilde P - \theta \Delta t\, \tilde M)\, \tilde v^{(q+1)} = (\tilde P + (1 - \theta) \Delta t\, \tilde M)\, \tilde v^{(q)} - \left( \theta \Delta t\, \tilde k^{(q+1)} + (1 - \theta) \Delta t\, \tilde k^{(q)} \right) \tag{20}$$

In examining (20), we see that this equation defines how we may move from time step $(q)$ to time step $(q + 1)$. In particular, at time step $(q)$, all the vectors on the right side of (20) contain known values. Letting $\tilde Q = (\tilde P - \theta \Delta t\, \tilde M)$ and $\tilde b^{(q)}$ = the right side of (20), we may write (20) as

$$\tilde Q \tilde v^{(q+1)} = \tilde b^{(q)} \tag{21}$$

which is of a form and structure identical to those of (6). We may therefore apply to (21) the block red-black SOR algorithm that we developed for (6). That is, at each time step in (21) we iterate to convergence using block red-black SOR.

7 Eigenvalues and Results

Using the work in [LHH9] as a guide, we determined the eigenvalues of the block Jacobi matrix one would use to solve (21). These eigenvalues may be computed using the following recipe

k = k = k m c k = cos k r k = p + 0c k, c k (, p )[(, c k) + (8 + c k), 88 p ( p r k)] ( + p )(, c k) + ( + 9 p, p c k) + 18(10 + p, c k) k = (19, 9 p )[11(, c k ) + ( + c k ) + 8(,, 8c k p r k )] (19 + 9 p )(, c k ) + (1 + 99 p + 18c k + p c k ) + 18(1 + p, 9c k ) where k =1m, 1 and = h.

Then form the sets

f 1 mg = p p (, ) + (, ) ( + p ) + ( + p ) ( p p, 1) + (, ) ( p +1) + ( p +) + 1, 1 + m,1, m,1

and

f 1 mg = p p (9, ) + 1(9, ) (9 + p ) + 1(9 + p ) (,+ p p ) + (,+ ) ( + p ) + ( + p, 1 + 1 m,1, + m,1 )

Now let jk =( j, j) c k for k =1m, 1 and j =1m. Then $\sigma(J)$, the set of eigenvalues of the block Jacobi matrix, is (cf. [LHH9])

n [ = 1 jk (J) =f = j j=1mg q jk +j j j=1m k =1m, 1 o
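The time-stepping system derived earlier in this section (the weighted-in-time collocation system whose solution advances the unknowns by one step) can be sketched as follows. This is a minimal illustration under stated assumptions: `P`, `M`, and the right-hand-side vectors are small stand-ins rather than the Hermite collocation matrices, the function name `theta_step` and its `solve` argument are hypothetical, and a dense direct solve replaces the block red-black SOR iteration that the text applies at each step.

```python
import numpy as np

def theta_step(P, M, v_q, k_q, k_q1, dt, theta, solve=np.linalg.solve):
    """Advance one step of (P - theta*dt*M) v_new = (P + (1-theta)*dt*M) v_q
    - dt*(theta*k_q1 + (1-theta)*k_q).

    `solve` is a placeholder for the linear solver; the paper iterates with
    block red-black SOR at this point."""
    Q = P - theta * dt * M
    b = (P + (1.0 - theta) * dt * M) @ v_q - dt * (theta * k_q1 + (1.0 - theta) * k_q)
    return solve(Q, b)

# Stand-in problem: P = I, M diagonal, zero forcing, so v' = M v exactly.
P = np.eye(2)
M = np.diag([-1.0, -4.0])
k0 = np.zeros(2)
v = np.array([1.0, 1.0])
dt, theta, steps = 0.01, 0.5, 100   # theta = 1/2 chosen here for second-order accuracy

for _ in range(steps):
    v = theta_step(P, M, v, k0, k0, dt, theta)

exact = np.exp(np.diag(M) * dt * steps)
print("error at t = 1:", np.max(np.abs(v - exact)))

# A steady state M v* = k is a fixed point of the step for any theta and dt.
k_steady = np.array([2.0, -3.0])
v_star = np.linalg.solve(M, k_steady)
v_next = theta_step(P, M, v_star, k_steady, k_steady, dt=0.1, theta=1.0)
```

Setting `theta=1.0` gives the fully implicit scheme and `theta=0.0` the explicit one; only the matrix on the left changes, so the same iterative solver applies in every case where the left-hand matrix keeps the block structure of the steady problem.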

Given this recipe for the computation of eigenvalues of the Jacobi matrix, one can use the theory in [LHH9] and [You1] to compute $\omega_{opt}$ for the optimal block SOR method.

Footnote: It can also be shown that all these eigenvalues must have modulus less than unity, irrespective of the value of . Thus, the Jacobi method for the model parabolic problem must converge for any .

Figure III: results using red-black SOR to solve the model parabolic equation (iterations until convergence versus omega)

For an example, we ran both the block Jacobi method and block SOR method for various values of $\omega$ on the parabolic problem. The boundary conditions and function $H(x, y, t)$ were chosen such that u = x y (1 + e^{-t}). We chose m = , = 1 and let the code run over one time step, from $t = 0$ to $t = \Delta t$ = 01. The convergence criterion was that the infinity norm of the residual vector had to be less than 10,. For the Jacobi method, iterations were required for convergence. The number of iterations needed for convergence of the SOR method is illustrated in Figure III for various values of $\omega$. The value of $\omega$ that gave us the fewest number of iterations (namely iterations) was $\omega$ = 1. This agrees well with the value of $\omega_{opt}$ given by the theory, namely $\omega_{opt} \approx$ 18.

8 Summary

Given the work of Lai et al., we developed herein a fast and parallelizable SOR method for the numerical solution of Poisson's equation on the unit square with uniform mesh and Dirichlet boundary conditions. We then extended these techniques to the numerical solution

of a model parabolic equation. Our numerical results agree with our analytic results, showing that our block red-black SOR method, applied to the parabolic equation with an appropriately chosen relaxation factor $\omega$, converges much faster than the block Jacobi method.


References

[Cel8] Celia M. A. (198) Collocation on Deformed Finite Elements and Alternating Direction Collocation Methods. PhD thesis, Princeton University.
[LHH9] Lai Y.-L., Hadjidimos A., Houstis E. N., and Rice J. R. (199) On the Iterative Solution of Hermite Collocation Equations. SIAM J. Matrix Anal. Appl. 1 {. (Also Technical Report, Purdue University, 199).
[Pap8] Papatheodorou T. S. (198) Block AOR Iteration for Nonsymmetric Matrices. Math. Comp 1 11{.
[Pic9] Piccirilli D. T. (199) Using the Collocation Method with Splines under Tension and Upstream Weighting to Solve the One-Dimensional Convection-Diffusion Equation. Master's thesis, University of Vermont.
[You1] Young D. M. (191) Iterative Solution of Large Linear Systems. Academic Press, New York.