Fast Fourier Transform Solvers and Preconditioners for Quadratic Spline Collocation


Christina C. Christara and Kit Sun Ng
Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada
ccc,ngkit@cs.utoronto.ca
Submitted December 2000; revised October 2001 and February 2002.

Abstract. Quadratic Spline Collocation (QSC) methods of optimal order of convergence have been recently developed for the solution of elliptic Partial Differential Equations (PDEs). In this paper, linear solvers based on Fast Fourier Transforms (FFTs) are developed for the solution of the QSC equations. The complexity of the FFT solvers is O(N² log N), where N is the gridsize in one dimension. These direct solvers can handle PDEs with coefficients in one variable or constant, and Dirichlet, Neumann and periodic boundary conditions, along at least one direction of a rectangular domain. General variable-coefficient PDEs are handled by preconditioned iterative solvers. The preconditioner is the QSC matrix arising from a constant-coefficient PDE. The convergence analysis of the preconditioner is presented. It is shown that, under certain conditions, the convergence rate is independent of the gridsize. The preconditioner is solved by FFT techniques, and integrated with one-step or acceleration methods, giving rise to asymptotically almost optimal linear solvers, with complexity O(N² log N). Numerical experiments verify the effectiveness of the solvers and preconditioners, even on problems more general than the analysis assumes. The development and analysis of FFT solvers and preconditioners is extended to QSC equations corresponding to systems of elliptic PDEs.

Keywords: spline collocation, elliptic boundary value problem, eigenvalue problem, fast Fourier transform, iterative solver, scaled Laplace preconditioner, system of PDEs.
AMS Subject Classification: 65N22, 65N25, 65N35, 65T50, 65F10, 65F15.
1 Introduction

Collocation methods of optimal convergence order based on smooth splines have been relatively recently developed [19], [4] for the solution of elliptic Boundary Value Problems (BVPs). These methods offer an alternative to Galerkin finite element methods as well as to Hermite spline collocation methods. Collocation is a simple-to-implement and integration-free method; its implementation is problem independent, requiring only one function evaluation per data point. Compared to collocation methods based on not-fully-smooth splines, e.g. Hermite cubic splines, smooth spline collocation methods give rise to smaller linear systems, since they use only one data point per subrectangle. Moreover, the linear systems arising from the so-called deferred-correction spline collocation methods are sparser than the respective ones from Hermite spline collocation and spline Galerkin methods. In the numerical solution of (multi-dimensional) BVPs, the larger part of the overall computation time is spent in solving the resulting linear system of equations. Therefore, the development of a fast solver associated with a new discretisation method is essential for the success of the method. A variety of solvers for spline collocation equations has been studied [18], [5], including acceleration techniques with various

preconditioners and domain decomposition, but the analysis has been carried out for only a few solvers and for restricted classes of PDE operators. For example, in [8], multigrid methods for quadratic splines are developed and proven to have a rate of convergence independent of the problem size for a certain model problem. Hermite cubic spline collocation on the Gauss points gives fourth-order convergence approximations [11], [24]. Fast Fourier Transform (FFT) solvers for Hermite cubic spline collocation equations arising from Poisson's problem are developed in [3] and extended to PDEs with coefficients in one variable. These solvers are used as preconditioners for more general PDE problems with Dirichlet conditions in [1]. The convergence rate of the Richardson and Minimum Residual (MRES) preconditioned iterative methods is shown to be independent of the gridsize, by showing the spectral equivalence of the Hermite cubic spline operators corresponding to the Laplace operator and to a more general elliptic PDE operator with Dirichlet boundary conditions. In its standard formulation, Quadratic Spline Collocation (QSC) on the midpoints of a uniform rectangular partition gives second-order (sub-optimal) approximations [4]. In [4], optimal QSC methods are derived and analyzed; more specifically, the optimal QSC approximation is fourth order at the nodes and midpoints of a uniform rectangular partition, and third order globally. The derivation of optimal QSC methods is based on appropriate perturbations either of the operator, giving rise to the so-called one-step or extrapolated QSC method, or of the right side, giving rise to the so-called two-step or deferred-correction QSC method. In [4], the QSC linear system arising from the two-step QSC method applied to PDEs with constant coefficients and even-derivative terms is written in a tensor product form, and formulae for the eigenvalues and eigenvectors of the matrix are derived.
FFT solvers for the two-step QSC system arising from Helmholtz PDEs with constant coefficients are developed in [9]. The FFT is applied to both dimensions. Dirichlet, Neumann and periodic conditions are handled. The QSC linear system arising from general elliptic PDEs with variable coefficients is solved by preconditioned iterative methods, with the preconditioner being a diagonal scaling of the QSC matrix arising from a Helmholtz operator. The experiments show that the rate of convergence of the preconditioned iterative methods is independent of the grid size; however, no analysis is given. In this paper, we present the analysis of the convergence rate, together with a number of additional results. We summarize the relevant properties of the QSC matrix in Section 2, and the formulation of the FFT solvers for QSC equations arising from Helmholtz PDEs with constant coefficients in Section 3. We extend the FFT solvers to PDEs with coefficients in one variable, and boundary conditions of any type along the direction of the other variable, by applying the FFT to one dimension and tridiagonal solves to the other. We also consider a system of elliptic PDEs, and the optimal QSC discretisation method for this problem as developed in [22] and [7], and we formulate FFT solvers for the arising QSC matrix. These solvers can be easily extended to n × n systems of elliptic PDEs. In Section 4, using a technique similar to the one used in [1] for Hermite cubic splines, we show that the QSC operators corresponding to the Laplace operator and to a general elliptic PDE operator without cross-derivative term and with Dirichlet boundary conditions are spectrally equivalent, and that the preconditioned Richardson and MRES iterative methods, applied to the QSC matrix arising from a general elliptic PDE without cross-derivative term, with the QSC matrix corresponding to the Laplace operator as preconditioner, have convergence rate independent of the gridsize.
We extend the results to a general system of elliptic PDEs. The technique used in [1] to show the spectral equivalence of the Hermite cubic spline operators is based, among other things, on a discrete inner product defined to match the two-point Gauss quadrature rule, on various relations regarding orthogonal spline collocation operators shown in other papers [12], [23], [10], [2], and on the eigenvalues and eigenfunctions for a model discrete eigenvalue problem, as presented in

[3]. For quadratic splines (and other smooth splines), these results are not given, and the proof techniques used for these results in the case of orthogonal collocation are not directly applicable to spline collocation. (Note that the results in [12], [23] hold for C¹ piecewise polynomials of degree r.) Reasons for this include the facts that spline collocation is not necessarily associated with orthogonal polynomials, and that it uses only one data point per subinterval. In general, there is less literature in the area of smooth spline collocation than in that of orthogonal C¹ piecewise polynomial collocation, since optimal spline collocation methods have been developed relatively recently. Moreover, the fact that the basis functions used with quadratic (and other smooth) spline collocation are not nodal basis functions, i.e. the values of the coefficients do not represent function values at particular points of the grid, complicates matters even more. In Section 4, we fill a part of this literature gap. We prove a number of new mathematical results regarding the QSC operator, similar to those proven for the Hermite cubic spline operator in [12], [23], [10], [2], and we develop the eigenvalues and eigenfunctions for a model QSC eigenvalue problem. Finally, in Section 5, we present numerical results verifying the effectiveness of the solvers and of the preconditioners, even when applied to PDEs more general than the ones assumed in the analysis.

2 Background

Consider a BVP described by the operator equation

    Lu ≡ a u_xx + b u_xy + c u_yy + d u_x + e u_y + f u = g  for (x, y) in Ω ≡ (0, 1) × (0, 1),   (1)

where a, b, c, d, e, f and g are given functions of x and y, and u is the unknown function of x and y, subject to some boundary conditions on the boundary ∂Ω of Ω. On each side of ∂Ω the boundary conditions may be any of the following types: homogeneous Dirichlet, homogeneous Neumann, or periodic.
Note that some of the solvers that will be described are applicable to a wider range of boundary conditions (see Remark 1 of subsection 3.2 and Remark 3 of subsection 3.3). For brevity, in this section and in subsection 3.1, we assume that the boundary conditions are homogeneous Neumann in the x direction and homogeneous Dirichlet in y, i.e.

    u_x = 0 on x = 0 and x = 1, for 0 ≤ y ≤ 1,   (2)
    u = 0 on y = 0 and y = 1, for 0 ≤ x ≤ 1.   (3)

Let Δ_x = {x_i}_{i=0}^{M} and Δ_y = {y_j}_{j=0}^{N} be uniform partitions of (0, 1) with step-sizes h_x = 1/M and h_y = 1/N, respectively. We denote by S_Δx and S_Δy the quadratic spline spaces with respect to the partitions Δ_x and Δ_y, respectively, constructed so that the splines satisfy the boundary conditions (2) and (3), respectively. The basis functions {φ_i^x(x)}_{i=1}^{M} and {φ_j^y(y)}_{j=1}^{N} for S_Δx and S_Δy, respectively, are generated through appropriate translations and scalings of the model quadratic spline φ(x), defined by

    φ(x) = x²/2 for 0 ≤ x ≤ 1;  φ(x) = (−2x² + 6x − 3)/2 for 1 ≤ x ≤ 2;  φ(x) = (3 − x)²/2 for 2 ≤ x ≤ 3;  φ(x) = 0 elsewhere,

and appropriate adjustments of the first and last basis functions so that the splines satisfy the boundary conditions. Let τ_i^x = (x_{i−1} + x_i)/2, i = 1, …, M, and τ_j^y = (y_{j−1} + y_j)/2, j = 1, …, N, be the midpoints of the partitions Δ_x and Δ_y, respectively. Let S ≡ S_Δx ⊗ S_Δy be the approximating space for the BVP (1)-(3). This space has dimension MN. Note that any u_Δ ∈ S satisfies the boundary conditions by construction. The set of basis functions for S is chosen to be the tensor product {φ_i^x(x) φ_j^y(y)}_{i=1,j=1}^{M,N} of quadratic B-splines in the x and y directions.
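As a quick sanity check on the piecewise definition above, the model quadratic spline can be evaluated directly: the pieces join continuously at the knots, and the translates φ(x), φ(x + 1), φ(x + 2) sum to 1 on [0, 1], as a B-spline partition of unity should. A minimal sketch (the function name is illustrative):

```python
def phi(x):
    """Model quadratic B-spline on [0, 3], as defined piecewise above."""
    if 0.0 <= x <= 1.0:
        return 0.5 * x * x
    if 1.0 < x <= 2.0:
        return 0.5 * (-2.0 * x * x + 6.0 * x - 3.0)
    if 2.0 < x <= 3.0:
        return 0.5 * (3.0 - x) ** 2
    return 0.0

# continuity at the interior knots, and partition of unity of the translates
print(phi(1.0), phi(2.0))              # 0.5 0.5: the pieces join continuously
print(phi(0.4) + phi(1.4) + phi(2.4))  # ≈ 1.0 (partition of unity)
```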

The two-step optimal quadratic spline collocation (QSC) method [4] determines an approximation u_Δ ∈ S to u in two steps. In the first step, a bi-quadratic spline U ∈ S is computed so that it satisfies (1) on the set T ≡ {(τ_i^x, τ_j^y), i = 1, …, M, j = 1, …, N} of collocation points, i.e.

    LU = g on T.   (4)

The approximation U is of second order, i.e. non-optimal. In the second step, u_Δ ∈ S is computed so that it satisfies a perturbed operator equation,

    Lu_Δ = g + P_L U on T,   (5)

where P_L is a perturbation operator defined by stencils [4]. The resulting approximation u_Δ is of fourth order at the grid points and midpoints of the partition and of third order globally, that is, of the same order as the bi-quadratic spline interpolant. The linear equations resulting from (4), if ordered according to the natural ordering (without loss of generality, first bottom-up, then left-to-right), give rise to an MN × MN linear system A x = b, where A is a block-tridiagonal matrix with tridiagonal N × N blocks, the right-side vector b holds the values of g(x, y) at the collocation points, and x is the vector of unknown coefficients (degrees of freedom) of the finite element representation of the bi-quadratic spline approximation U. It is instructive to note that equations (5) result in a linear system with the same matrix A and a perturbed right-side vector. We next give the form of A in (4) and (5) for specific cases of operators L. We will use the notation tri[a₁ a₂ a₃] to denote a tridiagonal matrix whose sub-, main- and super-diagonal elements are all equal to the scalars a₁, a₂ and a₃, respectively, except possibly a few elements, which will be defined separately. If the operator L is of Helmholtz type with constant coefficients, i.e.
the PDE is

    Lu ≡ a u_xx + c u_yy + f u = g in Ω,   (6)

where a, c and f are constants, then the QSC matrix A takes the tensor product form

    A = (a/h_x²) (T̃_M^N ⊗ T_N^D) + (c/h_y²) (T_M^N ⊗ T̃_N^D) + f (T_M^N ⊗ T_N^D),   (7)

where T̃_M^N is an M × M tridiagonal matrix of the form T̃_M^N = tri[1 −2 1], with (T̃_M^N)₁₁ = (T̃_M^N)_MM = −1, T_M^N = (T̃_M^N + 8 I_M)/8, where I_M is the M × M identity matrix, T̃_N^D is an N × N tridiagonal matrix of the form T̃_N^D = tri[1 −2 1], with (T̃_N^D)₁₁ = (T̃_N^D)_NN = −3, and T_N^D = (T̃_N^D + 8 I_N)/8, where I_N is the N × N identity matrix. Note that the superscript, D or N, of a matrix denotes the type of boundary conditions, Dirichlet or Neumann, respectively, inherited in the entries of the matrix. Now consider a more general type of operator L than the one in (6). Let L have coefficients in one variable and no first-order terms with respect to the other variable. Without loss of generality, assume that the PDE is

    Lu ≡ a u_xx + c u_yy + e u_y + f u = g in Ω,   (8)

where a is a constant and c, e and f are functions of y. Then the QSC matrix A takes the tensor product form

    A = (a/h_x²) (T̃_M^N ⊗ T_N^D) + (1/h_y²) (T_M^N ⊗ C T̃_N^D) + (1/(2h_y)) (T_M^N ⊗ E T_N^{D,0}) + T_M^N ⊗ F T_N^D,   (9)

where C, E and F are N × N diagonal matrices with the values of the functions c, e and f, respectively, at the midpoints τ_j^y, j = 1, …, N, on the diagonal, T̃_N^D and T_N^D are as in (7), and T_N^{D,0} is an N × N tridiagonal matrix of the form T_N^{D,0} = tri[−1 0 1], with (T_N^{D,0})₁₁ = 1 and (T_N^{D,0})_NN = −1.

3 FFT Solvers for Quadratic Spline Collocation Equations

3.1 Diagonalizations and Algorithms

In this section, we describe two algorithms for the direct solution of the QSC equations arising from (6) and (8). Each algorithm is based on a certain diagonalization (or block-diagonalization) of the matrix of the QSC equations. In [4], explicit formulae for the eigenvalues and eigenvectors of the matrices in (7) are derived. As noted in [9], both T_M^N and T̃_M^N can be diagonalized by the inverse of the Discrete Cosine Transform II (DCT-II) [20] matrix C_M of size M × M, and both T_N^D and T̃_N^D can be diagonalized by the inverse of the Discrete Sine Transform II (DST-II) [20] matrix S_N of size N × N; thus, the QSC matrix A of (7) can be diagonalized by the inverse of C_M ⊗ S_N. That is,

    A = (C_M ⊗ S_N)⁻¹ Λ (C_M ⊗ S_N),   (10)

where Λ is a diagonal matrix with the eigenvalues of A. If Λ_M^N and Λ̃_M^N are M × M diagonal matrices with the eigenvalues of T_M^N and T̃_M^N, respectively, on the diagonal, and Λ_N^D and Λ̃_N^D are N × N diagonal matrices with the eigenvalues of T_N^D and T̃_N^D, respectively, on the diagonal, Λ takes the form

    Λ = (a/h_x²) (Λ̃_M^N ⊗ Λ_N^D) + (c/h_y²) (Λ_M^N ⊗ Λ̃_N^D) + f (Λ_M^N ⊗ Λ_N^D).   (11)

The diagonalization (10) of the QSC matrix A in (7) gives rise to an algorithm for computing the solution x = A⁻¹ b, using the Fast Cosine Transform II (FCT-II), the Fast Sine Transform II (FST-II) and the respective inverse transforms, iFCT-II and iFST-II. For describing this algorithm and the next one, we adopt a convenient notation from [20]. For any MN × 1 vector g, let g_{N×M} denote the N × M matrix with entries the components of g laid out in N rows and M columns, column-by-column.
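Before turning to the transform modules, the tensor product form (7) can be made concrete: a sketch assembling it with Kronecker products (numpy; the function names are illustrative, and the corner entries are the boundary-adjusted values given above for Neumann in x and Dirichlet in y):

```python
import numpy as np

def tri(a1, a2, a3, n):
    """Tridiagonal matrix tri[a1 a2 a3] of order n."""
    return (np.diag(np.full(n - 1, float(a1)), -1)
            + np.diag(np.full(n, float(a2)))
            + np.diag(np.full(n - 1, float(a3)), 1))

def qsc_helmholtz_matrix(M, N, a, c, f):
    """Assemble the QSC matrix (7) for Lu = a u_xx + c u_yy + f u,
    homogeneous Neumann in x and homogeneous Dirichlet in y."""
    hx, hy = 1.0 / M, 1.0 / N
    Tdd_x = tri(1, -2, 1, M); Tdd_x[0, 0] = Tdd_x[-1, -1] = -1.0  # Neumann corners
    Tdd_y = tri(1, -2, 1, N); Tdd_y[0, 0] = Tdd_y[-1, -1] = -3.0  # Dirichlet corners
    T_x = (Tdd_x + 8.0 * np.eye(M)) / 8.0  # midpoint-value matrices
    T_y = (Tdd_y + 8.0 * np.eye(N)) / 8.0
    # natural ordering (y fastest): (X kron Y) vec(B) = vec(Y B X^T)
    return ((a / hx**2) * np.kron(Tdd_x, T_y)
            + (c / hy**2) * np.kron(T_x, Tdd_y)
            + f * np.kron(T_x, T_y))

A = qsc_helmholtz_matrix(8, 8, 1.0, 1.0, 1.0)
print(A.shape)              # (64, 64)
print(np.allclose(A, A.T))  # symmetric, block-tridiagonal with tridiagonal blocks
```

The Kronecker structure makes the block-tridiagonal shape of A, with tridiagonal N × N blocks, immediate.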
Also, for brevity and later convenience, we define the following modules:

Module g⁽²⁾ = FCST(M, N, g)
Step 1: Perform the FCT-II of size M on each of the N columns of (g_{N×M})ᵀ, to obtain g⁽¹⁾_{M×N} = C_M (g_{N×M})ᵀ.
Step 2: Perform the FST-II of size N on each of the M columns of (g⁽¹⁾_{M×N})ᵀ, to obtain g⁽²⁾_{N×M} = S_N (g⁽¹⁾_{M×N})ᵀ, or equivalently, g⁽²⁾ = (C_M ⊗ S_N) g.

The above two steps require O(MN log M) and O(MN log N) real flops [20], respectively. Hence, FCST(M, N, g) requires O(MN log(MN)) real flops.

Module x = iFCST(M, N, g⁽²⁾)
Step 1: Perform the iFCT-II of size M on each of the N columns of (g⁽²⁾_{N×M})ᵀ, to obtain g⁽³⁾_{M×N} = (C_M)⁻¹ (g⁽²⁾_{N×M})ᵀ.
Step 2: Perform the iFST-II of size N on each of the M columns of (g⁽³⁾_{M×N})ᵀ, to obtain x_{N×M} = (S_N)⁻¹ (g⁽³⁾_{M×N})ᵀ, or equivalently, x = (C_M ⊗ S_N)⁻¹ g⁽²⁾.
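In terms of SciPy's FFT module, the FCST/iFCST pair can be sketched as below; scipy.fft.dct and scipy.fft.dst with type=2 play the roles of the FCT-II and FST-II (their normalization differs from the orthogonal C_M, S_N above, but this cancels as long as the matching inverses are used). The array G holds g_{N×M}, so axis=1 is the x direction and axis=0 the y direction:

```python
import numpy as np
from scipy.fft import dct, dst, idct, idst

def fcst(G):
    """Module FCST: FCT-II of size M along x, then FST-II of size N along y."""
    G1 = dct(G, type=2, axis=1)     # Step 1: transform each row (length M)
    return dst(G1, type=2, axis=0)  # Step 2: transform each column (length N)

def ifcst(G2):
    """Module iFCST: the corresponding inverse transforms."""
    return idst(idct(G2, type=2, axis=1), type=2, axis=0)

rng = np.random.default_rng(0)
G = rng.standard_normal((6, 4))        # N = 6, M = 4
print(np.allclose(ifcst(fcst(G)), G))  # the round trip recovers G
```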

The above two steps require O(MN log M) and O(MN log N) real flops, respectively. Hence, iFCST(M, N, g⁽²⁾) requires O(MN log(MN)) real flops. We now give the algorithm for computing x = A⁻¹ b based on the diagonalization (10) of the QSC matrix A in (7). Let Λ be the MN × MN diagonal matrix with the eigenvalues of A on the diagonal.

Algorithm 2D-FFTQSC(M, N, b)
Step 1: Compute b⁽²⁾ = FCST(M, N, b) = (C_M ⊗ S_N) b.
Step 2: Compute y⁽²⁾ = Λ⁻¹ b⁽²⁾.
Step 3: Compute x = iFCST(M, N, y⁽²⁾) = (C_M ⊗ S_N)⁻¹ Λ⁻¹ (C_M ⊗ S_N) b = A⁻¹ b.

The 2D-FFTQSC algorithm requires O(MN log(MN)) real flops. We now consider the QSC matrix A in (9). Since both T_M^N and T̃_M^N can be diagonalized by the inverse of the DCT-II matrix C_M, the QSC matrix A of (9) can be block-diagonalized by the inverse of C_M ⊗ I_N. That is,

    A = (C_M ⊗ I_N)⁻¹ B (C_M ⊗ I_N),   (12)

where

    B = (a/h_x²) (Λ̃_M^N ⊗ T_N^D) + (1/h_y²) (Λ_M^N ⊗ C T̃_N^D) + (1/(2h_y)) (Λ_M^N ⊗ E T_N^{D,0}) + Λ_M^N ⊗ F T_N^D.   (13)

Note that the block-diagonal matrix B consists of M tridiagonal blocks of size N × N. Also note that the form of A in (7) is a sub-case of the form of A in (9); therefore, the block-diagonalization (12) holds for A in (7) too. A second algorithm for computing x = A⁻¹ b, based on the block-diagonalization (12) of the QSC matrix A in (9), is now presented; it turns out to be asymptotically twice as fast as the 2D-FFTQSC algorithm.

Algorithm 1D-FFTQSC(M, N, b)
Step 1: Perform the FCT-II of size M on each of the N columns of (b_{N×M})ᵀ, to obtain b⁽¹⁾_{M×N} = C_M (b_{N×M})ᵀ, or equivalently, b⁽¹⁾ = (C_M ⊗ I_N) b.
Step 2: Solve the block-diagonal system B y⁽¹⁾ = b⁽¹⁾, where B is given in (13).
Step 3: Perform the iFCT-II of size M on each of the N columns of (y⁽¹⁾_{M×N})ᵀ, to obtain x_{N×M}, or equivalently, x = (C_M ⊗ I_N)⁻¹ y⁽¹⁾ = (C_M ⊗ I_N)⁻¹ B⁻¹ (C_M ⊗ I_N) b = A⁻¹ b.

The block-diagonal matrix B consists of M tridiagonal blocks of size N × N; therefore, Step 2 consists of solving M tridiagonal systems of size N × N, i.e. it requires O(MN) real flops, part attributed to the LU factorizations and part to the back-and-forward (b/f) substitutions.
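A runnable sketch of 1D-FFTQSC for the constant-coefficient sub-case (7) (Neumann in x, Dirichlet in y), with SciPy's DCT-II playing the role of the FCT-II and scipy.linalg.solve_banded performing the M tridiagonal solves of Step 2; the stencils and corner adjustments are the ones given earlier, and the function names are illustrative:

```python
import numpy as np
from scipy.fft import dct, idct
from scipy.linalg import solve_banded

def fftqsc_1d(M, N, a, c, f, b):
    """Sketch of 1D-FFTQSC for the Helmholtz case (7): the FCT-II in x
    block-diagonalizes A, and Step 2 becomes M tridiagonal N x N solves."""
    hx, hy = 1.0 / M, 1.0 / N
    B = b.reshape(N, M, order="F")
    Bh = dct(B, type=2, axis=1)                     # Step 1: FCT-II along x
    k = np.arange(M)
    lam_dd = -4.0 * np.sin(k * np.pi / (2 * M))**2  # eigenvalues of T~_M^N
    lam_v = (lam_dd + 8.0) / 8.0                    # eigenvalues of T_M^N
    # y-direction factors in banded form: T~_N^D = tri[1 -2 1], corners -3,
    # and T_N^D = (T~_N^D + 8 I)/8
    main = np.full(N, -2.0); main[0] = main[-1] = -3.0
    off = np.ones(N)
    tdd = np.vstack([off, main, off]); tdd[0, 0] = tdd[2, -1] = 0.0
    tval = tdd.copy(); tval[1] += 8.0; tval /= 8.0
    Xh = np.empty_like(Bh)
    for j in range(M):                              # Step 2: tridiagonal solves
        band = (a / hx**2) * lam_dd[j] * tval + ((c / hy**2) * tdd + f * tval) * lam_v[j]
        Xh[:, j] = solve_banded((1, 1), band, Bh[:, j])
    return idct(Xh, type=2, axis=1).reshape(-1, order="F")  # Step 3: iFCT-II

# verify against a dense assembly of the same matrix
M, N = 6, 5
rng = np.random.default_rng(3)
b = rng.standard_normal(M * N)
x = fftqsc_1d(M, N, 1.0, 1.0, 1.0, b)

def tri_c(n, corner):
    T = np.diag(np.ones(n - 1), -1) - 2.0 * np.eye(n) + np.diag(np.ones(n - 1), 1)
    T[0, 0] = T[-1, -1] = corner
    return T

Tdx, Tdy = tri_c(M, -1.0), tri_c(N, -3.0)
Tx, Ty = (Tdx + 8 * np.eye(M)) / 8, (Tdy + 8 * np.eye(N)) / 8
A = M**2 * np.kron(Tdx, Ty) + N**2 * np.kron(Tx, Tdy) + np.kron(Tx, Ty)
print(np.allclose(A @ x, b))  # the transform-and-solve result matches a direct solve
```

The DCT-II applies here because its basis vectors cos((2i − 1)kπ/(2M)) are exactly the eigenvectors of the Neumann-adjusted tridiagonal factors.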
Therefore, the 1D-FFTQSC algorithm requires O(MN log M) (plus lower-order O(MN)) real flops. If we assume that M = N, this is asymptotically twice as fast as the 2D-FFTQSC algorithm. However, for reasonable gridsizes, the advantage of the 1D over the 2D algorithm may not become visible, since the extra log N factor in the complexity of the 2D algorithm is small. It is also worth noting that, if the boundary conditions are periodic in y, the block-diagonal matrix B consists of blocks that are no longer tridiagonal, but almost tridiagonal (tridiagonal with corner entries), in which case about twice as many flops are needed for the solution of the blocks. It is further instructive to note that, in both the 2D-FFTQSC and the 1D-FFTQSC algorithms, the intermediate data are accessed by rows and by columns, in an alternating way. For example, in Step 1

of 1D-FFTQSC, b_{N×M} is accessed by rows (Fourier transforms are applied to each of its rows), while in Step 2, the intermediate result b⁽¹⁾_{M×N} is accessed by columns (tridiagonal solves are applied to each of its columns). Algorithms 2D-FFTQSC and 1D-FFTQSC are the QSC equivalents of Algorithms I and II, respectively, in [3]. Algorithms I and II in [3] are given in a generic form, and the Fourier diagonalization and block-diagonalization are given for Hermite cubic spline collocation. It is worth noting that, for Hermite cubic spline collocation, four times as many Fourier transforms are needed as for QSC, and that the system of Hermite cubic spline collocation equations is four times as large as the QSC system. Moreover, the matrix arising in Algorithm II from Hermite cubic spline collocation has 4 non-zero entries per row (it is penta-diagonal and almost block-diagonal), instead of the 3 of the respective matrix from QSC. For three-dimensional problems, FFT solvers that apply Fourier transforms in one, two or three dimensions can be used. In the most effective form, Fourier transforms are applied in two dimensions and tridiagonal solves in the third dimension. Thus the 2D-FFTQSC algorithm can be incorporated into a three-dimensional solver that applies 2D-FFTQSC to block-diagonalize the linear system and uses tridiagonal solves for the third dimension, giving rise to an O(N³ log N) asymptotic complexity. In [9], a three-dimensional solver that applies Fourier transforms in all three dimensions is developed.

3.2 Other boundary conditions

The two algorithms given in the previous section were designed to handle boundary conditions that are homogeneous Neumann in the x direction and homogeneous Dirichlet in y. They can be easily adjusted to handle other boundary conditions that are Dirichlet, Neumann or periodic on any side of the rectangular domain. Further, algorithm 1D-FFTQSC can handle general conditions in one direction.
For periodic conditions in either (or both) of the two directions, the arising one-dimensional matrices can be diagonalized by the inverse of the Discrete Fourier Transform (DFT) matrix, and the respective computation is implemented by the Fast Fourier Transform (FFT) and its inverse (iFFT). Explicit formulae for the eigenvalues and eigenvectors for periodic conditions are given in [9]. When the 1D-FFTQSC algorithm is used for boundary conditions that are periodic in y and Dirichlet or Neumann in x, it is advisable to order the points, equations and unknowns first left-to-right, then bottom-up, so that the FFTs are applied to the y direction and the x direction is handled by the tridiagonal solves. In this way, the solution of the almost tridiagonal matrices is replaced by the solution of (purely) tridiagonal matrices. We now consider the case of Dirichlet conditions on one side and Neumann conditions on the opposite side of the same direction, give explicit formulae for the eigenvalues and eigenvectors of the arising matrix, and derive FFT solvers for it. The study of the FFT solution of the linear system arising in this case of boundary conditions was motivated by [15]. Without loss of generality, we assume that the Dirichlet boundary is ordered first and the Neumann one last (Dirichlet-Neumann boundary conditions). The QSC matrix arising from the one-dimensional BVP

    u_xx = g for x ∈ (0, 1), u(0) = 0, u_x(1) = 0

takes the form T_M^{DN} = tri[1 −2 1], with (T_M^{DN})₁₁ = −3 and (T_M^{DN})_MM = −1. The eigenvalues of T_M^{DN} are

    λ_k^{DN} = −4 sin²((2k − 1)π/(4M)), k = 1, …, M,   (14)

and an orthonormal set of eigenvectors is

    v_k = √(2/M) { sin((2i − 1)(2k − 1)π/(4M)) }_{i=1}^{M}, k = 1, …, M.
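The eigenvalue formula (14) can be checked numerically against the corner-adjusted matrix tri[1 −2 1] described above; a small sanity-check sketch (numpy):

```python
import numpy as np

M = 16
# T_M^{DN} = tri[1 -2 1] with (T)_11 = -3 (Dirichlet end), (T)_MM = -1 (Neumann end)
T = np.diag(np.ones(M - 1), -1) - 2.0 * np.eye(M) + np.diag(np.ones(M - 1), 1)
T[0, 0] = -3.0
T[M - 1, M - 1] = -1.0

k = np.arange(1, M + 1)
lam = -4.0 * np.sin((2 * k - 1) * np.pi / (4 * M)) ** 2  # formula (14)

print(np.allclose(np.sort(np.linalg.eigvalsh(T)), np.sort(lam)))  # True
```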

We can show that T_M^{DN} has a Fourier diagonalization of the form

    T_M^{DN} = (2/M) (R S_{2M} E) Λ_M^{DN} (R S_{2M} E),   (15)

where Λ_M^{DN} denotes the M × M diagonal matrix with the eigenvalues of T_M^{DN} (given by (14)) on the diagonal, S_{2M} is the DST-II matrix of size 2M, and E and R are 2M × M extension and M × 2M restriction matrices, respectively; E pads a vector of length M with M trailing zeros, and R selects the odd-indexed entries of a vector of length 2M (note that R S_{2M} E is symmetric). Thus the computation associated with the Dirichlet-Neumann boundary conditions can be handled by the FST-II and iFST-II of double size, together with the appropriate extension and restriction operations. This gives rise to an extra factor of 2 in the computational complexity of the algorithms. It should be noted that, if the 1D-FFTQSC algorithm is used for solving a problem with Dirichlet-Neumann boundary conditions in only one direction, this direction should be handled by the tridiagonal solves, with appropriate ordering of the points, equations and unknowns, thus avoiding the double-size Fourier transforms. If the Neumann boundary is ordered first and the Dirichlet one last, the eigenvectors are given by cosine formulae, and a diagonalization similar to (15) holds, which uses the DCT-II matrix of double size and its inverse, and slightly different extension and restriction operators.

Remark 1. Algorithm 1D-FFTQSC is applicable not only to a wider range of PDE operators, but also to a wider range of boundary conditions, than algorithm 2D-FFTQSC. More specifically, the block-diagonalization (12) and the applicability of algorithm 1D-FFTQSC remain valid even if the boundary conditions in the y direction are not of the types listed in Section 2. Moreover, the roles of the x and y directions can be switched as follows. If a, c, d and f are functions of x, b = 0 and e = 0, and the boundary conditions are of one of the types listed in Section 2 in the y direction and arbitrary in the x direction, then, with appropriate ordering of points, equations and unknowns, we can apply an algorithm similar to 1D-FFTQSC to solve the resulting linear system.
3.3 Extension to systems of PDEs

We consider the extension of the FFT solvers to QSC equations arising from a system of linear second-order elliptic PDEs in two dimensions,

    [ L₁₁  L₁₂ ] [ u ]   [ g₁ ]
    [ L₂₁  L₂₂ ] [ v ] = [ g₂ ]   in Ω,   (16)

where, for i = 1, 2 and j = 1, 2,

    L_ij w ≡ a_ij w_xx + b_ij w_xy + c_ij w_yy + d_ij w_x + e_ij w_y + f_ij w,   (17)

a_ij, b_ij, c_ij, d_ij, e_ij, f_ij and g_i are given functions of x and y, and u and v are the unknown functions of x and y. We assume that both u and v are subject to the same boundary conditions, any of those listed in

Section 2. For simplicity, in this subsection we assume that the boundary conditions are homogeneous Neumann in the x direction and homogeneous Dirichlet in the y direction for both u and v, i.e.

    u_x = v_x = 0  on x = 0 and x = 1, for 0 ≤ y ≤ 1,    (18)
    u = v = 0  on y = 0 and y = 1, for 0 ≤ x ≤ 1.    (19)

The optimal two-step QSC method applied to (16), (18)-(19), as described in [22], [7], gives rise to two linear systems. In the first step of the QSC method, a system Ax = g is to be solved, which, with appropriate (block) ordering, takes the block form

    A = [A_11  A_12; A_21  A_22].    (20)

Each submatrix A_ij, for i = 1, 2 and j = 1, 2, arises from the QSC discretization of the respective operator L_ij of (16). In the second step of the QSC method, a linear system with the same matrix and a perturbed right-side vector arises. If each of the operators L_ij, for i = 1, 2 and j = 1, 2, is of Helmholtz type with constant coefficients, each of the blocks A_ij takes a tensor product form similar to that in (7), and the matrix A assumes the block-diagonalization

    A = [W  0; 0  W] [D_11  D_12; D_21  D_22] [W^{-1}  0; 0  W^{-1}]    (21)

where W = (F_{C,M})^{-1} ⊗ (F_{S,N})^{-1}, and D_ij = W^{-1} A_ij W, for i = 1, 2 and j = 1, 2, are diagonal matrices that take a tensor product form as in (11). The matrix

    D = [D_11  D_12; D_21  D_22]    (22)

can be reordered to give a block-diagonal matrix with 2×2 blocks on the diagonal. It should be emphasized that (21) is not a point-diagonalization of the matrix A in (20), as (10) is of the QSC matrix in (7) in the scalar PDE case. A point-diagonalization of the matrix in (20) is possible [22], [7], but it leads to an FFT algorithm that requires more flops than the FFT algorithm 2D-FFTQSC2 arising from (21) and described further in this section. If each of the operators L_ij, for i = 1, 2 and j = 1, 2, has coefficients variable in y and no first-order terms with respect to x, each of the blocks A_ij takes a tensor product form similar to that in (9), and the matrix A assumes the block-diagonalization

    A = [V  0; 0  V] [B_11  B_12; B_21  B_22] [V^{-1}  0; 0  V^{-1}]    (23)

where V = (F_{C,M})^{-1} ⊗ I_N, and B_ij = V^{-1} A_ij V, for i = 1, 2 and j = 1, 2, are block-diagonal matrices with tridiagonal blocks on the diagonal, which can take a tensor product form similar to that in (13).
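The mechanism behind (21) can be checked in a small self-contained sketch (Python/NumPy; the matrices below are generic symmetric tridiagonal stand-ins, not the actual QSC blocks): if every block A_ij is a constant-coefficient combination of the same pair of matrices, the same transform pair conjugates every block to a diagonal matrix.

```python
import numpy as np

def kron_sum_block(alpha, beta, Tx, Ty):
    """Constant-coefficient 'Helmholtz-type' block  alpha*Tx⊗I + beta*I⊗Ty."""
    m, n = Tx.shape[0], Ty.shape[0]
    return alpha * np.kron(Tx, np.eye(n)) + beta * np.kron(np.eye(m), Ty)

m, n = 4, 3
# symmetric tridiagonal 'second difference' matrices standing in for the 1D factors
Tx = 2.0 * np.eye(m) - np.diag(np.ones(m - 1), 1) - np.diag(np.ones(m - 1), -1)
Ty = 2.0 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
_, Qx = np.linalg.eigh(Tx)   # orthogonal eigenvector matrices (the 'transforms')
_, Qy = np.linalg.eigh(Ty)
W = np.kron(Qx, Qy)

# every block is diagonalized by the SAME W, so the 2x2 block system
# reduces to diagonal blocks D_ij = W^T A_ij W
for alpha, beta in [(1.0, 2.0), (0.5, -1.0), (3.0, 0.0), (2.0, 2.0)]:
    D = W.T @ kron_sum_block(alpha, beta, Tx, Ty) @ W
    assert np.max(np.abs(D - np.diag(np.diag(D)))) < 1e-10
print("all four blocks diagonalized by one transform pair")
```

Because the four diagonal blocks D_ij share no coupling except componentwise, the reordered system decouples into 2×2 problems, which is what the algorithms below exploit.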
The matrix

    B = [B_11  B_12; B_21  B_22]    (24)

can be reordered to give a block-diagonal matrix with septa-diagonal blocks on the diagonal and at most 6 non-zero entries per row. The block-diagonalization (21) gives rise to an algorithm for computing the solution x = [x_1^T, x_2^T]^T, using the Fast Cosine Transform II (FCT-II), the Fast Sine Transform II (FST-II) and the respective inverse transforms, iFCT-II and iFST-II.

Algorithm 2D-FFTQSC2(M, N, g, x)
Step 1: Compute g~^(1) = FCST(M, N, g_1) ≡ (F_{C,M} ⊗ F_{S,N}) g_1.
        Compute g~^(2) = FCST(M, N, g_2) ≡ (F_{C,M} ⊗ F_{S,N}) g_2.
Step 2: Let g~ = [(g~^(1))^T, (g~^(2))^T]^T.
        Solve the system D y~ = g~, where D is given in (22).
        Let y~^(1) = [y~_1, ..., y~_{MN}]^T and y~^(2) = [y~_{MN+1}, ..., y~_{2MN}]^T.
Step 3: Compute x_1 = iFCST(M, N, y~^(1)) ≡ ((F_{C,M})^{-1} ⊗ (F_{S,N})^{-1}) y~^(1).
        Compute x_2 = iFCST(M, N, y~^(2)) ≡ ((F_{C,M})^{-1} ⊗ (F_{S,N})^{-1}) y~^(2).
        Let x = [x_1^T, x_2^T]^T.

The dominant part of the computation in the above algorithm is the two applications of FCST and the two of iFCST. The solution of D y~ = g~ in Step 2 is performed by reordering the rows and columns of D and the rows of g~ according to the alternating ordering [22], [7], so that D becomes a block-diagonal matrix with 2×2 blocks on the diagonal. Therefore, Step 2 requires O(MN) flops, more precisely, a small multiple of MN real single flops. Therefore, the 2D-FFTQSC2 algorithm requires approximately 10MN log(MN) (+ lower order terms of MN) real single flops, that is, about twice as many as the 2D-FFTQSC algorithm for a single Helmholtz PDE problem (6), (2)-(3). A second algorithm for computing x = [x_1^T, x_2^T]^T, based on the block-diagonalization (23), is now presented; it turns out to be asymptotically twice as fast as the 2D-FFTQSC2 algorithm.

Algorithm 1D-FFTQSC2(M, N, g, x)
Step 1: Perform an FCT-II of size M on each of the N columns of g_1, viewed as an N×M array transposed, to obtain g~^(1), or equivalently, g~^(1) = (F_{C,M} ⊗ I_N) g_1.
        Similarly, obtain g~^(2) = (F_{C,M} ⊗ I_N) g_2.
Step 2: Let g~ = [(g~^(1))^T, (g~^(2))^T]^T.
        Solve the system B y~ = g~, where B is given in (24).
        Let y~^(1) = [y~_1, ..., y~_{MN}]^T and y~^(2) = [y~_{MN+1}, ..., y~_{2MN}]^T.
Step 3: Perform an iFCT-II of size M on each of the N columns of y~^(1), viewed as an N×M array transposed, to obtain x_1, or equivalently, x_1 = ((F_{C,M})^{-1} ⊗ I_N) y~^(1).
        Similarly, obtain x_2 = ((F_{C,M})^{-1} ⊗ I_N) y~^(2).
        Let x = [x_1^T, x_2^T]^T.
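Step 2 of 2D-FFTQSC2 can be sketched as follows (Python/NumPy; variable names and sizes are illustrative): when each D_ij is diagonal, the alternating ordering turns D y~ = g~ into MN independent 2×2 systems, solvable in O(MN) flops.

```python
import numpy as np

def solve_2x2_blocks(d11, d12, d21, d22, g1, g2):
    """Solve [diag(d11) diag(d12); diag(d21) diag(d22)] [y1;y2] = [g1;g2]
    as MN independent 2x2 systems (the alternating ordering)."""
    det = d11 * d22 - d12 * d21        # determinants of all 2x2 blocks at once
    y1 = (d22 * g1 - d12 * g2) / det   # Cramer's rule, vectorized
    y2 = (d11 * g2 - d21 * g1) / det
    return y1, y2

rng = np.random.default_rng(0)
mn = 12                                # stands for M*N
d11, d12, d21, d22 = (rng.standard_normal(mn) for _ in range(4))
d11 += 4.0; d22 += 4.0                 # keep the 2x2 blocks well conditioned
g1, g2 = rng.standard_normal(mn), rng.standard_normal(mn)
y1, y2 = solve_2x2_blocks(d11, d12, d21, d22, g1, g2)

# check against the assembled 2MN x 2MN system
D = np.block([[np.diag(d11), np.diag(d12)], [np.diag(d21), np.diag(d22)]])
assert np.allclose(D @ np.concatenate([y1, y2]), np.concatenate([g1, g2]))
print("O(MN) block solve matches dense solve")
```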
The solution of B y~ = g~ in Step 2 is performed by reordering the rows and columns of B and the rows of g~ according to the alternating ordering [22], [7], so that B becomes a block-diagonal matrix with septa-diagonal blocks on the diagonal. (In the implementation, we actually form B as a septa-diagonal matrix, and reorder the components of g~ and y~ appropriately.) Therefore, Step 2 requires O(MN) flops, more precisely, a small multiple of MN real single flops, split between the LU factorization and the backward/forward (b/f) substitutions. Each of Steps 1 and 3 requires O(MN log M) real single flops. Therefore, the 1D-FFTQSC2 algorithm requires approximately 10MN log(M) (+ lower order terms of MN) real single flops, that is, about twice as many as the 1D-FFTQSC algorithm for a single Helmholtz PDE problem (6), (2)-(3), which is (relative to the single PDE case) optimal. If we assume that M = N, 1D-FFTQSC2 is asymptotically twice as fast as the 2D-FFTQSC2 algorithm. However, we note again that, for reasonable grid sizes, the advantage of the 1D over the 2D algorithm may not become visible, since the extra log(N) term in the complexity of the 2D algorithm is small, and the factor of the lower order MN terms of the 1D algorithm is relatively large.
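The septa-diagonal solve of Step 2 can be sketched with a banded LU (Python/SciPy; the matrix below is a random banded stand-in, not the actual QSC block, and the bandwidth 3 on each side gives the septa-diagonal structure):

```python
import numpy as np
from scipy.linalg import solve_banded

rng = np.random.default_rng(0)
n, p = 20, 3                       # p sub- and super-diagonals: septa-diagonal
A = 10.0 * np.eye(n)               # comfortably nonsingular diagonal
for k in range(-p, p + 1):
    A += np.diag(rng.standard_normal(n - abs(k)), k)

# pack the 7 diagonals into the (2p+1) x n banded storage used by solve_banded
ab = np.zeros((2 * p + 1, n))
for k in range(-p, p + 1):
    row = p - k
    if k >= 0:
        ab[row, k:] = np.diag(A, k)
    else:
        ab[row, :n + k] = np.diag(A, k)

b = rng.standard_normal(n)
x = solve_banded((p, p), ab, b)    # O(n) banded LU + back/forward substitution
assert np.allclose(A @ x, b)
print("banded solve matches dense system")
```

The banded factorization and substitutions each cost O(n) for fixed bandwidth, which is the source of the O(MN) count for Step 2.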

Remark 2. Both the 1D-FFTQSC2 and the 2D-FFTQSC2 algorithms can be extended in a straightforward way to n×n systems of PDEs.

Remark 3. In describing the algorithms, we have assumed that both u and v satisfy the same boundary conditions all along ∂Ω. While this assumption cannot be relaxed for algorithm 2D-FFTQSC2, algorithm 1D-FFTQSC2 is applicable in some other cases of boundary conditions. More specifically, if u and v satisfy the same boundary conditions of either Dirichlet, Neumann, periodic, or Dirichlet-Neumann type in the x direction, and different ones in the y direction, then the diagonalization (23) is still valid; the FFTs should be applied in the direction with the same boundary conditions, and the tridiagonal solves in the other. Moreover, the roles of the x and y directions can be switched as explained in Remark 1 and subsection 3.2.

Remark 4. When converting the biharmonic equation subject to certain boundary conditions into a system of two second-order PDEs, we obtain a special case of (16), with L_11 = L_22 = Δ, L_12 = 0 and L_21 = I, where Δ is the Laplacian and I is the identity operator, and where u and v satisfy Dirichlet or Neumann conditions. This system is decoupled and its solution can be obtained by solving two single PDEs. Each single PDE can be solved by algorithm 1D-FFTQSC, as long as the boundary conditions in one of the directions are among those listed in Section 2. (Algorithm 2D-FFTQSC is also applicable if the boundary conditions allow.) For the convergence analysis of QSC for systems of two PDEs see [7].

4 Preconditioners for Quadratic Spline Collocation Equations

In this section, we consider the solution of the QSC equations arising from general elliptic PDEs of the form (1) by preconditioned iterative methods. The analysis is carried out for the QSC equations arising from self-adjoint PDEs with homogeneous Dirichlet boundary conditions all along ∂Ω. It is then extended to non-self-adjoint PDEs without cross-derivative term.
The analysis assumes that the preconditioner is the QSC operator arising from the Laplace operator and homogeneous Dirichlet conditions on ∂Ω; therefore, S_x, the basis functions φ_i^x, and consequently S, are adjusted appropriately.

4.1 Quadrature relations

Consider the BVP described by the operator equation (1), where L is a self-adjoint operator given by

    Lu = -(a(x,y) u_x)_x - (c(x,y) u_y)_y + f(x,y) u    (25)

and homogeneous Dirichlet boundary conditions on ∂Ω, i.e.

    u(x,y) = 0  on ∂Ω.    (26)

In the following, let (·,·) denote the standard inner product, that is, (v,w) = ∫ vw, where the integration is over a one- or two-dimensional domain, and let ||·||_{L²} be the associated L² norm. For any bounded functions v(x,y) and w(x,y), define the discrete pseudo-inner product (v,w)_{xy} by the two equivalent formulae

    (v,w)_{xy} = Σ_{j=1}^{N} Δy (v(·, τ_j^y), w(·, τ_j^y))_x = Σ_{i=1}^{M} Δx (v(τ_i^x, ·), w(τ_i^x, ·))_y

where (v,w)_x and (v,w)_y are defined by

    (v,w)_x = Σ_{i=1}^{M} Δx (vw)(τ_i^x)  and  (v,w)_y = Σ_{j=1}^{N} Δy (vw)(τ_j^y).
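The two formulae for (v,w)_{xy} are the two orders of summation of the same midpoint-rule tensor sum, which can be checked directly (Python/NumPy; the functions v and w below are arbitrary illustrative choices):

```python
import numpy as np

M, N = 8, 5
dx, dy = 1.0 / M, 1.0 / N
tx = (np.arange(M) + 0.5) * dx        # midpoints tau_i^x
ty = (np.arange(N) + 0.5) * dy        # midpoints tau_j^y
X, Y = np.meshgrid(tx, ty, indexing="ij")

v = np.sin(np.pi * X) * Y             # arbitrary bounded functions
w = np.cos(np.pi * Y) + X

inner_x_first = np.sum(dy * np.sum(dx * v * w, axis=0))  # sum_j dy (v,w)_x
inner_y_first = np.sum(dx * np.sum(dy * v * w, axis=1))  # sum_i dx (v,w)_y
assert np.isclose(inner_x_first, inner_y_first)
print("both orders of summation agree")
```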

Note that, for v, w ∈ S, (v,w)_{xy} is an inner product, since any bi-quadratic spline is uniquely determined by its values at the collocation points [4], and a bi-quadratic spline is the zero spline if and only if its values at all the collocation points are zero. Therefore, S is a Hilbert space and (·,·)_{xy} the associated inner product. Using the Peano representation of the error of the midpoint quadrature rule applied to a function p on [x_{i-1}, x_i], i = 1, ..., M, we get

    Δx p(τ_i^x) = ∫_{x_{i-1}}^{x_i} p dx - ∫_{x_{i-1}}^{x_i} p_xx K_i(x) dx    (27)

where K_i(x) is the Peano kernel defined by K_i(x) = (x - x_{i-1})²/2 for x_{i-1} ≤ x ≤ τ_i^x and K_i(x) = (x_i - x)²/2 for τ_i^x ≤ x ≤ x_i. It is easy to show that 0 ≤ K_i(x) ≤ Δx²/8, for i = 1, ..., M. Let L_h and Δ_h be the QSC operators from S into S corresponding to L in (25) and to -Δ (the negative Laplacian), respectively. That is, L_h and Δ_h are defined by

    (L_h v)(τ_i^x, τ_j^y) = (Lv)(τ_i^x, τ_j^y)  and  (Δ_h v)(τ_i^x, τ_j^y) = (-Δv)(τ_i^x, τ_j^y)

for i = 1, ..., M, j = 1, ..., N. Our goal is to show that L_h is spectrally equivalent to Δ_h under the inner product (·,·)_{xy}. This is shown in Theorem 3, but to obtain this result we first need a number of auxiliary results.

Lemma 1. Let p ∈ S_x. Then 0 ≤ ||p_x||²_{L²(0,1)} ≤ (-p_xx, p)_x.

PROOF. Using integration by parts and applying (27) to -p_xx p in each subinterval [x_{i-1}, x_i], i = 1, ..., M, and summing up, we have

    0 ≤ ||p_x||²_{L²(0,1)} = (p_x, p_x) = (-p_xx, p) = (-p_xx, p)_x - Σ_{i=1}^{M} ∫_{x_{i-1}}^{x_i} (p_xx p)_xx K_i(x) dx ≤ (-p_xx, p)_x,

taking into account that, in each [x_{i-1}, x_i], (p_xx p)_xx = (p_xx)² ≥ 0 and K_i(x) ≥ 0. QED.

Theorem 1. Assume that a(x,y) is C³ in Ω̄ with respect to x, c(x,y) is C³ in Ω̄ with respect to y, f(x,y) is bounded in Ω̄, and 0 < α ≤ a(x,y), c(x,y) ≤ β for all (x,y) ∈ Ω̄. Then, for all v, w ∈ S,

    (L_h v, w)_{xy} = a_1(v, w) + a_2(v, w) + (f v, w)_{xy}    (28)

where

    a_1(v, w) = a_1(w, v),    (29)
    α (Δ_h v, v)_{xy} ≤ a_1(v, v) ≤ β (Δ_h v, v)_{xy},    (30)
    |a_2(v, w)| ≤ C δ(Δx, Δy) (Δ_h v, v)^{1/2}_{xy} (Δ_h w, w)^{1/2}_{xy},    (31)

and where C is a positive constant independent of Δx and Δy, and

    δ(Δx, Δy) = max{ Δx max(||a_x||_∞, ||a_xx||_∞) + Δx² ||a_xxx||_∞, Δy max(||c_y||_∞, ||c_yy||_∞) + Δy² ||c_yyy||_∞ }.

PROOF. For any j, 1 ≤ j ≤ N, by applying (27) to -(a v_x)_x w, evaluated at y = τ_j^y, in each subinterval [x_{i-1}, x_i], i = 1, ..., M, summing up, and using integration by parts and Leibniz's differentiation formula, we get

    (-(a v_x)_x(·, τ_j^y), w(·, τ_j^y))_x = I_1(v, w; τ_j^y) + I_2(v, w; τ_j^y)    (32)

where

    I_1(v, w; τ_j^y) = ∫_0^1 (a v_x w_x)(x, τ_j^y) dx + Σ_{i=1}^{M} ∫_{x_{i-1}}^{x_i} (a v_xx w_xx)(x, τ_j^y) K_i(x) dx

and I_2(v, w; τ_j^y) is a finite sum of quadrature-error terms of the form

    α_lmn Σ_{i=1}^{M} ∫_{x_{i-1}}^{x_i} (a^(l) v^(m) w^(n))(x, τ_j^y) K_i(x) dx,  1 ≤ l ≤ 3,  0 ≤ m, n ≤ 2,

where the constants α_lmn arise from Leibniz's formula and are positive, and a^(l), v^(m), w^(n) denote derivatives with respect to x. From the definition of I_1, the lower bound of a, the positivity of the integrands, and the fact that I_1(v, v; τ_j^y) with a replaced by the constant 1 equals (-v_xx(·, τ_j^y), v(·, τ_j^y))_x, we have

    I_1(v, v; τ_j^y) ≥ α ( ∫_0^1 v_x²(x, τ_j^y) dx + Σ_{i=1}^{M} ∫_{x_{i-1}}^{x_i} v_xx²(x, τ_j^y) K_i(x) dx ) = α (-v_xx(·, τ_j^y), v(·, τ_j^y))_x.

In a similar way, we can show an upper bound for I_1(v, v; τ_j^y). Thus

    α (-v_xx(·, τ_j^y), v(·, τ_j^y))_x ≤ I_1(v, v; τ_j^y) ≤ β (-v_xx(·, τ_j^y), v(·, τ_j^y))_x.    (33)

From the definition of I_2 and the triangle and Cauchy-Schwarz inequalities, the terms of I_2 are bounded by multiples of ||v^(m)(·, τ_j^y)||_{L²(0,1)} ||w^(n)(·, τ_j^y)||_{L²(0,1)}. Using the inverse inequality ||v_xx||_{L²(0,1)} ≤ C Δx^{-1} ||v_x||_{L²(0,1)}, for v ∈ S_x, where C is a positive constant independent of Δx, and the Poincaré inequality ||v||_{L²(0,1)} ≤ C ||v_x||_{L²(0,1)}, we have

    |I_2(v, w; τ_j^y)| ≤ C δ(Δx) ||v_x(·, τ_j^y)||_{L²(0,1)} ||w_x(·, τ_j^y)||_{L²(0,1)}    (34)

where

    δ(Δx) = Δx max(||a_x||_∞, ||a_xx||_∞) + Δx² ||a_xxx||_∞

and C is a positive constant arising from the α_lmn and from the constants of the inverse and Poincaré inequalities. By applying Lemma 1, (34) becomes

    |I_2(v, w; τ_j^y)| ≤ C δ(Δx) (-v_xx(·, τ_j^y), v(·, τ_j^y))_x^{1/2} (-w_xx(·, τ_j^y), w(·, τ_j^y))_x^{1/2}.    (35)

By (32) and the definition of (·,·)_{xy}, we have

    (-(a v_x)_x, w)_{xy} = Σ_{j=1}^{N} Δy (-(a v_x)_x(·, τ_j^y), w(·, τ_j^y))_x = a_1^x(v, w) + a_2^x(v, w)

where

    a_1^x(v, w) = Σ_{j=1}^{N} Δy I_1(v, w; τ_j^y)  and  a_2^x(v, w) = Σ_{j=1}^{N} Δy I_2(v, w; τ_j^y).

By (33) and the definition of I_1, we have

    a_1^x(v, w) = a_1^x(w, v),    (36)
    α (-v_xx, v)_{xy} ≤ a_1^x(v, v) ≤ β (-v_xx, v)_{xy}.    (37)

Furthermore, by (35), Lemma 1 and the inequality Σ_j s_j t_j ≤ (Σ_j s_j²)^{1/2} (Σ_j t_j²)^{1/2}, for nonnegative scalars s_j and t_j, we have

    |a_2^x(v, w)| ≤ C δ(Δx) Σ_{j=1}^{N} Δy (-v_xx(·, τ_j^y), v(·, τ_j^y))_x^{1/2} (-w_xx(·, τ_j^y), w(·, τ_j^y))_x^{1/2}
                 ≤ C δ(Δx) ( Σ_{j=1}^{N} Δy (-v_xx(·, τ_j^y), v(·, τ_j^y))_x )^{1/2} ( Σ_{j=1}^{N} Δy (-w_xx(·, τ_j^y), w(·, τ_j^y))_x )^{1/2}
                 = C δ(Δx) (-v_xx, v)^{1/2}_{xy} (-w_xx, w)^{1/2}_{xy}.    (38)

By symmetry, we can show that

    a_1^y(v, w) = a_1^y(w, v),    (39)
    α (-v_yy, v)_{xy} ≤ a_1^y(v, v) ≤ β (-v_yy, v)_{xy},    (40)
    |a_2^y(v, w)| ≤ C δ(Δy) (-v_yy, v)^{1/2}_{xy} (-w_yy, w)^{1/2}_{xy},    (41)

where

    a_1^y(v, w) = Σ_{i=1}^{M} Δx J_1(v, w; τ_i^x),  a_2^y(v, w) = Σ_{i=1}^{M} Δx J_2(v, w; τ_i^x),
    J_1(v, w; τ_i^x) = ∫_0^1 (c v_y w_y)(τ_i^x, y) dy + Σ_{j=1}^{N} ∫_{y_{j-1}}^{y_j} (c v_yy w_yy)(τ_i^x, y) K_j(y) dy,

and J_2(v, w; τ_i^x) is the corresponding sum of quadrature-error terms in the y direction, with δ(Δy) = Δy max(||c_y||_∞, ||c_yy||_∞) + Δy² ||c_yyy||_∞. Finally, let a_1 = a_1^x + a_1^y and a_2 = a_2^x + a_2^y. Then (L_h v, w)_{xy} = a_1(v, w) + a_2(v, w) + (f v, w)_{xy}. Conditions (29) and (30) follow easily from (36), (37), (39) and (40). Condition (31) follows from (38), (41) and the inequality used in the derivation of (38). QED.

Theorem 1 is the QSC counterpart of Theorem 3.1 in [1], which holds for Hermite cubic splines.

4.2 Eigenvalues and eigenfunctions of a QSC problem

In this section, we obtain the eigenvalues and an orthonormal set of eigenfunctions for a model one-dimensional QSC eigenproblem. The respective Hermite cubic spline results are found in [13], [14] and [3]. Consider the eigenvalue problem

    -p_xx(τ_i^x) = λ p(τ_i^x),  i = 1, ..., M,  p ∈ S_x,  p ≠ 0,    (42)
    p(0) = p(1) = 0.    (43)

Using the eigenvalue and eigenvector formulae for the QSC matrix as given in [4], it is easy to show that the eigenvalues for (42)-(43) are

    λ_l = (8/Δx²) sin²(lπΔx/2) / (2 - sin²(lπΔx/2)),  l = 1, ..., M,

and the corresponding eigenfunctions are

    p_l(x) = (1 - sin²(lπΔx/2)/2)^{-1} Σ_{i=1}^{M} (q_l)_i φ_i^x(x),  l = 1, ..., M,

where the vectors q_l are defined by

    (q_l)_i = √2 sin(lπ(2i-1)Δx/2),  i = 1, ..., M,  for l = 1, ..., M-1,  and  (q_M)_i = sin((2i-1)π/2),  i = 1, ..., M.

It is also easy to show that the eigenvalues are distinct and positive.

Lemma 2. Let v, w ∈ S_x. Then (v_xx, w)_x = (v, w_xx)_x.

PROOF. As in the proof of Lemma 1,

    (v_xx, w)_x = (v_xx, w) - Σ_{i=1}^{M} ∫_{x_{i-1}}^{x_i} (v_xx w)_xx K_i(x) dx

and

    (v, w_xx)_x = (v, w_xx) - Σ_{i=1}^{M} ∫_{x_{i-1}}^{x_i} (v w_xx)_xx K_i(x) dx.

The result of the Lemma now follows by noting that (v_xx w)_xx = v_xx w_xx = (v w_xx)_xx for v, w ∈ S_x, and that (v_xx, w) = (v, w_xx) due to the integration by parts rule. QED.

Applying Lemma 2 in the x and y dimensions, we get that the QSC operator Δ_h is self-adjoint.

Lemma 3. The set {p_l}_{l=1}^{M} is orthonormal with respect to the inner product (·,·)_x.

PROOF. For l ≠ m, the collocation relations (42) give

    ((p_l)_xx, p_m)_x = -λ_l (p_l, p_m)_x  and  (p_l, (p_m)_xx)_x = -λ_m (p_l, p_m)_x.

By Lemma 2, ((p_l)_xx, p_m)_x = (p_l, (p_m)_xx)_x. But as the eigenvalues are distinct, we must have (p_l, p_m)_x = 0, for l ≠ m, which implies orthogonality. To show orthonormality, we have, for l < M, q_l^T q_l = Σ_{i=1}^{M} 2 sin²(lπ(2i-1)Δx/2) = M, while q_M^T q_M = Σ_{i=1}^{M} sin²((2i-1)π/2) = M. Let p̂_l be the vector of values of p_l(x) at τ_i^x, i = 1, ..., M. Then p̂_l = q_l, since q_l is also an eigenvector, with eigenvalue 1 - sin²(lπΔx/2)/2, of the matrix of basis function values at the midpoints. The orthonormality of p_l, l = 1, ..., M, follows from the fact that (p_l, p_l)_x = Δx p̂_l^T p̂_l = Δx q_l^T q_l = 1. QED.

Any v ∈ S_x can be written as v = Σ_{l=1}^{M} (v, p_l)_x p_l. The following theorem gives bounds for the QSC operator Δ_h in terms of the identity operator.
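The normalization q_l^T q_l = M used in the proof of Lemma 3, and the mutual orthogonality of the q_l, can be checked directly (Python/NumPy; the vectors below are the discrete sine vectors assumed in this subsection):

```python
import numpy as np

M = 16
dx = 1.0 / M
i = np.arange(1, M + 1)
Q = np.empty((M, M))
for l in range(1, M):
    Q[:, l - 1] = np.sqrt(2.0) * np.sin(l * np.pi * (2 * i - 1) * dx / 2)
Q[:, M - 1] = np.sin((2 * i - 1) * np.pi / 2)   # the l = M vector, entries ±1

# the columns are orthogonal and each has squared length M
assert np.allclose(Q.T @ Q, M * np.eye(M))
print("q_l^T q_m = M delta_lm")
```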

Theorem 2. For all v ∈ S,

    Λ_min(Δx, Δy) (v, v)_{xy} ≤ (Δ_h v, v)_{xy} ≤ Λ_max(Δx, Δy) (v, v)_{xy}    (44)

where

    Λ_min(Δx, Δy) = λ_min(Δx) + λ_min(Δy),  Λ_max(Δx, Δy) = λ_max(Δx) + λ_max(Δy),
    λ_min(Δ) = min_l (8/Δ²) sin²(lπΔ/2) / (2 - sin²(lπΔ/2))  and  λ_max(Δ) = max_l (8/Δ²) sin²(lπΔ/2) / (2 - sin²(lπΔ/2)),

where Δ = Δx or Δy.

PROOF. First, for w ∈ S_x, writing w = Σ_l (w, p_l)_x p_l and using the orthonormality of the p_l,

    (-w_xx, w)_x = Σ_{l=1}^{M} λ_l (w, p_l)_x²,

while (w, w)_x = Σ_{l=1}^{M} (w, p_l)_x². Hence

    λ_min(Δx) (w, w)_x ≤ (-w_xx, w)_x ≤ λ_max(Δx) (w, w)_x.

Then, for v ∈ S, applying the above for each slice v(·, τ_j^y) ∈ S_x,

    λ_min(Δx) (v, v)_{xy} ≤ Σ_{j=1}^{N} Δy (-v_xx(·, τ_j^y), v(·, τ_j^y))_x = (-v_xx, v)_{xy} ≤ λ_max(Δx) (v, v)_{xy}.

By symmetry, λ_min(Δy) (v, v)_{xy} ≤ (-v_yy, v)_{xy} ≤ λ_max(Δy) (v, v)_{xy}. Hence,

    Λ_min(Δx, Δy) (v, v)_{xy} ≤ (-v_xx - v_yy, v)_{xy} = (Δ_h v, v)_{xy} ≤ Λ_max(Δx, Δy) (v, v)_{xy}.  QED.

From Theorem 2, it is clear that Δ_h is positive definite. Also, (Δ_h)^{-1} exists and is unique.

4.3 Spectral equivalence of QSC operators

The following theorem shows that the QSC operator L_h corresponding to L in (25) is spectrally equivalent to the QSC operator Δ_h corresponding to the negative Laplacian. The Hermite cubic spline equivalent is Theorem 3.1 of [1]. For any two linear operators L_1 and L_2 from S into S, the notation L_1 ≤ L_2 means (L_1 v, v)_{xy} ≤ (L_2 v, v)_{xy} for all v ∈ S.

Theorem 3. Under the assumptions of Theorem 1,

    (α - C δ(Δx, Δy) + γ/Λ_min(Δx, Δy)) Δ_h ≤ L_h ≤ (β + C δ(Δx, Δy) + Γ/Λ_min(Δx, Δy)) Δ_h    (45)

and

    ((L_h - L_h^*) v, (Δ_h)^{-1} (L_h - L_h^*) v)_{xy} ≤ (C δ(Δx, Δy))² (Δ_h v, v)_{xy}    (46)

where L_h^* is the adjoint of L_h with respect to (·,·)_{xy}, γ = min(0, f_min), Γ = max(0, f_max), f_min = min_{Ω̄} f(x,y), f_max = max_{Ω̄} f(x,y), C and δ(Δx, Δy) are defined in Theorem 1, and Λ_min(Δx, Δy) in Theorem 2.

PROOF. By using the left inequality in (44), and

    (f v, v)_{xy} ≥ f_min (v, v)_{xy} ≥ γ (v, v)_{xy} ≥ (γ/Λ_min(Δx, Δy)) (Δ_h v, v)_{xy},
    (f v, v)_{xy} ≤ f_max (v, v)_{xy} ≤ Γ (v, v)_{xy} ≤ (Γ/Λ_min(Δx, Δy)) (Δ_h v, v)_{xy},

relation (45) now follows from Theorem 1. To show (46), consider w = (Δ_h)^{-1} (L_h - L_h^*) v. From (28), (29) and (31) we have

    (w, (L_h - L_h^*) v)_{xy} = a_2(v, w) - a_2(w, v) ≤ 2 C δ(Δx, Δy) (Δ_h v, v)^{1/2}_{xy} (Δ_h w, w)^{1/2}_{xy},

and hence

    (Δ_h w, w)_{xy} = (w, (L_h - L_h^*) v)_{xy} ≤ 2 C δ(Δx, Δy) (Δ_h v, v)^{1/2}_{xy} (Δ_h w, w)^{1/2}_{xy},

which implies (46), with the constant C absorbing the factor 2. QED.

4.4 Preconditioned iterative methods for QSC

Let g_h be the bi-quadratic spline interpolant of g at the collocation points. In this section, using the spectral equivalence of L_h and Δ_h, we formulate preconditioned iterative methods for solving L_h u_h = g_h, with preconditioner Δ_h, and convergence rate independent of Δx and Δy. Let ||·|| = (·,·)^{1/2}_{xy} be the standard norm in S, let ||L_1|| = sup_{v≠0} ||L_1 v||/||v|| be the induced norm of an operator L_1 from S to S, let ||·||_{L_1} = (L_1 ·, ·)^{1/2}_{xy} be the energy norm (or L_1-norm) associated with a self-adjoint and positive definite operator L_1 from S to S, and let I be the identity operator in S. Let us assume that f(x,y) > -2π²α for (x,y) ∈ Ω̄, and let also the assumptions of Theorem 1 be valid. Relations (45) and (46) of Theorem 3 show that the operators L_h and Δ_h (A and B, respectively, in [1]) satisfy all the assumptions of Lemma 2.1 of [1], with

    γ_1 = α + γ/Λ_min(Δx, Δy) - C δ(Δx, Δy),
    γ_2 = β + Γ/Λ_min(Δx, Δy) + C δ(Δx, Δy),
    γ_3 = C δ(Δx, Δy).

Note that, by using the fact that sin(x) = x + O(x³), as x → 0, we have 1/Λ_min(Δx, Δy) = 1/(2π²) + O(Δx² + Δy²). Then,

    γ_1 = α + γ/(2π²) + O(δ(Δx,Δy) + Δx² + Δy²),  γ_2 = β + Γ/(2π²) + O(δ(Δx,Δy) + Δx² + Δy²)  and  γ_3 = O(δ(Δx,Δy)).    (47)

Furthermore, since f > -2π²α, we have γ_1 > 0 for sufficiently small Δx and Δy, and γ_2 > 0. Also note that the condition f > -2π²α guarantees that the eigenvalues of the QSC matrix arising from a constant coefficients Helmholtz operator are of the same sign [4]. Applying Lemma 2.1 of [1], with the parameter of the Lemma chosen appropriately, gives

    ||I - τ (Δ_h)^{-1} L_h||_{Δ_h} ≤ ρ < 1    (48)

where

    τ = 2 / (γ_1 + γ_2 + 2γ_3²/(γ_1 + γ_2))  and  ρ = ( ((γ_2 - γ_1)² + 4γ_3²) / ((γ_2 + γ_1)² + 4γ_3²) )^{1/2} < 1.    (49)

Relation (48) shows that ρ is a bound for the Δ_h-norm of the iteration operator of a one-step preconditioned iterative method applied to L_h u_h = g_h with preconditioner Δ_h and scaling factor τ. Thus ρ bounds the rate of convergence of the respective preconditioned Richardson and MRES methods, as shown in Theorems 2.1 and 2.2 in [1]. More specifically,

    ||u_h^{(k)} - u_h||_{Δ_h} ≤ ρ^k ||u_h^{(0)} - u_h||_{Δ_h}    (50)

where u_h^{(k)} = Σ_{i=1}^{M} Σ_{j=1}^{N} c_ij^{(k)} φ_i^x φ_j^y is the bi-quadratic spline approximation computed at the k-th iteration of Richardson's method applied to L_h u_h = g_h, with (symmetric) preconditioner Δ_h and scaling factor τ, where τ and ρ are given by (49). A similar relation can be shown for the MRES iterates and for the ((L_h)^* (Δ_h)^{-1} L_h)-norm of the error. We next show that ρ is asymptotically independent of Δx and Δy. We also predict an approximation to ρ and the optimum scaling parameter τ for Richardson's method according to Lemma 2.1 in [1]. From (47) and (49) we have

    τ = 2 / ( (α + γ/(2π²)) + (β + Γ/(2π²)) ) + O(δ(Δx,Δy) + Δx² + Δy²)

and

    ρ = ( (β + Γ/(2π²)) - (α + γ/(2π²)) ) / ( (β + Γ/(2π²)) + (α + γ/(2π²)) ) + O(δ(Δx,Δy) + Δx² + Δy²).    (51)

It is interesting to note that the approximations for τ and ρ obtained from (51) by disregarding the O(·) terms are exactly the same as those in [1] for Hermite cubic spline collocation. It is also worth noting that, if we make certain assumptions for the signs of f_min and f_max, tighter bounds for L_h than those in (45) can be obtained. For example, if f ≥ 0,

    (α - C δ(Δx, Δy) + f_min/Λ_max(Δx, Δy)) Δ_h ≤ L_h ≤ (β + C δ(Δx, Δy) + f_max/Λ_min(Δx, Δy)) Δ_h.    (52)

However, (52) leads to a convergence rate improved by only O(Δx² + Δy²) compared to ρ in (51), since 1/Λ_max(Δx, Δy) = O(Δx² + Δy²).

4.5 H¹ norms of the QSC approximation error

Relation (50) gives the rate of convergence in the Δ_h-norm of the error of the bi-quadratic spline approximations u_h^{(k)} = Σ_{i=1}^{M} Σ_{j=1}^{N} c_ij^{(k)} φ_i^x φ_j^y to u_h = Σ_{i=1}^{M} Σ_{j=1}^{N} c_ij φ_i^x φ_j^y computed by the preconditioned Richardson method. We will now obtain a result in the H¹-norm of the error. For Hermite cubic splines, the equivalence of the energy and H¹ norms follows easily from earlier work on Hermite cubic spline collocation, mainly [12] and [23]. In the case of QSC, we need first to show several results.

Lemma 4. Let p ∈ S_x.
Then (-p_xx, p)_x ≤ (3/2) ||p_x||²_{L²(0,1)}.

PROOF. Similarly as in the proof of Lemma 1, by applying (27) to p_x p_x in each subinterval [x_{i-1}, x_i], i = 1, ..., M, and summing up, we have

    0 ≤ (p_x, p_x)_x = (p_x, p_x) - Σ_{i=1}^{M} ∫_{x_{i-1}}^{x_i} (p_x p_x)_xx K_i(x) dx.

Thus Σ_{i=1}^{M} ∫_{x_{i-1}}^{x_i} (p_x p_x)_xx K_i(x) dx = (p_x, p_x) - (p_x, p_x)_x ≤ (p_x, p_x). Since p is a quadratic in [x_{i-1}, x_i], (p_x p_x)_xx = 2 (p_xx)² = 2 (p_xx p)_xx. Thus

    (-p_xx, p)_x = (p_x, p_x) + Σ_{i=1}^{M} ∫_{x_{i-1}}^{x_i} (p_xx p)_xx K_i(x) dx
                 = (p_x, p_x) + (1/2) ((p_x, p_x) - (p_x, p_x)_x) ≤ (3/2) (p_x, p_x).  QED.

We establish bounds on the quadratic spline basis functions similar to those shown in Lemma 5.4 and Theorem 5.5 of [23], using the technique of that paper.

Lemma 5. For i = 1, ..., M, j = 1, ..., M, ||φ_i^x||_{L_∞(x_{j-1}, x_j)} ≤ c θ^{|i-j|}, for constants c > 0 and 0 < θ < 1 independent of Δx.

PROOF. Using the form of the quadratic spline basis functions, a direct calculation bounds ||φ_i^x||_{L_∞(x_{j-1}, x_j)} by a geometrically decaying function of |i-j|, which leads to the desired result. QED.

Corollary 1. For i = 1, ..., M, j = 1, ..., M, |(φ_i^x, φ_j^x)_x| ≤ c Δx θ^{|i-j|}.

PROOF. Without loss of generality, let i ≤ j. Using Lemma 5, and a technique similar to that used for the proof of Theorem 5.5 in [23], we have

    |(φ_i^x, φ_j^x)_x| ≤ Σ_{k=1}^{M} Δx ||φ_i^x||_{L_∞(x_{k-1}, x_k)} ||φ_j^x||_{L_∞(x_{k-1}, x_k)} ≤ c² Δx Σ_{k=1}^{M} θ^{|i-k|} θ^{|j-k|} ≤ c' Δx (|i-j|+1) θ^{|i-j|}.

Now using the inequality (l+1) θ^l ≤ c'' θ'^l, which holds for any integer l ≥ 0 and some θ < θ' < 1, we get the desired result. QED.

The following lemma holds for quadratic splines and does not have a Hermite cubic spline equivalent, since quadratic splines are non-nodal, that is, the degrees of freedom of a quadratic spline do not represent values of the spline at some particular points. The lemma is needed in the proof of Theorem 4.

Lemma 6. Let c_i, i = 1, ..., M, be the coefficients (degrees of freedom) of the finite element representation of a quadratic spline p ∈ S_x. Then

    Σ_{i=1}^{M} c_i² ≤ c Σ_{i=1}^{M} (p(τ_i^x))².

PROOF. Given that p(τ_1^x) is a fixed linear combination of c_1 and c_2, p(τ_i^x) = (c_{i-1} + 6c_i + c_{i+1})/8 for 2 ≤ i ≤ M-1, and p(τ_M^x) is a fixed linear combination of c_{M-1} and c_M, the proof of this lemma, though tedious, uses only simple calculations. QED.

Theorem 4. Let p ∈ S_x. Then c_1 (p, p)_x ≤ ||p||²_{L²(0,1)} ≤ c_2 (p, p)_x, for constants c_1, c_2 > 0 independent of Δx.

PROOF. The left inequality is shown by considering an arbitrary subinterval [x_{i-1}, x_i], i = 1, ..., M, and showing that Δx (p(τ_i^x))² ≤ c ∫_{x_{i-1}}^{x_i} p²(x) dx by simple calculations, since p is a quadratic. To show the right inequality, consider the finite element representation p = Σ_{i=1}^{M} c_i φ_i^x. Then, using Corollary 1 (whose bound also holds for the continuous inner products (φ_i^x, φ_j^x)) and a technique similar to that used for the proof of Theorem 5.5 in [23], we have

    ||p||²_{L²(0,1)} = Σ_{i=1}^{M} Σ_{j=1}^{M} c_i c_j (φ_i^x, φ_j^x) ≤ c Δx Σ_{i=1}^{M} Σ_{j=1}^{M} |c_i| |c_j| θ'^{|i-j|} ≤ c' Δx Σ_{i=1}^{M} c_i².
Now using Lemma 6, we get ||p||²_{L²(0,1)} ≤ c' Δx Σ_{i=1}^{M} c_i² ≤ c'' Δx Σ_{i=1}^{M} (p(τ_i^x))² = c'' (p, p)_x, the desired result. QED.

We now extend Lemma 4 to two dimensions. In order to do this, we follow the approach in [23] and define a semidiscrete norm in S by

    |||w|||² = Σ_{j=1}^{N} Δy ∫_0^1 w_x²(x, τ_j^y) dx + Σ_{i=1}^{M} Δx ∫_0^1 w_y²(τ_i^x, y) dy.

Note that |||·||| is a seminorm in the space of differentiable functions in Ω̄.

Corollary 2. Let w ∈ S. Then c |||w||| ≤ ||∇w||_{L²(Ω)} ≤ C |||w|||, for positive constants c and C independent of Δx and Δy.

PROOF. The proof follows from Lemma 4 applied to both the x and y dimensions. QED.

The following theorem shows the equivalence of the Δ_h- and H¹-norms in the bi-quadratic spline space.

Theorem 5. Let w ∈ S. Then, for some positive constants c and C independent of Δx and Δy,

    c ||w||_{H¹(Ω)} ≤ ||w||_{Δ_h} ≤ C ||w||_{H¹(Ω)}.

PROOF. Consider the left inequality. By the definition of the (·,·)_{xy}, (·,·)_x and (·,·)_y inner products and by Lemma 1, we have

    ||w||²_{Δ_h} = (-w_xx, w)_{xy} + (-w_yy, w)_{xy}
                = Σ_{j=1}^{N} Δy (-w_xx(·, τ_j^y), w(·, τ_j^y))_x + Σ_{i=1}^{M} Δx (-w_yy(τ_i^x, ·), w(τ_i^x, ·))_y
                ≥ Σ_{j=1}^{N} Δy ∫_0^1 w_x²(x, τ_j^y) dx + Σ_{i=1}^{M} Δx ∫_0^1 w_y²(τ_i^x, y) dy = |||w|||².

Employing Corollary 2, then the Poincaré inequality, we get ||w||_{Δ_h} ≥ c ||∇w||_{L²(Ω)} ≥ c' ||w||_{H¹(Ω)}. The right inequality can be shown in a similar way, using Lemma 4 instead of Lemma 1. QED.

Having the equivalence of the Δ_h- and H¹-norms for bi-quadratic splines and using (50), we get

    ||u_h^{(k)} - u_h||_{H¹(Ω)} ≤ C ρ^k ||u_h^{(0)} - u_h||_{H¹(Ω)}    (53)

where u_h^{(k)} = Σ_{i=1}^{M} Σ_{j=1}^{N} c_ij^{(k)} φ_i^x φ_j^y is the bi-quadratic spline approximation computed at the k-th iteration of Richardson's method applied to L_h u_h = g_h, with preconditioner Δ_h and scaling factor τ, where τ and ρ are given by (51), and C is a constant independent of Δx, Δy and k. We can obtain relation (53) for the preconditioned MRES iterates too, by first establishing the spectral equivalence of the Δ_h and (L_h)^* (Δ_h)^{-1} L_h operators, as in Lemma 3.1 of [1], then employing Theorem 2.2 in the same paper.

4.6 Non-self-adjoint operators

In this section, we extend the result of Theorem 3 to non-self-adjoint operators, without cross-derivative term. Consider the BVP described by the operator equation (1), where L is given by

    Lu = -(a(x,y) u_x)_x - (c(x,y) u_y)_y + d(x,y) u_x + e(x,y) u_y + f(x,y) u    (54)

and homogeneous Dirichlet boundary conditions on ∂Ω (26). Let L_h be the QSC operator from S into S corresponding to L in (54).

Theorem 6. Assume that a(x,y) and d(x,y) are C³ in Ω̄ with respect to x, c(x,y) and e(x,y) are C³ in Ω̄ with respect to y, f(x,y) is bounded in Ω̄, and 0 < α ≤ a(x,y), c(x,y) ≤ β for all (x,y) ∈ Ω̄. Then, for all v, w ∈ S,

    (α - C(δ(Δx,Δy) + ε(Δx,Δy)) + γ̄/Λ_min(Δx,Δy)) Δ_h ≤ L_h ≤ (β + C(δ(Δx,Δy) + ε(Δx,Δy)) + Γ̄/Λ_min(Δx,Δy)) Δ_h    (55)

and

    ((L_h - L_h^*) v, (Δ_h)^{-1} (L_h - L_h^*) v)_{xy} ≤ (C(δ(Δx,Δy) + ε(Δx,Δy)))² (Δ_h v, v)_{xy}    (56)

where C is a positive constant independent of Δx and Δy, δ(Δx,Δy) is defined in Theorem 1, Λ_min(Δx,Δy) in Theorem 2, and

    γ̄ = min(0, min_{Ω̄} f̄(x,y)),  Γ̄ = max(0, max_{Ω̄} f̄(x,y)),
    ε(Δx,Δy) = max{ Δx max(||d||_∞, ||d_x||_∞) + Δx² max(||d_xx||_∞, ||d_xxx||_∞), Δy max(||e||_∞, ||e_y||_∞) + Δy² max(||e_yy||_∞, ||e_yyy||_∞) },

where f̄(x,y) = f(x,y) - (d_x(x,y) + e_y(x,y))/2.

PROOF. We can rewrite Lu to get

    Lu = -(a(x,y) u_x)_x - (c(x,y) u_y)_y + (1/2)(d(x,y) u_x + (d(x,y) u)_x) + (1/2)(e(x,y) u_y + (e(x,y) u)_y) + f̄(x,y) u.

For any j, 1 ≤ j ≤ N, by applying (27) to (1/2)(d v_x + (d v)_x) w, evaluated at y = τ_j^y, in each subinterval [x_{i-1}, x_i], i = 1, ..., M, summing up and using integration by parts and Leibniz's differentiation formula, we get

    ((1/2)(d v_x + (d v)_x)(·, τ_j^y), w(·, τ_j^y))_x = I_3(v, w; τ_j^y) + I_4(v, w; τ_j^y)    (57)

where

    I_3(v, w; τ_j^y) = (1/2) ∫_0^1 (d (v_x w - v w_x))(x, τ_j^y) dx,

and I_4(v, w; τ_j^y) is a finite sum of quadrature-error terms of the form

    α_lmn Σ_{i=1}^{M} ∫_{x_{i-1}}^{x_i} (d^(l) v^(m) w^(n))(x, τ_j^y) K_i(x) dx,

with positive constants α_lmn arising from Leibniz's formula. By using (57) and a similar approach to that in the proof of Theorem 1, we can show that

    (L_h v, w)_{xy} = a_1(v, w) + a_2(v, w) + b_1(v, w) + b_2(v, w) + (f̄ v, w)_{xy}

where the bilinear forms a_1 and a_2 satisfy (29)-(31) and

    b_1(v, w) = -b_1(w, v), so that b_1(v, v) = 0,
    |b_2(v, w)| ≤ C ε(Δx, Δy) (Δ_h v, v)^{1/2}_{xy} (Δ_h w, w)^{1/2}_{xy}.

The proof of the inequalities (55) and (56) now follows in a similar way to that of the proof for the inequalities (45) and (46) in Theorem 3. QED.

Theorem 6 is the QSC counterpart of Theorem 4.1 in [1]. Note that, for QSC as for Hermite cubic spline collocation, we cannot predict an approximation to the optimal scaling factor τ for Richardson's method when the PDE operator is non-self-adjoint. Theorem 6 allows relations (50) and (53) to hold also for the QSC equations arising from (54), assuming that the condition f(x,y) > -2π²α is replaced by f̄(x,y) > -2π²α, where f̄ is defined in Theorem 6.
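The convergence behavior guaranteed by (48)-(50) can be illustrated on a generic model (Python/NumPy; the matrices are small random stand-ins for the operators, not actual QSC matrices): a preconditioned Richardson iteration, with an SPD preconditioner B spectrally equivalent to the symmetric part of a slightly nonsymmetric A, contracts the error in the B-energy norm at each step. Here the symmetric part of A is exactly 2B, so the scaling factor is taken as τ = 2/(γ_1 + γ_2) = 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
# SPD 'preconditioner' B, and A = 2B plus a small skew-symmetric perturbation
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
B = Q @ np.diag(rng.uniform(1.0, 4.0, n)) @ Q.T
P = rng.standard_normal((n, n))
A = 2.0 * B + 0.05 * (P - P.T)

u_exact = rng.standard_normal(n)
g = A @ u_exact
tau = 0.5                               # 2/(gamma_1+gamma_2) since A_s = 2B

def energy(e):                          # B-energy norm ||e||_B
    return np.sqrt(e @ (B @ e))

u = np.zeros(n)
errs = [energy(u - u_exact)]
for _ in range(10):
    u = u + tau * np.linalg.solve(B, g - A @ u)   # preconditioned Richardson
    errs.append(energy(u - u_exact))

# monotone contraction of the B-energy error, at a rate set by the skew part
assert all(e1 < 0.9 * e0 for e0, e1 in zip(errs, errs[1:]))
print(f"error reduced by factor {errs[-1] / errs[0]:.2e} in 10 iterations")
```

With the skew perturbation scaled down, the contraction factor approaches 0, mirroring how ρ in (51) is governed by the spread γ_2 - γ_1 and the nonsymmetric bound γ_3.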