Poisson Solvers

William McLean

April 21, 2004

Return to Math3301/Math5315 Common Material.

1 Introduction

Many problems in applied mathematics lead to a partial differential equation of the form

$$ -a \nabla^2 u + \mathbf{b} \cdot \nabla u + c u = f \quad \text{in } \Omega. \tag{1} $$

Here, $\Omega$ is an open subset of $\mathbb{R}^d$ for $d = 1$, $2$ or $3$; the coefficients $a$, $\mathbf{b}$ and $c$, together with the source term $f$, are given functions on $\Omega$, and we want to determine the unknown function $u : \Omega \to \mathbb{R}$. The definition of the vector differentiation operator $\nabla$ means that, on the left-hand side,

$$ \mathbf{b} \cdot \nabla u = \sum_{i=1}^{d} b_i \frac{\partial u}{\partial x_i}, $$

and $\nabla^2 u$, the Laplacian of $u$, is given by

$$ \nabla^2 u = \nabla \cdot (\nabla u) = \operatorname{div} \operatorname{grad} u = \sum_{i=1}^{d} \frac{\partial^2 u}{\partial x_i^2}. $$

In addition to satisfying (1), the solution must obey some boundary conditions on $\partial\Omega$, the boundary of $\Omega$. The simplest is a Dirichlet boundary condition:

$$ u = g \quad \text{on } \partial\Omega, \tag{2} $$

for a given function $g$. A Neumann boundary condition takes the form

$$ \frac{\partial u}{\partial n} = g \quad \text{on } \partial\Omega, \tag{3} $$

where $n$ is the outward unit normal to $\partial\Omega$.

We will assume that (1) is uniformly elliptic. This means that there exist positive constants $a_{\min}$ and $a_{\max}$ such that the coefficient $a$ satisfies

$$ a_{\min} \le a(x) \le a_{\max} \quad \text{for all } x \in \Omega. $$

The simplest example is Poisson's equation, which arises when $a$ is a positive constant, $\mathbf{b} = 0$ and $c = 0$:

$$ -a \nabla^2 u = f \quad \text{in } \Omega. \tag{4} $$

An elliptic PDE like (1) together with suitable boundary conditions like (2) or (3) constitutes an elliptic boundary value problem. The main types of numerical methods for solving such problems are as follows.

- Finite difference methods are the simplest to describe and the easiest to implement, provided the domain $\Omega$ has a reasonably simple shape;
- Finite element methods are capable of handling very general domains;
- Spectral methods can achieve very high accuracy but are practical only for very simple domains.

We will discuss only the first of these methods.

2 Central Difference Approximations in 1D

In the 1-dimensional case ($d = 1$), the set $\Omega$ is just an open interval and the PDE (1) reduces to an ODE:

$$ -a u'' + b u' + c u = f \quad \text{in } \Omega. \tag{5} $$

We assume $\Omega = (0, L)$ and then divide this interval into $N$ subintervals, each of length $h = L/N$, by defining grid points $x_i = ih$ for $i = 0, 1, 2, \dots, N$. The next lemma shows that the finite difference approximations

$$ u'(x) \approx \frac{u(x+h) - u(x-h)}{2h}, \qquad u''(x) \approx \frac{u(x-h) - 2u(x) + u(x+h)}{h^2} \tag{6} $$

are accurate to $O(h^2)$.

Lemma 1. If $u$ is sufficiently differentiable in a neighbourhood of $x$ then, as $h \to 0$,

$$ \frac{u(x+h) - u(x-h)}{2h} = u'(x) + \frac{u'''(x)}{6}\, h^2 + O(h^4), \tag{7} $$

$$ \frac{u(x-h) - 2u(x) + u(x+h)}{h^2} = u''(x) + \frac{u^{(4)}(x)}{12}\, h^2 + O(h^4). \tag{8} $$

Proof. Make the Taylor expansion

$$ u(x+h) = u(x) + u'(x)h + \frac{u''(x)}{2} h^2 + \frac{u'''(x)}{3!} h^3 + \frac{u^{(4)}(x)}{4!} h^4 + \frac{u^{(5)}(x)}{5!} h^5 + O(h^6), $$

and then replace $h$ with $-h$ to obtain

$$ u(x-h) = u(x) - u'(x)h + \frac{u''(x)}{2} h^2 - \frac{u'''(x)}{3!} h^3 + \frac{u^{(4)}(x)}{4!} h^4 - \frac{u^{(5)}(x)}{5!} h^5 + O(h^6). $$

Subtracting these two expansions gives

$$ u(x+h) - u(x-h) = 2\left[ u'(x)h + \frac{u'''(x)}{3!} h^3 \right] + O(h^5), $$

from which (7) follows, whereas by adding them we see that

$$ u(x+h) + u(x-h) = 2\left[ u(x) + \frac{u''(x)}{2} h^2 + \frac{u^{(4)}(x)}{4!} h^4 \right] + O(h^6), $$

and so (8) holds.
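Lemma 1 is easy to confirm numerically. The following NumPy snippet (not part of the original notes) evaluates both difference quotients for $u = e^x$ at $x = 1$; halving $h$ should reduce each error by a factor of about 4.

```python
import numpy as np

# Check the O(h^2) errors of the central differences (6) for u(x) = exp(x),
# where u = u' = u'' = exp, at the point x = 1.
u, du, d2u = np.exp, np.exp, np.exp
x = 1.0
for h in [0.1, 0.05, 0.025]:
    err1 = abs((u(x + h) - u(x - h)) / (2 * h) - du(x))
    err2 = abs((u(x - h) - 2 * u(x) + u(x + h)) / h**2 - d2u(x))
    print(f"h = {h:6.3f}   |error in u'| = {err1:.2e}   |error in u''| = {err2:.2e}")
```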

Applying the approximations (6) to the ODE (5), we arrive at a finite difference equation

$$ a_i\, \frac{-u_{i-1} + 2u_i - u_{i+1}}{h^2} + b_i\, \frac{u_{i+1} - u_{i-1}}{2h} + c_i u_i = f_i \quad \text{for } 1 \le i \le N-1. \tag{9} $$

Here,

$$ a_i = a(x_i), \quad b_i = b(x_i), \quad c_i = c(x_i), \quad f_i = f(x_i) $$

and, we hope, $u_i \approx u(x_i)$. At this stage we have $N+1$ unknowns $u_0, u_1, \dots, u_N$ but only $N-1$ equations (9). To arrive at a square linear system we need two more equations, which will come from the boundary conditions at $x = 0$ and at $x = L$. For simplicity, suppose that we have a Dirichlet problem, i.e., $u(0) = g(0)$ and $u(L) = g(L)$. We therefore set

$$ u_0 = g_0 \quad \text{and} \quad u_N = g_N, $$

leaving an $(N-1) \times (N-1)$ system

$$ K\mathbf{u} = \mathbf{f} + \mathbf{g} $$

whose solution is the vector of unknowns

$$ \mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_{N-1} \end{bmatrix}. $$

The coefficient matrix has the form

$$ K = A D^{(2)} + B D^{(1)} + C \tag{10} $$

with diagonal matrices

$$ A = \operatorname{diag}(a_1, a_2, \dots, a_{N-1}), \quad B = \operatorname{diag}(b_1, b_2, \dots, b_{N-1}), \quad C = \operatorname{diag}(c_1, c_2, \dots, c_{N-1}), $$

and tri-diagonal matrices

$$ D^{(1)} = \frac{1}{2h} \begin{bmatrix} 0 & 1 & & & \\ -1 & 0 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 0 & 1 \\ & & & -1 & 0 \end{bmatrix}, \qquad D^{(2)} = \frac{1}{h^2} \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix}. $$

The vectors on the right-hand side are

$$ \mathbf{f} = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_{N-2} \\ f_{N-1} \end{bmatrix} \quad \text{and} \quad \mathbf{g} = \frac{1}{h^2} \begin{bmatrix} a_1 g_0 \\ 0 \\ \vdots \\ 0 \\ a_{N-1} g_N \end{bmatrix} + \frac{1}{2h} \begin{bmatrix} b_1 g_0 \\ 0 \\ \vdots \\ 0 \\ -b_{N-1} g_N \end{bmatrix}. $$
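As an illustration, here is a minimal Python sketch (the course code itself is Fortran 95) that assembles $K = A D^{(2)} + B D^{(1)} + C$ and the boundary vector $\mathbf{g}$, then solves the system. The function name `solve_bvp` and the dense storage are choices made here for clarity, not features of the notes.

```python
import numpy as np

def solve_bvp(a, b, c, f, g0, gL, L=1.0, N=100):
    """Solve -a u'' + b u' + c u = f on (0, L), u(0) = g0, u(L) = gL,
    via the central differences (6).  a, b, c, f are callables."""
    h = L / N
    x = np.linspace(0.0, L, N + 1)
    xi = x[1:-1]                              # interior grid points x_1, ..., x_{N-1}
    ai, bi, ci, fi = a(xi), b(xi), c(xi), f(xi)
    # K = A D^(2) + B D^(1) + C as a dense (N-1) x (N-1) matrix
    K = (np.diag(2 * ai / h**2 + ci)
         + np.diag(-ai[1:] / h**2 - bi[1:] / (2 * h), -1)
         + np.diag(-ai[:-1] / h**2 + bi[:-1] / (2 * h), +1))
    rhs = fi.copy()
    rhs[0]  += (ai[0] / h**2 + bi[0] / (2 * h)) * g0    # boundary vector g
    rhs[-1] += (ai[-1] / h**2 - bi[-1] / (2 * h)) * gL
    u = np.empty(N + 1)
    u[0], u[1:-1], u[-1] = g0, np.linalg.solve(K, rhs), gL
    return x, u

# Example: -u'' = pi^2 sin(pi x), u(0) = u(1) = 0, exact solution sin(pi x)
x, u = solve_bvp(lambda x: np.ones_like(x), lambda x: np.zeros_like(x),
                 lambda x: np.zeros_like(x),
                 lambda x: np.pi**2 * np.sin(np.pi * x), 0.0, 0.0)
print("max error:", np.abs(u - np.sin(np.pi * x)).max())   # O(h^2), ~ 8e-5 for N = 100
```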

3 The Five-Point Discrete Laplacian

In this section we will treat a simple, two-dimensional problem ($d = 2$). Consider a rectangular domain in $\mathbb{R}^2$,

$$ \Omega = (0, L_1) \times (0, L_2). $$

We choose positive integers $N_1$ and $N_2$, define step sizes in the horizontal and vertical directions,

$$ h_1 = \frac{L_1}{N_1} \quad \text{and} \quad h_2 = \frac{L_2}{N_2}, $$

and introduce the grid points

$$ x_{ij} = (ih_1, jh_2) \quad \text{for } 0 \le i \le N_1 \text{ and } 0 \le j \le N_2. $$

Let us solve Poisson's equation (4) subject to a Dirichlet boundary condition (2). We write

$$ u_{ij} \approx u(x_{ij}), \quad f_{ij} = f(x_{ij}), \quad g_{ij} = g(x_{ij}), $$

and note that, in view of the second approximation in (6),

$$ \frac{\partial^2 u}{\partial x_1^2} \approx \frac{u(x_1 - h_1, x_2) - 2u(x_1, x_2) + u(x_1 + h_1, x_2)}{h_1^2}, $$

$$ \frac{\partial^2 u}{\partial x_2^2} \approx \frac{u(x_1, x_2 - h_2) - 2u(x_1, x_2) + u(x_1, x_2 + h_2)}{h_2^2}, $$

so $(\nabla^2 u)(x_{ij})$ may be approximated by the discrete Laplacian

$$ \nabla_h^2 u_{ij} = \frac{u_{i-1,j} - 2u_{ij} + u_{i+1,j}}{h_1^2} + \frac{u_{i,j-1} - 2u_{ij} + u_{i,j+1}}{h_2^2}, $$

where $h = (h_1, h_2)$. The finite difference stencil for $\nabla_h^2$ consists of five points arranged in a star shaped like a + sign. Denote the set of interior grid points by

$$ \Omega_h = \{\, (ih_1, jh_2) : 1 \le i \le N_1 - 1,\ 1 \le j \le N_2 - 1 \,\} $$

and the set of boundary grid points by

$$ \partial\Omega_h = \{\, (ih_1, jh_2) : i = 0,\ 0 \le j \le N_2 \,\} \cup \{\, (ih_1, jh_2) : i = N_1,\ 0 \le j \le N_2 \,\} \cup \{\, (ih_1, jh_2) : 1 \le i \le N_1 - 1,\ j = 0 \,\} \cup \{\, (ih_1, jh_2) : 1 \le i \le N_1 - 1,\ j = N_2 \,\}. $$
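A quick NumPy check of $\nabla_h^2$ may help (again, this snippet is illustrative and not from the notes): since the error terms in (8) involve fourth derivatives, the five-point operator reproduces the Laplacian of any quadratic exactly.

```python
import numpy as np

def discrete_laplacian(U, h1, h2):
    """Apply the five-point operator to the interior of a grid function
    U[i, j] ~ u(i*h1, j*h2); returns values at the interior points only."""
    return ((U[:-2, 1:-1] - 2 * U[1:-1, 1:-1] + U[2:, 1:-1]) / h1**2
            + (U[1:-1, :-2] - 2 * U[1:-1, 1:-1] + U[1:-1, 2:]) / h2**2)

# Sanity check on u(x, y) = x^2 + y^2, for which the Laplacian is 4 exactly
h1, h2 = 0.1, 0.2
x = np.arange(0, 1 + h1 / 2, h1)[:, None]
y = np.arange(0, 2 + h2 / 2, h2)[None, :]
U = x**2 + y**2
print(np.allclose(discrete_laplacian(U, h1, h2), 4.0))   # True: exact for quadratics
```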

Our discrete approximation to (4) and (2) may then be written as

$$ -a \nabla_h^2 u_{ij} = f_{ij} \quad \text{for } x_{ij} \in \Omega_h, \qquad u_{ij} = g_{ij} \quad \text{for } x_{ij} \in \partial\Omega_h. $$

This discrete problem is equivalent to an $M \times M$ linear system, where

$$ M = (N_1 - 1)(N_2 - 1). $$

In fact, if we put

$$ \mathbf{u}_j = \begin{bmatrix} u_{1j} \\ u_{2j} \\ \vdots \\ u_{N_1-1,j} \end{bmatrix} \quad \text{and} \quad \mathbf{f}_j = \begin{bmatrix} f_{1j} \\ f_{2j} \\ \vdots \\ f_{N_1-1,j} \end{bmatrix} \quad \text{for } 1 \le j \le N_2 - 1, \tag{11} $$

then

$$ a D^{(2)}_{h_1} \mathbf{u}_1 + a\, \frac{2\mathbf{u}_1 - \mathbf{u}_2}{h_2^2} = \mathbf{f}_1 + \mathbf{g}_1, $$

$$ a D^{(2)}_{h_1} \mathbf{u}_j + a\, \frac{-\mathbf{u}_{j-1} + 2\mathbf{u}_j - \mathbf{u}_{j+1}}{h_2^2} = \mathbf{f}_j + \mathbf{g}_j, \quad 2 \le j \le N_2 - 2, $$

$$ a D^{(2)}_{h_1} \mathbf{u}_{N_2-1} + a\, \frac{-\mathbf{u}_{N_2-2} + 2\mathbf{u}_{N_2-1}}{h_2^2} = \mathbf{f}_{N_2-1} + \mathbf{g}_{N_2-1}, $$

where

$$ \mathbf{g}_j = \frac{a}{h_1^2} \begin{bmatrix} g_{0j} \\ 0 \\ \vdots \\ 0 \\ g_{N_1,j} \end{bmatrix} \quad \text{for } 2 \le j \le N_2 - 2, $$

with

$$ \mathbf{g}_1 = \frac{a}{h_1^2} \begin{bmatrix} g_{01} \\ 0 \\ \vdots \\ 0 \\ g_{N_1,1} \end{bmatrix} + \frac{a}{h_2^2} \begin{bmatrix} g_{10} \\ g_{20} \\ \vdots \\ g_{N_1-2,0} \\ g_{N_1-1,0} \end{bmatrix} \quad \text{and} \quad \mathbf{g}_{N_2-1} = \frac{a}{h_1^2} \begin{bmatrix} g_{0,N_2-1} \\ 0 \\ \vdots \\ 0 \\ g_{N_1,N_2-1} \end{bmatrix} + \frac{a}{h_2^2} \begin{bmatrix} g_{1,N_2} \\ g_{2,N_2} \\ \vdots \\ g_{N_1-2,N_2} \\ g_{N_1-1,N_2} \end{bmatrix}. $$

Thus, if we define the $M$-dimensional vectors

$$ \mathbf{u} = \begin{bmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \\ \vdots \\ \mathbf{u}_{N_2-1} \end{bmatrix}, \quad \mathbf{f} = \begin{bmatrix} \mathbf{f}_1 \\ \mathbf{f}_2 \\ \vdots \\ \mathbf{f}_{N_2-1} \end{bmatrix}, \quad \mathbf{g} = \begin{bmatrix} \mathbf{g}_1 \\ \mathbf{g}_2 \\ \vdots \\ \mathbf{g}_{N_2-1} \end{bmatrix}, $$

then

$$ K\mathbf{u} = \mathbf{f} + \mathbf{g}, \tag{12} $$

where, with $I_1$ denoting the $(N_1 - 1) \times (N_1 - 1)$ identity matrix,

$$ K = a \begin{bmatrix} D^{(2)}_{h_1} & & & \\ & D^{(2)}_{h_1} & & \\ & & \ddots & \\ & & & D^{(2)}_{h_1} \end{bmatrix} + \frac{a}{h_2^2} \begin{bmatrix} 2I_1 & -I_1 & & \\ -I_1 & 2I_1 & -I_1 & \\ & \ddots & \ddots & \ddots \\ & & -I_1 & 2I_1 \end{bmatrix}. $$

The $M \times M$ matrix $K$ can be expressed succinctly using the Kronecker product. For any matrices $A = [a_{ij}] \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times q}$ we define

$$ A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix} \in \mathbb{R}^{(mp) \times (nq)}. $$

Thus, if $I_2$ denotes the $(N_2 - 1) \times (N_2 - 1)$ identity matrix, then

$$ K = a\, I_2 \otimes D^{(2)}_{h_1} + a\, D^{(2)}_{h_2} \otimes I_1. \tag{13} $$
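Formula (13) translates directly into code. The sketch below (illustrative, using scipy.sparse rather than the course's Fortran routines; the name `poisson_matrix` is mine) assembles $K$ as a sparse matrix.

```python
import scipy.sparse as sp

def poisson_matrix(a, N1, N2, L1, L2):
    """Assemble K = a*(I2 kron D2_h1) + a*(D2_h2 kron I1) from (13),
    an M x M sparse matrix with M = (N1 - 1) * (N2 - 1)."""
    def D2(N, L):
        h = L / N
        return sp.diags([-1, 2, -1], [-1, 0, 1], shape=(N - 1, N - 1)) / h**2
    I1 = sp.identity(N1 - 1)
    I2 = sp.identity(N2 - 1)
    return a * (sp.kron(I2, D2(N1, L1)) + sp.kron(D2(N2, L2), I1))

K = poisson_matrix(1.0, 4, 3, 1.0, 1.0)
print(K.shape)       # (6, 6): M = 3 * 2
print(K.toarray())   # block tri-diagonal, bandwidth N1 - 1 = 3
```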

4 Matrices with Special Structure

When solving linear systems it is important to take advantage of any special structure present in the coefficient matrix. Lapack provides many routines that exploit such structure.

4.1 Bandwidth

We say that the matrix $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ has upper bandwidth $q$ if

$$ a_{ij} = 0 \quad \text{whenever } j > i + q. $$

Likewise, $A$ has lower bandwidth $p$ if

$$ a_{ij} = 0 \quad \text{whenever } i > j + p. $$

When $p = q$ we refer to their common value simply as the bandwidth of $A$. For instance, the matrix (10) arising from the 1D elliptic equation (5) has bandwidth 1, whereas for the 2D Poisson equation (4) we obtain a matrix (13) with bandwidth $N_1 - 1$.

Theorem 2. Suppose that $A \in \mathbb{R}^{n \times n}$ has an LU-factorization $A = LU$.

1. If $A$ has upper bandwidth $q$ then so does $U$.
2. If $A$ has lower bandwidth $p$ then so does $L$.

Proof. We use induction on $n$. The case $n = 1$ is trivial, so let $n > 1$ and assume the theorem holds for any matrix in $\mathbb{R}^{(n-1) \times (n-1)}$. Write

$$ v = \begin{bmatrix} a_{21} \\ \vdots \\ a_{n1} \end{bmatrix}, \quad w = \begin{bmatrix} a_{12} \\ \vdots \\ a_{1n} \end{bmatrix}, \quad B = \begin{bmatrix} a_{22} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{n2} & \cdots & a_{nn} \end{bmatrix}, $$

so that

$$ A = \begin{bmatrix} a_{11} & w^T \\ v & B \end{bmatrix}; $$

then

$$ A = \begin{bmatrix} 1 & 0^T \\ v/a_{11} & I \end{bmatrix} \begin{bmatrix} 1 & 0^T \\ 0 & B - vw^T/a_{11} \end{bmatrix} \begin{bmatrix} a_{11} & w^T \\ 0 & I \end{bmatrix}. $$

If $A$ has upper bandwidth $q$ and lower bandwidth $p$, then so does the matrix $B - vw^T/a_{11} \in \mathbb{R}^{(n-1) \times (n-1)}$. But

$$ L = \begin{bmatrix} 1 & 0^T \\ v/a_{11} & L_1 \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} a_{11} & w^T \\ 0 & U_1 \end{bmatrix} \quad \text{with} \quad B - vw^T/a_{11} = L_1 U_1. $$

Hence, the induction hypothesis implies that $L_1$ has lower bandwidth $p$ and $U_1$ has upper bandwidth $q$. It follows that the same is true for $L$ and $U$, so the induction goes through.
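Theorem 2 is easy to observe numerically. The sketch below (not from the notes) factors a random banded matrix that is diagonally dominant, so that partial pivoting should leave the rows alone ($P = I$), and reports the bandwidths of the factors.

```python
import numpy as np
from scipy.linalg import lu

def bandwidths(A, tol=1e-12):
    i, j = np.nonzero(np.abs(A) > tol)
    return (i - j).max(), (j - i).max()   # (lower p, upper q)

rng = np.random.default_rng(0)
n, p, q = 8, 2, 1
# Random matrix with lower bandwidth p, upper bandwidth q, dominant diagonal
A = np.tril(np.triu(rng.random((n, n)), -p), q) + 10 * np.eye(n)
P, L, U = lu(A)                           # here P should be the identity
print("A:", bandwidths(A), "  L:", bandwidths(L), "  U:", bandwidths(U))
# Expected output: A: (2, 1)   L: (2, 0)   U: (0, 1)
```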

For $A$ as in Theorem 2 we see that

$$ (LU)_{ij} = \sum_{k=1}^{n} l_{ik} u_{kj} = \begin{cases} \displaystyle u_{ij} + \sum_{k=\max(1,\,i-p,\,j-q)}^{i-1} l_{ik} u_{kj} & \text{if } 1 \le i \le j \le n \text{ and } j \le i + q, \\[2ex] \displaystyle l_{ij} u_{jj} + \sum_{k=\max(1,\,i-p,\,j-q)}^{j-1} l_{ik} u_{kj} & \text{if } 1 \le j < i \le n \text{ and } i \le j + p, \\[2ex] 0 & \text{otherwise}, \end{cases} $$

so

$$ u_{ij} = a_{ij} - \sum_{k=\max(1,\,i-p,\,j-q)}^{i-1} l_{ik} u_{kj} \quad \text{for } 1 \le i \le j \le n \text{ and } j \le i + q, $$

$$ l_{ij} = \frac{1}{u_{jj}} \Bigl( a_{ij} - \sum_{k=\max(1,\,i-p,\,j-q)}^{j-1} l_{ik} u_{kj} \Bigr) \quad \text{for } 1 \le j < i \le n \text{ and } i \le j + p. \tag{14} $$

Suppose for simplicity that $p = q$. The number of flops to compute $L$ is then about

$$ \sum_{j=1}^{n-1} \sum_{i=j+1}^{\min(n,\,j+p)} \bigl[\, j - \max(1,\, i-p) \,\bigr] \le \sum_{j=1}^{n} \sum_{i=j+1}^{j+p} (j - i + p) \approx \tfrac{1}{2} n p^2, $$

and similarly we need about $\tfrac{1}{2} n p^2$ flops to compute $U$, so the LU-factorization costs $O(np^2)$ flops. In particular, if $p \ll n$ then this cost is much less than the $O(n^3/3)$ cost of factoring a full (i.e., non-banded) $n \times n$ matrix.

In general, the band LU-factorization must incorporate partial pivoting to avoid numerical instability, so that we have $PA = LU$. In this case, it can be shown that if $A$ has lower bandwidth $p$ and upper bandwidth $q$, then $U$ has upper bandwidth $p + q$, but $L$ might have lower bandwidth as large as $n - 1$.
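In practice one calls a library band solver rather than coding (14) by hand; the notes do this through Lapack from Fortran, and in Python the same Lapack band routines are reachable via scipy.linalg.solve_banded. A tridiagonal ($p = q = 1$) example, purely illustrative:

```python
import numpy as np
from scipy.linalg import solve_banded

# solve_banded((l, u), ab, b) stores entry a[i, j] in ab[u + i - j, j]:
# for a tridiagonal matrix, row 0 = superdiagonal, row 1 = diagonal, row 2 = subdiagonal.
n = 1_000_000
ab = np.empty((3, n))
ab[0, :] = -1.0     # superdiagonal
ab[1, :] = 2.0      # main diagonal
ab[2, :] = -1.0     # subdiagonal
b = np.ones(n)
x = solve_banded((1, 1), ab, b)   # O(n p^2) work, here effectively O(n)
print(x[:3])
```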

4.2 Symmetry and Positivity

Recall that a matrix $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ is symmetric if $A^T = A$, i.e., if $a_{ji} = a_{ij}$ for all $i$, $j$. In this case, $A$ is said to be positive-definite if

$$ x^T A x > 0 \quad \text{for all non-zero } x \in \mathbb{R}^n. $$

Theorem 3. A matrix $A \in \mathbb{R}^{n \times n}$ is symmetric and positive-definite iff there exists a non-singular, lower-triangular matrix $L$ such that $A = LL^T$.

Proof. Omitted.

We call $A = LL^T$ the Cholesky factorization of $A$. Given the Cholesky factorization, we can solve a linear system $Ax = b$ by forward elimination and back substitution, just as we would using an LU-factorization, since $L^T$ is upper triangular. The Cholesky factor $L$ is unique:

$$ (LL^T)_{ij} = \sum_{k=1}^{n} l_{ik} l_{jk} = \sum_{k=1}^{\min(i,j)} l_{ik} l_{jk}, $$

so $A = LL^T$ iff

$$ l_{ii} = \sqrt{a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2} \quad \text{for } 1 \le i \le n, $$

$$ l_{ij} = \frac{1}{l_{jj}} \Bigl( a_{ij} - \sum_{k=1}^{j-1} l_{ik} l_{jk} \Bigr) \quad \text{for } 1 \le j < i \le n. $$

These formulae may be used to compute $L$; of course, if the matrix $A$ is not positive-definite then the algorithm will fail because one of the $l_{ii}$ would be the square root of a negative number. It is easy to verify that the Cholesky factorization costs about $n^3/6$ flops, i.e., about half the cost of the LU-factorization.
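The column-oriented sketch below (illustrative only, not the course code) implements these formulae and checks the result against a library Cholesky routine.

```python
import numpy as np

def cholesky(A):
    """Compute the lower-triangular L with A = L L^T using the formulae above."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for i in range(n):
        # np.sqrt fails (returns nan) here if A is not positive-definite
        L[i, i] = np.sqrt(A[i, i] - L[i, :i] @ L[i, :i])
        for j in range(i + 1, n):
            L[j, i] = (A[j, i] - L[j, :i] @ L[i, :i]) / L[i, i]
    return L

A = np.array([[4., 2., 0.], [2., 5., 2.], [0., 2., 5.]])
L = cholesky(A)
print(np.allclose(L @ L.T, A), np.allclose(L, np.linalg.cholesky(A)))   # True True
```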

If $A$ is positive-definite then

$$ l_{ij}^2 \le \sum_{k=1}^{i} l_{ik}^2 = a_{ii}, $$

so

$$ |l_{ij}| \le \sqrt{a_{ii}} \quad \text{for } 1 \le j \le i \le n. $$

Thus, we cannot encounter the sort of numerical instability that may happen in an LU-factorization without pivoting. It can also be shown that if $Ax = b$ is solved numerically via the Cholesky factorization of $A$ then the computed solution $\hat x$ is the exact solution of a perturbed linear system

$$ (A + E)\hat x = b \quad \text{with } \|E\|_2 \le C_n \epsilon \|A\|_2. $$

Furthermore, no square roots of negative numbers will arise provided

$$ \operatorname{cond}_2(A)\, C_n \epsilon < 1. $$

In both cases, the constant $C_n$ is of moderate size. If $A$ is not only symmetric and positive-definite but also has bandwidth $p$, then the Cholesky factor $L$ has bandwidth $p$ and can be computed in about $np^2/2$ flops.

4.3 Diagonal Dominance

The matrix $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ is row-diagonally dominant if

$$ |a_{ii}| > \sum_{\substack{j=1 \\ j \ne i}}^{n} |a_{ij}| \quad \text{for } 1 \le i \le n. $$

The following result shows that in this case, pivoting is not required.

Theorem 4. If $A \in \mathbb{R}^{n \times n}$ is row-diagonally dominant then it has an LU-factorization $A = LU$ with $|l_{ij}| \le 1$ for all $i > j$.

Proof. Omitted.

5 Direct Solution of a Poisson Problem

We return to the problem discussed in Section 3. The Fortran module poisson_solvers.f95 contains several subroutines that set up and solve the linear system (12). The subroutine direct_solve constructs a full $M \times M$ matrix together with the vector of right-hand sides and then calls the Lapack routine sposv to compute the solution. This approach is very inefficient, both in memory usage and CPU time, because the bandwidth $N_1 - 1$ of the coefficient matrix $K$ is much smaller than $M = (N_1 - 1)(N_2 - 1)$.

A better approach is used by band_solve, which constructs $K$ in band storage mode: the $ij$-entry is stored in position $(i - j + 1, j)$ of an $N_1 \times M$ array. Thus, the main diagonal of $K$ is stored in the first row of the array, the leading subdiagonal in the second row, and so on, until the $(N_1 - 1)$-th subdiagonal is stored in row $N_1$.
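This is the same lower-band symmetric storage that scipy.linalg.solveh_banded (a wrapper around the Lapack band solvers) expects with lower=True, where the 0-based convention is ab[i - j, j] = a[i, j]. A Python analogue of band_solve can therefore be sketched as follows; the name mirrors the Fortran routine, but the code is an illustration only and assumes zero Dirichlet data for brevity.

```python
import numpy as np
from scipy.linalg import solveh_banded

def band_solve(a, F, L1, L2):
    """Solve the discrete Poisson problem (12) with zero Dirichlet data.
    F[i-1, j-1] = f(i*h1, j*h2) at the interior grid points."""
    N1, N2 = F.shape[0] + 1, F.shape[1] + 1
    h1, h2 = L1 / N1, L2 / N2
    m, M = N1 - 1, (N1 - 1) * (N2 - 1)
    ab = np.zeros((m + 1, M))                    # bandwidth N1 - 1, so m + 1 rows
    ab[0, :] = 2 * a / h1**2 + 2 * a / h2**2     # main diagonal
    ab[1, :] = -a / h1**2                        # first subdiagonal ...
    ab[1, m - 1::m] = 0.0                        # ... broken between grid rows
    ab[m, :] = -a / h2**2                        # subdiagonal N1 - 1
    u = solveh_banded(ab, F.flatten(order="F"), lower=True)
    return u.reshape((N1 - 1, N2 - 1), order="F")

# Quick test: exact solution sin(pi x) sin(pi y) on the unit square
N = 32
x = np.linspace(0, 1, N + 1)[1:-1][:, None]
y = np.linspace(0, 1, N + 1)[None, 1:-1]
F = 2 * np.pi**2 * np.sin(np.pi * x) * np.sin(np.pi * y)
U = band_solve(1.0, F, 1.0, 1.0)
print(np.abs(U - np.sin(np.pi * x) * np.sin(np.pi * y)).max())   # O(h^2), ~ 8e-4
```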

The program test_poisson.f95 solves the problem in three ways: using direct_solve, band_solve and a routine fast_solve. The third routine will be explained later. The data $f$ and $g$ are chosen so that the exact solution is known, and the program prints the maximum error in the finite difference solution together with the CPU time taken by each routine. The first two routines produce nearly identical results but, as expected, the second is much faster. Also, using direct_solve you will soon run out of memory if you try to increase the grid resolution. Use the makefile poisson.mk to compile and link test_poisson.

If, for simplicity, we assume that $L_1 = L_2 = L$ and $N_1 = N_2 = N$, then the cost of solving the full $M \times M$ system is $O(M^3/3)$. The band solver reduces this to $O(MN_1^2) = O(M^2)$. It can be shown that

$$ u(ih, jh) = u_{ij} + O(h^2), \qquad h = h_1 = h_2 = L/N, $$

and since $h = O(N^{-1}) = O(M^{-1/2})$ we see that the error is $O(M^{-1})$. Thus, every time we double the value of $N$ we expect the error to be reduced to about $\tfrac14$ of its previous value, but the number of unknowns $M$ grows by a factor of $4$, so that the solution cost grows by a factor of $4^3 = 64$ if we do not exploit the band structure. Even if we use a band solver, the cost grows by a factor of $4^2 = 16$.

6 A Fast Poisson Solver

There exist a number of algorithms that reduce the cost of solving Poisson's equation from $O(M^2)$ for a band Cholesky solver to $O(M)$, or something that is only a bit worse than $O(M)$. In this section, we will see how the fast Fourier transform, or FFT, may be exploited to obtain a reasonably simple Poisson solver with $O(M \log M)$ complexity. For simplicity, we will assume that the Dirichlet data is identically zero, so that

$$ -a \nabla^2 u = f \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \partial\Omega. \tag{15} $$

6.1 The Fast Fourier Transform

The discrete Fourier transform of a sequence of $N$ complex numbers $a_0, a_1, \dots, a_{N-1}$ is the sequence $b_0, b_1, \dots, b_{N-1}$ defined by

$$ b_p = \sum_{j=0}^{N-1} a_j W^{pj} \quad \text{for } 0 \le p \le N-1, \quad \text{where } W = e^{-i2\pi/N}. $$

Notice that $W$ is an $N$th root of unity, i.e., $W^N = 1$.

Lemma 5. The original sequence $a_j$ can be reconstructed from its discrete Fourier transform using the inversion formula

$$ a_j = \frac{1}{N} \sum_{p=0}^{N-1} b_p W^{-pj} \quad \text{for } 0 \le j \le N-1. $$
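NumPy's FFT uses exactly this convention, so a quick check (not part of the notes) of both the definition and the inversion formula of Lemma 5:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random(8) + 1j * rng.random(8)
N = len(a)
W = np.exp(-2j * np.pi / N)
# b_p = sum_j a_j W^{pj}, evaluated the obvious way in O(N^2) flops
b = np.array([sum(a[j] * W**(p * j) for j in range(N)) for p in range(N)])
print(np.allclose(b, np.fft.fft(a)))    # True: numpy's forward transform
print(np.allclose(a, np.fft.ifft(b)))   # True: the inversion formula of Lemma 5
```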

To compute a discrete Fourier transform in the obvious way, by evaluating $N$ sums, each with $N$ terms, requires $O(N^2)$ flops, but this cost can be reduced to $O(N \log_2 N)$ by using an algorithm called the fast Fourier transform, or FFT. We will use an open-source library called FFTW (Fastest Fourier Transform in the West) that provides efficient implementations of a variety of transforms. The library is coded in C but also has a Fortran 77 interface. The FFTW home page is www.fftw.org, and on the lab PCs you will find libfftw3.a in /usr/local/numeric/lib. The programs test_fourier.c and test_fourier.f95 use both methods to reconstruct a random sequence, printing out the CPU time and the maximum error in the reconstructed sequence for each case. A rule is provided in poisson.mk to compile and link each program.

The FFT may be used to evaluate various other types of sums. In the next section, we will need to compute a discrete sine transform,

$$ b_p = 2 \sum_{j=1}^{N-1} a_j \sin\Bigl(\frac{\pi p j}{N}\Bigr) \quad \text{for } p = 1, 2, \dots, N-1. $$
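This sine sum is a type-I discrete sine transform. As a quick check (scipy.fft.dst is used here as a stand-in for FFTW's real transforms, which the notes use from Fortran), the naive $O(N^2)$ evaluation agrees with the fast transform:

```python
import numpy as np
from scipy.fft import dst

N = 16
a = np.random.default_rng(2).random(N - 1)           # a_1, ..., a_{N-1}
p = j = np.arange(1, N)
naive = 2 * np.sin(np.pi * np.outer(p, j) / N) @ a   # O(N^2) evaluation
print(np.allclose(naive, dst(a, type=1)))            # True: DST-I, computed fast
```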

Since

$$ \sin\Bigl(\frac{\pi p j}{N}\Bigr) = \frac{e^{i\pi pj/N} - e^{-i\pi pj/N}}{2i} = \frac{\tilde W^{-pj} - \tilde W^{pj}}{2i}, \quad \text{where } \tilde W = e^{-i2\pi/\tilde N} \text{ and } \tilde N = 2N, $$

we see that

$$ b_p = i \sum_{j=1}^{N-1} a_j \tilde W^{pj} - i \sum_{j=1}^{N-1} a_j \tilde W^{-pj}. $$

The substitution $k = \tilde N - j$ gives $\tilde W^{-pj} = \tilde W^{pk - p\tilde N} = \tilde W^{pk}$, so

$$ \sum_{j=1}^{N-1} a_j \tilde W^{-pj} = \sum_{k=N+1}^{\tilde N - 1} a_{\tilde N - k} \tilde W^{pk}, $$

and therefore

$$ b_p = \sum_{j=0}^{\tilde N - 1} c_j \tilde W^{pj}, \quad \text{where } c_j = \begin{cases} 0 & \text{if } j = 0 \text{ or } j = N, \\ i a_j & \text{if } 1 \le j \le N-1, \\ -i a_{\tilde N - j} & \text{if } N+1 \le j \le \tilde N - 1, \end{cases} $$

i.e., $b_p$ is a discrete Fourier transform of length $\tilde N = 2N$. The FFTW library implements a fast sine transform in this way.

6.2 Fourier Expansion in the Space Variable

We expand the solution $u$ into a Fourier sine series in $x$:

$$ u(x, y) = \sum_{p=1}^{\infty} \hat u(p, y) \sin\Bigl(\frac{\pi p}{L_1} x\Bigr) \quad \text{for } 0 < x < L_1, \tag{16} $$

where the Fourier coefficients are given by

$$ \hat u(p, y) = \frac{2}{L_1} \int_0^{L_1} u(x, y) \sin\Bigl(\frac{\pi p}{L_1} x\Bigr)\, dx \quad \text{for } p = 1, 2, 3, \dots $$

We also expand the source term in the same way:

$$ f(x, y) = \sum_{p=1}^{\infty} \hat f(p, y) \sin\Bigl(\frac{\pi p}{L_1} x\Bigr), \tag{17} $$

$$ \hat f(p, y) = \frac{2}{L_1} \int_0^{L_1} f(x, y) \sin\Bigl(\frac{\pi p}{L_1} x\Bigr)\, dx. \tag{18} $$

Since

$$ \frac{\partial^2}{\partial x^2} \sin\Bigl(\frac{\pi p}{L_1} x\Bigr) = -\Bigl(\frac{\pi p}{L_1}\Bigr)^2 \sin\Bigl(\frac{\pi p}{L_1} x\Bigr), $$

we see that (15) is equivalent to the sequence of ordinary differential equations

$$ a \Bigl(\frac{\pi p}{L_1}\Bigr)^2 \hat u(p, y) - a\, \frac{\partial^2 \hat u}{\partial y^2} = \hat f(p, y) \quad \text{for } 0 < y < L_2, \tag{19} $$

with boundary conditions

$$ \hat u(p, 0) = 0 \quad \text{and} \quad \hat u(p, L_2) = 0, \quad \text{for } p = 1, 2, 3, \dots $$

We apply the trapezoidal rule with step size $h_1$ to the integral (18) with $y = kh_2$ and obtain the approximation

$$ \hat f(p, kh_2) \approx \hat f_{pk} = \frac{2}{N_1} \sum_{j=1}^{N_1 - 1} f(jh_1, kh_2) \sin\Bigl(\frac{\pi p}{L_1} jh_1\Bigr). $$

Next, we generate approximations

$$ \hat u_{pk} \approx \hat u(p, kh_2) $$

by applying a central difference approximation in (19):

$$ a \Bigl(\frac{\pi p}{L_1}\Bigr)^2 \hat u_{pk} + a\, \frac{-\hat u_{p,k-1} + 2\hat u_{pk} - \hat u_{p,k+1}}{h_2^2} = \hat f_{pk} \quad \text{for } k = 1, 2, \dots, N_2 - 1, $$

with boundary conditions $\hat u_{p0} = 0 = \hat u_{pN_2}$. Thus, we can compute $\hat u_{pk}$ by solving an $(N_2 - 1) \times (N_2 - 1)$ symmetric, positive-definite, tri-diagonal linear system for each $p$. Having obtained $\hat u_{pk}$, we truncate the Fourier series in (16) to obtain an approximation to the solution of (15),

$$ u(jh_1, kh_2) \approx u_{jk} = \sum_{p=1}^{N_1 - 1} \hat u_{pk} \sin\Bigl(\frac{\pi p}{L_1} jh_1\Bigr). $$

The cost of computing $u_{jk}$ in this way is

1. $O(N_2 N_1 \log_2 N_1)$ flops using FFTs to compute $\hat f(p, kh_2)$ for $1 \le p \le N_1 - 1$ and $1 \le k \le N_2 - 1$;
2. $O(N_2 N_1)$ flops using a band Cholesky solver to find $\hat u_{pk}$ for $1 \le p \le N_1 - 1$ and $1 \le k \le N_2 - 1$;
3. $O(N_2 N_1 \log_2 N_1)$ flops using FFTs to compute $u_{jk}$ for $1 \le j \le N_1 - 1$ and $1 \le k \le N_2 - 1$.

Assuming that $N_1 = O(N_2)$, so that $\log_2 N_1 = O(\log_2 M)$, it follows that the total cost is $O(M \log_2 M)$. Here, we have not counted the work to evaluate $f$ at each interior grid point, but this will be $O(M)$ provided it costs $O(1)$ flops to evaluate $f$ at each individual point. The routine fast_solve from poisson_solvers.f95 implements the algorithm described above.
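To make the algorithm concrete, here is a self-contained Python sketch of the whole method. It is a stand-in for the Fortran fast_solve, not the course code itself: scipy.fft.dst replaces FFTW, a vectorised Thomas algorithm replaces the band Cholesky solve (which changes nothing essential), and zero Dirichlet data is assumed.

```python
import numpy as np
from scipy.fft import dst

def fast_solve(a, f, L1, L2, N1, N2):
    """FFT-based solver for -a*lap(u) = f on (0,L1) x (0,L2) with u = 0 on the
    boundary: sine transform in x, tridiagonal solve in y, transform back.
    Returns u[j-1, k-1] ~ u(j*h1, k*h2) at the interior grid points."""
    h1, h2 = L1 / N1, L2 / N2
    x = np.arange(1, N1) * h1
    y = np.arange(1, N2) * h2
    F = f(x[:, None], y[None, :])                 # f at interior grid points
    Fhat = dst(F, type=1, axis=0) / N1            # fhat_{pk}, trapezoidal rule via DST-I
    # For each p, solve the tridiagonal system above; all N1 - 1 systems share
    # the same off-diagonal entry -a/h2^2, so the sweeps vectorise over p.
    p = np.arange(1, N1)
    diag = a * (np.pi * p / L1)**2 + 2 * a / h2**2
    off = -a / h2**2
    c = np.empty_like(Fhat)
    d = np.empty_like(Fhat)
    c[:, 0] = off / diag                          # forward sweep (Thomas algorithm)
    d[:, 0] = Fhat[:, 0] / diag
    for k in range(1, N2 - 1):
        denom = diag - off * c[:, k - 1]
        c[:, k] = off / denom
        d[:, k] = (Fhat[:, k] - off * d[:, k - 1]) / denom
    Uhat = np.empty_like(Fhat)                    # back substitution
    Uhat[:, -1] = d[:, -1]
    for k in range(N2 - 3, -1, -1):
        Uhat[:, k] = d[:, k] - c[:, k] * Uhat[:, k + 1]
    return dst(Uhat, type=1, axis=0) / 2          # truncated sine series (16)

# Test against the exact solution u = sin(pi x / L1) * sin(pi y / L2)
L1 = L2 = 1.0
u_exact = lambda x, y: np.sin(np.pi * x / L1) * np.sin(np.pi * y / L2)
f = lambda x, y: np.pi**2 * (1 / L1**2 + 1 / L2**2) * u_exact(x, y)
U = fast_solve(1.0, f, L1, L2, 64, 64)
xg = np.arange(1, 64)[:, None] / 64
yg = np.arange(1, 64)[None, :] / 64
print(np.abs(U - u_exact(xg, yg)).max())          # O(h2^2), ~ 1e-4
```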

7 Tutorial Exercises

7.1 Comparison of Computational Costs

Modify the module Dirichlet_problem in test_poisson.f95 so that the exact solution is

$$ u(x, y) = x(L_1 - x) \sin\Bigl(\frac{\pi y}{L_2}\Bigr). $$

Obviously, you must also modify $f$ and $g$ accordingly.

1. Verify that the maximum error is $O(h_1^2 + h_2^2)$ for fast_solve and $O(h_2^2)$ for band_solve and direct_solve.

2. Modify the program test_poisson so that you can estimate the constants in the asymptotic complexity bounds $O(M \log_2 M)$, $O(M^2)$ and $O(M^3)$ for fast_solve, band_solve and direct_solve, respectively. For example, in the case of fast_solve the constant should be approximately equal to the CPU time divided by $M \log_2 M$ (see the sketch following these exercises).

3. Determine the minimum error achievable in under 60 seconds using fast_solve. Estimate how long it would take to achieve the same accuracy using band_solve and direct_solve. How many megabytes of RAM would you need in each case?
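For Exercise 2, a timing harness along the following lines may help. It is hypothetical (estimate_constant is an invented name, and fast_solve refers to the Python sketch in Section 6); the actual exercise works with the Fortran program.

```python
import time
import numpy as np

def estimate_constant(solver, complexity, Ns):
    """Time solver(N) for each N and divide by the complexity model in
    M = (N-1)^2 to estimate the hidden constant in the asymptotic bound."""
    for N in Ns:
        M = (N - 1)**2
        t0 = time.perf_counter()
        solver(N)
        t = time.perf_counter() - t0
        print(f"N = {N:5d}   M = {M:9d}   time = {t:8.3f} s   C ~ {t / complexity(M):.3e}")

# e.g. estimate_constant(lambda N: fast_solve(1.0, f, 1.0, 1.0, N, N),
#                        lambda M: M * np.log2(M), [64, 128, 256, 512])
```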