Lecture4 INF-MAT : 5. Fast Direct Solution of Large Linear Systems

Similar documents
Lecture 2 INF-MAT : A boundary value problem and an eigenvalue problem; Block Multiplication; Tridiagonal Systems

Lecture 2 INF-MAT : , LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky

Lecture 1 INF-MAT3350/ : Some Tridiagonal Matrix Problems

Lecture 1 INF-MAT : Chapter 2. Examples of Linear Systems

3.5 Finite Differences and Fast Poisson Solvers

Lecture Notes for Inf-Mat 3350/4350, Tom Lyche

9. Iterative Methods for Large Linear Systems

Computing Eigenvalues and/or Eigenvectors;Part 1, Generalities and symmetric matrices

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

Poisson Solvers. William McLean. April 21, Return to Math3301/Math5315 Common Material.

j=1 u 1jv 1j. 1/ 2 Lemma 1. An orthogonal set of vectors must be linearly independent.

Computing Eigenvalues and/or Eigenvectors;Part 1, Generalities and symmetric matrices

Scientific Computing: An Introductory Survey

CS 598: Communication Cost Analysis of Algorithms Lecture 9: The Ideal Cache Model and the Discrete Fourier Transform

Solving linear systems (6 lectures)

Computing Eigenvalues and/or Eigenvectors;Part 2, The Power method and QR-algorithm

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y)

ACM106a - Homework 2 Solutions

Parallel Numerical Algorithms

Notes for CS542G (Iterative Solvers for Linear Systems)

AMS526: Numerical Analysis I (Numerical Linear Algebra)

Radial Basis Functions I

Math 405: Numerical Methods for Differential Equations 2016 W1 Topics 10: Matrix Eigenvalues and the Symmetric QR Algorithm

Lecture # 11 The Power Method for Eigenvalues Part II. The power method find the largest (in magnitude) eigenvalue of. A R n n.

6.4 Krylov Subspaces and Conjugate Gradients

MATH 423 Linear Algebra II Lecture 33: Diagonalization of normal operators.

Appendix C: Recapitulation of Numerical schemes

Lecture 18 Classical Iterative Methods

Scientific Computing

EE364a Review Session 7

Direct Methods for Solving Linear Systems. Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le

Numerical Methods I Non-Square and Sparse Linear Systems

A Cholesky LR algorithm for the positive definite symmetric diagonal-plus-semiseparable eigenproblem

Orthonormal Transformations

9. Numerical linear algebra background

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix

Iterative Methods for Solving A x = b

Matrix Multiplication Chapter IV Special Linear Systems

Orthonormal Transformations and Least Squares

Sequential Fast Fourier Transform (PSC ) Sequential FFT

Foundations of Matrix Analysis

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

x x2 2 + x3 3 x4 3. Use the divided-difference method to find a polynomial of least degree that fits the values shown: (b)

Lecture 1: Systems of linear equations and their solutions

The Singular Value Decomposition and Least Squares Problems

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9

Math 577 Assignment 7

Numerical Methods - Numerical Linear Algebra

Linear Algebra, part 3 QR and SVD

CSE 548: Analysis of Algorithms. Lecture 4 ( Divide-and-Conquer Algorithms: Polynomial Multiplication )

Orthogonal Transformations

Computing Eigenvalues and/or Eigenvectors;Part 2, The Power method and QR-algorithm

NORMS ON SPACE OF MATRICES

Numerical Methods I: Eigenvalues and eigenvectors

Computational Linear Algebra

Differential equations

Notes on PCG for Sparse Linear Systems

Solving large scale eigenvalue problems

Numerical tensor methods and their applications

M.A. Botchev. September 5, 2014

JACOBI S ITERATION METHOD

AMS526: Numerical Analysis I (Numerical Linear Algebra)

Lecture 3: QR-Factorization

Solving linear equations with Gaussian Elimination (I)

13-2 Text: 28-30; AB: 1.3.3, 3.2.3, 3.4.2, 3.5, 3.6.2; GvL Eigen2

1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i )

CAAM 335: Matrix Analysis

Block-tridiagonal matrices

A Brief Outline of Math 355

9. Numerical linear algebra background

Divide and Conquer. Maximum/minimum. Median finding. CS125 Lecture 4 Fall 2016

Numerical Methods I: Numerical linear algebra

Orthogonal Symmetric Toeplitz Matrices

On the Preconditioning of the Block Tridiagonal Linear System of Equations

AN ITERATION. In part as motivation, we consider an iteration method for solving a system of linear equations which has the form x Ax = b

Sparse Matrix Techniques for MCAO

AMS526: Numerical Analysis I (Numerical Linear Algebra)

4.6 Iterative Solvers for Linear Systems

Introduction to PDEs and Numerical Methods Lecture 7. Solving linear systems

CS 246 Review of Linear Algebra 01/17/19

Section 4.4 Reduction to Symmetric Tridiagonal Form

AMS526: Numerical Analysis I (Numerical Linear Algebra)

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix

Cayley-Hamilton Theorem

Algebra Workshops 10 and 11

Algebra C Numerical Linear Algebra Sample Exam Problems

Mathematics and Computer Science

9.6: Matrix Exponential, Repeated Eigenvalues. Ex.: A = x 1 (t) = e t 2 F.M.: If we set

EIGENVALUE PROBLEMS. EIGENVALUE PROBLEMS p. 1/4

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations

Numerical Methods for Random Matrices

A Fast Fourier transform based direct solver for the Helmholtz problem. AANMPDE 2017 Palaiochora, Crete, Oct 2, 2017

Chapter 7 Iterative Techniques in Matrix Algebra

Scientific Computing: Solving Linear Systems

We wish to solve a system of N simultaneous linear algebraic equations for the N unknowns x 1, x 2,...,x N, that are expressed in the general form

AIMS Exercise Set # 1

6. Iterative Methods for Linear Systems. The stepwise approach to the solution...

Lecture Notes for Mat-inf 4130, Tom Lyche

Name: INSERT YOUR NAME HERE. Due to dropbox by 6pm PDT, Wednesday, December 14, 2011

Transcription:

Lecture4 INF-MAT 4350 2010: 5. Fast Direct Solution of Large Linear Systems Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo September 16, 2010

Test Matrix T 1 R m,m T 1 := d a 0 a d a 0 a d a 0 a d where a, d R. We call this the 1D test matrix. Second Derivative Matrix T: a = 1, d = 2 symmetric, tridiagonal, diagonally dominant if d 2 a, nonsingular if diagonally dominant and a 0, strictly diagonally dominant if d > 2 a., (1)

Properties of T 2 = T 1 I + I T 1 T 2 := 2d a 0 a 0 0 0 0 0 a 2d a 0 a 0 0 0 0 0 a 2d 0 0 a 0 0 0 a 0 0 2d a 0 a 0 0 0 a 0 a 2d a 0 a 0 0 0 a 0 a 2d 0 0 a 0 0 0 a 0 0 2d a 0 0 0 0 0 a 0 a 2d a 0 0 0 0 0 a 0 a 2d 1. Poisson matrix A for a = 1, d = 2. 2. It is symmetric positive definite if d > 0 and d 2 a. 3. It is banded. 4. It is block-tridiagonal. 5. We know the eigenvalues and eigenvectors. 6. The eigenvectors are orthogonal..

Fast Methods Consider solving Ax = b, where A = T 1 or A = T 2. O(n) algorithm for T 1. What about T 2? Consider d = 2, a = 1, the Poisson matrix. Call it A.

Full Cholesky Symmetric positive definite system Ax = b of order n = m 2 A = R T R with R upper triangular. Full Cholesky: #flops O(n 3 /3). If m = 10 3 then it takes years of computing time. Need to use matrix structure. 0 5 10 15 20 25 0 5 10 15 20 25 nz = 105

Banded Cholesky System Ax = b of order n = m 2 and bandwidth d = m = n. band Cholesky: #flops is O(nd 2 ) = O(n 2 ). But if m = 10 3 then it still takes hours of computing time 0 10 20 30 40 50 60 70 80 90 100 0 20 40 60 80 100 nz = 1009 We get fill-inn and the O(n 2 ) count is realistic.

Use Blockstructure of A A = Block tridiagonal 4 1 0 1 0 0 0 0 0 1 4 1 0 1 0 0 0 0 0 1 4 0 0 1 0 0 0 1 0 0 4 1 0 1 0 0 0 1 0 1 4 1 0 1 0 0 0 1 0 1 4 0 0 1 0 0 0 1 0 0 4 1 0 0 0 0 0 1 0 1 4 1 0 0 0 0 0 1 0 1 4 A = T + 2I I 0 I T + 2I I 0 I T + 2I

LU of Block Tridiagonal Matrix [ ] D1 C 1 0 A 2 D 2 C 2 = 0 A 3 D 3 [ ] [ ] I 0 0 R1 C 1 0 L 2 I 0 0 R 2 C 2. 0 L 3 I 0 0 R 3 Given A i, D i, C i with D i square. Find L i, R i. Compare (i, j) block entries on both sides (1, 1) : D 1 = R 1 R 1 = D 1 (2, 1) : A 2 = L 2 R 1 L 2 = A 2 R 1 1 (2, 2) : D 2 = L 2 C 1 + R 2 R 2 = D 2 L 2 C 1 (2, 3) : A 3 = L 3 R 2 L 3 = A 3 R 1 2 (3, 3) : D 3 = L 3 C 2 + R 3 R 3 = D 3 L 3 C 2 In general R 1 = D 1, L k = A k R 1 k 1, R k = D k L k C k 1, k = 2, 3,..., m.

LU of Block Tridiagonal Matrix Ax = b = [b T 1,..., b T m] T D 1 C 1 A 2 D 2 C 2......... A m 1 D m 1 C m 1 A m D m = I L 2 I...... L m I R 1 C 1...... R m 1 C m 1 R m. R 1 = D 1, 2, 3,..., m. L k = A k R 1 k 1, R k = D k L k C k 1, k = y 1 = b 1, y k = b k L k y k 1, k = 2, 3,..., m, x m = R 1 m y m, x k = R 1 k (y k C k x k+1 ), k = m 1,..., 2, 1. (2) The solution is then x T = [x T 1,..., xt m].

BlockCholesky factor R for n = 100 0 10 20 30 40 50 60 70 80 90 100 0 20 40 60 80 100 nz = 1018 We get fill-inn in the R i blocks Computing R i requires O(m 3 ) flops Need m O(m 3 ) = O(m 4 ) flops and the O(n 2 ) count remains. Block methods can be faster because we replace some loops with matrix multiplications.

Other Methods Iterative methods. Multigrid. Fast solvers based on diagonalization of T = tridiag( 1, 2, 1) and the Fast Fourier Transform. For this solve Ax = b in the form TV + VT = h 2 F.

Eigenpairs of T = tridiag( 1, 2, 1) R m,m Let h = 1/(m + 1). We know that Ts j = λ j s j for j = 1,..., m, where s j = [sin (jπh), sin (2jπh),..., sin (mjπh)] T, λ j = 4 sin 2 ( jπh 2 ), s T j s k = 1 2h δ j,k, j, k = 1,..., m.

The sine matrix S S := [s 1,..., s m ], D = diag(λ 1,..., λ n ), TS = SD, S T S = S 2 = 1 2h I, S is almost, but not quite orthogonal.

Find V using diagonalization We define a matrix X by V = SXS, where V is the solution of TV + VT = h 2 F TV + VT = h 2 F V=SXS TSXS + SXST = h 2 F S( )S STSXS 2 + S 2 XSTS = h 2 SFS TS=SD S 2 DXS 2 + S 2 XS 2 D = h 2 SFS S 2 =I/(2h) DX + XD = 4h 4 SFS =: B. λ j x jk + x jk λ k = b jk. x jk = b jk /(λ j + λ k ) for all j, k.

Algorithm [A Simple Fast Poisson Solver] 1. h = 1/(m + 1); F = ( f (jh, kh) ) m j,k=1 ; S = ( sin(jkπh) ) m j,k=1 ; σ = ( sin 2 ((jπh)/2) ) m j=1 2. G = (g j,k ) = SFS; 3. X = (x j,k ) m j,k=1, where x j,k = h 4 g j,k /(σ j + σ k ); 4. V = SXS; (3) Output is the exact solution of the discrete Poisson equation on a square computed in O(n 3/2 ) operations. Only a couple of m m matrices are required for storage. Next: Use FFT to reduce the complexity to O(n log 2 n)

Discrete Sine Transform, DST Given v = [v 1,..., v m ] T R m we say that the vector w = [w 1,..., w m ] T given by w j = m k=1 ( ) jkπ sin v k, m + 1 is the Discrete Sine Transform (DST) of v. j = 1,..., m In matrix form we can write the DST as the matrix times vector w = Sv, where S is the sine matrix. We can identify the matrix B = SA as the DST of A R m,n, i.e. as the DST of the columns of A.

The product B = AS The product B = AS can also be interpreted as a DST. Indeed, since S is symmetric we have B = (SA T ) T which means that B is the transpose of the DST of the rows of A. It follows that we can compute the unknowns V in Algorithm 15 by carrying out Discrete Sine Transforms on 4 m-by-m matrices in addition to the computation of X.

The Euler formula e iφ = cos φ + i sin φ, e iφ = cos φ i sin φ φ R, i = 1 Consider Note that cos φ = eiφ + e iφ, sin φ = eiφ e iφ 2 2i ω N := e 2πi/N = cos ( 2π N ) (2π ) i sin, ω N N N = 1 ω jk 2m+2 = e 2jkπi/(2m+2) = e jkπhi = cos(jkπh) i sin(jkπh).

Discrete Fourier Transform (DFT) If z j = N k=1 ω (j 1)(k 1) N y k, j = 1,..., N then z = [z 1,..., z N ] T is the DFT of y = [y 1,..., y N ] T. z = F N y, where F N := ( ω (j 1)(k 1) ) N N j,k=1, RN,N F N is called the Fourier Matrix

Example ω 4 = exp 2πi/4 = cos( π 2 ) i sin( π 2 ) = i F 4 = 1 1 1 1 1 ω 4 ω 2 4 ω 3 4 1 ω 2 4 ω 4 4 ω 6 4 1 ω 3 4 ω 6 4 ω 9 4 = 1 1 1 1 1 i 1 i 1 1 1 1 1 i 1 i

Connection DST and DFT DST of order m can be computed from DFT of order N = 2m + 2 as follows: Lemma Given a positive integer m and a vector x R m. Component k of S m x is equal to i/2 times component k + 1 of F 2m+2 z where In symbols z = (0, x 1,..., x m, 0, x m, x m 1,..., x 1 ) T R 2m+2. (S m x) k = i 2 (F 2m+2z) k+1, k = 1,..., m.

(0, x 1,..., x m, 0, x m, x m 1,..., x 1 ) T Proof Let ω = ω 2m+2. Component k + 1 of F 2m+2 z is given by (F 2m+2 z) k+1 = = m m x j ω jk x j ω (2m+2 j)k j=1 j=1 m x j (ω jk ω jk ) j=1 = 2i m x j sin(jkπh) = 2i(S m x) k. j=1 Dividing both sides by 2i proves the Lemma.

Fast Fourier Transform (FFT) Suppose N is even. Express F N in terms of F N/2. Reorder the columns in F N so that the odd columns appear before the even ones. P N := (e 1, e 3,..., e N 1, e 2, e 4,..., e N ) F 4 = 1 1 1 1 1 i 1 i 1 1 1 1 1 i 1 i F 4P 4 = 1 1 1 1 1 1 i i 1 1 1 1 1 1 i i [ ] [ ] [ 1 1 1 1 1 0 F 2 = =, D 1 ω 2 1 1 2 = diag(1, ω 4 ) = 0 i [ ] F2 D F 4 P 4 = 2 F 2 F 2 D 2 F 2. ].

F 2m, P 2m, F m and D m Theorem If N = 2m is even then [ Fm D F 2m P 2m = m F m F m D m F m ], (4) where D m = diag(1, ω N, ω 2 N,..., ωm 1 N ). (5)

Proof Fix integers j, k with 0 j, k m 1 and set p = j + 1 and q = k + 1. Since ωm m = 1, ωn 2 = ω m, and ωn m = 1 we find by considering elements in the four sub-blocks in turn (F 2m P 2m ) p,q = ω j(2k) N = ω jk m = (F m ) p,q, (F 2m P 2m ) p+m,q = ω (j+m)(2k) N = ω m (j+m)k = (F m ) p,q, (F 2m P 2m ) p,q+m = ω j(2k+1) N = ω j N ωjk m = (D m F m ) p,q, (F 2m P 2m ) p+m,q+m = ω (j+m)(2k+1) N = ω j(2k+1) N = ( D m F m ) p,q It follows that the four m-by-m blocks of F 2m P 2m have the required structure.

The basic step Using Theorem 2 we can carry out the DFT as a block multiplication. Let y R 2m and set w = P T 2my = (w T 1, wt 2 )T, where w 1, w 2 R m. Then F 2m y = F 2m P 2m P T 2my = F 2m P 2m w and [ ] [ ] [ Fm D F 2m P 2m w = m F m w1 q1 + q = 2 F m D m F m w 2 q 1 q 2 where q 1 = F m w 1 and q 2 = D m (F m w 2 ). In order to compute F 2m y we need to compute F m w 1 and F m w 2. Note that w T 1 = [y 1, y 3,..., y N 1 ], while w T 2 = [y 2, y 4,..., y N ]. This follows since w T = [w T 1, wt 2 ] = yt P 2m and post multiplying a vector by P 2m moves odd indexed components to the left of all the even indexed components. ],

Recursive FFT Algorithm (Recursive FFT) Recursive Matlab function when N = 2 k function z=fftrec(y) n=length(y); if n==1 z=y; else q1=fftrec(y(1:2:n-1)); q2=exp(-2*pi*i/n).ˆ (0:n/2-1).*fftrec(y(2:2:n)); z=[q1+q2 q1-q2]; end Such a recursive version of FFT is useful for testing purposes, but is much too slow for large problems. A challenge for FFT code writers is to develop nonrecursive versions and also to handle efficiently the case where N is not a power of two.

FFT Complexity Let x k be the complexity (the number of flops) when N = 2 k. Since we need two FFT s of order N/2 = 2 k 1 and a multiplication with the diagonal matrix D N/2 it is reasonable to assume that x k = 2x k 1 + γ2 k for some constant γ independent of k. Since x 0 = 0 we obtain by induction on k that x k = γk2 k = γn log 2 N. This also holds when N is not a power of 2. Reasonable implementations of FFT typically have γ 5

Improvement using FFT The efficiency improvement using the FFT to compute the DFT is impressive for large N. The direct multiplication F N y requires O(8n 2 ) flops since complex arithmetic is involved. Assuming that the FFT uses 5N log 2 N flops we find for N = 2 20 10 6 the ratio 8N 2 5N log 2 N 84000. Thus if the FFT takes one second of computing time and if the computing time is proportional to the number of flops then the direct multiplication would take something like 84000 seconds or 23 hours.

Poisson solver based on FFT requires O(n log 2 n) flops, where n = m 2 is the size of the linear system Ax = b. Why? 4m sine transforms: G 1 = SF, G = G 1 S, V 1 = SX, V = V 1 S. A total of 4m FFT s of order 2m + 2 are needed. Since one FFT requires O(γ(2m + 2) log 2 (2m + 2)) flops the 4m FFT s amounts to 8γm(m + 1) log 2 (2m + 2) 8γm 2 log 2 m = 4γn log 2 n, This should be compared to the O(8n 3/2 ) flops needed for 4 straightforward matrix multiplications with S. What is faster will depend on the programming of the FFT and the size of the problem.

Compare exact methods System Ax = b of order n = m 2. Assume m = 1000 so that n = 10 6. Method # flops Storage T = 10 9 flops 1 Full Cholesky 3 n3 = 1 3 1018 n 2 = 10 12 10 years Band Cholesky 2n 2 = 2 10 12 n 3/2 = 10 9 1/2 hour Block Tridiagonal 2n 2 = 2 10 12 n 3/2 = 10 9 1/2 hour Diagonalization 8n 3/2 = 8 10 9 n = 10 6 8 sec FFT 20n log 2 n = 4 10 8 n = 10 6 1/2 sec