AMSC/CMSC 661 Scientific Computing II, Spring 2005
Solution of Sparse Linear Systems, Part 2: Iterative Methods
Dianne P. O'Leary, (c) 2005

Solving Sparse Linear Systems: Iterative methods

The plan:
- Basic (slow) iterations: Jacobi, Gauss-Seidel, SOR.
- Krylov subspace methods.
- Preconditioning (where direct meets iterative).
- A special-purpose method: multigrid.

Basic iterations

The idea: Given an initial guess x^{(0)} for the solution to Ax = b, construct a sequence of guesses {x^{(0)}, x^{(1)}, x^{(2)}, ...} converging to the true solution x^*. The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A.

Stationary Iterative Methods

These methods grew up in the engineering and mathematical literature. They were very popular in the 1960s and are still sometimes used. Today they are almost never the best algorithms to use (because they take too many iterations), but they are useful as preconditioners for Krylov subspace methods. We will define three of them:
- Jacobi (simultaneous displacement)
- Gauss-Seidel (successive displacement)
- SOR (successive over-relaxation)

Theme: All of these methods split A as M - N for some nonsingular matrix M. Other splittings of this form are also useful.

The Jacobi iteration

Idea: The ith component of the residual vector r is defined by

    r_i = b_i - a_{i1} x_1 - a_{i2} x_2 - ... - a_{i,i-1} x_{i-1} - a_{ii} x_i - a_{i,i+1} x_{i+1} - ... - a_{in} x_n.

Let's modify x_i to make r_i = 0. Given x^{(k)}, construct x^{(k+1)} by

    x_i^{(k+1)} = ( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)} ) / a_{ii},    i = 1, ..., n.

Observations:
- We must require A to have nonzeros on its main diagonal.
- The algorithm is easy to program! We only need to store two x vectors, x^{(k)} and x^{(k+1)}.
- The iteration may or may not converge, depending on the properties of A.
- We should only touch the nonzeros in A; otherwise the work per iteration would be O(n^2) instead of O(nz).
- If we partition A as L + D + U, where D contains the diagonal entries, U contains the entries above the diagonal, and L contains the entries below the diagonal, then we can express the iteration as

      D x^{(k+1)} = b - (L + U) x^{(k)},

  which is useful for analyzing convergence. (M = D, N = -(L + U).)
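As a concrete illustration (an added sketch, not part of the original notes), here is a minimal Jacobi iteration in Python with scipy.sparse; it assumes A is a sparse matrix with a nonzero diagonal, and the function name and stopping test are illustrative choices.

import numpy as np
import scipy.sparse as sp

def jacobi(A, b, x0, max_iters=100, tol=1e-8):
    """Jacobi iteration: D x^(k+1) = b - (L + U) x^(k), written in the
    equivalent update form x^(k+1) = x^(k) + D^{-1} r^(k).
    Each sweep touches only the nonzeros of A, so the cost is O(nz)."""
    d = A.diagonal()                 # the diagonal of A (must be nonzero)
    x = x0.copy()
    for _ in range(max_iters):
        r = b - A @ x                # residual r^(k) = b - A x^(k)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + r / d                # only two x vectors are ever kept
    return x

# Example: a small diagonally dominant test matrix.
A = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(100, 100), format="csr")
b = np.ones(100)
x = jacobi(A, b, np.zeros(100), max_iters=500)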

The Gauss-Seidel iteration

Idea: If we really believe that we have improved the ith component of the solution by our Jacobi iteration, then it makes sense to use its latest value in the iteration: given x^{(k)}, construct x^{(k+1)} by

    x_i^{(k+1)} = ( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)} ) / a_{ii},    i = 1, ..., n.

Observations:
- We still require A to have nonzeros on its main diagonal.
- The algorithm is easier to program, since we only need to keep one x vector around!
- The iteration may or may not converge, depending on the properties of A.
- We should only touch the nonzeros in A; otherwise the work per iteration would be O(n^2) instead of O(nz).
- If we partition A as L + D + U as before, then we can express the iteration as

      (D + L) x^{(k+1)} = b - U x^{(k)},

  which is useful for analyzing convergence. (M = D + L, N = -U.)

The SOR (Successive Over-Relaxation) iteration

Idea: People who used these iterations on finite-difference matrices discovered that Gauss-Seidel (GS) converged faster than Jacobi (J), and that they could improve its convergence rate by going a little further in the GS direction: given x^{(k)}, construct x^{(k+1)} by

    x^{(k+1)} = (1 - ω) x^{(k)} + ω x^{(k+1)}_{GS},

where ω is a number between 1 and 2. (A sketch of Gauss-Seidel and SOR sweeps appears after the figure below.)

Unquiz: Suppose n = 2 and our linear system can be graphed as in the figure. Draw the first 3 Jacobi iterates and the first 3 Gauss-Seidel iterates using the point marked with a star as x^{(0)}. Does either iteration depend on the ordering of the equations or unknowns? []

[Figure: the two equations of the n = 2 unquiz system plotted in the (x_1, x_2) plane, with both axes running from about -2 to 5.]
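Continuing the illustration above (again an addition, not from the notes), here is a minimal sketch of one Gauss-Seidel sweep and its SOR variant, assuming a scipy.sparse CSR matrix whose diagonal entries are stored explicitly; setting ω = 1 recovers plain Gauss-Seidel.

import numpy as np

def gauss_seidel_sweep(A, b, x, omega=1.0):
    """One Gauss-Seidel sweep (omega = 1) or SOR sweep (1 < omega < 2).
    A is a scipy.sparse CSR matrix with its diagonal stored; x is updated
    in place, so the newest values are used as soon as they are available."""
    indptr, indices, data = A.indptr, A.indices, A.data
    for i in range(A.shape[0]):
        row = slice(indptr[i], indptr[i + 1])
        cols, vals = indices[row], data[row]
        diag = vals[cols == i][0]                    # a_ii
        sigma = vals @ x[cols] - diag * x[i]         # sum of a_ij x_j over j != i
        x_gs = (b[i] - sigma) / diag                 # Gauss-Seidel value
        x[i] = (1.0 - omega) * x[i] + omega * x_gs   # SOR blend
    return x

Sweeping repeatedly until ||b - A x|| is small gives the full iteration; a pure-Python loop over rows like this is slow, but it shows that only the nonzeros of row i are touched.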

Convergence of stationary iterative methods

All of these iterations can be expressed as

    x^{(k+1)} = G x^{(k)} + c,

where G = M^{-1} N is a matrix that depends on A and c is a vector that depends on A and b. For all of these iterations the true solution satisfies x^* = G x^* + c. Subtracting, we see that the error e^{(k)} = x^{(k)} - x^* satisfies

    e^{(k+1)} = G e^{(k)},

and it can be shown that the error converges to zero for any initial x^{(0)} if and only if all of the eigenvalues of G lie inside the unit circle. Many conditions on A have been found that guarantee convergence of these methods; see the SIM notes.
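As a quick numerical illustration of this criterion (an added sketch, not from the notes), the following computes the spectral radius of the Jacobi iteration matrix G = I - D^{-1}A for a small test matrix; forming G densely costs O(n^3), so this is a diagnostic for toy problems only.

import numpy as np

def jacobi_spectral_radius(A):
    """Spectral radius of G = M^{-1} N for the Jacobi splitting
    M = D, N = -(L + U), i.e. G = I - D^{-1} A.  Dense and expensive:
    only sensible for small test matrices."""
    A = np.asarray(A, dtype=float)
    G = np.eye(A.shape[0]) - A / np.diag(A)[:, None]   # rows of A scaled by 1/a_ii
    return max(abs(np.linalg.eigvals(G)))

# A strictly diagonally dominant example: rho(G) < 1, so Jacobi converges
# from any starting guess.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
print(jacobi_spectral_radius(A))   # approximately 0.35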

This is enough about slow methods.

From SIM to Krylov subspace methods

So far: stationary iterative methods. Ax = b is replaced by x = Gx + c, and we iterate

    x^{(k+1)} = G x^{(k)} + c.

If x^{(0)} = 0, then

    x^{(1)} = c,
    x^{(2)} \in span{c, Gc},
    x^{(3)} \in span{c, Gc, G^2 c},
    ...
    x^{(k)} \in span{c, Gc, G^2 c, ..., G^{k-1} c} = K_k(G, c),

and we call K_k(G, c) a Krylov subspace. The work per iteration is O(nz) plus a small multiple of n. Note that K_k(G, c) = K_k(Ĝ, c) if Ĝ = I - G.

The idea behind Krylov subspace methods: instead of making the GS choice (for example) from the Krylov subspace, let's try to pick the best vector without doing a lot of extra work. What is "best"?

The variational approach: Choose x^{(k)} \in K_k(G, c) to minimize ||x - x^*||_Z, where ||y||_Z^2 = y^T Z y and Z is a symmetric positive definite matrix.

The Galerkin approach: Choose x^{(k)} \in K_k(G, c) to make the residual r^{(k)} = b - A x^{(k)} orthogonal to every vector in K_k(G, c), for some choice of inner product.

Practicalities

Note that we expand the subspace K at each iteration. This gives us a very important property: after at most n iterations, Krylov subspace iterations terminate with the true solution. This finite-termination property is less useful than it might seem, since we think of applying these methods when n is so large (thousands, millions, billions, etc.) that we can't afford more than a few hundred iterations.

The only way to make the iteration practical is to make a very clever choice of basis for K. If we use the obvious choice c, Gc, G^2 c, ..., then after just a few iterations our algorithm will lose accuracy. We also need to choose between (variational) minimization and (Galerkin) projection.

Krylov Ingredient 1: A practical basis

An orthonormal basis for K makes the iteration practical. We say that a vector v is B-orthogonal to a vector u if u^T B v = 0, where B is a symmetric positive definite matrix. Similarly, we define ||u||_B^2 = u^T B u.

Let's construct our basis for K_k(Ĝ, c). Our first basis vector is

    p^{(0)} = c / ||c||_B.

Now suppose that we have j+1 basis vectors p^{(0)}, ..., p^{(j)} for K_{j+1}(Ĝ, c), and that we have some vector r \in K_{j+2}(Ĝ, c) with r \notin K_{j+1}(Ĝ, c). Often we take r to be Ĝ p^{(j)}. We define the next basis vector by the process of Gram-Schmidt orthogonalization:

    p^{(j+1)} = ( r - h_{0,j} p^{(0)} - ... - h_{j,j} p^{(j)} ) / h_{j+1,j},

where h_{i,j} = p^{(i)T} B r (i = 0, ..., j) and h_{j+1,j} is chosen so that p^{(j+1)T} B p^{(j+1)} = 1.

In matrix form, we can express this relation as

    Ĝ p^{(j)} = [ p^{(0)} p^{(1)} ... p^{(j+1)} ] [ h_{0,j}, h_{1,j}, ..., h_{j+1,j} ]^T,

so after k steps we have

    Ĝ P_k = P_{k+1} H_k,    (*)

where H_k is a (k+1) x k matrix with entries h_{ij} (zero if i > j+1) and P_k is n x k and contains the first k basis vectors as its columns.

Equation (*) is very important: since the algorithm terminates after n vectors have been formed, we have actually factored our matrix (note that P_n^{-1} = P_n^T if B = I):

    Ĝ = P_n H_n P_n^{-1}.

Therefore the matrix H_n is closely related to Ĝ: it has the same eigenvalues. In fact, the leading k x k piece of H_n (available after k steps) is in some sense a good approximation to Ĝ. This is the basis of algorithms for
- solving linear systems of equations involving Ĝ, and
- finding approximations to eigenvalues and eigenvectors of Ĝ.

We have just constructed the Arnoldi algorithm.

The Arnoldi algorithm

[P, H] = Arnoldi(k, Ĝ, B, p^{(0)})
Given a positive integer k, a symmetric positive definite matrix B, a matrix Ĝ, and a vector p^{(0)} with ||p^{(0)}||_B = 1.

    for j = 0, 1, ..., k-1,
        p^{(j+1)} = Ĝ p^{(j)}
        for i = 0, ..., j,
            h_{ij} = p^{(i)T} B p^{(j+1)}
            p^{(j+1)} = p^{(j+1)} - h_{ij} p^{(i)}
        end (for i)
        h_{j+1,j} = ( p^{(j+1)T} B p^{(j+1)} )^{1/2}
        p^{(j+1)} = p^{(j+1)} / h_{j+1,j}
    end (for j)

Notes:
- In practice, B is either the identity matrix or a matrix closely related to Ĝ.
- We need only 1 matrix-vector product by Ĝ per iteration.
- After k iterations, we have done O(k^2) inner products of length n each, and this work becomes significant as k increases.
- If BĜ is symmetric, then by (*) so is H_k, so all but 2 of the inner products at step j are zero. In this case we can let the loop index run over i = j-1:j, and the number of inner products drops to O(k). The Arnoldi algorithm is then called Lanczos tridiagonalization.
- In writing this algorithm, we took advantage of the fact that p^{(i)T} B p^{(j+1)} is mathematically the same whether we use the original vector p^{(j+1)} or the updated one. Numerically, using the updated one works a bit better, but both eventually lose orthogonality, and the algorithm sometimes needs to be restarted to overcome this.
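A minimal NumPy translation of this pseudocode for the common case B = I (an added sketch, not part of the notes; the function name arnoldi and the breakdown handling are illustrative choices):

import numpy as np

def arnoldi(G_hat, p0, k):
    """Arnoldi with the Euclidean inner product (B = I).
    Returns P (n x (k+1), orthonormal columns) and H ((k+1) x k, upper
    Hessenberg), satisfying the relation (*):  G_hat @ P[:, :k] = P @ H."""
    n = p0.size
    P = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    P[:, 0] = p0 / np.linalg.norm(p0)
    for j in range(k):
        w = G_hat @ P[:, j]              # one matrix-vector product per step
        for i in range(j + 1):           # modified Gram-Schmidt against p^(0..j)
            H[i, j] = P[:, i] @ w
            w = w - H[i, j] * P[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:           # breakdown: the subspace is invariant
            break
        P[:, j + 1] = w / H[j + 1, j]
    return P, H

A quick check of (*): for a random square matrix A and vector b, with P, H = arnoldi(A, b, 20), the quantity np.linalg.norm(A @ P[:, :20] - P @ H) should be near machine precision.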

Krylov Ingredient 2: A definition of "best"

Two good choices:
- (variational) minimization
- (Galerkin) projection

Using Krylov minimization

Problem: Find x^{(k)} \in K_k so that x^{(k)} minimizes ||x - x^*||_Z over all choices of x \in K_k.

Solution: Let x^{(k)} = P_k y^{(k)}, where y^{(k)} is a vector with k components. Then

    ||x^{(k)} - x^*||_Z^2 = ( P_k y^{(k)} - x^* )^T Z ( P_k y^{(k)} - x^* ).

Differentiating with respect to the components of y^{(k)} and setting the derivative to zero yields

    P_k^T Z P_k y^{(k)} = P_k^T Z x^*.

Since y^{(k)} and x^* are both unknown, we usually can't solve this. But we can if we are clever about our choice of Z.

1st special choice of Z

Recall: we need to solve P_k^T Z P_k y^{(k)} = P_k^T Z x^*, and we know

    Ĝ x^* = c,    Ĝ P_k = P_{k+1} H_k  (*),    P_{k+1}^T B P_{k+1} = I_{k+1}.

Let Z = Ĝ^T B Ĝ. (This is symmetric, and positive definite if Ĝ is nonsingular.) Then

    P_k^T Z x^* = P_k^T Ĝ^T B Ĝ x^* = P_k^T Ĝ^T B c = H_k^T P_{k+1}^T B c

is computable! The left-hand side also simplifies:

    P_k^T Z P_k = P_k^T Ĝ^T B Ĝ P_k = H_k^T P_{k+1}^T B P_{k+1} H_k = H_k^T H_k.

So we need to solve

    H_k^T H_k y^{(k)} = H_k^T P_{k+1}^T B c.

This algorithm is called GMRES (generalized minimum residual), due to Saad and Schultz in 1986, and it is probably the most often used Krylov method.
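To make this concrete (an added sketch, not part of the notes), here is a minimal version with B = I that reuses the arnoldi function sketched above. Since P_{k+1}^T c = ||c|| e_1 when B = I, the normal equations above are just the small least-squares problem min over y of || ||c|| e_1 - H_k y ||. Production codes such as scipy.sparse.linalg.gmres implement this far more carefully (with Givens rotations and restarting).

import numpy as np

def gmres_sketch(G_hat, c, k):
    """Minimize ||c - G_hat x|| over x in K_k(G_hat, c), using the arnoldi()
    sketch above (B = I).  Solves H^T H y = H^T P_{k+1}^T c as a small
    least-squares problem and returns x^(k) = P_k y^(k)."""
    beta = np.linalg.norm(c)
    P, H = arnoldi(G_hat, c, k)
    rhs = np.zeros(H.shape[0])
    rhs[0] = beta                    # P_{k+1}^T B c = beta * e_1 when B = I
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return P[:, :H.shape[1]] @ y

For a system Ax = b, Ĝ plays the role of A and c of b (or of the preconditioned versions M^{-1}A and M^{-1}b coming from a splitting A = M - N).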

2nd special choice of Z

Recall: we need to solve P_k^T Z P_k y^{(k)} = P_k^T Z x^*, and we know

    Ĝ x^* = c,    Ĝ P_k = P_{k+1} H_k  (*),    P_{k+1}^T B P_{k+1} = I_{k+1}.

Added assumption: assume Z = B Ĝ is symmetric and positive definite. (Note that we need Z to be symmetric and positive definite in order to be minimizing a norm of the error. Our assumption here is that B Ĝ is symmetric and positive definite.) Then

    P_k^T Z x^* = P_k^T B Ĝ x^* = P_k^T B c

is computable. The left-hand side also simplifies:

    P_k^T Z P_k = P_k^T B Ĝ P_k = P_k^T B P_{k+1} H_k = H̃_k,

where H̃_k contains the first k rows of H_k. So we need to solve

    H̃_k y^{(k)} = P_k^T B c.

This algorithm is called conjugate gradients (CG), due to Hestenes and Stiefel in 1952. It is the most often used Krylov method for symmetric problems.

Using Krylov projection

Recall: Ĝ x^* = c,  Ĝ P_k = P_{k+1} H_k  (*),  P_{k+1}^T B P_{k+1} = I_{k+1}.

Problem: Find x^{(k)} \in K_k so that r^{(k)} = c - Ĝ x^{(k)} is B-orthogonal to the columns of P_k.

Solution:

    0 = P_k^T B ( c - Ĝ x^{(k)} ) = P_k^T B ( c - Ĝ P_k y^{(k)} ),

so we need to solve

    P_k^T B Ĝ P_k y^{(k)} = P_k^T B c,   i.e.,   H̃_k y^{(k)} = P_k^T B c.

This algorithm is called the Arnoldi iteration. (Note: same equation as CG, but no assumption of symmetry or positive definiteness.)

An important alternative

The algorithms we have discussed (GMRES, CG, and Arnoldi) all use the Arnoldi basis. There is another convenient basis, derived using the nonsymmetric Lanczos algorithm. This basis gives rise to several useful algorithms:

- CG (alternate derivation)
- bi-conjugate gradients (Bi-CG) (Fletcher 1976)
- Bi-CGStab (van der Vorst 1992)
- quasi-minimum residual (QMR) (Freund and Nachtigal 1991)
- transpose-free QMR (Freund and Nachtigal 1993) (the most useful)
- CGStab (CG-squared stabilized) (van der Vorst 1989)

Advantages: only a fixed number of vectors are saved, not k.

Disadvantages: the basic algorithms can break down (terminate without obtaining the solution to the linear system), and fixing this up is messy.

We won't take the time to discuss these methods in detail, but they can be useful.
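For reference (an addition, not from the notes), several of these alternative-basis methods have production implementations in SciPy; a minimal usage sketch on an illustrative nonsymmetric test matrix:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# An illustrative nonsymmetric test problem (1-D convection-diffusion stencil).
n = 1000
A = sp.diags([-1.2, 2.0, -0.8], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = spla.bicgstab(A, b, maxiter=500)   # info == 0 means convergence
print(info, np.linalg.norm(b - A @ x))       # residual norm of the computed x

scipy.sparse.linalg.qmr and scipy.sparse.linalg.gmres have essentially the same calling pattern.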

The conjugate gradient method

In general, we need to save all of the old vectors in order to accomplish the projection of the residual. For some special classes of matrices, we only need a few old vectors. The most important of these classes is the symmetric positive definite matrices, just as we need for self-adjoint elliptic PDEs. The resulting algorithm is called conjugate gradients (CG). It is both a minimization algorithm (in the energy norm) and a projection algorithm (B = I). There is a very compact and practical form for the algorithm.

The conjugate gradient algorithm

[x, r] = cg(A, M, b, tol)
Given symmetric positive definite matrices A and M, a vector b, and a tolerance tol, compute an approximate solution to Ax = b.

    Let r = b, x = 0, solve Mz = r for z, and let γ = r^T z, p = z.
    for k = 0, 1, ..., until ||r|| < tol,
        α = γ / (p^T A p)
        x = x + α p
        r = r - α A p
        Solve Mz = r for z.
        γ̂ = r^T z
        β = γ̂ / γ,  γ = γ̂
        p = z + β p
    end (for k)

The matrix M is called the preconditioner, and our next task is to understand what it does and why we might need it.
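A direct NumPy transcription of this pseudocode (an added sketch, not part of the notes; the function name pcg and the callable-preconditioner interface are illustrative choices). The preconditioner M is supplied as a routine M_solve that solves Mz = r:

import numpy as np

def pcg(A, M_solve, b, tol=1e-8, max_iters=None):
    """Preconditioned conjugate gradients, following the pseudocode above.
    A is symmetric positive definite (anything supporting A @ p);
    M_solve(r) returns the z solving M z = r for an SPD preconditioner M.
    Use M_solve = lambda r: r for plain CG (M = I)."""
    x = np.zeros_like(b, dtype=float)
    r = b.astype(float).copy()
    z = M_solve(r)
    gamma = r @ z
    p = z.copy()
    if max_iters is None:
        max_iters = b.size
    for _ in range(max_iters):
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        alpha = gamma / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        z = M_solve(r)
        gamma_hat = r @ z
        beta = gamma_hat / gamma
        gamma = gamma_hat
        p = z + beta * p
    return x, r

For example, M_solve = lambda r: r / A.diagonal() applies a Jacobi (diagonal) preconditioner when A is a scipy.sparse matrix.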