Notes for CS542G (Iterative Solvers for Linear Systems)

Robert Bridson, November 20, 2007

1 The Basics

We're now looking at efficient ways to solve the linear system of equations Ax = b, where in this course we are particularly interested in the case where A is sparse and SPD (symmetric positive-definite). This is the case for the linear systems we have derived for solving the Poisson equation, either from finite differences or the Galerkin Finite Element Method.

In the one-dimensional case, we saw that A was just tridiagonal for second-order accurate finite differences, or piecewise linear finite elements. This is a very special class of matrices, which we can solve using Cholesky in the optimal O(n) time; in fact, LAPACK provides a fast routine for us. However, in two or higher dimensions, A has a much more complicated structure and cannot be so easily factored with Cholesky: the resulting lower triangular factor L must (as has been proven) have significantly more nonzeros than A. Making this work efficiently is nontrivial, and there are some fairly severe theoretical limits on time and space efficiency for some problems (in particular, solving PDEs on 3D meshes).

An alternative is to look at iterative methods. Here we'll take an initial guess at the solution of the linear system, usually x_0 = 0, and in each iteration improve on it to get a sequence of guesses, or iterates, x_k which hopefully will converge to the correct answer.

Before getting to methods of constructing these iterates, we need to have an idea of when to stop: how to measure convergence. Compared to general nonlinear optimization, where it's hard to say in general if a method has converged or not, we have it pretty easy here. Define the residual r = b - Ax for a guess x; we have the exact solution if and only if r = 0. More to the point, looking at the norm of r is a fairly reliable way of determining if we are close to a solution. In fact, going back to our earlier error analysis of linear systems, the relative error in the solution can be bounded in terms of the residual:

    ||x - x_exact|| / ||x_exact||  <=  κ(A) ||r|| / ||b||

where κ(A) is the condition number of the matrix. Assuming that the person who set up the linear system is confident κ(A) isn't too large (otherwise, no numerical method can reasonably hope to get an accurate solution!), then it should be fine to say we have converged when ||r|| is small enough. Typically the condition is something like ||r|| <= 10^{-9} ||b||.

Another look at the SPD problem, which will be useful later, is that the linear system is equivalent to minimizing the following quadratic:

    min_x  (1/2) x^T A x - x^T b

Taking the gradient of this with respect to x gives us the negative residual -r, in fact, and so the solution (where r = 0) is a minimum of the quadratic. This of course relies crucially on A being positive definite, as otherwise the quadratic might not have a unique minimum.
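As a concrete illustration of the stopping test and the quadratic above, here is a minimal NumPy sketch (not part of the original notes; the function names and the dense-matrix representation are just illustrative choices):

```python
import numpy as np

def converged(A, x, b, tol=1e-9):
    """True when ||b - Ax|| <= tol * ||b||, the stopping test described above."""
    r = b - A @ x                        # residual: zero exactly at the solution
    return np.linalg.norm(r) <= tol * np.linalg.norm(b)

def quadratic(A, x, b):
    """The quadratic (1/2) x^T A x - x^T b minimized by the solution when A is SPD."""
    return 0.5 * x @ (A @ x) - x @ b
```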

Another question we'll put to rest now is the question of the initial guess, x_0. One of the possible attractions of iterative methods is that if for some reason you have a good initial guess, you should be able to save a lot of work. However, it's usually convenient for all these methods to be able to assume that the initial guess is all zero: x_0 = 0. In fact, if we do have a nonzero initial guess, we can change the linear system to solve for the Δx we need to add to that first guess:

    Ax = b
    A(x_0 + Δx) = b
    A Δx = b - A x_0
    A Δx = r_0

So now we've reduced it to a new linear system for which the initial guess is x_0 = 0, and the right-hand side is actually the initial residual from the original system. Therefore, we'll from now on assume that the initial guess is zero.

2 Stationary Methods

The first class of iterative methods we'll look at for solving linear systems are called stationary. This technically means the iterative procedure is actually a linear operator: the function which maps the right-hand side b to the k-th guess x_k is linear.

2.1 Jacobi

Jacobi's method can be understood by considering the equations one at a time. The i-th equation, i.e. the i-th row from Ax = b, is this:

    A_{i1} x_1 + A_{i2} x_2 + ... + A_{ii} x_i + ... + A_{in} x_n = b_i

(Here I've switched to using subscripts for indices into a vector, not for labelling which iteration we're on.) If we already have a reasonable guess for the solution, we can improve it by solving this equation for an improved new x_i:

    A_{i1} x^old_1 + A_{i2} x^old_2 + ... + A_{ii} x^new_i + ... + A_{in} x^old_n = b_i

We can simplify the formula a little if we throw in the old value of x_i too:

    x^new_i = (1 / A_{ii}) (b_i - Σ_{j≠i} A_{ij} x^old_j)
            = x^old_i + (1 / A_{ii}) (b_i - Σ_j A_{ij} x^old_j)
            = x^old_i + (1 / A_{ii}) r_i

That's the residual popping up in the last formula. In fact, we can use this now to write the iteration for the whole vector. Let D = diag(A) be the diagonal part of A: a diagonal matrix with A_{11}, ..., A_{nn} down the diagonal. Then Jacobi iteration is:

    x^new = x^old + D^{-1} r

Or, now using subscripts for the iteration again,

    x_{k+1} = x_k + D^{-1} r_k
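The whole-vector form translates almost directly into code. The following is a minimal dense NumPy sketch, not from the original notes; a practical implementation would exploit the sparsity of A, and the iteration cap and tolerance are arbitrary illustrative defaults:

```python
import numpy as np

def jacobi(A, b, tol=1e-9, max_iters=10_000):
    """Jacobi iteration x_{k+1} = x_k + D^{-1} r_k, starting from x_0 = 0."""
    x = np.zeros_like(b, dtype=float)   # initial guess x_0 = 0, as assumed above
    d = np.diag(A)                      # the diagonal D = diag(A)
    for _ in range(max_iters):
        r = b - A @ x                   # residual r_k
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + r / d                   # apply D^{-1} componentwise
    return x
```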

The big question that looms over this is: does this converge? Do we get closer to the solution as we iterate? Let's take a look at what happens to the residual:

    r_{k+1} = b - A x_{k+1}
            = b - A (x_k + D^{-1} r_k)
            = b - A x_k - A D^{-1} r_k
            = r_k - A D^{-1} r_k
            = (I - A D^{-1}) r_k

Aha: we now see the k-th residual is just (I - A D^{-1})^k b, which converges to zero for every right-hand side only if all the eigenvalues of the iteration matrix I - A D^{-1} have magnitude less than one. Unfortunately, this is not always guaranteed to hold, and in general it's harder to check if this condition holds (which might require finding eigenvalues) than just to run Jacobi and see if it works.

To get a better feel for what Jacobi might be doing, let's take a look at a model problem: our second-order finite difference method for solving the Poisson equation in one dimension. Assuming Dirichlet boundary conditions, and after rescaling by Δx^2, we end up with the following linear system:

    [  2  -1                ] [ x_1 ]   [ b_1 ]
    [ -1   2  -1            ] [ x_2 ]   [ b_2 ]
    [      .    .    .      ] [  .  ] = [  .  ]
    [           -1   2  -1  ] [  .  ]   [  .  ]
    [               -1   2  ] [ x_n ]   [ b_n ]

Entering this into MATLAB for a reasonably large n, and taking a look at the eigenvectors it finds, we can make the guess that the eigenvectors are sine waves, i.e. guess that the j-th eigenvector has entry i equal to sin(ijπ/(n+1)), where both i and j range over 1 to n. Sure enough, if we plug this into the eigen-equation Au = λu,

    -sin((i-1)jπ/(n+1)) + 2 sin(ijπ/(n+1)) - sin((i+1)jπ/(n+1)) = λ_j sin(ijπ/(n+1)),

and use some handy trigonometric formulas, we find it does indeed work, and the eigenvalue is:

    λ_j = 2 - 2 cos(jπ/(n+1))

In particular, the eigenvalues range strictly between 0 and 4. The minimum eigenvalue corresponds to the lowest frequency sine wave; the maximum eigenvalue corresponds to the highest frequency. As an aside, something we'll come back to later: it's not too hard to see that the smallest eigenvalue is O(1/n^2), from the Taylor series expansion of cosine, and since the largest eigenvalue is O(1), the condition number of A is κ(A) = O(n^2).

(As another aside: the trig formulas can be a royal pain to work with. One convenient shortcut which gets us most of the way there is to forget about boundary conditions, and just guess the complex exponential exp(√-1 ωi) for some frequency ω. Exponentials are much simpler to work with, and of course, sine and cosine really are just linear combinations of complex exponentials.)

Back to Jacobi: for this model problem D is just 2I, twice the identity. So the iteration matrix is just I - (1/2)A, which must have eigenvalues strictly between -1 and 1. Thus Jacobi converges! Moreover, we can see that the eigencomponents of the residual decay at different rates: for medium frequency components, where λ ≈ 2, Jacobi is extremely efficient. But for low frequency components, where λ ≈ 0, Jacobi hardly changes things at all, and for high frequency components, where λ ≈ 4, Jacobi essentially just flips the sign back and forth, also with very slow decay. Referring back to n, the low and high frequency components are multiplied by ±(1 - O(1/n^2)), necessitating O(n^2) iterations to converge.
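For the curious, the eigenvalue claims above are easy to check numerically. This short NumPy sketch (my own addition, not part of the notes) builds the 1D model matrix and compares its spectrum to the formula λ_j = 2 - 2 cos(jπ/(n+1)):

```python
import numpy as np

n = 50
# The tridiagonal (-1, 2, -1) model matrix from the 1D Poisson problem above.
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

j = np.arange(1, n + 1)
predicted = 2 - 2 * np.cos(j * np.pi / (n + 1))     # lambda_j from the sine-wave guess
computed = np.linalg.eigvalsh(A)                    # eigenvalues, ascending
print(np.allclose(np.sort(predicted), computed))    # expect True

# Eigenvalues of the Jacobi iteration matrix I - (1/2)A are 1 - lambda_j/2 = cos(j pi/(n+1)):
# strictly inside (-1, 1), but within O(1/n^2) of +/-1, hence the O(n^2) iteration count.
print(np.max(np.abs(1 - predicted / 2)))
```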

2.2 Gauss-Seidel

One of the obvious criticisms that could be made of Jacobi is that when we derived the new guess at the i-th entry of the solution, we just used old values for all the other entries. If we're computing them one after the other (as opposed to independently in parallel), why not use updated solution values as they become available? This gives Gauss-Seidel, a considerably more powerful iterative method. The Gauss-Seidel update can be written as a loop over i, from 1 up to n, solving the i-th equation for the i-th unknown:

    A_{i1} x^new_1 + A_{i2} x^new_2 + ... + A_{ii} x^new_i + ... + A_{in} x^old_n = b_i

    x^new_i = (1 / A_{ii}) (b_i - Σ_{j<i} A_{ij} x^new_j - Σ_{j>i} A_{ij} x^old_j)

Compare this with Jacobi to see the difference. Just like with Jacobi, we can write it in vector form, by also introducing the strictly lower triangular part of A, labelled L, and the strictly upper triangular part of A, labelled U. Note that A = L + D + U. Rearranging the solution formula gives:

    A_{i1} x^new_1 + A_{i2} x^new_2 + ... + A_{ii} x^new_i = b_i - A_{i,i+1} x^old_{i+1} - ... - A_{in} x^old_n

or simply:

    (L + D) x^new = b - U x^old

and thus Gauss-Seidel iteration is:

    x^new = (L + D)^{-1} (b - U x^old)
          = (L + D)^{-1} (b - (L + D + U) x^old + (L + D) x^old)
          = x^old + (L + D)^{-1} (b - A x^old)
          = x^old + (L + D)^{-1} r

One way to view this is to first observe that the perfect (though impractical) iteration x^new = x^old + A^{-1} r would instantly solve the problem; Jacobi approximated A^{-1} with D^{-1}; Gauss-Seidel uses a better approximation, (L + D)^{-1}. Indeed, while we won't prove it in this course, it can be shown that Gauss-Seidel always works for SPD matrices A, making it much more robust than Jacobi. For problems where both converge, Gauss-Seidel is usually, though not necessarily always, faster too. For the PDE problem we examined, it turns out to be roughly twice as fast asymptotically, still taking O(n^2) iterations though.

Incidentally, one of the downsides to Gauss-Seidel is that it is apparently inherently sequential, whereas Jacobi was nice and parallel. However, if A is sparse, it may be possible to use some artful graph theory to expose parallelism in Gauss-Seidel. View the nonzero pattern of A as the adjacency matrix of a graph on n nodes, i.e. a graph where there is an edge between nodes i and j if and only if A_{ij} ≠ 0. An independent set of nodes is a set where there are no edges between nodes in the set: you should be able to convince yourself that in Gauss-Seidel we can update an independent set in parallel. If we then colour the nodes of the graph so that each colour induces an independent set (i.e. the usual notion of graph colouring), we see we can solve each colour in parallel. For the finite difference discretization of the Poisson problem, it turns out you only need two colours, alternating as on a chessboard. This is called the red-black version of Gauss-Seidel. For reasons we won't get to in this course, red-black Gauss-Seidel can be significantly more effective, in some contexts, than regular Gauss-Seidel, even when running in a single thread of execution.
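Here is a minimal sketch of the vector form x^new = x^old + (L + D)^{-1} r above, again not from the original notes; it uses SciPy's triangular solve on a dense copy of L + D purely for clarity, whereas a real implementation would sweep through the sparse rows in place:

```python
import numpy as np
from scipy.linalg import solve_triangular

def gauss_seidel(A, b, tol=1e-9, max_iters=10_000):
    """Gauss-Seidel in the vector form x_{k+1} = x_k + (L + D)^{-1} r_k."""
    x = np.zeros_like(b, dtype=float)
    LD = np.tril(A)                     # L + D: lower triangle of A, diagonal included
    for _ in range(max_iters):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + solve_triangular(LD, r, lower=True)   # forward substitution
    return x
```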

2.3 Successive Over-Relaxation (SOR)

Recall from the preamble that, when we look at the linear system as a quadratic minimization problem, the residual was exactly the negative gradient. We can interpret both Jacobi and Gauss-Seidel as approximations to Newton's method:

    x_{k+1} = x_k - H^{-1} g = x_k - A^{-1} (-r)

where H = A is the Hessian and g = -r is the gradient. In Jacobi the Hessian is approximated with just its diagonal part, and in Gauss-Seidel it is approximated with the lower triangular part L + D. In both cases we use the natural Newton step length of α = 1. However, we saw that Newton can be made a bit more robust or powerful by allowing more general step lengths. This gives us the idea behind Successive Over-Relaxation, SOR for short. (This peculiar name goes back to a tradition of calling iterative methods for linear systems relaxation methods, probably due to a physical interpretation in terms of underlying PDE problems, and the notion that Gauss-Seidel successively relaxes each equation.) Here we figure that if the Gauss-Seidel step takes us towards the answer, going aggressively a little further might get us there faster:

    x_{k+1} = x_k + ω (L + D)^{-1} r_k

Here ω is a parameter; ω = 1 corresponds to plain Gauss-Seidel. Usually faster convergence is possible for well-chosen 1 < ω < 2, but the range of ω in which a significant improvement can be had is typically very small and hard to find in general. That said, for some PDE problems people have worked out optimal parameters, which lead to asymptotically faster convergence (i.e. more than just a constant factor improvement in the number of iterations).

We'll end stationary methods here, though I should point out there are plenty more that people have derived and that can be extremely useful in practice. For example, there is a method called MultiGrid which, for the Poisson problem we've studied, gives an optimal solution (converging in O(1) iterations, independent of n, with O(n) work per iteration). However, MultiGrid and similarly powerful methods are rather more complicated than we have time for.
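To close out the stationary methods, here is a sketch of the over-relaxed iteration exactly as written above, x_{k+1} = x_k + ω(L + D)^{-1} r_k. It is my own illustrative code rather than part of the notes, and the default ω = 1.5 is an arbitrary choice:

```python
import numpy as np
from scipy.linalg import solve_triangular

def sor(A, b, omega=1.5, tol=1e-9, max_iters=10_000):
    """Over-relaxed iteration x_{k+1} = x_k + omega (L + D)^{-1} r_k; omega = 1 is Gauss-Seidel."""
    x = np.zeros_like(b, dtype=float)
    LD = np.tril(A)                     # L + D
    for _ in range(max_iters):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + omega * solve_triangular(LD, r, lower=True)
    return x
```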