The Conjugate Gradient Method

Classical Iterations We have a problem A x = b. We assume that the matrix A comes from a discretization of a PDE. The best and most popular model problem is the Poisson equation, -Δu = f. The matrix will be as large as possible, because this gives the most accurate solution. It is not necessary to solve the system with higher accuracy than the discretization error. Therefore, an approximation x_k such that ||x - x_k|| ≤ ε is acceptable if ε is comparable with the discretization error.

Matrix properties Properties of the matrix: A is symmetric and positive definite (SPD). A defines an inner product (u, v)_A = (A u, v) and the corresponding norm ||v||_A = (A v, v)^{1/2}. A has a Cholesky factorization A = L L^T, where L is lower triangular. A is sparse, but L is not so sparse.

Classical Iterations Before we go to the Conjugate Gradient (CG) method, we give a short review of classical iterations. This will hopefully illustrate the differences between these iterative methods. The classical iterations are based on a splitting of the matrix, A = M - N. The linear system is then M x = N x + b, which suggests the iteration M x_{k+1} = N x_k + b, or x_{k+1} = M^{-1}(N x_k + b). The Jacobi, Gauss-Seidel and SOR methods are variants of this method.

Classical Iterations II We introduce a more specific matrix splitting, A = D - E - F, where D is a diagonal matrix, E is a strictly lower triangular matrix, and F is a strictly upper triangular matrix. The classical iterations then read:
Jacobi: x_{k+1} = D^{-1}((E + F) x_k + b)
Damped Jacobi: x_{k+1} = ω D^{-1}((E + F) x_k + b) + (1 - ω) x_k
Gauss-Seidel: x_{k+1} = (D - E)^{-1}(F x_k + b)

Classical Iterations III SOR: x_{k+1} = (D - ωE)^{-1}((ωF + (1 - ω)D) x_k + ωb). Other representations can also be found.

Classical Iterations IV However, it is more efficient to implement these methods as algorithms. We then avoid constructing the splitting matrices and their inverses. For i = 1, ..., n:
Jacobi: x_i^{k+1} = (b_i - Σ_{j≠i} a_ij x_j^k) / a_ii
Damped Jacobi: x_i^{k+1} = ω (b_i - Σ_{j≠i} a_ij x_j^k) / a_ii + (1 - ω) x_i^k
Gauss-Seidel: x_i^{k+1} = (b_i - Σ_{j<i} a_ij x_j^{k+1} - Σ_{j>i} a_ij x_j^k) / a_ii

Classical Iterations V SOR: x_i^{k+1} = ω (b_i - Σ_{j<i} a_ij x_j^{k+1} - Σ_{j>i} a_ij x_j^k) / a_ii + (1 - ω) x_i^k
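To make the componentwise formulas above concrete, here is a minimal NumPy sketch of one Jacobi sweep and one SOR sweep (omega = 1 gives Gauss-Seidel). The function names and the 1-D Poisson test matrix are our own illustration, not part of the lecture.

import numpy as np

def jacobi_sweep(A, b, x):
    """One Jacobi sweep: every component is updated using only old values of x."""
    D = np.diag(A)
    return (b - A @ x + D * x) / D

def sor_sweep(A, b, x, omega=1.0):
    """One SOR sweep (omega = 1 gives Gauss-Seidel): new values are used as soon
    as they are available."""
    x = x.copy()
    for i in range(len(b)):
        sigma = A[i, :] @ x - A[i, i] * x[i]            # sum over j != i
        x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    return x

n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)    # 1-D Poisson model matrix
b = np.ones(n)
x = np.zeros(n)
for _ in range(200):
    x = sor_sweep(A, b, x, omega=1.5)
print(np.linalg.norm(b - A @ x))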

Framework for Classical Iterations All the above methods can be represented as x_{k+1} = x_k + B(b - A x_k), (1) or x_{k+1} = (I - B A) x_k + B b, (2) where B is an approximation of A^{-1} (e.g. B = D^{-1} for Jacobi). The iteration (1) is usually called the preconditioned simple or Richardson iteration.

Generalization All linear iterations can be expressed like this. The algorithms produce an output vector, given an input vector. Hence, the notation B r makes sense even when B is only given as an algorithm. Multigrid and domain decomposition, as well as Jacobi, Gauss-Seidel, and SOR, can be represented as a matrix B and the iteration (1) or (2). The algorithmic representation is more efficient than the matrix formulation.

Richardson iteration The simplest of all iterations is x_{k+1} = x_k + (b - A x_k). The residual is r_k = b - A x_k. This method is called the simple or the Richardson iteration. Hence, Jacobi, Gauss-Seidel, SOR, domain decomposition and multigrid can be represented as a preconditioned Richardson iteration: x_{k+1} = x_k + τ B(b - A x_k). We have introduced a damping parameter τ that can be tuned for maximal performance.
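Below is a small sketch of the preconditioned Richardson iteration x_{k+1} = x_k + τ B(b - A x_k), with B supplied as a function. The choice B = D^{-1} (Jacobi scaling) and the damping τ = 2/(λ_min + λ_max) are our own illustrative choices.

import numpy as np

def richardson(A, b, apply_B, tau, x0, tol=1e-8, maxit=10000):
    """Preconditioned Richardson iteration x_{k+1} = x_k + tau * B(b - A x_k)."""
    x = x0.copy()
    for k in range(maxit):
        r = b - A @ x                                    # residual
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k
        x = x + tau * apply_B(r)
    return x, maxit

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # 1-D Poisson model matrix
b = np.ones(n)
apply_B = lambda r: r / np.diag(A)                       # B = D^{-1}: Jacobi preconditioning

# Optimal damping tau = 2/(lambda_min + lambda_max) of BA (see the convergence slide).
lam = np.linalg.eigvals(np.diag(1.0 / np.diag(A)) @ A).real
tau = 2.0 / (lam.min() + lam.max())
x, k = richardson(A, b, apply_B, tau, np.zeros(n))
print(k, np.linalg.norm(b - A @ x))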

Convergence A sufficient condition for convergence is that there is a ρ < 1 such that ||I - τ B A|| ≤ ρ with respect to the B^{-1}-inner product (B A is symmetric with respect to this inner product). Let us choose τ = 2/(λ_min + λ_max), where λ_min and λ_max are the extreme eigenvalues of B A. The condition number is κ = λ_max/λ_min.

Convergence II we get an error estimate, With this choice of

Spectral Equivalence I A and B are both SPD and define inner products. A and B^{-1} are spectrally equivalent, denoted A ∼ B^{-1}, if there exist constants c_0, c_1 > 0 such that c_0 (B^{-1} v, v) ≤ (A v, v) ≤ c_1 (B^{-1} v, v) for all v. If A ∼ B^{-1}, then the eigenvalues of B A lie in the interval [c_0, c_1].

Spectral Equivalence II If A ∼ B^{-1} with constants c_0 and c_1, then the condition number satisfies κ(B A) ≤ c_1/c_0.

Spectral Equivalence III There are four equivalent forms of the spectral equivalence condition; the bounds can equivalently be stated with A and B^{-1} interchanged with B and A^{-1} (the constants c_0, c_1 then become 1/c_1, 1/c_0).

Nonlinear iterations Nonlinear iterations cannot be represented as a matrix. A general iteration can be written x_{k+1} = x_k + α_k p_k, where p_k is the search direction and α_k is the length of the step in the search direction. How can we determine α_k and p_k, and what should they fulfill? We should find the solution in at most n iterations, where n is the number of unknowns. We should be able to use a stopping criterion, so that carrying out all n iterations is usually not needed.

FEM: The Galerkin Method We seek the solution of the Poisson problem: -Δu = f in Ω, u = 0 on ∂Ω. We know that if f ∈ L_2(Ω), then a solution u ∈ H^1_0(Ω) exists. In general, the solution is hard to construct explicitly. Instead we seek a numerical approximation u_h. In FEM we construct a finite-dimensional subspace V_h of H^1_0(Ω). The numerical approximation we seek in the Galerkin approach is the a-projection of u on V_h, where a(u, v) = ∫_Ω ∇u · ∇v dx. In other words, a(u - u_h, v) = 0 for all v ∈ V_h.

Minimization Problem We have two equivalent formulations of the Poisson problem. The solution u is the minimizer of the energy functional E(v) = 1/2 a(v, v) - (f, v). That is, E(u) ≤ E(v) for all v ∈ H^1_0(Ω).

Minimization Problem II Similarly, the approximate solution u_h is the a-projection of u on V_h, or the minimizer of the energy functional over V_h, or the solution that causes the least error in the energy norm: ||u - u_h||_a ≤ ||u - v||_a for all v ∈ V_h.

CG in a Nutshell The Conjugate Gradient method employs these properties. Let A be an SPD matrix and let V_k be a subspace defined somehow. Then the kth approximation x_k ∈ V_k in the CG method satisfies (A x_k, v) = (b, v) for all v ∈ V_k, and ||x - x_k||_A ≤ ||x - v||_A for all v ∈ V_k.

CG in a Nutshell II Of course, these observations do not necessarily lead us to a "fast" solution algorithm. What are the subspaces V_k, and what is a basis of V_k? We do not want to solve a linear system in each iteration (at least not a large one).

Cayley-Hamilton Given an n×n matrix A, p(A) = 0, where p is the characteristic polynomial of A. In other words, A^{-1} is a polynomial in A of degree at most n - 1, and the solution x = A^{-1} b can be written as a polynomial in A applied to b.

Krylov Subspaces The Krylov subspaces consist of exactly these A-polynomials applied to a vector. Let K_k(A, r_0) = span{r_0, A r_0, ..., A^{k-1} r_0}. This is a nested sequence of subspaces, K_1(A, r_0) ⊆ K_2(A, r_0) ⊆ ...; if x_0 = 0, then x = A^{-1} b ∈ K_n(A, b).
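As a small illustration of these spaces, the sketch below builds K_k(A, r_0) for a 1-D Poisson model matrix and orthonormalizes the basis with QR; the function name and the test problem are our own, not the lecture's.

import numpy as np

def krylov_basis(A, r0, k):
    """Orthonormal basis of K_k(A, r0) = span{r0, A r0, ..., A^{k-1} r0},
    orthonormalized with QR for numerical stability."""
    V = np.empty((len(r0), k))
    v = r0.copy()
    for j in range(k):
        V[:, j] = v
        v = A @ v
    Q, _ = np.linalg.qr(V)
    return Q

n = 30
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # 1-D Poisson model matrix
r0 = np.random.default_rng(0).standard_normal(n)
for k in (1, 2, 4, 8):
    print(k, krylov_basis(A, r0, k).shape)               # nested subspaces of dimension k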

A Basis for the Krylov Spaces The residuals r_0, r_1, ..., r_{k-1} form an orthogonal basis for K_k(A, r_0). Proof: 1. The Galerkin method defines x_k as the A-projection of x on K_k(A, r_0). 2. This means that the residual r_k = b - A x_k is orthogonal to K_k(A, r_0), i.e. (r_k, v) = 0 for all v ∈ K_k(A, r_0). 3. Since r_j ∈ K_{j+1}(A, r_0) ⊆ K_k(A, r_0) for j < k, we have (r_k, r_j) = 0 for j < k. 4. It follows that the residuals are mutually orthogonal and span the Krylov spaces.

A Basis for the Krylov Spaces II Hence, we have a candidate for the basis in our Krylov spaces. Therefore let x_k = Σ_{j<k} c_j r_j. The coefficients c_j are to be determined from the Galerkin property. This does not help us much, since x is the unknown. However, we can switch the inner product to the A-inner product: (x_k, r_j)_A = (x, r_j)_A = (A x, r_j) = (b, r_j). The right-hand side is known.

A Basis for the Krylov Spaces III Hence, we must solve a linear system M c = g, where M_ij = (r_j, r_i)_A is a dense matrix and the c_j are the expansion coefficients. It would be much better if the basis vectors were A-orthogonal; the matrix would then have been diagonal and easy to invert. The approximation should therefore not be expanded in terms of the residuals. However, the residuals span the Krylov subspaces and should somehow be used.

Search vectors Let us go back to the original iteration x_{k+1} = x_k + α_k p_k. Let us assume that we have a set of search vectors p_0, p_1, .... What properties should the p_k have? How can we determine α_k?

Minimization Problem If A is SPD, the problem A x = b can be restated as a minimization problem. Let F(v) = 1/2 (A v, v) - (b, v). The solution x of A x = b is then the minimizer of F.

Minimization Problem II Hence, we can consider p_k as a search direction and seek the solution of a 1-dimensional minimization problem: minimize F(x_k + α p_k) over α. This leads to dF/dα = (A(x_k + α p_k), p_k) - (b, p_k) = 0. Since r_k = b - A x_k, α_k is therefore determined by α_k = (r_k, p_k) / (A p_k, p_k), and x_{k+1} = x_k + α_k p_k.
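The 1-dimensional minimization can be written as a short routine. Choosing p = r in the loop below gives the steepest-descent method; this is our own illustrative choice, CG will instead use conjugate directions.

import numpy as np

def line_search_step(A, b, x, p):
    """One exact line search for F(v) = 1/2 (A v, v) - (b, v) along direction p:
    the minimizer of F(x + alpha p) is alpha = (r, p) / (A p, p) with r = b - A x."""
    r = b - A @ x
    alpha = (r @ p) / (p @ (A @ p))
    return x + alpha * p

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # 1-D Poisson model matrix
b = np.ones(n)
x = np.zeros(n)
for _ in range(200):                                     # steepest descent: p = r
    x = line_search_step(A, b, x, b - A @ x)
print(np.linalg.norm(b - A @ x))

Steepest descent converges, but slowly when the condition number is large; the conjugate search directions introduced below repair this.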

The Galerkin Method With the spaces V_k = K_k(A, r_0) we can determine an approximation x_k ∈ V_k by the Galerkin condition (A x_k, v) = (b, v) for all v ∈ V_k. Another way to write this is x_k = Σ_{j<k} α_j p_j, where the p_j form a basis of V_k. This is precisely what the Conjugate Gradient method does. The vectors p_j are then the search vectors. We will now see how they can be determined and used in a clever way.

Search vectors Given a search vector p_k, we can determine the step length α_k. We know that the residual vectors span the appropriate Krylov spaces. The residuals were orthogonal with respect to the ordinary inner product, but not with respect to the A-inner product; a dense system had to be solved. We should instead use A-orthogonal search vectors.

Conjugate Directions? We have assumed x_{k+1} = x_k + α_k p_k. How do we construct the p_k? Since α_k is determined by a 1-dimensional search, x_{k+1} is optimal only with respect to the direction p_k. This may potentially lead to trouble.

Conjugate Directions II Is x_{k+1} optimal with respect to the previous search directions? The search was done in the p_k direction. The residual can be expressed as r_{k+1} = r_k - α_k A p_k. Hence, we can write (r_{k+1}, p_j) = (r_k, p_j) - α_k (A p_k, p_j).

Conjugate Directions III We know that (r_{k+1}, p_k) = 0; this is how α_k was determined. Therefore we see that the residual will be optimal with respect to all the previous search vectors if (A p_k, p_j) = 0 for j < k. The term conjugate means "A-orthogonal". We should choose the search vectors in this fashion. How can we do that? We know that the new search directions are contained in the residuals. We use the Gram-Schmidt orthogonalization process: we expand the residual in terms of the "old" search vectors and subtract the expansion, p_{k+1} = r_{k+1} - Σ_{j≤k} ((r_{k+1}, A p_j)/(p_j, A p_j)) p_j.

Conjugate Directions V The residuals r_0, ..., r_{k+1} are linearly independent, and (r_{k+1}, A p_j) = 0 for j < k. Hence only the last term in the Gram-Schmidt expansion is nonzero, and the search vectors can be computed iteratively: p_{k+1} = r_{k+1} + β_k p_k.
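To make the Gram-Schmidt construction concrete, the sketch below A-orthogonalizes a set of vectors explicitly; the function name and the random SPD test matrix are our own. In CG only the last term of the expansion survives, so this full loop is never needed in practice; it is shown only to verify the A-orthogonality.

import numpy as np

def conjugate_directions(A, vectors):
    """A-orthogonalize a list of vectors by Gram-Schmidt in the A-inner product.
    (Illustration only: in CG the expansion collapses to a single term.)"""
    dirs = []
    for r in vectors:
        p = r.copy()
        for q in dirs:
            p -= (r @ (A @ q)) / (q @ (A @ q)) * q       # subtract the A-projection onto q
        dirs.append(p)
    return dirs

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)                              # a random SPD test matrix
dirs = conjugate_directions(A, [rng.standard_normal(6) for _ in range(4)])
print(max(abs(dirs[i] @ (A @ dirs[j])) for i in range(4) for j in range(i)))   # ~ 0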

Algorithm CG
1. x_0 given (e.g. x_0 = 0)
2. r_0 = b - A x_0
3. p_0 = r_0
4. while ρ_k = (r_k, r_k) is too large do
(a) α_k = (r_k, r_k) / (A p_k, p_k)
(b) x_{k+1} = x_k + α_k p_k
(c) r_{k+1} = r_k - α_k A p_k
(d) β_k = (r_{k+1}, r_{k+1}) / (r_k, r_k)
(e) p_{k+1} = r_{k+1} + β_k p_k
(f) k = k + 1
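A NumPy version of the algorithm above, written as a sketch under the assumption that steps (a)-(f) are the standard CG recurrences; the variable names and the 1-D Poisson test problem are our own.

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, maxit=1000):
    """Conjugate Gradient for an SPD matrix A, following steps (a)-(f) above."""
    x = np.zeros(len(b)) if x0 is None else x0.copy()
    r = b - A @ x                       # residual
    p = r.copy()                        # first search direction
    rho = r @ r
    for k in range(maxit):
        if np.sqrt(rho) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = rho / (p @ Ap)          # step length from the 1-D minimization
        x += alpha * p
        r -= alpha * Ap
        rho_new = r @ r
        beta = rho_new / rho            # makes the new direction A-orthogonal to p
        p = r + beta * p
        rho = rho_new
    return x, k

n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # 1-D Poisson model matrix
b = np.ones(n)
x, k = conjugate_gradient(A, b)
print(k, np.linalg.norm(b - A @ x))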

Comments on the Algorithm r_k is the residual and, in the preconditioned version below, s_k = B r_k is the preconditioned residual. The quantity ρ_k is often used as convergence criterion, because it can be used to bound the error. We have r_k = A(x - x_k); therefore, if λ_min is the smallest eigenvalue of A, ||x - x_k|| ≤ ||r_k|| / λ_min.

Convergence The convergence estimate of the Conjugate Gradient method is ||x - x_k||_A ≤ 2 ((√κ - 1)/(√κ + 1))^k ||x - x_0||_A, where κ = λ_max/λ_min is the condition number of A. Additionally, in exact arithmetic the exact solution is reached for some k ≤ n. We will now see where these estimates come from.

Polynomials Let q(A) be a (univariate) polynomial in terms of A, and let P_k denote the class of such polynomials of degree k. We have an orthonormal basis of eigenvectors v_1, ..., v_n (the matrix is symmetric). The corresponding eigenvalues λ_1, ..., λ_n are all positive (the matrix is positive definite). In terms of the eigenvectors, q(A) v_i = q(λ_i) v_i, since A v_i = λ_i v_i.

Polynomials II If x_k is the kth CG approximation (with x_0 = 0), then x_k solves the minimization problem ||x - x_k||_A = min_{v ∈ K_k(A, b)} ||x - v||_A. Proof: 1. We have, for an arbitrary v = q(A) b ∈ K_k(A, b) with q ∈ P_{k-1}, x - v = (I - q(A) A) x = p(A) x, where p(λ) = 1 - λ q(λ) is a polynomial of degree k with p(0) = 1.

Polynomials III 2. From (3) we have, writing x = Σ_i c_i v_i where the v_i are the eigenvectors of the symmetric matrix A, ||x - v||_A^2 = ||p(A) x||_A^2 = Σ_i λ_i p(λ_i)^2 c_i^2. 3. Similarly, ||x||_A^2 = Σ_i λ_i c_i^2. 4. Combining (4), (5) and (6) we have ||x - v||_A ≤ max_i |p(λ_i)| ||x||_A.

Polynomials IV Minimizing over v ∈ K_k(A, b) is the same as minimizing over all polynomials p of degree k with p(0) = 1, since K_k(A, b) is by definition spanned by vectors of the form q(A) b with q of degree at most k - 1, and the proof is complete.

Polynomials V We have that ||x - x_k||_A ≤ max_{λ ∈ [λ_min, λ_max]} |p(λ)| ||x - x_0||_A, where [λ_min, λ_max] is an interval containing all the eigenvalues of A and p can be any polynomial in P_k with p(0) = 1. Proof: 1. The bound from the previous slide holds for every such p. 2. The maximum over the interval bounds the maximum over the eigenvalues. 3. Therefore we are free to choose a polynomial that is small on the whole interval.

Chebyshev Polynomials The Chebyshev polynomials are defined recursively by T_0(x) = 1, T_1(x) = x, T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x). They can also be represented as: 1. T_k(x) = cos(k arccos x) for |x| ≤ 1. 2. T_k(x) = 1/2 [(x + √(x^2 - 1))^k + (x - √(x^2 - 1))^k] for |x| ≥ 1.
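The recursion is easy to implement; the sketch below also checks it against the closed form cos(k arccos x) on [-1, 1]. The function name is our own.

import numpy as np

def chebyshev(k, x):
    """Chebyshev polynomial T_k(x) by the three-term recurrence
    T_0 = 1, T_1 = x, T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)."""
    t_prev, t = np.ones_like(x), x
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

# On [-1, 1] the recurrence agrees with the closed form cos(k * arccos(x)).
x = np.linspace(-1, 1, 5)
print(np.max(np.abs(chebyshev(7, x) - np.cos(7 * np.arccos(x)))))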

Finally, the Convergence Estimate We can now prove the following convergence estimate of the Conjugate Gradient method: ||x - x_k||_A ≤ 2 ((√κ - 1)/(√κ + 1))^k ||x - x_0||_A, with κ = λ_max/λ_min. Proof: 1. We have already shown ||x - x_k||_A ≤ max_{λ ∈ [λ_min, λ_max]} |p(λ)| ||x - x_0||_A for all polynomials p ∈ P_k with p(0) = 1. 2. Therefore we choose the (shifted and scaled) Chebyshev polynomials to derive an estimate. Let p(λ) = T_k((λ_max + λ_min - 2λ)/(λ_max - λ_min)) / T_k((λ_max + λ_min)/(λ_max - λ_min)). 3. We have p(0) = 1 and |p(λ)| ≤ 1 / T_k((λ_max + λ_min)/(λ_max - λ_min)) for λ ∈ [λ_min, λ_max].

Finally, the Convergence Estimate 4. We have T_k((λ_max + λ_min)/(λ_max - λ_min)) = T_k((κ + 1)/(κ - 1)) ≥ 1/2 ((√κ + 1)/(√κ - 1))^k. 5. Combining this with the bound from point 1 gives the convergence estimate.
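As a small numerical illustration of the estimate, the script below evaluates the bound 2((√κ - 1)/(√κ + 1))^k for the 1-D Poisson model matrix (our own choice of example) at a few iteration counts.

import numpy as np

# Evaluate the CG bound 2 * ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))**k for the
# 1-D Poisson model matrix at a few iteration counts k.
n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam = np.linalg.eigvalsh(A)
kappa = lam[-1] / lam[0]
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
for k in (10, 50, 100, 200):
    print(k, 2 * rho**k)
# The iteration count needed for a tolerance eps thus grows like sqrt(kappa) * log(1/eps).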

Preconditioning We had a linear system A x = b to be solved. However, A had a large condition number, and this resulted in slow convergence for CG. The idea of preconditioning is simply to replace the system with another equivalent system, B A x = B b, with the same solution. This can improve the convergence rate of CG dramatically, if B is well designed.

Preconditioning II B A is not necessarily symmetric, even though A and B are. However, if B is symmetric and positive definite, then: B has a Cholesky factorization B = L L^T, and we can apply CG to the symmetric system L^T A L y = L^T b with x = L y. Alternatively, we can define the B^{-1}- and A-inner products; B A is symmetric with respect to the B^{-1}-inner product, and we can apply CG to the system B A x = B b in the B^{-1}-inner product.

Desired properties of B B should be cheap in storage, O(n), fast to evaluate, O(n), and similar to A^{-1}. B does not need to be a matrix; however, it should be a linear operator. The only action needed of B is the evaluation of B r for a given vector r. Additionally, we usually want the operator to be symmetric and positive definite, such that it defines an inner product.

Algorithm PCG
1. x_0 given (e.g. x_0 = 0)
2. r_0 = b - A x_0
3. s_0 = B r_0
4. p_0 = s_0
5. while ρ_k = (r_k, s_k) is too large do
(a) α_k = ρ_k / (A p_k, p_k)
(b) x_{k+1} = x_k + α_k p_k
(c) r_{k+1} = r_k - α_k A p_k
(d) s_{k+1} = B r_{k+1}
(e) ρ_{k+1} = (r_{k+1}, s_{k+1})
(f) β_k = ρ_{k+1} / ρ_k
(g) p_{k+1} = s_{k+1} + β_k p_k
(h) k = k + 1
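A NumPy sketch of the algorithm above, assuming steps (a)-(h) are the standard PCG recurrences. The preconditioner is passed as a function apply_B; the diagonal scaling B = D^{-1} used in the example is just a cheap stand-in for a real preconditioner such as multigrid.

import numpy as np

def pcg(A, b, apply_B, x0=None, tol=1e-10, maxit=1000):
    """Preconditioned CG following steps (a)-(h) above; only B's action is needed."""
    x = np.zeros(len(b)) if x0 is None else x0.copy()
    r = b - A @ x                       # residual
    s = apply_B(r)                      # preconditioned residual
    p = s.copy()
    rho = r @ s
    for k in range(maxit):
        if np.sqrt(abs(rho)) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = rho / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        s = apply_B(r)
        rho_new = r @ s
        beta = rho_new / rho
        p = s + beta * p
        rho = rho_new
    return x, k

n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # 1-D Poisson model matrix
b = np.ones(n)
x, k = pcg(A, b, apply_B=lambda r: r / np.diag(A))       # B = D^{-1}, a cheap stand-in
print(k, np.linalg.norm(b - A @ x))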

Comments on the Algorithm Even though B^{-1}, L and L^T enter the analysis, they do not enter the algorithm. Only the action of B on a vector is needed. Hence, B does not need to be formed as a matrix. It can be an algorithm like multigrid or Gauss-Seidel.

Multigrid preconditioner Assume that A is a discretization of the Poisson equation. The multigrid preconditioner B is then an operator spectrally equivalent with the inverse of A. That is, B ∼ A^{-1} with constants c_0 and c_1 independent of the grid size, or equivalently κ(B A) ≤ c_1/c_0, independent of the grid size.

Multigrid preconditioner II Multigrid is an optimal preconditioner for A x = b, where A is a matrix from a discretization of the Poisson equation -Δu = f. What about the variable-coefficient problem -∇·(k(x)∇u) = f, giving the linear system A_k x = b? From the anisotropic problem and the problem with the jumping coefficient we know that multigrid does not necessarily work well. This problem is without rigorous theoretical justification.

Multigrid preconditioner III Let us use the multigrid preconditioner B for the Poisson problem as a preconditioner for the problem with a variable coefficient k(x). We can easily verify min(k) (A v, v) ≤ (A_k v, v) ≤ max(k) (A v, v). Therefore, using (9), we have c_0 min(k) (B^{-1} v, v) ≤ (A_k v, v) ≤ c_1 max(k) (B^{-1} v, v). The condition number (slide 14) of B A_k is then bounded by κ(B A_k) ≤ (c_1/c_0) max(k)/min(k). The convergence of CG is determined by this condition number.
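The argument can be checked numerically. The sketch below discretizes -d/dx(k(x) du/dx) in 1-D (our own toy discretization) and uses the exact inverse of the constant-coefficient matrix as a stand-in for the multigrid preconditioner B; the computed condition number of B A_k stays bounded by max(k)/min(k) as the grid is refined.

import numpy as np

def laplacian_1d(coeff):
    """Finite-difference matrix for -d/dx (k(x) du/dx) on a uniform grid with
    homogeneous Dirichlet conditions; coeff holds k at the midpoints (toy model)."""
    n = len(coeff) - 1
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = coeff[i] + coeff[i + 1]
        if i > 0:
            A[i, i - 1] = -coeff[i]
        if i + 1 < n:
            A[i, i + 1] = -coeff[i + 1]
    return A

rng = np.random.default_rng(0)
for n in (20, 40, 80):
    k = 1.0 + 9.0 * rng.random(n + 1)                    # variable coefficient in [1, 10]
    A_k = laplacian_1d(k)                                # variable-coefficient operator
    B = np.linalg.inv(laplacian_1d(np.ones(n + 1)))      # exact constant-coefficient "preconditioner"
    lam = np.sort(np.linalg.eigvals(B @ A_k).real)
    print(n, lam[-1] / lam[0])                           # bounded by max(k)/min(k) = 10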

Jumping coefficients The problem reads -∇·(k(x)∇u) = f in Ω, u = 0 on ∂Ω, where k takes the value k_0 in part of the domain and k_1 elsewhere. Such problems occur in, e.g., groundwater flow and reservoir simulation, where k is the permeability of the medium, consisting of rock and sand.

The jump [Figure: a domain with permeability regions k0 and k1 and source/drain locations f1 and f2.] f1 is a source, where, e.g., the water is pumped in; f2 is a drain, where oil is supposed to come out; k1 is sand, k0 is rock.

Preconditioning vs. Solver IV [Convergence tables: multigrid used as a solver versus multigrid used as a preconditioner for CG.]

Preconditioning vs. Solver V Using the Conjugate Gradient method did pay off; multigrid alone did not even converge. Multigrid manages to reduce most of the error components, but some remain essentially unchanged. These are picked up efficiently by the Conjugate Gradient method.

CG in Diffpack Switching to the Conjugate Gradient method is a matter of editing the input file. [Diffpack input-file listing not reproduced.]