The Conjugate Gradient Method

Classical Iterations We have a problem A x = b. We assume that the matrix A comes from a discretization of a PDE. The best and most popular model problem is the Poisson equation, -Δu = f. The matrix will be as large as possible, because this gives the most accurate solution. It is not necessary to solve the system with higher accuracy than the discretization error. Therefore, an approximation x_k such that ||x - x_k|| ≤ ε is acceptable if ε is comparable with the discretization error.

Matrix properties Properties of the matrix: A is symmetric and positive definite (SPD). A defines an inner product (u, v)_A = (A u, v) and the corresponding norm ||v||_A = (A v, v)^{1/2}. A has a Cholesky factorization A = L L^T, where L is lower triangular. A is sparse, but L is not so sparse.

Classical Iterations Before we go to the Conjugate Gradient (CG) method, we give a short review of classical iterations. This will hopefully illustrate the differences between these iterative methods. The classical iterations are based on a splitting of the matrix, A = M - N. The linear system is then M x = N x + b, which suggests the iteration M x_{k+1} = N x_k + b, or x_{k+1} = M^{-1}(N x_k + b). The Jacobi, Gauss-Seidel and SOR methods are variants of this method.

Classical Iterations II We introduce a more specific matrix splitting, A = D - E - F, where D is a diagonal matrix, E is a strictly lower triangular matrix, and F is a strictly upper triangular matrix. The classical iterations then read:
Jacobi: x_{k+1} = D^{-1}((E + F) x_k + b)
Damped Jacobi: x_{k+1} = ω D^{-1}((E + F) x_k + b) + (1 - ω) x_k
Gauss-Seidel: x_{k+1} = (D - E)^{-1}(F x_k + b)

Classical Iterations III SOR: x_{k+1} = (D - ωE)^{-1}((ωF + (1 - ω)D) x_k + ωb). Other representations can also be found.

Classical Iterations IV However, it is more efficient to implement these methods as algorithms. We then avoid constructing the splitting matrices and their inverses. For i = 1, ..., n:
Jacobi: x_i^{k+1} = (b_i - Σ_{j≠i} a_ij x_j^k) / a_ii
Damped Jacobi: x_i^{k+1} = ω (b_i - Σ_{j≠i} a_ij x_j^k) / a_ii + (1 - ω) x_i^k
Gauss-Seidel: x_i^{k+1} = (b_i - Σ_{j<i} a_ij x_j^{k+1} - Σ_{j>i} a_ij x_j^k) / a_ii

Classical Iterations V SOR: x_i^{k+1} = ω (b_i - Σ_{j<i} a_ij x_j^{k+1} - Σ_{j>i} a_ij x_j^k) / a_ii + (1 - ω) x_i^k
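To make the componentwise formulas above concrete, here is a minimal NumPy sketch of one Jacobi sweep and one SOR sweep (omega = 1 gives Gauss-Seidel). The function names and the 1-D Poisson test matrix are our own illustration, not part of the lecture.

import numpy as np

def jacobi_sweep(A, b, x):
    """One Jacobi sweep: every component is updated using only old values of x."""
    D = np.diag(A)
    return (b - A @ x + D * x) / D

def sor_sweep(A, b, x, omega=1.0):
    """One SOR sweep (omega = 1 gives Gauss-Seidel): new values are used as soon
    as they are available."""
    x = x.copy()
    for i in range(len(b)):
        sigma = A[i, :] @ x - A[i, i] * x[i]            # sum over j != i
        x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    return x

n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)    # 1-D Poisson model matrix
b = np.ones(n)
x = np.zeros(n)
for _ in range(200):
    x = sor_sweep(A, b, x, omega=1.5)
print(np.linalg.norm(b - A @ x))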

Framework for Classical Iterations All the above methods can be represented as x_{k+1} = x_k + B(b - A x_k), (1) or x_{k+1} = (I - B A) x_k + B b, (2) where B is an approximation of A^{-1} (e.g. B = D^{-1} for Jacobi). The iteration (1) is usually called the preconditioned simple or Richardson iteration.

Generalization All linear iterations can be expressed like this. The algorithms produce an output vector, given an input vector. Hence, the notation B r makes sense even when B is only given as an algorithm. Multigrid and domain decomposition, as well as Jacobi, Gauss-Seidel, and SOR, can be represented as a matrix B and the iteration (1) or (2). The algorithmic representation is more efficient than the matrix formulation.

Richardson iteration The simplest of all iterations is x_{k+1} = x_k + (b - A x_k). The residual is r_k = b - A x_k. This method is called the simple or the Richardson iteration. Hence, Jacobi, Gauss-Seidel, SOR, domain decomposition and multigrid can be represented as a preconditioned Richardson iteration: x_{k+1} = x_k + τ B(b - A x_k). We have introduced a damping parameter τ that can be tuned for maximal performance.
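Below is a small sketch of the preconditioned Richardson iteration x_{k+1} = x_k + τ B(b - A x_k), with B supplied as a function. The choice B = D^{-1} (Jacobi scaling) and the damping τ = 2/(λ_min + λ_max) are our own illustrative choices.

import numpy as np

def richardson(A, b, apply_B, tau, x0, tol=1e-8, maxit=10000):
    """Preconditioned Richardson iteration x_{k+1} = x_k + tau * B(b - A x_k)."""
    x = x0.copy()
    for k in range(maxit):
        r = b - A @ x                                    # residual
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k
        x = x + tau * apply_B(r)
    return x, maxit

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # 1-D Poisson model matrix
b = np.ones(n)
apply_B = lambda r: r / np.diag(A)                       # B = D^{-1}: Jacobi preconditioning

# Optimal damping tau = 2/(lambda_min + lambda_max) of BA (see the convergence slide).
lam = np.linalg.eigvals(np.diag(1.0 / np.diag(A)) @ A).real
tau = 2.0 / (lam.min() + lam.max())
x, k = richardson(A, b, apply_B, tau, np.zeros(n))
print(k, np.linalg.norm(b - A @ x))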

Convergence A sufficient condition for convergence is that there is a ρ < 1 such that ||I - τ B A|| ≤ ρ with respect to the B^{-1}-inner product (B A is symmetric with respect to this inner product). Let us choose τ = 2/(λ_min + λ_max), where λ_min and λ_max are the extreme eigenvalues of B A. The condition number is κ = λ_max/λ_min.

Convergence II we get an error estimate, With this choice of

Spectral Equivalence I A and B are both SPD and define inner products. A and B^{-1} are spectrally equivalent, denoted A ∼ B^{-1}, if there exist constants c_0, c_1 > 0 such that c_0 (B^{-1} v, v) ≤ (A v, v) ≤ c_1 (B^{-1} v, v) for all v. If A ∼ B^{-1}, then the eigenvalues of B A lie in the interval [c_0, c_1].

Spectral Equivalence II If A ∼ B^{-1} with constants c_0 and c_1, then the condition number satisfies κ(B A) ≤ c_1/c_0.

Spectral Equivalence III There are four equivalent forms of the spectral equivalence condition; the bounds can equivalently be stated with A and B^{-1} interchanged with B and A^{-1} (the constants c_0, c_1 then become 1/c_1, 1/c_0).

Nonlinear iterations Nonlinear iterations cannot be represented as a matrix. A general iteration can be written x_{k+1} = x_k + α_k p_k, where p_k is the search direction and α_k is the length of the step in the search direction. How can we determine α_k and p_k, and what should they fulfill? We should find the solution in at most n iterations, where n is the number of unknowns. We should be able to use a stopping criterion, so that carrying out all n iterations is usually not needed.

FEM: The Galerkin Method We seek the solution of the Poisson problem: -Δu = f in Ω, u = 0 on ∂Ω. We know that if f ∈ L_2(Ω), then a solution u ∈ H^1_0(Ω) exists. In general, the solution is hard to construct explicitly. Instead we seek a numerical approximation u_h. In FEM we construct a finite-dimensional subspace V_h of H^1_0(Ω). The numerical approximation we seek in the Galerkin approach is the a-projection of u on V_h, where a(u, v) = ∫_Ω ∇u · ∇v dx. In other words, a(u - u_h, v) = 0 for all v ∈ V_h.

Minimization Problem We have two equivalent formulations of the Poisson problem. The solution u is the minimizer of the energy functional E(v) = 1/2 a(v, v) - (f, v). That is, E(u) ≤ E(v) for all v ∈ H^1_0(Ω).

Minimization Problem II Similarly, the approximate solution u_h is the a-projection of u on V_h, or the minimizer of the energy functional over V_h, or the solution that causes the least error in the energy norm: ||u - u_h||_a ≤ ||u - v||_a for all v ∈ V_h.

CG in a Nutshell The Conjugate Gradient method employs these properties. Let A be an SPD matrix and let V_k be a subspace defined somehow. Then the kth approximation x_k ∈ V_k in the CG method satisfies (A x_k, v) = (b, v) for all v ∈ V_k, and ||x - x_k||_A ≤ ||x - v||_A for all v ∈ V_k.

CG in a Nutshell II Of course, these observations do not necessarily lead us to a "fast" solution algorithm. What are the subspaces V_k, and what is a basis of V_k? We do not want to solve a linear system in each iteration (at least not a large one).

Cayley-Hamilton Given an n×n matrix A, p(A) = 0, where p is the characteristic polynomial of A. In other words, A^{-1} is a polynomial in A of degree at most n - 1, and the solution x = A^{-1} b can be written as a polynomial in A applied to b.

Krylov Subspaces The Krylov subspaces consist of exactly these A-polynomials applied to a vector. Let K_k(A, r_0) = span{r_0, A r_0, ..., A^{k-1} r_0}. This is a nested sequence of subspaces, K_1(A, r_0) ⊆ K_2(A, r_0) ⊆ ...; if x_0 = 0, then x = A^{-1} b ∈ K_n(A, b).
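As a small illustration of these spaces, the sketch below builds K_k(A, r_0) for a 1-D Poisson model matrix and orthonormalizes the basis with QR; the function name and the test problem are our own, not the lecture's.

import numpy as np

def krylov_basis(A, r0, k):
    """Orthonormal basis of K_k(A, r0) = span{r0, A r0, ..., A^{k-1} r0},
    orthonormalized with QR for numerical stability."""
    V = np.empty((len(r0), k))
    v = r0.copy()
    for j in range(k):
        V[:, j] = v
        v = A @ v
    Q, _ = np.linalg.qr(V)
    return Q

n = 30
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # 1-D Poisson model matrix
r0 = np.random.default_rng(0).standard_normal(n)
for k in (1, 2, 4, 8):
    print(k, krylov_basis(A, r0, k).shape)               # nested subspaces of dimension k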

A Basis for the Krylov Spaces The residuals r_0, r_1, ..., r_{k-1} form an orthogonal basis for K_k(A, r_0). Proof: 1. The Galerkin method defines x_k as the A-projection of x on K_k(A, r_0). 2. This means that the residual r_k = b - A x_k is orthogonal to K_k(A, r_0), i.e. (r_k, v) = 0 for all v ∈ K_k(A, r_0). 3. Since r_j ∈ K_{j+1}(A, r_0) ⊆ K_k(A, r_0) for j < k, we have (r_k, r_j) = 0 for j < k. 4. It follows that the residuals are mutually orthogonal and span the Krylov spaces.

A Basis for the Krylov Spaces II Hence, we have a candidate for the basis in our Krylov spaces. Therefore let x_k = Σ_{j<k} c_j r_j. The coefficients c_j are to be determined from the Galerkin property. This does not help us much, since x is the unknown. However, we can switch the inner product to the A-inner product: (x_k, r_j)_A = (x, r_j)_A = (A x, r_j) = (b, r_j). The right-hand side is known.

A Basis for the Krylov Spaces III Hence, we must solve a linear system M c = g, where M_ij = (r_j, r_i)_A is a dense matrix and the c_j are the expansion coefficients. It would be much better if the basis vectors were A-orthogonal; the matrix would then have been diagonal and easy to invert. The approximation should therefore not be expanded in terms of the residuals. However, the residuals span the Krylov subspaces and should somehow be used.

Search vectors Let us go back to the original iteration x_{k+1} = x_k + α_k p_k. Let us assume that we have a set of search vectors p_0, p_1, .... What properties should the p_k have? How can we determine α_k?

Minimization Problem If A is SPD, the problem A x = b can be restated as a minimization problem. Let F(v) = 1/2 (A v, v) - (b, v). The solution x of A x = b is then the minimizer of F.

Minimization Problem II Hence, we can consider p_k as a search direction and seek the solution of a 1-dimensional minimization problem: minimize F(x_k + α p_k) over α. This leads to dF/dα = (A(x_k + α p_k), p_k) - (b, p_k) = 0. Since r_k = b - A x_k, α_k is therefore determined by α_k = (r_k, p_k) / (A p_k, p_k), and x_{k+1} = x_k + α_k p_k.
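The 1-dimensional minimization can be written as a short routine. Choosing p = r in the loop below gives the steepest-descent method; this is our own illustrative choice, CG will instead use conjugate directions.

import numpy as np

def line_search_step(A, b, x, p):
    """One exact line search for F(v) = 1/2 (A v, v) - (b, v) along direction p:
    the minimizer of F(x + alpha p) is alpha = (r, p) / (A p, p) with r = b - A x."""
    r = b - A @ x
    alpha = (r @ p) / (p @ (A @ p))
    return x + alpha * p

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # 1-D Poisson model matrix
b = np.ones(n)
x = np.zeros(n)
for _ in range(200):                                     # steepest descent: p = r
    x = line_search_step(A, b, x, b - A @ x)
print(np.linalg.norm(b - A @ x))

Steepest descent converges, but slowly when the condition number is large; the conjugate search directions introduced below repair this.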

The Galerkin Method With the spaces V_k = K_k(A, r_0) we can determine an approximation x_k ∈ V_k by the Galerkin condition (A x_k, v) = (b, v) for all v ∈ V_k. Another way to write this is x_k = Σ_{j<k} α_j p_j, where the p_j form a basis of V_k. This is precisely what the Conjugate Gradient method does. The vectors p_j are then the search vectors. We will now see how they can be determined and used in a clever way.

Search vectors Given a search vector p_k, we can determine the step length α_k. We know that the residual vectors span the appropriate Krylov spaces. The residuals were orthogonal with respect to the ordinary inner product, but not with respect to the A-inner product; a dense system had to be solved. We should instead use A-orthogonal search vectors.

Conjugate Directions? We have assumed x_{k+1} = x_k + α_k p_k. How do we construct the p_k? Since α_k is determined by a 1-dimensional search, x_{k+1} is optimal only with respect to the direction p_k. This may potentially lead to trouble.

Conjugate Directions II Is x_{k+1} optimal with respect to the previous search directions? The search was done in the p_k direction. The residual can be expressed as r_{k+1} = r_k - α_k A p_k. Hence, we can write (r_{k+1}, p_j) = (r_k, p_j) - α_k (A p_k, p_j).

Conjugate Directions III We know that (r_{k+1}, p_k) = 0; this is how α_k was determined. Therefore we see that the residual will be optimal with respect to all the previous search vectors if (A p_k, p_j) = 0 for j < k. The term conjugate means "A-orthogonal". We should choose the search vectors in this fashion. How can we do that? We know that the new search directions are contained in the residuals. We use the Gram-Schmidt orthogonalization process: we expand the residual in terms of the "old" search vectors and subtract the expansion, p_{k+1} = r_{k+1} - Σ_{j≤k} ((r_{k+1}, A p_j)/(p_j, A p_j)) p_j.

Conjugate Directions V The residuals r_0, ..., r_{k+1} are linearly independent, and (r_{k+1}, A p_j) = 0 for j < k. Hence only the last term in the Gram-Schmidt expansion is nonzero, and the search vectors can be computed iteratively: p_{k+1} = r_{k+1} + β_k p_k.
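To make the Gram-Schmidt construction concrete, the sketch below A-orthogonalizes a set of vectors explicitly; the function name and the random SPD test matrix are our own. In CG only the last term of the expansion survives, so this full loop is never needed in practice; it is shown only to verify the A-orthogonality.

import numpy as np

def conjugate_directions(A, vectors):
    """A-orthogonalize a list of vectors by Gram-Schmidt in the A-inner product.
    (Illustration only: in CG the expansion collapses to a single term.)"""
    dirs = []
    for r in vectors:
        p = r.copy()
        for q in dirs:
            p -= (r @ (A @ q)) / (q @ (A @ q)) * q       # subtract the A-projection onto q
        dirs.append(p)
    return dirs

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)                              # a random SPD test matrix
dirs = conjugate_directions(A, [rng.standard_normal(6) for _ in range(4)])
print(max(abs(dirs[i] @ (A @ dirs[j])) for i in range(4) for j in range(i)))   # ~ 0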

Algorithm CG
1. x_0 given (e.g. x_0 = 0)
2. r_0 = b - A x_0
3. p_0 = r_0
4. while ρ_k = (r_k, r_k) is too large do
(a) α_k = (r_k, r_k) / (A p_k, p_k)
(b) x_{k+1} = x_k + α_k p_k
(c) r_{k+1} = r_k - α_k A p_k
(d) β_k = (r_{k+1}, r_{k+1}) / (r_k, r_k)
(e) p_{k+1} = r_{k+1} + β_k p_k
(f) k = k + 1
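A NumPy version of the algorithm above, written as a sketch under the assumption that steps (a)-(f) are the standard CG recurrences; the variable names and the 1-D Poisson test problem are our own.

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, maxit=1000):
    """Conjugate Gradient for an SPD matrix A, following steps (a)-(f) above."""
    x = np.zeros(len(b)) if x0 is None else x0.copy()
    r = b - A @ x                       # residual
    p = r.copy()                        # first search direction
    rho = r @ r
    for k in range(maxit):
        if np.sqrt(rho) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = rho / (p @ Ap)          # step length from the 1-D minimization
        x += alpha * p
        r -= alpha * Ap
        rho_new = r @ r
        beta = rho_new / rho            # makes the new direction A-orthogonal to p
        p = r + beta * p
        rho = rho_new
    return x, k

n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # 1-D Poisson model matrix
b = np.ones(n)
x, k = conjugate_gradient(A, b)
print(k, np.linalg.norm(b - A @ x))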

Comments on the Algorithm r_k is the residual and, in the preconditioned version below, s_k = B r_k is the preconditioned residual. The quantity ρ_k is often used as convergence criterion, because it can be used to bound the error. We have r_k = A(x - x_k); therefore, if λ_min is the smallest eigenvalue of A, ||x - x_k|| ≤ ||r_k|| / λ_min.

Convergence The convergence estimate of the Conjugate Gradient method is ||x - x_k||_A ≤ 2 ((√κ - 1)/(√κ + 1))^k ||x - x_0||_A, where κ = λ_max/λ_min is the condition number of A. Additionally, in exact arithmetic the exact solution is reached for some k ≤ n. We will now see where these estimates come from.

Polynomials Let q(A) be a (univariate) polynomial in terms of A, and let P_k denote the class of such polynomials of degree k. We have an orthonormal basis of eigenvectors v_1, ..., v_n (the matrix is symmetric). The corresponding eigenvalues λ_1, ..., λ_n are all positive (the matrix is positive definite). In terms of the eigenvectors, q(A) v_i = q(λ_i) v_i, since A v_i = λ_i v_i.

Polynomials II If x_k is the kth CG approximation (with x_0 = 0), then x_k solves the minimization problem ||x - x_k||_A = min_{v ∈ K_k(A, b)} ||x - v||_A. Proof: 1. We have, for an arbitrary v = q(A) b ∈ K_k(A, b) with q ∈ P_{k-1}, x - v = (I - q(A) A) x = p(A) x, where p(λ) = 1 - λ q(λ) is a polynomial of degree k with p(0) = 1.

Polynomials III 2. From (3) we have, writing x = Σ_i c_i v_i where the v_i are the eigenvectors of the symmetric matrix A, ||x - v||_A^2 = ||p(A) x||_A^2 = Σ_i λ_i p(λ_i)^2 c_i^2. 3. Similarly, ||x||_A^2 = Σ_i λ_i c_i^2. 4. Combining (4), (5) and (6) we have ||x - v||_A ≤ max_i |p(λ_i)| ||x||_A.

Polynomials IV Minimizing over v ∈ K_k(A, b) is the same as minimizing over all polynomials p of degree k with p(0) = 1, since K_k(A, b) is by definition spanned by vectors of the form q(A) b with q of degree at most k - 1, and the proof is complete.

Polynomials V We have that ||x - x_k||_A ≤ max_{λ ∈ [λ_min, λ_max]} |p(λ)| ||x - x_0||_A, where [λ_min, λ_max] is an interval containing all the eigenvalues of A and p can be any polynomial in P_k with p(0) = 1. Proof: 1. The bound from the previous slide holds for every such p. 2. The maximum over the interval bounds the maximum over the eigenvalues. 3. Therefore we are free to choose a polynomial that is small on the whole interval.

Chebyshev Polynomials The Chebyshev polynomials are defined recursively by T_0(x) = 1, T_1(x) = x, T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x). They can also be represented as: 1. T_k(x) = cos(k arccos x) for |x| ≤ 1. 2. T_k(x) = 1/2 [(x + √(x^2 - 1))^k + (x - √(x^2 - 1))^k] for |x| ≥ 1.
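The recursion is easy to implement; the sketch below also checks it against the closed form cos(k arccos x) on [-1, 1]. The function name is our own.

import numpy as np

def chebyshev(k, x):
    """Chebyshev polynomial T_k(x) by the three-term recurrence
    T_0 = 1, T_1 = x, T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)."""
    t_prev, t = np.ones_like(x), x
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

# On [-1, 1] the recurrence agrees with the closed form cos(k * arccos(x)).
x = np.linspace(-1, 1, 5)
print(np.max(np.abs(chebyshev(7, x) - np.cos(7 * np.arccos(x)))))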

Finally, the Convergence Estimate We can now prove the following convergence estimate of the Conjugate Gradient method: ||x - x_k||_A ≤ 2 ((√κ - 1)/(√κ + 1))^k ||x - x_0||_A, with κ = λ_max/λ_min. Proof: 1. We have already shown ||x - x_k||_A ≤ max_{λ ∈ [λ_min, λ_max]} |p(λ)| ||x - x_0||_A for all polynomials p ∈ P_k with p(0) = 1. 2. Therefore we choose the (shifted and scaled) Chebyshev polynomials to derive an estimate. Let p(λ) = T_k((λ_max + λ_min - 2λ)/(λ_max - λ_min)) / T_k((λ_max + λ_min)/(λ_max - λ_min)). 3. We have p(0) = 1 and |p(λ)| ≤ 1 / T_k((λ_max + λ_min)/(λ_max - λ_min)) for λ ∈ [λ_min, λ_max].

Finally, the Convergence Estimate 4. We have T_k((λ_max + λ_min)/(λ_max - λ_min)) = T_k((κ + 1)/(κ - 1)) ≥ 1/2 ((√κ + 1)/(√κ - 1))^k. 5. Combining this with the bound from point 1 gives the convergence estimate.
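As a small numerical illustration of the estimate, the script below evaluates the bound 2((√κ - 1)/(√κ + 1))^k for the 1-D Poisson model matrix (our own choice of example) at a few iteration counts.

import numpy as np

# Evaluate the CG bound 2 * ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))**k for the
# 1-D Poisson model matrix at a few iteration counts k.
n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam = np.linalg.eigvalsh(A)
kappa = lam[-1] / lam[0]
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
for k in (10, 50, 100, 200):
    print(k, 2 * rho**k)
# The iteration count needed for a tolerance eps thus grows like sqrt(kappa) * log(1/eps).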

Preconditioning We had a linear system A x = b to be solved. However, A had a large condition number, and this resulted in slow convergence for CG. The idea of preconditioning is simply to replace the system with another equivalent system, B A x = B b, with the same solution. This can improve the convergence rate of CG dramatically, if B is well designed.

Preconditioning II B A is not necessarily symmetric, even though A and B are. However, if B is symmetric and positive definite, then: B has a Cholesky factorization B = L L^T, and we can apply CG to the symmetric system L^T A L y = L^T b with x = L y. Alternatively, we can define the B^{-1}- and A-inner products; B A is symmetric with respect to the B^{-1}-inner product, and we can apply CG to the system B A x = B b in the B^{-1}-inner product.

Desired properties of B B should be cheap in storage, O(n), fast to evaluate, O(n), and similar to A^{-1}. B does not need to be a matrix; however, it should be a linear operator. The only action needed of B is the evaluation of B r for a given vector r. Additionally, we usually want the operator to be symmetric and positive definite, such that it defines an inner product.

Algorithm PCG
1. x_0 given (e.g. x_0 = 0)
2. r_0 = b - A x_0
3. s_0 = B r_0
4. p_0 = s_0
5. while ρ_k = (r_k, s_k) is too large do
(a) α_k = ρ_k / (A p_k, p_k)
(b) x_{k+1} = x_k + α_k p_k
(c) r_{k+1} = r_k - α_k A p_k
(d) s_{k+1} = B r_{k+1}
(e) ρ_{k+1} = (r_{k+1}, s_{k+1})
(f) β_k = ρ_{k+1} / ρ_k
(g) p_{k+1} = s_{k+1} + β_k p_k
(h) k = k + 1
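A NumPy sketch of the algorithm above, assuming steps (a)-(h) are the standard PCG recurrences. The preconditioner is passed as a function apply_B; the diagonal scaling B = D^{-1} used in the example is just a cheap stand-in for a real preconditioner such as multigrid.

import numpy as np

def pcg(A, b, apply_B, x0=None, tol=1e-10, maxit=1000):
    """Preconditioned CG following steps (a)-(h) above; only B's action is needed."""
    x = np.zeros(len(b)) if x0 is None else x0.copy()
    r = b - A @ x                       # residual
    s = apply_B(r)                      # preconditioned residual
    p = s.copy()
    rho = r @ s
    for k in range(maxit):
        if np.sqrt(abs(rho)) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = rho / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        s = apply_B(r)
        rho_new = r @ s
        beta = rho_new / rho
        p = s + beta * p
        rho = rho_new
    return x, k

n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # 1-D Poisson model matrix
b = np.ones(n)
x, k = pcg(A, b, apply_B=lambda r: r / np.diag(A))       # B = D^{-1}, a cheap stand-in
print(k, np.linalg.norm(b - A @ x))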

Comments on the Algorithm Even though B^{-1}, L and L^T enter the analysis, they do not enter the algorithm. Only the action of B on a vector is needed. Hence, B does not need to be formed as a matrix. It can be an algorithm like multigrid or Gauss-Seidel.

Multigrid preconditioner Assume that A is a discretization of the Poisson equation. The multigrid preconditioner B is then an operator spectrally equivalent with the inverse of A. That is, B ∼ A^{-1} with constants c_0 and c_1 independent of the grid size, or equivalently κ(B A) ≤ c_1/c_0, independent of the grid size.

Multigrid preconditioner II Multigrid is an optimal preconditioner for A x = b, where A is a matrix from a discretization of the Poisson equation -Δu = f. What about the variable-coefficient problem -∇·(k(x)∇u) = f, giving the linear system A_k x = b? From the anisotropic problem and the problem with the jumping coefficient we know that multigrid does not necessarily work well. This problem is without rigorous theoretical justification.

Multigrid preconditioner III Let us use the multigrid preconditioner B for the Poisson problem as a preconditioner for the problem with a variable coefficient k(x). We can easily verify min(k) (A v, v) ≤ (A_k v, v) ≤ max(k) (A v, v). Therefore, using (9), we have c_0 min(k) (B^{-1} v, v) ≤ (A_k v, v) ≤ c_1 max(k) (B^{-1} v, v). The condition number (slide 14) of B A_k is then bounded by κ(B A_k) ≤ (c_1/c_0) max(k)/min(k). The convergence of CG is determined by this condition number.
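The argument can be checked numerically. The sketch below discretizes -d/dx(k(x) du/dx) in 1-D (our own toy discretization) and uses the exact inverse of the constant-coefficient matrix as a stand-in for the multigrid preconditioner B; the computed condition number of B A_k stays bounded by max(k)/min(k) as the grid is refined.

import numpy as np

def laplacian_1d(coeff):
    """Finite-difference matrix for -d/dx (k(x) du/dx) on a uniform grid with
    homogeneous Dirichlet conditions; coeff holds k at the midpoints (toy model)."""
    n = len(coeff) - 1
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = coeff[i] + coeff[i + 1]
        if i > 0:
            A[i, i - 1] = -coeff[i]
        if i + 1 < n:
            A[i, i + 1] = -coeff[i + 1]
    return A

rng = np.random.default_rng(0)
for n in (20, 40, 80):
    k = 1.0 + 9.0 * rng.random(n + 1)                    # variable coefficient in [1, 10]
    A_k = laplacian_1d(k)                                # variable-coefficient operator
    B = np.linalg.inv(laplacian_1d(np.ones(n + 1)))      # exact constant-coefficient "preconditioner"
    lam = np.sort(np.linalg.eigvals(B @ A_k).real)
    print(n, lam[-1] / lam[0])                           # bounded by max(k)/min(k) = 10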

Jumping coefficients The problem reads -∇·(k(x)∇u) = f in Ω, u = 0 on ∂Ω, where k takes the value k_0 in part of the domain and k_1 elsewhere. Such problems occur in, e.g., groundwater flow and reservoir simulation, where k is the permeability of the medium, consisting of rock and sand.

The jump [Figure: a domain with permeability regions k0 and k1 and source/drain locations f1 and f2.] f1 is a source, where, e.g., the water is pumped in; f2 is a drain, where oil is supposed to come out; k1 is sand, k0 is rock.

Preconditioning vs. Solver IV [Convergence tables: multigrid used as a solver versus multigrid used as a preconditioner for CG.]

Preconditioning vs. Solver V Using the Conjugate Gradient method did pay off; multigrid alone did not even converge. Multigrid manages to reduce most of the error components, but some remain essentially unchanged. These are picked up efficiently by the Conjugate Gradient method.

CG in Diffpack Switching to the Conjugate Gradient method is a matter of editing the input file. [Diffpack input-file listing not reproduced.]