Conjugate Gradient (CG) Method

by K. Ozawa

1 Introduction

In this series of lectures I will introduce the conjugate gradient method, which efficiently solves large-scale sparse systems of linear equations.

2 Minimization problem

Consider the quadratic function given by

    Q(x) = (1/2) x^T A x - b^T x,    A ∈ R^{n×n},  b, x ∈ R^n,    (1)

where A is a symmetric positive definite matrix. This quadratic function attains its minimum at the solution of the linear algebraic equation A x = b, which will be denoted by x^*.

A variety of methods for solving this minimization problem have been derived. The general form of these algorithms is

    x_{k+1} = x_k - α_k p_k,    k = 0, 1, ...,    (2)

where p_k is the direction vector at the k-th iteration. The parameter α_k is determined so as to minimize the objective function Q(x), that is, α_k is the value satisfying

    Q(x_k - α_k p_k) = min_α Q(x_k - α p_k).    (3)

Since in the expression

    Q(x_k - α_k p_k) = (1/2)(p_k^T A p_k) α_k^2 - p_k^T (A x_k - b) α_k + (1/2) x_k^T (A x_k - 2 b),    (4)

the coefficient of α_k^2 is positive, this function attains its minimum at

    α_k = p_k^T (A x_k - b) / (p_k^T A p_k),    (5)

for fixed x_k and p_k. Next we consider how to choose the direction vectors p_k.

3 Univariate iteration

If we put p_k = e_i (the unit vector whose i-th entry is 1 and whose other entries are 0) in (2), then with the α_k given by (5) we have

    x_{k+1} = x_k - α_k e_i = x_k - (1/a_ii) ( Σ_{j=1}^{n} a_ij x_j^{(k)} - b_i ) e_i,    (6)

since e_i^T A e_i = a_ii and e_i^T (A x - b) = Σ_{j=1}^{n} a_ij x_j - b_i. This means that the point x_{k+1} given by (6) is the minimum point that can be obtained by changing only the i-th component of x_k. Thus the first n successive iterations of (6) with

    p_0 = e_1,  p_1 = e_2,  ...,  p_{n-1} = e_n    (7)

are equivalent to one iteration of the well-known Gauss-Seidel method.

4 Steepest descent method

On the other hand, if we choose p_k as the gradient vector of Q(x) at the point x_k, then we have the iteration method

    x_{k+1} = x_k - α_k (A x_k - b),    (8)

since

    p_k = ∇Q(x_k) = A x_k - b,    (9)

where

    ∇Q(x) = (∂Q/∂x_1, ..., ∂Q/∂x_n)^T.

This method is called the steepest descent method.

5 Conjugate direction method

Let us consider the case in which the p_k are chosen so that

    p_i^T A p_j = 0,    i ≠ j.    (10)

When the direction vectors p_k are taken in this way, the method is called the conjugate direction method, and the vectors p_i and p_j satisfying (10) are said to be A-conjugate. The conjugate direction method converges in at most n steps if no roundoff error occurs. Here we show the convergence of the method. First of all, notice that

    (A x_{k+1} - b)^T p_j = (A x_k - α_k A p_k - b)^T p_j = (A x_k - b)^T p_j - α_k (A p_k)^T p_j,

where the last term vanishes for j < k by (10), while for j = k the whole expression vanishes by the choice (5) of α_k. As a result we have

    (A x_{k+1} - b)^T p_j = { (A x_k - b)^T p_j,   j < k,
                            {  0,                  j = k.    (11)

Thus we have

    (A x_n - b)^T p_j = (A x_{n-1} - b)^T p_j = ... = (A x_{j+1} - b)^T p_j = 0,    j = 0, ..., n-1.

Therefore the vector A x_n - b is orthogonal to the n linearly independent vectors p_0, ..., p_{n-1}, that is, A x_n - b = 0.

Here we consider how to generate the conjugate vectors p_j. One might think of taking eigenvectors of A as the p_j (j = 1, ..., n), since eigenvectors of A corresponding to distinct eigenvalues, say p_i and p_j, satisfy p_i^T A p_j = λ_j p_i^T p_j = 0. However, in general the computation of all eigenvectors is far more expensive than solving the linear equations, so this approach is not practical at all. The conjugate gradient method, explained in the next section, generates the conjugate direction vectors by a relatively cheap procedure.

6 Conjugate gradient method

Here we show the algorithm:

CG-1
 1: Choose a small value ε > 0 and an initial guess x_0
 2: p_0 = r_0 = b - A x_0, and compute (r_0, r_0)
 3: k = 0
 4: while ‖r_k‖ / ‖b‖ ≥ ε do
 5:     α_k = -(r_k, p_k) / (p_k, A p_k)
 6:     x_{k+1} = x_k - α_k p_k
 7:     r_{k+1} = r_k + α_k A p_k
 8:     β_k = -(p_k, A r_{k+1}) / (p_k, A p_k)
 9:     p_{k+1} = r_{k+1} + β_k p_k
10:     k = k + 1
11: end while

The vectors p_k generated by the algorithm satisfy the conjugacy condition

    (p_k, A p_j) = 0,    0 ≤ j < k,  k = 1, ..., n-1,    (12)

and the residuals r_j (= b - A x_j) satisfy the orthogonality condition (see Appendix)

    (r_k, r_j) = 0,    0 ≤ j < k,  k = 1, ..., n-1.    (13)

Moreover, from the definitions of α_j and β_j we have

    (p_j, r_{j+1}) = (p_j, r_j) + α_j (p_j, A p_j) = 0,    j = 0, 1, ...,    (14)
    (p_j, A p_{j+1}) = (p_j, A r_{j+1}) + β_j (p_j, A p_j) = 0,    j = 0, 1, ....    (15)

Using the relations above, we obtain the following variant of CG-1:

CG-2
 1: Choose a small value ε > 0 and an initial guess x_0
 2: p_0 = r_0 = b - A x_0, and compute (r_0, r_0)
 3: k = 0
 4: while ‖r_k‖ / ‖b‖ ≥ ε do
 5:     α_k = -(r_k, r_k) / (p_k, A p_k)
 6:     x_{k+1} = x_k - α_k p_k
 7:     r_{k+1} = r_k + α_k A p_k
 8:     β_k = (r_{k+1}, r_{k+1}) / (r_k, r_k)
 9:     p_{k+1} = r_{k+1} + β_k p_k
10:     k = k + 1
11: end while

The number of operations to be performed per step of this algorithm is shown in Table 1. In particular, CG-2 requires only one matrix-vector product per step, whereas CG-1 requires two (A p_k and A r_{k+1}), so CG-2 is a substantial improvement over CG-1.

Table 1. Number of operations per step in CG-2.
    matrix-vector product    1
    inner product            2
    vector addition          3
    scalar division          1
    scalar comparison        1
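To make CG-2 concrete, here is a minimal Python/NumPy sketch of the algorithm, written in the sign convention used above (x_{k+1} = x_k - α_k p_k, so α_k comes out negative). The function name cg2, the default tolerance, and the iteration cap are my own choices, not part of the original notes.

    import numpy as np

    def cg2(A, b, x0, eps=1e-10, max_iter=None):
        """Minimal sketch of algorithm CG-2 in the sign convention of these notes."""
        x = np.array(x0, dtype=float)
        r = b - A @ x                      # r_0 = b - A x_0
        p = r.copy()                       # p_0 = r_0
        rr = r @ r                         # (r_0, r_0)
        nb = np.linalg.norm(b)
        if max_iter is None:
            max_iter = len(b)
        for _ in range(max_iter):
            if np.sqrt(rr) / nb < eps:     # stop once ||r_k|| / ||b|| < eps
                break
            Ap = A @ p                     # the single matrix-vector product per step
            alpha = -rr / (p @ Ap)         # alpha_k = -(r_k, r_k) / (p_k, A p_k)
            x = x - alpha * p              # x_{k+1} = x_k - alpha_k p_k
            r = r + alpha * Ap             # r_{k+1} = r_k + alpha_k A p_k
            rr_new = r @ r
            beta = rr_new / rr             # beta_k = (r_{k+1}, r_{k+1}) / (r_k, r_k)
            p = r + beta * p               # p_{k+1} = r_{k+1} + beta_k p_k
            rr = rr_new
        return x

Since A enters only through the products A x_0 and A p, the same function works unchanged when A is stored as a sparse matrix (for example a scipy.sparse matrix), which is exactly where the single matrix-vector product per step counted in Table 1 matters.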

Here we show the experimental result for a 10000-dimensional equation.

[Figure: relative error log_10(‖x_k - x^*‖ / ‖x^*‖) and relative residual log_10(‖r_k‖ / ‖b‖) versus the iteration number k.]
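The two curves in the figure can be produced from the stored iterates and residuals. The helper below is only a sketch of how one might plot them, assuming a variant of the cg2 sketch above that also records every x_k and r_k (not shown here); matplotlib is an arbitrary choice of plotting library.

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_convergence(xs, rs, x_star, b):
        """Plot log10 relative error and log10 relative residual versus k.
        xs: list of iterates x_k, rs: list of residuals r_k = b - A x_k
        (both assumed to have been stored by the solver)."""
        err = [np.log10(np.linalg.norm(x - x_star) / np.linalg.norm(x_star)) for x in xs]
        res = [np.log10(np.linalg.norm(r) / np.linalg.norm(b)) for r in rs]
        plt.plot(err, label="log10 ||x_k - x*|| / ||x*||")
        plt.plot(res, label="log10 ||r_k|| / ||b||")
        plt.xlabel("k")
        plt.legend()
        plt.show()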

Problems

1. For the function P(x) given by P(x) = (1/2)(x - x^*)^T A (x - x^*), show that

       P(x) = Q(x) + (1/2)(x^*)^T A x^*,

   where x^* is the solution of A x = b and Q(x) is the function given by (1).
2. Prove equation (4).
3. Show that equation (9) holds.
4. Show that the vectors p_0, ..., p_{n-1} satisfying (10) are linearly independent.
5. Show that equation (11) is valid.
6. Show that the diagonal elements of a symmetric positive definite matrix are positive.
7. For CG-1, make a table corresponding to Table 1.
8. Show that the two algorithms CG-1 and CG-2 are equivalent.
9. Show that the n × n tridiagonal matrix A = tridiag(-1, 2, -1), that is, the matrix with a_ii = 2, a_{i,i+1} = a_{i+1,i} = -1, and all other entries zero, is symmetric positive definite.
10. Solve the equation A x = b by CG-1 and CG-2 and compare the CPU times, for the case where A is the matrix of Problem 9 and b = A u, where the dimension of the equation is n = 10000 and each component of u is a random number in the range (0, 1).
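As a sketch of how the test problem of Problems 9 and 10 might be set up in Python, reusing the cg2 sketch given after Table 1 (the sparse-matrix format, random seed, tolerance, and iteration cap are arbitrary choices of mine):

    import numpy as np
    import scipy.sparse as sp

    n = 10000
    off = -np.ones(n - 1)
    A = sp.diags([off, 2.0 * np.ones(n), off], offsets=[-1, 0, 1], format="csr")  # tridiag(-1, 2, -1)

    rng = np.random.default_rng(0)       # seed chosen arbitrarily
    u = rng.random(n)                    # components of u drawn uniformly from [0, 1)
    b = A @ u                            # b = A u, so the exact solution of A x = b is u

    x = cg2(A, b, x0=np.zeros(n), eps=1e-12, max_iter=2 * n)
    print(np.linalg.norm(x - u) / np.linalg.norm(u))   # relative error of the computed solution

Timing CG-1 against CG-2, as Problem 10 asks, can then be done by wrapping each solver call with, for example, time.perf_counter.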

Appendix

1 Inner product

Let x = (x_1, ..., x_n)^T and y = (y_1, ..., y_n)^T be real vectors in R^n. Then we define the inner product of x and y by

    (x, y) = x^T y = Σ_{i=1}^{n} x_i y_i.    (16)

The inner product has the following properties:

    (x, y) = (y, x),
    (α x, y) = α (x, y),    α a scalar,
    (x, y + z) = (x, y) + (x, z),
    (x, x) ≥ 0, with equality only if x = 0.    (17)

For nonzero vectors x, y ∈ R^n, if (x, y) = 0 then the vectors x and y are said to be orthogonal.

If the vector x is orthogonal to n linearly independent vectors p_i (i = 1, ..., n) in R^n, then x = 0, since from the assumption we have

    ( p_1^T )
    ( p_2^T )
    (  ...  ) x = 0,
    ( p_n^T )

and the matrix on the left-hand side is nonsingular, so that x = 0.

Next we show that a set of n nonzero vectors p_i (i = 1, ..., n) which are orthogonal to each other is linearly independent. Suppose not, that is, suppose the p_i (i = 1, ..., n) are linearly dependent. Then there exist constants c_i (i = 1, ..., n), not all zero, such that

    c_1 p_1 + ... + c_n p_n = 0.

Taking the inner product with p_j, we have for every j

    0 = (c_1 p_1 + ... + c_n p_n, p_j) = c_j (p_j, p_j),

which means c_j = 0 for every j. This contradicts the assumption.
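Both facts are easy to check numerically. The snippet below builds a random set of mutually orthogonal vectors with a QR factorization (my own test setup, not part of the notes) and verifies that the stacked matrix of the p_i^T is nonsingular and that the Gram matrix (p_i, p_j) is diagonal, which is what drives the independence argument above.

    import numpy as np

    rng = np.random.default_rng(1)                     # arbitrary seed for the test
    n = 5
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # columns of Q are mutually orthogonal
    P = Q.T                                            # rows are p_1^T, ..., p_n^T

    # A vector orthogonal to all p_i satisfies P x = 0; P is nonsingular, so x = 0.
    print(abs(np.linalg.det(P)) > 1e-12)               # True

    # The Gram matrix (p_i, p_j) is diagonal, so c_j (p_j, p_j) = 0 forces c_j = 0.
    print(np.allclose(P @ P.T, np.eye(n)))             # True (here the p_i are even orthonormal)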

2 Eigenvalues and eigenvectors of real symmetric matrices

When a real matrix A satisfies a_ij = a_ji, that is, A^T = A, the matrix A is called a real symmetric matrix. The eigenvalues and eigenvectors of real symmetric matrices have the following properties.

1. The eigenvalues are real.
   Let λ and x be any eigenvalue and corresponding eigenvector of A. Then we have

       (A x, x̄) = (λ x, x̄) = λ (x, x̄),

   where the bar denotes the complex conjugate. On the other hand, from the properties of the inner product, the symmetry of A, and A x̄ = λ̄ x̄, we have

       (A x, x̄) = (x, A x̄) = λ̄ (x, x̄).

   These two expressions mean λ = λ̄, since (x, x̄) ≠ 0.

2. Orthogonality of eigenvectors.
   Let λ_i and λ_j be eigenvalues of A with eigenvectors x_i and x_j, and assume λ_i ≠ λ_j. Then we have

       (A x_i, x_j) = λ_i (x_i, x_j),

   and, since A is symmetric,

       (A x_i, x_j) = (x_i, A x_j) = λ_j (x_i, x_j),

   which implies

       (λ_i - λ_j)(x_i, x_j) = 0.

   Thus we have (x_i, x_j) = 0.

3 Quadratic form

For a real symmetric matrix A = (a_ij) and a real vector x = (x_1, ..., x_n)^T, the quantity

    Q(x) = x^T A x = (x, A x) = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij x_i x_j    (18)

is called a quadratic form. If Q(x) > 0 (respectively ≥ 0) for every x ≠ 0, then the matrix A is said to be positive (semi-)definite. Positive definite matrices have the following properties.

1. The diagonal elements of a symmetric positive definite matrix are positive. This is clear from a_ii = (e_i, A e_i).

2. All the eigenvalues of a symmetric positive definite matrix A are positive. For any x ≠ 0, write x = T y, where T = (u_1, ..., u_n) is the orthogonal matrix whose i-th column is a normalized eigenvector u_i of A corresponding to λ_i. Then

       0 < x^T A x = y^T (T^T A T) y = y^T (T^{-1} A T) y = Σ_{i=1}^{n} λ_i y_i^2,

   which means λ_i > 0 for all i.
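A small numerical illustration of these properties follows; the test matrix B^T B + I is simply one convenient way to manufacture a symmetric positive definite matrix and is not taken from the notes.

    import numpy as np

    rng = np.random.default_rng(2)                    # arbitrary seed for the test
    B = rng.standard_normal((4, 4))
    A = B.T @ B + np.eye(4)                           # symmetric positive definite by construction

    lam, U = np.linalg.eigh(A)                        # eigen-decomposition of a symmetric matrix
    print(np.all(lam > 0))                            # eigenvalues are real (eigh returns reals) and positive
    print(np.allclose(U.T @ U, np.eye(4)))            # eigenvectors are mutually orthogonal (orthonormal here)
    print(np.all(np.diag(A) > 0))                     # diagonal entries of an SPD matrix are positive
    print(np.allclose(U.T @ A @ U, np.diag(lam)))     # T^T A T = diag(lambda_i), as used in property 2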

4 Theorem

Here we show the algorithm CG-1 again:

 1: Choose a small value ε > 0 and an initial guess x_0
 2: p_0 = r_0 = b - A x_0, and compute (r_0, r_0)
 3: k = 0
 4: while (r_k, r_k) ≥ ε do
 5:     α_k = -(r_k, p_k) / (p_k, A p_k)
 6:     x_{k+1} = x_k - α_k p_k
 7:     r_{k+1} = r_k + α_k A p_k
 8:     β_k = -(p_k, A r_{k+1}) / (p_k, A p_k)
 9:     p_{k+1} = r_{k+1} + β_k p_k
10:     k = k + 1
11: end while

Theorem. Let A be an n × n symmetric positive definite matrix, and let x^* be the solution of the equation A x = b. Then the vectors p_k generated by the CG-1 algorithm satisfy

    p_k^T A p_j = 0,    0 ≤ j < k,  k = 1, ..., n-1,    (19)
    r_k^T r_j = 0,    0 ≤ j < k,  k = 1, ..., n-1,    (20)

and p_k ≠ 0 unless x_k = x^*.

Proof. By the definitions of α_j and β_j and those of p_j and r_{j+1}, we have

    p_j^T r_{j+1} = p_j^T r_j + α_j p_j^T A p_j = 0,    j = 0, 1, ...,    (21)
    p_j^T A p_{j+1} = p_j^T A r_{j+1} + β_j p_j^T A p_j = 0,    j = 0, 1, ....    (22)

Here we assume, as an induction hypothesis, that (19) and (20) hold for some k < n - 1; we must then show that they hold for k + 1. Since p_0 = r_0, they hold for k = 1. For any j < k, using lines 7 and 9 of CG-1, we have

    r_j^T r_{k+1} = r_j^T (r_k + α_k A p_k)
                  = r_j^T r_k + α_k p_k^T A r_j
                  = r_j^T r_k + α_k p_k^T A (p_j - β_{j-1} p_{j-1}) = 0,    (23)

since all three terms in the last line vanish by the induction hypothesis. Moreover, from (21) and (22) we have

    r_k^T r_{k+1} = (p_k - β_{k-1} p_{k-1})^T r_{k+1} = -β_{k-1} p_{k-1}^T r_{k+1} = -β_{k-1} p_{k-1}^T (r_k + α_k A p_k) = 0.

Thus we have shown that (20) is true also for k + 1. Next, for any j < k we have

    p_j^T A p_{k+1} = p_j^T A (r_{k+1} + β_k p_k)
                    = p_j^T A r_{k+1}
                    = α_j^{-1} (r_{j+1} - r_j)^T r_{k+1} = 0,    (24)

provided α_j ≠ 0, which we show below. Therefore (19) is also true for k + 1.

Finally, we show that α_j ≠ 0. By the definition of p_j and (21) we have

    r_j^T p_j = r_j^T (r_j + β_{j-1} p_{j-1}) = r_j^T r_j.

Hence we have

    α_j = -r_j^T r_j / (p_j^T A p_j).

Therefore, if α_j = 0, then r_j = 0, that is, x_j = x^*, so that the process stops with x_j.

5 Differentiation by vectors

Here we define the differentiation of a scalar-valued function J(x) with respect to its vector argument x by

    ∇J(x) := (∂J/∂x_1, ∂J/∂x_2, ..., ∂J/∂x_n)^T.    (25)

According to this definition, we have

    ∇Q(x) = ∇( (1/2) x^T A x - b^T x ) = A x - b,    (26)

since

    (1/2) x^T A x = (1/2) ( Σ_{i=1}^{n} a_ii x_i^2 + Σ_{i≠j} a_ij x_i x_j ),    b^T x = Σ_{i=1}^{n} b_i x_i,

and therefore

    ∂/∂x_k ( (1/2) x^T A x ) = a_kk x_k + Σ_{j≠k} a_kj x_j = Σ_{j=1}^{n} a_kj x_j,    ∂/∂x_k ( b^T x ) = b_k.
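Equation (26) can be sanity-checked with a central finite difference; since Q is quadratic, the central difference is exact up to rounding. The test matrix, vectors, and step size below are arbitrary choices of mine, not part of the notes.

    import numpy as np

    rng = np.random.default_rng(3)              # arbitrary seed for the test
    n = 6
    B = rng.standard_normal((n, n))
    A = B.T @ B + n * np.eye(n)                 # a symmetric positive definite test matrix
    b = rng.standard_normal(n)
    x = rng.standard_normal(n)

    Q = lambda v: 0.5 * v @ A @ v - b @ v       # Q(x) = 1/2 x^T A x - b^T x
    h = 1e-6                                    # finite-difference step (arbitrary)
    grad_fd = np.array([(Q(x + h * e) - Q(x - h * e)) / (2 * h) for e in np.eye(n)])
    print(np.allclose(grad_fd, A @ x - b, atol=1e-5))   # confirms grad Q(x) = A x - b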