The conjugate gradient method

Michael S. Floater

November 1, 2011

These notes try to provide motivation and an explanation of the CG method.

1 The method of conjugate directions

We want to solve the linear system $Ax = b$ for the vector $x \in \mathbb{R}^n$, where $b \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$ is a symmetric, positive definite matrix. We want to design an iterative algorithm that requires at most $n$ iterations. We choose $n$ search directions $p_0, p_1, \ldots, p_{n-1}$ that are orthogonal (or "conjugate") with respect to some inner product $\langle \cdot,\cdot \rangle$, i.e., $\langle p_i, p_j \rangle = 0$ for $i \neq j$. Then, starting from some initial vector $x_0 \in \mathbb{R}^n$, we let, for $k = 0, 1, \ldots, n-1$,

$$x_{k+1} = x_k + \alpha_k p_k, \qquad (1)$$

for some step length $\alpha_k$. Denoting the $k$-th error by

$$e_k = x_k - x,$$

if we choose $\alpha_k$ so that the new error $e_{k+1}$ is orthogonal to the current search direction $p_k$, the algorithm will terminate after $n$ steps. To see this, observe that since

$$e_{k+1} = e_k + \alpha_k p_k,$$

forcing $\langle p_k, e_{k+1} \rangle = 0$ gives

$$\alpha_k = -\frac{\langle p_k, e_k \rangle}{\langle p_k, p_k \rangle}. \qquad (2)$$

Then, since

$$x_k = x_0 + \sum_{j=0}^{k-1} \alpha_j p_j,$$

we have

$$e_k = e_0 + \sum_{j=0}^{k-1} \alpha_j p_j,$$

and so, by the orthogonality of the $p_j$, we find

$$\langle p_k, e_k \rangle = \langle p_k, e_0 \rangle.$$

Therefore,

$$\alpha_k = -\frac{\langle p_k, e_0 \rangle}{\langle p_k, p_k \rangle},$$

which we recognize as $-1$ times the $k$-th coefficient of the expansion of $e_0$ in terms of the orthogonal basis $\{p_0, \ldots, p_{n-1}\}$, i.e.,

$$e_0 = -\sum_{j=0}^{n-1} \alpha_j p_j.$$

Therefore,

$$e_k = e_0 + \sum_{j=0}^{k-1} \alpha_j p_j = -\sum_{j=k}^{n-1} \alpha_j p_j, \qquad (3)$$

and in particular, $e_n = 0$. Thus indeed $x_n = x$.

2 Which inner product?

If we take $\langle \cdot,\cdot \rangle$ to be the standard inner product in $\mathbb{R}^n$, i.e., $\langle x, y \rangle := x^T y$,

we will not be able to use the algorithm. Since $x$ is the vector we are trying to find, we do not know the error $e_k$, and so we would not be able to compute the step length $\alpha_k$ in (2). Fortunately, however, we can use the algorithm if we use the trick of taking $\langle \cdot,\cdot \rangle$ to be the inner product

$$\langle x, y \rangle := x^T A y.$$

It is easy to check that this is in fact an inner product in $\mathbb{R}^n$ due to our assumptions on $A$. Now we can compute $\alpha_k$ because (2) becomes

$$\alpha_k = -\frac{p_k^T A e_k}{p_k^T A p_k},$$

and we can compute the numerator in this fraction because

$$A e_k = A(x_k - x) = A x_k - b. \qquad (4)$$

This trick has allowed us to avoid needing to know $x$. We have now established a workable algorithm. It requires only $n$ iterations, and each iteration only requires multiplying the two vectors $x_k$ and $p_k$ by $A$. In fact we can even avoid the first of these by iterating on the residuals $r_k := b - A x_k$. Since

$$r_{k+1} = b - A(x_k + \alpha_k p_k) = r_k - \alpha_k A p_k, \qquad (5)$$

we can use this equation to update the residuals, once we have computed the first residual $r_0 = b - A x_0$, and since $A e_k = -r_k$, we can compute $\alpha_k$ from the formula

$$\alpha_k = \frac{p_k^T r_k}{p_k^T A p_k}. \qquad (6)$$

3 Which search directions?

Given a sequence of $A$-orthogonal search directions $p_0, p_1, \ldots, p_{n-1}$, we have designed an algorithm that requires $n$ iterations, and since each iteration requires multiplying a vector by $A$, it costs in general $O(n^2)$ operations, so the algorithm requires $O(n^3)$ operations in total. However, if $A$ is sparse, with $m$ nonzero elements, the total cost will only be $O(mn)$, which will be considerably less if $m \ll n^2$.
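Before turning to how the search directions are chosen, it may help to see the updates (1), (5), and (6) written out. The following NumPy sketch runs the conjugate-directions iteration assuming a full set of $A$-orthogonal directions is already supplied; the function name `conjugate_directions` is an illustrative choice, not something from the notes.

```python
import numpy as np

def conjugate_directions(A, b, x0, directions):
    """Solve Ax = b given a full set of A-orthogonal search directions.

    A minimal sketch of the iteration: x_{k+1} = x_k + alpha_k p_k (1),
    r_{k+1} = r_k - alpha_k A p_k (5), alpha_k = p_k^T r_k / p_k^T A p_k (6).
    """
    x = np.asarray(x0, dtype=float).copy()
    r = b - A @ x                      # initial residual r_0 = b - A x_0
    for p in directions:               # p_0, ..., p_{n-1}, assumed A-orthogonal
        Ap = A @ p                     # the only matrix-vector product per step
        alpha = (p @ r) / (p @ Ap)     # step length, formula (6)
        x += alpha * p                 # update the iterate, formula (1)
        r -= alpha * Ap                # update the residual, formula (5)
    return x
```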

However, the algorithm also requires finding some $A$-orthogonal directions $p_k$. To do this we could choose linearly independent vectors $u_0, u_1, \ldots, u_{n-1}$ and use the Gram–Schmidt process with respect to the $A$ inner product to generate the $p_k$. Thus we let $p_0 = u_0$ and for $k \geq 0$ let

$$p_{k+1} = u_{k+1} - \sum_{j=0}^{k} \beta_{kj} p_j, \qquad (7)$$

where

$$\beta_{kj} = \frac{\langle u_{k+1}, p_j \rangle}{\langle p_j, p_j \rangle}. \qquad (8)$$

An obvious choice of the $u_k$ is the unit vectors $u_0 = (1, 0, \ldots, 0)^T$, $u_1 = (0, 1, 0, \ldots, 0)^T$, etc. However, with this choice, the computation of the $p_k$ would be considerable, so the conjugate algorithm would not be very attractive. We would like to make a choice of the $u_k$ that leads to a simplification of the Gram–Schmidt process. Luckily there is a choice that makes most of the coefficients $\beta_{kj}$ zero, namely the residuals, $u_k = r_k$. This choice is what we call the Conjugate Gradient method.

To see the simplification, observe that from (3),

$$r_k = -A e_k = \sum_{j=k}^{n-1} \alpha_j A p_j,$$

and so

$$r_k^T p_i = 0, \qquad i < k,$$

i.e., the residual $r_k$ is orthogonal, in the usual sense, to all previous search directions $p_i$, whatever the choice of the $u_k$. Therefore, $r_k$ is also orthogonal, in the usual sense, to all previous vectors $u_i$ ($i < k$). It follows that if we make the choice $u_k = r_k$, then all the residuals $r_0, r_1, \ldots, r_{n-1}$ are orthogonal to each other:

$$r_i^T r_j = 0, \qquad i \neq j.$$

Because of this, many of the $\beta_{kj}$ are zero: from (5), and noting that $\alpha_j \neq 0$, the numerator of $\beta_{kj}$ in (8) is

$$u_{k+1}^T A p_j = u_{k+1}^T (r_j - r_{j+1})/\alpha_j,$$

and so if $u_k = r_k$, then $\beta_{kj} = 0$ for $j < k$, and, from (6),

$$\beta_{kk} = -\frac{r_{k+1}^T r_{k+1}}{p_k^T r_k}.$$

Hence,

$$p_{k+1} = r_{k+1} + \beta_k p_k, \qquad (9)$$

where

$$\beta_k = -\beta_{kk} = \frac{r_{k+1}^T r_{k+1}}{p_k^T r_k}.$$

Since $p_k^T r_k = r_k^T r_k$, we can also write $\alpha_k$ and $\beta_k$ in the more symmetric form

$$\alpha_k = \frac{r_k^T r_k}{p_k^T A p_k}, \qquad \beta_k = \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k}. \qquad (10)$$

In summary, the CG method is initialized by setting $p_0 = r_0$, and iterating with the two equations (5) and (9), together with the update (1) for $x_k$, with $\alpha_k$ and $\beta_k$ given by (10). As written, the CG method is a direct method, converging to the exact solution in $n$ iterations, but in practice the vector $x_k$ is often a good enough approximation to the true solution $x$ for $k$ much smaller than $n$, and so it is typically used as an iterative method, like the Gauss–Seidel and Jacobi methods. The iteration can be stopped when the residual $r_k$ falls below some threshold.
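Putting the updates (1), (5), (9) and the coefficients (10) together, a minimal NumPy sketch of the resulting CG iteration might look as follows; the function name `conjugate_gradient`, the default tolerance, and the residual-norm stopping test are illustrative choices rather than prescriptions from the notes.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Minimal CG sketch for Ax = b with A symmetric positive definite.

    Uses the updates (1), (5), (9) with the coefficients alpha_k, beta_k of (10),
    stopping when the residual norm falls below tol or after max_iter steps.
    """
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    r = b - A @ x                  # r_0 = b - A x_0
    p = r.copy()                   # p_0 = r_0
    rs_old = r @ r
    if max_iter is None:
        max_iter = n               # in exact arithmetic CG terminates in n steps
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:  # stop once the residual is small enough
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)  # alpha_k = r_k^T r_k / p_k^T A p_k, (10)
        x += alpha * p             # x_{k+1} = x_k + alpha_k p_k, (1)
        r -= alpha * Ap            # r_{k+1} = r_k - alpha_k A p_k, (5)
        rs_new = r @ r
        beta = rs_new / rs_old     # beta_k = r_{k+1}^T r_{k+1} / r_k^T r_k, (10)
        p = r + beta * p           # p_{k+1} = r_{k+1} + beta_k p_k, (9)
        rs_old = rs_new
    return x

# Example usage on a small SPD system:
# A = np.array([[4.0, 1.0], [1.0, 3.0]]); b = np.array([1.0, 2.0])
# print(conjugate_gradient(A, b))   # should agree with np.linalg.solve(A, b)
```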
