The Conjugate Gradient Method


The minimization problem

We are given a symmetric positive definite matrix $A \in \mathbb{R}^{n \times n}$ and a right-hand side vector $b \in \mathbb{R}^n$. We want to solve the linear system

$$ \text{Find } u_* \in \mathbb{R}^n \text{ such that } A u_* = b. $$

Since $A$ is nonsingular there exists a unique solution. The linear system is equivalent to the minimization problem

$$ \text{Find } u_* \in \mathbb{R}^n \text{ such that } F(u) = \tfrac12 u^\top A u - b^\top u \text{ is minimal.} $$

Note that $(u,v)_A := (Au, v) = v^\top A u$ defines an inner product on $\mathbb{R}^n$, and $\|u\|_A = \sqrt{u^\top A u}$ a norm. Note that

$$ \tfrac12 \|u - u_*\|_A^2 = \tfrac12 (u,u)_A - \underbrace{(u_*, u)_A}_{(b,u)} + \tfrac12 (u_*, u_*)_A = F(u) + \tfrac12 (b, u_*), $$

hence minimizing $F(u)$ is equivalent to minimizing $\|u - u_*\|_A$. In applications $F(u)$ corresponds to an energy (internal energy minus work of external forces) which is minimized by the solution. Therefore $\|\cdot\|_A$ is also called the energy norm, and this norm is the natural way to measure errors for this problem.

Subspace corrections

Subspace correction with $\tilde V = \operatorname{span}\{d_0\}$: We start at $u_0 \in \mathbb{R}^n$ and pick a search direction $d_0$. We define $u_{\mathrm{new}} := u_0 + \alpha_0 d_0$ where $\alpha_0$ is chosen such that $\|u_{\mathrm{new}} - u_*\|_A$ becomes minimal. The normal equations state that $(u_{\mathrm{new}} - u_*, d_0)_A = 0$, i.e. $\alpha_0 (d_0, d_0)_A = (u_* - u_0, d_0)_A = (A(u_* - u_0), d_0) = (r_0, d_0)$, hence

$$ \alpha_0 = \frac{(r_0, d_0)}{(d_0, d_0)_A} \qquad (1) $$

with the residual $r_0 := b - A u_0$. In this case we perform a subspace correction with the 1-dimensional subspace $\tilde V = \operatorname{span}\{d_0\}$.

Subspace correction with $\tilde V = \operatorname{span}\{d_0, \dots, d_k\}$: We start at $u_0 \in \mathbb{R}^n$ and pick linearly independent vectors $d_0, \dots, d_k \in \mathbb{R}^n$. We define $u_{\mathrm{new}} := u_0 + \alpha_0 d_0 + \dots + \alpha_k d_k$ where $\alpha_0, \dots, \alpha_k$ are chosen such that $\|u_{\mathrm{new}} - u_*\|_A$ becomes minimal. The normal equations state that $(u_{\mathrm{new}} - u_*, d_j)_A = 0$ for $j = 0, \dots, k$. Therefore we can find $\alpha_0, \dots, \alpha_k$ by solving the linear system

$$ \begin{pmatrix} (d_0, d_0)_A & \cdots & (d_k, d_0)_A \\ \vdots & & \vdots \\ (d_0, d_k)_A & \cdots & (d_k, d_k)_A \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \vdots \\ \alpha_k \end{pmatrix} = \begin{pmatrix} (r_0, d_0) \\ \vdots \\ (r_0, d_k) \end{pmatrix} \qquad (2) $$

Here we used $(u_* - u_0, d_j)_A = (A(u_* - u_0), d_j) = (r_0, d_j)$ on the right-hand side. Note that the normal equations state that $(A(u_{\mathrm{new}} - u_*), d_j) = 0$ for $j = 0, \dots, k$, hence the new residual $r_{\mathrm{new}} := b - A u_{\mathrm{new}} = A(u_* - u_{\mathrm{new}})$ satisfies $r_{\mathrm{new}} \perp d_0, \dots, d_k$.
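To make the Gram system (2) concrete, here is a minimal NumPy sketch; the matrix, right-hand side, start vector and directions are made-up illustrations, not data from these notes. It assembles the matrix of $A$-inner products, solves for the coefficients, and checks that the new residual is orthogonal to all search directions and that the energy-norm error did not increase.

```python
import numpy as np

# Sketch of the subspace correction (2): given SPD A, b, start u0 and directions
# d_0,...,d_k (columns of D), solve the small Gram system for alpha_0,...,alpha_k.
# All data below is an illustrative, randomly generated example.

rng = np.random.default_rng(0)
n, k = 8, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive definite
b = rng.standard_normal(n)
u_star = np.linalg.solve(A, b)         # exact solution, used only for checking

u0 = np.zeros(n)
D = rng.standard_normal((n, k + 1))    # columns d_0,...,d_k (linearly independent)

r0 = b - A @ u0                        # initial residual
G = D.T @ A @ D                        # Gram matrix with entries (d_i, d_j)_A
alpha = np.linalg.solve(G, D.T @ r0)   # normal equations (2)
u_new = u0 + D @ alpha

r_new = b - A @ u_new
print(np.abs(D.T @ r_new).max())       # ~1e-13: r_new is orthogonal to d_0,...,d_k

err = lambda u: np.sqrt((u - u_star) @ A @ (u - u_star))
print(err(u_new) <= err(u0))           # True: the energy-norm error decreased
```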

Steepest descent method

The current guess is $u_k$. We choose the search direction to be the steepest descent direction of the function $F(u)$: the gradient is $\nabla F(u) = A u - b$, so the steepest descent direction is given by the residual $r_k = b - A u_k$. We define $u_{k+1} := u_k + \alpha_k r_k$ where $\alpha_k$ is chosen so that $\|u_{k+1} - u_*\|_A$ becomes minimal:

For $k = 0, 1, 2, \dots$ do
$$ r_k := b - A u_k, \qquad \alpha_k := \frac{(r_k, r_k)}{(r_k, r_k)_A}, \qquad u_{k+1} := u_k + \alpha_k r_k. $$

Convergence: The errors satisfy

$$ u_{k+1} - u_* = u_k + \alpha_k \underbrace{A(u_* - u_k)}_{r_k} - u_* = (I - \alpha_k A)(u_k - u_*). $$

Since $A$ is symmetric we have an orthonormal basis $v_1, \dots, v_n$ of eigenvectors with $A v_j = \lambda_j v_j$. We write the old error $u_k - u_*$ using this basis,

$$ u_k - u_* = \sum_{j=1}^n c_j v_j, \qquad \|u_k - u_*\|_A^2 = \sum_{j=1}^n c_j^2 \lambda_j, $$

and get for the new error (with a fixed step size $\alpha$)

$$ u_{k+1} - u_* = (I - \alpha A)(u_k - u_*) = \sum_{j=1}^n c_j (1 - \alpha \lambda_j) v_j, \qquad \|u_{k+1} - u_*\|_A^2 = \sum_{j=1}^n (1 - \alpha \lambda_j)^2 c_j^2 \lambda_j \le \Big( \max_{j=1,\dots,n} |1 - \alpha \lambda_j| \Big)^2 \|u_k - u_*\|_A^2. $$

The value of $\alpha$ which minimizes $\max_{j=1,\dots,n} |1 - \alpha \lambda_j|$ is $\alpha_* = \frac{2}{\lambda_{\min} + \lambda_{\max}}$, where

$$ q = \max_{j=1,\dots,n} |1 - \alpha_* \lambda_j| = \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} = \frac{\kappa - 1}{\kappa + 1} = 1 - \frac{2}{\kappa + 1}, \qquad \text{with } \kappa := \frac{\lambda_{\max}}{\lambda_{\min}} = \operatorname{cond}_2(A). $$

Since $\alpha_k$ is chosen such that $\|u_{k+1} - u_*\|_A$ is minimal we obtain for $u_{k+1} = u_k + \alpha_k r_k$ the bounds

$$ \|u_{k+1} - u_*\|_A \le \Big( 1 - \frac{2}{\kappa + 1} \Big) \|u_k - u_*\|_A, \qquad \|u_k - u_*\|_A \le \Big( 1 - \frac{2}{\kappa + 1} \Big)^k \|u_0 - u_*\|_A. $$

This means that we need about $C_\delta\, \kappa$ iterations to achieve $\|u_k - u_*\|_A \le \delta\, \|u_0 - u_*\|_A$.

Conjugate gradient method, version 0

We start with an initial guess $u_0$. For $k = 0, 1, 2, \dots$ do: Let $r_k := b - A u_k$. If $r_k = 0$, stop (since $u_k$ is the exact solution). Perform a subspace correction with $V_k := \operatorname{span}\{r_0, \dots, r_k\}$: solve (2) with $d_j := r_j$, and let $u_{k+1} := u_0 + \alpha_0 r_0 + \dots + \alpha_k r_k$. The normal equations are in this case

$$ \big( A(u_{k+1} - u_*), r_j \big) = -(r_{k+1}, r_j) = 0 \qquad \text{for } j = 0, \dots, k. \qquad (3) $$
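The steepest descent iteration above translates directly into code. The following is a sketch assuming NumPy; the stopping tolerance and the small test problem are illustrative choices of mine, not part of the notes.

```python
import numpy as np

# Steepest descent for SPD A: each step moves along the residual with the
# step length that is optimal in the energy norm.

def steepest_descent(A, b, u0, tol=1e-10, maxit=10_000):
    u = u0.copy()
    for k in range(maxit):
        r = b - A @ u                  # residual = steepest descent direction
        rr = r @ r
        if np.sqrt(rr) <= tol:
            break
        alpha = rr / (r @ (A @ r))     # alpha_k = (r_k,r_k) / (r_k,r_k)_A
        u = u + alpha * r
    return u, k

# small SPD test problem (illustrative)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
u, its = steepest_descent(A, b, np.zeros(2))
print(u, its, np.linalg.norm(b - A @ u))
```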

I.e., we have $r_{k+1} \perp V_k$. As long as $r_k \ne 0$ the vectors $r_0, \dots, r_k$ are therefore linearly independent.

Observation 1: We will have $r_k = 0$, i.e., $u_k = u_*$, for $k \ge K$ with some $K \le n$. But the CG method is typically used as an iterative method with $k \ll n$ iterations to find an approximate solution $u_k$, rather than the exact solution.

Observation 2: We have

$$ V_k = \operatorname{span}\{ r_0, A r_0, \dots, A^k r_0 \}. \qquad (4) $$

Proof: It is obvious for $k = 0$. Assume it holds for $k - 1$. We have

$$ r_k = b - A u_k = \underbrace{b - A u_0}_{r_0} - A \underbrace{(\alpha_0 r_0 + \dots + \alpha_{k-1} r_{k-1})}_{\in\, \operatorname{span}\{ r_0, \dots, A^{k-1} r_0 \}}, $$

hence $r_k \in \operatorname{span}\{ r_0, A r_0, \dots, A^k r_0 \}$.

The subspace $V_k = \operatorname{span}\{ r_0, A r_0, \dots, A^k r_0 \}$ is called a Krylov space, and it plays a central role in understanding CG, GMRES and related iterative methods.

Conjugate gradient method, version 1

We can make the method more efficient by orthogonalizing the vectors $r_0, r_1, \dots, r_k$ with respect to $(\cdot, \cdot)_A$ using the Gram-Schmidt method, yielding vectors $d_0, d_1, \dots, d_k$ such that $V_k = \operatorname{span}\{r_0, \dots, r_k\} = \operatorname{span}\{d_0, \dots, d_k\}$ and $(d_j, d_k)_A = 0$ for $j \ne k$. We let $u_{k+1} = u_0 + \alpha_0 d_0 + \dots + \alpha_k d_k$ where $\alpha_0, \dots, \alpha_k$ are chosen such that $\|u_{k+1} - u_*\|_A = \|\alpha_0 d_0 + \dots + \alpha_k d_k - (u_* - u_0)\|_A$ is minimal. Since the directions $d_0, \dots, d_k$ are $A$-orthogonal (aka "conjugate") the normal equations (2) decouple:

$$ (d_j, d_j)_A \, \alpha_j = (r_0, d_j) \qquad \text{for } j = 0, \dots, k. $$

We can write the right-hand side as

$$ (u_* - u_0, d_j)_A = \big( u_* - \underbrace{(u_0 + \alpha_0 d_0 + \dots + \alpha_{j-1} d_{j-1})}_{u_j}, \, d_j \big)_A = (r_j, d_j) $$

(the inserted terms $\alpha_i d_i$ with $i < j$ are $A$-orthogonal to $d_j$), yielding

$$ \alpha_j = \frac{(r_j, d_j)}{(d_j, d_j)_A} $$

and $u_{k+1} = u_0 + \alpha_0 d_0 + \dots + \alpha_{k-1} d_{k-1} + \alpha_k d_k = u_k + \alpha_k d_k$. Therefore the algorithm can be written as follows (a sketch of this version follows below). For $k = 0, 1, 2, \dots$ do:

The new steepest descent direction is given by the residual: $r_k := b - A u_k$; if $r_k = 0$: stop (since $u_k$ is the exact solution). Modify this to make it conjugate to all previous search directions $d_{k-1}, d_{k-2}, \dots, d_0$:

$$ d_k := r_k - \frac{(r_k, d_{k-1})_A}{(d_{k-1}, d_{k-1})_A} d_{k-1} - \frac{(r_k, d_{k-2})_A}{(d_{k-2}, d_{k-2})_A} d_{k-2} - \dots - \frac{(r_k, d_0)_A}{(d_0, d_0)_A} d_0. \qquad (5) $$

Perform the optimal step in direction $d_k$ (this actually minimizes $\|(u_0 + \alpha_0 d_0 + \dots + \alpha_k d_k) - u_*\|_A$ over all $\alpha_0, \dots, \alpha_k$):

$$ \alpha_k := \frac{(r_k, d_k)}{(d_k, d_k)_A}, \qquad u_{k+1} := u_k + \alpha_k d_k. \qquad (6) $$
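The following NumPy sketch mirrors "version 1" literally: each residual is $A$-orthogonalized against all previous directions as in (5) before the optimal step (6). It is only meant to illustrate the derivation (the final CG method below needs just the last direction); the tolerance and test problem are my illustrative choices.

```python
import numpy as np

# CG "version 1": full Gram-Schmidt in the A-inner product, as in (5)-(6).
# Inefficient on purpose: it stores all previous directions.

def cg_version1(A, b, u0, tol=1e-10, maxit=None):
    maxit = len(b) if maxit is None else maxit
    u = u0.copy()
    dirs, Adirs = [], []                  # store d_j and A d_j
    for k in range(maxit):
        r = b - A @ u
        if np.linalg.norm(r) <= tol:
            break
        d = r.copy()
        for dj, Adj in zip(dirs, Adirs):  # (5): subtract A-projections onto d_j
            d -= (r @ Adj) / (dj @ Adj) * dj
        Ad = A @ d
        alpha = (r @ d) / (d @ Ad)        # (6): optimal step in direction d_k
        u = u + alpha * d
        dirs.append(d)
        Adirs.append(Ad)
    return u

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(cg_version1(A, b, np.zeros(2)))
```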

There is one additional simplification: By (4) we have $A d_j \in V_{j+1} \subseteq V_{k-1}$ for $j \le k-2$, and by the normal equations $r_k \perp V_{k-1}$, so

$$ (r_k, d_j)_A = \big( r_k, \underbrace{A d_j}_{\in V_{k-1}} \big) = 0 \qquad \text{for } j \le k-2. $$

By this argument all terms in (5) except the one involving $d_{k-1}$ are zero and we have

$$ d_k := r_k - \frac{(r_k, d_{k-1})_A}{(d_{k-1}, d_{k-1})_A} d_{k-1}, \qquad (7) $$

where we orthogonalize only with respect to the previous direction $d_{k-1}$.

Final version of the conjugate gradient method

By (7) we have $(d_k, r_k) = (r_k, r_k)$, as $(r_k, d_{k-1}) = 0$ by the normal equations ($r_k \perp V_{k-1}$). Hence

$$ \alpha_k = \frac{(r_k, r_k)}{(d_k, d_k)_A}. \qquad (8) $$

We have from (6) that $\alpha_k A d_k = A(u_{k+1} - u_k) = r_k - r_{k+1}$ and hence

$$ \alpha_k (r_{k+1}, A d_k) = (r_{k+1}, r_k) - (r_{k+1}, r_{k+1}) = -(r_{k+1}, r_{k+1}), $$

since $(r_k, r_{k+1}) = 0$ by the normal equations ($r_{k+1} \perp V_k$). Using this in the numerator and (8) in the denominator we get

$$ -\frac{(r_{k+1}, r_{k+1})}{(r_k, r_k)} = \frac{\alpha_k (r_{k+1}, A d_k)}{\alpha_k (d_k, d_k)_A} = \frac{(r_{k+1}, d_k)_A}{(d_k, d_k)_A}, $$

so that we can write (7) for $k+1$ as

$$ \beta_k := \frac{(r_{k+1}, r_{k+1})}{(r_k, r_k)}, \qquad d_{k+1} := r_{k+1} + \beta_k d_k. $$

We then have the following algorithm (a runnable sketch follows below):

$r_0 := b - A u_0$, $d_0 := r_0$.
For $k = 0, 1, 2, \dots$ do
$$ \alpha_k := \frac{r_k^\top r_k}{d_k^\top (A d_k)}, \qquad u_{k+1} := u_k + \alpha_k d_k, \qquad r_{k+1} := r_k - \alpha_k (A d_k), $$
if $r_{k+1} = 0$: stop,
$$ \beta_k := \frac{r_{k+1}^\top r_{k+1}}{r_k^\top r_k}, \qquad d_{k+1} := r_{k+1} + \beta_k d_k. $$

The cost of each step: 1 matrix-vector product (compute $A d_k$) and 2 dot products (compute $d_k^\top (A d_k)$ and $r_{k+1}^\top r_{k+1}$). After we compute $r_{k+1}^\top r_{k+1} = \|A u_{k+1} - b\|_2^2$ we can compare this with a given tolerance and terminate the iteration if the norm of the residual is sufficiently small.

Error estimate:

$$ \|u_k - u_*\|_A \le 2 \Big( 1 - \frac{2}{\kappa^{1/2} + 1} \Big)^k \|u_0 - u_*\|_A. $$

This means that we need about $C_\delta\, \kappa^{1/2}$ iterations to achieve $\|u_k - u_*\|_A \le \delta\, \|u_0 - u_*\|_A$.
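Here is the final algorithm written out as a NumPy sketch, with one matrix-vector product and two dot products per step as noted above; the default tolerance, iteration cap and test problem are my illustrative choices, not prescribed by the notes.

```python
import numpy as np

# Final CG iteration as summarized above: one product A d_k, two dot products,
# and three vector updates per step.

def conjugate_gradient(A, b, u0, tol=1e-10, maxit=None):
    maxit = len(b) if maxit is None else maxit
    u = u0.copy()
    r = b - A @ u
    d = r.copy()
    rr = r @ r
    for k in range(maxit):
        if np.sqrt(rr) <= tol:           # residual small enough: stop
            break
        Ad = A @ d                       # the single matrix-vector product
        alpha = rr / (d @ Ad)            # (8)
        u = u + alpha * d
        r = r - alpha * Ad               # r_{k+1} = r_k - alpha_k A d_k
        rr_new = r @ r
        beta = rr_new / rr               # beta_k = (r_{k+1},r_{k+1}) / (r_k,r_k)
        d = r + beta * d                 # d_{k+1} = r_{k+1} + beta_k d_k
        rr = rr_new
    return u, k

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
u, its = conjugate_gradient(A, b, np.zeros(2))
print(u, its)
```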

Proof of the error estimate for the Conjugate Gradient Method

We have $u_k = u_0 + w$ where $w \in V_{k-1} = \operatorname{span}\{ r_0, A r_0, \dots, A^{k-1} r_0 \}$ is chosen such that $\|u_k - u_*\|_A$ is minimal. Therefore, since $r_0 = -A(u_0 - u_*)$,

$$ u_k - u_* = u_0 - u_* + \sum_{j=0}^{k-1} \beta_j A^j r_0 = p(A)(u_0 - u_*) \qquad \text{with } p(\lambda) = 1 + \beta_0 \lambda + \dots + \beta_{k-1} \lambda^k $$

(with the signs absorbed into the coefficients), i.e., $p \in P_k$ with $p(0) = 1$. Using the eigenvectors $v_1, \dots, v_n$ of $A$ we can write the initial error as $u_0 - u_* = \sum_{j=1}^n c_j v_j$. Then

$$ \|u_k - u_*\|_A^2 = \sum_{j=1}^n p(\lambda_j)^2 \lambda_j c_j^2 \le \Big( \max_{j=1,\dots,n} |p(\lambda_j)| \Big)^2 \|u_0 - u_*\|_A^2. $$

We now try to choose the polynomial so that $q := \max_{j=1,\dots,n} |p(\lambda_j)|$ becomes small. We want a polynomial $p \in P_k$ with $p(0) = 1$ such that

$$ q := \max_{\lambda \in [\lambda_1, \lambda_n]} |p(\lambda)| \quad \text{is small.} $$

We can actually determine the polynomial $p$ which minimizes $q$. We start with the Chebyshev polynomial $T_k(x)$, which satisfies $\max_{x \in [-1,1]} |T_k(x)| = 1$. We then use a linear change of variables from $x \in [-1,1]$ to $\lambda \in [\lambda_1, \lambda_n]$,

$$ \lambda = \frac{\lambda_1 + \lambda_n}{2} + x\, \frac{\lambda_n - \lambda_1}{2}, \qquad x = \frac{2\lambda - \lambda_1 - \lambda_n}{\lambda_n - \lambda_1} =: g(\lambda), $$

and define

$$ \tilde p(\lambda) := T_k(g(\lambda)), \qquad p(\lambda) := \frac{\tilde p(\lambda)}{\tilde p(0)}, \qquad q = \max_{\lambda \in [\lambda_1, \lambda_n]} |p(\lambda)| = \frac{1}{|\tilde p(0)|}, \qquad |\tilde p(0)| = T_k\Big( \frac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1} \Big). $$

Note that $x_* := \frac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1} = \frac{\kappa + 1}{\kappa - 1} > 1$. We need a lower bound for $T_k(x_*)$. Note that with $x = \cos t = \tfrac12 (z + z^{-1})$ we have $T_k(x) = \cos(kt) = \tfrac12 (z^k + z^{-k})$, and this formula extends to $x > 1$ with real $z > 1$. Note that for $z := (\kappa^{1/2} + 1) / (\kappa^{1/2} - 1)$ we have

$$ \tfrac12 (z + z^{-1}) = \frac{(\kappa^{1/2} + 1)^2 + (\kappa^{1/2} - 1)^2}{2 (\kappa^{1/2} + 1)(\kappa^{1/2} - 1)} = \frac{\kappa + 1}{\kappa - 1} = x_*, $$

hence with $\rho := z^{-1} = \frac{\kappa^{1/2} - 1}{\kappa^{1/2} + 1} = 1 - \frac{2}{\kappa^{1/2} + 1}$ we obtain

$$ T_k(x_*) = \tfrac12 (z^k + z^{-k}) = \tfrac12 (\rho^{-k} + \rho^k) \ge \tfrac12 \rho^{-k}, $$

yielding

$$ q = T_k(x_*)^{-1} = \frac{2}{\rho^{-k} + \rho^k} \le 2 \rho^k = 2 \Big( 1 - \frac{2}{\kappa^{1/2} + 1} \Big)^k, \qquad \|u_k - u_*\|_A \le 2 \Big( 1 - \frac{2}{\kappa^{1/2} + 1} \Big)^k \|u_0 - u_*\|_A. $$

Note that for $k = 1$ we have

$$ q = \frac{2}{\rho^{-1} + \rho} = \frac{2}{z + z^{-1}} = \frac{\kappa - 1}{\kappa + 1} = 1 - \frac{2}{\kappa + 1}, $$

which is the bound we obtained for the steepest descent method.
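As a numerical illustration of the estimate just proved, the following NumPy sketch runs CG on a small SPD system with a prescribed spectrum and compares the energy-norm error with the bound $2\rho^k \|u_0 - u_*\|_A$, $\rho = 1 - 2/(\kappa^{1/2}+1)$. The test matrix (eigenvalues between 1 and 100, so $\kappa = 100$) is an arbitrary illustrative choice.

```python
import numpy as np

# Check the CG error estimate: the energy-norm error should stay below
# 2 * rho**k * ||u_0 - u_*||_A for every iteration k.

rng = np.random.default_rng(1)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(1.0, 100.0, n)         # eigenvalues, so kappa = 100
A = Q @ np.diag(lam) @ Q.T
b = rng.standard_normal(n)
u_star = np.linalg.solve(A, b)
kappa = lam[-1] / lam[0]
rho = 1.0 - 2.0 / (np.sqrt(kappa) + 1.0)

def energy_err(u):
    e = u - u_star
    return np.sqrt(e @ (A @ e))

u = np.zeros(n)
r = b - A @ u
d = r.copy()
rr = r @ r
e0 = energy_err(u)
for k in range(1, 21):                   # 20 CG steps
    Ad = A @ d
    alpha = rr / (d @ Ad)
    u += alpha * d
    r -= alpha * Ad
    rr_new = r @ r
    d = r + (rr_new / rr) * d
    rr = rr_new
    print(k, energy_err(u), 2 * rho**k * e0)   # observed error vs. proved bound
```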
