The Conjugate Gradient Method

Exercise: A-norm
Let A = L L^T be a Cholesky factorization of A, i.e. L is lower triangular with positive diagonal elements. The A-norm then takes the form
    ||x||_A = sqrt(x^T L L^T x) = ||L^T x||_2.
Let us verify the three properties of a vector norm:
(1) Positivity: Clearly ||x||_A = ||L^T x||_2 >= 0. Since L is nonsingular, ||x||_A = ||L^T x||_2 = 0 if and only if L^T x = 0, if and only if x = 0.
(2) Homogeneity: ||a x||_A = ||L^T (a x)||_2 = ||a L^T x||_2 = |a| ||L^T x||_2 = |a| ||x||_A.
(3) Subadditivity: ||x + y||_A = ||L^T (x + y)||_2 = ||L^T x + L^T y||_2 <= ||L^T x||_2 + ||L^T y||_2 = ||x||_A + ||y||_A.

Exercise: Paraboloid
Given is a quadratic function Q(y) = (1/2) y^T A y - b^T y, a decomposition A = U D U^T with U^T U = I and D = diag(lambda_1, ..., lambda_n), new variables v = [v_1, ..., v_n]^T := U^T y, and a vector c = [c_1, ..., c_n]^T := U^T b. Then
    Q(y) = (1/2) y^T U D U^T y - b^T y = (1/2) v^T D v - c^T v = (1/2) sum_{j=1}^n lambda_j v_j^2 - sum_{j=1}^n c_j v_j,
which is what needed to be shown.

Exercise .5: Steepest descent iteration
In the method of Steepest Descent we choose, at the k-th iteration, the search direction p_k = r_k = b - A x_k and optimal step length
    alpha_k := (r_k^T r_k) / (r_k^T A r_k).
Given is a quadratic function
    Q(x, y) = (1/2) [x, y] A [x; y] - b^T [x; y],    A = [2, -1; -1, 2],    b = [3; 0],
and an initial guess x_0 = [1; 1/2] of its minimum. The corresponding residual is
    r_0 = b - A x_0 = [3; 0] - [3/2; 0] = [3/2; 0].
Performing the steps in Equation (.7) twice yields
    t_0 = A r_0 = [3; -3/2],    alpha_0 = (r_0^T r_0)/(r_0^T t_0) = (9/4)/(9/2) = 1/2,
    x_1 = x_0 + (1/2) r_0 = [1; 1/2] + [3/4; 0] = [7/4; 1/2],    r_1 = r_0 - (1/2) t_0 = [0; 3/4],
    t_1 = A r_1 = [-3/4; 3/2],    alpha_1 = (r_1^T r_1)/(r_1^T t_1) = (9/16)/(9/8) = 1/2,
    x_2 = x_1 + (1/2) r_1 = [7/4; 1/2] + [0; 3/8] = [7/4; 7/8],    r_2 = r_1 - (1/2) t_1 = [3/8; 0].
Moreover, assume that for some k one has
    (?)   x_{2k} = [2 - 4^{-k}; 1 - (1/2) 4^{-k}],         r_{2k} = (3/2) 4^{-k} [1; 0],       t_{2k} = (3/2) 4^{-k} [2; -1],
    (??)  x_{2k+1} = [2 - (1/4) 4^{-k}; 1 - (1/2) 4^{-k}], r_{2k+1} = (3/4) 4^{-k} [0; 1],     t_{2k+1} = (3/4) 4^{-k} [-1; 2].
Then
    alpha_{2k} = (r_{2k}^T r_{2k})/(r_{2k}^T t_{2k}) = ((9/4) 4^{-2k}) / ((9/2) 4^{-2k}) = 1/2,
    x_{2k+1} = x_{2k} + (1/2) r_{2k} = [2 - (1/4) 4^{-k}; 1 - (1/2) 4^{-k}],
    r_{2k+1} = r_{2k} - (1/2) t_{2k} = (3/2) 4^{-k} [0; 1/2] = (3/4) 4^{-k} [0; 1],
    alpha_{2k+1} = (r_{2k+1}^T r_{2k+1})/(r_{2k+1}^T t_{2k+1}) = ((9/16) 4^{-2k}) / ((9/8) 4^{-2k}) = 1/2,
    x_{2k+2} = x_{2k+1} + (1/2) r_{2k+1} = [2 - 4^{-(k+1)}; 1 - (1/2) 4^{-(k+1)}],
    r_{2k+2} = r_{2k+1} - (1/2) t_{2k+1} = (3/8) 4^{-k} [1; 0] = (3/2) 4^{-(k+1)} [1; 0].
Using the method of induction, we conclude that (?), (??), and alpha_k = 1/2 hold for any k; in particular the iterates x_k converge to the exact minimizer [2; 1].

Exercise .8: Conjugate gradient iteration, II
Using x_0 = 0, one finds
    x_1 = x_0 + ((b - A x_0)^T (b - A x_0)) / ((b - A x_0)^T A (b - A x_0)) (b - A x_0) = ((b^T b)/(b^T A b)) b.
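Both of these computations are easy to reproduce numerically. The following MATLAB sketch uses the data A = [2, -1; -1, 2], b = [3; 0], x_0 = [1; 1/2] as reconstructed above for Exercise .5; it prints the step lengths alpha_k (all equal to 1/2), the steepest descent iterate after ten steps (close to [2; 1]), and the one-step cg iterate ((b^T b)/(b^T A b)) b from x_0 = 0 of Exercise .8.

% Steepest descent on the 2x2 example, and the one-step cg iterate from x_0 = 0.
A = [2 -1; -1 2]; b = [3; 0];
x = [1; 1/2];                     % initial guess for Exercise .5
r = b - A*x;                      % initial residual
for k = 0:9
    t = A*r;                      % t_k = A r_k
    alpha = (r'*r)/(r'*t);        % optimal step length, equals 1/2 for every k
    fprintf('k = %d, alpha = %g\n', k, alpha);
    x = x + alpha*r;              % x_{k+1} = x_k + alpha_k r_k
    r = r - alpha*t;              % r_{k+1} = r_k - alpha_k t_k
end
disp(x')                          % approaches the minimizer [2, 1]
x1 = (b'*b)/(b'*(A*b))*b;         % Exercise .8: first cg iterate from x_0 = 0
disp(x1')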

3 By Exercise.8, x = Exercise.9: Conjugate gradient iteration, III bt b b T Ab b = = /. We find, in order, p = r =, =, r =, = 4, p =, =, x =. Since the residual vectors r, r, r must be orthogonal, it follows that r = and x must be an exact solution. This can be verified directly by hand. Exercise.: The cg step length is optimal For any fixed search direction p,thesteplength is optimal if Q(x + )isassmall as possible, that is where, by (.4), Q(x + )=Q(x + p ) = min R f( ), f( ) :=Q(x + p )=Q(x ) p T r + p T Ap is a quadratic polynomial in. SinceA is assumed to be positive definite, necessarily p T Ap >. Therefore f has a minimum, which it attains at pt r = p T Ap. Applying (.6) repeatedly, one finds that the search direction p for the conjugate gradient method satisfies p = r + rt r r T r p = r + rt r r T r r + rt r r T r p = As p = r,thedi erencep r is a linear combination of the vectors r,...,r, each of which is orthogonal to r.itfollowsthatp T r = r T r and that the step length is optimal for = rt r p T Ap =. Exercise.: Starting value in cg As in the exercise, we consider the conjugate gradient method for Ay = r,with r = b Ax.Startingwith y =, s = r Ay = r, q = s = r, one computes, for any, := st s q T Aq, y + = y + q, s + = s Aq, 78

    beta_k := (s_{k+1}^T s_{k+1})/(s_k^T s_k),    q_{k+1} = s_{k+1} + beta_k q_k.
How are the iterates y_k and x_k related? As remarked above, s_0 = r_0 and q_0 = r_0 = p_0. Suppose s_k = r_k and q_k = p_k for some k. Then
    s_{k+1} = s_k - alpha_k A q_k = r_k - (r_k^T r_k)/(p_k^T A p_k) A p_k = r_k - alpha_k A p_k = r_{k+1},
    q_{k+1} = s_{k+1} + beta_k q_k = r_{k+1} + (r_{k+1}^T r_{k+1})/(r_k^T r_k) p_k = p_{k+1}.
It follows by induction that s_k = r_k and q_k = p_k for all k. In addition,
    y_{k+1} - y_k = alpha_k q_k = (r_k^T r_k)/(p_k^T A p_k) p_k = x_{k+1} - x_k
for any k, so that y_k = x_k - x_0.

Exercise .7: Program code for testing steepest descent
Replacing the conjugate gradient steps in Algorithm .4 by the steepest descent steps (.7), we obtain the following algorithm for testing the method of Steepest Descent.

function [V,K] = sdtest(m, a, d, tol, itmax)
  R = ones(m)/(m+1)^2; rho = sum(sum(R.*R)); rho0 = rho;
  V = zeros(m,m);
  T = sparse(toeplitz([d, a, zeros(1,m-2)]));
  for k = 1:itmax
    if sqrt(rho/rho0) <= tol
      K = k; return
    end
    T2 = T*R + R*T;             % action of the two-dimensional operator on R
    a = rho/sum(sum(R.*T2));    % optimal step length (the input a is no longer needed once T is formed)
    V = V + a*R; R = R - a*T2;
    rho = sum(sum(R.*R));
  end
  K = itmax + 1;

Listing . : Testing the method of Steepest Descent

To check that this program is correct, we compare its output with that of cgtest (with a sufficiently large itmax):
  [V1, K1] = sdtest(50, -1, 2, 10^(-8), itmax);
  [V2, K2] = cgtest(50, -1, 2, 10^(-8), itmax);
  surf(V1 - V2);
Running these commands yields Figure ., which shows that the difference between both tests is of the order of 10^(-9), well within the specified tolerance.
As in Tables . and .5, we let the tolerance be tol = 10^(-8) and run sdtest for the m x m grid for various m, to find the number of iterations K_sd required before ||r_{K_sd}|| <= tol ||r_0||. Choosing a = 1/9 and d = 5/18 yields the averaging matrix, and we find the following table.
    n    K_sd

Figure . : The difference of the outputs of cgtest and sdtest for the Poisson matrix, with a tolerance of 10^(-8).

Choosing a = -1 and d = 2 yields the Poisson matrix, and we find the following table.
    n    K_sd    K_sd/n    K_J    K_GS    K_SOR    K_cg
Here the numbers of iterations K_J, K_GS, and K_SOR of the Jacobi, Gauss-Seidel and SOR methods are taken from Table ., and K_cg is the number of iterations in the Conjugate Gradient method. Since K_sd/n seems to tend towards a constant, it seems that the method of Steepest Descent requires O(n) iterations for solving the Poisson problem for some given accuracy, as opposed to the O(sqrt(n)) iterations required by the Conjugate Gradient method. The number of iterations in the method of Steepest Descent is comparable to the number of iterations in the Jacobi method, while the number of iterations in the Conjugate Gradient method is of the same order as in the SOR method.
The spectral condition number of the m^2 x m^2 Poisson matrix is kappa = (1 + cos(pi h))/(1 - cos(pi h)), where h = 1/(m+1). Theorem .6 therefore states that
    (?)   ||x* - x_k||_A / ||x* - x_0||_A <= ((kappa - 1)/(kappa + 1))^k = cos^k( pi/(m+1) ).
How can we relate this to the tolerance in the algorithm, which is specified in terms of the Euclidean norm? Since ||x||_A^2 / ||x||^2 = (x^T A x)/(x^T x) is the Rayleigh quotient of x, Lemma 5.4 implies the bound
    sqrt(lambda_min) ||x|| <= ||x||_A <= sqrt(lambda_max) ||x||,
with lambda_min = 4 - 4 cos(pi h) the smallest and lambda_max = 4 + 4 cos(pi h) the largest eigenvalue of A. Combining these bounds with Equation (?) yields
    ||x* - x_k|| / ||x* - x_0|| <= sqrt( (1 + cos(pi/(m+1))) / (1 - cos(pi/(m+1))) ) cos^k( pi/(m+1) ).
Replacing k by the number of iterations K_sd for the various values of m shows that this estimate holds for the tolerance of 10^(-8).

Exercise .8: Using cg to solve normal equations
We need to perform Algorithm . with A^T A replacing A and A^T b replacing b. For the system A^T A x = A^T b, Equations (.4), (.5), and (.6) become
    x_{k+1} = x_k + alpha_k p_k,       alpha_k = (r_k^T r_k)/(p_k^T A^T A p_k) = (r_k^T r_k)/((A p_k)^T (A p_k)),
    r_{k+1} = r_k - alpha_k A^T A p_k,
    p_{k+1} = r_{k+1} + beta_k p_k,    beta_k = (r_{k+1}^T r_{k+1})/(r_k^T r_k),
with p_0 = r_0 = A^T b - A^T A x_0. Hence we only need to change the computation of r_0, alpha_k, and r_{k+1} in Algorithm ., which yields the implementation in the following listing.

function [x,K] = cg_leastsquares(A, b, x, tol, itmax)
  r = A'*(b - A*x); p = r; rho = r'*r; rho0 = rho;
  for k = 1:itmax
    if sqrt(rho/rho0) <= tol
      K = k; return
    end
    t = A*p; a = rho/(t'*t);
    x = x + a*p; r = r - a*A'*t;
    rhos = rho; rho = r'*r;
    p = r + (rho/rhos)*p;
  end
  K = itmax + 1;

Listing . : Conjugate gradient method for least squares
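A small test of this listing, using an arbitrary 4 x 2 matrix and right-hand side (test data not taken from the text), compares its result with the least squares solution returned by MATLAB's backslash operator.

% Sanity check of cg_leastsquares against the backslash least squares solution.
A = [1 1; 1 2; 1 3; 1 4];               % arbitrary full-rank test matrix
b = [6; 5; 7; 10];                      % arbitrary right-hand side
[x, K] = cg_leastsquares(A, b, zeros(2,1), 1e-12, 100);
disp(norm(x - A\b))                     % should be of the order of machine precision
disp(K)                                 % number of iterations used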

Exercise: Krylov space and cg iterations
(a) The Krylov spaces W_k are defined as W_k := span{r_0, A r_0, ..., A^{k-1} r_0}. Taking A, b, x_0 = 0, and r_0 = b as in the Exercise, these vectors can be expressed as
    r_0, A r_0, A^2 r_0 = b, A b, A^2 b = [4; 0; 0], [8; -4; 0], [20; -16; 4].
(b) As x_0 = 0 we have p_0 = r_0 = b. For k = 0, 1, 2, ..., Equations (.4), (.5), and (.6),
    x_{k+1} = x_k + alpha_k p_k,       alpha_k = (r_k^T r_k)/(p_k^T A p_k),
    r_{k+1} = r_k - alpha_k A p_k,
    p_{k+1} = r_{k+1} + beta_k p_k,    beta_k = (r_{k+1}^T r_{k+1})/(r_k^T r_k),
determine the approximations x_k. For k = 0, 1, 2 these give
    alpha_0 = 1/2,    x_1 = [2; 0; 0],        r_1 = [0; 2; 0],      beta_0 = 1/4,    p_1 = [1; 2; 0],
    alpha_1 = 2/3,    x_2 = [8/3; 4/3; 0],    r_2 = [0; 0; 4/3],    beta_1 = 4/9,    p_2 = [4/9; 8/9; 4/3],
    alpha_2 = 3/4,    x_3 = [3; 2; 1],        r_3 = [0; 0; 0],      beta_2 = 0,      p_3 = [0; 0; 0].
(c) By definition we have W_0 = {0}. From the solution of part (a) we know that W_k = span(b, A b, ..., A^{k-1} b), where the vectors b, A b and A^2 b are linearly independent. Hence we have dim W_k = k for k = 0, 1, 2, 3. From (b) we know that the residual r_3 = b - A x_3 = 0. Hence x_3 is the exact solution to A x = b.
We observe that r_0 = 4 e_1, r_1 = 2 e_2 and r_2 = (4/3) e_3, and hence the r_k for k = 0, 1, 2 are linearly independent and orthogonal to each other. Thus we are only left to show that W_k is the span of r_0, ..., r_{k-1}. We observe that b = r_0, A b = 2 r_0 - 2 r_1 and A^2 b = 5 r_0 - 8 r_1 + 3 r_2. Hence span(b, A b, ..., A^{k-1} b) = span(r_0, ..., r_{k-1}) for k = 1, 2, 3. We conclude that, for k = 1, 2, 3, the vectors r_0, ..., r_{k-1} form an orthogonal basis for W_k.
One can verify directly that p_0, p_1, and p_2 are A-orthogonal. Moreover, observing that b = p_0, A b = (5/2) p_0 - 2 p_1, and A^2 b = 7 p_0 - (28/3) p_1 + 3 p_2, it follows that span(b, A b, ..., A^{k-1} b) = span(p_0, ..., p_{k-1}) for k = 1, 2, 3. We conclude that, for k = 1, 2, 3, the vectors p_0, ..., p_{k-1} form an A-orthogonal basis for W_k.
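The orthogonality claims in part (c) are easy to confirm numerically. A minimal MATLAB sketch, assuming the data A = tridiag(-1, 2, -1) and b = 4 e_1 that is consistent with the computations above:

% Run three cg steps and check orthogonality of the r_k and A-orthogonality of the p_k.
A = [2 -1 0; -1 2 -1; 0 -1 2]; b = [4; 0; 0];
x = zeros(3,1); r = b; p = r;
R = r; P = p;                       % collect residuals and search directions as columns
for k = 1:3
    t = A*p; alpha = (r'*r)/(p'*t);
    x = x + alpha*p;
    rnew = r - alpha*t;
    beta = (rnew'*rnew)/(r'*r);
    p = rnew + beta*p; r = rnew;
    R = [R r]; P = [P p];
end
disp(R'*R)                          % diagonal matrix: the r_k are orthogonal
disp(P'*A*P)                        % diagonal matrix: the p_k are A-orthogonal
disp(x')                            % the exact solution [3, 2, 1]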

By computing the Euclidean norms of r_0, r_1, r_2, r_3, we get
    ||r_0|| = 4,    ||r_1|| = 2,    ||r_2|| = 4/3,    ||r_3|| = 0.
It follows that the sequence (||r_k||)_k is monotonically decreasing. Similarly, with x = [3; 2; 1] the exact solution, one finds
    ||x - x_0|| = sqrt(14),    ||x - x_1|| = sqrt(6),    ||x - x_2|| = sqrt(14)/3,    ||x - x_3|| = 0,
which is clearly monotonically decreasing.

Exercise .6: Another explicit formula for the Chebyshev polynomial
It is well known, and easily verified, that cosh(x + y) = cosh(x) cosh(y) + sinh(x) sinh(y). Write P_n(t) = cosh( n arccosh(t) ) for any integer n. Writing theta = arccosh(t), and using that cosh is even and sinh is odd, one finds
    P_{n+1}(t) + P_{n-1}(t) = cosh( (n+1) theta ) + cosh( (n-1) theta )
        = cosh(n theta) cosh(theta) + sinh(n theta) sinh(theta) + cosh(n theta) cosh(theta) - sinh(n theta) sinh(theta)
        = 2 cosh(theta) cosh(n theta) = 2 t P_n(t).
It follows that P_n(t) satisfies the same recurrence relation as T_n(t). Since in addition P_0(t) = 1 = T_0(t) and P_1(t) = t = T_1(t), necessarily P_n(t) = T_n(t) for any n.

Exercise .8: Maximum of a convex function
This is a special case of the maximum principle in convex analysis, which states that a convex function, defined on a compact convex set, attains its maximum on the boundary of that set. Let f : [a, b] -> R be a convex function. Consider an arbitrary point x = (1 - lambda) a + lambda b in [a, b], with 0 <= lambda <= 1. Since f is convex,
    f(x) = f( (1 - lambda) a + lambda b ) <= (1 - lambda) f(a) + lambda f(b)
         <= (1 - lambda) max{f(a), f(b)} + lambda max{f(a), f(b)} = max{f(a), f(b)}.
It follows that f(x) <= max{f(a), f(b)} and that f attains its maximum on the boundary of its domain of definition.
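The identity P_n(t) = T_n(t) for t >= 1 can also be observed numerically by comparing the cosh formula with the three-term recurrence T_{n+1}(t) = 2 t T_n(t) - T_{n-1}(t); the sample points and degrees in the following sketch are arbitrary.

% Compare cosh(n*acosh(t)) with the Chebyshev three-term recurrence for t >= 1.
t = linspace(1, 3, 5);              % arbitrary sample points with t >= 1
T0 = ones(size(t)); T1 = t;         % T_0(t) = 1 and T_1(t) = t
for n = 1:9
    T2 = 2*t.*T1 - T0;              % T_{n+1}(t) by the recurrence
    P  = cosh((n+1)*acosh(t));      % the explicit cosh formula
    fprintf('n = %2d, max difference = %g\n', n+1, max(abs(T2 - P)));
    T0 = T1; T1 = T2;
end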
