The Method of Steepest Descent

This is the quadratic function from ℝⁿ to ℝ that is constructed to have a minimum at the x that solves the system A x = b:

    g(x) = <x, A x> - 2 <x, b>

In the method of steepest descent, we pick a starting point x_0 that we think is close to the solution. We start our search for the solution by heading in the direction of steepest descent. This direction is the negative of the gradient of g(x) at the starting point x_0:

    v = -∇g

Because g(x) has a particularly simple form, we can compute its gradient explicitly and notice that something interesting happens when we do. Working out the details of the partial derivatives gives a pleasing result:

    ∂g/∂x_k = 2 Σ_{i=1}^{n} A_{k,i} x_i - 2 b_k

This leads to

    ∇g(x_0) = 2 (A x_0 - b) = -2 r_0

where r_0 is the residual associated with x_0.
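Since the gradient formula is easy to get wrong, a quick numerical check is worthwhile. The sketch below is our own NumPy code, not part of the notes: it compares the formula ∇g = 2(A x - b) against a centered finite-difference gradient for a small symmetric positive definite A (the formula relies on A being symmetric; for general A the gradient is (A + Aᵗ) x - 2 b).

```python
import numpy as np

def g(x, A, b):
    """The quadratic form g(x) = <x, Ax> - 2<x, b>."""
    return x @ A @ x - 2 * (x @ b)

def grad_g(x, A, b):
    """The gradient derived above: 2(Ax - b) = -2r (A symmetric)."""
    return 2 * (A @ x - b)

# Check the formula against a centered finite-difference gradient.
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
x0 = np.array([2.0, 1.0])
eps = 1e-6
fd = np.array([(g(x0 + eps * e, A, b) - g(x0 - eps * e, A, b)) / (2 * eps)
               for e in np.eye(2)])
assert np.allclose(fd, grad_g(x0, A, b), atol=1e-4)
```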
Now that we have a search direction, we can construct a ray starting at x_0 and heading in the direction of v = -2 r_0. We want the point along that ray that minimizes the quadratic form g(x):

    h(t) = g(x + t v)

This function has its minimum at the point where h′(t) = 0:

    h′(t) = -2 <v, b - A x> + 2 t <v, A v> = 0

    t = <v, b - A x> / <v, A v> = <v, r> / <v, A v>

Using that value of t to compute x + t v gives us our next point. We then compute a residual at that point to establish our next search direction and repeat the process.

The big problem with the method of steepest descent is that it typically takes too many iterations to converge to the solution of the system. In fact, it takes so many iterations that we end up doing more work than if we had just solved the system by Gaussian elimination in the first place. The method does have two big advantages. The first is that it can be modified to converge more quickly, as we shall see below. The second is that a version of this method works for nonlinear systems. There is no equivalent of Gaussian elimination for nonlinear systems of equations, so steepest descent is a primary method for dealing with nonlinear systems.

The Conjugate Direction and Conjugate Gradient Methods

The problem with the method of steepest descent is that the search directions it uses, v = -2 r, lead to bad convergence behavior. We can fix this problem by picking better search directions. The conjugate direction method solves an n-by-n system by using a set of search vectors v^(1), v^(2), …, v^(n) that have a special property: the vectors v^(k) are selected to be A-orthogonal. Vectors v^(j) and v^(k) for j ≠ k are A-orthogonal if

    <v^(j), A v^(k)> = 0

The problem with the conjugate direction method is that we typically won't have a set of A-orthogonal vectors just lying around. We need some scheme to generate them.
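The steepest descent iteration can be sketched in a few lines of NumPy. This is a minimal illustration under our own naming, not code from the notes; it searches along the residual r = b - A x directly, since the factor of 2 in v = -2 r only rescales the direction and the exact line-search step t compensates for any scaling.

```python
import numpy as np

def steepest_descent(A, b, x0, max_iters=500, tol=1e-10):
    """Minimize g(x) = <x, Ax> - 2<x, b> for symmetric positive
    definite A by repeated exact line searches along the residual."""
    x = x0.astype(float)
    for _ in range(max_iters):
        r = b - A @ x                  # residual; -grad g = 2r points the same way
        if np.linalg.norm(r) < tol:
            break
        t = (r @ r) / (r @ (A @ r))    # minimizer of h(t) = g(x + t r)
        x = x + t * r                  # step to the next point
    return x

# Small symmetric positive definite test system.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = steepest_descent(A, b, np.zeros(2))
```

For this well-conditioned 2-by-2 system the iteration converges quickly; the slow convergence described above shows up as A's eigenvalues spread apart.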
The conjugate gradient method uses a clever scheme to generate the A-orthogonal set of vectors. The process begins by picking a starting point for the search and using the steepest descent direction v^(1) = r^(0) as the first search direction. After conducting the first line search we land at the second point x^(1) and compute a residual r^(1) there. For subsequent iterations, we seek a new search direction that satisfies the relationship

    v^(k) = r^(k-1) + s_{k-1} v^(k-1)

The trick is to select s_{k-1} so that v^(k-1) and v^(k) are A-orthogonal:

    <v^(k-1), A (r^(k-1) + s_{k-1} v^(k-1))> = 0

    <v^(k-1), A r^(k-1)> + s_{k-1} <v^(k-1), A v^(k-1)> = 0

    s_{k-1} = - <v^(k-1), A r^(k-1)> / <v^(k-1), A v^(k-1)>

After n iterations of this scheme, we will arrive at the solution of the original system.

Summary of the Method

We have now worked out all the details, but it might be useful to summarize and condense our findings.

    g(x) = <x, A x> - 2 <x, b>

    x^(0) = our starting guess

    r^(k) = b - A x^(k)

    v^(1) = r^(0)
    t_k = <v^(k), r^(k-1)> / <v^(k), A v^(k)>

    x^(k) = x^(k-1) + t_k v^(k)

    s_k = - <v^(k), A r^(k)> / <v^(k), A v^(k)>

    v^(k+1) = r^(k) + s_k v^(k)

Accelerating Convergence

The conjugate gradient method is effective, but we would also like to make it fast. One way to make it faster is to arrange for more rapid convergence of the sequence generated by the method. It turns out that the main thing that affects the speed of convergence of the sequence is the matrix A and its eigenvalue-eigenvector structure. In an effort to make A behave more nicely, a common technique is to precondition the matrix A by forming

    Â = C⁻¹ A (C⁻¹)ᵗ

for some appropriately chosen C. We then use the conjugate gradient method to compute an approximate solution for the system

    Â x̂ = b̂

where

    b̂ = C⁻¹ b

    x̂ = Cᵗ x

We use the latter equation to solve for x at the end of the process:

    x = (Cᵗ)⁻¹ x̂

What are some ways to select preconditioning matrices C? One method is to set
    C_{i,j} = √(A_{i,i}) if i = j, and C_{i,j} = 0 if i ≠ j

(equivalently, C⁻¹ is diagonal with entries 1/√(A_{i,i})). A second method is to do a Cholesky decomposition on A:

    A = L Lᵗ

    C = L

In practice, the actual Cholesky decomposition is not used, because it is too expensive to compute. For A with a great many 0 entries, we can compute an "approximate" Cholesky decomposition by doing the following:

1. Force L to have the same pattern of non-zero terms as A.
2. Use the Cholesky formulas to compute only those entries of L that we think should be non-zero.

Since the case of positive definite A with many 0 entries arises frequently in applications, this is a useful approach.
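To tie the pieces together, here is a minimal NumPy sketch of the conjugate gradient recurrences from the summary, applied to a diagonally preconditioned system. The function and variable names are ours, and the 2-by-2 test matrix is illustrative; the preconditioning step simply runs the same routine on Â = C⁻¹ A (C⁻¹)ᵗ with the diagonal choice of C described above, then recovers x = (Cᵗ)⁻¹ x̂.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-12):
    """Solve A x = b for symmetric positive definite A using the
    recurrences t_k, x^(k), r^(k), s_k, v^(k+1) from the summary."""
    n = len(b)
    x = np.zeros(n)          # x^(0): starting guess
    r = b - A @ x            # r^(0)
    v = r.copy()             # v^(1) = r^(0)
    for _ in range(n):       # exact arithmetic finishes in n steps
        Av = A @ v
        t = (v @ r) / (v @ Av)        # t_k = <v, r> / <v, A v>
        x = x + t * v                 # x^(k) = x^(k-1) + t_k v^(k)
        r = r - t * Av                # r^(k) = r^(k-1) - t_k A v^(k)
        if np.linalg.norm(r) < tol:
            break
        # s_k = -<v, A r> / <v, A v>; A is symmetric, so <v, A r> = <A v, r>
        s = -(Av @ r) / (v @ Av)
        v = r + s * v                 # v^(k+1) = r^(k) + s_k v^(k)
    return x

# Diagonal (Jacobi) preconditioning: C = diag(sqrt(A_ii)).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
Cinv = np.diag(1.0 / np.sqrt(np.diag(A)))
A_hat = Cinv @ A @ Cinv.T             # Â = C⁻¹ A (C⁻¹)ᵗ
b_hat = Cinv @ b                      # b̂ = C⁻¹ b
x = Cinv.T @ conjugate_gradient(A_hat, b_hat)   # x = (Cᵗ)⁻¹ x̂
```

With this choice of C, every diagonal entry of Â equals 1, which is what "making A behave more nicely" amounts to in the simplest case.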