Least-square inversion with inexact adjoints. Method of conjugate directions: A tutorial

Published in SEP Report, 92, 253-365 (1996)

Sergey Fomel 1

ABSTRACT

This tutorial describes the classic method of conjugate directions: the generalization of the conjugate-gradient method in iterative least-square inversion. I derive the algebraic equations of the conjugate-direction method from general optimization principles. The derivation explains the magic properties of conjugate gradients. It also justifies the use of conjugate directions in cases when these properties are distorted either by computational errors or by inexact adjoint operators. The extra cost comes from storing a larger number of previous search directions in the computer memory. A simple program and two examples illustrate the method.

INTRODUCTION

This paper describes the method of conjugate directions for solving linear operator equations in Hilbert space. This method is usually described in the numerous textbooks on unconstrained optimization as an introduction to the much more popular method of conjugate gradients. See, for example, Practical optimization by Gill et al. (1995) and its bibliography. The famous conjugate-gradient solver possesses specific properties, well known from the original works of Hestenes and Stiefel (1952) and Fletcher and Reeves (1964). For linear operators and exact computations, it guarantees finding the solution after, at most, n iterative steps, where n is the number of dimensions in the solution space. The method of conjugate gradients does not require explicit computation of the objective function and explicit inversion of the Hessian matrix. This makes it particularly attractive for large-scale inverse problems, such as those of seismic data processing and interpretation. However, it does require explicit computation of the adjoint operator. Claerbout (1992, 2003) shows dozens of successful examples of the conjugate-gradient application with numerically precise adjoint operators.
The motivation for this tutorial is to explore the possibility of using different types of preconditioning operators in the place of adjoints in iterative least-square inversion. For some linear or linearized operators, implementing the exact adjoint may pose a

1 e-mail: sergey@sep.stanford.edu
difficult problem. For others, one may prefer different preconditioners because of their smoothness (Claerbout, 1995a; Crawley, 1995), simplicity (Kleinman and van den Berg, 1991), or asymptotic properties (Sevink and Herman, 1994). In those cases, we could apply the natural generalization of the conjugate-gradient method, which is the method of conjugate directions. The cost difference between those two methods is in the volume of memory storage. In the days when the conjugate-gradient method was invented, this difference looked too large to even consider a practical application of conjugate directions. With the evident increase of computer power over the last 30 years, we can afford to do it now. I derive the main equations used in the conjugate-direction method from very general optimization criteria, with minimum restrictions implied. The textbook algebra is illustrated with a simple program and two simple examples.

IN SEARCH OF THE MINIMUM

We are looking for the solution of the linear operator equation

d = A m ,  (1)

where m is the unknown model in the linear model space, d stands for the given data, and A is the forward modeling operator. The data vector d belongs to a Hilbert space with a defined norm and dot product. The solution is constructed by iterative steps in the model space, starting from an initial guess m_0. Thus, at the n-th iteration, the current model m_n is found by the recursive relation

m_n = m_{n-1} + α_n s_n ,  (2)

where s_n denotes the step direction, and α_n stands for the scaling coefficient. The residual at the n-th iteration is defined by

r_n = d − A m_n .  (3)

Substituting (2) into (3) leads to the equation

r_n = r_{n-1} − α_n A s_n .  (4)

For a given step s_n, we can choose α_n to minimize the squared norm of the residual

||r_n||^2 = ||r_{n-1}||^2 − 2 α_n (r_{n-1}, A s_n) + α_n^2 ||A s_n||^2 .  (5)

The parentheses denote the dot product, and ||x|| = √(x, x) denotes the norm of x in the corresponding Hilbert space. The optimal value of α_n is easily found from equation (5) to be

α_n = (r_{n-1}, A s_n) / ||A s_n||^2 .  (6)
Two important conclusions immediately follow from this fact. First, substituting the value of α_n from formula (6) into equation (4) and multiplying both sides of this equation by r_n, we can conclude that

(r_n, A s_n) = 0 ,  (7)

which means that the new residual is orthogonal to the corresponding step in the residual space. This situation is schematically shown in Figure 1. Second, substituting formula (6) into (5), we can conclude that the new residual decreases according to

||r_n||^2 = ||r_{n-1}||^2 − (r_{n-1}, A s_n)^2 / ||A s_n||^2  (8)

("Pythagoras's theorem"), unless r_{n-1} and A s_n are orthogonal. These two conclusions are the basic features of optimization by the method of steepest descent. They will help us define an improved search direction at each iteration.

Figure 1: Geometry of the residual in the data space (a scheme), showing the right triangle formed by r_{n-1}, r_n, and A s_n.

IN SEARCH OF THE DIRECTION

Let us suppose we have a generator that provides particular search directions at each step. The new direction can be the gradient of the objective function (as in the method of steepest descent), some other operator applied on the residual from the previous step, or, generally speaking, any arbitrary vector in the model space. Let us denote the automatically generated direction by c_n. According to formula (8), the residual decreases as a result of choosing this direction by

||r_{n-1}||^2 − ||r_n||^2 = (r_{n-1}, A c_n)^2 / ||A c_n||^2 .  (9)

How can we improve on this result?
First step of the improvement

Assuming n > 1, we can add some amount of the previous step s_{n-1} to the chosen direction c_n to produce a new search direction s_n^{(n-1)}, as follows:

s_n^{(n-1)} = c_n + β_n^{(n-1)} s_{n-1} ,  (10)

where β_n^{(n-1)} is an adjustable scalar coefficient. According to the fundamental orthogonality principle (7),

(r_{n-1}, A s_{n-1}) = 0 .  (11)

As follows from equation (11), the numerator on the right-hand side of equation (9) is not affected by the new choice of the search direction:

(r_{n-1}, A s_n^{(n-1)})^2 = [(r_{n-1}, A c_n) + β_n^{(n-1)} (r_{n-1}, A s_{n-1})]^2 = (r_{n-1}, A c_n)^2 .  (12)

However, we can use transformation (10) to decrease the denominator in (9), thus further decreasing the residual r_n. We achieve the minimization of the denominator

||A s_n^{(n-1)}||^2 = ||A c_n||^2 + 2 β_n^{(n-1)} (A c_n, A s_{n-1}) + (β_n^{(n-1)})^2 ||A s_{n-1}||^2  (13)

by choosing the coefficient β_n^{(n-1)} to be

β_n^{(n-1)} = − (A c_n, A s_{n-1}) / ||A s_{n-1}||^2 .  (14)

Note the analogy between (14) and (6). Analogously to (7), equation (14) is equivalent to the orthogonality condition

(A s_n^{(n-1)}, A s_{n-1}) = 0 .  (15)

Analogously to (8), applying formula (14) is also equivalent to defining the minimized denominator as

||A s_n^{(n-1)}||^2 = ||A c_n||^2 − (A c_n, A s_{n-1})^2 / ||A s_{n-1}||^2 .  (16)

Second step of the improvement

Now let us assume n > 2 and add some amount of the step from the (n-2)-th iteration to the search direction, determining the new direction s_n^{(n-2)}, as follows:

s_n^{(n-2)} = s_n^{(n-1)} + β_n^{(n-2)} s_{n-2} .  (17)

We can deduce that after the second change, the value of the numerator in equation (9) is still the same:

(r_{n-1}, A s_n^{(n-2)})^2 = [(r_{n-1}, A c_n) + β_n^{(n-2)} (r_{n-1}, A s_{n-2})]^2 = (r_{n-1}, A c_n)^2 .  (18)
Fomel 5 Cojugate directios This remarkable fact occurs as the result of trasformig the dot product (r 1, A s 2 ) with the help of equatio (4): (r 1, A s 2 ) = (r 2, A s 2 ) α 1 (A s 1, A s 2 ) = 0. (19) The first term i (19) is equal to zero accordig to formula (7); the secod term is equal to zero accordig to formula (15). Thus we have proved the ew orthogoality equatio (r 1, A s 2 ) = 0, (20) which i tur leads to the umerator ivariace (18). The value of the coefficiet i (17) is defied aalogously to (14) as β ( 2) β ( 2) = ( A s ( 1), A s 2 ) = (A c, A s 2 ), (21) A s 2 2 A s 2 2 where we have agai used equatio (15). If A s 2 is ot orthogoal to A c, the secod step of the improvemet leads to a further decrease of the deomiator i (8) ad, cosequetly, to a further decrease of the residual. Iductio Cotiuig by iductio the process of addig a liear combiatio of the previous steps to the arbitrarily chose directio c (kow i mathematics as the Gram- Schmidt orthogoalizatio process), we fially arrive at the complete defiitio of the ew step s, as follows: j= 1 s = s (1) = c + β (j) s j. (22) j=1 Here the coefficiets β (j) are defied by equatios which correspod to the orthogoality priciples ad β (j) = (A c, A s j ) A s j 2, (23) (A s, A s j ) = 0, 1 j 1 (24) (r, A s j ) = 0, 1 j. (25) It is these orthogoality properties that allowed us to optimize the search parameters oe at a time istead of solvig the -dimesioal system of optimizatio equatios for α ad β (j).
ALGORITHM

The results of the preceding sections define the method of conjugate directions to consist of the following algorithmic steps:

1. Choose an initial model m_0 and compute the residual r_0 = d − A m_0.

2. At the n-th iteration, choose the initial search direction c_n.

3. If n is greater than 1, optimize the search direction by adding a linear combination of the previous directions, according to equations (22) and (23), and compute the modified step direction s_n.

4. Find the step length α_n according to equation (6). The orthogonality principles (24) and (7) can simplify this equation to the form

α_n = (r_{n-1}, A c_n) / ||A s_n||^2 .  (26)

5. Update the model m_n and the residual r_n according to equations (2) and (4).

6. Repeat iterations until the residual decreases to the required accuracy or as long as it is practical.

At each of the subsequent steps, the residual is guaranteed not to increase, according to equation (8). Furthermore, optimizing the search direction guarantees that the convergence rate does not decrease in comparison with (9). The only assumption we have to make to arrive at this conclusion is that the operator A is linear. However, without additional assumptions, we cannot guarantee global convergence of the algorithm to the least-square solution of equation (1) in a finite number of steps.

WHAT ARE ADJOINTS FOR? THE METHOD OF CONJUGATE GRADIENTS

The adjoint operator A^T projects the data space back to the model space and is defined by the dot-product test

(d, A m) ≡ (A^T d, m)  (27)

for any m and d. The method of conjugate gradients is a particular case of the method of conjugate directions, where the initial search direction c_n is

c_n = A^T r_{n-1} .  (28)

This direction is often called the gradient, because it corresponds to the local gradient of the squared residual norm with respect to the current model m_{n-1}. Aligning the initial search direction along the gradient leads to the following remarkable simplifications in the method of conjugate directions.
Orthogonality of the gradients

The orthogonality principle (25) transforms according to the dot-product test (27) to the form

(r_{n-1}, A s_j) = (A^T r_{n-1}, s_j) = (c_n, s_j) = 0 ,  1 ≤ j ≤ n−1 .  (29)

Forming the dot product (c_n, c_j) and applying formula (22), we can see that

(c_n, c_j) = (c_n, s_j − Σ_{i=1}^{j-1} β_j^{(i)} s_i) = (c_n, s_j) − Σ_{i=1}^{j-1} β_j^{(i)} (c_n, s_i) = 0 ,  1 ≤ j ≤ n−1 .  (30)

Equation (30) proves the orthogonality of the gradient directions from different iterations. Since the gradients are orthogonal, after n iterations they form a basis in the n-dimensional space. In other words, if the model space has n dimensions, each vector in this space can be represented by a linear combination of the n gradient vectors formed by n iterations of the conjugate-gradient method. This is true as well for the vector m_0 − m, which points from the solution m of equation (1) to the initial model estimate m_0. Neglecting computational errors, it takes exactly n iterations to find this vector by successive optimization of the coefficients. This proves that the conjugate-gradient method converges to the exact solution in a finite number of steps (assuming that the model belongs to a finite-dimensional space).

The method of conjugate gradients simplifies formula (26) to the form

α_n = (r_{n-1}, A c_n) / ||A s_n||^2 = (A^T r_{n-1}, c_n) / ||A s_n||^2 = ||c_n||^2 / ||A s_n||^2 ,  (31)

which in turn leads to the simplification of formula (8), as follows:

||r_n||^2 = ||r_{n-1}||^2 − ||c_n||^4 / ||A s_n||^2 .  (32)

If the gradient is not equal to zero, the residual is guaranteed to decrease. If the gradient is equal to zero, we have already found the solution.

Short memory of the gradients

Substituting the gradient direction (28) into formula (23) and applying formulas (4) and (27), we can see that

β_n^{(j)} = (A c_n, r_j − r_{j-1}) / (α_j ||A s_j||^2) = (c_n, A^T r_j − A^T r_{j-1}) / (α_j ||A s_j||^2) = (c_n, c_{j+1} − c_j) / (α_j ||A s_j||^2) .  (33)
The orthogonality condition (30) and the definition of the coefficient α_j from equation (31) further transform this formula to the form

β_n^{(n-1)} = ||c_n||^2 / (α_{n-1} ||A s_{n-1}||^2) = ||c_n||^2 / ||c_{n-1}||^2 ,  (34)

β_n^{(j)} = 0 ,  1 ≤ j ≤ n−2 .  (35)

Equation (35) shows that the conjugate-gradient method needs to remember only the previous step direction in order to optimize the search at each iteration. This is another remarkable property distinguishing that method in the family of conjugate-direction methods.

PROGRAM

The program in Table 1 implements one iteration of the conjugate-direction method. It is based upon Jon Claerbout's cgstep() program (?) and uses an analogous naming convention. Vectors in the data space are denoted by double letters. In addition to the previous steps s_j and their conjugate counterparts A s_j (array s), the program stores the squared norms ||A s_j||^2 (variable beta) to avoid recomputation. For practical reasons, the number of remembered iterations can actually be smaller than the total number of iterations.

EXAMPLES

Example 1: Inverse interpolation

Matthias Schwab has suggested (in a personal communication) an interesting example, in which the cgstep program fails to comply with the conjugate-gradient theory. The inverse problem is a simple one-dimensional data interpolation with a known filter (?). The known portion of the data is a single spike in the middle. One hundred other data points are considered missing. The known filter is the Laplacian (1, −2, 1), and the expected result is a bell-shaped cubic spline. The forward problem is strictly linear, and the exact adjoint is easily computed by reverse convolution. However, the conjugate-gradient program requires significantly more than the theoretically predicted 100 iterations. Figure 2 displays the convergence to the final solution in three different plots. According to the figure, the actual number of iterations required for convergence is about 300. Figure 3 shows the result of a similar experiment with the conjugate-direction solver cdstep.
The number of required iterations is reduced to almost the theoretical one hundred. This indicates that the orthogonality of directions implied in the conjugate-gradient method has been distorted by computational errors. The additional cost of correcting these errors with the conjugate-direction solver comes from storing the preceding 100 directions in memory. A smaller number of memorized steps produces smaller improvements.
void sf_cdstep(bool forget     /* restart flag */,
               int nx          /* model size */,
               int ny          /* data size */,
               float* x        /* current model [nx] */,
               const float* g  /* gradient [nx] */,
               float* rr       /* data residual [ny] */,
               const float* gg /* conjugate gradient [ny] */)
/*< Step of conjugate-direction iteration.
  The data residual is rr = A x - dat
>*/
{
    float *s, *si, *ss;
    double alpha, beta;
    int i, n, ix, iy;

    s = sf_floatalloc(nx+ny);
    ss = s+nx;

    for (ix=0; ix < nx; ix++) { s[ix] = g[ix]; }
    for (iy=0; iy < ny; iy++) { ss[iy] = gg[iy]; }

    sf_llist_rewind(steps);
    n = sf_llist_depth(steps);

    for (i=0; i < n; i++) {
        sf_llist_down(steps, &si, &beta);
        alpha = - cblas_dsdot(ny, gg, 1, si+nx, 1) / beta;
        cblas_saxpy(nx+ny, alpha, si, 1, s, 1);
    }

    beta = cblas_dsdot(ny, s+nx, 1, s+nx, 1);
    if (beta < DBL_EPSILON) return;

    sf_llist_add(steps, s, beta);
    if (forget) sf_llist_chop(steps);

    alpha = - cblas_dsdot(ny, rr, 1, ss, 1) / beta;

    cblas_saxpy(nx, alpha, s, 1, x, 1);
    cblas_saxpy(ny, alpha, ss, 1, rr, 1);
}

Table 1: The source of this program is RSF/api/c/cdstep.c
Figure 2: Convergence of the missing data interpolation problem with the conjugate-gradient solver. Current models are plotted against the number of iterations. The three plots are different displays of the same data.

Figure 3: Convergence of the missing data interpolation problem with the long-memory conjugate-direction solver. Current models are plotted against the number of iterations. The three plots are different displays of the same data.
Example 2: Velocity transform

The next test example is the velocity transform inversion with a CMP gather from the Mobil AVO dataset (Nichols, 1994; Lumley et al., 1994; Lumley, 1994). I use Jon Claerbout's veltran program (Claerbout, 1995b) for anti-aliased velocity transform with rho-filter preconditioning and compare three different pairs of operators for inversion. The first pair is the CMP stacking operator with the migration weighting function w = (t_0/t) √t and its adjoint. The second pair is the pseudo-unitary velocity transform with the weighting proportional to √(s x), where x is the offset and s is the slowness. These two pairs were used in the velocity transform inversion with the iterative conjugate-gradient solver. The third pair uses the weight proportional to x for CMP stacking and s for the reverse operator. Since these two operators are not exact adjoints, it is appropriate to apply the method of conjugate directions for inversion. The convergence of the three different inversions is compared in Figure 4. We can see that the third method reduces the least-square residual error, though it has a smaller effect than that of the pseudo-unitary weighting in comparison with the uniform one. The results of inversion after 10 conjugate-gradient iterations are plotted in Figures 5 and 6, which are to be compared with the analogous results of Lumley (1994) and Nichols (1994).

Figure 4: Comparison of convergence of the iterative velocity transform inversion. The left plot compares conjugate-gradient inversion with unweighted (uniformly weighted) and pseudo-unitary operators. The right plot compares pseudo-unitary conjugate-gradient and weighted conjugate-direction inversion.
Figure 5: Input CMP gather (left) and its velocity transform counterpart (right) after 10 iterations of conjugate-direction inversion.

Figure 6: The modeled CMP gather (left) and the residual data (right) plotted at the same scale.
CONCLUSIONS

The conjugate-gradient solver is a powerful method of least-square inversion because of its remarkable algebraic properties. In practice, the theoretical basis of conjugate gradients can be distorted by computational errors. In some applications of inversion, we may want to do that on purpose, by applying inexact adjoints in preconditioning. In both cases, a safer alternative is the method of conjugate directions. Jon Claerbout's cgstep() program actually implements a short-memory version of the conjugate-direction method. Extending the length of the memory raises the cost of iterations, but can speed up the convergence.

REFERENCES

Claerbout, J., 1995a, Ellipsoids versus hyperboloids, in SEP-89: Stanford Exploration Project, 201-206.
Claerbout, J., 2003, Image estimation by example: Geophysical soundings image construction: Multidimensional autoregression: Stanford Exploration Project.
Claerbout, J. F., 1992, Earth Soundings Analysis: Processing Versus Inversion: Blackwell Scientific Publications.
Claerbout, J. F., 1995b, Basic Earth Imaging: Stanford Exploration Project.
Crawley, S., 1995, Approximate vs. exact adjoints in inversion, in SEP-89: Stanford Exploration Project, 207-216.
Fletcher, R., and C. M. Reeves, 1964, Function minimization by conjugate gradients: Computer Journal, 7, 149-154.
Gill, P. E., W. Murray, and M. H. Wright, 1995, Practical optimization: Academic Press.
Hestenes, M. R., and E. Stiefel, 1952, Methods of conjugate gradients for solving linear systems: J. Res. NBS, 49, 409-436.
Kleinman, R. E., and P. M. van den Berg, 1991, Iterative methods for solving integral equations: Radio Science, 26, 175-181.
Lumley, D., D. Nichols, and T. Rekdal, 1994, Amplitude-preserved multiple suppression, in SEP-82: Stanford Exploration Project, 25-45.
Lumley, D. E., 1994, Estimating a pseudounitary operator for velocity-stack inversion, in SEP-82: Stanford Exploration Project, 63-78.
Nichols, D., 1994, Velocity-stack inversion using Lp norms, in SEP-82: Stanford Exploration Project, 1-16.
Sevink, A. G. J., and G. C. Herman, 1994, Fast iterative solution of sparsely sampled seismic inverse problems: Inverse Problems, 10, 937-948.