CHAPTER 3 UNCONSTRAINED OPTIMIZATION

Size: px
Start display at page:

Download "CHAPTER 3 UNCONSTRAINED OPTIMIZATION"

Transcription

1 . Prelmnares CHAPER 3 UNCONSRAINED OPIMIZAION.. Introducton In ths chapter we wll examne some theory for the optmzaton of unconstraned functons. We wll assume all functons are contnuous and dfferentable. Although most engneerng problems are constraned, much of constraned optmzaton theory s bult upon the concepts and theory presented n ths chapter... Notaton We wll use lower case talcs, e.g., x, to represent a scalar quantty. Vectors wll be represented by lower case bold, e.g., x, and matrces by upper case bold, e.g., H. he set of n desgn varables wll be represented by the n-dmensonal vector x. For example, prevously we consdered the desgn varables for the wo-bar truss to be represented by scalars such as dameter, d, thcness, t, heght, h; now we consder dameter to be the frst element, x, of the vector x, thcness to be the second element, x, and so forth. hus for any problem the set of desgn varables s gven by x. Elements of a vector are denoted by subscrpts. Values of a vector at specfc ponts are denoted by superscrpts. ypcally x wll be the startng vector of values for the desgn varables. We wll then move to x, x, untl we reach the optmum, whch wll be x*. A summary of notaton used n ths chapter s gven n able. able Notaton A Matrx A x, x, x * I Identty matrx a Column vector ss, a,,... Columns of A,, * e,,... Coordnate vectors (columns of I), A a transpose f ( x ), f Hx ( ), H f ( x), f( x ), f Gradent of f ( x ), A gradent evaluated at x Vector of desgn varables, vector at teraton, vector at the optmum x, x... x n Elements of vector x Search drecton, search drecton at teraton Step length, step length at teraton, step length at mnmum along search drecton f ( x), f( x ), f Objectve functon, objectve evaluated at x Hessan matrx at Determnant of A x

2 x x x Dfference n x vectors n R γ f f Dfference n gradents at, x x x All vectors whch are n n- dmensonal Eucldean space N Drecton matrx at x.3. Statement of Problem he problem we are tryng to solve n ths chapter can be stated as, Fnd x, x R o Mnmze f ( x ) n.4. Gradent Vector.4.. Defnton he gradent of f ( x) s denoted f ( x ). he gradent s defned as a column vector of the frst partal dervatves of f(x): x f x f x f x n f ( ).4.. Example: Gradent of a Functon Evaluate the gradent of the functon f ( x ) 6x x x 3x x x 4x3x f 3x x If evaluated at x, 4 f A very mportant property of the gradent vector s that t s orthogonal to the functon contours and ponts n the drecton of greatest ncrease of a functon. he negatve gradent ponts n the drecton of greatest decrease. Any vector v whch s orthogonal to f ( x) wll satsfy v f ( x ).

3 .5. Vectors hat Pont "Downhll" or "Uphll" If we have some search drecton s, then s f s proportonal to the projecton of s onto the gradent vector. We can see ths better n Fg. 3.: VALLEY down up down tangent lne up f As long as s Fg. 3.. Vectors that pont uphll or downhll. f, then s ponts, at least for some small dstance, n a drecton that ncreases the functon (ponts uphll). In le manner, f s f, then s ponts downhll. As an example, suppose at the current pont n space the gradent vector s f ( x ) [6,, ]. We propose to move from ths pont n a search drecton s [,,]. 6 =-,-, 7 - Does ths drecton go downhll? We evaluate s f So ths drecton would tae us downhll, at least for a short step..6. Hessan Matrx.6.. Defnton he Hessan Matrx, H(x) or dervatves: f ( x ) s defned to be the square matrx of second partal f f f x x xn x x n f f f Hx ( ) f ( x) xx x xxn f f f xnx xnx x n (3.) 3

4 We can also obtan the Hessan by applyng the gradent operator on the gradent transpose, f f f x x x xn x x n f f f f f f ( ) f( ) ( f( ) ) x Hx x x,,..., xx x xxn x x x n f f f x n xnx xnx x n he Hessan s a symmetrc matrx. he Hessan matrx gves us nformaton about the curvature of a functon, and tells us how the gradent s changng. For smplcty, we wll sometmes wrte H nstead of ( ) Hx..6.. Example: Hessan Matrx Fnd the Hessan matrx for the functon, f ( x) 6x x x 3x x x 4x3x f 3x x, and the Hessan s: H x f 4 x f x x 3 f 3 xx f x.7. Postve and Negatve Defnteness.7.. Defntons If for any vector, x, the followng s true for a symmetrc matrx B, xbx xbx then then B s postve defnte B s negatve defnte (3.3).7.. Checng Postve Defnteness he above defnton s not very useful n terms of checng f a matrx s postve defnte, because t would requre that we examne every possble vector x to see f the condton gven n (3.3) s true. So, how can we tell f a matrx s postve defnte? here are three ways we wll menton, 4

5 . A symmetrc matrx B s postve defnte f all egenvalues of B are postve.. A symmetrc matrx s postve defnte f and only f the determnant of each of ts prncpal mnor matrces s postve. 3. A n n matrx B s symmetrc and postve defnte f and only f t can be wrtten as B = LL where L s a lower trangular matrx wth postve dagonal elements. he L matrx can be developed through Choles decomposton. he matrx we wll be most nterested n checng s the Hessan matrx, Hx What does t mean for the Hessan to be postve or negatve defnte? If postve defnte, t means curvature of the functon s everywhere postve. hs wll be an mportant condton for checng f we have a mnmum. If negatve defnte, curvature s everywhere negatve. hs wll be a condton for verfyng we have a maxmum Example: Checng f a Matrx s Postve Defnte Usng Prncpal Mnor Matrces Is the matrx gven below postve defnte? We need to chec the determnants of the prncpal mnor matrces, found by tang the determnant of a x matrx along the dagonal, the determnant of a x matrx along the dagonal, and fnally the determnant of the entre matrx. If any one of these determnants s not postve, the matrx s not postve defnte he determnants of the frst two prncpal mnors are postve. However, because the determnant of the matrx as a whole s negatve, ths matrx s not postve defnte. We also note that the egenvalues are -.5, 4.6, 8.9. hat these are not all postve also ndcates the matrx s not postve defnte Checng Negatve Defnteness How can we chec to see f a matrx s negatve defnte? here are two ways we wll menton,. A symmetrc matrx B s negatve defnte f all egenvalues of B are negatve. 5

6 . A symmetrc matrx s negatve defnte f we reverse the sgn of each element and the resultng matrx s postve defnte. Note: A symmetrc matrx s not negatve defnte f the determnant of each of ts prncpal mnor matrces s negatve. Rather, n the negatve defnte case, the sgns of the determnants alternate mnus and plus, so the easest way to chec for negatve defnteness usng prncpal mnor matrces s to reverse all sgns and see f the resultng matrx s postve defnte. It s also possble for a matrx to be postve sem-defnte, or negatve sem-defnte. hs occurs when one or more of the determnants or egenvalues are equal to zero, and the others are all postve (or negatve, as the case may be). hese are specal cases we won t worry about here. If a matrx s nether postve defnte nor negatve defnte (nor sem-defnte) then t s ndefnte. If usng prncpal mnor matrces, note that we need to chec both cases before we reach a concluson that a matrx s ndefnte Example: Checng f a Matrx s Negatve Defnte Usng Prncpal Mnor Matrces Is the matrx gven above negatve defnte? We reverse the sgns and see f the resultng matrx s postve defnte: Because the frst determnant s negatve there s no reason to go further. We also note that the egenvalues of the reversed sgn matrx are not all postve. Because ths matrx s nether postve nor negatve defnte, t s ndefnte..8. aylor Expanson.8.. Defnton he aylor expanson s an approxmaton to a functon at a pont x and can be wrtten n vector notaton as: f x f x f x xx xx f( x ) xx (3.4) If we note that x x can be wrtten as x, and usng notaton f x f, we can wrte (3.4) more compactly as, 6

7 f f f x x f x (3.5) he aylor expanson allows us to approxmate any contnuous functon as a polynomal n terms of ts dervatves at the pont x. We can mae a lnear approxmaton by tang the frst two terms of the expanson. We can mae a quadratc approxmaton by tang the frst three terms of the expanson..8.. Example: Quadratc Approxmaton of a ranscendental Functon Suppose f ( x ) x / 3ln x at x [5, 4] f x / x 3, f.447,.75 f x x ( 3/) f xx f x x = f 3 x x H x ( 3/) x at x x x 5 f x 8.63 [.447,.75] [ x5, x 4] x x 4 If we wsh, we can stop here wth the equaton n vector form. o see the equaton n scalar form we can carry out the vector multplcatons and combne smlar terms: x f x.35.75x 3. x 5.45x.5,.88x.75 x 4 f x x.75x.45x.45x.5.88x.54x 3.8 f x.3.67x.5x.3x.94x Evaluatng and comparng ths approxmaton to the orgnal: 7

8 x Quadratc Actual Error [5,4] [5,5] [6,4] [7,6] [,] [9,] We notce that the further the pont gets from the expanson pont, the greater the error that s ntroduced. We also see that at the pont of expanson the approxmaton s exact.. Propertes and Characterstcs of Quadratc Functons A lot of optmzaton theory s based on optmzng quadratc functons. It s therefore helpful to nvestgate some of the propertes of these functons... Representaton We can represent a quadratc functon three ways as a scalar equaton, a general vector equaton, and as a aylor expanson. Although these representatons loo dfferent, they gve exactly the same results. For example, consder the equaton, f ( x) 6x x x 3x x x (3.6) hs s a scalar representaton of a quadratc. As a another representaton, we can wrte a quadratc n general vector form, f( x) ab x x Cx (3.7) By nspecton, the example gven n (3.6), n the form of (3.7), s: 4 3 f ( x) 6 [, ] x x 3 x (3.8) where, x x x We also observe that C n (3.7) ends up beng H. 8

9 A thrd form s a aylor representaton, f x f f x x Hx (3.9) 4x3x 4 3 We note for (3.6), f 3xx and H 3 4 We wll assume a pont of expanson, x, f. (It may not be apparent, but f we are approxmatng a quadratc, t doesn t matter what pont of expanson we assume. he aylor expanson wll be exact.) he example n (3.6), as a aylor representaton, becomes, where, 4 3 f ( x) [ 4, ] x x 3 x (3.) x x x hese three representatons are equvalent. If we pc the pont x.,. representatons gve f 8 at ths pont, as you can verfy by substtuton... Characterstcs of Quadratc Functons It s useful to note the followng characterstcs of quadratc equatons:, all three he equatons for the gradent vector of a quadratc functon are lnear. hs maes t easy to solve for where the gradent s equal to zero. he Hessan for a quadratc functon s a matrx of constants (so we wll wrte as H or f nstead of H(x) or f ( x )). hus the curvature of a quadratc s everywhere the same. Excludng the cases where we have a sem-defnte Hessan, quadratc functons have only one statonary pont,.e. only one pont where the gradent s zero. Gven the gradent and Hessan at some pont x, the gradent at some other pont, x, s gven by, f f H x x (3.) hs expresson s developed n Secton 9. of the Appendx by dfferentatng a aylor expanson n vector form. 9

10 Gven the gradent some pont x, Hessan, H, and a search drecton, s, the optmal step length, *, n the drecton s s gven by, * f shs s (3.) hs expresson s derved n Secton 9. of the Appendx. he best methods of optmzaton are methods of conjugate drectons. A method of conjugate drectons wll solve for the optmum of a quadratc functon of n varables n n steps, provdng mnmzng steps are taen n each search drecton. We wll learn more about these methods n sectons whch follow..3. Examples We start wth the example, 4 4 f x x x x x x x (3.3) Snce ths s a quadratc, we now t only has one statonary pont. We note that the Hessan, 4 H 4 s ndefnte (egenvalues are -. and 6.). hs means we should have a saddle pont. he contour plots n Fg 3. and Fg. 3.3 confrm ths. Fg. 3. Contour plot of Eq (3.3).

11 Fg D contour plot of (3.3). We wll do a second example. Suppose we have the functon, 4 f x x x x x x x (3.4) 8xx f x 4x and 8 H 4 By nspecton, we see that the determnants of the prncpal mnor matrces are all postve. hus ths functon should have a mn and loo le a bowl. he contour plots follow. Fg Contour plot for (3.4)

12 Fg D contour plot for (3.4) 3. Necessary and Suffcent Condtons for an Unconstraned Optmum Wth some prelmnares out of the way, we are now ready to begn dscussng the theory of unconstraned optmzaton of dfferentable functons. We start wth the mathematcal condtons whch must hold at an unconstraned, local optmum. 3.. Defntons 3... Necessary Condtons for an Unconstraned Optmum he necessary condtons for an unconstraned optmum at x * are, f( x*) and f x be dfferentable at x * (3.5) hese condtons are necessary but not suffcent, nasmuch as f ( x) can apply at a max, mn or a saddle pont. However, f at a pont f ( x ), then that pont cannot be an optmum Suffcent Condtons for a Mnmum he suffcent condtons nclude the necessary condtons but add other condtons such that when satsfed we now we have an optmum. For a mnmum, x f x x, plus f * f( *), dfferentable at * Suffcent Condtons for a Maxmum For a maxmum, x s postve defnte. (3.6)

13 x f x x, plus f * f( *), dfferentable at * x s negatve defnte. (3.7) 3.. Examples: Applyng the Necessary, Suffcent Condtons Apply the necessary and suffcent condtons to fnd the optmum for the quadratc functon, x 4 f x x x x Snce ths s a quadratc functon, the partal dervatves wll be lnear equatons. We can solve these equatons drectly for a pont that satsfes the necessary condtons. he gradent vector s, x f x x x 8 x f ( ) f x x When we solve these two equatons, we have a soluton, x, x --ths a pont where the gradent s equal to zero. hs represents a mnmum, a maxmum, or a saddle pont. At ths pont, the Hessan s, H 8 Snce ths Hessan s postve defnte (egenvalues are.4, 8.6), ths must be a mnmum. As a second example, apply the necessary and suffcent condtons to fnd the optmum for the quadratc functon, x 4 4 f x x x x x x As n example, we wll solve the gradent equatons drectly for a pont that satsfes the necessary condtons. he gradent vector s, f x x 4x 4 f 4xx x When we solve these two equatons, we have a soluton, x.333, x.667. he Hessan s, 3

14 4 H 4 he egenvalues are -, 6. he Hessan s ndefnte. hs means ths s nether a max nor a mn t s a saddle pont. Comments: As mentoned, the equatons for the gradent for a quadratc functon are lnear, so they are easy to solve. Obvously we don t usually have a quadratc objectve, so the equatons are usually not lnear. Often we wll use the necessary condtons to chec a pont to see f we are at an optmum. Some algorthms, however, solve for an optmum by solvng drectly where the gradent s equal to zero. Sequental Quadratc Programmng (SQP) s ths type of algorthm. Other algorthms search for the optmum by tang downhll steps and contnung untl they can go no further. he GRG (Generalzed Reduced Gradent) algorthm s an example of ths type of algorthm. In the next secton we wll study one of the smplest unconstraned algorthms that steps downhll: steepest descent. 4. Steepest Descent wth a Quadratc Lne Search 4.. Descrpton One of the smplest unconstraned optmzaton methods s steepest descent. Gven an ntal startng pont, the algorthm moves downhll untl t can go no further. he search can be broen down nto stages. For any algorthm, at each stage (or teraton) we must determne two thngs:. What should the search drecton be?. How far should we go n that drecton? Answer to queston : For the method of steepest descent, the search drecton s Answer to queston : A lne search s performed. "Lne" n ths case means we search along a drecton vector. he lne search strategy presented here, bracetng the functon wth quadratc ft, s one of many that have been proposed, and s one of the most common. General Approach for each step: Gven some startng pont, x, we wsh to determne, f x x s (3.8) x where s s the search drecton vector, usually normalzed, and s the step length, a scalar. 4

15 We wll step n drecton s wth ncreasng values of untl the functon starts to get worse. hen we wll curve ft the data wth a parabola, and step to the mnmum of the parabola. 4.. Example: Steepest Descent wth Lne Search Mn f x x x x x f startng at x f 4 s normalzed s.86 x.86 We wll fnd *, whch s the optmal step length, by tral and error. Guess * =.4 for step number : Lne Search Step.4 x x s f x x We see that the functon has decreased; we decde to double the step length and contnue doublng untl the functon begns to ncrease: Lne Search Step x x s f x x x x he objectve functon has started to ncrease; therefore we have gone too far. We wll cut the change n the last step by half: 5.4 x

16 A graph of our progress s shown n Fg. 3.6: 4. x f x x x 4x.. Start pont (- f 4 5 th pont (-.8,-.6) x -AXIS Fg. 3.6 Progress n the lne search shown on a contour plot. If we plot the objectve value as a functon of step length as shown n Fg 3.7: Lne Search Example Objectve Value y x 4.897x R. 3 * 3 4 Step Length 5 4 Fg. 3.7 he objectve value vs. step length for the lne search. 6

17 We see that the data plot up to be a parabola. We would le to estmate the mnmum of ths curve. We wll curve ft ponts, 5, 3. hese ponts are equally spaced and bracet the mnmum. 3 5 Renumberng these ponts as,, 3 the mnmum of the parabola s gven by f f * f f f * *.97 9 where f ( x ) 3. When we step bac, after the functon has become worse, we have four ponts to choose from (ponts, 3, 5, 4). How do we now whch three to pc to mae sure we don t lose the bracet on the mnmum? he rule s ths: tae the pont wth the lowest functon value (pont 3) and the two ponts to ether sde (ponts and 5). In summary, the lne search conssts of steppng along the search drecton untl the mnmum of the functon n ths drecton s braceted, fttng three ponts whch bracet the mnmum wth a parabola, and calculatng the mnmum of the parabola. If necessary the parabolc ft can be carred out several tmes untl the change n the mnmum s very small (although the are then no longer equally spaced, so the followng formula must be used): f f f * f 3 f 3 f 3 Each sequence of obtanng the gradent and movng along the negatve gradent drecton untl a mnmum s found (.e. executng a lne search) s called an teraton. he algorthm conssts of executng teratons untl the norm of the gradent drops below a specfed tolerance, ndcatng the necessary condtons have been met. df As shown n Fg. 3.7, at *,. he process of determnng * d wll be referred to as tang a mnmzng step, or, executng an exact lne search Pros and Cons of Steepest Descent Steepest descent has several advantages. It usually maes good progress when far from the optmum (n the above example the objectve decreased from 9 to 3 n the frst teraton), and t s very smple to mplement. It always goes downhll. It s also guaranteed to converge to a local optmum f enough steps are taen. 7

18 However, f the functon to be mnmzed s eccentrc, convergence of steepest descent can be very slow, as ndcated by the followng theorem from Luenberger. HEOREM. Convergence of Steepest Descent. For a quadratc functon, f we tae enough steps, the method of steepest descent converges to the unque mnmum pont x * of f. If we defne the error n the objectve functon at the current value of x as, E( x) xx* Hxx * (3.) there holds at every step, A a Ex Ex A a where A Largest egenvalue of H a Smallest egenvalue of H (3.) hus f A=5 and a=, we have that the error at the + step s only guaranteed to be less than the error at the step by, 49 E E 5 and thus the error may be reduced very slowly. Roughly speang, the above theorem says that the convergence rate of steepest descent s slowed as the contours of f become more eccentrc. If a A, correspondng to crcular contours, convergence occurs n a sngle step. Note, however, that even f n of the n egenvalues are equal and the remanng one s a great dstance from these, convergence wll be slow, and hence a sngle abnormal egenvalue can destroy the effectveness of steepest descent. he above theorem s based on a quadratc functon. If we have a quadratc, and we do rotaton and translaton of the axes, we can elmnate all of the lnear and cross product terms. We then have only the pure second order terms left. he egenvalues of the resultng Hessan are equal to twce the coeffcents of the pure second order terms. hus the functon, f x x would have equal egenvalues of (, ) and would represent the crcular contours as mentoned above, shown n Fg Steepest descent would converge n one step. Conversely the functon, Luenberger and Ye, Lnear and Nonlnear Programmng, hrd Edton, 8 8

19 f 5x x has egenvalues of (, ). he contours would be hghly eccentrc and convergence of steepest descent would be very slow. A contour plot of ths functon s gven n Fg 3.9, Fg Contours of the functon, f x x. Fg Contours of the functon, been stretched out. f 5x x. Notce how the contours have 9

20 5. he Drectonal Dervatve It s sometmes useful to calculate df along some search drecton s. From the chan rule d for dfferentaton, df f dx = d x d Notng that x x s, or, for a sngle element of vector x, x x s, we have dx s d, so df f dx f s f d x d x s (3.3) As an example, we wll fnd the drectonal dervatve, df, for the problem gven n Secton d df.5 4. above, at =. From (3.3): f s d.86 hs gves us the change n the functon for a small step n the search drecton,.e., df f (3.4) d If., the predcted change s.64. he actual change n the functon s.599. Equaton (3.3) s the same equaton for checng f a drecton goes downhll, gven n Secton.4. Before we just looed at the sgn; f negatve we new we were gong downhll. Now we see that the value has meanng as well: t represents the expected change n the df functon for a small step. If, for example, the value of s less than some epslon, we d could termnate the lne search, because the predcted change n the objectve functon s below a mnmum threshold. Another mportant value of df occurs at * d. If we locate the mnmum exactly, then df d * f s (3.5) As we have seen n examples, when we tae a mnmzng step we stop where the search drecton s tangent to the contours of the functon. hus the gradent at ths new pont s orthogonal to the prevous search drecton.

21 6. Newton s Method 6.. Dervaton Another classcal method we wll brefly study s called Newton's method. It smply maes a quadratc approxmaton to a functon at the current pont and solves for where the necessary condtons (to the approxmaton) are satsfed. Startng wth a aylor seres: f f f x x H x (3.6) Snce the gradent and Hessan are evaluated at, they are just a vector and matrx of constants. ang the gradent (Secton 9.), f f H x and settng f, we have, Solvng for x : H x f f x H (3.7) Note that we have solved for a vector,.e. x, whch has both a step length and drecton. 6.. Example: Newton's Method 3 We wsh to optmze the functon, f x x xx 4x from the pont x. 8 At ths pont f 4 and the Hessan s, H 8. he Hessan nverse s gven by: hus x So, 3 3 x x x

22 4. x f x x x 4x.. 3 x x -AXIS Fg. 3.. he operaton of Newton s method Pros and Cons of Newton's Method We can see n the above example Newton s method solved the problem n one step. hs s true n general: Newton s method wll drve to the statonary pont of a quadratc n one step. On a non-quadratc, f we are near an optmum, Newton s method wll drve very close to the optmum n one step. However we should note some drawbacs. Frst, t requres second dervatves. Normally we compute dervatves numercally, and computng second dervatves s computatonally expensve, on the order of n functon evaluatons, where n s the number of desgn varables. he dervaton of Newton s method solved for where the gradent s equal to zero. he gradent s equal to zero at a mn, a max or a saddle, and nothng n the method dfferentates between these. hus Newton s method can dverge, or fal to go downhll (ndeed, not only not go downhll, but go to a maxmum!). hs s obvously a serous drawbac. 7. Quas-Newton Methods 7.. Introducton Let s summarze the pros and cons of Newton's method and Steepest Descent: Pros Cons Always goes downhll Slow on eccentrc functons Steepest Always converges Descent Smple to mplement Newton s Method Solves quadratc n one step. Very fast when close to optmum on non quadratc. Requres second dervatves, Can dverge

23 We want to develop a method that starts out le steepest descent and gradually becomes Newton's method, doesn't need second dervatves, doesn't have trouble wth eccentrc functons and doesn't dverge! Fortunately such methods exst. hey combne the good aspects of steepest descent and Newton's method wthout the drawbacs. hese methods are called quas-newton methods or sometmes varable metrc methods. In general we wll defne our search drecton by the expresson snf ( x ) (3.8) where N wll be called the drecton matrx. If N I, then sf( x ) Steepest Descent If - N H, then s H f x Newton's Method If N s always postve defnte, then s always ponts downhll. o show ths, our crteron for movng downhll s: s f Or, f s (3.9) Substtutng (3.8) nto (3.9): f N f (3.3) Snce N s postve defnte, we now that any vector whch pre-multples N and postmultples N wll result n a postve scalar. hus the quantty wthn the parentheses s always postve; wth the negatve sgn t becomes always negatve, and therefore always goes downhll. 7.. A Ran One Hessan Inverse Update 7... Development In ths secton we wll develop one of the smplest updates, called a ran one update because the correcton to the drecton matrx, N, s a ran one matrx (.e., t only has one ndependent row or column). We frst start wth some prelmnares. Startng wth a aylor seres: f f f x x H x (3.3) 3

24 where x x x the gradent s gven by, and defnng: we have, f f Hx (3.3) f f γ (3.33) or - γ Hx H γ x (3.34) Equaton (3.34) s very mportant: t shows that for a quadratc functon, the nverse of the Hessan matrx ( H ) maps dfferences n the gradents to dfferences n x. he relatonshp expressed by (3.34) s called the Newton condton. We wll mae the drecton matrx satsfy ths relatonshp. However, snce we can only calculate γ and x after the lne search, we wll mae N γ x (3.35) hs expresson s sometmes called the quas-newton condton. It s quas n that t nvolves + for N nstead of. Equaton (3.35) nvolves more unnowns (the elements of N ) than equatons, so how do we solve for N? One of the smplest possbltes s: N N uu (3.36) a Where we wll update the drecton matrx wth a correcton whch s of the form whch s a ran one symmetrc matrx. If we substtute (3.36) nto (3.35), we have, auu, or a N γ uu γ x (3.37) auu γ x N γ (3.38) scalar Notng that u γ s a scalar, then u must be proportonal to x N γ. Snce any change n length can be absorbed by a, we wll set Substtutng (3.39) nto (3.38): u x N γ (3.39) 4

25 For ths to be true, so a x N γ x N γ γ x N γ a x N γ γ a x N γ γ scalar (3.4) (3.4) Substtutng (3.4) and (3.39) nto (3.36) gves the expresson we need: x N γ x N γ x N γ γ N N (3.4) Equaton (3.4) allows us to get a new drecton matrx n terms of the prevous matrx and the dfference n x and the gradent. We then use ths to get a new search drecton accordng to (3.8) Example: Ran One Hessan Inverse Update We wsh to mnmze the functon f x x x x 4x 3 8 startng from x f 4 We let N so the search drecton s s N f f.496 We normalze the search drecton to be: s.868 We execute a lne search n ths drecton (usng, for example, a quadratc ft) and stop at x f hen x x x γ f f

26 and x N γ auu x N γ x N γ x N γ γ New search drecton: N N auu.8.54 N N s When we step n ths drecton, usng agan a lne search, we arrve at the optmum x f x At ths pont we are done. However, f we update the drecton matrx one more tme, we fnd t has become the nverse Hessan. 6

27 .3.3 x x x γ f f x N γ auu x N γ x N γ x N γ γ N N a uu H he Heredtary Property he heredtary property s an mportant property of all update methods. he heredtary property states that not only wll N satsfy (3.35), but N γ x N γ x N γ x N γ x n n (3.43) where n s the number of varables. hat s, (3.35) s not only satsfed for the current step, but for the last n- steps. Why s ths sgnfcant? Let's wrte ths relatonshp of (3.43) as follows: n n N γ, γ, γ... γ x, x, x,, x Let the matrx defned by the columns of be denoted by G, and the matrx defned by columns of x be denoted by X. hen, 7

28 N G X n If γ γ are ndependent, and f we have n vectors,.e. G s a square matrx, then the nverse for G exsts and s unque and s unquely defned. N XG (3.44) Snce the Hessan nverse satsfes (3.44) for a quadratc functon, then we have the mportant result that, after n updates the drecton matrx becomes the Hessan nverse for a quadratc functon. hs mples the quas-newton method wll solve a quadratc n no more than n+ steps. he proof that our ran one update has the heredtary property s gven n the next secton Proof of the Heredtary Property for the Ran One Update HEOREM. Let H be a constant symmetrc matrx and suppose that x, x,, x and γ, γ,, γ are gven vectors, where γ Hx,,,,,, where n. Startng wth any ntal symmetrc matrx N, let then N N x N γ x N γ x N γ γ (3.45) N γ x for (3.46) PROOF. he proof s by nducton. We wll show that f (3.46) holds for prevous drecton matrx, t holds for the current drecton matrx. We now that at the current pont,, the followng s true, N γ x (3.47) because we enforced ths condton when we developed the update. Now, suppose t s true that, N γ x for - (3.48).e. that the heredtary property holds for the prevous drecton matrx. We can post multply (3.45) by γ, gvng, x N γ γ x N γ x N γ γ N γ N γ (3.49) 8

29 o smplfy thngs, let y x N γ x N γ γ so that we can wrte (3.49) as, N γ N γ y x N γ γ (3.5) We can dstrbute the transpose on the last term, and dstrbute the post multplcaton γ to gve (Note: Recall that when you tae the transpose nsde a product, the order of the product s reversed; also because N s symmetrc, N γ γ γ N γ ), N N thus: N γ N γ y x γ γ N γ (3.5) Snce we have assumed (3.48) s true, we can replace N γ wth x : N γ x y x γ γ x (3.5) Now we examne the term n bracets. We note that, γ x H x x x H x x γ (3.53) So the term n bracets n (3.5) vanshes, gvng, N γ x for (3.54) hus, f the heredtary property holds for the prevous drecton matrx, t holds for the current drecton matrx. When, condton (3.47) s all that s needed to have the heredtary property for the frst update, N. he second update, N, wll then have the heredtary property snce N does, and so on Conjugacy Defnton Quas-Newton methods are also methods of conjugate drectons. A set of search drectons, s, s,..., s are sad to be conjugate wth respect to a square, symmetrc matrx, H, f, s Hs for all (3.55) 9

30 A set of conjugate drectons possesses an mportant property: If mnmzng lne searches are used along each conjugate drecton, a method of conjugate drectons s guaranteed to mnmze a quadratc functon of n varables n at most n steps. Hmmelblau ndcates the excellent convergence propertes of quas-newton methods on general functons may be due more to ther conjugate drecton propertes than to ther ablty to approxmate the Hessan nverse. Because of the mportance of conjugate drectons, we wll prove two results here. PROPOSIION. If H s postve defnte and the set of non-zero vectors conjugate to H, then these vectors are lnearly ndependent. n s, s,..., s are PROOF. Suppose we have constants,,,,..., n such that s s... s... n s n (3.56) Now we multply each term by s H: n n s Hs s Hs s Hs s Hs postve (3.57) From conjugacy, all of the terms except s Hs are zero. Snce H s postve defnte, then the only way for ths remanng term to be zero s for to be zero. In ths way we can show that for (3.57) to be satsfed all the coeffcents must be zero. hs s the defnton of lnear ndependence Conjugate Drecton heorem We wll now show that a method of conjugate drectons wll solve a quadratc functon n n steps, f mnmzng steps are taen. HEOREM. Let n s, s,..., s be a set of non-zero H conjugate vectors, wth H a postve defnte matrx. For the functon, f f f x x x x Hx x (3.58) the sequence, x x s (3.59) wth, f s s Hs f f H x x (3.6) Hmmelblau, Appled Nonlnear Programmng, p.. 3

31 converges to the unque soluton, * Hx ( x ) f, after n steps, that s n * x x. PROOF. Based on (3.59) above we note that, Lewse for Or, n general x x s x : x x s x s s x x s s... s (3.6) After n steps, we can wrte the optmum (assumng the drectons are ndependent, whch we just showed) as, * n n x x s s... s... s (3.6) Multplyng both sdes of (3.6) by s H, we have, * n n s H x x s Hs s Hs s Hs s Hs Solvng for : postve * ( ) s H x x s Hs (3.63) Unfortunately (3.63) s n terms of x*, whch we presumably don t now. However, f we multply (3.6) by s H, we have,... s H x x s Hs s Hs s Hs (3.64) whch gves, s H( x x ) (3.65) Substtutng ths result nto (3.63), we have 3

32 * ( ) s H x x s Hs (3.66) Notng that * Hx ( x) f s the soluton to (3.58), we can solve for the as, f s s Hs whch s dentcal wth (3.6). We notce that (3.6) s the same as the mnmzng step we derved n Secton 9.. hus the conjugate drecton theorem reles on tang mnmzng steps Examples We stated earler that quas-newton methods are also methods of conjugate drectons. hus for the example gven n Secton 7.3, we should have, s Hs Substtutng the search drectons and Hessan of that problem, Wthn the round-off of the data, we see ths s verfed. In the prevous problem we only had two search drectons. Let s loo at a problem where we have three search drectons so we have more conjugate relatonshps to examne. We wll consder the problem, Mn f x x 4x 4x 8x x. 3 3 Startng from 6.7 x 3 f 8.8 s We execute a lne search n the drecton of steepest descent (normalzed as s above), stop at * and determne the new pont and gradent. We calculate the new search drecton usng our ran update, 3

33 x.3 f s We go through ths cycle agan, x.473 f.8.56 s After steppng n the above drecton we arrve at the optmum, x...5 f Snce we have used a method of conjugate drectons, We wll chec ths:.7 s Hs s Hs s should be conjugate to s and s Some Insght nto Conjugacy As we dd n secton 4.3, we wll defne the error n the objectve at the current value of x as, E( x) xx* Hxx * We can rewrte ths expresson as, E( α) αα* SHSα α * (3.67) Where S s a matrx wth columns, reduces to, n s, s,..., s. If the s vectors are conjugate then (3.67) 33

34 n E( α ) ( *) d () where d s Hs. ( ) E α can then be mnmzed by choosng *,.e., by mang exact lne searches. Quotng Fletcher, 3 hus conjugacy mples a dagonalzng transformaton SHSof H to a new coordnate system, α, n whch the varables are decoupled. A conjugate drecton method s then the alternatng varables method appled n ths new coordnate system. he alternatng varables method referred to s just a method where the optmum s found wth respect to one varable, holdng the rest constant, and then a second varable, etc. Usually such a scheme would not wor well. Conjugate drectons are such that the s are decoupled so t does wor here. As we show n Secton 9.3, another result of conjugacy s that at the + step, f forall s (3.68) Equaton (3.68) ndcates ) that the current gradent s orthogonal to all the past search drectons, and ) at the current pont we have zero slope wth respect to all past search drectons,.e., f forall meanng we have mnmzed the functon n the subspace of the prevous drectons. As an example, for the three varable functon of Secton 7.5, f should be orthogonal to s and s : f f 7.4. Ran Updates s s he DFP Method Although the ran one update does have the heredtary property (and s a method of conjugate drectons), t does not guarantee that at each stage the drecton matrx, N, s postve defnte. It s mportant that the update reman postve defnte because ths nsures the search 3 R. Fletcher, Practcal Methods of Optmzaton, Second Edton, 987, pg

35 drecton wll always go downhll. It has been shown that (3.4) s the only ran one update whch satsfes the quas-newton condton. For more flexblty, ran updates have been proposed. hese are of the form, N N auu bvv (3.69) If we substtute ths nto the quas-newton condton, we have, N γ x (3.7) N γ auu γ bvv γ x (3.7) here are a number of possble choces for u and v. One choce s to try, ux, vn γ (3.7) Substtutng (3.7) nto (3.7), b N γ ax x γ N γ N γ γ x scalar scalar (3.73) In (3.73) we note that the dot products result n scalars. If we choose a and b such that, x γ and b a Equaton (3.7) becomes, and s satsfed. N γ γ (3.74) N γ x N γ x (3.75) Combnng (3.74), (3.7) and (3.69), the update s, N x x N γ N γ N x γ N γ γ (3.76) Or, wth some rearrangng, as t s more commonly gven, N x x N γ γ N N x γ γ N γ (3.77) 35

36 Davdon 4 was the frst one to propose ths update. Fletcher and Powell further developed hs method; 5 thus ths method came to be nown as the Davdon-Fletcher-Powell (DFP) update. hs update has the followng propertes, For quadratc functons: n. t has the heredtary property; after n updates, N H.. t s a method of conjugate drectons and therefore termnates after at most n steps. For general functons (ncludng quadratcs): 3. the drecton matrx N remans postve defnte f we do exact lne searches. hs guarantees the search drecton ponts downhll at every step. hs property s proved n the next secton Proof the DFP Update Stays Postve Defnte HEOREM. If postve defnte matrx, N for all. x γ for all steps of the algorthm, and f we start wth any symmetrc, N, then the DFP update preserves the postve defnteness of PROOF. he proof s nductve. We wll show that f From the defnton of postve defnteness, zn z for all z N s postve defnte, For smplcty we wll drop the superscrpt on the update terms. From (3.66), + N s also. x x Nγγ N term x γ γ Nγ zn z znz z z z z term term 3 (3.78) We need to show that all the terms on the rght hand sde are postve. We wll focus for a moment on the frst and thrd terms on the rght hand sde. Notng that N can be wrtten as N LL va Choles decomposton, and f we substtute al z, a z L, bl γ, b γ L the frst and thrd terms are, Nγγ N znzz z aa γ Nγ ab bb (3.79) he Cauchy-Schwarz nequalty states that for any two vectors, x and y, 4 W. C. Davdon, USAEC Doc. ANL-599 (rev.) Nov R. Fletcher and M. J. D. Powell, Computer J. 6: 63,

37 xx x y yy thus aa ab bb (3.8) So the frst and thrd terms of (3.78) are postve. Now we need to show ths for the second term, z z x x x z x x z z x γ x γ x γ (3.8) he numerator of the rght-most expresson s obvously postve. he denomnator can be wrtten, x γ x f x f s f s f term term (3.8) he second term n (3.8), s f, s negatve f the search drecton goes downhll whch t does f N s postve defnte, and wth the mnus sgn s therefore postve. he frst term f, can be postve or negatve; however, t s zero f we are at *; thus the entre expresson n (3.8) s postve f we tae a mnmzng step, *. n (3.8), s We have now shown that all three terms of (3.78) are postve f we tae a mnmzng step. hus, f N s postve defnte, N s postve defnte, etc DFP Update: Closng Remars he DFP update was popular for many years. As mentoned, we need to tae a mnmzng step to nsure N stays postve defnte. Recall that we fnd * usng a parabolc ft; on nonquadratcs there s usually some error here. We can reduce the error by refttng the parabola several tmes as we obtan more ponts n the regon of *. However, ths requres more functon evaluatons. he DFP method s more senstve to errors n * than the BFGS update, descrbed n the next secton, and can degrade f * s not accurate he Broyden Fletcher Goldfarb Shanno (BFGS) Update he current "best" update s nown as the Broyden, Fletcher, Goldfarb, Shanno or BFGS update, suggested by all four authors ndependently n 97. It s also a ran update. It has the same propertes as the DFP update but s less senstve to errors n *. hs means we can be sloppy n our lne search when we are far away from the optmum and the method stll wors well. hs update s, 37

38 N γ N γ x x x γ N N γ x N x γ x γ x γ (3.83) hs update s currently consdered to be the best update for use n optmzaton. It s the update nsde OptdesX, Excel and many other optmzaton pacages Comments About Quas-Newton Methods he quas-newton methods explaned here combne the advantages of steepest descent and Newton s method wthout the dsadvantages. hey start out as steepest descent, whch wors well far from the optmum, and gradually become Newton s method, whch wors well near the optmum. hey do ths wthout requrng the evaluaton of second dervatves. By nsurng the update s postve defnte, the search drecton wll always go downhll. Note that these methods use nformaton the prevous methods threw away. Quas-Newton methods use dfferences n gradents and dfferences n x to estmate second dervatves accordng to (3.34). hs allows nformaton from prevous steps to correct (or update) the current step. As mentoned, quas-newton methods are also methods of conjugate drectons. hs s shown n Secton Hessan Updates Vs. Hessan Inverse Updates All of the updates we have presented so far are updates for the Hessan Inverse. We can easly develop updates for the Hessan tself, as wll be requred for the SQP algorthm, startng from the condton γ H x (3.84) nstead of H γ x whch we used before. he BFGS Hessan approxmaton (Equaton (3.83) s the Hessan nverse approxmaton) s gven by, H γ γ H x x H H γ x x H x (3.85) You wll note that ths loos a lot le the DFP Hessan nverse update but wth H nterchanged wth N and nterchanged wth x. In fact these two formulas are sad to be complementary to each other. 38

39 8. he Conjugate Gradent Method 8.. Defnton here s one more method we wll learn, called the conjugate gradent method. We wll present the results for ths method prmarly because t s an algorthm used n Mcrosoft Excel. he conjugate gradent method s bult upon steepest descent, except a correcton factor s added to the search drecton. he correcton maes ths method a method of conjugate drectons. For the conjugate drecton method, the search drecton s gven by, s f s (3.86) Where, a scalar, s gven by f f f (3.87) f 8.. Example: Conjuage Gradent Method We wll optmze our usual functon, f x x x 4x startng from x 3 8 f 4 We tae a mnmzng step n the negatve gradent drecton and stop at x f.7.5 Now we calculate as f f f f We calculate the new search drecton as, 39

40 s f s when we step n ths drecton, we arrve at the optmum, x f he man advantage of the conjugate gradent method, as compared to quas-newton methods, s computaton and storage. he conjugate gradent method only requres that we store the last search drecton and last gradent, nstead of a full matrx. hus ths method s a good one to use for large problems (say wth 5 varables). Although both conjugate gradent and quas-newton methods wll optmze quadratc functons n n steps, on real problems quas-newton methods are better. Further, small errors can buld up n the conjugate gradent method so some researchers recommend restartng the algorthm perodcally (such as every n steps) to be steepest descent. 9. Appendx 9.. he Gradent of a Quadratc Functon n Vector Form We defne the coordnate vector to be, e th A sngle n the poston (3.88) We note that x e so x x, x,, xn e, e,, e I n (3.89) Suppose we have a lnear functon: f x a b x then f x a b x a b x term term For the frst term, snce a s a constant, a. Loong at the second term, from the rule for dfferentaton of a product, bx b x x b 4

41 but and b x I hus f xab x b x x b Ib b (3.9) Now suppose we have a quadratc functon of the form: qxab x x Hx (3.9) We wsh to evaluate the gradent n vector form. We wll do ths term by term, q x a b x x Hx Applyng the results from a lnear functon, q b x Hx x a b x x Hx So we only need to evaluate the term, ux, v Hx, then xhx x v v x We now wrte, xhx. If we splt ths nto two vectors,.e. x v IHx Hx, so we must only evaluate Hx = [ h x, h x,, h x] r r rn v x Hx x. We can where h r represents the frst row of H, Applyng the gradent operator, h r represents the second row, and so forth. Hx h rx, hrx,, hrnx From the prevous result for b x, we now that herefore, hxh snce h r s a vector constant. r r 4

42 Hx h, h,, h r r rn H q x ab x x Hx qxab x x Hx b x HHx x b HH x bhx (3.9) Returnng now to the gradent of the expresson, If the quadratc we are approxmatng s a aylor expanson, hen (3.9) s: f f f x x H x f f H x (3.93) 9.. Optmal Step Length for Quadratc Functon In ths secton we wll derve (3.). If we start wth a aylor expanson, f f f x x Hx (3.94) When we do a lne search, x s (3.95) Substtutng (3.95) nto (3.94) gves f f f s s Hs If we tae the dervatve of ths expresson wth respect to (a scalar), df d f ss Hs (3.96) Settng the dervatve equal to zero and solvng for gves: 4

43 * f shs s (3.97) 9.3. Proof that a Method of Conjugate Drectons Mnmzes the Current Subspace HEOREM. A conjugate drecton method s such that each x s the mnmzer n the subspace generated by x and the drectons, s, s,..., s,.e.,,..., x x s. We wsh to show that, f for all s (3.98) whch ndcates that we have zero slope along any search drecton n the subspace generated by x and the search drectons s, s,..., s,.e., f forall PROOF. he proof by nducton. Gven the usual expresson for the gradent of a aylor expanson, f f Hx Whch we wll wrte as, f f Hs (3.99) If we multply both sdes by s f f s s s Hs By defnton of ths s true for =. For <, f f s s s Hs term term erm vanshes by the nducton hypothess, whle term vanshes from the defnton of conjugate drectons. 43

44 9.4. Proof that an Update wth the Heredtary Property s Also a Method of Conjugate Drectons HEOREM. An update wth the heredtary property and exact lne searches s a method of conjugate drectons and therefore termnates after m nteratons on a quadratc functon. We assume that the heredtary property holds for,,..., m N γ x for all (3.) We need to show that conjugacy holds as well, s Hs forall (3.) he proof s by nducton. We wll show that f We note that s s conjugate then s s as well,.e., + s Hs forall (3.) s N f (3.3) by defnton of the quas-newton method. Or tang the transpose, f s N (3.4) Substtutng (3.4) nto (3.); + s Hs f N Hs for all (3.5) Also, Hs Hx γ so (3.5) becomes, + f N γ s Hs for all (3.6) From the heredtary property we have N γ x, so (3.6) can be wrtten, 44

45 + s Hs f x forall he term n bracets s zero for all values of,,..., from the assumpton the prevous search drecton was conjugate whch mples (3.98). It s zero for from the defnton of *. hus f we have conjugate drectons at, and the heredtary property holds, we have conjugate drectons at +.. References For more nformaton on unconstraned optmzaton and n partcular Hessan updates, see: R. Fletcher, Practcal Methods of Optmzaton, Second Edton, Wley, 987. D. Luenberger, and Y. Ye, Lnear and Nonlnear Programmng, hrd Edton, 8. 45

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Foundations of Arithmetic

Foundations of Arithmetic Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an

More information

1 GSW Iterative Techniques for y = Ax

1 GSW Iterative Techniques for y = Ax 1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn

More information

CHAPTER 7 CONSTRAINED OPTIMIZATION 2: SQP AND GRG

CHAPTER 7 CONSTRAINED OPTIMIZATION 2: SQP AND GRG Chapter 7: Constraned Optmzaton CHAPER 7 CONSRAINED OPIMIZAION : SQP AND GRG Introducton In the prevous chapter we eamned the necessary and suffcent condtons for a constraned optmum. We dd not, however,

More information

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2 Salmon: Lectures on partal dfferental equatons 5. Classfcaton of second-order equatons There are general methods for classfyng hgher-order partal dfferental equatons. One s very general (applyng even to

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

1 Matrix representations of canonical matrices

1 Matrix representations of canonical matrices 1 Matrx representatons of canoncal matrces 2-d rotaton around the orgn: ( ) cos θ sn θ R 0 = sn θ cos θ 3-d rotaton around the x-axs: R x = 1 0 0 0 cos θ sn θ 0 sn θ cos θ 3-d rotaton around the y-axs:

More information

1 Convex Optimization

1 Convex Optimization Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Chapter Newton s Method

Chapter Newton s Method Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede

Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede Fall 0 Analyss of Expermental easurements B. Esensten/rev. S. Errede We now reformulate the lnear Least Squares ethod n more general terms, sutable for (eventually extendng to the non-lnear case, and also

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS) Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

IV. Performance Optimization

IV. Performance Optimization IV. Performance Optmzaton A. Steepest descent algorthm defnton how to set up bounds on learnng rate mnmzaton n a lne (varyng learnng rate) momentum learnng examples B. Newton s method defnton Gauss-Newton

More information

PHYS 705: Classical Mechanics. Calculus of Variations II
