CHAPTER 8 CONSTRAINED OPTIMIZATION 2: SEQUENTIAL QUADRATIC PROGRAMMING, INTERIOR POINT AND GENERALIZED REDUCED GRADIENT METHODS

Size: px
Start display at page:

Download "CHAPTER 8 CONSTRAINED OPTIMIZATION 2: SEQUENTIAL QUADRATIC PROGRAMMING, INTERIOR POINT AND GENERALIZED REDUCED GRADIENT METHODS"

Transcription

1 CHAPER 8 CONSRAINED OPIMIZAION : SEQUENIAL QUADRAIC PROGRAMMING, INERIOR POIN AND GENERALIZED REDUCED GRADIEN MEHODS 8. Introduction In the previous chapter we eained the necessary and sufficient conditions for a constrained optiu. We did not, however, discuss any algoriths for constrained optiization. hat is the purpose of this chapter. he three algoriths we will study are three of the ost coon. Sequential Quadratic Prograing (SQP) is a very popular algorith because of its fast convergence properties. It is available in MALAB and is widely used. he Interior Point (IP) algorith has grown in popularity the past 5 years and recently becae the default algorith in MALAB. It is particularly useful for solving large-scale probles. he Generalized Reduced Gradient ethod (GRG) has been shown to be effective on highly nonlinear engineering probles and is the algorith used in Ecel. SQP and IP share a coon background. Both of these algoriths apply the Newton- Raphson (NR) technique for solving nonlinear equations to the KK equations for a odified version of the proble. hus we will begin with a review of the NR ethod. 8. he Newton-Raphson Method for Solving Nonlinear Equations Before we get to the algoriths, there is soe background we need to cover first. his includes reviewing the Newton-Raphson (NR) ethod for solving sets of nonlinear equations. 8.. One equation with One Unknown he NR ethod is used to find the solution to sets of nonlinear equations. For eaple, suppose we wish to find the solution to the equation: + = e We cannot solve for directly. he NR ethod solves the equation in an iterative fashion based on results derived fro the aylor epansion. First, the equation is rewritten in the for, + - e = (8.) We then provide a starting estiate of the value of that solves the equation. his point becoes the point of epansion for a aylor series:

2 f NR f NR + df NR d ( ) (8.) where we use the subscript NR to distinguish between a function where we are applying the NR ethod and the objective function of an optiization proble. We would like to drive the value of the function to zero: = f NR + df NR d If we denote Δ =, and solve for ( - ) (8.3) D in (8.3): Δ = f NR / d df NR (8.4) We then add D to to obtain a new guess for the value of that satisfies the equation, obtain the derivative there, get the new function value, and iterate until the function value or residual, goes to zero. he process is illustrated in Fig. 8. for the eaple given in (8.) with a starting guess =.. 4 dy /d Second trial, =.469 First trial, = y Fig. 8. Newton-Raphson ethod on (8.) Nuerical results are: k f() df/d

3 For siple roots, NR has second order convergence. his eans that the nuber of significant figures in the solution roughly doubles at each iteration. We can see this in the above table where the value of at iteration has one significant figure (); at iteration 3 it has one (); at iteration 4 it has three (.4); at iteration 5 it has si (.469), and so on. We also see that the error in the residual, as indicated by the nuber of zeros after the decial point, also decreases in this fashion, i.e., the nuber of zeros roughly doubles at each iteration. 8.. Multiple Equations with Multiple Unknowns he NR ethod is easily etended to solve n equation in n unknowns. Writing the aylor series for the equations in vector for: = f NR = f NR = f nnr ( ) Δ ( ) Δ + f NR + f NR + ( f nnr ) Δ We can rewrite these relationships as: ( f NR ) ( f NR )! ( f nnr ) Δ = f NR f NR! f nnr (8.5) For X Syste, f NR f NR f NR f NR Δ Δ = f NR f NR (8.6) In (8.5) we will denote the vector of residuals at the starting point as f NR atri of coefficients as G. Equation (8.5) can then be written,. We will denote the GΔ = f NR (8.7) he solution is obviously 3

4 Δ = ( G - )f NR (8.8) 8..3 Using the NR ethod to Solve the Necessary Conditions In this section we will ake a very iportant connection we will apply NR to solve the necessary conditions. Consider for eaple, a very siple case an unconstrained proble in two variables with objective function, f. We know the necessary conditions are, f = (8.9) f = Now suppose we wish to solve these equations using NR, that is we wish to find the value * that drives the partial derivatives to zero. In ters of notation and discussion this gets a little tricky because the NR ethod involves taking derivatives of the equations to be solved, and the equations we wish to solve are coposed of derivatives. So when we substitute (8.9) into the NR ethod, we end up with second derivatives. For eaple, if we set f NR = f and f NR = f, then we can write (8.6) as, f f f f Δ Δ = f f (8.) or, é f f ù é f ù ê ê - ê éd ù = ê ê ê f f ëd û ê f ê ê- ë û ë û (8.) which should be failiar fro Chapter 3, because (8.) can be written in vector for as, and the solution is, HD =-Ñf (8.) - ( ) D =- H Ñf (8.3) 4

5 We recognize (8.3) as Newton s ethod for solving for an unconstrained optiu. hus we have the iportant result that Newton s ethod is the sae as applying NR on the necessary conditions for an unconstrained proble. Fro the properties of the NR ethod, we know that if Newton s ethod converges (and recall that it doesn t always converge), it will do so with second order convergence. Both SQP and IP ethods etend these ideas to constrained probles. 8.3 he Sequential Quadratic Prograing (SQP) Algorith 8.3. Introduction and Proble Definition he SQP algorith was developed in the early 98 s priarily by M. J. D. Powell, a atheatician at Cabridge University [3, 4]. SQP works by solving for where the KK equations are satisfied. SQP is a very efficient algorith in ters of the nuber of function calls needed to get to the optiu. It converges to the optiu by siultaneously iproving the objective and tightening feasibility of the constraints. Only the optial design is guaranteed to be feasible; interediate designs ay be infeasible. It requires that we have soe eans of estiating the active constraints at each step of the algorith. We will start with a proble which only has equality constraints. We recall that when we only have equality constraints, we do not have to worry about copleentary slackness, which akes thing sipler. So the proble we will focus on at this point is, Min f () (8.4) s.t. g i () = i,..., (8.5) 8.3. he SQP Approiation As we have previously entioned in Chapter 7, a proble with a quadratic objective and linear constraints is known as a quadratic prograing proble. hese probles have a special nae because the KK equations are linear (ecept for copleentary slackness) and are easily solved. We will ake a quadratic prograing approiation at the point k to the proble (8.4)-(8.5) given by, f a = f k + ( f k ) Δ + Δ L k Δ (8.6) g i,a = g k k i + ( g i ) Δ = i =,, (8.7) where the subscript a indicates the approiation. Close eaination of (8.6) shows soething unepected. Instead of Ñ f as we would norally have if we were doing a aylor approiation of the objective, we have Ñ L, the Hessian of the Lagrangian function with respect to. Why is this the case? It is directly tied to applying NR on the KK, as we will presently show. For now we will just accept that the objective uses the Hessian of the Lagrangian instead of the Hessian of the objective. 5

6 We will solve the QP approiation, (8.6)-(8.7), by solving the KK equations for this proble, which, as entioned, are linear. hese equations are given by, Ñf -ål Ñ g = (8.8) a i i, a i= g i,a = for i =,, (8.9) Since, for eaple, fro (8.6), f a = f k + L k Δ we can also write these equations in ters of the original proble, f k + L k Δ λ i g k i = (8.) g i k + g i k i= ( ) Δ= for i =,, (8.) hese are a linear set of equations we can readily solve, as shown in the eaple in the net section. We will want to write these equations even ore concisely. If we define the following atrices and vectors, J k = k ( g )! k ( g ) ( J k ) = g k k, ", g g k = g k! g k L k = f k λ i k g i i= We can write (8.)-(8.) as, L k Δ + ( J k ) λ = f k (8.) J k Δ = g k (8.3) Again, to ephasize, this set of equations represents the solution to (8.8)-(8.9). 6

7 8.3.3 Eaple : Solving the SQP Approiation Suppose we have as our approiation the following, édù é ùédù fa = 3+ [ 3 ] ê [ ] + D D ê ê ëd û ë ûëd û éd ù ga = 5+ [ 3] ê = ëd û (8.4) We can write out the KK equations for this approiation as, f a λ i g k i = = i= 3 + Δ g a = 5+ 3 Δ = We can write these equations in atri for as, Δ Δ λ 3 = (8.5) é -ùéd ù é-3ù ê -3 ê D = ê - ê ê ê êë 3 ê ûë l û êë- 5û (8.6) he solution is, éd ù é-.6ù ê D = ê -.8 ê ê êë l û êë.4 û (8.7) Observations: his calculation represents the ain step in an iteration of the SQP algorith which solves a sequence of quadratic progras. If we wanted to continue, we would add D to our current, update the Lagrangian Hessian, ake a new approiation, solve for that solution, and continue iterating in this fashion. If we ever reach a point where D goes to zero as we solve for the optiu of the approiation, the original KK equations are satisfied. We can see this by eaining (8.)-(8.). If D is zero, we have, f +!"# LΔ λ * g i i = (8.8) = i= 7

8 ( ) Δ g i + g!# " i $# which then atch (8.8)-(8.9). = = for i =,, (8.9) NR on the KK Equations for Probles with Equality Constraints In this section we wish to look at applying the NR ethod to the original KK equations. he KK equations for a proble with equality constraints only are, åli gi (8.3) i= Ñf - Ñ = g i = i =,, (8.3) Now suppose we wish to solve these equations using the NR ethod. he coefficient atri for NR will be coposed of the derivatives of (8.3)-(8.3). For eaple, the first row of the coefficient atri would be, ( f NR ) ( λ f NR ) where function f NR given by, Δ Δλ = f NR f NR (8.3) f NR = f g λ i i (8.33) i= If we substitute (8.33) into (8.3), the first row becoes, f i= λ i g i f g, λ i f g i,, λ i i= n i, g, g,, g i= n Recalling, L f å li gi i= Ñ =Ñ - Ñ And using the atrices, J k = k ( g )! k ( g ) ( J k ) = g k k, ", g 8

9 g k = g k! g k L k = f k λ i k g i i= we can write the NR equations in atri for as, L k ( J k ) J k k λ λ k = f k + ( J k ) λ k (g k ) (8.34) If we do the atri ultiplications we have and collecting ters, L k Δ + J k J k Δ = g k L k Δ + J k J k Δ = g k ( ) (λ λ k ) = f k + ( J k ) λ k (8.35) ( ) ( ) λ = f k (8.36) ( ) which equations are the sae as (8.)-(8.3)). hus we see that solving for the optiu of the QP approiation is the sae as doing a NR iteration on the KK equations. his is the reason we use the Hessian of the Lagrangian function rather than the Hessian of the objective in the approiation SQP: Inequality and Equality Constraints In the previous section we considered equality constraints only. We need to etend these results to the general case. We will state this proble as Min f ( ) s.t. g i he quadratic approiation at point k is: g i ( ) i =,,k (8.37) ( ) = i = k +,, Min f a = f k + ( f k ) Δ + ( Δ ) L k Δ s.t. gia, : g k k i + ( g i ) Δ i =,,...,k (8.38) 9

10 g k k i + ( g i ) Δ = i = k +,..., Notice that the approiations are a function only of. All gradients and the Lagrangian hessian in (7.4) are evaluated at the point of epansion and so represent known quantities. In the article where Powell [4] describes this algorith he akes a significant stateent at this point. Quoting, he etension of the Newton iteration to take account of inequality constraints on the variables arises fro the fact that the value of D that solves (8.36) can also be found by solving a quadratic prograing proble. Specifically, D is the value that akes the quadratic function in (8.38) stationary. Further, the value of l for the KK conditions is equal to the vector of Lagrange ultipliers of the quadratic prograing proble. hus solving the quadratic objective and linear constraints in (8.38) is the sae as solving the NR iteration on the original KK equations. he ain difficulty in etending SQP to the general proble has to do the with the copleentary slackness condition. his equation is nonlinear, and so akes the QP proble nonlinear. We recall that copleentary slackness basically enforces that either a constraint is binding or the associated Lagrange ultiplier is zero. hus we can incorporate this condition if we can develop a ethod to deterine which inequality constraints are binding at the optiu. An eaple of such a solution technique is given by Goldfarb and Idnani [5]. his algorith starts out by solving for the unconstrained optiu to the proble and evaluating which constraints are violated. It then oves to add in these constraints until it is at the optiu. hus it tends to drive to the optiu fro infeasible space. here are other iportant details to develop a realistic, efficient SQP algorith. For eaple, the QP approiation involves the Lagrangian hessian atri, which involves second derivatives. As you ight epect, we don't evaluate the Hessian directly but approiate it using a quasi-newton update, such as the BFGS update. Recall that updates use differences in and differences in gradients to estiate second derivatives. o estiate Ñ L we will need to use differences in the gradient of the Lagrangian function, å Ñ L =Ñf - l Ñg i i i= Note that to evaluate this gradient we need values for l i. We will get these fro our solution to the QP proble. Since our update stays positive definite, we don t have to worry about the ethod diverging like Newton s ethod does for unconstrained probles Coents on the SQP Algorith he SQP algorith has the following characteristics, he algorith is usually very fast.

11 Because it does not rely on a traditional line search, it is often ore accurate in identifying an optiu. he efficiency of the algorith is partly because it does not enforce feasibility of the constraints at each step. Rather it gradually enforces feasibility as part of the KK conditions. It is only guaranteed to be feasible at the optiu. Relative to engineering probles, there are soe drawbacks: Because it can go infeasible during optiization soeties by relatively large aounts it can crash engineering odels. It is ore sensitive to nuerical noise and/or error in derivatives than GRG. If we terinate the optiization process before the optiu is reached, SQP does not guarantee that we will have in-hand a better design than we started with Suary of Steps for SQP Algorith. Make a QP approiation to the original proble. For the first iteration, use a Lagrangian Hessian equal to the identity atri.. Solve for the optiu to the QP proble. As part of this solution, values for the Lagrange ultipliers are obtained. 3. Eecute a siple line search by first stepping to the optiu of the QP proble. So the new old initial step is, and = +D. See if at this point a penalty function, coposed of the values of the objective and violated constraints, is reduced. If not, cut back the step size until the penalty function is reduced. he penalty function is given vio by P= f +å l g where the suation is done over the set of violated constraints, and i= i i the absolute values of the constraints are taken. he Lagrange ultipliers act as scaling or weighting factors between the objective and violated constraints. 4. Evaluate the Lagrangian gradient at the new point. Calculate the difference in and in the Lagrangian gradient, g. Update the Lagrangian Hessian using the BFGS update. 5. Return to Step until is sufficiently sall. When approaches zero, the KK conditions for the original proble are satisfied Eaple of SQP Algorith Find the optiu to the proble, f = Min ( ) 4 s.t. ( ) ( ) g = ³ starting fro the point [ ] -,4. A contour plot of the proble is shown in Fig. 8.. his is siilar to Rosenbrock s function with a constraint. he proble is interesting for several reasons: the objective is quite eccentric at the optiu, the algorith starts at a point where

12 the search direction is pointing away fro the optiu, and the constraint boundary at the starting point has a slope opposite to that at the optiu. Fig. 8.. Contour plot of eaple proble for SQP algorith. Iteration We calculate the gradients, etc. at the beginning point. he Lagrangian Hessian is initialized to the identity atri. é ù At ( ) = [-, 4 ], f = 7, ( Ñ f ) = [ 8,6 ], Ñ L = ê ë û g ( g ) [ ] =.4375, Ñ =.5,.75 Based on these values, we create the first approiation, édù é ùédù fa = 7. + [ 8 6] ê [ ] + D D ê ê ëd û ë ûëd û g a éd ù = + [ ] ê ³ ëd û We will assue the constraint is binding. hen the KK conditions for the optiu of the approiation are given by the following equations: Ñf -lñ g = a a

13 g = a hese equations can be written as, ( ) ( ) 8+D - l.5 = 6+D - l.75 = D +.75D = he solution to this set of equations is D =-.5, D =-.5, l = 5. he proposed step is, é-ù é -.5 ù é-.5ù = +D = ê + = 4 ê -.5 ê.75 ë û ë û ë û Before we accept this step, however, we need to check the penalty function, vio P f l g = +å i= i i to ake sure it decreased with the step. At the starting point, the constraint is satisfied, so the penalty function is just the value of the objective, P = 7. At the proposed point the objective value is f =.5and the constraint is slightly violated with g =-.5. he penalty function is therefore, P = * -.5 =.75. Since this is less than 7, we accept the full step. Contours of the first approiation and the path of the first step are shown in Fig Fig. 8.3 he first SQP approiation and step. 3

14 Iteration At ( ) [ ] ( = - f = Ñ f ) = [- - ] g =-.5, ( Ñ g ) = [.5.75].5.75,.5, 8.., We now need to update the Hessian of the Lagrangian. o do this we need the Lagrangian gradient at and. (Note that we use the sae Lagrange ultiplier, l, for both gradients.) é8.ù é.5 ù é.5 ù Ñ L (, λ ) = ê -( 5.) = 6. ê.75 ê.5 ë û ë û ë û é-8.ù é.5 ù é-.5ù Ñ L (, λ ) = ê -( 5.) = -. ê.75 ê ë û ë û ë û é-. ù γ =ÑL (, λ ) -Ñ L (, λ ) = ê -7. ë û é-.5ù é-.ù é -.5 ù D = ê - =.75 ê 4. ê -.5 ë û ë û ë û Fro Chapter 3, we will use the BFGS Hessian update, Substituting: H ( ) ( ) ( ) ( ) k k k k k k γ γ H D D H k+ k = H + - γ D D H D k k k k k é-.ù é.. ùé -.5 ù é.. ù [-. -7.] [ ] é.. ù ê 7. ê.. ê.5 ê.. ë - û ë ûë- û ë û ê.. ë û é -.5 ù é.. ùé -.5 ù [-. -7.] ê [ ] -.5 ê.. ê -.5 ë û ë ûë û Ñ L = + - é.. ù é ù é.47.8ù Ñ L = ê.. + ê ê ë û ë û ë û é ù Ñ L = ê ë û 4

15 he second approiation is therefore, édù é ùédù fa =.5 + [-8. -.] ê [ ] + D D ê ê ëd û ë ûëd û g a éd ù =- + [ ] ê ³ ëd û As we did before, we will assue the constraint is binding. he KK equations are, ( ) ( ) D D - l.5 = D +.937D - l.75 = D +.75D = he solution to this set of equations is D =.645, D =- 5.48, l =-.65. Because l is negative, we need to drop the constraint fro the picture. (We can see in Fig. 8.4 below that the constraint is not binding at the optiu.) With the constraint dropped, the solution is, D =.7, D =- 5.3, l =. his gives a new of, é-.5ù é.7 ù é.57 ù = +D = ê + =.75 ê -5.3 ê ë û ë û ë û However, when we try to step this far, we find the penalty function has increased fro.75 to 7.48 (this is the value of the objective only the violated constraint does not enter in to the penalty function because the Lagrange ultiplier is zero). We cut the step back. How uch to cut back is soewhat arbitrary. We will ake the step.5 ties the original. he new value of becoes, é-.5ù é.7 ù é-.4965ù = +D = ê +.5 =.75 ê -5.3 ê ë û ë û ë û At which point the penalty function is So we accept this step. Contours of the second approiation are shown in Fig. 8.4, along with the step taken. 5

16 Iteration 3 Fig. 8.4 he second approiation and step. At ( ) [ ] ( = - - f = Ñ f ) = [- - ] g =-.674, ( Ñ g ) = [ ] , 7.367, 5..4, é-8.ù é.5 ù é-8.ù Ñ L (, λ ) = ê -( ) = -. ê.75 ê -. ë û ë û ë û é-5.ù é.493ù é-5.ù Ñ L (, λ ) = ê -( ) = -.4 ê.75 ê -.4 ë û ë û ë û ù γ =ÑL (, λ ) -Ñ L (, λ ) = ê -.4 ë û é-.4965ù é-.5ù é.4 ù D = ê - = ê.75 ê ë û ë û ë û é.898 Based on these vectors, the new Lagrangian Hessian is, é ù é ù é ù Ñ L = ê ê ê ë û ë û ë û 6

17 L = So our net approiation is, édù é ùéDù fa = [ ] ê [ ] + D D ê ê ëd û ë ûëd û g a éd ù =- + [ ] ê ³ ëd û he KK equations, assuing the constraint is binding, are, ( ) ( ) D D - l.493 = D +.43D - l.75 = D +.75D = he solution to this set of equations is D =.399, D =.846, l =.5. Our new proposed point is, 3 é-.4965ù é.399ù é-.3566ù = +D = ê + = ê.846 ê -.9 ë û ë û ë û At this point the penalty function has decreased fro 7.37 to We accept the full step. A contour plot of the third approiation is shown in Fig Fig. 8.5 he third approiation and step. 7

18 Iteration At ( ) = [- - ] f = ( Ñ f ) = [- - ] 3 3 g =-.954, ( Ñ g ) = [.3.75] , 5.859,.9.76, 3 é-5.ù é.493ù é-5.6ù Ñ L (, λ ) = ê -(.5) = -.4 ê.75 ê -.4 ë û ë û ë û 3 3 é-.9 ù é.3ù é ù Ñ L (, λ ) = ê -(.5) = -.76 ê.75 ê ë û ë û ë û ù γ =ÑL (, λ ) -Ñ L (, λ ) = ê.8475 ë û é.5 é-.3566ù é-.4965ù é.399ù D = ê - = -.9 ê ê.846 ë û ë û ë û Based on these vectors, the new Lagrangian Hessian is, 3 é ù é ù é ù Ñ L = ê ê ê ë û ë û ë û 3 é ù Ñ L = ê ë û Our new approiation is, édù é ùédù fa = [ ] ê [ ] + D D ê ê ëd û ë ûëd û g a éd ù =- + [ ] ê ³ ëd û he KK equations, assuing the constraint is binding, are, D D - l(.3) = ( ) D +.985D - l.75 = D +.75D = 8

19 he solution to this proble is, D =.699, D =-.474, l =.79. Since l is positive, our assuption about the constraint was correct. Our new proposed point is, 4 3 é-.3566ù é.699 ù é.533 ù = +D = ê + = -.9 ê ê ë û ë û ë û At this point the penalty function is 4.87, a decrease fro 5.85, so we take the full step. he contour plot is given in Fig he fifth step is shown in Fig Fig. 8.6 he fourth approiation and step. Fig. 8.7 he fifth approiation and step. 9

20 We would continue in this fashion until D goes to zero. We would then know the original KK equations were satisfied. he solution to this proble occurs at, ( * ) =.5.75, f * = 4.5 A suary of five steps is overlaid on the original proble in Fig Fig. 8.8 he path of the SQP algorith.

21 8.4 he Interior Point (IP) Algorith 8.4. Proble Definition In general, for the prial-dual ethod given here, there are two approaches to developing the equations for the IP ethod: putting slack variables into their own vector, and incorporating slack variables as part of the design variables. Both ethods can be found in the literature. We have opted for the forer, because the developent is soewhat ore straightforward. It is also the approach used in MALAB. he developent of the latter is given in a later optional section. For siplicity, we will start with a proble that only has inequality constraints. Min f ( ) (8.39) s.t. g i ( ) i =,, (8.4) We will add in slack variables to turn the inequalities into equalities: Min f ( ) (8.4) s.t. g i ( ) s i = i =,, (8.4) s i i =,..., (8.43) Slack variables are so naed because they take up the slack between the constraint value and the right hand side. For an inequality constraint to be feasible, s i ust be. If (8.4) is satisfied and s i =, then the original inequality is binding. he IP algorith eliinates the lower bounds on s by incorporating a barrier function as part of the objective: Min s.t. g i f µ = f ( ) µ ln(s i ) (8.44) i= ( ) s i = i =,, (8.45) We note that as s i approaches zero (fro a positive value, i.e. fro feasible space), the negative barrier ter goes to infinity. his obviously penalizes the objective and forces the algorith to keep s positive. he IP algorith solves a sequence of these probles for a decreasing set of barrier paraeters µ. Asµ approaches zero, the barrier becoes steeper and sharper. his is illustrated in Fig. 8.9.

22 Fig. 8.9 Barrier function µ ln() for µ =, and. We can define the Lagrangian function for this proble as, L(,s,λ) = f () µ i= ln(s i ) λ i (g i () s i ) (8.46) i= aking the gradient of this function with respect to, s, λ, and setting it equal to zero gives us the KK conditions for (8.44) (8.45), L = f () λ i g i () = (8.47) i= s L = s i λ i µ = i =,, (8.48) λ L = (g i () s i ) = i =,, (8.49) If we define e as the vector of s of diension and, S = s! s Λ = λ! λ we can replace (8.48) above with (8.5) below,

23 f () λ i g i () = (8.5) i= SΛe µe = (8.5) g i () s i = i =,, (8.5) It is worth noting that asµ, the above equations (along with λ,s ) represent the KK conditions for the original proble (8.39)-(8.4). Notice that with µ =, (8.5) (perhaps ore easily seen in (8.48)) epresses copleentary slackness for the slack variable bounds, i.e. either λ i = or s i = Proble Solution We will use the NR ethod to solve the set of equations represented by (8.5)-(8.5). he coefficient atri (see 8.5) will be the first derivatives of these equations. We note that we have n equations fro (8.5), equations fro (8.5) and equations fro (8.5). Siilarly, these equations are functions of n variables, variables l, and variables s. he coefficient atri for NR can be represented as, ( f NR ) ( s f NR ) ( λ f NR )!!! ( f (n+) NR ) s f (n+) NR ( ) ( λ f (n+) NR ) Δ Δs Δλ = r r r3 (8.53) where f NR (the first NR equation) is given by, f NR = f g λ i i (8.54) i= and r, r and r3 represent the residuals for (8.5) (8.5). If we substitute (8.54) into the first row of (8.53), the first row of the coefficient atri becoes, f λ g i f g,, i λ i i= n i, i= n,,!# " $#, g,..., g!######## "######## $!# #" ### $ f s f λ f Recalling, and L = f λ i g i (8.55) i= 3

24 J k = k ( g )! k ( g ) ( J k ) = g k k, " g we can write the NR iteration equations at step k as, L k ( J k ) Λ k S k J k I Δ k Δs k Δλ k = f k () i= λ i k g i k () S k Λ k e µ k e g k k i () s i (8.56) At this point, we could solve this syste of equations to obtain D k, Ds k, and Dl k. However, for large probles we can gain efficiency by siplifying this epression. If we look at row above, Λ k Δs k + S k Δλ k = S k Λ k e + µe (8.57) We would like to solve (8.57) for Ds k. Rearranging ters and pre-ultiplying both sides by ( Λ k ) gives, Δs k = ( Λ k ) S k Λe + µλ e Λ S k Δ k (8.58) For diagonal atrices, the order of atri ultiplication does not atter, so the first ter on the right hand side siplifies to give, Δs k = S k e + µλ e Λ S k Δ k (8.59) Eaining the first and second ters on the right hand side, we note that, S k e = s k! s n k Equation (8.59) becoes, " = s k µ(λ k ) e = µ λ k! µ λ k " = s k 4

25 Δs k = s k + s k (Λ k ) S k Δ k (8.6) = (Λ k ) S k Δ k We define Ω to be, Ω k = (Λ k ) S k (8.6) so that we have, Δs k = Ω k Δλ k (8.6) We can then write (8.56) as a reduced set of equations: L k J k ( J k ) Ω k Δ k Δλ k = f ()+ λ g () i i i= g i () s i (8.63) his atri represents a linear syetric syste of equations (syetric because the offdiagonal eleents are the transpose of one another), of lower diension than (8.56) and is ore efficiently solved. Once we have solved for Δλ k, we can use (8.6) to get Δs k he Line Search It is soeties said, the devil is in the details. hat is certainly true for the line search for the IP ethod. Modern algoriths eploy a nuber of line search techniques, including erit functions, trust regions and filter ethods [6, 7, 8]. hey include techniques to handle non-positive-definite hessians or otherwise poorly conditioned probles, or to regain feasibility. Soeties a different step length is used for the prial variables (, s) and the dual variables (l). We will adopt that strategy here as well. We can consider Δ, Δs, Δλ, as the search directions for, s, and l. We then need to deterine the step size (between -) in these directions, k+ = k +α k Δ k (8.64) λ k+ = λ k +α k Δλ k (8.65) s k+ = s k +α k Δs k (8.66) Siilar to SQP, a straightforward ethod is to accept a if it results in a decrease in a erit function that cobines the objective with a su of the violated constraints, i.e., P = f k +ν viol g i i= 5

26 where n is a constant. We will also reduce the step length if necessary to keep a slack variable or a Lagrange ultiplier positive Eaple : wo Variables, One Constraint We will apply the IP ethod to the sae eaple given for SQP in Section 8.3.8, naely, f = Min ( ) 4 s.t. ( ) ( ) g = ³ We reforulate the proble using a slack variable, s, for the constraint: f = Min ( ) 4 s.t. g ( ) = ( +.5) +.75 s = s We eliinate the lower bound for s by adding a barrier ter, Min f µ s.t. ( ) = f () µ ln(s ) ( ) = ( +.5) +.75 s = g At the starting point we have, ( ) = -,4, f =7, f ( ) = 8,6, g =.4375, g ( ) =.5,.75 We will assue we do not have second derivatives available, so we will begin with the Hessian set to L =. J = [.5,.75]. We will also set µ = 5, s =.4375, λ =. his value of l was picked to approiately satisfy (8.5), i.e. λ s µ =. If the erit function increases for a proposed step, we will cut the step in half and continue doing so several ties First Iteration Based on the data above, the coefficient atri (refer back to (8.56)) for the first step is, 6

27 For the residual vector, we have, f k () λ k i g k i () = i= = S k Λ k e µ k e = =.5 g k i () s k i = = hus the set of equations for our first step becoes, Δ Δs Δλ = he solution is Δ = [.93,.465], Δs = 3.44, Δλ =.73 Because the full step would ake the slack negative, we set the step length for and s to be.748 to keep the slack slightly positive. We accept the full step for l. At this trial point the objective has decreased to.78 but the constraint is violated at.47. However the erit function has decreased fro 7 to 4. and we accept the step. Our new point is = [.695,.57], s =., λ = Second Iteration At ( ) = [.695,.57], f =.78, ( f ) = [ ] g =.474, ( g ) = [.89.75] We start this step by updating the Hessian of the Lagrangian. o do so, we evaluate the Lagrangian gradient at and. We then calculate the g and D vectors. (As we did for SQP, we use the sae Lagrange ultiplier, l, for both gradients.) ( ) = 8. L,λ 6. ( 4.73).5.75 =

28 ( ) =.56 L,λ.435 ( 4.73) = = γ = L(,λ ) L(,λ ) = 4.8 Δ = We use this data to obtain the new estiate of the Lagrangian Hessian using BFGS Hessian update, as we did for SQP (see the SQP eaple for details). he new Hessian is, L = After evaluating the residuals, our net set of equations becoes, Δ Δs Δλ = he solution gives: Δ = [.4, 3.39], Δs =.8, Δλ = For l we only take.693 of the step to keep l. With the full step for and s, our erit function decreases fro 4. to.8. Our new point is, ( ) = [.59,.6], s =.3, λ =, f = 8.8, g =.988 Subsequent steps proceed in a siilar anner. he progress of the algorith to the optiu, with our relatively unsophisticated line search, is shown in Fig. 8. below. he algorith reaches the optiu in about nine steps. At every iteration we reduce µ according to the equation µ k+ = µ k. For coparison purposes, the progress of the Interior Point ethod in 5 fincon on this proble is shown in Fig

29 Eaple of Interior Point Method Fig 8.. he progress of the IP algorith on Eaple. 6 Eaple of Interior Point Method Fig. 8. Path of fincon IP algorith on Eaple. 9

30 8.4.5 Eaple : One Variable, wo Constraints In this eaple proble the functions are very siple, but the setup for the constraints is ore involved than the previous eaple. We will show how the proble changes when we have two slack variables. Min f = s.t. + 9 We will change the inequality and the bound to equality constraints by eans of slack variables so that we have, Min f = s.t. + 9 s = s = s, s We reove the bounds on the slacks by adding barrier ters, Min f µ = µ ln(s i ) i= s.t. + 9 s = s = By inspection, the solution to the proble is =, s = 7, s = ; f = First Iteration At our starting point = 3, ( s ) = [3,], f = 9, g =, g = We also have, f = [6], f = [], g = [ ], g = [], g = [], g = [] We will use µ =, α =.5, (λ ) = [,]. hus, S = 3 Λ = J = Because we only have one variable, the Hessian of the Lagrangian is just a atri (we use the actual second derivative here for siplicity): L = f λ g λ g = [] ()[] ()[] = [] With this inforation we can then build our coefficient atri, 3

31 he vector of residuals is, L k ( J k ) Λ k S k J k I = 3 f k () λ k i g k i () = [6] ()[ ] ()[] = 7 i= S k Λ k e µ k e = 3. = g k i () s k i = We can then solve for our new point as (recall alpha =.5), =.74, s g =, g =. ( ) = [4.65,.74], ( λ ) = [.83,.43] at which point f = 4.73, and *An Alternate Developent of the Newton Iteration Equations *his section is optional. As entioned at the start of the section on IP algoriths, there are two approaches to developing the NR equations: separating out the slack variables in their own vector, and including the slacks as part of the vector. Previously we kept the slacks separate. Now we will cobine the with. his developent follows the work by Wachter and Biegler [9, 4]. For siplicity, we will start with a proble which only has equality constraints. We will, however, include a lower bound on the variables: Min f ( ) (8.67) s.t. g i ( ) = i =,, (8.68) (8.69) As before, we eliinate the lower bounds by including the in the objective function with a barrier function: 3

32 Min s.t. g i f µ = f n ( ) µ ln( i ) (8.7) i= ( ) = i =,, (8.7) We now consider the necessary conditions for a solution to the barrier proble represented by (8.7)-(8.7). We first define, z i = µ i (8.7) We can write the KK conditions as, f () λ i g i () z = (8.73) i= g i () = i =,, (8.74) i z i µ = i =,,n (8.75) If we define e as the vector of s of n diension and, X =! n Z = z! z n we can replace (8.75) above with (8.78) below, f () λ i g i () z = (8.76) i= g i () = i =,, (8.77) XZe µe = (8.78) As µ, the above equations (along with z ) represent the KK conditions for the original proble. he variables z can be viewed as the Lagrange ultipliers for the bound constraints. Notice that with µ =, (8.78) epresses copleentary slackness for the bound constraints, i.e. either i = or z i = Proble Solution We will now construct the coefficient atri for the Newton iteration. We note that we have n equations fro (8.73), equations fro (8.77) and n equations fro (8.75). Siilarly, these equations are functions of n variables, variables l, and n variables z. 3

33 For eaple, the first row of the coefficient atri for NR could be represented as, where f NR is given by, ( f NR ) ( λ f NR ) ( z f NR ) Δ Δλ Δz = r (8.79) f NR = f g λ i i z i= (8.8) and r represents the residuals for (8.76). If we substitute (8.8) into atri (8.79), the first row of the coefficient atri is, f λ g i i z f g,, λ i i= n i z, g,, g,,..., i= n n!######### #" ########### $!# "# $!### "### $ f λ f z f Defining, L = f λ i g i + µx (8.8) i= we can write the NR iteration equations at step k as, L k ( J k ) I J k Z k X k Δ k Δλ k Δz k = r r r3 (8.8) he reader ight wish to copare this with (8.56). As before, we could solve this syste of equations to obtain D k, Dl k, and Dz k. However, we can siplify this epression. We will start by looking at the third set of equations in (8.56) above, Z k Δ k + + X k Δz k = X k Z k e + µe (8.83) where we have substituted in the actual residual value for r3, i.e., r3= X k Z k e + µe We would like to solve (8.83) for Dz k. Rearranging ters gives, 33

34 X k Δz k = X k Z k e + µe Z k Δ k (8.84) If we pre-ultiply both sides by (X k ) - we have, Δz k = Z k e + µ(x k ) e (X k ) Z k Δ k (8.85) Eaining the first and second ters on the right hand side, we note that, Z k e = z k! z n k Equation (8.85) becoes, " = z k µ(x k ) e = µ k! µ n k " = z k We define S to be, so that we have, Δz k = z k + z k (X k ) Z k Δ k = (X k ) Z k Δ k (8.86) S k = (X k ) Z k (8.87) Δz k = S k Δ k (8.88) Now we will eaine the first two rows of (8.8). hese represent equations, L k Δ k + ( J k ) Δλ k + IΔz k = r (8.89) J k Δ k = r (8.9) We see that Δz k only appears in (8.89). If we substitute (8.88) for Δz k and gather ters, (8.89) becoes, or in atri for, L k + S k Δ k + ( J k ) Δλ k = r (8.9) 34

35 L k + S ( J k ) J k Δ k Δλ k = r r (8.9) Once we have solved for Δ k, we can use (8.88) to get Δz k Eaple : Solving the IP Equations We will illustrate this approach on a siilar eaple proble used earlier: Min f = s.t. + 9 We will change the inequality constraint to an equality constraint by eans of a slack variable,, so that we have, Min f = s.t. + 9 =, We will eliinate the lower bounds by using a barrier forulation: Min f µ = µ ln( i ) i= s.t. + 9 = starting fro ( ) = [3, 3]. At this point, f = 9, ( f ) = [6,]; g =, ( g ) = [, ]. We will set µ = and λ =. his then gives, z = µ = and z = µ = Noting, L = f λ i g i + µx = i= () +.. J k = [, ] We can write the coefficient atri, 35

36 L k ( J k ) I J k Z k X k = By evaluating (8.76) through (8.78) we find the vector of residuals to be, r.333 r = r3 Solving this set of equations, gives Δ = , Δλ =.83, Δz = If we take the full step by adding these delta values to our beginning values, we have = , λ =.69, z = At this new point the constraint is satisfied ( g = ) and the objective has decreased fro 9 to he Generalized Reduced Gradient (GRG) Algorith 8.5. Introduction GRG works quite differently than the SQP or IP ethods. If started inside feasible space, GRG goes downhill until it runs into fences constraints and then corrects the search direction such that it follows the fences downhill. At every step it enforces feasibility. he strategy of GRG in following fences works well for engineering probles because ost engineering optius are constrained. For inforation beyond what is given here consult Lasdon et al. [3] and Gabriele and Ragsdell [3] Eplicit vs. Iplicit Eliination Suppose we have the following optiization proble, f = + 3 (8.93) Min ( ) 36

37 g = + - 6= (8.94) s.t. ( ) A contour plot is given in Fig. 8.9a. Fro previous discussions about odeling in Chapter, we know there are two approaches to this proble we can solve it as a proble in two variables with one equality constraint, or we can use the equality constraint to eliinate a variable and the constraint. We will use the second approach. Using (8.94) to solve for, = 6- Substituting into the objective function, (8.93), we have, f = + 3(6 - ) (8.95) Min ( ) Matheatically, solving the proble given by (8.93)-(8.94) is the sae as solving the proble in (8.95). We have used the constraint to eplicitly eliinate a variable and a constraint. Once we solve for the optial value of, we will obviously have to back substitute to get the value of using (8.94). he solution in is illustrated in Fig. 8.9b, where the sensitivity plot for (8.95) is given (because we only have one variable, we can t df show a contour plot). he derivative of (8.95) would be considered to be the reduced d gradient relative to the original proble. Usually we cannot ake an eplicit substitution as we did in this eaple. So we eliinate variables iplicitly. We show how this can be done in the net section. Fig. 8.9 a) Contour plot in, with equality constraint. he optiu is at = [ ] Fig. 8.9 b) Sensitivity plot for Eq he optiu is at =

38 8.5.3 Iplicit Eliination In this section we will look at how we can eliinate variables iplicitly. We do this by considering differential changes in the objective and constraints. We will start by considering a siple proble of two variables with one equality constraint, Min f ( ) = [ ] s.t. g ( ) = Suppose we are at a feasible point. hus the equality constraint is satisfied. We wish to ove to iprove the objective function. he differential change is given by, f f df = d + d (8.96) to keep the constraint satisfied the differential change ust be zero: g g dg = d + d = (8.97) Solving for d in (7.47) gives: d substituting into (7.46) gives, - g = d g df é f f æ g öù = ê - ç d ë è g øû (8.98) where the ter in brackets is the reduced gradient. df é R f f æ g öù i.e., = ê - ç d ë è g øû (8.99) If we substitute D for d, then the equations are only approiate. We are stepping tangent to the constraint in a direction that iproves the objective function GRG Algorith with Equality Constraints Only We can etend the concepts of the previous section to the general proble which we represent in vector notation. Suppose now we consider the general proble with equality constraints, 38

39 Min f ( ) s.t. g i ( ) = i =,, We have n design variables and equality constraints. We begin by partitioning the design variables into (n-) independent variables, z, and dependent variables y. he independent variables will be used to iprove the objective function, and the dependent variables will be used to satisfy the binding constraints. If we partition the gradient vectors as well we have, f ( z) = f y f ( ) z ( ) = f ( ) y f ( ) ( ) z f f ( ) z n ( ) f y y We will also define independent and dependent atrices of the partial derivatives of the constraints: ψ z = g z g z g z n g z g z g z n é g g g ù ê y y y y = ê y ê g g g ê y y y ë û We can write the differential changes in the objective and constraints in vector for as: df = f z ( ) dz + f ( y) dy (8.) d y y y = d + d = z z y y (8.) Noting that y y is a square atri, and solving (8.) for dy, - y y dy =- dz y z (8.) substituting (8.) into (8.), - y y df =Ñf ( z ) d z -Ñf ( y ) d y z z 39

40 or Ñ f =Ñf ( z) -Ñf ( y) R - y y y z (8.3) where Ñf R is the reduced gradient. he reduced gradient is the direction of steepest ascent that stays tangent to the binding constraints GRG Eaple : One Equality Constraint We will illustrate the theory of the previous section with the following eaple. For this eaple we will have three variables and one equality constraint. We state the proble as, Min f = s.t. g = = Step : Evaluate the objective and constraints at the starting point. he starting point will be = [ ], at which point f = 3 and g =. So the constraint is satisfied. Step : Partition the variables. We have one binding constraint so we will need one dependent variable. We will arbitrarily choose y =. he independent variables will therefore be z = as the dependent variable, so [ ] [ ]. hus, 3 é f ù éù ê éù é4 ù é f ù z = y = Ñ z = = 3 f ê 6 = ê Ñ = ê = = 3 y ë û ê ë ë û ë û ê ë 3 û [ ] f ( ) ê f ( ) [ 8 ] [ 6] y é g g ù y é g ù = ê = - = ê = z ë 3 û y ë û [ 4 ] [ ] Step 3: Copute the reduced gradient. We now have the inforation we need to copute the reduced gradient: - y y Ñ fr =Ñf ( z) -Ñf ( y) y z éù Ñ f R = [ 4 ] -[ 6] ê [ 4 -] ë û = - [ 8 ] Step 4: Copute the direction of search. 4

41 We will step in the direction of steepest descent, i.e., the negative reduced gradient direction, which is the direction of steepest descent which stays tangent to the constraint. é 8 ù s = ê - or, noralized, ë û é.837 ù s = ê -.58 ë û Step 5: Do a line search in the independent variables We will use our regular forula, new old z = z +as We will arbitrarily pick a starting step length a =.5 new é ù é ù é ù é ù ê.5 new = ê + = ê.58 ê.794 ë - 3 û ë û ë û ë û Step 6: Solve for the value of the dependent variable. We do this using (7.5) above, only we will substitute D y for dy: D y=- [ ] So the new value of is, - y y Dz y z - y y éd ù D =- y z ê D 3 ë û é ù é.469 ù =- ê [ 4 -] ê.96 ë û ë- û =-.959 = +D new old = =.4 at which point f = 8.9 and g =. We Our new point is = [ ] observe that the objective has decreased fro 3 to 8.9 and the constraint is still satisfied. his only represents one step in the line search. We would continue the line search until we reach a iniu GRG Algorith with Equality and Inequality Constraints In this section we will consider the general proble with both inequality and equality constraints, 4

42 Min f ( ) s.t. g i g i ( ) i =,,k ( ) = i = k +,, he etension of the GRG algorith to include inequality constraints involves soe additional copleity, because the derivation of GRG is based on equality constraints. We therefore convert inequalities into equalities by adding slack variables. he GRG algorith described here is an active constraint algorith only the binding inequality constraints are used to deterine the search direction. he non-binding constraints enter into the proble only if they becoe binding or violated Steps of the GRG Algorith for the General Proble. Evaluate the objective function and all constraints at the current point.. For any binding inequality constraints, add a slack variable, s i 3. Partition the variables into independent variables and dependent variables. We will need one dependent variable for each binding constraint. Any variable at either its upper or lower liit should becoe an independent variable. 4. Copute the reduced gradient using (8.3). 5. Calculate a direction of search. We can use any ethod to calculate the search direction that relies on gradients since the reduced gradient is a gradient. For eaple, we can use a quasi-newton update. 6. Do a line search in the independent variables. For each step, find the corresponding values in the dependent variables using (8.) with Dz and Dy substituted for dz and dy. 7. At each step in the line search, drive back to the constraint boundaries for any violated constraints using Newton-Raphson to adjust the dependent variables. If an independent variable hits its bound, set it equal to its bound. he NR iteration is given by y - y fro the calculation of the reduced gradient. 8. he line search ay terinate either of 4 ways y - D y=- ( g-b) We note we already have the atri y ) he iniu in the direction of search is found (using, for eaple, quadratic interpolation). 4

43 ) A dependent variable hits its upper or lower liit. 3) A forerly non-binding constraint becoes binding. 4) NR fails to converge. In this case we ust cut back the step size until NR does converge. 9. If at any point the reduced gradient in step 4 is equal to, the KK conditions are satisfied GRG Eaple : wo Inequality Constraints In this proble we have two inequality constraints and will therefore need to add in slack variables. Fig. 8. Eaple proble for GRG algorith f = + Min. ( ) g = s.t.: ( ) ( ) g = + - Suppose, to ake things interesting, we are starting at = [.5655, ] constraints are binding. Step : Evaluate functions. where both 43

44 Step : Add in slack variables. ( ) ( ) ( ) f = 5. g =. g =. We note that both constraints are binding so we will add in two slack variables. s, s. Step 3: Partition the variables Since the slack variables are at their lower liits (=) they will becoe the independent variables;, will be the dependent variables. z = [ s s ] y = [ ] Step 4: Copute the reduced gradient ( z ) [..] Ñ f ( y ) = [ 5.3.] Ñ f = y é ù = z ê ë û y é ù = y ê.. ë û y - é ù = y ê ë û thus - y y é ùé ù é ù = = y z ê ê ê ë ûë û ë û é ù Ñ f r = [..] -[ 5.3 ] ê ë û = - [..] [.5.56] [.5.56] = - - Step 5: Calculate a search direction. We want to ove in the negative gradient direction, so our search direction will be s = his is the direction for the independent variables (the slacks). When [ ] noralized this direction is = [.9.98] s. Step 6: Conduct the line search in the independent variables We will start our line search, denoting the current point as z, z = z +as Suppose we pick a =.. hen 44

45 z z é.ù é.9ù = ê (.). + ê.98 ë û ë û é.9ù = ê.98 ë û Step 7: Adjust the dependent variables o find the change in the dependent variables, we use (7.5) - D é y ù é ù é y ù D y= ê =- D D ê ê z y z at which point f ( ) =.5 ë û ë ûë û é ùé.9ù = ê ê.98 ë ûë û é-.394ù = ê ë û = =.68 = =-.48 Have we violated any constraints? ( ) ( ) ( ) ( ) = + - = - - =- g = + - 9= =.3 (violated) g (satisfied) We need to drive back to where the violated constraint is satisfied. We will use NR to do this. Since we don't want to drive back to where both constraints are binding, we will set the residual for constraint to zero. NR Iteration : ( n) ( ) y - y = y - ( g-b) y é.68 ù é ùé.3ù = ê.48 -ê.3.63 ê. ë- û ë- ûë û é.3 ù = ê -. ë û at this point 45

46 g g ( ) ( ) = =-. =-.98 NR Iteration : é.3 ù é ùé-.ù = ê. -ê.3.63 ê. ë- û ë- ûë û é.33ù = ê -.3 ë û evaluating constraints: g g ( ) ( ) = = =-.98 We are now feasible again. We have taken one step in the line search! Our new point is satisfied. é.33ù = ê -.3 ë û at which point the objective is.43, and all constraints are We would continue the line search until we run into a new constraint boundary, a dependent variable hits a bound, or we can no longer get back on the constraint boundaries (which is not an issue in this eaple, since the constraint is linear). 46

Support Vector Machines MIT Course Notes Cynthia Rudin

Support Vector Machines MIT Course Notes Cynthia Rudin Support Vector Machines MIT 5.097 Course Notes Cynthia Rudin Credit: Ng, Hastie, Tibshirani, Friedan Thanks: Şeyda Ertekin Let s start with soe intuition about argins. The argin of an exaple x i = distance

More information

The Methods of Solution for Constrained Nonlinear Programming

The Methods of Solution for Constrained Nonlinear Programming Research Inventy: International Journal Of Engineering And Science Vol.4, Issue 3(March 2014), PP 01-06 Issn (e): 2278-4721, Issn (p):2319-6483, www.researchinventy.co The Methods of Solution for Constrained

More information

Page 1 Lab 1 Elementary Matrix and Linear Algebra Spring 2011

Page 1 Lab 1 Elementary Matrix and Linear Algebra Spring 2011 Page Lab Eleentary Matri and Linear Algebra Spring 0 Nae Due /03/0 Score /5 Probles through 4 are each worth 4 points.. Go to the Linear Algebra oolkit site ransforing a atri to reduced row echelon for

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Solution of Multivariable Optimization with Inequality Constraints by Lagrange Multipliers. where, x=[x 1 x 2. x n] T

Solution of Multivariable Optimization with Inequality Constraints by Lagrange Multipliers. where, x=[x 1 x 2. x n] T Solution of Multivariable Optiization with Inequality Constraints by Lagrange Multipliers Consider this proble: Miniize f where, =[. n] T subect to, g,, The g functions are labeled inequality constraints.

More information

THE KALMAN FILTER: A LOOK BEHIND THE SCENE

THE KALMAN FILTER: A LOOK BEHIND THE SCENE HE KALMA FILER: A LOOK BEHID HE SCEE R.E. Deain School of Matheatical and Geospatial Sciences, RMI University eail: rod.deain@rit.edu.au Presented at the Victorian Regional Survey Conference, Mildura,

More information

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

TABLE FOR UPPER PERCENTAGE POINTS OF THE LARGEST ROOT OF A DETERMINANTAL EQUATION WITH FIVE ROOTS. By William W. Chen

TABLE FOR UPPER PERCENTAGE POINTS OF THE LARGEST ROOT OF A DETERMINANTAL EQUATION WITH FIVE ROOTS. By William W. Chen TABLE FOR UPPER PERCENTAGE POINTS OF THE LARGEST ROOT OF A DETERMINANTAL EQUATION WITH FIVE ROOTS By Willia W. Chen The distribution of the non-null characteristic roots of a atri derived fro saple observations

More information

Physically Based Modeling CS Notes Spring 1997 Particle Collision and Contact

Physically Based Modeling CS Notes Spring 1997 Particle Collision and Contact Physically Based Modeling CS 15-863 Notes Spring 1997 Particle Collision and Contact 1 Collisions with Springs Suppose we wanted to ipleent a particle siulator with a floor : a solid horizontal plane which

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

Some Perspective. Forces and Newton s Laws

Some Perspective. Forces and Newton s Laws Soe Perspective The language of Kineatics provides us with an efficient ethod for describing the otion of aterial objects, and we ll continue to ake refineents to it as we introduce additional types of

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

Kernel Methods and Support Vector Machines
