ARE202A, Fall 2005. LECTURE #2: WED, NOV 6, 2005. PRINT DATE: NOVEMBER 2, 2005 (NPP2)

Contents

5. Nonlinear Programming Problems and the Kuhn-Tucker conditions (cont)
5.2. Necessary and sufficient conditions for a solution to an NPP (cont)
5.2.1. Preliminaries: the problem of the vanishing gradient
5.2.2. Preliminaries: The relationship between quasi-concavity and the Hessian of f
5.2.3. Preliminaries: The relationship between (strict) quasi-concavity and (strict) concavity
5.3. Sufficient Conditions for a solution to the NPP

5. Nonlinear Programming Problems and the Kuhn-Tucker conditions (cont)

5.2. Necessary and sufficient conditions for a solution to an NPP (cont)

So far we've only established necessary conditions for a solution to the NPP. But we can't stop here: we could have found a minimum on the constraint set, and the same KKT conditions would be satisfied. In this lecture we focus on finding sufficient conditions for a solution, and in particular on conditions under which the KKT conditions will be both necessary and (almost, but not quite) sufficient for a solution. The basic sufficiency conditions we're going to rely on are that the objective function f is strictly quasi-concave while the constraint functions are quasi-convex. But there are a lot of subtleties that we need to address. We begin with some preliminary issues.

5.2.1. Preliminaries: the problem of the vanishing gradient.

In addition to the usual quasi-concavity and quasi-convexity conditions, we have to deal with the familiar annoyance posed by the example: max x³ on [−1, 1]. Obviously this problem has a solution at x = 1. However, at x = 0 the KKT conditions are satisfied: the gradient is zero, and so can be written as a nonnegative linear combination of the constraint gradients, with weight zero applied to each of the constraints, neither of which is satisfied with equality. That is, f′(0) = 0, which is the sum of the gradients of the two constraints at zero, each weighted by zero. So without imposing a restriction that excludes functions such as this, we cannot say that satisfying the KKT conditions is sufficient for a max when the objective and constraint functions have the right "quasi" properties.

To exclude this case, we could assume that f has a non-vanishing gradient. But this restriction throws the baby out with the bath-water: e.g., the problem max x(1 − x) s.t. x ∈ [0, 1] has a global max at 0.5, at which point the gradient vanishes. So we want to exclude precisely those functions that have vanishing gradients at x's which are not unconstrained maxima. The following condition on f, called pseudo-concavity in S&B (the original name) and M.K.9 in MWG, does just this, in addition to implying quasi-concavity:

    for all x, x′ ∈ X, if f(x′) > f(x) then ∇f(x)·(x′ − x) > 0.    (1)

Note that (1) says a couple of things. First, it says that a necessary condition for f(x′) > f(x) is that dx = (x′ − x) makes an acute angle with the gradient of f. (This looks very much like quasi-concavity.) Second, it implies that

    if ∇f(x) = 0 then f(·) attains a global max at x,    (2)

since if not there would necessarily exist x, x′ ∈ X s.t. f(x′) > f(x) and ∇f(x)·(x′ − x) = 0·(x′ − x) = 0, violating (1).
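Condition (1) is easy to probe numerically on the two univariate examples above. The sketch below (helper name is ours; a finite grid can only refute pseudo-concavity, never certify it) searches for a witness pair against (1):

```python
def violates_pseudoconcavity(f, df, grid):
    """Search a finite grid for a pair (x, x') with f(x') > f(x) but
    f'(x)*(x' - x) <= 0, i.e. a witness against condition (1).
    Returns the first witness found, or None."""
    for x in grid:
        for xp in grid:
            if f(xp) > f(x) and df(x) * (xp - x) <= 0:
                return (x, xp)
    return None

grid = [i / 100 for i in range(-100, 101)]  # [-1, 1], including 0 exactly

# f(x) = x^3: the gradient vanishes at x = 0, which is NOT the max on [-1, 1],
# so (1) fails there -- any x' > 0 has f(x') > f(0) but f'(0)*(x' - 0) = 0.
print(violates_pseudoconcavity(lambda x: x**3, lambda x: 3 * x**2, grid))
# -> (0.0, 0.01)

# f(x) = x(1 - x) is concave, hence pseudo-concave: no witness exists.
print(violates_pseudoconcavity(lambda x: x * (1 - x), lambda x: 1 - 2 * x, grid))
# -> None
```

The second call illustrates why pseudo-concavity is the right fix: x(1 − x) has a vanishing gradient, but only at its unconstrained maximum, so (1) survives.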

Figure 1: pseudo-concavity implies quasi-concavity

Our next result establishes precisely the relationship between pseudo-concavity and quasi-concavity:

    if f is C², then f satisfies (1) iff f is quasi-concave and satisfies (2).    (3)

To prove the (⟹) direction of (3), we'll show that (2) together with (not quasi-concavity) implies (not pseudo-concavity). Suppose that f satisfies (2) but is not quasi-concave, i.e., there exist x′, x″, z ∈ X such that z = λx′ + (1 − λ)x″ for some λ ∈ (0, 1) and f(x″) ≥ f(x′) > f(z). We will show that f violates (1) at a point x near z. Since f does not attain a global maximum at z, (2) implies that ∇f(z) ≠ 0. Therefore, by continuity, we can pick ε > 0 sufficiently small and x = z + ε∇f(z) such that f(x) < f(x′) and ∇f(x)·∇f(z) > 0. As Fig. 1 indicates, there are now two cases to consider: the angle between ∇f(x) and x′ − x″ is either ≥ 90° or < 90°. First suppose that ∇f(x)·(x′ − x″) ≤ 0. In this case, we have

    ∇f(x)·(x′ − x) = ∇f(x)·(x′ − (z + ε∇f(z)))
                   = ∇f(x)·((1 − λ)(x′ − x″) − ε∇f(z))
                   = (1 − λ)·∇f(x)·(x′ − x″) [≤ 0 by assumption] − ε·∇f(x)·∇f(z) [> 0 by construction]
                   < 0.

On the other hand, if ∇f(x)·(x″ − x′) < 0, repeat the above argument to conclude that ∇f(x)·(x″ − x) < 0. We have thus identified a point x such that f(x″) ≥ f(x′) > f(x) and such that either ∇f(x)·(x′ − x) < 0 or ∇f(x)·(x″ − x) < 0, verifying that (1) is violated.

To prove the (⟸) direction of (3), we'll show that (2) together with (not pseudo-concavity) implies (not quasi-concavity). Assume that there exist x, x′ such that f(x′) > f(x) but ∇f(x)·(x′ − x) ≤ 0. Since x is not a global maximizer of f, (2) implies that ∇f(x) ≠ 0. By continuity, we can pick ε > 0 sufficiently small that for y = x′ − ε∇f(x), f(y) > f(x). We'll show that a portion of the line segment joining y and x does not belong to the upper contour set of f corresponding to f(x), proving that f is not quasi-concave.

Figure 2: pseudo-concavity rules this out

We have

    ∇f(x)·(y − x) = ∇f(x)·((x′ − ε∇f(x)) − x)
                  = ∇f(x)·(x′ − x) [≤ 0 by assumption] − ε‖∇f(x)‖² [> 0]
                  < 0.

Now let dxθ = θ(y − x). For all θ > 0, ∇f(x)·dxθ = θ·∇f(x)·(y − x) < 0. Taylor-Young's theorem then implies that if θ is sufficiently small, f(x + dxθ) < f(x), establishing that a portion of the line segment joining x and y does not belong to the upper contour set of f corresponding to f(x).

The following modification, changing only the first strict inequality to a weak inequality, gives us a condition that implies strict quasi-concavity:

    for all x, x′ ∈ X with x′ ≠ x, if f(x′) ≥ f(x) then ∇f(x)·(x′ − x) > 0.    (1′)

Conclude that pseudo-concavity is a much weaker assumption than the non-vanishing gradient condition, and will give us just enough to ensure that the KT conditions are not only necessary

but sufficient as well. In particular, pseudo-concavity admits the possibility that our solution to the NPP may be unconstrained.

5.2.2. Preliminaries: The relationship between quasi-concavity and the Hessian of f.

Recall from earlier that an alternative way to specify that f is strictly (weakly) concave is to require that the Hessian of f be everywhere negative (semi-)definite. There is an analogous way to specify that f is strictly (weakly) quasi-concave, which involves a restricted kind of definiteness property for f. The following result gives a sufficient condition for strict quasi-concavity.

Theorem (SQC): A sufficient condition for f : Rⁿ → R to be strictly quasi-concave is that for all x and all dx ≠ 0 such that ∇f(x)′dx = 0, dx′Hf(x)dx < 0.

A sufficient condition for the above Hessian property is the following condition on the leading principal minors of the bordered Hessian of f: for all x, and all k = 1, ..., n, the k'th leading principal minor of the bordered matrix

    [ 0        ∇f(x)′ ]
    [ ∇f(x)    Hf(x)  ]

must have the same sign as (−1)^k, where the k'th leading principal minor of this matrix is the determinant of the top-left (k + 1) × (k + 1) submatrix. It's important to emphasize that to guarantee strict quasi-concavity (or any global concavity/convexity property, for that matter), the Hessian property has to hold for all x in the domain of the function. By extension, the leading principal minor property has to hold for all x in order to guarantee strict quasi-concavity. Note also that the above condition isn't necessary for strict quasi-concavity: the usual example, f(x) = x³, establishes this: f is strictly quasi-concave, but at x = 0, for all dx, ∇f(x)dx = 0 while dx′Hf(x)dx = 0.
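The bordered-Hessian test is mechanical enough to code up. The sketch below (helper name and sample function are ours) builds the bordered matrix for f(x₁, x₂) = x₁x₂, which is strictly quasi-concave on the strictly positive orthant, and checks the sign pattern (−1)^k at one sample point; remember that the test certifies strict quasi-concavity only if it holds at every x in the domain:

```python
import numpy as np

def bordered_minors(grad, hess, x):
    """Determinants of the top-left (k+1)x(k+1) submatrices, k = 1..n,
    of the bordered Hessian [[0, grad f'], [grad f, Hf]] at x."""
    g = np.asarray(grad(x), dtype=float)
    H = np.asarray(hess(x), dtype=float)
    n = g.size
    B = np.zeros((n + 1, n + 1))
    B[0, 1:] = g
    B[1:, 0] = g
    B[1:, 1:] = H
    return [np.linalg.det(B[:m, :m]) for m in range(2, n + 2)]

# f(x1, x2) = x1 * x2: grad f = (x2, x1), Hf = [[0, 1], [1, 0]]
minors = bordered_minors(lambda x: [x[1], x[0]],
                         lambda x: [[0.0, 1.0], [1.0, 0.0]],
                         [2.0, 3.0])
signs = [bool(m < 0) for m in minors]
print(signs)  # -> [True, False]: minor for k=1 negative, for k=2 positive
```

At x = (2, 3) the two minors are −9 and 12, matching (−1)¹ and (−1)², consistent with strict quasi-concavity there.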

To prove Theorem (SQC), assume that the condition of the theorem is satisfied and pick arbitrary points x, x′ in the domain of f such that f(x′) ≥ f(x). Let Y = {yλ = λx + (1 − λ)x′ : λ ∈ [0, 1]} and let Y* denote the set of minimizers of f on Y. We'll prove that Y* ⊆ {x, x′}. A consequence of this fact is that for all λ ∈ (0, 1), f(λx + (1 − λ)x′) > f(x), implying that f is strictly quasi-concave. To establish that Y* ⊆ {x, x′}, consider an arbitrary interior point y of Y, i.e., some point y = λx + (1 − λ)x′, λ ∈ (0, 1). Clearly, a necessary condition for y to minimize f(·) on Y is that ∇f(y)·(y − x) = 0. (Otherwise you could move along the line Y in one or the other direction and decrease f.) Assume therefore that this condition is satisfied at y. But in this case, for all y′ ∈ Y, ∇f(y)·(y′ − y) = 0, and hence, by the condition of the theorem above, (y′ − y)′Hf(y)(y′ − y) < 0. Now apply the Taylor-Young theorem, for k = 2, to observe that for y′ ∈ Y sufficiently close to y,

    sign(f(y′) − f(y)) = sign( ∇f(y)·(y′ − y) + (y′ − y)′Hf(y)(y′ − y)/2 ) < 0.

Conclude that if y satisfies the necessary condition for an interior minimum of f on Y, then y is in fact a local maximizer of f on Y, so no interior point of Y can minimize f on Y. This completes the proof.

5.2.3. Preliminaries: The relationship between (strict) quasi-concavity and (strict) concavity.

A sufficient condition for strict concavity is that for all x and all dx ≠ 0, dx′Hf(x)dx < 0. For strict quasi-concavity, we only require that this property of the Hessian hold for vectors that are orthogonal to ∇f(x). This seems much weaker (infinitely weaker, in fact). However, it's not quite as weak as it looks: the condition on orthogonal vectors also has implications for dx's that are almost orthogonal to ∇f(x), and we need these implications in order to prove that the KT conditions are sufficient for a solution.
Specifically, if ∇f(x)dx = 0 implies dx′Hf(x)dx < 0, then, by continuity,

    for dx ≠ 0 such that |∇f(x)dx| is sufficiently small, dx′Hf(x)dx < 0.    (4)

Why is (4) so important? The problem we face, once again, is that in order to have a local maximum, you need an ε-ball around x such that f(·) is strictly less than f(x) on the intersection

ARE202A, Fall 2005 7 of this ball with the constraint set. Now as we've discussed over and over again, you can't nd this ball just by using rst order conditions. You need your second order conditions to be cooperative at the point where the rst order conditions fail you. If they don't cooperate, then for any given -ball, there are going to be dx's that make an acute angle with r(x) for which your rst order conditions won't be inadequate. To see the point, suppose for the moment that the following were the case, for some f : R n! R and some > 0. Fortunately for all of us it cannot happen. (It's ruled out by (4).) Unfortunately for me, it's so unable to happen that I can't even draw it. for all x and all dx 6= 0 such that rf(x)dx = 0, dx 0 Hf(x)dx < 0 (5) for all x and all dx 6= 0 such that rf(x)dx 6= 0, dx 0 Hf(x)dx > Now, for any x satisfying the KKT, we can nd dx such that rg j (x)dx < 0 for all j, and rf(x)dx is an extremely small negative number. Now suppose that (5) could happen. In this case, if we considered dx, for > 0 extremely small, we would nd that the second order term in the Taylor expansion would dominate the rst term, with the result that f(x + dx) > f(dx). In short, if (5) could happen, then the KKT conditions plus the hessian condition above plus quasi-convexity of the constraint functions would not be sucient for a solution. How small is \suciently small" in (4)? The following example shows that the requirement for suciently small gets tougher and tougher, the less concave is f. Example: Consider the function f(x; y) = (xy), which is strictly quasi-concave but not concave for > 0:5. We'll illustrate that regardless of the value of, dx 0 Hf dx < 0, for any vector that is almost orthogonal to rf, but that the criterion of \almost" gets tighter and tighter as gets

larger. That is, the higher is α (i.e., the less concave is f), the closer to orthogonal does dx have to be in order to ensure that dx′Hf dx is negative. We have

    ∇f(x₁, x₂) = ( αx₁^(α−1)x₂^α , αx₁^α x₂^(α−1) )

and

    Hf(x₁, x₂) = [ α(α−1)x₁^(α−2)x₂^α      α²x₁^(α−1)x₂^(α−1)  ]
                 [ α²x₁^(α−1)x₂^(α−1)      α(α−1)x₁^α x₂^(α−2) ].

Evaluated at (x₁, x₂) = (1, 1), we have

    Hf(1, 1) = [ α(α−1)   α²     ]  =  α² [ β   1 ]
               [ α²       α(α−1) ]        [ 1   β ],   where β = (α−1)/α.

Note that β → 1 as α → ∞. Now choose a unit-length vector dx and consider

    dx′Hf(1,1)dx / α² = β(dx₁² + dx₂²) + 2dx₁dx₂ = (dx₁ + dx₂)² − (1 − β)(dx₁² + dx₂²) = (dx₁ + dx₂)² − (1 − β).

For dx such that dx₁ = −dx₂ (i.e., dx orthogonal to ∇f(1,1)), dx′Hf dx < 0 for all β < 1, verifying that f is strictly quasi-concave. However, the closer β is to unity, the smaller is the set of unit vectors for which dx′Hf dx < 0.

5.3. Sufficient Conditions for a solution to the NPP

The following theorem gives sufficient conditions for a solution (not necessarily unique) to the NPP.

Theorem (S) (Sufficient conditions for a solution to the NPP): If f satisfies (1) and the gⱼ's are quasi-convex, then a sufficient condition for a solution to the NPP at x ∈ Rⁿ is that there exists a vector λ ∈ R^m₊ such that ∇f(x)ᵀ = λᵀJg(x) and λ has the property that λⱼ = 0 for each j such that gⱼ(x) < bⱼ.
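Before working with Theorem (S), the Hessian algebra in the example of Section 5.2.3 is easy to confirm numerically. A minimal sketch (function names and sample values ours) checks the identity dx′Hf(1,1)dx = α²[(dx₁ + dx₂)² − (1 − β)] for unit vectors, and that the orthogonal direction dx₁ = −dx₂ always gives a negative quadratic form:

```python
import math

def quad_form_at_1_1(alpha, dx1, dx2):
    """dx' Hf(1,1) dx for f(x1, x2) = (x1 * x2)**alpha."""
    h11 = alpha * (alpha - 1)   # own second partials at (1, 1)
    h12 = alpha ** 2            # cross partial at (1, 1)
    return h11 * (dx1**2 + dx2**2) + 2 * h12 * dx1 * dx2

for alpha in [0.6, 1.0, 2.0, 10.0]:
    beta = (alpha - 1) / alpha
    # identity from the text, for an arbitrary unit vector dx:
    dx1, dx2 = 0.8, 0.6
    lhs = quad_form_at_1_1(alpha, dx1, dx2)
    rhs = alpha**2 * ((dx1 + dx2) ** 2 - (1 - beta))
    assert abs(lhs - rhs) < 1e-9
    # the orthogonal direction dx1 = -dx2 gives a negative form for beta < 1
    u = 1 / math.sqrt(2)
    assert quad_form_at_1_1(alpha, u, -u) < 0
print("Hessian identity checked")
```

As α grows, β = (α − 1)/α approaches 1 and the negative band (dx₁ + dx₂)² < 1 − β shrinks, which is exactly the "tighter and tighter" orthogonality requirement described above.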

Note that Theorem (S) doesn't guarantee that a solution exists; you need compactness for that. Note also that the sufficient conditions are like the necessary conditions, except that you don't need the constraint qualification but do need the quasi-concavity/(1) and quasi-convexity stuff. (MWG's version of Theorem (S), their Theorem M.K.3, is just like mine except that they do include the constraint qualification. This addition is unnecessary: they're not wrong, they just have a meaningless additional condition. The C.Q. says that you can have a maximum without the non-negative cone condition holding. If you assume, as in (S), that this condition holds, then, obviously, you don't need to worry that perhaps it mightn't hold!)

Theorem (S) would also be true if the words "f satisfies (1)" were replaced by the stronger condition "f is quasi-concave and ∇f(·) is never zero." Since quasi-concavity is easier to work with, I'll discuss this modified, though less satisfactory, theorem. Moreover, if f is strictly quasi-concave, or condition (1′) is satisfied, then we get a unique solution.

Sketch of the proof of Theorem (S) for the case of one constraint: Suppose that the KT conditions are satisfied at (x, λ), along with the other conditions of the sufficiency theorem, i.e., quasi-concave objective, quasi-convex constraint function and non-vanishing gradient. In fact, we are going to assume that the constraint function is strictly quasi-convex, just to make things a bit easier. First note that if λ is zero, then ∇f(x) is zero also. But then we're done, because by (2), f must attain a global, and hence constrained, max at x. Now assume that λ > 0. This in turn (complementary slackness) implies that x is on the boundary of the constraint set, i.e., g(x) = b. In this case, the Kuhn-Tucker conditions say that ∇f(x) must be pointing in the same direction as ∇g(x).
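A concrete one-constraint instance of Theorem (S) (the problem data here are ours, not from the lecture): maximize f(x) = x₁x₂ subject to g(x) = x₁ + x₂ ≤ 2. The sketch checks the conditions of the theorem at x = (1, 1) with multiplier λ = 1, then brute-forces feasible grid points to confirm that nothing beats f(1, 1):

```python
import itertools

# Problem: max f(x) = x1*x2  s.t.  g(x) = x1 + x2 <= b, with b = 2.
f = lambda x1, x2: x1 * x2
grad_f = lambda x1, x2: (x2, x1)
grad_g = lambda x1, x2: (1.0, 1.0)
b = 2.0

x_star, lam = (1.0, 1.0), 1.0

# Conditions of Theorem (S): grad f = lam * grad g, lam >= 0, and
# lam = 0 whenever the constraint is slack (here it binds: g(x*) = b).
gf, gg = grad_f(*x_star), grad_g(*x_star)
assert all(abs(gfi - lam * ggi) < 1e-12 for gfi, ggi in zip(gf, gg))
assert lam >= 0 and abs(x_star[0] + x_star[1] - b) < 1e-12

# Sufficiency in action: no feasible grid point does better than f(1,1) = 1.
pts = [i / 50 for i in range(0, 101)]   # grid over [0, 2]
best = max(f(x1, x2) for x1, x2 in itertools.product(pts, pts)
           if x1 + x2 <= b)
assert best <= f(*x_star) + 1e-12
print("KT point confirmed as the constrained max on the grid")
```

Here f is pseudo-concave on the strictly positive orthant and g is linear (hence quasi-convex), so the theorem's hypotheses hold and the KT point really is the constrained maximum.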

Figure 3: Sufficient conditions for a solution to the NPP

How does this guarantee us a max? We'll show that the KT conditions plus non-vanishing gradient plus strict quasi-concavity (applied locally) guarantee a local max, and the quasi-concavity/quasi-convexity conditions (applied globally) do the rest. To establish a strict local max, we have to show that there exists a ball about x such that no matter where we move within that ball, we either decrease the value of f or move outside the constraint set. To find the right ball, we proceed as follows:

(1) Since f and g are, respectively, strictly quasi-concave and strictly quasi-convex, we know that dx′Hf dx < 0 and dx′Hg dx > 0 for dx's that are orthogonal to the direction in which the gradients of f and g both point. By continuity, there exists an interval around 90° such that for any vector dx that makes an angle in this interval with ∇f (i.e., lives in the cone-shaped object C in Fig. 3), dx′Hf dx < 0, while dx′Hg dx > 0.

(2) Next note that there exists ε₁ > 0 such that for dx's in B(x, ε₁) but not in the cone-like set C, the first term in the Taylor expansion about x in the direction dx determines the sign of the entire expansion. (You should, hopefully, understand well by now why, without excluding the cone, we couldn't find such an ε₁.)

(3) Finally, note that there exists 0 < ε₂ ≤ ε₁ such that for dx's in B(x, ε₂) and in the cone-like set C, the first two terms in the Taylor expansion about x in the direction dx determine the sign of the entire expansion.

There are four classes of directions to consider:

(1) if we move from x in a direction that makes a big enough acute angle with the gradient vector ∇g(x) (e.g., dx¹), then by the first-order version of Taylor's theorem, we increase the value of g, i.e., move outside the constraint set;

(2) if we move from x in a direction that makes a big enough obtuse angle with the gradient vector ∇f(x) (e.g., dx⁴), then by the first-order version of Taylor's theorem, we reduce the value of f;

(3) if we move from x in a direction dx that makes a barely acute angle with the gradient vector ∇g(x), i.e., we move in a direction almost perpendicular to the gradient vector (e.g., dx²), then by the second-order version of Taylor's theorem, we increase the value of g (since both of the first two terms in the expansion of g are positive);

(4) if we move from x in a direction dx that makes a barely obtuse angle with the gradient vector ∇f(x), i.e., we move in a direction almost perpendicular to the gradient vector (e.g., dx³), then by the second-order version of Taylor's theorem, we reduce the value of f (since both of the first two terms in the expansion of f are negative).

So we've shown that if we move a distance less than ε₂ in any possible direction away from x, either f goes down or g goes up. Conclude that we have a local max.
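The "move in any direction" conclusion can be spot-checked on the same kind of concrete problem used earlier (data ours): at the KT point x = (1, 1) of max x₁x₂ s.t. x₁ + x₂ ≤ 2, every small step around the circle either lowers f or leaves the constraint set:

```python
import math

f = lambda x1, x2: x1 * x2
b = 2.0
x1s, x2s = 1.0, 1.0     # KT point: f(x*) = 1, the constraint binds
eps = 1e-3              # radius of the ball we probe

violations = 0
for k in range(3600):   # sweep directions around the unit circle
    t = 2 * math.pi * k / 3600
    dx1, dx2 = eps * math.cos(t), eps * math.sin(t)
    x1, x2 = x1s + dx1, x2s + dx2
    feasible = (x1 + x2 <= b + 1e-15)
    # a feasible move that does not lower f would contradict a strict local max
    if feasible and f(x1, x2) >= f(x1s, x2s):
        violations += 1
print(violations)  # -> 0
```

The near-orthogonal directions (classes (3) and (4) above) are exactly the ones where the first-order change is negligible and the second-order terms do the work: along dx ∝ (1, −1), f changes by −eps² even though the first-order term is zero.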

Of course, these derivative arguments only guarantee a local maximum on the constraint set. That is, there exists a neighborhood of x such that f is at least as large at x as it is anywhere on the intersection of this neighborhood with the constraint set. Quasi-concavity and quasi-convexity are needed to ensure that x is indeed a solution to the NPP, i.e., a global max on the constraint set. Quasi-concavity says that if there were a point x′ anywhere in the constraint set at which f attained a strictly higher value than at x, then f would be at least as high as at x on the whole line segment joining x′ to x. Quasi-convexity guarantees that the constraint set is convex, so that if x′ is in the constraint set, then so is the whole line segment. But this means that if there existed a point such as x′, then on any neighborhood of x there would be points at which f is at least as high as it is at x, contradicting the fact that we have a strict local max.

We'll make the above informal argument precise for the special case of maximizing f(x) such that g(x) ≤ b, where f is strictly quasi-concave with non-vanishing gradient and g : Rⁿ → R is linear. Furthermore, we'll assume that v′Hf(x)v < 0 for all x and all v ≠ 0 such that ∇f(x)v = 0. Recall that this condition is sufficient for strict quasi-concavity, but not necessary; we don't want to have to deal with the case where it isn't satisfied. Suppose that x satisfies the KKT conditions, i.e., ∇f(x) and ∇g(x) are colinear. We need to show that there exists ε̄ > 0 such that for all dx ∈ B(0, ε̄), g(x + dx) ≤ b implies f(x + dx) − f(x) < 0. Let S denote the unit hypersphere, i.e., S = {v ∈ Rⁿ : ‖v‖ = 1}. Recall from the lecture on Taylor expansions that if f is thrice continuously differentiable, then for all x, all v ∈ S and all θ > 0, there exists μ(v) ∈ [0, 1] such that

    f(x + θv) − f(x) = θ·∇f(x)v + θ²·v′Hf(x)v/2 + θ³·Tf³(x + μ(v)θv, v)/6,

where the last term is a cubic remainder term,

so that for θ > 0 and dx = θv,

    f(x + dx) − f(x) = θ²[ ∇f(x)v/θ + v′Hf(x)v/2 + θ·Tf³(x + μ(v)θv, v)/6 ].

By continuity and the fact that S is compact, there exists ω such that for all v ∈ S, |v′Hf(x)v/2| < ω and |Tf³(x + μ(v)θv, v)/6| < ω.¹ Because f is strictly quasi-concave, there exists δ > 0 such that ∇f(x)v = 0 implies v′Hf(x)v/2 < −2δ. By continuity, therefore, there exists ε > 0 such that for all v ∈ S, |∇f(x)v| < ε implies v′Hf(x)v/2 < −δ.

¹ The function h(v) = v′Hf(x)v/2 is continuous with respect to v; since {v ∈ S : ∇f(x)v = 0} is compact, it follows from Weierstrass's theorem that h(·) attains a maximum on this set at, say, v̄. Moreover, since h(·) is negative on this set, h(v̄) < −2δ < 0 for some δ > 0. Now consider the function Tf³ : B(x, θ̄) × S → R, where Tf³(x′, v) is the third-order Taylor term centered at x′ in the direction v. Since f is thrice continuously differentiable, Tf³(·,·) is a continuous function. Since the closure of B(x, θ̄) × S is compact, Tf³(·,·) is bounded. Hence there exists ω > 0 such that for all v ∈ S, |Tf³(x + μ(v)θv, v)/6| < ω.

Let θ̄ = min[ε, δ, 1]/2ω and observe that for all v ∈ S and all 0 < θ < θ̄, if dx = θv, then

    |∇f(x)v| ≥ ε implies: |∇f(x)v/θ| > 2ω, |v′Hf(x)v/2| < ω, and |θ·Tf³(x + μ(v)θv, v)/6| < ω,

while

    |∇f(x)v| < ε implies: |v′Hf(x)v/2| > δ and |θ·Tf³(x + μ(v)θv, v)/6| < δ.

Therefore, for all v ∈ S and all 0 < θ < θ̄,

    sign( f(x + dx) − f(x) ) = sign( ∇f(x)v )      if ∇f(x)v ≤ −ε;
    sign( f(x + dx) − f(x) ) = sign( v′Hf(x)v/2 )  if −ε < ∇f(x)v ≤ 0.

It follows, therefore, that for all v ∈ S and all 0 < θ < θ̄, ∇f(x)v ≤ 0 implies f(x + dx) − f(x) < 0. Moreover, since the KKT conditions are satisfied, ∇f(x)v > 0 implies ∇g(x)v > 0. Now, by assumption, ∇f(x) ≠ 0, so that λ > 0 and hence g(x) = b. Since g is linear, ∇g(x)v > 0 implies g(x + dx) > b (i.e., x + dx is outside the constraint set, so we don't have to worry about it). We have, therefore, shown

that f attains a local maximum on the constraint set. Moreover, since the constraint set is convex and f is quasi-concave, a local maximum is a global maximum. It follows that x is a solution to the NPP.
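The case analysis in the formal proof can be seen concretely with f(x₁, x₂) = x₁x₂ at x = (1, 1) (our choice), where the Taylor expansion is exact at second order: f(x + θv) − f(x) = θ(v₁ + v₂) + θ²v₁v₂, with ∇f(x)v = v₁ + v₂ and v′Hf(x)v/2 = v₁v₂. The sketch checks that the first-order term controls the sign when |∇f(x)v| ≥ ε, while in near-orthogonal non-increasing directions the negative second-order term takes over:

```python
import math

def delta_f(theta, v1, v2):
    """Exact change f((1,1) + theta*v) - f(1,1) for f = x1*x2."""
    return theta * (v1 + v2) + theta**2 * v1 * v2

epsilon = 0.1   # threshold on |grad f . v| = |v1 + v2|
theta = 0.01    # step size, small relative to epsilon

for k in range(720):
    t = 2 * math.pi * k / 720
    v1, v2 = math.cos(t), math.sin(t)
    first_order = v1 + v2     # grad f(1,1) . v
    second_order = v1 * v2    # v' Hf(1,1) v / 2
    change = delta_f(theta, v1, v2)
    if abs(first_order) >= epsilon:
        # far from orthogonality: the first-order term determines the sign
        assert change * first_order > 0
    elif first_order <= 0:
        # near-orthogonal, non-increasing directions: the second-order
        # term is negative there (v1*v2 near -1/2), so f falls
        assert second_order < 0 and change < 0
print("sign pattern matches the proof")
```

This is exactly the dichotomy in the displayed sign equation of the proof: sign(f(x + dx) − f(x)) tracks ∇f(x)v away from orthogonality and tracks v′Hf(x)v/2 near it, so every non-increasing direction strictly lowers f for small steps.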