ARE202A, Fall 2005

LECTURE #: WED, OCT 5, 2005    PRINT DATE: OCTOBER 5, 2005 (GRAPHICAL)

CONTENTS

1. Graphical Overview of Optimization Theory (cont)
1.4. Separating Hyperplanes
1.5. Constrained Maximization: One Variable
1.6. Unconstrained Maximization: Several Variables
1.7. Introduction to Taylor's Theorem
1.8. Level Sets, Upper and Lower Contour Sets and Gradient Vectors

1. GRAPHICAL OVERVIEW OF OPTIMIZATION THEORY (CONT)

1.4. Separating Hyperplanes.

A very important property of convex sets is that if they are almost disjoint (more precisely, if the intersection of their interiors is empty), then they can be separated by hyperplanes. For now, we won't be technical about what a hyperplane is. It's enough to know that lines, planes and their higher-dimensional equivalents are all hyperplanes.

As Fig. 4 illustrates, this property is completely intuitive in two dimensions. In the top two panels, we have convex sets whose interiors do not intersect, and we can put a line between them. Note that it doesn't matter whether or not the sets are disjoint. However, if the sets are disjoint, then there will be many hyperplanes that separate them; if they are not disjoint, but their interiors do not intersect, then there may be a unique hyperplane separating them. What condition guarantees a unique hyperplane? Differentiability of the boundaries of the sets.

[FIGURE 4. Convex sets whose interiors have empty intersection can be separated by a hyperplane. Panels: disjoint; interiors have empty intersection; interiors intersect; one set isn't convex.]

In the bottom two panels, we can't separate the sets by a hyperplane. In the bottom left panel, it's because the sets have common interior points; in the bottom right, it's because one set isn't convex.

Why do we care about separating hyperplanes? They crop up all over the place in economics, econometrics and finance. Two examples will suffice here:

(1) The budget set and the upper contour set of a utility function.
(2) The Edgeworth box.

Telling you any more would mean teaching you economics, and I don't want to do that!!
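To make the separation idea concrete, here is a minimal numerical sketch (mine, not part of the notes): for two finite point sets whose convex hulls are strictly disjoint, a hyperplane w·x = b separating them can be found by solving a linear feasibility program. The particular sets A and B and the unit-margin normalization are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Two point clouds standing in for convex sets (their convex hulls are disjoint).
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = np.array([[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])

# Look for z = (w1, w2, b) with w.x <= b - 1 on A and w.y >= b + 1 on B.
# This is a pure feasibility problem, so the LP objective is zero.
rows, rhs = [], []
for x in A:                         # w.x - b <= -1
    rows.append(np.append(x, -1.0))
    rhs.append(-1.0)
for y in B:                         # -(w.y - b) <= -1
    rows.append(np.append(-y, 1.0))
    rhs.append(-1.0)

n = A.shape[1]
res = linprog(c=np.zeros(n + 1), A_ub=np.array(rows), b_ub=np.array(rhs),
              bounds=[(None, None)] * (n + 1))
if res.success:
    w, b = res.x[:n], res.x[n]
    print(f"separating hyperplane: {w} . x = {b:.3f}")
else:
    print("no strictly separating hyperplane: interiors may intersect")
```

If the two sets are moved until their interiors overlap, the LP becomes infeasible, which is the algebraic counterpart of the bottom-left panel of Fig. 4.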

[FIGURE 5. Constrained optima of a one-dimensional function. Left panel: f is defined on the interval [x̲, x̄] and has three local maxima. Right panel: f is defined on the interval [0, ∞) and has two local maxima.]

1.5. Constrained Maximization: One Variable.

Question: So far, we've been declaring that a necessary condition for a local maximum at x* is that f′(x*) = 0. That is, if the slope isn't zero at x*, then I know I don't have a local maximum at x*. Now that was true, because of the way in which I defined the function f, but it was only true given this caveat. What was the critical part of the definition of f? In other words, under what conditions can a differentiable function f have a local maximum at x* even though the slope isn't zero at x*?

Answer: Whenever you are looking for a maximum on, say, an interval of the real line, the above condition needn't hold. In this case, we say that we are looking for a constrained maximum. The constraint is the fact that the maximum you are looking for must lie in a specific set. Typically, in economic problems, the maximization problems we have to solve are constrained maximization problems; very rarely are economic problems completely unconstrained.

(1) The most obvious constraint is that in many cases, the variable that you are maximizing must be nonnegative (i.e., positive or zero).
(2) It might also be the case that you have bounds on either end of the problem, e.g., find the best location for a gas station between San Francisco and Los Angeles.

Consider Fig. 5.

(1) The graph on the left is defined on the interval [x̲, x̄]. It has three local maxima, but only for one of them is the slope zero. What are the conditions?
(a) A necessary condition for f to attain a maximum at the left endpoint x̲ is that the slope of f at x̲ is zero or negative.
(b) A necessary condition for f to attain a maximum at the right endpoint x̄ is that the slope of f at x̄ is zero or positive.
(c) A necessary condition for f to attain a maximum at a point x strictly between these endpoints is that the slope of f at x is zero.

(2) The graph on the right is defined on the interval [0, ∞). In this case, we only have to worry about the point zero and all other points. What are the conditions?
(a) A necessary condition for f to attain a maximum at 0 is that the slope of f at zero is nonpositive.
(b) A necessary condition for f to attain a maximum at a positive number x is that the slope of f at x is zero.

These conditions are known as the Kuhn-Tucker necessary conditions for a constrained maximum. (Actually, they are known as the Karush-Kuhn-Tucker conditions. Karush was a graduate student who was involved in developing the conditions, but his name has been largely lost to history.)
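These sign conditions are easy to check numerically. Here is a small sketch (the example functions are mine, not the notes'; the two-sided difference assumes f is defined slightly outside the interval):

```python
def slope(f, x, eps=1e-6):
    """Two-sided numerical derivative of f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def satisfies_max_conditions(f, x, lo, hi, tol=1e-4):
    """Kuhn-Tucker necessary conditions for a max of f at x on [lo, hi]."""
    s = slope(f, x)
    if x == lo:
        return s <= tol        # left endpoint: slope zero or negative
    if x == hi:
        return s >= -tol       # right endpoint: slope zero or positive
    return abs(s) <= tol       # interior point: slope zero

f = lambda x: -(x - 2.0) ** 2  # interior peak at x = 2
g = lambda x: -(x + 1.0) ** 2  # strictly decreasing on [0, 5]

print(satisfies_max_conditions(f, 2.0, 0.0, 5.0))  # True: interior, slope ~ 0
print(satisfies_max_conditions(g, 0.0, 0.0, 5.0))  # True: corner max, slope < 0
print(satisfies_max_conditions(g, 5.0, 0.0, 5.0))  # False: slope < 0 at right end
```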

[FIGURE 6. Maximum of a function of two variables.]

[FIGURE 7.]

1.6. Unconstrained Maximization: Several Variables.

Fig. 6 above is the graph of a function f which has a nice global maximum in the interior of its domain. We are interested again in the relationship between the slope of the function f at a point x and the extrema (maxima and minima) of the function. In this case, it's less clear what we mean by the slope of a function of several variables. If the function has n variables, then the slope of the function is an n-component vector. Each component of the vector denotes the slope of the function in a different direction. That is, slice the graph through the maximum point, (a) in a direction parallel to the horizontal axis, and (b) in a direction parallel to the vertical axis. Each time you do this, you get a single-variable function.

The slope you get from (a) is the first component of the vector of slopes; it's also known as the first partial derivative of the function with respect to the first variable. The slope you get from (b) is the second component of the vector of slopes; it's also known as the first partial derivative of the function with respect to the second variable.

The vector itself is sometimes called the derivative of f at x; it is also referred to as the gradient of f, evaluated at x. It is denoted by f′(x), where the bold symbols denote vectors. In other words, the notation used for the slope, the first derivative and the gradient of f is the same.

Now we can go back to the issue of unconstrained maximization.

Question: Suppose I know that f attains an (unconstrained) local maximum at x*, i.e., a maximum at an interior point of the domain of f. What can I say then about the slope of f at x*, i.e., the vector f′(x*)?

[FIGURE 8. Zero partials don't necessarily guarantee zero slopes in other directions.]

Answer: The slope of f has to be zero at x*, i.e., f′(x*) = 0. In this case, the statement f′(x*) = 0 says that the whole vector is zero.

Note: There are lots of directions, and the derivative only picks out two of them, i.e., the directions parallel to the axes. It turns out that if certain conditions are satisfied (a sufficient, but not necessary, condition is that each of the partials is a continuous function¹), then knowing that the two partial derivatives are zero guarantees that the slope of the function is zero in every direction. To see that this isn't true generally, consider the function

    f(x, y) = x²y²/(x² − y²) if x ≠ ±y,  and  f(x, y) = 0 otherwise,

graphed in Fig. 8. The partial derivatives of this function are well defined for all x, y provided x ≠ ±y, and when either x or y is zero, both derivatives are zero. However, clearly you don't have a max/min at zero, nor are the slopes in other directions all zero. The problem with this example is that in a neighborhood of zero, the partial derivatives of f aren't continuous functions. To demonstrate this, Fig. 9 plots f_x(·) as we go around a circle centered at zero with very small radius δ > 0. On the horizontal axis we measure the angle θ between x and the vector (1, 0). The diagonals occur at 45°, 135°, etc. As the figure illustrates, the derivative is well defined everywhere except along the diagonals, but since the same picture holds for all positive δ, the function is not continuously differentiable in a neighborhood of zero.

Summary: A necessary condition for f to attain a local maximum at x* in the interior of its domain is that the slope of every cross-section of the function is zero. If each of the partial derivatives of f(·) is continuous at x*, then zero slopes in the directions of each axis imply zero slopes in every direction.

¹ Marsden and Tromba, Vector Calculus, Third Edition, Theorem 9, p. 17.
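Here is a quick numerical reproduction of the Fig. 9 exercise (a sketch of mine; the formula for f is as reconstructed above): evaluate the partial derivative f_x around a small circle and watch it stay near zero on the axes but blow up as the angle approaches the diagonals.

```python
import numpy as np

def f(x, y):
    """x^2 y^2 / (x^2 - y^2) off the diagonals, 0 on them."""
    d = x * x - y * y
    return 0.0 if d == 0 else x * x * y * y / d

def fx(x, y, eps=1e-9):
    """Numerical partial derivative of f with respect to x."""
    return (f(x + eps, y) - f(x - eps, y)) / (2 * eps)

delta = 1e-3                                 # small circle radius, as in Fig. 9
for deg in (0.0, 30.0, 44.0, 44.9, 90.0):
    th = np.radians(deg)
    x, y = delta * np.sin(th), delta * np.cos(th)
    print(f"theta = {deg:5.1f} deg   fx = {fx(x, y):10.4f}")
# Output pattern: fx is ~0 at 0 and 90 degrees, small at 30 degrees, and
# explodes as theta -> 45 degrees, so fx is not continuous near the origin.
```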

[FIGURE 9. Plot of f_x(δ sin θ, δ cos θ) as θ goes around a circle of radius δ.]

Knowing that the slope of every cross-section is zero at x* does not in general imply that f(·) attains a local maximum at x*. Fig. 10 below is an example in which the graph of the function is flat at x*, i.e., the slopes of all of the cross-sections are zero, but you don't have a local maximum at x*. In this example, you have what's called a saddlepoint of the function.

[FIGURE 10. Saddle point of a function of two variables.]
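At a critical point, the eigenvalues of the matrix of second partials (the Hessian, discussed below) classify what kind of flat spot you have. A minimal sketch, using the standard saddle f(x, y) = x² − y² as an assumed example:

```python
import numpy as np

# f(x, y) = x**2 - y**2: the gradient (2x, -2y) vanishes only at the origin,
# but the origin is a saddle, not a max or a min.
H = np.array([[2.0, 0.0],     # [f_xx  f_xy]
              [0.0, -2.0]])   # [f_yx  f_yy]
eig = np.linalg.eigvalsh(H)   # eigenvalues of the symmetric Hessian

if np.all(eig < 0):
    print("negative definite: strict local maximum")
elif np.all(eig > 0):
    print("positive definite: strict local minimum")
elif np.any(eig < 0) and np.any(eig > 0):
    print("indefinite: saddle point")   # this branch fires: eig = [-2, 2]
else:
    print("semidefinite: the second-order test is inconclusive")
```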

Question: Suppose I know that the slope of f is zero at x*. What additional information about the slope of f at x* do I need in order to know that f attains a strict local maximum at x*?

First shot at an answer: A first guess is that the slopes of all of the partial derivatives have to be decreasing, i.e., the second-order partials must all be negative. This condition would rule out the possibility of a saddle point. It turns out, however, that it isn't enough. The reason is that just looking at partials doesn't give you information about what's going on with the other directional derivatives. Imagine a graph where, if you took circular cross-sections (i.e., sliced the graph with a circular cookie-cutter), what you got when you laid out the cut was a sine curve, with zeros corresponding to the points at the axes. Here you might think you had a weak local maximum if you just looked at the second-order partials, but you can't put a flat board on top of the graph.

[FIGURE 11. Negative second partials don't imply a maximum.]

More concretely, consider Fig. 11, which is the graph of the function

    f(x, y) = (xy)²/(x² + y²) − 0.1(x² + y²) if (x, y) ≠ (0, 0),  and  f(x, y) = 0 if x = y = 0.

To see what's going on with this function, we'll do an exercise similar to the one we did for Fig. 8, except that instead of plotting partial derivatives, we'll look at the second derivatives of the diagonal cross-sections as we work our way around the unit circle. More precisely, for each direction h ∈ [0°, 360°] we look at f_hh(h₁, h₂), which is the directional derivative in direction h of the directional derivative in direction h of f. (This hideous but unavoidable terminology will become familiar later on.) Observe that along the directions parallel to the axes, i.e., when θ ∈ {0°, 90°, 180°, 270°, 360°}, the slope of the slope of the diagonal cross-section is negative, while along directions parallel to the 45-degree lines, i.e., when θ ∈ {45°, 135°, 225°, 315°}, it is positive. For a negative definite function, this graph would be below the origin for all values of θ.

[FIGURE 12. Slope of slope of diagonal cross-sections: f_hh(sin θ, cos θ) plotted against θ.]

Note well that this picture is not a pathological one like Fig. 8: all of the derivatives in sight are nicely continuously differentiable. So we've established that negativity of the second partials, i.e., f_ii < 0 for each i = 1, ..., n, isn't enough to guarantee a maximum. So what do we need? The requirement is that the second derivative of the function f evaluated at x* (which is a matrix) is negative definite. What does this mean? Mathematically it's a bit complex, but diagrammatically it means that you can put a flat board on top of the graph of the function at x* and the graph will be everywhere below the board. Note that in Fig. 11, you can't do this. Similarly, to establish that f attains a local minimum at x*, you need to show that you can put a flat board below the graph of the function at x* and have the graph everywhere above the board.
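The Fig. 12 exercise is easy to replicate numerically (my sketch; f as reconstructed above). The directional second derivative along a unit direction is just the second derivative of the one-dimensional cross-section:

```python
import numpy as np

def f(x, y):
    r2 = x * x + y * y
    return 0.0 if r2 == 0 else (x * y) ** 2 / r2 - 0.1 * r2

def fhh(theta_deg, eps=1e-4):
    """Second derivative at 0 of the cross-section of f along direction theta."""
    h1, h2 = np.sin(np.radians(theta_deg)), np.cos(np.radians(theta_deg))
    g = lambda s: f(s * h1, s * h2)               # one-dimensional slice
    return (g(eps) - 2.0 * g(0.0) + g(-eps)) / eps ** 2

for deg in (0, 45, 90, 135, 180):
    print(f"theta = {deg:3d} deg   f_hh = {fhh(deg):+.3f}")
# Along the axes f_hh = -0.2 < 0, along the diagonals f_hh = +0.3 > 0:
# every second partial is negative, yet f has no local max at the origin.
```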

Terminology: Confusion always arises concerning the term "second partial". It could mean the second of the first partial derivatives, i.e., the partial derivative with respect to the second element of the domain. I'll try to be consistent about the other usage: a second partial, for me, will (almost) always refer to a partial derivative of a partial derivative.

Summary: If the first partial derivatives are all zero, then unless the function is pathological (nasty, not continuously differentiable), derivatives in all directions will also be zero. But even if all the second partial derivatives have the same sign, there is nothing pathological about a function in which the second partial in some direction not parallel to an axis has a different sign. In the summary table below, the i's denote components of Rⁿ, i.e., i = 1, ..., n, while the h's denote directions (of which there are infinitely many).

Conditions for a strict local maximum at x*:

                             f: R -> R     f: R^n -> R
                             (you need)    A (inadequate)        B (you need)          A plus this guarantees B
    First-order condition    f'(x*) = 0    f_i(x*) = 0, all i    f_h(x*) = 0, all h    the f_i(.)'s continuous
    Second-order condition   f''(x*) < 0   f_ii(x*) < 0, all i   f_hh(x*) < 0, all h   Hessian negative definite

Local vs. Global Optima: Once again, the first and second derivatives of the function at x* tell you nothing in general about whether or not a local extremum of the function at x* is a global extremum. There are, however, conditions that we can impose on the function that do guarantee that a local extremum is either a global maximum or a global minimum.

Question: When the function f satisfies a certain property, knowing f′(x*) = 0 is SUFFICIENT to conclude that f attains a GLOBAL maximum at x*. What is that property?

Answer: Same as before: concavity. A function is concave if the set of points that lie below the graph of the function is a convex set. Alternatively, the matrix of second partial derivatives (the Hessian) has to be everywhere negative semidefinite. Once again, to see the role of concavity, consider the surface of a camel. Each hump is a local maximum, but one of them isn't a global one (unless the camel has perfectly balanced humps). You can't put a flat board on top of both humps of the camel and have the camel be entirely underneath the board. The same goes for minima.
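The camel story in code (a sketch with an assumed two-hump profile, not the notes' example): a local search started near either hump ends at a point satisfying the first-order condition, but only one hump is global.

```python
import numpy as np
from scipy.optimize import minimize

# A two-humped "camel" profile: local maxima near x = -1 and x = +1,
# with the right hump strictly higher.
f = lambda x: np.exp(-(x - 1.0) ** 2) + 0.6 * np.exp(-(x + 1.0) ** 2)

for x0 in (-2.0, 2.0):                            # start near each hump
    res = minimize(lambda x: -f(x[0]), x0=[x0])   # maximize f by minimizing -f
    x_star = res.x[0]
    print(f"start {x0:+.1f} -> critical point x* = {x_star:+.3f}, f = {f(x_star):.3f}")
# Both runs end where f'(x*) ~ 0, but only the hump near x = +1 is the
# global max. If f were concave (one hump), f'(x*) = 0 would be both
# necessary and sufficient for a global maximum.
```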

Note well: Earlier, I made a big deal of the fact that f″(x*) ≤ 0 was not a sufficient condition (along with f′(x*) = 0) for a max, and gave the counterexample f(x) = x⁴, which has the property that f′(0) = f″(0) = 0 but attains a strict global minimum at 0. Yet here I don't require strictness, only weak concavity. The reason for the difference is that in the paragraph above I'm stipulating (at least for a function with one argument) that f″(·) ≤ 0, i.e., that the second derivative is everywhere nonpositive, which is a much stronger condition.

Fact: If f is a differentiable concave function, then a necessary and sufficient condition for f to attain a global maximum at x* in the interior of its domain is that f′(x*) = 0. Note that this is a vector equality.

Fact: If f is a differentiable convex function, then a necessary and sufficient condition for f to attain a global minimum at x* in the interior of its domain is that f′(x*) = 0. Once again, this is a vector equality.

1.7. Introduction to Taylor's Theorem.

This is a topic which we will return to in much greater detail later on; I'm just going to introduce it here because you will see it a lot. Taylor's theorem says that if f is k times differentiable, then given x̄ and x,

    f(x) ≈ f(x̄) + f′(x̄)dx + (1/2)f″(x̄)dx² + (1/6)f‴(x̄)dx³ + ... + (1/k!)f⁽ᵏ⁾(x̄)dxᵏ,

where dx = x − x̄. We'll call the right-hand side a kth-order Taylor expansion of f about x̄. The more terms we add, the better the approximation. Note that the expansion will in general look very different depending on the point at which you evaluate the derivatives.

This theorem has a million applications; indeed, it is the basis of about 90% of what we use calculus for. We'll discuss it in depth later, but for now, just a couple of pictures. Consider Fig. 13. The function f(·) is quite well approximated by the affine function A(·), but it is better approximated by the quadratic function Q(·), and it would be better still approximated by a cubic function, etc. If f were a kth-order polynomial, then f would be perfectly approximated by a Taylor expansion that included k terms.
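A quick numerical look at how the approximation improves with the order of the expansion (my sketch; the exponential is an assumed example, chosen because all of its derivatives at x̄ = 0 equal 1):

```python
import numpy as np

f = np.exp
x_bar = 0.0
A = lambda dx: f(x_bar) + f(x_bar) * dx           # affine:    f(xb) + f'(xb)dx
Q = lambda dx: A(dx) + 0.5 * f(x_bar) * dx ** 2   # quadratic: A + (1/2)f''(xb)dx^2

for dx in (0.5, 0.1, 0.01):
    x = x_bar + dx
    print(f"dx = {dx:5.2f}   |f - A| = {abs(f(x) - A(dx)):.2e}   "
          f"|f - Q| = {abs(f(x) - Q(dx)):.2e}")
# The affine error shrinks like dx^2, the quadratic error like dx^3: each
# extra Taylor term buys one more order of accuracy near x_bar.
```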

[FIGURE 13. Approximating a function with linear and quadratic functions: A(·) = a + b·dx = f(x̄) + f′(x̄)dx, and Q(·) = a + b·dx + c·dx² = A(·) + (1/2)f″(x̄)dx².]

We've been consistently using one major application of Taylor's theorem in the past couple of lectures. If you have a nonlinear function which attains an extremum (i.e., a max, a min or an inflexion point), you can't distinguish the type of extremum just by considering a first-order Taylor approximation; with a higher-order approximation you can, provided the quadratic term is not zero. See Fig. 14. If the quadratic term is zero (e.g., you're trying to distinguish between xⁿ and −xⁿ at x = 0, where n > 2), then you must look at higher-order Taylor expansions in order to distinguish the various possibilities. The rule is: if you go out as far as the first non-zero term in the expansion, you'll be able to figure out what's going on.

Orders of Small: You'll often hear professors say things casually in class such as "that's a second-order difference, we can ignore it." Most students don't have a clue what they're talking about; here's what they mean. Suppose you have two functions f and g and you are interested in what these functions look like in a neighborhood of x̄. If their first-order ((n−1)th-order) Taylor expansions are identical, but their second-order (nth-order) expansions are different (i.e., their values and first derivatives at x̄ coincide, but their second derivatives do not), then we say that the difference between the two functions is a second-order (nth-order) difference. Sometimes we say that the difference is second-order small.
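The first-non-zero-term rule can be automated symbolically. A sketch using sympy (the example functions are mine): at a critical point, scan derivatives of increasing order until one is non-zero, then read off the sign and parity.

```python
import sympy as sp

x = sp.symbols('x')

def classify(f, x0=0, max_order=8):
    """Classify a critical point by the first non-zero Taylor term."""
    for n in range(2, max_order + 1):
        c = sp.diff(f, x, n).subs(x, x0)   # nth derivative at x0
        if c != 0:
            if n % 2 == 1:
                return "no extremum (odd-order term, e.g., an inflexion)"
            return "strict local min" if c > 0 else "strict local max"
    return "inconclusive up to max_order"

for f in (x**4, -x**4, x**3):
    print(f, "->", classify(f))
# x**4 -> strict local min, -x**4 -> strict local max, x**3 -> no extremum
```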

[FIGURE 14. Distinguishing maxima from minima with a quadratic approximation (linear approximation vs. quadratic approximation).]

For example, consider the left panel of Fig. 15, in which the indifference curve U(·) pictured is tangent to the budget line P(·) at x̄. (View both the indifference curve and the budget line as functions taking x to y.) Now take first-order Taylor expansions of both functions at x̄. Because the first derivatives of U(·) and P(·) are equal at x̄, the first-order Taylor expansion of the former, denoted A^U_x̄(·) and represented by the green dashed line, overlaps the first-order expansion of the latter, A^P_x̄(·), which coincides with the affine function P(·) itself. Evaluated at a point x* ≠ x̄, there is a difference between U(·) and P(·), but the evaluations of A^U_x̄(·) and A^P_x̄(·) coincide. I haven't drawn them in the figure, but clearly the second derivative of U is non-zero; hence the second-order expansions Q^U_x̄(·) and Q^P_x̄(·) would not coincide. Hence, the difference U(x*) − P(x*) is said to be a second-order difference.

In the right panel of Fig. 15, on the other hand, the first-order Taylor expansions A^U_x̂(·) and A^P_x̂(·) are taken at a different starting point (x̂ rather than x̄), and they no longer overlap (because dU(x̂)/dx ≠ dP(x̂)/dx). In this case, the difference U(x*) − P(x*) is said to be a first-order difference.

[FIGURE 15. First- and second-order differences. Left panel: the first-order expansions A^U_x̄(·) and A^P_x̄(·) coincide with P(·), and U(x̄) = P(x̄). Right panel: the expansions A^U_x̂(·) and A^P_x̂(·), taken at x̂, no longer overlap.]
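A numerical check of this distinction (my sketch; U and P are assumed examples): take U(x) = 1/x, the indifference curve of u(x, y) = xy at utility level 1, and its tangent line P(x) = 2 − x at x̄ = 1. Against the tangent line, the gap is second-order small; against a line through the same point with the wrong slope, it is only first-order small.

```python
U = lambda x: 1.0 / x          # indifference curve of u(x, y) = x*y at level 1
P = lambda x: 2.0 - x          # tangent line at x_bar = 1 (slope -1 = U'(1))
L = lambda x: 3.0 - 2.0 * x    # line through (1, 1) with the wrong slope (-2)

for dx in (0.2, 0.1, 0.05):
    x = 1.0 + dx
    print(f"dx = {dx:4.2f}   U - P = {U(x) - P(x):.5f}   U - L = {U(x) - L(x):.5f}")
# Halving dx roughly quarters U - P (a second-order difference) but only
# halves U - L (a first-order difference).
```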

1.8. Level Sets, Upper and Lower Contour Sets and Gradient Vectors.

The next step is constrained maxima with several variables, but first we need some terminology. You all presumably know what contour lines are on a map: lines joining up the points on the map that are at the same height above sea level. It's the same idea in math, with slightly different terms. There are three terms that you need to know:

(1) Level set: A level set of a function f consists of all of the points in the domain of f at which the function takes a certain value. In other words, take any two points that belong to the same level set of a function f: this means that f assigns the same value to both points.

(2) Upper contour set: An upper contour set of a function f consists of all of the points in the domain of f at which the value of the function is at least a certain value. We talk about the upper contour set of a function f corresponding to α, referring to the set of points to which f assigns a value of at least α.

(3) Lower contour set: A lower contour set of a function f consists of all of the points in the domain of f at which the value of the function is no more than a certain value.

[FIGURE 16. Level and contour sets of a concave function.]

For example, consider Fig. 16. The level sets are indicated on the diagram by dotted lines. Here is a very important fact that everybody gets wrong: the level sets are the lines on the horizontal plane at the bottom of the graph, NOT the lines that are actually on the graph. That is, the level sets are sets of points in the domain of the function above which the function is the same height. Pick the first level set in the picture, and suppose that the height of the function at every point on this level set is α. Notice that for every point above and to the right of this level set, the value of the function is larger than α. Hence the set of points on or above and to the right of this level set is the upper contour set of the function corresponding to the value α.
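In code, level and contour sets are just boolean masks over the domain (my sketch; the function is an assumed Cobb-Douglas-style example whose level sets are the familiar hyperbolic indifference curves):

```python
import numpy as np

# A grid over the domain: the sets below live in the (x1, x2)-plane.
x1, x2 = np.meshgrid(np.linspace(0.1, 4.0, 400), np.linspace(0.1, 4.0, 400))
z = x1 * x2                       # f(x1, x2) = x1 * x2
alpha, tol = 2.0, 0.02

level = np.abs(z - alpha) <= tol  # points (approximately) on the level set {f = alpha}
upper = z >= alpha                # upper contour set {f >= alpha}
lower = z <= alpha                # lower contour set {f <= alpha}

print("grid points near the level set:", int(level.sum()))
print("share of the domain in the upper contour set:", round(float(upper.mean()), 3))
# Note that the masks mark points of the DOMAIN, not points on the graph of
# f: the height of the function is never plotted, exactly as in Fig. 16.
```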

This is a source of endless confusion for everybody: compare the two curves in Fig. 17.

[FIGURE 17. The graph of one function and the level sets of another.]

The two curves are identical except for the labels, but the interpretations of the two pictures are entirely different.

(1) On the left, we have the graph of a function of one variable: the area NE of the line is the area above the graph; the area SW of the line is the area below the graph.

(2) On the right, we have a level set of a function of two variables: the area NE of the line is an upper contour set of the function; the area SW of the line is a lower contour set of the function. In this case, the two-dimensional picture represents the domain of the function; the height of the function isn't drawn.

Where are the upper contour sets located in the left panel of the figure? Ans: pick α on the vertical axis, and find the point x_α on the horizontal axis that is mapped to α. The interval [0, x_α] is the upper contour set corresponding to α.

Some familiar economic examples of level sets and contour sets:

(1) Level sets that you know by other names: indifference curves; isoprofit lines; the budget line (p·x = y); the production possibility frontier (this is the zero level set of the function q − f(x)).

(2) Lower contour sets that you know by other names: budget sets; the production possibility set.

(3) Upper contour sets that you know by other names: think of the region of comparative advantage in an Edgeworth box; this is the intersection of the upper contour sets of the two traders' utility functions.

Some practice examples for level sets:

What are the level sets of a single-variable function with no flat spots? Ans: a discrete (i.e., separated) set of points.

What are the level sets of a concave single-variable function with no flat spots? How many points can be in a level set? Ans: at most two.

Now consider a function f of two variables that has a strict local maximum at x* (i.e., f is strictly higher at x* than at all other points in some neighborhood of x*). What can you say about the level set of the function through x*? Ans: the point x* must be an isolated point of the level set: not necessarily the unique point in the level set, but certainly isolated.