Introduction to Convex Analysis

Microeconomics II - Tutoring Class
Professor: V. Filipe Martins-da-Rocha
TA: Cinthia Konichi
April 2010
EPGE-FGV

1 Basic Concepts and Results

This is a first glance at basic convex analysis results that will be used extensively during the course. Convexity is a very important concept in optimization theory since, when it is assumed, necessary conditions for optimality become sufficient conditions. For an introduction to convex analysis as applied to optimization theory, see Florenzano and Le Van (2001) and Izmailov and Solodov (2005). Borwein and Lewis (2000) and Bertsekas et al. (2003) are other useful references. For a thorough treatment see Rockafellar (1970).

We begin by characterizing convex sets. Let $E$ be a real vector space. Then the following definitions are appropriate:

Definition 1 A set $D \subset E$ is said to be convex if for any $x, y \in D$ the set $\{\alpha x + (1-\alpha)y : \alpha \in [0,1]\}$ is contained in $D$. The point $\alpha x + (1-\alpha)y$ is called a convex combination of $x$ and $y$ (with parameter $\alpha$).

By induction, it is easily seen that $D$ is convex if and only if $\sum_{k=1}^{p} \lambda_k x_k \in D$ for every finite set $\{x_1, \dots, x_p\}$ of $p$ elements of $D$ and for every system of $p$ nonnegative real coefficients $\{\lambda_1, \dots, \lambda_p\}$ such that $\sum_{k=1}^{p} \lambda_k = 1$. Hence, a subset $D$ of $E$ is convex if and only if every convex combination of finitely many elements of $D$ belongs to $D$. Some examples of convex sets are the open (or closed) balls in a vector space, line segments, the vector space itself and the empty set.
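As a quick numerical illustration of Definition 1, the sketch below (Python with NumPy; the helper names `convex_combination` and `in_closed_unit_ball` are our own, not from the text) spot-checks that the closed unit ball is convex by sampling pairs of points and parameters:

```python
import numpy as np

def convex_combination(x, y, alpha):
    """The point alpha*x + (1 - alpha)*y from Definition 1."""
    return alpha * np.asarray(x, dtype=float) + (1 - alpha) * np.asarray(y, dtype=float)

def in_closed_unit_ball(z):
    """Membership test for the closed unit ball, a typical convex set."""
    return np.linalg.norm(z) <= 1.0 + 1e-12

rng = np.random.default_rng(0)
checked = 0
while checked < 500:
    x, y = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)
    if not (in_closed_unit_ball(x) and in_closed_unit_ball(y)):
        continue  # keep only points that belong to the set
    alpha = rng.uniform(0, 1)
    # Definition 1: the convex combination must stay in the set.
    assert in_closed_unit_ball(convex_combination(x, y, alpha))
    checked += 1
```

The same predicate-based check applies to any candidate set: a single failing sample certifies non-convexity, while passing samples only give supporting evidence.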
Figure 1: A convex and a non-convex set

Lemma 2 Let $\Lambda$ be an arbitrary set and $\{D_\lambda\}_{\lambda \in \Lambda}$ a family of convex subsets of $E$. Then $D = \bigcap_{\lambda \in \Lambda} D_\lambda$ is convex.

Proof: Let $x \in D$ and $y \in D$. Then $x \in D_\lambda$ and $y \in D_\lambda$ for all $\lambda \in \Lambda$. Since the sets $D_\lambda$, $\lambda \in \Lambda$, are convex, $\alpha x + (1-\alpha)y \in D_\lambda$ for any $\alpha \in [0,1]$ and all $\lambda \in \Lambda$. Hence $\alpha x + (1-\alpha)y \in D$, that is, $D$ is convex. □

Thus, in an optimization problem, for example, the set defined by an arbitrary number of convex constraints on the same space turns out to be convex.

With any arbitrary subset $C$ of $E$ we can associate another set, called the convex hull of $C$, denoted by $\operatorname{co} C$, which is the intersection of all convex subsets of $E$ containing $C$. By Lemma 2, the convex hull is convex. Moreover, it is the smallest convex subset of $E$ containing $C$.

Lemma 3 Let $D \subset E$ be a convex set; then $\operatorname{cl} D$ is convex.¹

Proof:² Let $x, y \in \operatorname{cl} D$ and $\alpha \in (0,1)$. Then there are sequences $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ in $D$ such that $x_n \to x$ and $y_n \to y$. By the convexity of $D$, $\alpha x_n + (1-\alpha)y_n \in D$ for all $n \in \mathbb{N}$. Then $\alpha x_n + (1-\alpha)y_n \to \alpha x + (1-\alpha)y$ implies $\alpha x + (1-\alpha)y \in \operatorname{cl} D$. □

Figure 2: The closure and the convex hull of a set C

¹ Here $\operatorname{cl} D$ is the closure of $D$. Remember that the closure of an arbitrary set $B$ in a vector space is the set of limit points of sequences belonging to $B$.
² For convenience, assume that $E$ is a normed real vector space and take the usual convergence concept.
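Lemma 2 can also be probed numerically. The minimal sketch below (Python/NumPy; the two predicates are our own illustrative choices, not from the text) represents two convex sets by membership tests and spot-checks that their intersection is again convex:

```python
import numpy as np

# Two convex subsets of R^2, given as membership predicates:
ball = lambda z: np.linalg.norm(z) <= 1.0        # closed unit ball
halfspace = lambda z: z[0] + z[1] <= 0.5         # closed half-plane

# Their intersection, convex by Lemma 2:
intersection = lambda z: bool(ball(z)) and bool(halfspace(z))

rng = np.random.default_rng(0)
hits = 0
while hits < 200:
    x, y = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)
    if not (intersection(x) and intersection(y)):
        continue  # sample until both points lie in the intersection
    a = rng.uniform(0, 1)
    # a convex combination of points of the intersection stays inside it
    assert intersection(a * x + (1 - a) * y)
    hits += 1
```

Since Lemma 2 allows an arbitrary index set, the same pattern extends to any finite family of predicates combined with `and`.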
Definition 4 Let $D \subset E$ be a convex set. The function $f : D \to \mathbb{R}$ is convex on $D$ when for any $x \in D$, $y \in D$ and $\alpha \in [0,1]$ we have:
$$f(\alpha x + (1-\alpha)y) \le \alpha f(x) + (1-\alpha) f(y)$$
The function $f$ is said to be strictly convex if the inequality above is strict for all $x \ne y$ and $\alpha \in (0,1)$.

Figure 3: An illustration of a convex function

Definition 5 Let $D \subset E$ be a convex set. The function $f : D \to \mathbb{R}$ is concave on $D$ if $(-f)$ is convex on $D$.

The next lemma states an equivalent definition of concavity:

Lemma 6 Let $D \subset E$ be a convex set. Then $f : D \to \mathbb{R}$ is concave if and only if the set
$$\operatorname{hypo} f := \{(x,\mu) \in D \times \mathbb{R} : f(x) \ge \mu\}$$
is convex. This set is called the hypograph of $f$.

Proof: First, suppose that $\operatorname{hypo} f$ is convex. Let $x \in D$ and $y \in D$. Clearly, $(x, f(x)) \in \operatorname{hypo} f$ and $(y, f(y)) \in \operatorname{hypo} f$. Because of the convexity of $\operatorname{hypo} f$, for all $\alpha \in [0,1]$ we have:
$$\alpha(x, f(x)) + (1-\alpha)(y, f(y)) = (\alpha x + (1-\alpha)y,\ \alpha f(x) + (1-\alpha) f(y)) \in \operatorname{hypo} f$$
By the definition of $\operatorname{hypo} f$, we have:
$$f(\alpha x + (1-\alpha)y) \ge \alpha f(x) + (1-\alpha) f(y)$$
So $f$ is concave. Conversely, suppose now that $f$ is concave. Let $(x, c_1) \in \operatorname{hypo} f$ and $(y, c_2) \in \operatorname{hypo} f$. Since $f(x) \ge c_1$ and $f(y) \ge c_2$, by the concavity of $f$, for all $\alpha \in [0,1]$ we have:
$$f(\alpha x + (1-\alpha)y) \ge \alpha f(x) + (1-\alpha) f(y) \ge \alpha c_1 + (1-\alpha) c_2$$
which means that:
$$\alpha(x, c_1) + (1-\alpha)(y, c_2) = (\alpha x + (1-\alpha)y,\ \alpha c_1 + (1-\alpha) c_2) \in \operatorname{hypo} f$$
Hence $\operatorname{hypo} f$ is convex. □

We say that
$$\max f(x) \quad \text{subject to} \quad x \in D \tag{1}$$
is a convex maximization problem when $D \subset E$ is a convex set and $f : D \to \mathbb{R}$ is concave on $D$. The importance of the convexity assumption can be seen in the following result.

Theorem 7 Let $D \subset E$ be a convex set and $f : D \to \mathbb{R}$ a concave function on $D$. Then every local maximum of problem (1) is global. Moreover, the set of elements that maximize the problem is convex. If $f$ is strictly concave, the problem does not have more than one maximizer.

Proof: Suppose by way of contradiction that $\bar x \in D$ is a local maximizer that is not global. Then there exists $y \in D$ such that $f(y) > f(\bar x)$. Define $x(\alpha) = \alpha y + (1-\alpha)\bar x$. By the convexity of $D$, $x(\alpha) \in D$ for all $\alpha \in [0,1]$. By the concavity of $f$, for all $\alpha \in (0,1]$, we have:
$$f(x(\alpha)) \ge \alpha f(y) + (1-\alpha) f(\bar x) = f(\bar x) + \alpha\,(f(y) - f(\bar x)) > f(\bar x)$$
Taking $\alpha > 0$ sufficiently small, we can guarantee that the point $x(\alpha)$ is arbitrarily close to $\bar x$ while $f(x(\alpha)) > f(\bar x)$. This contradicts the fact that $\bar x$ is a local maximizer of problem (1). Hence any local solution must be a global solution.

Let $S \subset D$ be the set of (global) maximizers and $\bar v \in \mathbb{R}$ the optimal value of the problem. Note that we have $f(x) = \bar v$ for any $x \in S$. For any $x \in S$, $\bar x \in S$ and $\alpha \in [0,1]$, by the concavity of $f$, we have:
$$f(\alpha x + (1-\alpha)\bar x) \ge \alpha f(x) + (1-\alpha) f(\bar x) = \alpha\bar v + (1-\alpha)\bar v = \bar v$$
which implies that $f(\alpha x + (1-\alpha)\bar x) = \bar v$, and then $\alpha x + (1-\alpha)\bar x \in S$.

Suppose now that $f$ is strictly concave and that there exist $x \in S$ and $\bar x \in S$ with $x \ne \bar x$. Let $\alpha \in (0,1)$. Since $x$ and $\bar x$ are global maximizers and $\alpha x + (1-\alpha)\bar x \in D$ by the convexity of $D$, it follows that:
$$f(\alpha x + (1-\alpha)\bar x) \le f(x) = f(\bar x) = \bar v$$
However, as $f$ is strictly concave:
$$f(\alpha x + (1-\alpha)\bar x) > \alpha f(x) + (1-\alpha) f(\bar x) = \alpha\bar v + (1-\alpha)\bar v = \bar v \tag{2}$$
which is a contradiction. □

2 Projection and Convex Sets

Henceforth, let $E$ be a real vector space equipped with an inner product $\langle\cdot,\cdot\rangle : E \times E \to \mathbb{R}$, and let $\|\cdot\| : E \to \mathbb{R}_+$ be the norm generated by this inner product.

Definition 8 Let $B \subset E$ be a nonempty set and let $x_0 \in E$ be an arbitrary point. We define the distance from the point $x_0$ to the set $B$ through the function $d_B : E \to \mathbb{R}_+$, where
$$d_B(x_0) := \inf_{x \in B} \|x - x_0\|$$
The set $P_B(x_0) := \{x \in B : \|x - x_0\| = d_B(x_0)\}$ is called the projection of $x_0$ on $B$.

Note that, since $B \ne \emptyset$ and $\|\cdot\| \ge 0$, this function is well defined. It is easy to see that $d_B$ is continuous.³ If $E$ is finite dimensional and $B$ is a closed set, the infimum is attained. In fact, we can take a sequence in $B$ whose distances from $x_0$ converge to $d_B(x_0)$. This implies that the sequence of distances is bounded, which, in turn, implies that the sequence in $B$ is bounded. Therefore, by the Bolzano-Weierstrass Theorem,⁴ this sequence admits a convergent subsequence, and its limit point belongs to $B$ (which is closed) and attains the infimum.

Although under closedness alone the minimizer is not necessarily unique, adding convexity guarantees uniqueness. The geometric intuition (for $E = \mathbb{R}^2$) is that the minimal distance is attained along a path orthogonal to the set. Two different projections would therefore give two different paths orthogonal to the set, and we could form a triangle between those points. But the convexity of $B$ implies that the line segment joining the projections lies in $B$, so we would have a non-degenerate triangle with two right angles, a contradiction. Let us give the formal statement:

Theorem 9 Let $E$ be a finite dimensional real vector space, with a norm defined by an inner product. Let $D \subset E$ be a closed and convex set, and fix $x_0 \in E$. Then

³ Let $x, y \in E$. By the triangle inequality, $\|\tilde x - x\| \le \|x - y\| + \|\tilde x - y\|$ for all $\tilde x \in E$, so $\inf_{\tilde x \in B} \|\tilde x - x\| \le \|x - y\| + \inf_{\tilde x \in B} \|\tilde x - y\|$, i.e., $d_B(x) \le \|x - y\| + d_B(y)$.
On the other hand, $\|\tilde x - y\| \le \|y - x\| + \|\tilde x - x\|$ for all $\tilde x \in E$, so $d_B(y) \le \|x - y\| + d_B(x)$. Therefore $|d_B(x) - d_B(y)| \le \|x - y\|$, i.e., $d_B$ is a Lipschitz function (and thus continuous).
⁴ Note that the assertion that a bounded sequence has a convergent subsequence may fail in an infinite dimensional space. For example, in the space of bounded real sequences with the sup-norm, consider the sequence $(x_n)_{n\in\mathbb{N}}$ such that $x_n = (y_{n,t})_{t\in\mathbb{N}}$ and $y_{n,t} = 1$ if $t = n$ and $0$ otherwise. It is easily seen that $(x_n)_{n\in\mathbb{N}}$ is bounded; however, it has no convergent subsequence (under the usual definition of convergence).
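For a concrete case of Definition 8, the distance to the closed unit ball has the closed form $d_B(x) = \max(\|x\| - 1, 0)$. The sketch below (Python/NumPy; the example set is our own choice, not from the text) checks the 1-Lipschitz bound from footnote 3 on random pairs, and the attainment of the infimum at the radial projection:

```python
import numpy as np

def d_ball(x):
    """Distance to the closed unit ball B: inf_{z in B} ||z - x|| = max(||x|| - 1, 0)."""
    return max(np.linalg.norm(x) - 1.0, 0.0)

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.normal(size=3) * 2.0, rng.normal(size=3) * 2.0
    # 1-Lipschitz bound from footnote 3: |d_B(x) - d_B(y)| <= ||x - y||.
    assert abs(d_ball(x) - d_ball(y)) <= np.linalg.norm(x - y) + 1e-12

# The infimum is attained (E finite dimensional, B closed): for x outside the
# ball the minimizer is the radial point x / ||x||.
x = np.array([3.0, 4.0, 0.0])        # ||x|| = 5
xbar = x / np.linalg.norm(x)         # point of B closest to x
assert np.isclose(np.linalg.norm(xbar - x), d_ball(x))  # both equal 4
```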
Figure 4: Geometrical intuition for the unique minimum

1. $\bar x \in P_D(x_0)$ if and only if $\bar x \in D$ and $\langle \bar x - x_0, \bar x - y\rangle \le 0$ for all $y \in D$;
2. there is a unique $\bar x \in D$ such that $\|\bar x - x_0\| = d_D(x_0)$.

Proof: Let us prove the first assertion and then use it to demonstrate the second one. Let $\bar x \in P_D(x_0)$. Since $D$ is convex, for any $\alpha \in (0,1)$ and any $y \in D\setminus\{\bar x\}$ the point $(1-\alpha)\bar x + \alpha y$ belongs to $D$, so
$$\|\bar x - x_0\| \le \|(1-\alpha)\bar x + \alpha y - x_0\|$$
Squaring both sides and writing $(1-\alpha)\bar x + \alpha y - x_0 = (\bar x - x_0) - \alpha(\bar x - y)$, we get
$$\|\bar x - x_0\|^2 \le \|\bar x - x_0\|^2 - 2\alpha\langle \bar x - x_0, \bar x - y\rangle + \alpha^2\|\bar x - y\|^2$$
hence
$$2\alpha\langle \bar x - x_0, \bar x - y\rangle - \alpha^2\|\bar x - y\|^2 \le 0$$
Dividing both sides of the inequality above by $2\alpha > 0$ and letting $\alpha \to 0$, we get $\langle \bar x - x_0, \bar x - y\rangle \le 0$.

On the other hand, let $\bar x \in D$ be such that $\langle \bar x - x_0, \bar x - y\rangle \le 0$ for all $y \in D$. Note that, for any $y \in D$,
$$\langle \bar x - x_0, \bar x - y\rangle = \|\bar x - x_0\|^2 + \langle \bar x - x_0, x_0 - y\rangle$$
But, by the Cauchy-Schwarz inequality, we have $\langle \bar x - x_0, x_0 - y\rangle \ge -\|\bar x - x_0\|\,\|x_0 - y\|$. Therefore, for all $y \in D$,
$$0 \ge \langle \bar x - x_0, \bar x - y\rangle \ge \|\bar x - x_0\|^2 - \|\bar x - x_0\|\,\|x_0 - y\|$$
If $x_0 \in D$, taking $y = x_0$ gives $\|\bar x - x_0\|^2 \le 0$, so $\bar x = x_0 \in P_D(x_0)$. If $x_0 \notin D$, then $\bar x \ne x_0$, and dividing by $\|\bar x - x_0\| > 0$ gives $\|\bar x - x_0\| \le \|x_0 - y\|$ for all $y \in D$, so $\bar x \in P_D(x_0)$. Thus (1) is proved.

Now, let $\bar x, \tilde x \in P_D(x_0)$. Then, for all $y \in D$, $\langle \bar x - x_0, \bar x - y\rangle \le 0$ and $\langle \tilde x - x_0, \tilde x - y\rangle \le 0$. In particular, $\langle \bar x - x_0, \bar x - \tilde x\rangle \le 0$ and $\langle \tilde x - x_0, \tilde x - \bar x\rangle \le 0$. Therefore
$$0 \ge \langle \bar x - x_0, \bar x - \tilde x\rangle + \langle \tilde x - x_0, \tilde x - \bar x\rangle = \langle \bar x - \tilde x, \bar x - \tilde x\rangle = \|\bar x - \tilde x\|^2$$
Hence $\bar x = \tilde x$. □

Therefore, for a closed and convex $D \subset E$, it is possible to define a function $p_D : E \to D$ such that $P_D(x) = \{p_D(x)\}$ for all $x \in E$.

3 Separating Hyperplane Theorems

Before going to the results, let us define some objects.

Definition 10 For $a \in E\setminus\{0\}$ and $c \in \mathbb{R}$, the set $H(a,c) := \{x \in E : \langle a, x\rangle = c\}$ is said to be a hyperplane.

Note that $E$ may be written as the union of two disjoint sets and the hyperplane. That is, $E = H(a,c) \cup \{x \in E : \langle a, x\rangle < c\} \cup \{x \in E : \langle a, x\rangle > c\}$.

Definition 11 Let $B_1, B_2 \subset E$. The hyperplane $H(a,c)$ is said to separate $B_1$ and $B_2$ if
$$\langle a, x\rangle \le c \le \langle a, y\rangle, \quad \forall x \in B_1,\ \forall y \in B_2$$
If both inequalities are strict, we say that $H(a,c)$ strictly separates $B_1$ and $B_2$.

In the geometric sense, separation means that one set is on one side of the hyperplane and the other set is on the other side (see Figure 5). The next result is important for convex sets:

Lemma 12 Let $D \subset E$ be a convex set and let $H(a,c)$ be a hyperplane such that $H(a,c) \cap D = \emptyset$. Then $D \cap \{x \in E : \langle a, x\rangle < c\} = \emptyset$ or $D \cap \{x \in E : \langle a, x\rangle > c\} = \emptyset$.

Proof: Assume by way of contradiction that there are $x, x' \in D$ with $\langle a, x\rangle < c$ and $\langle a, x'\rangle > c$. Then $\bar x = \lambda x + (1-\lambda)x' \in H(a,c)$ for $\lambda = \frac{\langle a, x'\rangle - c}{\langle a, x'\rangle - \langle a, x\rangle} \in (0,1)$. Therefore $\bar x \in D \cap H(a,c)$, a contradiction. □

3.1 Support Theorem

The next lemma states that a point that does not belong to the closure of a convex set can be strictly separated from that set.

Lemma 13 (Minkowski Lemma) Let $D \subset E$ be a non-empty convex set, where $E$ is finite dimensional. If $x \notin \operatorname{cl} D$, then there are $a \in E\setminus\{0\}$ and $c \in \mathbb{R}$ such that $x \in H(a,c)$ and $\langle a, y\rangle > c$ for all $y \in D$.
Figure 5: Strict and non-strict separability

Proof: By Lemma 3, we know that $\operatorname{cl} D$ is convex. By Theorem 9, there is a unique projection $\bar x = p_{\operatorname{cl} D}(x) \in \operatorname{cl} D$. Let $a = \bar x - x$ (so $a \ne 0$, since $x \notin \operatorname{cl} D$) and $c = \langle a, x\rangle$. By Theorem 9, $\langle \bar x - x, \bar x - y\rangle \le 0$ for all $y \in \operatorname{cl} D$, i.e., $\langle \bar x - x, y\rangle \ge \langle \bar x - x, \bar x\rangle$. Therefore, for all $y \in D$,
$$\langle a, y\rangle \ge \langle \bar x - x, \bar x\rangle = \|\bar x - x\|^2 + \langle \bar x - x, x\rangle = \|\bar x - x\|^2 + c > c$$
since $\bar x \ne x$. □

Definition 14 A hyperplane $H(a,c)$ is said to support a set $B$ if $H(a,c) \cap \operatorname{fr} B \ne \emptyset$⁵ and $B \cap \{x \in E : \langle a, x\rangle < c\} = \emptyset$ or $B \cap \{x \in E : \langle a, x\rangle > c\} = \emptyset$. If $\bar x \in H(a,c) \cap \operatorname{fr} B$, it is

Figure 6: An illustration of Minkowski Lemma

⁵ Here $\operatorname{fr} B$ is the boundary of $B$. Remember that $x \in E$ belongs to $\operatorname{fr} B$ if and only if there are sequences in $B$ and in $B^c$ which both converge to $x$. Thus, $\operatorname{fr} B \subset \operatorname{cl} B$.
said that $H(a,c)$ supports $B$ at $\bar x$. That is, a hyperplane supports a set if it meets the boundary of the set and all the points of the set are on the same side of the hyperplane. Note that a set can admit more than one hyperplane supporting it at the same point.

Theorem 15 (Support Theorem) Let $E$ be finite dimensional and $D \subset E$ a non-empty convex set. If $x \in \operatorname{fr} D$, then there are $a \in E\setminus\{0\}$ and $c \in \mathbb{R}$ such that $x \in H(a,c)$ and $\langle a, y\rangle \ge c$ for all $y \in D$, i.e., the hyperplane $H(a,c)$ supports $D$ at $x$.

Proof: Since $x \in \operatorname{fr} D$, there is a sequence $(x_n)_{n\in\mathbb{N}}$ such that $x_n \to x$ and $x_n \notin \operatorname{cl} D$ for all $n \in \mathbb{N}$. By the Minkowski Lemma, for each $n \in \mathbb{N}$ there are $a_n \in E\setminus\{0\}$ and $c_n \in \mathbb{R}$ such that $x_n \in H(a_n, c_n)$ and $\langle a_n, y\rangle > c_n$ for all $y \in D$. Note that
$$\langle a_n, x_n\rangle = c_n \quad\Longrightarrow\quad \left\langle \frac{a_n}{\|a_n\|}, x_n\right\rangle = \frac{c_n}{\|a_n\|}$$
Since $(x_n)_{n\in\mathbb{N}}$ is convergent, it is bounded. Furthermore, $\left(\frac{a_n}{\|a_n\|}\right)_{n\in\mathbb{N}}$ is bounded. Then, by the Cauchy-Schwarz inequality, $\left(\frac{c_n}{\|a_n\|}\right)_{n\in\mathbb{N}}$ is bounded. Therefore, there is an infinite subset $N \subset \mathbb{N}$ such that $\left(\frac{a_n}{\|a_n\|}\right)_{n\in N}$ and $\left(\frac{c_n}{\|a_n\|}\right)_{n\in N}$ are convergent; let $a \in E$ and $c \in \mathbb{R}$ be the respective limit points of those subsequences. Since $\left\|\frac{a_n}{\|a_n\|}\right\| = 1$ for all $n \in \mathbb{N}$, we have $a \in E\setminus\{0\}$. By continuity of the inner product, we have $x \in H(a,c)$. On the other hand, since $\left\langle \frac{a_n}{\|a_n\|}, y\right\rangle > \frac{c_n}{\|a_n\|}$ for all $y \in D$ and $n \in \mathbb{N}$, we have $\langle a, y\rangle \ge c$ for all $y \in D$. □

3.2 Separating Theorems

Finally, the most important result of this material:

Theorem 16 (Separating Hyperplane) Let $E$ be finite dimensional and $D_1, D_2 \subset E$ disjoint non-empty convex sets. Then there are $a \in E\setminus\{0\}$ and $c \in \mathbb{R}$ such that
$$\langle a, x\rangle \le c \le \langle a, y\rangle, \quad \forall x \in D_1,\ \forall y \in D_2$$
i.e., there is a hyperplane $H(a,c)$ which separates $D_1$ and $D_2$.

Proof: Define $D = D_2 - D_1$. Then $D$ is convex and $0 \notin D$ because $D_1 \cap D_2 = \emptyset$. If $0 \notin \operatorname{cl} D$ we can apply the Minkowski Lemma. Otherwise $0 \in \operatorname{fr} D$, and we apply the Support Theorem. In both cases, there exists $a \in E\setminus\{0\}$ such that $\langle a, z\rangle \ge 0$ for all $z \in D$. Therefore
$$\langle a, x\rangle \le \langle a, y\rangle, \quad \forall x \in D_1,\ \forall y \in D_2$$
In particular, the function $\langle a, \cdot\rangle$ is bounded from below on $D_2$ and bounded from above on $D_1$. Then
$$\langle a, x\rangle \le \sup_{x' \in D_1}\langle a, x'\rangle \le \inf_{y' \in D_2}\langle a, y'\rangle \le \langle a, y\rangle, \quad \forall x \in D_1,\ \forall y \in D_2$$
Since $D_1$ and $D_2$ are non-empty, these bounds are well defined, and the result holds by setting
$$c = \frac{1}{2}\left(\sup_{x \in D_1}\langle a, x\rangle + \inf_{y \in D_2}\langle a, y\rangle\right) \qquad\square$$

If, in addition to being convex, we assume that the sets are closed and at least one of them is compact, then we have strict separation. The intuition is that under compactness the sets cannot be, at the same time, arbitrarily near and disjoint.

Theorem 17 (Strict Separating Hyperplane) Let $E$ be finite dimensional and $D_1, D_2 \subset E$ disjoint, non-empty, closed and convex sets. In addition assume that $D_1$ is compact. Then there are $a \in E\setminus\{0\}$ and $b, c \in \mathbb{R}$ such that
$$\langle a, x\rangle \le b < c \le \langle a, y\rangle, \quad \forall x \in D_1,\ \forall y \in D_2$$
i.e., there is a hyperplane $H(a,c)$ which strictly separates $D_1$ and $D_2$.

Proof: Since $d_{D_2}$ is continuous and $D_1$ is compact, by the Weierstrass Theorem there exists
$$x^* \in \arg\min\{d_{D_2}(x) : x \in D_1\}$$
As $D_1$ is compact, $D_2$ is closed and $D_1 \cap D_2 = \emptyset$, we have $d_{D_2}(x^*) > 0$. Let $y^* = p_{D_2}(x^*)$ and define $a = y^* - x^* \ne 0$. By Theorem 9, we have $\langle y^* - x^*, y^* - y\rangle \le 0$ for all $y \in D_2$, i.e., $\langle a, y^*\rangle \le \langle a, y\rangle$ for all $y \in D_2$. Define $c_2 = \langle a, y^*\rangle$. Note that, for any $x \in D_1$, we have
$$\|y^* - x\| \ge d_{D_2}(x) \ge d_{D_2}(x^*) = \|x^* - p_{D_2}(x^*)\| = \|x^* - y^*\| \quad\Longrightarrow\quad x^* = p_{D_1}(y^*)$$
Hence, again by Theorem 9, $\langle a, x\rangle \le \langle a, x^*\rangle$ for all $x \in D_1$. Define $c_1 = \langle a, x^*\rangle$. Note that $c_2 - c_1 = \langle a, y^* - x^*\rangle = \|y^* - x^*\|^2 > 0$. Therefore, let $b = c_1 + (c_2 - c_1)/4$ and $c = c_1 + (c_2 - c_1)/2$. Then we have
$$\langle a, x\rangle \le c_1 < b < c < c_2 \le \langle a, y\rangle, \quad \forall x \in D_1,\ \forall y \in D_2 \qquad\square$$
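The proof of Theorem 17 is constructive, and the construction can be imitated numerically. The sketch below (Python/NumPy; `proj_ball`, the alternating-projection loop and the two specific balls are our own choices, not from the text) finds a closest pair $(x^*, y^*)$ between two disjoint closed balls and uses $a = y^* - x^*$ with a midpoint threshold to separate them strictly:

```python
import numpy as np

def proj_ball(x, center, r):
    """Projection onto the closed ball B(center, r) (unique by Theorem 9)."""
    d = x - center
    n = np.linalg.norm(d)
    return x.copy() if n <= r else center + r * d / n

# Two disjoint, non-empty, closed, convex (indeed compact) sets:
c1, c2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])   # unit balls around these

# Closest pair via alternating projections.
xstar = c1.copy()
for _ in range(100):
    ystar = proj_ball(xstar, c2, 1.0)   # nearest point of D2 to xstar
    xstar = proj_ball(ystar, c1, 1.0)   # nearest point of D1 to ystar

a = ystar - xstar                                  # normal, as in Theorem 17
c = (np.dot(a, xstar) + np.dot(a, ystar)) / 2.0    # midpoint threshold

rng = np.random.default_rng(0)
for _ in range(300):
    u = proj_ball(rng.normal(size=2) * 3.0, c1, 1.0)       # a point of D1
    v = proj_ball(rng.normal(size=2) * 3.0 + c2, c2, 1.0)  # a point of D2
    assert np.dot(a, u) < c < np.dot(a, v)                 # strict separation
```

For these two balls the closest pair is reached exactly after one round of projections; in general, alternating projections are only one of several ways to approximate the closest pair between two closed convex sets.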
4 Applications

In this section, we will give an important application of the results seen above.

Lemma 18 Assume $E$ is finite dimensional and let $D \subset E$ be a convex set. If $f : D \to \mathbb{R}$ is concave and $x \in \operatorname{int} D$, then the following set is non-empty:
$$\partial f(x) = \{b \in E : f(y) + \langle b, y - x\rangle \le f(x),\ \forall y \in D\}$$

Proof: By Lemma 6, $\operatorname{hypo} f$ is convex, and $(f(x), x) \in \operatorname{fr}(\operatorname{hypo} f)$. Then, by the Support Theorem, there are $a \in (\mathbb{R} \times E)\setminus\{0\}$ and $c \in \mathbb{R}$ such that $(f(x), x) \in H(a,c)$ and (choosing the orientation of $a$ suitably) $\langle a, z\rangle \le c$ for all $z \in \operatorname{hypo} f$. Let $a = (a_1, a_2)$, where $a_1 \in \mathbb{R}$ and $a_2 \in E$; analogously, let $z = (z_1, z_2)$. For each $z \in \operatorname{hypo} f$ we have $a_1 z_1 + \langle a_2, z_2\rangle \le c$. Since $(v, z_2) \in \operatorname{hypo} f$ for all $v < z_1$, it should be the case that $a_1 \ge 0$. Otherwise, for $v$ sufficiently negative the inequality would be invalid.

Now we have to prove that $a_1 > 0$. Assume by way of contradiction that $a_1 = 0$. Since $(f(x), x) \in H(a,c)$, we have $\langle a_2, x\rangle = c$. But since $x \in \operatorname{int} D$, for $\delta > 0$ sufficiently small we have $\bar x = x + \delta a_2 \in D$, and
$$\langle a_2, \bar x\rangle = \langle a_2, x\rangle + \delta\|a_2\|^2 \le c \quad\Longrightarrow\quad a_2 = 0$$
a contradiction, because the Support Theorem guarantees that $a \ne 0$. Therefore $a_1 > 0$.

Define $b = a_2/a_1$ and $\hat c = c/a_1$. Since $(f(y), y) \in \operatorname{hypo} f$ for every $y \in D$, we have $f(y) + \langle b, y\rangle \le \hat c = f(x) + \langle b, x\rangle$. Thus $f(y) + \langle b, y - x\rangle \le f(x)$, i.e., $b \in \partial f(x)$. □

Note the implication of the Support Theorem as applied to concave functions. For each interior point of the function's domain,⁶ there is a concave programming problem whose solution is given by this point. In fact, for any $x \in \operatorname{int} D$ and $b \in \partial f(x)$ we have:
$$x \in \arg\max_{x' \in D}\ \{f(x') + \langle b, x'\rangle\}$$

Hence, the supporting hyperplanes give us a way to distort (rotate around a point) the graph of the objective function in such a way that any point may be turned into a global optimum. One may then think of rotating the function at a special point, a point which is the solution of a constrained optimization problem. The Lagrange Theorem builds upon this principle, but instead of using only the hypograph of the objective function, it uses the convex sets defined by the constraint functions.
In this way, it is possible to place restrictions on which hyperplanes can arise as supports at the constrained optimum. Indeed, under some conditions, we can uniquely identify such a hyperplane. Then we can solve the global optimization problem first.

⁶ If the function is assumed to be continuous over the entire domain, then the property is valid for all points. An important remark should be made (we omit the proof): a concave function is continuous on the interior of its domain.
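Lemma 18 and the argmax remark can be verified on a one-dimensional example. In the sketch below (Python/NumPy; the function and the point are our own choices, not from the text), for the concave $f(t) = -t^2$ on $D = \mathbb{R}$, the element $b = 2\bar x$ lies in $\partial f(\bar x)$ under the sign convention of Lemma 18, and $\bar x$ maximizes $f(x') + \langle b, x'\rangle$:

```python
import numpy as np

f = lambda t: -t**2          # concave on D = R, and int D = R

xbar = 0.7
b = 2.0 * xbar               # candidate element of the set in Lemma 18:
                             # f(y) + b*(y - xbar) - f(xbar) = -(y - xbar)^2 <= 0

ys = np.linspace(-10.0, 10.0, 2001)

# Lemma 18's inequality f(y) + <b, y - x> <= f(x) holds at every sampled y:
assert np.all(f(ys) + b * (ys - xbar) <= f(xbar) + 1e-9)

# The argmax property from the text: xbar maximizes f(x') + <b, x'> over D.
vals = f(ys) + b * ys
assert abs(ys[np.argmax(vals)] - xbar) < 1e-6
```

Note the role of the sign convention: with $\partial f(x)$ as defined in Lemma 18, the supergradient of $f(t) = -t^2$ at $\bar x$ is $b = -f'(\bar x)$, and the tilted objective $f(x') + \langle b, x'\rangle$ peaks exactly at $\bar x$.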
References

D.P. Bertsekas, A. Nedić, and A.E. Ozdaglar. Convex analysis and optimization. Athena Scientific, Belmont, Mass., 2003.

J.M. Borwein and A.S. Lewis. Convex analysis and nonlinear optimization. Springer, 2000.

M. Florenzano and C. Le Van. Finite dimensional convexity and optimization. Springer, 2001.

A. Izmailov and M. Solodov. Otimização, volume 1: Condições de Otimalidade, Elementos de Análise Convexa e de Dualidade. IMPA, 2005.

R.T. Rockafellar. Convex analysis. Princeton University Press, 1970.