Convex sets

In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from several different angles. The tools and concepts we develop here, however, have many other applications both in this course and beyond.

A set C ⊆ R^N is convex if

    x, y ∈ C  ⟹  (1 − θ)x + θy ∈ C  for all θ ∈ [0, 1].

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

[Figure: examples of convex and nonconvex sets in R².]
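The definition above can be spot-checked numerically. The sketch below (using numpy, which is not part of these notes) samples points along the segment between x and y and tests membership with an oracle; this is only a sanity check, not a proof of convexity.

```python
import numpy as np

def segment_in_set(contains, x, y, num=50):
    """Check that sampled points on the segment [x, y] stay in the set.

    `contains` is a membership oracle; this is a numerical spot-check
    of the convexity definition, not a proof.
    """
    for theta in np.linspace(0.0, 1.0, num):
        if not contains((1 - theta) * x + theta * y):
            return False
    return True

# The Euclidean unit ball is convex; the unit circle (its boundary) is not.
ball = lambda p: np.linalg.norm(p) <= 1.0
circle = lambda p: abs(np.linalg.norm(p) - 1.0) < 1e-9
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(segment_in_set(ball, x, y))    # True: the segment stays in the ball
print(segment_in_set(circle, x, y))  # False: the midpoint leaves the circle
```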
Examples of convex (and nonconvex) sets

Subspaces. Recall that if S is a subspace of R^N, then

    x, y ∈ S  ⟹  ax + by ∈ S  for all a, b ∈ R.

So S is clearly convex.

Affine sets. Affine sets are just subspaces that have been offset from the origin:

    {x ∈ R^N : x = y + v, y ∈ T},  T = subspace,

for some fixed vector v. An equivalent definition is that

    x, y ∈ C  ⟹  θx + (1 − θ)y ∈ C  for all θ ∈ R;

the difference between this definition and that for a subspace is that subspaces must include the origin.

Bound constraints. Rectangular sets of the form

    C = {x ∈ R^N : l_1 ≤ x_1 ≤ u_1, l_2 ≤ x_2 ≤ u_2, ..., l_N ≤ x_N ≤ u_N}

for some l_1, ..., l_N, u_1, ..., u_N ∈ R are convex.

The simplex in R^N,

    {x ∈ R^N : x_1 + x_2 + ··· + x_N ≤ 1,  x_1, x_2, ..., x_N ≥ 0},

is convex.

Any subset of R^N that can be expressed as a set of linear inequality constraints

    {x ∈ R^N : Ax ≤ b}

is convex. Notice that both rectangular sets and the simplex
fall into this category: for the simplex, take

    A = [  1   1  ···   1
          −1   0  ···   0
           0  −1  ···   0
           ⋮        ⋱
           0   0  ··· −1 ],      b = [ 1
                                       0
                                       ⋮
                                       0 ].

In general, sets of this form are called polyhedra; when they are bounded, the result is a polytope.

Norm balls. If ‖·‖ is a valid norm on R^N, then

    B_r = {x : ‖x‖ ≤ r}

is a convex set.

Ellipsoids. An ellipsoid is a set of the form

    E = {x : (x − x_0)^T P^{-1} (x − x_0) ≤ r},

for a symmetric positive-definite matrix P. Geometrically, the ellipsoid is centered at x_0, its axes are oriented with the eigenvectors of P, and the relative widths along these axes are proportional to the square roots of the eigenvalues of P.

A single point {x} is convex. The empty set is convex.

The set

    {x ∈ R² : x_1² − x_1 − x_2 + 1 ≤ 0}

is convex. (Sketch it!) The set

    {x ∈ R² : x_1² − x_1 − x_2 + 1 ≥ 0}

is not convex.
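The (A, b) pair for the simplex described above can be assembled and checked numerically. This is a minimal sketch (numpy assumed, not part of the notes); `simplex_constraints` is a hypothetical helper name.

```python
import numpy as np

def simplex_constraints(N):
    """Build (A, b) so that {x : A x <= b} is the simplex in R^N.

    Rows: one all-ones row for x_1 + ... + x_N <= 1, then -I for x_n >= 0.
    """
    A = np.vstack([np.ones((1, N)), -np.eye(N)])
    b = np.concatenate([[1.0], np.zeros(N)])
    return A, b

A, b = simplex_constraints(3)
x_in = np.array([0.2, 0.3, 0.1])   # non-negative entries summing to 0.6
x_out = np.array([0.7, 0.7, 0.0])  # entries sum to 1.4 > 1
print(np.all(A @ x_in <= b))   # True: x_in is in the simplex
print(np.all(A @ x_out <= b))  # False: the sum constraint is violated
```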
The set

    {x ∈ R² : x_1² − x_1 − x_2 + 1 = 0}

is certainly not convex.

Sets defined by linear equality constraints where only some of the constraints have to hold are in general not convex. For example,

    {x ∈ R² : x_1 − x_2 = 1 and x_1 + x_2 = 1}

is convex, while

    {x ∈ R² : x_1 − x_2 = 1 or x_1 + x_2 = 1}

is not convex.

Cones

A cone is a set C such that

    x ∈ C  ⟹  θx ∈ C  for all θ ≥ 0.

Convex cones are sets which are both convex and a cone: C is a convex cone if

    x_1, x_2 ∈ C  ⟹  θ_1 x_1 + θ_2 x_2 ∈ C  for all θ_1, θ_2 ≥ 0.

Given x_1, x_2, the set of all linear combinations with non-negative weights makes a wedge. For practice, sketch the region below that consists of all such combinations of x_1 and x_2:
[Figure: two vectors x_1 and x_2; sketch the wedge of their non-negative combinations.]

We will mostly be interested in proper cones, which in addition to being convex, are closed, have a non-empty interior¹ ("solid"), and do not contain entire lines ("pointed"). Examples:

Non-negative orthant. The set of vectors whose entries are non-negative,

    R^N_+ = {x ∈ R^N : x_n ≥ 0, for n = 1, ..., N},

is a proper cone.

Positive semi-definite cone. The set of N × N symmetric matrices with non-negative eigenvalues,

    S^N_+ = {X ∈ R^{N×N} : X = V Λ V^T,  V^T V = V V^T = I,  Λ diagonal and non-negative},

is a proper cone.

Non-negative polynomials. Vectors of coefficients of non-negative polynomials on [0, 1],

    {x ∈ R^N : x_1 + x_2 t + x_3 t² + ··· + x_N t^{N−1} ≥ 0 for all 0 ≤ t ≤ 1},

¹ See Technical Details for the precise definition.
form a proper cone. Notice that it is not necessary that all the x_n ≥ 0; for example, t − t² (x_1 = 0, x_2 = 1, x_3 = −1) is non-negative on [0, 1].

Norm cones. The subset of R^{N+1} defined by

    {(x, t), x ∈ R^N, t ∈ R : ‖x‖ ≤ t}

is a proper cone for any valid norm ‖·‖. Sketch what this looks like for the Euclidean norm and N = 2:

Every proper cone K defines a partial ordering or generalized inequality. We write

    x ⪯_K y  when  y − x ∈ K.

For example, for vectors x, y ∈ R^N, we say x ⪯_{R^N_+} y when x_n ≤ y_n for all n = 1, ..., N. For symmetric matrices X, Y, we say X ⪯_{S^N_+} Y when Y − X has non-negative eigenvalues. We will typically just use ⪯ when the context makes it clear. In fact, for R^N_+ we will just write x ≤ y (as we did above) to mean that
the entries in x are component-by-component upper-bounded by the entries in y.

Partial orderings share many of the properties of the standard ≤ on the real line. For example:

    x ⪯ y, u ⪯ v  ⟹  x + u ⪯ y + v.

But other properties do not hold; for example, it is not necessary that either x ⪯ y or y ⪯ x. An extensive list of properties of partial orderings (most of which will make perfect sense on sight) can be found in [BV04, Chapter 2.4].

Hyperplanes and halfspaces

Hyperplanes and halfspaces are both very simple constructs, but they will be crucial to our understanding of convex sets, functions, and optimization problems.

A hyperplane is a set of the form

    {x ∈ R^N : ⟨x, c⟩ = t}

for some fixed vector c ≠ 0 and scalar t. When t = 0, this set is a subspace of dimension N − 1, and contains all vectors that are orthogonal to c. For t ≠ 0, this is an affine set consisting of all the vectors orthogonal to c (call this set C_⊥) offset by some x_0:

    {x ∈ R^N : ⟨x, c⟩ = t} = {x ∈ R^N : x = x_0 + y, y ∈ C_⊥},

for any x_0 with ⟨x_0, c⟩ = t. We might take x_0 = t c/‖c‖², for instance. The point is, c is a normal vector of the set. Here are some examples in R²:
[Figure: hyperplanes in R² for two choices of normal vector c, drawn at levels t = 1, 5 and t = −1, 1, 5.]

It should be clear that hyperplanes are convex sets.

A halfspace is a set of the form

    {x ∈ R^N : ⟨x, c⟩ ≤ t}

for some fixed vector c ≠ 0 and scalar t. For t = 0, the halfspace contains all vectors whose inner product with c is negative (i.e., the angle between x and c is greater than 90°). Here is a simple example:
[Figure: a halfspace {x : ⟨x, c⟩ ≤ t} in R², drawn for t = 5.]

Separating hyperplanes

If two convex sets are disjoint, then there is a hyperplane that separates them.

[Figure: disjoint convex sets C and D with closest points c ∈ C and d ∈ D, separated by the hyperplane {x : ⟨x, a⟩ = b} with a = d − c and b = (‖d‖² − ‖c‖²)/2.]
This fact is intuitive, and is incredibly useful in understanding the solutions to convex optimization programs (we will see this even in the next section). It is also easy to prove under mild additional assumptions; below we will show how to explicitly construct a separating hyperplane between two convex sets if there are points in the two sets that are definitively closest to one another.² The result extends to general convex sets, and the proof is based on the same basic idea, but requires some additional topological considerations.

Let C, D be disjoint convex sets and suppose that there exist points c ∈ C and d ∈ D that achieve the minimum distance:

    ‖c − d‖ = inf_{x ∈ C, y ∈ D} ‖x − y‖.

Define

    a = d − c,    b = (‖d‖² − ‖c‖²)/2.

Then

    ⟨x, a⟩ ≤ b  for x ∈ C,    ⟨x, a⟩ ≥ b  for x ∈ D.

That is, the hyperplane {x : ⟨x, a⟩ = b} separates C and D.

To see this, set f(x) = ⟨x, a⟩ − b, and show that for any point u in D, we have f(u) ≥ 0. First, we prove the basic geometric fact that for any two vectors x, y,

    if ‖x + θy‖ ≥ ‖x‖ for all θ ∈ [0, 1], then ⟨x, y⟩ ≥ 0.    (1)

² This holds, for example, when the sets are closed, and at least one of them is bounded.
To establish this, we expand the norm as

    ‖x + θy‖² = ‖x‖² + θ²‖y‖² + 2θ⟨x, y⟩,

from which we can immediately deduce that

    θ‖y‖² + 2⟨x, y⟩ ≥ 0 for all θ ∈ (0, 1]  ⟹  ⟨x, y⟩ ≥ 0.

Now let u be an arbitrary point in D. Since D is convex, we know that d + θ(u − d) ∈ D for all θ ∈ [0, 1]. Since d is as close to c as any other point in D, we have

    ‖d + θ(u − d) − c‖ ≥ ‖d − c‖,

and so by (1), we know that

    ⟨d − c, u − d⟩ ≥ 0.

This means

    f(u) = ⟨u, d − c⟩ − (‖d‖² − ‖c‖²)/2
         = ⟨u, d − c⟩ − ⟨(d + c)/2, d − c⟩
         = ⟨u − (d + c)/2, d − c⟩
         = ⟨u − d + d/2 − c/2, d − c⟩
         = ⟨u − d, d − c⟩ + ‖d − c‖²/2
         ≥ 0.

The proof that f(v) ≤ 0 for every v ∈ C is exactly the same.
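The construction in the proof above is easy to carry out numerically. A minimal sketch (numpy assumed; the two discs and their closest points are illustrative choices, not from the notes):

```python
import numpy as np

def separating_hyperplane(c, d):
    """Hyperplane {x : <x, a> = b} built from closest points c in C, d in D.

    Uses a = d - c and b = (||d||^2 - ||c||^2) / 2, as in the proof.
    """
    a = d - c
    b = (np.dot(d, d) - np.dot(c, c)) / 2.0
    return a, b

# Two disjoint unit discs in R^2, centered at (0, 0) and (3, 0); their
# closest points lie on the segment between the centers.
c = np.array([1.0, 0.0])   # closest point of the disc C to D
d = np.array([2.0, 0.0])   # closest point of the disc D to C
a, b = separating_hyperplane(c, d)
f = lambda x: np.dot(x, a) - b
print(f(np.array([0.0, 0.0])) <= 0)  # True: center of C is on the <= side
print(f(np.array([3.0, 0.0])) >= 0)  # True: center of D is on the >= side
```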
Note the following:

We actually only need the interiors of sets C and D to be disjoint; they can intersect at one or more points along their boundaries. Viz:

[Figure: convex sets C and D touching along their boundaries, with a hyperplane between them.]

But you have to be more careful in defining the hyperplane which separates them (as we would have c = d, and hence a = 0, in the argument above).

There can be many hyperplanes that separate the two sets, but there is not necessarily more than one.

The hyperplane {x : ⟨x, a⟩ = b} strictly separates sets C and D if

    ⟨x, a⟩ < b for all x ∈ C    and    ⟨x, a⟩ > b for all x ∈ D.

There are examples of closed, convex, disjoint sets C, D that are separable but not strictly separable. It is a good exercise to think of an example.
Supporting hyperplanes

A direct consequence of the separating hyperplane theorem is that every point on the boundary of a convex set C can be separated from its interior. If a ≠ 0 satisfies

    ⟨x, a⟩ ≤ ⟨x_0, a⟩  for all x ∈ C,

then {x : ⟨x, a⟩ = ⟨x_0, a⟩} is called a supporting hyperplane to C at x_0. Here's a picture:

[Figure: a supporting hyperplane tangent to C at the boundary point x_0.]

The hyperplane is tangent to C at x_0, and the halfspace {x : ⟨x, a⟩ ≤ ⟨x_0, a⟩} contains C. The supporting hyperplane theorem says that a supporting hyperplane exists at every point x_0 on the boundary of a (non-empty) convex set C. Note that there might be more than one:

[Figure: a convex set with a corner at x_0, where more than one supporting hyperplane exists.]
Given the separating hyperplane theorem, this is easy to prove. If C has a non-empty interior, then it follows from just realizing that {x_0} and int(C) are disjoint convex sets. If int(C) = ∅, then C must be contained in an affine set, and hence also in (possibly many) hyperplanes, any of which will be supporting.
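For a concrete instance of a supporting hyperplane, take C to be the Euclidean unit ball and a boundary point x_0: then a = x_0 works, since by Cauchy-Schwarz ⟨x, x_0⟩ ≤ ‖x‖ ‖x_0‖ ≤ 1 = ⟨x_0, x_0⟩ for every x ∈ C. A quick numerical spot-check (numpy assumed; the sampling is illustrative):

```python
import numpy as np

# Supporting hyperplane to the unit ball at a boundary point x0: a = x0.
x0 = np.array([0.6, 0.8])   # boundary point, ||x0|| = 1
a = x0

# Sample points of the ball and check <x, a> <= <x0, a> for all of them.
rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, size=(2000, 2))
pts = pts[np.linalg.norm(pts, axis=1) <= 1.0]  # keep points inside the ball
print(bool(np.all(pts @ a <= x0 @ a + 1e-12)))  # True
```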
The closest point problem

Let x_0 ∈ R^N be given, and let C be a non-empty, closed, convex set. The projection of x_0 onto C is the closest point (in the standard Euclidean distance, for now) in C to x_0:

    P_C(x_0) = arg min_{y ∈ C} ‖x_0 − y‖.

We will see below that there is a unique minimizer to this problem, and that the solution has geometric properties that are analogous to the case where C is a subspace.

Projection onto a subspace

Let's recall how we solve this problem in the special case where C := T is a K-dimensional subspace. In this case, the solution x̂ = P_T(x_0) is unique, and is characterized by the orthogonality principle:

    x_0 − x̂ ⊥ T,

meaning that

    ⟨y, x_0 − x̂⟩ = 0  for all y ∈ T.

The proof of this fact is reviewed in the Technical Details section at the end of these notes.

The orthogonality principle leads immediately to an algorithm for calculating P_T(x_0). Let v_1, ..., v_K be a basis for T; we can write the solution as

    x̂ = Σ_{k=1}^{K} α_k v_k;

solving for the expansion coefficients α_k is the same as solving for x̂. We know that

    ⟨x_0 − Σ_{j=1}^{K} α_j v_j, v_k⟩ = 0,  for k = 1, ..., K,
and so the α_k must obey the linear system of equations

    Σ_{j=1}^{K} α_j ⟨v_j, v_k⟩ = ⟨x_0, v_k⟩,  for k = 1, ..., K.

Concatenating the v_k as columns in the N × K matrix V, and the entries α_k into the vector α ∈ R^K, we can write the equations above as

    V^T V α = V^T x_0.

Since the {v_k} are a basis for T (i.e., they are linearly independent), V^T V is invertible, and we can solve for the best expansion coefficients:

    α̂ = (V^T V)^{-1} V^T x_0.

Using these expansion coefficients to reconstruct x̂ yields

    x̂ = V α̂ = V (V^T V)^{-1} V^T x_0.

In this case, the projector P_T(·) is a linear map, specified by the N × N matrix V (V^T V)^{-1} V^T.

Projection onto an affine set

Affine sets are not fundamentally different than subspaces. Any affine set C can be written as a subspace T plus an offset v_0:

    C = T + v_0 = {x : x = y + v_0, y ∈ T}.

It is easy to translate the results for subspaces above to say that the projection onto an affine set is unique, and obeys the orthogonality principle

    ⟨y − x̂, x_0 − x̂⟩ = 0,  for all y ∈ C.    (2)
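The normal-equations recipe for projecting onto a subspace can be sketched in a few lines. This is an illustrative example (numpy assumed; the particular basis and point are made up), verifying the orthogonality principle at the end.

```python
import numpy as np

# Projection of x0 onto the span of the columns of V, via the normal
# equations V^T V alpha = V^T x0.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])          # basis vectors as columns (N = 3, K = 2)
x0 = np.array([1.0, 2.0, 0.0])

alpha = np.linalg.solve(V.T @ V, V.T @ x0)   # expansion coefficients
xhat = V @ alpha                             # the projection P_T(x0)
print(xhat)  # [0. 1. 1.]

# Orthogonality principle: the residual is orthogonal to every basis vector.
print(np.allclose(V.T @ (x0 - xhat), 0.0))   # True
```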
You can solve this problem by shifting x_0 and C by −v_0, projecting x_0 − v_0 onto the subspace C − v_0 = T, and then shifting the answer back.

Projection onto a general convex set

In general, there is no closed-form expression for the projector onto a given convex set. However, the concepts above (orthogonality, projection onto a subspace) can help us understand the solution for an arbitrary convex set.

Uniqueness

If C is closed³ and convex, then for any x_0, the program

    min_{y ∈ C} ‖x_0 − y‖    (3)

has a unique solution.

First of all, that there is at least one point in C at which the minimum is attained is a consequence of C being closed and of ‖x_0 − y‖ being a continuous function of y. Let x̂ be one such minimizer; we will show that

    ‖x_0 − y‖ > ‖x_0 − x̂‖  for all y ∈ C, y ≠ x̂.

Consider first all the points in C which are co-aligned with x̂. Let

    I = {α ∈ R : α x̂ ∈ C}.

Since C is convex and closed, this is a closed interval of the real line (that contains at least the point α = 1). The function

    g(α) = ‖x_0 − α x̂‖² = α²‖x̂‖² − 2α⟨x̂, x_0⟩ + ‖x_0‖²

³ We review basic concepts in topology for R^N in the Technical Details section at the end of the notes.
is minimized at α = 1 by construction, and since its second derivative is strictly positive, we know that this is the unique minimizer. So if there is another minimizer of (3), then it is not co-aligned with x̂.

Now let y be any point in C that is not co-aligned with x̂. We will show that y cannot minimize (3) because the point x̂/2 + y/2 ∈ C is definitively closer to x_0. We have

    ‖x_0 − x̂/2 − y/2‖² = ‖(x_0 − x̂)/2 + (x_0 − y)/2‖²
                        = ‖x_0 − x̂‖²/4 + ‖x_0 − y‖²/4 + (1/2)⟨x_0 − x̂, x_0 − y⟩
                        < ‖x_0 − x̂‖²/4 + ‖x_0 − y‖²/4 + (1/2)‖x_0 − x̂‖ ‖x_0 − y‖
                        = ( (‖x_0 − x̂‖ + ‖x_0 − y‖)/2 )²
                        ≤ ‖x_0 − y‖².

The strict inequality above follows from Cauchy-Schwarz, while the last inequality follows from the fact that x̂ is a minimizer. This shows that no y ≠ x̂ can also minimize (3), and so x̂ is unique.
Obtuseness. Relative to the solution x̂, every vector in C is at an obtuse angle to the error x_0 − x̂. More precisely, P_C(x_0) = x̂ if and only if

    ⟨y − x̂, x_0 − x̂⟩ ≤ 0  for all y ∈ C.    (4)

Compare with (2) above. Here is a picture:

[Figure: a convex set C containing the projection x̂ of x_0 and another point y; the angle between y − x̂ and x_0 − x̂ is obtuse.]

We first prove that (4) ⟹ P_C(x_0) = x̂. For any y ∈ C, we have

    ‖y − x_0‖² = ‖y − x̂ + x̂ − x_0‖²
               = ‖y − x̂‖² + ‖x̂ − x_0‖² + 2⟨y − x̂, x̂ − x_0⟩
               ≥ ‖x̂ − x_0‖² + 2⟨y − x̂, x̂ − x_0⟩.

Note that the inner product term above is the same as in (4), but with the sign of the second argument flipped, so we know this term must be non-negative. Thus

    ‖y − x_0‖² ≥ ‖x̂ − x_0‖².

Since this holds uniformly over all y ∈ C, x̂ must be the closest point in C to x_0.
We now show that P_C(x_0) = x̂ ⟹ (4). Let y be an arbitrary point in C. Since C is convex, the point x̂ + θ(y − x̂) must also be in C for all θ ∈ [0, 1]. Since x̂ = P_C(x_0),

    ‖x̂ + θ(y − x̂) − x_0‖ ≥ ‖x̂ − x_0‖  for all θ ∈ [0, 1].

By our intermediate result (1) a few pages ago, this means

    ⟨y − x̂, x̂ − x_0⟩ ≥ 0  ⟺  ⟨y − x̂, x_0 − x̂⟩ ≤ 0.
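One convex set whose projector does have a closed form is the rectangular set from the examples: projecting onto {x : l ≤ x ≤ u} just clips each component to [l_n, u_n]. The sketch below (numpy assumed; the particular box and point are made up) also spot-checks the obtuseness condition at random points of the set.

```python
import numpy as np

def project_box(x0, l, u):
    """Projection onto the rectangular set {x : l <= x <= u} (componentwise clip)."""
    return np.clip(x0, l, u)

l, u = np.zeros(2), np.ones(2)
x0 = np.array([2.0, -0.5])
xhat = project_box(x0, l, u)
print(xhat)  # [1. 0.]

# Spot-check the obtuseness condition: <y - xhat, x0 - xhat> <= 0 for y in C.
rng = np.random.default_rng(0)
ys = rng.uniform(l, u, size=(1000, 2))
print(bool(np.all((ys - xhat) @ (x0 - xhat) <= 1e-12)))  # True
```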
Technical Details: closest point to a subspace

In this section, we establish the orthogonality principle for the projection of a point x_0 onto a subspace T. Let x̂ be a vector which obeys

    ê = x_0 − x̂ ⊥ T.

We will show that x̂ is the unique closest point to x_0 in T. Let y be any other vector in T, and set

    e = x_0 − y.

We will show that

    ‖e‖ > ‖ê‖  (i.e., that ‖x_0 − y‖ > ‖x_0 − x̂‖).

Note that

    ‖e‖² = ‖x_0 − y‖²
         = ‖ê − (y − x̂)‖²
         = ⟨ê − (y − x̂), ê − (y − x̂)⟩
         = ‖ê‖² + ‖y − x̂‖² − ⟨ê, y − x̂⟩ − ⟨y − x̂, ê⟩.

Since y − x̂ ∈ T and ê ⊥ T, we have ⟨ê, y − x̂⟩ = 0 and ⟨y − x̂, ê⟩ = 0, and so

    ‖e‖² = ‖ê‖² + ‖y − x̂‖².

Since all the quantities in the expression above are non-negative and

    ‖y − x̂‖² > 0  for y ≠ x̂,

we see that

    ‖e‖ > ‖ê‖.

We leave it as an exercise to establish the converse: that if ⟨y, x_0 − x̂⟩ = 0 for all y ∈ T, then x̂ is the projection of x_0 onto T.
Technical Details: basic topology in R^N

This section contains a brief review of basic topological concepts in R^N. Our discussion will take place using the standard Euclidean distance measure (i.e., the ℓ2 norm), but all of these definitions can be generalized to other metrics. An excellent source for this material is [Rud76].

We say that a sequence of vectors {x_k, k = 1, 2, ...} converges to x̂ if ‖x_k − x̂‖ → 0 as k → ∞. More precisely, this means that for every ε > 0, there exists an n_ε such that

    ‖x_k − x̂‖ ≤ ε  for all k ≥ n_ε.

It is easy to show that a sequence of vectors converges if and only if the individual components converge point-by-point.

A set X is open if we can draw a small ball around every point in X which is also entirely contained in X. More precisely, let B(x, ε) be the set of all points within ε of x:

    B(x, ε) = {y ∈ R^N : ‖x − y‖ ≤ ε}.

Then X is open if for every x ∈ X, there exists an ε_x > 0 such that B(x, ε_x) ⊆ X. The standard example here is an open interval of the real line, e.g. (0, 1).

There are many ways to define closed sets. The easiest is that a set X is closed if its complement is open. A more illuminating (and equivalent) definition is that X is closed if it contains all of its limit points. A vector x̂ is a limit point of X if there exists a sequence of vectors {x_k} ⊆ X that converges to x̂.
The closure of a general set X, denoted cl(X), is the set of all limit points of X. Note that every x ∈ X is trivially a limit point (take the sequence x_k = x), so X ⊆ cl(X). By construction, cl(X) is the smallest closed set that contains X.

Related to the definitions of open and closed sets are the technical definitions of boundary and interior. The interior of a set X is the collection of points around which we can place a ball of finite width which remains in the set:

    int(X) = {x ∈ X : ∃ ε > 0 such that B(x, ε) ⊆ X}.

The boundary of X is the set of points in cl(X) that are not in the interior:

    bd(X) = cl(X) \ int(X).

Another (equivalent) way of defining this is as the set of points that are in both the closure of X and the closure of its complement X^c. Note that if the set is not closed, there may be boundary points that are not in the set itself.

The set X is bounded if we can find a uniform upper bound on the distance between two points it contains; this upper bound is commonly referred to as the diameter of the set:

    diam X = sup_{x, y ∈ X} ‖x − y‖.

The set X ⊆ R^N is compact if it is closed and bounded. A key fact about compact sets is that every sequence has a convergent subsequence; this is known as the Bolzano-Weierstrass theorem.
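For a finite set of points the supremum in the definition of the diameter is just a maximum over pairs, which is easy to compute directly. A small illustrative sketch (numpy assumed; the point set is made up):

```python
import numpy as np

def diameter(points):
    """diam(X) = max ||x - y|| over a finite set X (rows of `points`)."""
    # All pairwise differences via broadcasting; fine for small point sets.
    diffs = points[:, None, :] - points[None, :, :]
    return float(np.max(np.linalg.norm(diffs, axis=-1)))

# Vertices of the unit square: the diameter is the diagonal, sqrt(2).
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(diameter(square))  # 1.4142...
```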
References

[Ber03] D. Bertsekas. Convex Analysis and Optimization. Athena Scientific, 2003.

[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[Rud76] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, 1976.