Convex sets

In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from several different angles. The tools and concepts we develop here, however, have many other applications both in this course and beyond.

A set $C \subseteq \mathbb{R}^N$ is convex if
$$x, y \in C \quad\Rightarrow\quad (1-\theta)x + \theta y \in C \quad\text{for all } \theta \in [0, 1].$$
In English, this means that if we travel on a straight line between any two points in $C$, then we never leave $C$.

[Figure: examples of convex and non-convex sets in $\mathbb{R}^2$.]
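
To make the definition concrete, here is a small Python sketch (our own illustration; `seems_convex` and the membership oracles are hypothetical helpers, not part of the notes) that randomly spot-checks the defining condition. It can refute convexity by finding a segment that leaves the set, but it can never prove convexity.

```python
import numpy as np

def seems_convex(in_set, dim, trials=10000, rng=np.random.default_rng(0)):
    """Randomized spot-check of the definition: sample x, y from the set
    and verify that (1 - theta) x + theta y stays in the set.
    Can only refute convexity, never prove it."""
    for _ in range(trials):
        x, y = rng.uniform(-2, 2, dim), rng.uniform(-2, 2, dim)
        if not (in_set(x) and in_set(y)):
            continue
        theta = rng.uniform()
        if not in_set((1 - theta) * x + theta * y):
            return False  # found a segment that leaves the set
    return True  # no counterexample found

ball = lambda x: np.linalg.norm(x) <= 1          # convex
shell = lambda x: 0.5 <= np.linalg.norm(x) <= 1  # not convex
print(seems_convex(ball, 2), seems_convex(shell, 2))  # expected: True False
```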

Examples of convex (and nonconvex) sets

Subspaces. Recall that if $S$ is a subspace of $\mathbb{R}^N$, then
$$x, y \in S \quad\Rightarrow\quad ax + by \in S \quad\text{for all } a, b \in \mathbb{R}.$$
So $S$ is clearly convex.

Affine sets. Affine sets are just subspaces that have been offset from the origin:
$$\{x \in \mathbb{R}^N : x = y + v,\ y \in T\}, \qquad T = \text{subspace},$$
for some fixed vector $v$. An equivalent definition is that
$$x, y \in C \quad\Rightarrow\quad \theta x + (1-\theta) y \in C \quad\text{for all } \theta \in \mathbb{R};$$
the difference between this definition and that for a subspace is that subspaces must include the origin.

Bound constraints. Rectangular sets of the form
$$C = \{x \in \mathbb{R}^N : \ell_1 \le x_1 \le u_1,\ \ell_2 \le x_2 \le u_2,\ \ldots,\ \ell_N \le x_N \le u_N\}$$
for some $\ell_1, \ldots, \ell_N, u_1, \ldots, u_N \in \mathbb{R}$ are convex.

The simplex in $\mathbb{R}^N$,
$$\{x \in \mathbb{R}^N : x_1 + x_2 + \cdots + x_N \le 1,\ x_1, x_2, \ldots, x_N \ge 0\},$$
is convex.

Any subset of $\mathbb{R}^N$ that can be expressed as a set of linear inequality constraints,
$$\{x \in \mathbb{R}^N : Ax \le b\},$$
is convex. Notice that both rectangular sets and the simplex fall into this category; for the simplex, take
$$A = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ -1 & 0 & \cdots & 0 \\ 0 & -1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & -1 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$
Sets of this form are called polyhedra; when they are also bounded, they are called polytopes. A small numerical sketch of the simplex example follows.
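
Here is a minimal numpy sketch (our own illustration; `simplex_constraints` is a hypothetical helper) that builds the $(A, b)$ pair above for the simplex and tests membership of a point.

```python
import numpy as np

def simplex_constraints(N):
    """Build (A, b) with {x : A x <= b} equal to the simplex
    {x : sum(x) <= 1, x >= 0} in R^N."""
    A = np.vstack([np.ones((1, N)), -np.eye(N)])  # 1^T x <= 1 and -x <= 0
    b = np.concatenate([[1.0], np.zeros(N)])
    return A, b

A, b = simplex_constraints(3)
x = np.array([0.2, 0.3, 0.4])
print(np.all(A @ x <= b + 1e-12))  # True: x lies in the simplex
```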

Norm balls. If $\|\cdot\|$ is a valid norm on $\mathbb{R}^N$, then
$$B_r = \{x : \|x\| \le r\}$$
is a convex set.

Ellipsoids. An ellipsoid is a set of the form
$$\mathcal{E} = \{x : (x - x_0)^T P^{-1} (x - x_0) \le r\}$$
for a symmetric positive-definite matrix $P$. Geometrically, the ellipsoid is centered at $x_0$, its axes are oriented with the eigenvectors of $P$, and the relative widths along these axes are proportional to the square roots of the eigenvalues of $P$.

A single point $\{x\}$ is convex. The empty set is convex.

The set
$$\{x \in \mathbb{R}^2 : x_1^2 - x_1 - x_2 + 1 \le 0\}$$
is convex. (Sketch it!) The set
$$\{x \in \mathbb{R}^2 : x_1^2 - x_1 - x_2 + 1 \ge 0\}$$
is not convex.

The set
$$\{x \in \mathbb{R}^2 : x_1^2 - x_1 - x_2 + 1 = 0\}$$
is certainly not convex.

Sets defined by linear inequality constraints where only some of the constraints have to hold are in general not convex. For example,
$$\{x \in \mathbb{R}^2 : x_1 - x_2 \le 1 \ \text{and}\ x_1 + x_2 \le 1\}$$
is convex, while
$$\{x \in \mathbb{R}^2 : x_1 - x_2 \le 1 \ \text{or}\ x_1 + x_2 \le 1\}$$
is not convex.

Cones

A cone is a set $C$ such that
$$x \in C \quad\Rightarrow\quad \theta x \in C \quad\text{for all } \theta \ge 0.$$
Convex cones are sets which are both convex and a cone: $C$ is a convex cone if
$$x_1, x_2 \in C \quad\Rightarrow\quad \theta_1 x_1 + \theta_2 x_2 \in C \quad\text{for all } \theta_1, \theta_2 \ge 0.$$
Given $x_1, x_2$, the set of all linear combinations with non-negative weights makes a wedge. For practice, sketch the region below that consists of all such combinations of $x_1$ and $x_2$:

[Figure: two vectors $x_1$ and $x_2$; the wedge consists of all combinations $\theta_1 x_1 + \theta_2 x_2$ with $\theta_1, \theta_2 \ge 0$.]

We will mostly be interested in proper cones, which in addition to being convex are closed, have a non-empty interior ("solid"; see the Technical Details section for a precise definition), and do not contain entire lines ("pointed"). Examples:

Non-negative orthant. The set of vectors whose entries are non-negative,
$$\mathbb{R}^N_+ = \{x \in \mathbb{R}^N : x_n \ge 0 \text{ for } n = 1, \ldots, N\},$$
is a proper cone.

Positive semi-definite cone. The set of $N \times N$ symmetric matrices with non-negative eigenvalues,
$$S^N_+ = \{X \in \mathbb{R}^{N \times N} : X = V \Lambda V^T,\ V^T V = V V^T = I,\ \Lambda \text{ diagonal and non-negative}\},$$
is a proper cone.

Non-negative polynomials. Vectors of coefficients of polynomials that are non-negative on $[0, 1]$,
$$\{x \in \mathbb{R}^N : x_1 + x_2 t + x_3 t^2 + \cdots + x_N t^{N-1} \ge 0 \text{ for all } 0 \le t \le 1\},$$

form a proper cone. Notice that it is not necessary that all the $x_n \ge 0$; for example, $t - t^2$ (that is, $x_1 = 0$, $x_2 = 1$, $x_3 = -1$) is non-negative on $[0, 1]$.

Norm cones. The subset of $\mathbb{R}^{N+1}$ defined by
$$\{(x, t) : x \in \mathbb{R}^N,\ t \in \mathbb{R},\ \|x\| \le t\}$$
is a proper cone for any valid norm $\|\cdot\|$. Sketch what this looks like for the Euclidean norm and $N = 2$:

Every proper cone $K$ defines a partial ordering, or generalized inequality. We write $x \preceq_K y$ when $y - x \in K$. For example, for vectors $x, y \in \mathbb{R}^N$, we say $x \preceq_{\mathbb{R}^N_+} y$ when $x_n \le y_n$ for all $n = 1, \ldots, N$. For symmetric matrices $X, Y$, we say $X \preceq_{S^N_+} Y$ when $Y - X$ has non-negative eigenvalues. We will typically just use $\preceq$ when the context makes the cone clear. In fact, for $\mathbb{R}^N_+$ we will just write $x \le y$ (as we did above) to mean that the entries in $x$ are component-by-component upper-bounded by the entries in $y$.
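
To make the matrix ordering concrete, here is a minimal numpy sketch (our own illustration; the helper name `psd_leq` is not standard notation). It tests $X \preceq_{S^N_+} Y$ by checking that the smallest eigenvalue of $Y - X$ is non-negative.

```python
import numpy as np

def psd_leq(X, Y, tol=1e-10):
    """Test X <= Y in the semidefinite ordering: Y - X must be PSD,
    i.e. all eigenvalues of the symmetric matrix Y - X are >= 0."""
    return np.linalg.eigvalsh(Y - X).min() >= -tol

X = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = np.array([[2.0, 0.5], [0.5, 2.0]])
print(psd_leq(X, Y), psd_leq(Y, X))  # True False: the ordering is partial
```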

Partial orderings share many of the properties of the standard ordering $\le$ on the real line. For example,
$$x \preceq y,\ u \preceq v \quad\Rightarrow\quad x + u \preceq y + v.$$
But other properties do not hold; for example, it is not necessarily true that either $x \preceq y$ or $y \preceq x$. An extensive list of properties of partial orderings (most of which will make perfect sense on sight) can be found in [BV04, Chapter 2.4].

Hyperplanes and halfspaces

Hyperplanes and halfspaces are both very simple constructs, but they will be crucial to our understanding of convex sets, functions, and optimization problems.

A hyperplane is a set of the form
$$\{x \in \mathbb{R}^N : \langle x, c\rangle = t\}$$
for some fixed vector $c \ne 0$ and scalar $t$. When $t = 0$, this set is a subspace of dimension $N - 1$, and contains all vectors that are orthogonal to $c$. For $t \ne 0$, this is an affine set consisting of all the vectors orthogonal to $c$ (call this set $c^\perp$) offset by some $x_0$:
$$\{x \in \mathbb{R}^N : \langle x, c\rangle = t\} = \{x \in \mathbb{R}^N : x = x_0 + h,\ h \in c^\perp\},$$
for any $x_0$ with $\langle x_0, c\rangle = t$. We might take $x_0 = t\,c/\|c\|^2$, for instance. The point is, $c$ is a normal vector of the set. Here are some examples in $\mathbb{R}^2$:

[Figure: hyperplanes in $\mathbb{R}^2$ for two different normal vectors $c$ and several values of $t$ (e.g. $t = -1, 1, 5$).]

It should be clear that hyperplanes are convex sets.

A halfspace is a set of the form
$$\{x \in \mathbb{R}^N : \langle x, c\rangle \le t\}$$
for some fixed vector $c \ne 0$ and scalar $t$. For $t = 0$, the halfspace contains all vectors whose inner product with $c$ is non-positive (i.e. the angle between $x$ and $c$ is at least $90^\circ$). Here is a simple example:

[Figure: a halfspace in $\mathbb{R}^2$ with normal vector $c$ and $t = 5$.]

Separating hyperplanes

If two convex sets are disjoint, then there is a hyperplane that separates them.

[Figure: disjoint convex sets $C$ and $D$ with closest points $c$ and $d$, separated by the hyperplane $\{x : \langle x, a\rangle = b\}$, where $a = d - c$ and $b = (\|d\|^2 - \|c\|^2)/2$.]

This fact is intuitive, and is incredibly useful in understanding the solutions to convex optimization programs (we will see this even in the next section). It is also easy to prove under mild additional assumptions; below we will show how to explicitly construct a separating hyperplane between two convex sets when there are points in the two sets that are definitively closest to one another (this holds, for example, when the sets are closed and at least one of them is bounded). The result extends to general convex sets, and the proof is based on the same basic idea, but requires some additional topological considerations.

Let $C, D$ be disjoint convex sets and suppose that there exist points $c \in C$ and $d \in D$ that achieve the minimum distance:
$$\|c - d\| = \inf_{x \in C,\ y \in D} \|x - y\|.$$
Define
$$a = d - c, \qquad b = \frac{\|d\|^2 - \|c\|^2}{2}.$$
Then
$$\langle x, a\rangle \le b \ \text{ for } x \in C, \qquad \langle x, a\rangle \ge b \ \text{ for } x \in D.$$
That is, the hyperplane $\{x : \langle x, a\rangle = b\}$ separates $C$ and $D$. To see this, set $f(x) = \langle x, a\rangle - b$, and show that for any point $u$ in $D$, we have $f(u) \ge 0$. First, we prove the basic geometric fact that for any two vectors $x, y$,
$$\text{if } \|x + \theta y\| \ge \|x\| \text{ for all } \theta \in [0, 1], \text{ then } \langle x, y\rangle \ge 0. \tag{1}$$

To establish this, we expand the norm as
$$\|x + \theta y\|^2 = \|x\|^2 + \theta^2 \|y\|^2 + 2\theta \langle x, y\rangle,$$
from which we can immediately deduce that
$$\theta \|y\|^2 + 2\langle x, y\rangle \ge 0 \ \text{ for all } \theta \in (0, 1] \quad\Rightarrow\quad \langle x, y\rangle \ge 0.$$
Now let $u$ be an arbitrary point in $D$. Since $D$ is convex, we know that $d + \theta(u - d) \in D$ for all $\theta \in [0, 1]$. Since $d$ is as close to $c$ as any other point in $D$, we have
$$\|d + \theta(u - d) - c\| \ge \|d - c\|,$$
and so by (1), we know that
$$\langle d - c,\ u - d\rangle \ge 0.$$
This means
$$f(u) = \langle u,\ d - c\rangle - \frac{\|d\|^2 - \|c\|^2}{2} = \langle u,\ d - c\rangle - \langle (d + c)/2,\ d - c\rangle = \langle u - (d + c)/2,\ d - c\rangle = \langle u - d,\ d - c\rangle + \frac{\|c - d\|^2}{2} \ge 0.$$
The proof that $f(v) \le 0$ for every $v \in C$ is exactly the same.
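
As a sanity check, here is a small numpy sketch (our own construction, not from the notes) that builds the separating hyperplane between two disjoint Euclidean balls, where the closest points $c$ and $d$ have closed forms, and verifies the sign of $f$ on random samples from each set.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two disjoint balls: centers z1, z2 and radii r1, r2.
z1, r1 = np.array([0.0, 0.0]), 1.0
z2, r2 = np.array([4.0, 1.0]), 1.5

# The closest points lie along the segment between the centers.
u = (z2 - z1) / np.linalg.norm(z2 - z1)
c, d = z1 + r1 * u, z2 - r2 * u

a = d - c                              # normal vector
b = (np.dot(d, d) - np.dot(c, c)) / 2  # offset
f = lambda x: x @ a - b

def sample_ball(z, r, n):
    """Draw n random points from the ball of radius r centered at z."""
    pts = rng.normal(size=(n, 2))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)
    return z + pts * rng.uniform(0, r, size=(n, 1))

print(np.all(f(sample_ball(z1, r1, 1000)) <= 1e-9))   # True: f <= 0 on C
print(np.all(f(sample_ball(z2, r2, 1000)) >= -1e-9))  # True: f >= 0 on D
```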

Note the following:

We actually only need the interiors of the sets $C$ and $D$ to be disjoint; they can intersect at one or more points along their boundaries.

[Figure: two convex sets $C$ and $D$ that touch along their boundaries, separated by a hyperplane.]

But you have to be more careful in defining the hyperplane which separates them (as we would have $c = d$, and hence $a = 0$, in the argument above).

There can be many hyperplanes that separate the two sets, but there is not necessarily more than one.

The hyperplane $\{x : \langle x, a\rangle = b\}$ strictly separates the sets $C$ and $D$ if $\langle x, a\rangle < b$ for all $x \in C$ and $\langle x, a\rangle > b$ for all $x \in D$. There are examples of closed, convex, disjoint sets $C, D$ that are separable but not strictly separable. It is a good exercise to think of an example.

Supporting hyperplanes

A direct consequence of the separating hyperplane theorem is that every point on the boundary of a convex set $C$ can be separated from its interior. If $a \ne 0$ satisfies
$$\langle x, a\rangle \le \langle x_0, a\rangle \ \text{ for all } x \in C,$$
then $\{x : \langle x, a\rangle = \langle x_0, a\rangle\}$ is called a supporting hyperplane to $C$ at $x_0$. Here's a picture:

[Figure: a convex set $C$ with a supporting hyperplane at a boundary point $x_0$.]

The hyperplane is tangent to $C$ at $x_0$, and the halfspace $\{x : \langle x, a\rangle \le \langle x_0, a\rangle\}$ contains $C$. The supporting hyperplane theorem says that a supporting hyperplane exists at every point $x_0$ on the boundary of a (non-empty) convex set $C$. Note that there might be more than one:

[Figure: a convex set with a corner at $x_0$, where multiple supporting hyperplanes exist.]

Given the separating hyperplane theorem, this is easy to prove. If $C$ has a non-empty interior, then it follows from just realizing that $\{x_0\}$ and $\operatorname{int}(C)$ are disjoint convex sets. If $\operatorname{int}(C) = \emptyset$, then $C$ must be contained in an affine set of dimension less than $N$, and hence also in (possibly many) hyperplanes, any of which will be supporting.
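
For a concrete instance (our own example, not from the notes): for the unit ball $C = \{x : \|x\|_2 \le 1\}$ and a boundary point $x_0$, the choice $a = x_0$ gives a supporting hyperplane, since $\langle x, x_0\rangle \le \|x\|_2 \|x_0\|_2 \le \langle x_0, x_0\rangle$ for all $x \in C$ by Cauchy-Schwarz. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(2)

# Unit ball C = {x : ||x|| <= 1}; boundary point x0 with ||x0|| = 1.
x0 = np.array([0.6, 0.8])
a = x0  # normal of the supporting hyperplane {x : <x, a> = <x0, a>}

# Sample points of C and check <x, a> <= <x0, a> = 1.
pts = rng.normal(size=(1000, 2))
pts /= np.maximum(np.linalg.norm(pts, axis=1, keepdims=True), 1.0)  # push outliers onto the sphere
print(np.all(pts @ a <= x0 @ a + 1e-9))  # True
```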

The closest point problem

Let $x_0 \in \mathbb{R}^N$ be given, and let $C$ be a non-empty, closed, convex set. The projection of $x_0$ onto $C$ is the closest point (in the standard Euclidean distance, for now) in $C$ to $x_0$:
$$P_C(x_0) = \arg\min_{y \in C} \|x_0 - y\|_2.$$
We will see below that there is a unique minimizer to this problem, and that the solution has geometric properties that are analogous to the case where $C$ is a subspace.

Projection onto a subspace

Let's recall how we solve this problem in the special case where $C := T$ is a $K$-dimensional subspace. In this case, the solution $\hat{x} = P_T(x_0)$ is unique, and is characterized by the orthogonality principle:
$$x_0 - \hat{x} \perp T, \quad\text{meaning that}\quad \langle y,\ x_0 - \hat{x}\rangle = 0 \ \text{ for all } y \in T.$$
The proof of this fact is reviewed in the Technical Details section at the end of these notes. The orthogonality principle leads immediately to an algorithm for calculating $P_T(x_0)$. Let $v_1, \ldots, v_K$ be a basis for $T$; we can write the solution as
$$\hat{x} = \sum_{k=1}^K \alpha_k v_k;$$
solving for the expansion coefficients $\alpha_k$ is the same as solving for $\hat{x}$. We know that
$$\Big\langle x_0 - \sum_{j=1}^K \alpha_j v_j,\ v_k \Big\rangle = 0, \quad\text{for } k = 1, \ldots, K,$$

and so the $\alpha_k$ must obey the linear system of equations
$$\sum_{j=1}^K \alpha_j \langle v_j, v_k\rangle = \langle x_0, v_k\rangle, \quad\text{for } k = 1, \ldots, K.$$
Concatenating the $v_k$ as columns in the $N \times K$ matrix $V$, and the entries $\alpha_k$ into the vector $\alpha \in \mathbb{R}^K$, we can write the equations above as
$$V^T V \alpha = V^T x_0.$$
Since the $\{v_k\}$ are a basis for $T$ (i.e. they are linearly independent), $V^T V$ is invertible, and we can solve for the best expansion coefficients:
$$\hat{\alpha} = (V^T V)^{-1} V^T x_0.$$
Using these expansion coefficients to reconstruct $\hat{x}$ yields
$$\hat{x} = V \hat{\alpha} = V (V^T V)^{-1} V^T x_0.$$
In this case, the projector $P_T(\cdot)$ is a linear map, specified by the $N \times N$ matrix $V (V^T V)^{-1} V^T$.

Projection onto an affine set

Affine sets are not fundamentally different than subspaces. Any affine set $C$ can be written as a subspace $T$ plus an offset $v_0$:
$$C = T + v_0 = \{x : x = y + v_0,\ y \in T\}.$$
It is easy to translate the results for subspaces above to say that the projection onto an affine set is unique, and obeys the orthogonality principle
$$\langle y - \hat{x},\ x_0 - \hat{x}\rangle = 0, \quad\text{for all } y \in C. \tag{2}$$
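
Here is a small numpy sketch of both computations (our own illustration, with an arbitrary random basis; for numerical stability one would normally use np.linalg.lstsq rather than forming $(V^TV)^{-1}$ explicitly). It follows the formulas above literally, and also carries out the shift trick for affine sets described next.

```python
import numpy as np

rng = np.random.default_rng(3)

N, K = 5, 2
V = rng.normal(size=(N, K))  # columns form a basis for the subspace T
x0 = rng.normal(size=N)

# Projection onto T: x_hat = V (V^T V)^{-1} V^T x0.
P = V @ np.linalg.inv(V.T @ V) @ V.T
x_hat = P @ x0

# Orthogonality principle: x0 - x_hat is orthogonal to every basis vector.
print(np.allclose(V.T @ (x0 - x_hat), 0))  # True

# Projection onto the affine set C = T + v0: shift, project, shift back.
v0 = rng.normal(size=N)
x_affine = v0 + P @ (x0 - v0)
```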

You can solve this problem by shifting $x_0$ and $C$ by $-v_0$: project $x_0 - v_0$ onto the subspace $C - v_0$, and then shift the answer back.

Projection onto a general convex set

In general, there is no closed-form expression for the projector onto a given convex set. However, the concepts above (orthogonality, projection onto a subspace) can help us understand the solution for an arbitrary convex set.

Uniqueness

If $C$ is closed and convex (we review basic concepts in topology for $\mathbb{R}^N$ in the Technical Details section at the end of the notes), then for any $x_0$, the program
$$\min_{y \in C} \|x_0 - y\|_2 \tag{3}$$
has a unique solution.

First of all, that there is at least one point in $C$ at which the minimum is attained is a consequence of $C$ being closed and of $\|x_0 - y\|_2$ being a continuous function of $y$ (we can restrict the search to the intersection of $C$ with a closed ball around $x_0$, which is compact). Let $\hat{x}$ be one such minimizer; we will show that
$$\|x_0 - y\|_2 > \|x_0 - \hat{x}\|_2 \ \text{ for all } y \in C,\ y \ne \hat{x}.$$
Consider first all the points in $C$ which are co-aligned with $\hat{x}$. Let $I = \{\alpha \in \mathbb{R} : \alpha \hat{x} \in C\}$. Since $C$ is convex and closed, this is a closed interval of the real line (that contains at least the point $\alpha = 1$). The function
$$g(\alpha) = \|x_0 - \alpha \hat{x}\|_2^2 = \alpha^2 \|\hat{x}\|_2^2 - 2\alpha \langle \hat{x}, x_0\rangle + \|x_0\|_2^2$$

is minimized over $I$ at $\alpha = 1$ by construction, and since its second derivative is strictly positive, we know that this is the unique minimizer. So if there is another minimizer of (3), then it is not co-aligned with $\hat{x}$.

Now let $y$ be any point in $C$ that is not co-aligned with $\hat{x}$. We will show that $y$ cannot minimize (3) because the point $\hat{x}/2 + y/2 \in C$ is definitively closer to $x_0$. We have
$$\Big\|x_0 - \frac{\hat{x}}{2} - \frac{y}{2}\Big\|_2^2 = \Big\|\frac{x_0 - \hat{x}}{2} + \frac{x_0 - y}{2}\Big\|_2^2 = \frac{\|x_0 - \hat{x}\|_2^2}{4} + \frac{\|x_0 - y\|_2^2}{4} + \frac{\langle x_0 - \hat{x},\ x_0 - y\rangle}{2}$$
$$< \frac{\|x_0 - \hat{x}\|_2^2}{4} + \frac{\|x_0 - y\|_2^2}{4} + \frac{\|x_0 - \hat{x}\|_2 \|x_0 - y\|_2}{2} = \left(\frac{\|x_0 - \hat{x}\|_2}{2} + \frac{\|x_0 - y\|_2}{2}\right)^2 \le \|x_0 - y\|_2^2.$$
The strict inequality above follows from Cauchy-Schwarz, while the last inequality follows from the fact that $\hat{x}$ is a minimizer. (If Cauchy-Schwarz were tight, $x_0 - \hat{x}$ and $x_0 - y$ would be positive multiples of one another, and then the last inequality would be strict instead unless $y = \hat{x}$.) This shows that no $y \ne \hat{x}$ can also minimize (3), and so $\hat{x}$ is unique.

Obtuseness. Relative to the solution $\hat{x}$, every vector in $C$ is at an obtuse angle to the error $x_0 - \hat{x}$. More precisely, $P_C(x_0) = \hat{x}$ if and only if
$$\langle y - \hat{x},\ x_0 - \hat{x}\rangle \le 0 \ \text{ for all } y \in C. \tag{4}$$
Compare with (2) above. Here is a picture:

[Figure: a convex set $C$ containing $\hat{x}$ and $y$, with $x_0$ outside; the angle at $\hat{x}$ between $y - \hat{x}$ and $x_0 - \hat{x}$ is obtuse.]

We first prove that (4) $\Rightarrow$ $P_C(x_0) = \hat{x}$. For any $y \in C$, we have
$$\|y - x_0\|_2^2 = \|y - \hat{x} + \hat{x} - x_0\|_2^2 = \|y - \hat{x}\|_2^2 + \|\hat{x} - x_0\|_2^2 + 2\langle y - \hat{x},\ \hat{x} - x_0\rangle \ge \|\hat{x} - x_0\|_2^2 + 2\langle y - \hat{x},\ \hat{x} - x_0\rangle.$$
Note that the inner product term above is the same as in (4), but with the sign of the second argument flipped, so we know this term must be non-negative. Thus
$$\|y - x_0\|_2 \ge \|\hat{x} - x_0\|_2.$$
Since this holds uniformly over all $y \in C$, $\hat{x}$ must be the closest point in $C$ to $x_0$.

We now show that $P_C(x_0) = \hat{x}$ $\Rightarrow$ (4). Let $y$ be an arbitrary point in $C$. Since $C$ is convex, the point $\hat{x} + \theta(y - \hat{x})$ must also be in $C$ for all $\theta \in [0, 1]$. Since $\hat{x} = P_C(x_0)$,
$$\|\hat{x} + \theta(y - \hat{x}) - x_0\|_2 \ge \|\hat{x} - x_0\|_2 \ \text{ for all } \theta \in [0, 1].$$
By our intermediate result (1) a few pages ago, this means
$$\langle y - \hat{x},\ \hat{x} - x_0\rangle \ge 0 \quad\Leftrightarrow\quad \langle y - \hat{x},\ x_0 - \hat{x}\rangle \le 0.$$
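
Here is a small numpy sketch of the obtuseness condition (our own example; the set and the points are arbitrary). Projection onto a box of bound constraints is coordinate-wise clipping, so (4) can be spot-checked directly on random points of the set.

```python
import numpy as np

rng = np.random.default_rng(4)

# Projection onto the box C = {x : l <= x <= u} is coordinate-wise clipping.
l, u = np.array([-1.0, -1.0, 0.0]), np.array([1.0, 2.0, 0.5])
project = lambda x: np.clip(x, l, u)

x0 = np.array([3.0, -2.5, 0.2])
x_hat = project(x0)

# Check the obtuseness condition (4): <y - x_hat, x0 - x_hat> <= 0 for y in C.
ys = rng.uniform(l, u, size=(1000, 3))
print(np.all((ys - x_hat) @ (x0 - x_hat) <= 1e-9))  # True
```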

Technical Details: closest point to a subspace

In this section, we establish the orthogonality principle for the projection of a point $x_0$ onto a subspace $T$. Let $\hat{x} \in T$ be a vector which obeys
$$\hat{e} = x_0 - \hat{x} \perp T.$$
We will show that $\hat{x}$ is the unique closest point to $x_0$ in $T$. Let $y$ be any other vector in $T$, and set
$$e = x_0 - y.$$
We will show that
$$\|e\|_2 > \|\hat{e}\|_2 \quad\text{(i.e. that } \|x_0 - y\|_2 > \|x_0 - \hat{x}\|_2\text{)}.$$
Note that
$$\|e\|_2^2 = \|x_0 - y\|_2^2 = \|\hat{e} - (y - \hat{x})\|_2^2 = \langle \hat{e} - (y - \hat{x}),\ \hat{e} - (y - \hat{x})\rangle = \|\hat{e}\|_2^2 + \|y - \hat{x}\|_2^2 - \langle \hat{e},\ y - \hat{x}\rangle - \langle y - \hat{x},\ \hat{e}\rangle.$$
Since $y - \hat{x} \in T$ and $\hat{e} \perp T$, we have $\langle \hat{e},\ y - \hat{x}\rangle = 0$ and $\langle y - \hat{x},\ \hat{e}\rangle = 0$, and so
$$\|e\|_2^2 = \|\hat{e}\|_2^2 + \|y - \hat{x}\|_2^2.$$
Since $y \ne \hat{x}$, we have $\|y - \hat{x}\|_2^2 > 0$, and we see that
$$\|e\|_2 > \|\hat{e}\|_2.$$
We leave it as an exercise to establish the converse: that if $\hat{x}$ is the closest point to $x_0$ in $T$, then $\langle y,\ x_0 - \hat{x}\rangle = 0$ for all $y \in T$.

Technical Details: basic topology in $\mathbb{R}^N$

This section contains a brief review of basic topological concepts in $\mathbb{R}^N$. Our discussion will take place using the standard Euclidean distance measure (i.e. the $\ell_2$ norm), but all of these definitions can be generalized to other metrics. An excellent source for this material is [Rud76].

We say that a sequence of vectors $\{x_k,\ k = 1, 2, \ldots\}$ converges to $\hat{x}$ if $\|x_k - \hat{x}\|_2 \to 0$ as $k \to \infty$. More precisely, this means that for every $\epsilon > 0$, there exists an $n_\epsilon$ such that
$$\|x_k - \hat{x}\|_2 \le \epsilon \ \text{ for all } k \ge n_\epsilon.$$
It is easy to show that a sequence of vectors converges if and only if its individual components converge point-by-point.

A set $X$ is open if we can draw a small ball around every point in $X$ which is also entirely contained in $X$. More precisely, let $B(x, \epsilon)$ be the set of all points within $\epsilon$ of $x$:
$$B(x, \epsilon) = \{y \in \mathbb{R}^N : \|x - y\|_2 \le \epsilon\}.$$
Then $X$ is open if for every $x \in X$, there exists an $\epsilon_x > 0$ such that $B(x, \epsilon_x) \subseteq X$. The standard example here is an open interval of the real line, e.g. $(0, 1)$.

There are many ways to define closed sets. The easiest is that a set $X$ is closed if its complement is open. A more illuminating (and equivalent) definition is that $X$ is closed if it contains all of its limit points. A vector $\hat{x}$ is a limit point of $X$ if there exists a sequence of vectors $\{x_k\} \subseteq X$ that converges to $\hat{x}$.

The closure of a general set $X$, denoted $\operatorname{cl}(X)$, is the set of all limit points of $X$. Note that every $x \in X$ is trivially a limit point (take the sequence $x_k = x$), so $X \subseteq \operatorname{cl}(X)$. By construction, $\operatorname{cl}(X)$ is the smallest closed set that contains $X$.

Related to the definitions of open and closed sets are the technical definitions of boundary and interior. The interior of a set $X$ is the collection of points around which we can place a ball of positive radius which remains in the set:
$$\operatorname{int}(X) = \{x \in X : \exists\, \epsilon > 0 \text{ such that } B(x, \epsilon) \subseteq X\}.$$
The boundary of $X$ is the set of points in $\operatorname{cl}(X)$ that are not in the interior:
$$\operatorname{bd}(X) = \operatorname{cl}(X) \setminus \operatorname{int}(X).$$
Another (equivalent) way of defining this is as the set of points that are in both the closure of $X$ and the closure of its complement $X^c$. Note that if the set is not closed, there may be boundary points that are not in the set itself.

The set $X$ is bounded if we can find a uniform upper bound on the distance between any two points it contains; this upper bound is commonly referred to as the diameter of the set:
$$\operatorname{diam} X = \sup_{x, y \in X} \|x - y\|_2.$$
The set $X \subseteq \mathbb{R}^N$ is compact if it is closed and bounded. A key fact about compact sets is that every sequence in a compact set has a subsequence converging to a point of the set; this is known as the Bolzano-Weierstrass theorem.
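
As a small illustration (our own, with an arbitrary point set): for a finite set of points, the supremum in the definition of the diameter is a maximum over pairs, which can be computed directly.

```python
import numpy as np

rng = np.random.default_rng(5)

# A finite set of points in R^2; its diameter is the largest pairwise distance.
X = rng.uniform(-1, 1, size=(200, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # all pairwise distances
print(D.max())  # diam(X)
```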

References

[Ber03] D. Bertsekas. Convex Analysis and Optimization. Athena Scientific, 2003.

[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[Rud76] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, 1976.