Chapter 2: Preliminaries and elements of convex analysis Edoardo Amaldi DEIB Politecnico di Milano edoardo.amaldi@polimi.it Website: http://home.deib.polimi.it/amaldi/opt-14-15.shtml Academic year 2014-15 Edoardo Amaldi (PoliMI) Optimization Academic year 2014-15 1 / 23
2.1 Basic concepts

In $\mathbb{R}^n$ with the Euclidean norm:

$x \in S \subseteq \mathbb{R}^n$ is an interior point of $S$ if there exists $\varepsilon > 0$ such that $B_\varepsilon(x) = \{y \in \mathbb{R}^n : \|y - x\| < \varepsilon\} \subseteq S$.

$x \in \mathbb{R}^n$ is a boundary point of $S$ if, for every $\varepsilon > 0$, $B_\varepsilon(x)$ contains at least one point of $S$ and one point of its complement $\mathbb{R}^n \setminus S$.

The set of all the interior points of $S \subseteq \mathbb{R}^n$ is the interior of $S$, denoted by $\mathrm{int}(S)$.

The set of all boundary points of $S$ is the boundary of $S$, denoted by $\partial(S)$.

$S \subseteq \mathbb{R}^n$ is open if $S = \mathrm{int}(S)$; $S$ is closed if its complement is open. Intuitively, a closed set contains all the points in $\partial(S)$.

$S \subseteq \mathbb{R}^n$ is bounded if there exists $M > 0$ such that $\|x\| \le M$ for every $x \in S$.

$S \subseteq \mathbb{R}^n$ closed and bounded is compact.
Property:

A set $S \subseteq \mathbb{R}^n$ is closed if and only if every sequence $\{x_i\}_{i \in \mathbb{N}} \subseteq S$ that converges, converges to a point $\bar{x} \in S$.

A set $S \subseteq \mathbb{R}^n$ is compact if and only if every sequence $\{x_i\}_{i \in \mathbb{N}} \subseteq S$ admits a subsequence that converges to a point $\bar{x} \in S$.
Existence of an optimal solution

In general, when we wish to minimize a function $f : S \subseteq \mathbb{R}^n \to \mathbb{R}$, we only know that there exists a greatest lower bound (infimum), that is
$$\inf_{x \in S} f(x).$$

Theorem (Weierstrass): Let $S \subseteq \mathbb{R}^n$ be a nonempty and compact set, and let $f : S \to \mathbb{R}$ be continuous on $S$. Then there exists an $x^* \in S$ such that $f(x^*) \le f(x)$ for every $x \in S$.

Examples in which the result does not hold because $S$ is not closed, $S$ is not bounded, or $f$ is not continuous on $S$: $f(x) = x$ on $S = (0,1]$; $f(x) = x$ on $S = \mathbb{R}$; $f(x) = x$ for $x \in (0,1]$ with $f(0) = 1$ on $S = [0,1]$.

When the problem admits an optimal solution $x^* \in S$, we can write $\min_{x \in S} f(x)$.

Observation: this result holds in any vector space of finite dimension.
Cones and affine subspaces

Consider any subset $S \subseteq \mathbb{R}^n$.

Definition: $\mathrm{cone}(S)$ denotes the set of all the conic combinations of points of $S$, i.e., all the points $x \in \mathbb{R}^n$ such that $x = \sum_{i=1}^m \alpha_i x_i$ with $x_1, \dots, x_m \in S$ and $\alpha_i \ge 0$ for every $i$, $1 \le i \le m$.

Examples: polyhedral cone generated by a finite number of vectors; ice cream cone in $\mathbb{R}^3$ generated by an infinite number of vectors.

Definition: $\mathrm{aff}(S)$ denotes the smallest affine subspace that contains $S$.

$\mathrm{aff}(S)$ coincides with the set of all the affine combinations of points in $S$, i.e., all the points $x \in \mathbb{R}^n$ such that $x = \sum_{i=1}^m \alpha_i x_i$ with $x_1, \dots, x_m \in S$ and $\sum_{i=1}^m \alpha_i = 1$, where $\alpha_i \in \mathbb{R}$ for every $i$, $1 \le i \le m$.

Examples: straight line containing two points in $\mathbb{R}^2$; plane containing three points in general position in $\mathbb{R}^3$.
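The two definitions can be tried out numerically. The following is a minimal sketch, assuming a cone in $\mathbb{R}^2$ generated by exactly two linearly independent vectors; the function names are illustrative and do not come from the slides.

```python
def conic_coefficients(g1, g2, x):
    """Solve x = a1*g1 + a2*g2 in R^2 by Cramer's rule.

    Assumption of this sketch: g1 and g2 are linearly independent.
    """
    det = g1[0] * g2[1] - g2[0] * g1[1]
    if det == 0:
        raise ValueError("generators must be linearly independent")
    a1 = (x[0] * g2[1] - g2[0] * x[1]) / det
    a2 = (g1[0] * x[1] - x[0] * g1[1]) / det
    return a1, a2


def in_cone(g1, g2, x, tol=1e-12):
    """True if x is a conic combination of g1 and g2 (both coefficients >= 0)."""
    a1, a2 = conic_coefficients(g1, g2, x)
    return a1 >= -tol and a2 >= -tol


def affine_combination(points, alphas, tol=1e-12):
    """Return sum_i alphas[i] * points[i]; the alphas must sum to 1."""
    if abs(sum(alphas) - 1.0) > tol:
        raise ValueError("affine combinations need coefficients summing to 1")
    dim = len(points[0])
    return [sum(a * p[k] for a, p in zip(alphas, points)) for k in range(dim)]
```

With generators (1, 0) and (1, 1), the point (2, 1) has coefficients (1, 1) and lies in the cone, while (0, 1) would need a negative coefficient. Affine coefficients may be negative: the weights (-0.5, 1.5) on the points (0, 0) and (2, 2) give a point on the same line but outside the segment between them.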
2.2 Elements of convex analysis

Definition: A set $C \subseteq \mathbb{R}^n$ is convex if
$$\alpha x_1 + (1 - \alpha) x_2 \in C \quad \forall x_1, x_2 \in C \text{ and } \forall \alpha \in [0,1].$$

A point $x \in \mathbb{R}^n$ is a convex combination of $x_1, \dots, x_m \in \mathbb{R}^n$ if
$$x = \sum_{i=1}^m \alpha_i x_i$$
with $\alpha_i \ge 0$ for every $i$, $1 \le i \le m$, and $\sum_{i=1}^m \alpha_i = 1$.

Property: If $C_i$ with $i = 1, \dots, k$ are convex, then $\bigcap_{i=1}^k C_i$ is convex.
Examples of convex sets

1) Hyperplane $H = \{x \in \mathbb{R}^n : p^t x = \beta\}$ with $p \ne 0$.

For $\bar{x} \in H$, $p^t \bar{x} = \beta$ implies $H = \{x \in \mathbb{R}^n : p^t (x - \bar{x}) = 0\}$, and hence $H$ is the set of all the points $x$ such that $x - \bar{x}$ is orthogonal to $p$.

N.B.: $H$ is closed since $H = \partial(H)$.

2) Closed half-spaces $H^+ = \{x \in \mathbb{R}^n : p^t x \ge \beta\}$ and $H^- = \{x \in \mathbb{R}^n : p^t x \le \beta\}$ with $p \ne 0$.

3) Feasible region $X = \{x \in \mathbb{R}^n : Ax \ge b,\ x \ge 0\}$ of a Linear Program (LP)
$$\min\ c^t x \quad \text{s.t.} \quad Ax \ge b,\ x \ge 0.$$

$X$ is a convex and closed subset (intersection of $m + n$ closed half-spaces if $A \in \mathbb{R}^{m \times n}$).
Definition: The intersection of a finite number of closed half-spaces is a polyhedron.

N.B.: The set of optimal solutions of an LP is also a polyhedron (just add $c^t x = z^*$ to the constraints, where $z^*$ is the optimal value) and hence convex.

Definition: The convex hull of $S \subseteq \mathbb{R}^n$, denoted by $\mathrm{conv}(S)$, is the intersection of all convex sets containing $S$.

$\mathrm{conv}(S)$ coincides with the set of all convex combinations of points in $S$.

Two equivalent characterizations (external/internal descriptions).

Definition: Given a convex set $C$ of $\mathbb{R}^n$, a point $x \in C$ is an extreme point of $C$ if it cannot be expressed as a convex combination of two different points of $C$, that is,
$$x = \alpha x_1 + (1 - \alpha) x_2 \text{ with } x_1, x_2 \in C \text{ and } \alpha \in (0,1)$$
implies that $x_1 = x_2$.

Examples: convex sets with a finite and infinite number of extreme points.
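For a finite set $S \subseteq \mathbb{R}^2$, the extreme points of $\mathrm{conv}(S)$ are the vertices of the convex hull polygon. The sketch below uses the monotone chain hull algorithm, which is a technique swapped in for illustration and is not mentioned in the slides; the function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def extreme_points(points):
    """Vertices of conv(points) for a finite set of points in R^2.

    Collinear and interior points are discarded, so every returned point
    is an extreme point of the convex hull.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]
```

For the four corners of a square together with its center (1, 1) and an edge midpoint (1, 0), only the corners are returned: the center and the midpoint are convex combinations of two different points of the hull.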
Projection on a convex set

Projection of a point onto a set to which it does not belong.

Lemma (Projection): Let $C \subseteq \mathbb{R}^n$ be a nonempty, closed and convex set. Then for every $y \notin C$ there exists a unique $x' \in C$ at minimum distance from $y$.

Moreover, $x' \in C$ is the closest point to $y$ if and only if
$$(y - x')^t (x - x') \le 0 \quad \forall x \in C.$$

Definition: $x'$ is the projection of $y$ on $C$.

Geometric illustration: [figure omitted]

Proof: [omitted]
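The characterization in the Lemma can be checked numerically on a simple closed convex set. This is a minimal sketch, assuming $C$ is a box $[l_1,u_1] \times \dots \times [l_n,u_n]$, for which the projection is obtained by clipping each coordinate; the names `project_onto_box`, `projection_inequality_holds` and `BOX_POINTS` are illustrative only.

```python
import itertools


def project_onto_box(y, lower, upper):
    """Closest point to y in the box {x : lower <= x <= upper} (coordinate-wise clipping)."""
    return [min(max(yi, lo), hi) for yi, lo, hi in zip(y, lower, upper)]


def projection_inequality_holds(y, x_cand, candidates, tol=1e-12):
    """Check (y - x_cand)^t (x - x_cand) <= 0 for every x in `candidates`.

    `candidates` is only a finite sample of C, so a True result is evidence,
    not a proof, that x_cand is the projection of y.
    """
    d = [yi - xi for yi, xi in zip(y, x_cand)]
    return all(
        sum(di * (ci - xi) for di, ci, xi in zip(d, c, x_cand)) <= tol
        for c in candidates
    )


# A 5 x 5 grid of sample points in the unit box [0, 1]^2.
GRID = [k / 4 for k in range(5)]
BOX_POINTS = list(itertools.product(GRID, repeat=2))
```

For y = (2, 0.5) and the unit box, clipping gives x' = (1, 0.5), and the inequality holds at every sample point. A different point of the box, such as (0.5, 0.5), fails the inequality, which is consistent with uniqueness of the projection.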
Separation theorem and consequences

Geometrically intuitive but fundamental result.

Theorem (Separating hyperplane): Let $C \subseteq \mathbb{R}^n$ be a nonempty, closed and convex set and $y \notin C$. Then there exists a $p \in \mathbb{R}^n$ such that $p^t x < p^t y$ for every $x \in C$.

Thus there exists a hyperplane $H = \{x \in \mathbb{R}^n : p^t x = \beta\}$ with $p \ne 0$ that separates $y$ from $C$, i.e., such that
$$C \subseteq H^- = \{x \in \mathbb{R}^n : p^t x \le \beta\} \quad \text{and} \quad y \notin H^- \ (p^t y > \beta).$$

Geometric illustration: [figure omitted]

Proof: According to the Lemma, $(y - x')^t (x - x') \le 0$ $\forall x \in C$. Taking $p = (y - x') \ne 0$ and $\beta = (y - x')^t x'$ we have $p^t x \le \beta$ $\forall x \in C$, while $p^t y - \beta = (y - x')^t (y - x') = \|y - x'\|^2 > 0$ since $y \notin C$.
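The construction in the proof ($p = y - x'$, $\beta = p^t x'$) can be sketched for a concrete set. The example below assumes $C$ is the closed unit ball, whose projection is $y / \|y\|$ for $\|y\| > 1$; the function names are hypothetical.

```python
import math


def project_onto_unit_ball(y):
    """Closest point to y in the closed unit ball {x : ||x|| <= 1}."""
    norm = math.sqrt(sum(v * v for v in y))
    return list(y) if norm <= 1.0 else [v / norm for v in y]


def separating_hyperplane(y):
    """Return (p, beta) with p^t x <= beta on the unit ball and p^t y > beta.

    Follows the proof: p = y - x', beta = p^t x', where x' is the projection of y.
    """
    x_proj = project_onto_unit_ball(y)
    p = [yi - xi for yi, xi in zip(y, x_proj)]
    if all(pi == 0 for pi in p):
        raise ValueError("y belongs to the ball, so it cannot be separated")
    beta = sum(pi * xi for pi, xi in zip(p, x_proj))
    return p, beta


def unit_circle_samples(k):
    """k points on the boundary of the unit ball in R^2, used only for checking."""
    return [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k)) for i in range(k)]
```

For y = (3, 4) the projection is (0.6, 0.8), so p = (2.4, 3.2) and $\beta = 4$, while $p^t y - \beta = \|y - x'\|^2 = 16$, as in the proof.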
Three important consequences:

1) Any nonempty, closed and convex set $C \subseteq \mathbb{R}^n$ is the intersection of all closed half-spaces containing it.

Definition: Let $S \subseteq \mathbb{R}^n$ be nonempty and $\bar{x} \in \partial(S)$ (boundary w.r.t. $\mathrm{aff}(S)$). $H = \{x \in \mathbb{R}^n : p^t (x - \bar{x}) = 0\}$ is a supporting hyperplane of $S$ at $\bar{x}$ if $S \subseteq H^-$ or $S \subseteq H^+$.

2) Supporting hyperplane: If $C$ is a nonempty convex set in $\mathbb{R}^n$, then for every $\bar{x} \in \partial(C)$ there exists a supporting hyperplane $H$ at $\bar{x}$, i.e., there exists $p \ne 0$ such that $p^t (x - \bar{x}) \le 0$ for each $x \in C$.

A convex set admits at least one supporting hyperplane at each boundary point. For nonconvex sets, such a hyperplane may not exist.

Examples: cases with $1 / \infty / 0$ supporting hyperplanes at a given boundary point $\bar{x}$.

Proof sketch: [omitted]
Central result of Optimization (also of Game theory) from which we will derive the optimality conditions for Nonlinear Programming.

3) Farkas Lemma: Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then
$$\exists\, x \in \mathbb{R}^n \text{ such that } Ax = b \text{ and } x \ge 0 \iff \nexists\, y \in \mathbb{R}^m \text{ such that } y^t A \le 0^t \text{ and } y^t b > 0.$$

In this form, it provides an infeasibility certificate for a given linear system, but it is also known as the theorem of the alternative.

Alternative: exactly one of the two systems $Ax = b,\ x \ge 0$ and $y^t A \le 0^t,\ y^t b > 0$ is feasible.

Geometric interpretation: $b$ belongs to the convex cone generated by the columns $A_1, \dots, A_n$ of $A$, i.e., to
$$\mathrm{cone}(A) = \{z \in \mathbb{R}^m : z = \sum_{j=1}^n \alpha_j A_j,\ \alpha_1 \ge 0, \dots, \alpha_n \ge 0\},$$
if and only if no hyperplane separating $b$ from $\mathrm{cone}(A)$ exists.

Alternative: $b \in \mathrm{cone}(A)$ or $b \notin \mathrm{cone}(A)$ (hence there exists a hyperplane separating $b$ from $\mathrm{cone}(A)$).
Proof: [omitted]
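The two alternatives of Farkas Lemma can be illustrated by checking candidate certificates. This sketch only verifies a given $x$ or $y$; finding one in general requires solving a linear program, which is not attempted here. The helper names are hypothetical, and $A$ is stored as a list of rows.

```python
def mat_vec(A, x):
    """Compute A x for A given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def vec_mat(y, A):
    """Compute y^t A, one entry per column of A."""
    return [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def is_feasible_point(A, b, x, tol=1e-9):
    """True if Ax = b and x >= 0 (first system)."""
    residual_ok = all(abs(r - bi) <= tol for r, bi in zip(mat_vec(A, x), b))
    return residual_ok and all(xj >= -tol for xj in x)


def is_infeasibility_certificate(A, b, y, tol=1e-9):
    """True if y^t A <= 0^t and y^t b > 0 (second system)."""
    nonpositive = all(v <= tol for v in vec_mat(y, A))
    return nonpositive and sum(yi * bi for yi, bi in zip(y, b)) > tol


# Columns of A are (1, 0) and (1, 1); cone(A) is the wedge between them.
A = [[1.0, 1.0],
     [0.0, 1.0]]
```

With this $A$, $b = (2, 1)$ lies in $\mathrm{cone}(A)$ with $x = (1, 1)$, whereas $b = (0, 1)$ does not, and $y = (-1, 1)$ is a certificate: $y^t A = (-1, 0)$ and $y^t b = 1$.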
Application of Farkas Lemma: asset pricing in absence of arbitrage

Single period, $m$ assets are traded, $n$ possible states (scenarios).

Assets can be bought or sold short, i.e., with the promise to buy them back at the end.

Portfolio $y \in \mathbb{R}^m$, with $y_i$ = amount invested in asset $i$. A negative value of $y_i$ indicates a short position.

If $p_i$ is the price of a unit of asset $i$ at the beginning, the portfolio cost is $p^t y$.

Consider $A \in \mathbb{R}^{m \times n}$ with $a_{ij}$ = value of one euro invested in asset $i$ if state $j$ occurs.

At the end all assets are sold (we receive a payoff of $a_{ij} y_i$) and the short positions are covered (we pay $a_{ij} |y_i|$ and hence have a payoff of $a_{ij} y_i$).

Portfolio value: $v^t = y^t A$.

Absence of arbitrage condition: any portfolio with nonnegative values for all states must have a nonnegative cost.

Algebraically: $\nexists\, y$ such that $y^t A \ge 0^t$ and $p^t y < 0$.
According to Farkas Lemma:

No possibility of arbitrage exists ($\nexists\, y$ such that $y^t A \ge 0^t$ and $p^t y < 0$) if and only if there exists $q \in \mathbb{R}^n$ with $q \ge 0$ such that the asset price vector $p$ satisfies $Aq = p$.

If the market is efficient, some state prices $q_j$ (one for each state $j$) exist.

N.B.: in general $q$ is not unique.
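A small numerical instance makes the equivalence concrete. The sketch below assumes two assets and two states with a nonsingular payoff matrix, so that $Aq = p$ has a single solution obtainable by Cramer's rule; the data and function names are invented for illustration.

```python
from fractions import Fraction


def state_prices(A, p):
    """Solve A q = p exactly for a 2x2 payoff matrix A (rows = assets, columns = states)."""
    (a, b), (c, d) = A
    det = Fraction(a) * Fraction(d) - Fraction(b) * Fraction(c)
    if det == 0:
        raise ValueError("this sketch assumes a nonsingular payoff matrix")
    q1 = (Fraction(p[0]) * Fraction(d) - Fraction(b) * Fraction(p[1])) / det
    q2 = (Fraction(a) * Fraction(p[1]) - Fraction(c) * Fraction(p[0])) / det
    return q1, q2


def is_arbitrage(A, p, y):
    """True if y^t A >= 0^t in every state and the portfolio cost p^t y is negative."""
    values = [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
    cost = sum(pi * yi for pi, yi in zip(p, y))
    return all(v >= 0 for v in values) and cost < 0


# Hypothetical data: asset 1 pays 1 in both states, asset 2 pays 2 or 1/2.
PAYOFFS = ((1, 1), (2, Fraction(1, 2)))
```

With prices $p = (1, 1)$ the state prices are $q = (1/3, 2/3) \ge 0$, so no arbitrage exists. With $p = (1, 3)$ the only solution of $Aq = p$ has a negative component, and the portfolio $y = (2, -1)$ (short the overpriced asset 2) has values $(0, 3/2)$ and cost $-1$.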
2.2.2 Convex functions

Definitions: A function $f : C \to \mathbb{R}$ defined on a convex set $C \subseteq \mathbb{R}^n$ is convex if
$$f(\alpha x_1 + (1 - \alpha) x_2) \le \alpha f(x_1) + (1 - \alpha) f(x_2) \quad \forall x_1, x_2 \in C \text{ and } \forall \alpha \in [0,1].$$

$f$ is strictly convex if the inequality holds with $<$ for $x_1, x_2 \in C$ with $x_1 \ne x_2$ and $\alpha \in (0,1)$.

$f$ is concave if $-f$ is convex.

The epigraph of $f : S \subseteq \mathbb{R}^n \to \mathbb{R}$, denoted by $\mathrm{epi}(f)$, is the subset of $\mathbb{R}^{n+1}$
$$\mathrm{epi}(f) = \{(x, y) \in S \times \mathbb{R} : f(x) \le y\}.$$

Let $f : C \to \mathbb{R}$ be convex. The domain of $f$ is the subset of $\mathbb{R}^n$
$$\mathrm{dom}(f) = \{x \in C : f(x) < +\infty\}.$$
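The defining inequality can be tested on a finite sample of points. This is a sketch for functions of one variable with a hypothetical helper name; sampling can only refute convexity (by exhibiting a violating triple), it cannot prove it.

```python
def find_convexity_violation(f, points, alphas, tol=1e-12):
    """Search for (x1, x2, alpha) with f(a*x1 + (1-a)*x2) > a*f(x1) + (1-a)*f(x2).

    Returns the first violating triple found, or None if the sampled
    inequality holds everywhere.
    """
    for x1 in points:
        for x2 in points:
            for a in alphas:
                lhs = f(a * x1 + (1 - a) * x2)
                rhs = a * f(x1) + (1 - a) * f(x2)
                if lhs > rhs + tol:
                    return (x1, x2, a)
    return None


# Sample points in [-2, 2] and a few interior values of alpha.
POINTS = [-2 + 0.5 * k for k in range(9)]
ALPHAS = [0.25, 0.5, 0.75]
```

On these samples $f(x) = x^2$ produces no violation, while $f(x) = x^3$ does, because it is concave on the negative half-line.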
Property: Let $C \subseteq \mathbb{R}^n$ be a (nonempty) convex set and $f : C \to \mathbb{R}$ be a convex function.

For each real $\beta$ (also for $+\infty$), the level sets
$$L_\beta = \{x \in C : f(x) \le \beta\} \quad \text{and} \quad \{x \in C : f(x) < \beta\}$$
are convex subsets of $\mathbb{R}^n$.

The function $f$ is continuous in the relative interior (with respect to $\mathrm{aff}(C)$) of its domain.

The function $f$ is convex if and only if $\mathrm{epi}(f)$ is a convex subset of $\mathbb{R}^{n+1}$ (exercise 1.3).
Optimal solution of convex problems

Consider
$$\min_{x \in C \subseteq \mathbb{R}^n} f(x)$$
where $C \subseteq \mathbb{R}^n$ is a convex set and $f$ is a convex function.

Proposition: If $C \subseteq \mathbb{R}^n$ is convex and $f : C \to \mathbb{R}$ is convex, each local minimum of $f$ on $C$ is a global minimum.

If $f$ is strictly convex on $C$, then there exists at most one global minimum (the problem could be unbounded).

Proof: Suppose $x'$ is a local minimum and there exists $x^* \in C$ such that $f(x^*) < f(x')$. Since $f$ is convex,
$$f(\alpha x' + (1 - \alpha) x^*) \le \alpha f(x') + (1 - \alpha) f(x^*) < f(x') \quad \forall \alpha \in (0,1).$$
For $\alpha$ close to 1 the point $\alpha x' + (1 - \alpha) x^*$ is arbitrarily close to $x'$, which contradicts the fact that $x'$ is a local minimum.

If $f$ is strictly convex and $x_1 \ne x_2$ are two global minima, the convexity of $C$ implies $\frac{1}{2}(x_1 + x_2) \in C$ and the strict convexity of $f$ implies
$$f(\tfrac{1}{2}(x_1 + x_2)) < \tfrac{1}{2} f(x_1) + \tfrac{1}{2} f(x_2) = f(x_1).$$
Thus $x_1$ and $x_2$ cannot be two global minima.
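Characterizations of convex functions

1) Proposition: A continuously differentiable function (of class $C^1$) $f : C \to \mathbb{R}$ defined on an open and nonempty convex set $C \subseteq \mathbb{R}^n$ is convex if and only if
$$f(x) \ge f(\bar{x}) + \nabla^t f(\bar{x}) (x - \bar{x}) \quad \forall x, \bar{x} \in C.$$

$f$ is strictly convex if and only if the inequality holds with $>$ for every pair $x, \bar{x} \in C$ with $x \ne \bar{x}$.

Definition: Directional derivative of $f$:
$$\lim_{\alpha \to 0^+} \frac{f(\bar{x} + \alpha (x - \bar{x})) - f(\bar{x})}{\alpha} = \nabla^t f(\bar{x}) (x - \bar{x}).$$

Geometric interpretation: The linear approximation of $f$ at $\bar{x}$ (1st order Taylor expansion) bounds $f(x)$ from below, and
$$H = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^{n+1} : \begin{pmatrix} \nabla^t f(\bar{x}) & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = -f(\bar{x}) + \nabla^t f(\bar{x})\, \bar{x} \right\}$$
is a supporting hyperplane of $\mathrm{epi}(f)$ at $(\bar{x}, f(\bar{x}))$, with $\mathrm{epi}(f) \subseteq H^-$.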
2) Proposition: A twice continuously differentiable function (of class $C^2$) $f : C \to \mathbb{R}$ defined on an open and nonempty convex set $C \subseteq \mathbb{R}^n$ is convex if and only if the Hessian matrix $\nabla^2 f(x) = \left( \frac{\partial^2 f}{\partial x_i \partial x_j} \right)$ is positive semidefinite at every $x \in C$.

For $C^2$ functions, if $\nabla^2 f(x)$ is positive definite $\forall x \in C$ then $f$ is strictly convex.

N.B.: This condition is sufficient but not necessary: $f(x) = x^4$ is strictly convex but $f''(0) = 0$.

Definition: A symmetric $n \times n$ matrix $A$ is positive definite if $y^t A y > 0$ $\forall y \in \mathbb{R}^n$ with $y \ne 0$. A symmetric $n \times n$ matrix $A$ is positive semidefinite if $y^t A y \ge 0$ $\forall y \in \mathbb{R}^n$.

Equivalent definitions: based on the sign of the eigenvalues/principal minors of $A$ or of the diagonal coefficients of specific factorizations of $A$ (e.g., Cholesky factorization).
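The eigenvalue-based definition is easy to apply in dimension two, where the eigenvalues of a symmetric matrix have a closed form. The sketch below assumes a $2 \times 2$ symmetric matrix $\begin{pmatrix} a & b \\ b & d \end{pmatrix}$; the function names and the returned labels are illustrative.

```python
import math


def eigenvalues_sym2(a, b, d):
    """Eigenvalues (smallest first) of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius


def classify_sym2(a, b, d, tol=1e-12):
    """Classify [[a, b], [b, d]] by the sign of its smallest eigenvalue."""
    smallest, _ = eigenvalues_sym2(a, b, d)
    if smallest > tol:
        return "positive definite"
    if smallest >= -tol:
        return "positive semidefinite"
    return "not positive semidefinite"
```

For instance, $f(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2$ has the constant Hessian $\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ with eigenvalues 1 and 3, so it is strictly convex, whereas $\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ has eigenvalue $-1$ and cannot be the Hessian of a convex function.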
Subgradient of convex and concave functions

Convex/concave (continuous) functions that are not everywhere differentiable, e.g. $f(x) = |x|$.

Generalization of the concept of gradient for $C^1$ functions to piecewise $C^1$ functions.

Definitions: Let $C \subseteq \mathbb{R}^n$ be a convex set and $f : C \to \mathbb{R}$ be a convex function on $C$.

A vector $\gamma \in \mathbb{R}^n$ is a subgradient of $f$ at $\bar{x} \in C$ if
$$f(x) \ge f(\bar{x}) + \gamma^t (x - \bar{x}) \quad \forall x \in C.$$

The subdifferential, denoted by $\partial f(\bar{x})$, is the set of all the subgradients of $f$ at $\bar{x}$.

Example: For $f(x) = x^2$, at $\bar{x} = 3$ the only subgradient is $\gamma = 6$. Indeed, $0 \le (x - 3)^2 = x^2 - 6x + 9$ implies for every $x$:
$$f(x) = x^2 \ge 6x - 9 = 9 + 6(x - 3) = f(\bar{x}) + 6(x - \bar{x}).$$
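Other examples:

1) For $f(x) = |x|$ it is clear that: $\gamma = 1$ if $x > 0$, $\gamma = -1$ if $x < 0$, $\partial f(x) = [-1, 1]$ if $x = 0$.

2) Consider $f(x) = \min\{f_1(x), f_2(x)\}$ with $f_1(x) = 4 - |x|$ and $f_2(x) = 4 - (x - 2)^2$. Since $f_2(x) \ge f_1(x)$ for $1 \le x \le 4$,
$$f(x) = \begin{cases} 4 - x & 1 \le x \le 4 \\ 4 - (x - 2)^2 & \text{otherwise.} \end{cases}$$

Here $f$ is concave, so $\gamma$ satisfies the reversed inequality $f(x) \le f(\bar{x}) + \gamma (x - \bar{x})$:

$\gamma = -1$ for $x \in (1,4)$, $\gamma = -2(x - 2)$ for $x < 1$ or $x > 4$, $\gamma \in [-1, 2]$ at $x = 1$, $\gamma \in [-4, -1]$ at $x = 4$.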
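The values of $\gamma$ in these examples can be checked against the defining inequality on a grid of points. This is a minimal sketch with hypothetical helper names; the grid check can refute a candidate $\gamma$ but only gives evidence in its favor. For the concave function of example 2 the inequality is reversed.

```python
def is_subgradient(f, x_bar, gamma, grid, tol=1e-9):
    """Check f(x) >= f(x_bar) + gamma*(x - x_bar) at every grid point (convex case)."""
    return all(f(x) >= f(x_bar) + gamma * (x - x_bar) - tol for x in grid)


def is_supergradient(f, x_bar, gamma, grid, tol=1e-9):
    """Check f(x) <= f(x_bar) + gamma*(x - x_bar) at every grid point (concave case)."""
    return all(f(x) <= f(x_bar) + gamma * (x - x_bar) + tol for x in grid)


def f_min(x):
    """Example 2: the pointwise minimum of 4 - |x| and 4 - (x - 2)^2."""
    return min(4 - abs(x), 4 - (x - 2) ** 2)


# Grid on [-5, 8] with step 0.1.
GRID = [k / 10 for k in range(-50, 81)]
```

On this grid, every $\gamma \in [-1, 1]$ tried at $\bar{x} = 0$ passes for $|x|$ while $\gamma = 1.5$ fails; for $x^2$ at $\bar{x} = 3$, $\gamma = 6$ passes and $\gamma = 5$ fails; and for example 2 the endpoints of $[-1, 2]$ at $\bar{x} = 1$ and of $[-4, -1]$ at $\bar{x} = 4$ pass while values just outside fail.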
Properties:

1) A convex function $f : C \to \mathbb{R}$ admits at least one subgradient at every interior point $\bar{x}$ of $C$. In particular, if $\bar{x} \in \mathrm{int}(C)$ then there exists $\gamma \in \mathbb{R}^n$ such that
$$H = \{(x, y) \in \mathbb{R}^{n+1} : y = f(\bar{x}) + \gamma^t (x - \bar{x})\}$$
is a supporting hyperplane of $\mathrm{epi}(f)$ at $(\bar{x}, f(\bar{x}))$.

N.B.: The existence of (at least) one subgradient at every point of $\mathrm{int}(C)$, with $C$ convex, is a necessary and sufficient condition for $f$ to be convex on $\mathrm{int}(C)$.

2) If $f$ is a convex function and $\bar{x} \in \mathrm{int}(C)$, $\partial f(\bar{x})$ is a nonempty, convex, closed and bounded set.

3) $x^*$ is a (global) minimum of a convex function $f : C \to \mathbb{R}$ if and only if $0 \in \partial f(x^*)$.