
Stochastic Optimization
Lecture Notes for Part I

Optimization Master, University Paris-Saclay
Version of January 25, 2018

J. Frédéric Bonnans
Centre de Mathématiques Appliquées, Inria, Ecole Polytechnique, Palaiseau, France
Updated versions: bonnans/notes.html


These lecture notes originated in Master courses given for several years at the University of Paris VI and at the University of Paris-Saclay. They give an introduction to convex analysis, in a Banach space setting, and to its application to stochastic programming problems, i.e., to optimization problems where the decision must be taken in the presence of uncertainty, with a focus on dynamical problems. This is an active subject of research that covers many applications, especially in economics. There are already valuable classical textbooks on the subject, such as Birge and Louveaux [7] and Kall and Wallace [28]. The book [64] by Wallace and Ziemba is dedicated to applications. Some more advanced material is presented in Ruszczynski and Shapiro [53], Shapiro, Dentcheva and Ruszczynski [58], and Föllmer and Schied [22]. Let us also mention the historical review paper by Wets [65]. The basic tool for studying such problems is the combination of convex analysis with measure theory. Classical sources in convex analysis are Rockafellar [45] and Ekeland and Temam [19]. An introduction to integration and probability theory is given in Malliavin [35].

The notes have the following structure. Chapter 1 presents the basic tools of convex analysis: Fenchel conjugates, subdifferentials, Lagrangian and perturbation duality, without neglecting linear programming duality and Hoffman's bounds. Chapter 2 recalls integration theory for a general measured space, and applies it to the computation of the conjugate of an integral functional; it ends with the study of optimization problems with integral cost functions and constraints. Chapter 3 presents the concept of convergence in law and the associated asymptotics (delta theorems), and applies these results to the computation of the asymptotic law of a sampled stochastic problem (including the case of constraints in expectation). Chapter 4 presents the notions of conditional expectation and variance, and applies them to the analysis of the optimality conditions of a dynamic stochastic optimization problem with information constraints. The last chapter discusses some algorithmic questions.


Contents

Chapter 1. A convex optimization toolbox
  Convex functions; Optimization problems; Separation of convex sets; Weak duality and saddle points; Linear programming and Hoffman bounds; Conjugacy; Duality theory; Perturbation duality; Subdifferential calculus; Minimax theorems; Calmness; Specific structures, applications; Maxima of bounded functions; Linear conical optimization; Polyhedra; Infimal convolution; Recession functions and the perspective function; Duality for nonconvex problems; Convex relaxation; Applications of the Shapley-Folkman theorem; First order optimality conditions; Multiobjective optimization; Notes

Chapter 2. An integration toolbox
  Measure theory; Measurable space; Measures; Kolmogorov's extension of measures; Limits of measurable functions; Integration; L^p spaces; Bochner integrals; Integral functionals; Minimization of Carathéodory integrals; Measurable multimappings; Convex integrands; Conjugates of integral functionals; Deterministic decision in R^m; Constrained random decision; Linear programming with simple recourse; Applications of the Shapley-Folkman theorem; Integrals of multimappings; Constraints on integral terms; Examples and exercises; Notes

Chapter 3. Sampling and optimizing
  Examples and motivation; Maximum likelihood; Convergence in law and related asymptotics; Probabilities over metric spaces; Convergence in law; Central limit theorems; Delta theorem; First order delta theorem; Second order delta theorem; Solving equations; Error estimates; Empirical distribution; Minimizing over a sample; Uniform convergence of values; Asymptotic law; Expectation constraints; Large deviations; Principle of large deviations; Error estimates in stochastic programming; Notes

Chapter 4. Dynamic stochastic optimization
  Conditional expectation; Functional dependency; Construction of the conditional expectation; Conditional expectation of non integrable functions; Computation in some simple cases; Convergence theorems; Conditional variance; Compatibility with a subspace; Compatibility with measurability constraints; No recourse; Dynamic stochastic programming; Dynamic uncertainty; Abstract optimality conditions; The growing information framework; The standard full information framework; Independent noises; Elementary examples; Application to the turbining problem; Notes

Chapter 5. Algorithms
  Stochastic dual dynamic programming (SDDP); Static case: Kelley's algorithm [29]; Deterministic dual dynamic programming; Stochastic case; Introduction to linear decision rules; About the Frobenius norm; Setting; Linear programming reformulation; Linear conic reformulation; Dual bounds in a conic setting

Bibliography
Index


Chapter 1. A convex optimization toolbox

1. Convex functions

1.1. Optimization problems

1.1.1. The language of minimization problems. Denote the set of extended real numbers by R̄ := R ∪ {−∞} ∪ {+∞}. A minimization problem is of the form

(P_{f,K})   Min_x f(x);  x ∈ K,

where K is a subset of some set X, and f : X → R̄ (we say that f is extended real valued). The domain of f is

(1.1)   dom(f) := {x ∈ X; f(x) < +∞}.

We say that f is proper if its domain is not empty, and if f(x) > −∞ for all x ∈ X. The feasible set and value of (P_{f,K}) are resp.

(1.2)   F(P_{f,K}) := dom(f) ∩ K;   val(P_{f,K}) := inf{f(x); x ∈ F(P_{f,K})}.

Since the infimum over the empty set is +∞, we have that val(P_{f,K}) < +∞ iff F(P_{f,K}) ≠ ∅. The solution set of (P_{f,K}) is defined as

(1.3)   S(P_{f,K}) := {x ∈ F(P_{f,K}); f(x) = val(P_{f,K})}.

Note that S(P_{f,K}) = ∅ when F(P_{f,K}) = ∅.

A metric over X is a function d : X × X → R_+ such that d(x, y) = 0 iff x = y, that is symmetric, d(x, y) = d(y, x), and that satisfies the triangle inequality

(1.4)   d(x, z) ≤ d(x, y) + d(y, z).

We say that X is a metric space if it is endowed with a metric. In that case, we say that f is lower semicontinuous, or l.s.c., if for all x ∈ X, f(x) ≤ lim inf_k f(x_k) whenever x_k → x.

A minimizing sequence for problem (P_{f,K}) is a sequence x_k in F(P_{f,K}) such that f(x_k) → val(P_{f,K}). Such a sequence exists iff F(P_{f,K}) is nonempty. Any (infinite) subsequence of a minimizing sequence is itself a minimizing sequence. If X is a metric space, K is closed and f is l.s.c., then any limit point of a minimizing sequence is a solution of (P_{f,K}).

Example 1.1. Consider the problem of minimizing the exponential function over R. The value is finite (equal to zero), but the solution set is empty. Note that minimizing sequences have no limit point in R.

1.1.2. Operations on extended real valued functions. In the context of minimization problems, f(x) = +∞ is just a way to express that x is not feasible. Therefore the following algebraic rules for extended real valued functions are to be used: if f and g are extended real valued functions over X, then h := f + g is the extended real valued function over X defined by

(1.5)   h(x) = +∞ if max(f(x), g(x)) = +∞, and h(x) = f(x) + g(x) otherwise.

Note that there is no ambiguity in this definition (taking the usual addition rules in the presence of ±∞). The domain of the sum is the intersection of the domains.
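The following small numerical sketch (ours, not part of the original notes; it assumes only NumPy) revisits Example 1.1 above: along the minimizing sequence x_k = −k the values f(x_k) = e^{−k} tend to val(P) = 0, while the sequence itself diverges, so no minimizer exists.

import numpy as np

# Example 1.1: f(x) = exp(x), K = R. val(P) = 0 but S(P) is empty.
xk = -np.arange(1.0, 30.0)    # a minimizing sequence x_k = -k
fk = np.exp(xk)               # f(x_k) -> val(P) = 0
print(fk.min())               # about 2.5e-13, close to the value 0
# x_k -> -infinity: no limit point in R, hence no minimizer.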

Example 1.2. With a subset K of X we associate the indicatrix function I_K : X → R̄ defined by

(1.6)   I_K(x) := 0 if x ∈ K, +∞ otherwise.

Let f_K(x) := f(x) + I_K(x). Then (P_{f,K}) has the same feasible set, value and set of solutions as (P_{f_K,X}).

If f is an extended real valued function and λ > 0, we may define λf by the natural rule

(1.7)   (λf)(x) = λ f(x), for all x ∈ X.

Observe that (P_{λf,K}) has the same feasible set and set of solutions as (P_{f,K}), and the values are related by

(1.8)   val(P_{λf,K}) = λ val(P_{f,K}).

For λ = 0, we must think of 0f as the limit of λf when λ ↓ 0, and therefore set

(1.9)   (0f)(x) = f(x) if f(x) = ±∞, and (0f)(x) = 0 otherwise.

Then (P_{0f,K}) has the same feasible set as (P_{f,K}), and its set of solutions is F(P_{f,K}) if f is proper.

Example 1.3. Consider the entropy function f(x) = x log x (with the convention that 0 log 0 = 0) if x ≥ 0, and +∞ otherwise. Then 0f is the indicatrix of R_+. More generally, if f is proper, then 0f is the indicatrix of its domain.

1.1.3. Maximization problems. For a maximization problem

(D_{g,K})   Max_x g(x);  x ∈ K,

we have similar conventions, adapted to the maximization framework. In particular the domain of g is dom(g) = {x ∈ X; g(x) > −∞}. The domain of the sum is still the intersection of the domains. Note that (D_{g,K}) is essentially the same problem as

(P_{−g,K})   Min_x −g(x);  x ∈ K.

Indeed, these two problems have the same feasible set and set of solutions, and they have opposite values.

1.1.4. Convex sets and functions. Let X be a vector space. We say that K ⊂ X is convex if

(1.10)   for any x and y in K, and α ∈ (0, 1), we have αx + (1−α)y ∈ K.

We say that f : X → R̄ is convex if

(1.11)   for any x and y in dom(f), and α ∈ (0, 1), we have f(αx + (1−α)y) ≤ αf(x) + (1−α)f(y).

We see that a convex function has a convex domain.
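Returning to the extended-real conventions, the rules (1.5), (1.7) and (1.9) are easy to encode with IEEE infinities; the sketch below is ours (the helper names ext_add and scale are not from the notes) and simply mirrors those definitions.

import math

INF = math.inf

def ext_add(fx, gx):
    # Rule (1.5): +inf dominates, so (f+g)(x) = +inf as soon as
    # f(x) or g(x) equals +inf, even against -inf.
    if max(fx, gx) == INF:
        return INF
    return fx + gx

def scale(lam, fx):
    # Rules (1.7) and (1.9): for lam > 0 the usual product; for
    # lam = 0, 0f keeps the infinite values of f and is 0 elsewhere.
    if lam > 0:
        return lam * fx
    return fx if fx in (INF, -INF) else 0.0

assert ext_add(INF, -INF) == INF    # no ambiguity in (1.5)
assert scale(0.0, INF) == INF       # 0f is the indicatrix of dom(f)
assert scale(0.0, 3.5) == 0.0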

The epigraph of f : X → R̄ is the set

(1.12)   epi(f) := {(x, α) ∈ X × R; α ≥ f(x)}.

Its projection over X is dom(f). One easily checks the following:

Lemma 1.4. Let f : X → R̄. Then (i) its epigraph is convex iff f is convex; (ii) if X is a metric space, its epigraph is closed iff f is l.s.c.

Example 1.5. The epigraph of the indicatrix of K ⊂ X is K × R_+.

1.2. Separation of convex sets

We recall without proof the Hahn-Banach theorem, valid in a vector space setting, and deduce from it some results on the separation of convex sets in normed vector spaces.

1.2.1. The Hahn-Banach theorem. Let X be a vector space. We say that p : X → R is positively homogeneous and subadditive if it satisfies

(1.13)   (i) p(αx) = αp(x), for all x ∈ X and α > 0;  (ii) p(x + y) ≤ p(x) + p(y), for all x and y in X.

Remark 1.6. (a) Taking x = 0 in (1.13)(i), we obtain that p(0) = 0, and so we could as well allow α = 0 in (1.13)(i). (b) If β ∈ (0, 1), combining the above relations, we obtain that

(1.14)   p(βx + (1−β)y) ≤ βp(x) + (1−β)p(y),

i.e., p is convex. Conversely, it is easily checked that a positively homogeneous (finite valued) convex function is subadditive.

The analytical form of the Hahn-Banach theorem, a nontrivial consequence of Zorn's lemma, is as follows (see [10] for a proof):

Theorem 1.7. Let p satisfy (1.13), let X₁ be a vector subspace of X, and let λ be a linear form defined on X₁ that is dominated by p, in the sense that

(1.15)   λ(x) ≤ p(x), for all x ∈ X₁.

Then there exists a linear form µ on X, dominated by p, whose restriction to X₁ coincides with λ.

We say that a real vector space X is a normed space when endowed with a mapping X → R, x ↦ ‖x‖, satisfying the three axioms

(1.16)   ‖x‖ ≥ 0, with equality iff x = 0;  ‖αx‖ = |α| ‖x‖, for all α ∈ R, x ∈ X;  ‖x + x′‖ ≤ ‖x‖ + ‖x′‖ (triangle inequality).

Then (x, y) ↦ ‖x − y‖ is a metric over X. We denote the norm of Euclidean spaces, i.e., finite dimensional spaces endowed with the norm (Σ_i x_i²)^{1/2}, by |x|. A sequence x_k in a normed vector space X is said to be a Cauchy sequence if ‖x_p − x_q‖ → 0 when p, q → ∞. We say that X is a Banach space if every Cauchy sequence has a (necessarily unique) limit.

The topological dual X* of the normed vector space X is the set of continuous linear forms (continuous linear applications X → R) on X. In the sequel, by dual space we will mean the topological dual. We denote the duality product between x* ∈ X* and x ∈ X by ⟨x*, x⟩_X or simply ⟨x*, x⟩. Note that a linear form ℓ over X is continuous iff it is continuous at 0, which holds iff sup{ℓ(x); ‖x‖ ≤ 1} < ∞. So we may endow X* with the norm

(1.17)   ‖x*‖_* := sup{⟨x*, x⟩; ‖x‖ ≤ 1}.

It is easily checked that X* is a Banach space. The dual of R^n (space of vertical vectors) is denoted by R^{n*} (space of horizontal vectors). In the sequel we may denote the dual norm simply by ‖x*‖. If X and Y are Banach spaces, we denote by L(X, Y) the Banach space of linear continuous mappings X → Y, endowed with the norm ‖A‖ := sup{‖Ax‖; ‖x‖ ≤ 1}. We denote by B_X (resp. B̄_X) the open (resp. closed) unit ball of X. If x₁* is a continuous linear form on a linear subspace X₁ of X, its norm is defined accordingly:

(1.18)   ‖x₁*‖_{1,*} := sup{⟨x₁*, x⟩; x ∈ X₁, ‖x‖ ≤ 1}.

Here are some other corollaries of the Hahn-Banach theorem.

Corollary 1.8. Let x₁* be a continuous linear form on a linear subspace X₁ of the normed space X. Then there exists x* ∈ X*, whose restriction to X₁ coincides with x₁*, and such that

(1.19)   ‖x*‖_* = ‖x₁*‖_{1,*}.

Proof. Apply theorem 1.7 with p(x) := ‖x₁*‖_{1,*} ‖x‖. Since ⟨x*, ±x⟩ ≤ p(x), we have that ‖x‖ ≤ 1 implies |⟨x*, x⟩| ≤ ‖x₁*‖_{1,*}. The result follows.

Corollary 1.9. Let x₀ ≠ 0 belong to the normed vector space X. Then there exists x* ∈ X* such that ‖x*‖_* = 1 and ⟨x*, x₀⟩ = ‖x₀‖.

Proof. Apply corollary 1.8 with X₁ = Rx₀ and x₁*(t x₀) = t ‖x₀‖, for t ∈ R.

The orthogonal of E ⊂ X is the closed subspace of X* defined by

(1.20)   E^⊥ := {x* ∈ X*; ⟨x*, x⟩ = 0, for all x ∈ E}.

Lemma 1.10. Let E be a subspace of X. Then E^⊥ = {0} iff E is dense.

Proof. a) If E is dense, given x* ∈ E^⊥ and x ∈ X, there exists a sequence x_k in E with x_k → x, and hence ⟨x*, x⟩ = lim_k ⟨x*, x_k⟩ = 0, proving that x* = 0.

b) If E is not dense, let x₀ ∉ Ē (the closure of E). We may assume that ‖x₀‖ = 1 and that B(x₀, ε) ∩ E = ∅ for some ε > 0. Let E₀ := E ⊕ Rx₀ denote the space spanned by E and x₀. Consider the linear form λ on E₀ defined by

(1.21)   λ(e + αx₀) = α, for all e ∈ E and α ∈ R.

Since any x ∈ E₀ has a unique decomposition as x = e + αx₀ with e ∈ E and α ∈ R, the linear form is well defined. Let such an x satisfy α ≠ 0. Since e′ := −e/α belongs to E, and hence not to B(x₀, ε), we have that ‖x‖ = |α| ‖x₀ − e′‖ ≥ ε|α|, and hence |λ(x)| = |α| ≤ ‖x‖/ε. If α = 0 we still have |λ(x)| ≤ ‖x‖/ε. By corollary 1.8, λ has an extension to a continuous linear form on X, which is a nonzero element of E^⊥.
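As a hedged numerical sanity check of (1.17) (ours, assuming X = R^4 with the ℓ¹ norm, so that the dual norm should be the ℓ∞ norm): random feasible points only give lower bounds on the supremum, while the supremum is attained at a signed basis vector.

import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=4)   # a linear form on R^4, identified with a vector

# Dual norm (1.17) of xs for the l1 norm: random points of the l1
# unit sphere give lower bounds on the supremum ...
pts = rng.normal(size=(200000, 4))
pts /= np.abs(pts).sum(axis=1, keepdims=True)
lower = np.max(pts @ xs)
# ... while the supremum is attained at a signed basis vector,
# which yields the l-infinity norm of xs.
exact = np.max(np.abs(xs))
print(lower <= exact, exact)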

1.2.2. Bidual space, reflexivity. Given x ∈ X, the mapping ℓ_x : X* → R, x* ↦ ⟨x*, x⟩, is by (1.17) linear and continuous. Since ⟨x*, x⟩ ≤ ‖x*‖_* ‖x‖, its norm ‖ℓ_x‖ (in the bidual space X** := (X*)*) is not greater than ‖x‖ and, as a consequence of corollary 1.9, is equal to ‖x‖: the mapping x ↦ ℓ_x is isometric. This allows us to identify X with a closed subspace of X**. We say that X is reflexive if X = X**. Hilbert spaces are reflexive, see [10].

1.2.3. Separation theorems. We assume here that X is a normed vector space. A (topological) hyperplane of X is a set of the form

(1.22)   H_{x*,α} := {x ∈ X; ⟨x*, x⟩ = α}, for some (x*, α) ∈ X* × R, x* ≠ 0.

We call (closed) half-space of X a set of the form

(1.23)   {x ∈ X; ⟨x*, x⟩ ≤ α}, where x* ≠ 0.

Definition 1.11. Let A and B be two subsets of X. We say that the hyperplane H_{x*,α} separates A and B if

(1.24)   ⟨x*, a⟩ ≤ α ≤ ⟨x*, b⟩, for all (a, b) ∈ A × B.

We speak of a strict separation if

(1.25)   ⟨x*, a⟩ < ⟨x*, b⟩, for all (a, b) ∈ A × B,

and of a strong separation if, for some ε > 0,

(1.26)   ⟨x*, a⟩ + ε ≤ α ≤ ⟨x*, b⟩ − ε, for all (a, b) ∈ A × B.

We say that x* ∈ X* (nonzero) separates A and B if (1.24) holds for some α, strictly separates A and B if (1.25) holds, and strongly separates A and B if (1.26) holds for some ε > 0 and α. If A is a singleton {a}, then we say that x* separates a and B, etc.

Given two subsets A and B of a vector space X, we define their Minkowski sum and difference as

(1.27)   A + B = {a + b; a ∈ A, b ∈ B},  A − B = {a − b; a ∈ A, b ∈ B}.

The first geometric form of the Hahn-Banach theorem is as follows:

Theorem 1.12. Let A and B be two nonempty subsets of the normed vector space X, with empty intersection. If A − B is convex and has a nonempty interior, then there exists a hyperplane H_{x*,α} separating A and B, such that

(1.28)   ⟨x*, a⟩ < ⟨x*, b⟩, whenever (a, b) ∈ A × B and a − b ∈ int(A − B).

Note that A − B has a nonempty interior whenever either A or B has a nonempty interior. The proof needs the following concept.

Definition 1.13. Let C be a convex subset of X whose interior contains 0. The gauge function of C is

(1.29)   g_C(x) := inf{β > 0; β⁻¹x ∈ C}.

Example 1.14. If C is the closed unit ball of X, then g_C(x) = ‖x‖ for all x ∈ X.
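The gauge (1.29) can be evaluated numerically by bisection when C is given by a membership oracle, using that, for a convex C containing a ball around 0, membership of β⁻¹x in C is monotone in β. The sketch below is ours, not code from the notes; it checks Example 1.14.

import numpy as np

def gauge(member, x, hi=1e6, tol=1e-10):
    # (1.29): inf{beta > 0 : x/beta in C}. If x/beta is in C and
    # beta' > beta, then x/beta' is in C too (convexity, 0 in C),
    # so the feasible betas form a half line and bisection applies.
    lo = 0.0
    while hi - lo > tol:
        beta = 0.5 * (lo + hi)
        if member(x / beta):
            hi = beta
        else:
            lo = beta
    return hi

ball = lambda z: np.linalg.norm(z) <= 1.0   # closed unit ball of R^2
x = np.array([3.0, 4.0])
print(gauge(ball, x))   # about 5.0, i.e. |x| (Example 1.14)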

A gauge function is obviously positively homogeneous and finite. If B(0, ε) ⊂ C for some ε > 0, then

(1.30)   g_C(x) ≤ ‖x‖/ε, for all x ∈ X,

so it is bounded over bounded sets. In addition, for any β > g_C(x) and γ > 0, since x ∈ βC and B(0, γε) ⊂ γC, we get x + B(0, γε) ⊂ (β + γ)C, so that g_C(y) ≤ g_C(x) + γ for all y ∈ B(x, γε). We have proved that

(1.31)   if B(0, ε) ⊂ C, then g_C is Lipschitz with constant 1/ε.

It easily follows that

(1.32)   {x ∈ X; g_C(x) < 1} = int(C) ⊂ C ⊂ {x ∈ X; g_C(x) ≤ 1}.

Lemma 1.15. A gauge is subadditive and convex.

Proof. Let x and y belong to X. For all β_x > g_C(x) and β_y > g_C(y), we have that (β_x)⁻¹x ∈ C and (β_y)⁻¹y ∈ C, so that

(1.33)   (x + y)/(β_x + β_y) = [β_x/(β_x + β_y)] (β_x)⁻¹x + [β_y/(β_x + β_y)] (β_y)⁻¹y ∈ C.

Therefore g_C(x + y) ≤ β_x + β_y. Since this holds for any β_x > g_C(x) and β_y > g_C(y), we obtain that g_C is subadditive. Since g_C is positively homogeneous, it easily follows that g_C is convex.

Proof of theorem 1.12. Let x₀ ∈ int(B − A); since A ∩ B = ∅, x₀ ≠ 0. Set

(1.34)   C := {a − b + x₀; b ∈ B, a ∈ A}.

We easily check that 0 ∈ int(C). Obviously, x* separates A and B iff it separates C and {x₀}. Let λ be the linear form defined on X₁ := Rx₀ by λ(t x₀) = t, for t ∈ R. Since A ∩ B = ∅, x₀ ∉ C, and hence g_C(x₀) ≥ 1. It easily follows that λ is dominated by g_C on X₁. By theorem 1.7, there exists a (possibly not continuous) linear form x* on X, dominated by g_C, whose restriction to X₁ coincides with λ. Since 0 ∈ int(C), (1.30) holds for some ε > 0, and hence, being dominated by g_C, x* is continuous. It follows that ⟨x*, x⟩ ≤ 1 for all x ∈ C, or equivalently

(1.35)   ⟨x*, a⟩ ≤ ⟨x*, b⟩ + 1 − ⟨x*, x₀⟩, for all (a, b) ∈ A × B,

whereas ⟨x*, x₀⟩ = 1. Therefore x* separates A and B. In addition, if a − b ∈ int(A − B), say B(a − b, ε′) ⊂ A − B for some ε′ > 0, then ⟨x*, a − b + e⟩ ≤ 0 whenever ‖e‖ ≤ ε′; maximizing over e ∈ B(0, ε′) we obtain that ⟨x*, a − b⟩ ≤ −ε′‖x*‖_* < 0. Relation (1.28) follows.

Corollary 1.16. Let E be a closed convex subset of the normed space X. Then, for any x₀ ∉ E, there exists a hyperplane that strongly separates x₀ and E.

Proof. For ε > 0 small enough, the open convex set A := B(x₀, ε) has empty intersection with E. By theorem 1.12, there exists x* ≠ 0 separating A and E, that is,

(1.36)   ⟨x*, x₀⟩ + ε‖x*‖_* = sup{⟨x*, a⟩; a ∈ A} ≤ inf{⟨x*, b⟩; b ∈ E}.

The conclusion follows.

Remark 1.17. Corollary 1.16 can be reformulated as follows: any closed convex subset of a normed space is the intersection of the half-spaces containing it.

The following example shows that, even in a Hilbert space, one cannot in general separate two convex sets with empty intersection.

Example 1.18. Let X = ℓ² be the space of real sequences whose sum of squares is finite. Let C be the subset of X of sequences with finitely many nonzero coefficients, the last one being positive. Then C is a convex cone that does not contain 0. Suppose that x* separates 0 and C. We can identify the Hilbert space X with its dual, and therefore x* with an element of X, so that ⟨x*, x⟩ ≥ 0 for all x ∈ C. Since each element e_i of the natural basis belongs to the cone C, we must have x*_i ≥ 0 for all i, and x*_j > 0 for some j. For any ε > 0 small enough, x := −e_j + ε e_{j+1} belongs to C, but ⟨x*, x⟩ = −x*_j + ε x*_{j+1} < 0. This shows that 0 and C cannot be separated.

1.2.4. Relative interior. Again let X be a normed vector space, and let E be a convex subset of X. We denote by affhull(E) the intersection of the affine spaces containing E; its closure, denoted cl affhull(E), is the smallest closed affine space containing E. The relative interior of E, denoted by rint(E), is the interior of E viewed as a subset of cl affhull(E).

Proposition 1.19. Let A and B be two nonempty subsets of X, with empty intersection. If A − B is convex and has a nonempty relative interior, then there exists a hyperplane H_{x*,α} separating A and B, such that

(1.37)   ⟨x*, a⟩ < ⟨x*, b⟩, whenever (a, b) ∈ A × B and a − b ∈ rint(A − B).

Proof. Set E := B − A and Y := cl affhull(E). By theorem 1.12, there exists y* ∈ Y* separating 0 and E, with strict inequality on rint(E). By theorem 1.7, there exists x* ∈ X* whose restriction to Y is y*, and the conclusion holds with x*.

Remark 1.20. Applying the previous proposition with B = {b} a singleton, and noting that rint(A − b) = rint(A) − b, we obtain that when A is convex and b ∉ rint(A), there exists x* ∈ X* such that

(1.38)   ⟨x*, a⟩ < ⟨x*, b⟩, whenever a ∈ rint(A).

Since any convex subset of a finite dimensional space, not reduced to a singleton, has a nonempty relative interior¹, we deduce the following:

Corollary 1.21. Let A and B be two convex and nonempty subsets of a Euclidean space, with empty intersection. Then there exists a hyperplane H_{x*,α} separating A and B, such that (1.37) holds.

¹ Except possibly when the set is a singleton, where the dimension is zero and this is a matter of definition. However, the case when A − B reduces to a singleton means that both A and B are singletons, and then it is easy to separate them.

1.3. Weak duality and saddle points

Let X and Y be two sets and let L : X × Y → R̄. Then we have the weak duality inequality

(1.39)   sup_{y∈Y} inf_{x∈X} L(x, y) ≤ inf_{x∈X} sup_{y∈Y} L(x, y).

Indeed, let (x₀, y₀) ∈ X × Y. Then

(1.40)   inf_{x∈X} L(x, y₀) ≤ L(x₀, y₀) ≤ sup_{y∈Y} L(x₀, y).

Removing the middle term, maximizing the left side w.r.t. y₀ and minimizing the right side w.r.t. x₀, we obtain (1.39). We next define the primal and dual values, resp., for x ∈ X and y ∈ Y, by

(1.41)   p(x) := sup_{y∈Y} L(x, y);  d(y) := inf_{x∈X} L(x, y),

and the primal and dual problems by

(P)   Min_{x∈X} p(x);   (D)   Max_{y∈Y} d(y).

The weak duality inequality says that val(D) ≤ val(P). We say that (x̄, ȳ) ∈ X × Y is a saddle point of L over X × Y if

(1.42)   L(x̄, y) ≤ L(x̄, ȳ) ≤ L(x, ȳ), for all (x, y) ∈ X × Y.

An equivalent relation is

(1.43)   sup_{y∈Y} L(x̄, y) = L(x̄, ȳ) = inf_{x∈X} L(x, ȳ).

Minorizing the left-hand term by taking the infimum over x ∈ X of the supremum, and majorizing the right-hand term symmetrically, we obtain

(1.44)   inf_{x∈X} sup_{y∈Y} L(x, y) ≤ L(x̄, ȳ) ≤ sup_{y∈Y} inf_{x∈X} L(x, y),

which, combined with the weak duality inequality, shows that x̄ ∈ S(P), ȳ ∈ S(D) and

(1.45)   val(D) = val(P) = L(x̄, ȳ).

But we have more; in fact, if we denote by SP(L) the set of saddle points, then:

Lemma 1.22. The following holds:

(1.46)   {val(D) = val(P) is finite} ⟹ SP(L) = S(P) × S(D).

Proof. Indeed, let x̄ ∈ S(P) and ȳ ∈ S(D). Then

(1.47)   val(D) = inf_{x∈X} L(x, ȳ) ≤ L(x̄, ȳ) ≤ sup_{y∈Y} L(x̄, y) = val(P).

If val(D) = val(P), then these inequalities are equalities, so that (1.43) holds, and therefore (x̄, ȳ) is a saddle point. The converse implication was already obtained.
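On finite grids, the inequality (1.39) can be checked mechanically; the sketch below is ours, with a randomly generated table of values of L.

import numpy as np

rng = np.random.default_rng(1)
L = rng.normal(size=(50, 40))       # L[i, j] = L(x_i, y_j) on finite grids

dual = np.max(np.min(L, axis=0))    # sup_y inf_x L(x, y)
primal = np.min(np.max(L, axis=1))  # inf_x sup_y L(x, y)
print(dual <= primal)               # weak duality (1.39): always True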

1.4. Linear programming and Hoffman bounds

1.4.1. Linear programming. We assume in this section that X is a vector space (with no associated topology; this abstract setting does not make the proofs more complicated). Consider the linear program

(LP)   Min_{x∈X} ⟨c, x⟩;  ⟨a_i, x⟩ ≤ b_i, i = 1,…,p,

where c and the a_i, i = 1,…,p, are linear forms over X, b ∈ R^p, and ⟨·,·⟩ denotes the action of a linear form over X. The associated Lagrangian function L : X × R^p → R is defined as

(1.48)   L(x, λ) := ⟨c, x⟩ + Σ_{i=1}^p λ_i (⟨a_i, x⟩ − b_i),

where the multiplier λ has to belong to R^p_+. The primal value satisfies

(1.49)   p(x) = sup_{λ∈R^p_+} L(x, λ) = ⟨c, x⟩ if x ∈ F(LP), +∞ otherwise.

Therefore (LP) and the primal problem (of minimizing p(x)) have the same value and set of solutions. Since L(x, λ) = ⟨c + Σ_{i=1}^p λ_i a_i, x⟩ − λb, we have that

(1.50)   d(λ) = inf_{x∈X} L(x, λ) = −λb if c + Σ_{i=1}^p λ_i a_i = 0, −∞ otherwise.

The dual problem has therefore the same value and set of solutions as the following problem, called the dual of (LP):

(LD)   Max_{λ∈R^p_+} −λb;  c + Σ_{i=1}^p λ_i a_i = 0.

For x ∈ F(LP), we denote the associated set of active constraints by

(1.51)   I(x) := {1 ≤ i ≤ p; ⟨a_i, x⟩ = b_i}.

Consider the optimality system

(1.52)   (i) c + Σ_{i=1}^p λ_i a_i = 0;  (ii) λ_i ≥ 0, ⟨a_i, x⟩ ≤ b_i, λ_i (⟨a_i, x⟩ − b_i) = 0, i = 1,…,p.

Lemma 1.23. The pair (x, λ) ∈ F(LP) × F(LD) is a saddle point of the Lagrangian iff (1.52) holds.

Proof. Let x ∈ F(LP) and λ ∈ F(LD). Then (1.52)(i) holds, implying that the difference of the cost functions is equal to

(1.53)   ⟨c, x⟩ + λb = Σ_{i=1}^p λ_i (b_i − ⟨a_i, x⟩).

This sum of nonnegative terms is equal to zero iff the last relation in (1.52)(ii) holds, and then x ∈ S(LP) and λ ∈ S(LD), proving that (x, λ) is a saddle point. The converse implication is easily obtained.
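The following sketch is ours; it assumes SciPy's linprog (which minimizes subject to A_ub x ≤ b_ub). It solves a small instance of (LP) together with (LD) and checks equality of the optimal values (the strong duality result proved below) and the complementarity conditions in (1.52)(ii).

import numpy as np
from scipy.optimize import linprog

# Primal (LP): min <c, x> s.t. <a_i, x> <= b_i, x free.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 2.0]])
b = np.array([0.0, 0.0, 4.0])
c = np.array([-1.0, -1.0])

p = linprog(c, A_ub=A, b_ub=b, bounds=(None, None))

# Dual (LD): max -<lambda, b> s.t. c + A^T lambda = 0, lambda >= 0,
# solved here as a minimization of <lambda, b>.
d = linprog(b, A_eq=A.T, b_eq=-c, bounds=(0, None))

print(p.fun, -d.fun)    # equal optimal values (both -4 here)
lam = d.x
print(np.allclose(lam * (A @ p.x - b), 0.0))   # complementarity (1.52)(ii)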

We next deal with the existence of solutions.

Lemma 1.24. If (LP) (resp. (LD)) has a finite value, then its set of solutions is nonempty.

Proof. The proofs of the two cases being similar, it suffices to prove the first statement. Since (LP) has a finite value, there exists a minimizing sequence x_k. Extracting if necessary a subsequence, we may assume that I(x_k) is constant, say equal to J. Among such minimizing sequences we may assume that J is of maximal cardinality. If ⟨c, x_k⟩ has, for large enough k, a constant value, then the corresponding x_k are solutions of (LP). Otherwise, extracting if necessary a subsequence, we may assume that ⟨c, x_{k+1}⟩ < ⟨c, x_k⟩ for all k. Set d_k := x_{k+1} − x_k, and consider the set E_k := {ρ ≥ 0; x_k + ρd_k ∈ F(LP)}. Since ⟨c, d_k⟩ < 0 and val(LP) > −∞, this set is bounded. Its maximal element is

(1.54)   ρ_k := min{ (b_i − ⟨a_i, x_k⟩)/⟨a_i, d_k⟩;  ⟨a_i, d_k⟩ > 0 }.

We have that y_k := x_k + ρ_k d_k ∈ F(LP), ⟨c, y_k⟩ < ⟨c, x_k⟩, and J ⊂ I(y_k) strictly. Extracting from the y_k a minimizing sequence with constant set of active constraints strictly containing J, we contradict the definition of J.

Lemma 1.25. Given y¹,…,y^q in R^n, the set E := {Σ_{i=1}^q λ_i yⁱ; λ ≥ 0} is closed.

Proof. (i) Assume first that the yⁱ are linearly independent, and denote by Y the vector space they generate, of which they are a basis. The coefficients λ_i represent the coordinates in this basis, and so, if e_k → e in E, its coordinates converge to those of e, and the result follows. (ii) We next deal with the general case. Let e_k → e in E. Let us associate with each e_k some λ^k ∈ R^q_+ such that e_k = Σ_{i=1}^q λ_i^k yⁱ, and such that λ^k has minimal support (the support being the set of nonzero components). Taking if necessary a subsequence, we may assume that this support J is constant along the sequence. We claim that {yⁱ, i ∈ J} is linearly independent: for if Σ_{i=1}^q µ_i yⁱ = 0 with µ ≠ 0 and µ_i = 0 when λ_i^k = 0, we can find β_k so that µ^k := λ^k + β_k µ is nonnegative and has support smaller than J, with e_k = Σ_{i=1}^q µ_i^k yⁱ, in contradiction with the definition of J. Since {yⁱ, i ∈ J} is linearly independent, we deduce as in step (i) that the λ^k converge to some λ̄ and that e = Σ_{i=1}^q λ̄_i yⁱ. The conclusion follows.

We next prove a strong duality result.

Lemma 1.26. If val(LP) is finite, then val(LP) = val(LD) and both S(LP) and S(LD) are nonempty.

Proof. By lemma 1.24, (LP) has a solution x̄. Set J := I(x̄) and j := |J| (the cardinality of J); we may assume that J = {1,…,j}. Consider the set

(1.55)   C := {d ∈ X; ⟨a_i, d⟩ ≤ 0, i ∈ J}.

Obviously, if d ∈ C, then x̄ + ρd ∈ F(LP) for small enough ρ > 0, and consequently

(1.56)   ⟨c, d⟩ ≥ 0, for all d ∈ C.

Consider the mapping Ax := (⟨a₁, x⟩,…,⟨a_j, x⟩, ⟨c, x⟩) over X, with image E₁ ⊂ R^{j+1}. We claim that the point z := (0,…,0,−1) does not belong to the set E₂ := E₁ + R^{j+1}_+. Indeed, otherwise we would have Ax ≤ z for some x ∈ X, and so ⟨a_i, x⟩ ≤ 0 for all i ∈ J, whereas ⟨c, x⟩ ≤ z_{j+1} = −1 contradicts (1.56). Let z¹,…,z^q be a basis of the vector space E₁. Then E₂ is the set of nonnegative linear combinations of {±z¹,…,±z^q, e¹,…,e^{j+1}}, where we denote by eⁱ the elements of the natural basis of R^{j+1}. By lemma 1.25, E₂ is closed. Corollary 1.16 then allows us to strongly separate z and E₂, that is,

(1.57)   −λ_{j+1} = λz < inf_{y∈E₂} λy, for some nonzero λ ∈ R^{j+1}.

Since E₂ is a cone, the above infimum is zero, whence λ_{j+1} > 0. Dividing λ by λ_{j+1} if necessary, we may assume that λ_{j+1} = 1. Since E₂ = E₁ + R^{j+1}_+, it follows that λ ∈ R^{j+1}_+, and since E₁ ⊂ E₂ is a vector space we deduce that 0 ≤ ⟨Σ_{i∈J} λ_i a_i + c, d⟩ for all d ∈ X, meaning that Σ_{i∈J} λ_i a_i + c = 0. Let us now set λ_i = 0 for i ≤ p, i ∉ J. Then (1.52) holds at the point x̄. By lemma 1.23, (x̄, λ) is a saddle point, and the conclusion follows.

Remark 1.27. It may happen, even in a finite dimensional setting, that both the primal and the dual problem are infeasible, so that they have values +∞ and −∞ resp.; consider for instance the problem Min_{x∈R} {−x; 0·x = 1; −x ≤ 0}, whose dual is Max_{λ∈R²} {−λ₁; 1 + λ₂ = 0; λ₂ ≥ 0}.

1.4.2. Hoffman bounds. As an application of linear programming duality we present Hoffman's lemma [27].

Lemma 1.28. Given a Banach space X, a₁,…,a_p in X*, and b ∈ R^p, set

(1.58)   C_b := {x ∈ X; ⟨a_i, x⟩ ≤ b_i, i = 1,…,p}.

Then there exists a Hoffman constant M > 0, not depending on b, such that, if C_b ≠ ∅, then

(1.59)   dist(x, C_b) ≤ M Σ_{i=1}^p (⟨a_i, x⟩ − b_i)_+, for all x ∈ X.

Proof. a) Define A ∈ L(X, R^p) by Ax := (⟨a₁, x⟩,…,⟨a_p, x⟩). Let x¹,…,x^q be elements of X such that (Ax¹,…,Ax^q) is a basis of Im(A). Then q ≤ p, and the family {x¹,…,x^q} is linearly independent. Denote by H the vector space with basis x¹,…,x^q. The Euclidean norm over H is equivalent to the one induced by X, since all norms are equivalent on finite dimensional spaces.

b) Let x ∈ X. We may express the coordinates of Ax in the basis of the {Ax^j} as functions of x, i.e., write Ax = Σ_{j=1}^q α_j(x) Ax^j for some linear function α : X → R^q, which is continuous. Indeed, by the equivalence of norms in a finite dimensional space, for some positive c not depending on x:

(1.60)   |α(x)| ≤ c ‖Ax‖ ≤ c ‖A‖ ‖x‖.

Setting Cx := Σ_{j=1}^q α_j(x) x^j and Bx := x − Cx, we may write x = Bx + Cx, and

(1.61)   ABx = Ax − Σ_{j=1}^q α_j(x) Ax^j = 0,

i.e., AB = 0. So, C_b (assumed to be nonempty) is invariant under the addition of an element of the image of B. In particular it contains an element x̂ such that x̂ − x ∈ H, that is,

(1.62)   x̂ = x + Σ_{j=1}^q β_j x^j.

Specifically, let x̂ be such an element of C_b for which |β| is minimal, that is, β is a solution of the problem

(1.63)   Min_{γ∈R^q} ½|γ|²;  ⟨a_i, x + Σ_{j=1}^q γ_j x^j⟩ ≤ b_i, i = 1,…,p.

The following optimality conditions hold: there exists λ ∈ R^p_+ such that

(1.64)   γ_j + Σ_{i=1}^p λ_i ⟨a_i, x^j⟩ = 0, j = 1,…,q,

as well as

(1.65)   λ_i (⟨a_i, x + Σ_{j=1}^q γ_j x^j⟩ − b_i) = 0, i = 1,…,p.

Using first (1.64) and then (1.65), we obtain

(1.66)   |γ|² = Σ_{j=1}^q γ_j² = −Σ_{i,j} λ_i ⟨a_i, γ_j x^j⟩ = Σ_{i=1}^p λ_i (⟨a_i, x⟩ − b_i) ≤ |λ|_∞ Σ_{i=1}^p (⟨a_i, x⟩ − b_i)_+.

c) Among all possible multipliers λ we may take one with minimal support. From (1.64) we deduce the existence of M₁ > 0, not depending on x and b, such that

(1.67)   |λ| ≤ M₁ |γ|.

Combining with (1.66), we deduce that

(1.68)   |γ| ≤ M₁ Σ_{i=1}^p (⟨a_i, x⟩ − b_i)_+.

The conclusion follows since, as noticed before, the Euclidean norm on H is equivalent to the one induced by the norm of X.
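As a numerical illustration of (1.59) (ours, not from the notes; the constant M = 5 below is a guess that happens to dominate the true Hoffman constant of this small system), one can compare the distance to C_b, computed by projection, with the aggregated constraint residual.

import numpy as np
from scipy.optimize import minimize

# C_b = {x : <a_i, x> <= b_i} as in (1.58); a truncated box in R^2.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.5])

def dist(x):
    # Euclidean projection onto C_b, computed numerically (SLSQP).
    res = minimize(lambda z: np.sum((z - x) ** 2), x,
                   constraints=[{"type": "ineq", "fun": lambda z: b - A @ z}])
    return np.sqrt(max(res.fun, 0.0))

rng = np.random.default_rng(2)
for x in rng.normal(scale=3.0, size=(5, 2)):
    residual = np.sum(np.maximum(A @ x - b, 0.0))
    print(dist(x) <= 5.0 * residual + 1e-6)   # (1.59) with guessed M = 5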

1.4.3. Open mapping theorem. We will generalize the previous result, and for this we need the following fundamental result of functional analysis.

Theorem 1.29 (Open mapping theorem). Let X and Y be Banach spaces, and let A ∈ L(X, Y) be surjective. Then αB_Y ⊂ AB_X, for some α > 0.

Proof. See e.g. [10].

Corollary 1.30. Let A and α be as in theorem 1.29. Then Im(A*) is closed, and

(1.69)   ‖A*λ‖ ≥ α‖λ‖, for all λ ∈ Y*.

Proof. By the open mapping theorem, we have that

(1.70)   ‖A*λ‖ = sup_{‖x‖≤1} ⟨λ, Ax⟩_Y ≥ α sup_{‖y‖≤1} ⟨λ, y⟩_Y = α‖λ‖,

proving (1.69). Let us now check that Im(A*) is closed. Let x*_k in Im(A*) converge to x*. There exists a sequence λ_k ∈ Y* such that x*_k = A*λ_k. In view of (1.69), λ_k is a Cauchy sequence, and hence has a limit λ ∈ Y*. Therefore x* = A*λ ∈ Im(A*). The conclusion follows.

Proposition 1.31. Let X and Y be Banach spaces, and A ∈ L(X, Y). Then Im(A*) ⊂ (Ker A)^⊥, with equality if A has closed range.

Proof. (a) Let x* ∈ Im(A*), i.e., x* = A*y* for some y* ∈ Y*, and let x ∈ Ker A. Then ⟨x*, x⟩_X = ⟨y*, Ax⟩_Y = 0. Therefore Im(A*) ⊂ (Ker A)^⊥.

(b) Assume now that A has closed range, and let x* ∈ (Ker A)^⊥. For y ∈ Im(A), set

(1.71)   v(y) := ⟨x*, x⟩, where x ∈ X satisfies Ax = y.

Since x* ∈ (Ker A)^⊥, any x such that Ax = y gives the same value of ⟨x*, x⟩_X, and therefore v(y) is well defined. It is easily checked that v is a linear form. By the open mapping theorem, applied to the restriction of A from X to its image (the latter being a Banach space by hypothesis), there exists x with ‖x‖ ≤ α⁻¹‖y‖_Y such that Ax = y, so that |v(y)| ≤ α⁻¹‖x*‖ ‖y‖_Y. So v is linear and continuous, and extending it to Y by corollary 1.8, there exists y* ∈ Y* such that v(y) = ⟨y*, y⟩_Y. For all x ∈ X, we have therefore ⟨x*, x⟩_X = ⟨y*, Ax⟩_Y = ⟨A*y*, x⟩_X, so that x* = A*y*, as was to be proved.

Remark 1.32. Another proof, based on duality theory, is given in a later example.

Example 1.33. Let X := L²(0,1), Y := L¹(0,1), and let A ∈ L(X, Y) be the injection of X into Y. Then Ker A is reduced to {0}, and therefore its orthogonal is the whole of X*. On the other hand, we have that for y* ∈ L^∞(0,1) = Y*, A*y* is the linear form on X defined by x ↦ ∫₀¹ y*(t)x(t)dt. So the image of A* is a dense subspace of X*, but A* is not surjective, and therefore its image is not closed.

We next give a useful generalization of Hoffman's lemma 1.28 in the homogeneous case.

Lemma 1.34. Given Banach spaces X and Y, A ∈ L(X, Y) surjective, and a₁,…,a_p in X*, set C := {x ∈ X; Ax = 0; ⟨a_i, x⟩ ≤ 0, i = 1,…,p}. Then there exists a Hoffman constant M > 0 such that

(1.72)   dist(x, C) ≤ M (‖Ax‖ + Σ_{i=1}^p (⟨a_i, x⟩)_+), for all x ∈ X.

Proof. By the open mapping theorem 1.29, there exists x̄ ∈ Ker A such that ‖x − x̄‖ ≤ α⁻¹‖Ax‖, where α is given by theorem 1.29. Therefore, for some M′ > 0 not depending on x:

(1.73)   (⟨a_i, x̄⟩)_+ ≤ (⟨a_i, x⟩)_+ + M′‖Ax‖.

Applying lemma 1.28 to x̄, with Ker A in place of X, we obtain the desired conclusion.

1.5. Conjugacy

1.5.1. Basic properties. Let X be a Banach space and f : X → R̄. Its (Legendre-Fenchel) conjugate is the function f* : X* → R̄ defined by

(1.74)   f*(x*) := sup_{x∈X} ⟨x*, x⟩ − f(x).

This can be motivated as follows. Let us look for an affine minorant of f of the form x ↦ ⟨x*, x⟩ − β. For given x*, the best (i.e., minimal) value of β is precisely f*(x*). Being a supremum of affine functions, f* is l.s.c. convex. We obviously have the Fenchel-Young inequality

(1.75)   f*(x*) ≥ ⟨x*, x⟩ − f(x), for all x ∈ X and x* ∈ X*.

Remark 1.35. We have that

(1.76)   f*(x*) = −∞ for each x* ∈ X*, if dom(f) = ∅;
(1.77)   f*(x*) > −∞ for each x* ∈ X*, if dom(f) ≠ ∅.

Since the supremum over an empty set is −∞, we may always express f* by maximizing over dom(f):

(1.78)   f*(x*) = sup_{x∈dom(f)} ⟨x*, x⟩ − f(x).

If f(x) is finite we can write the Fenchel-Young inequality in the more symmetric form

(1.79)   ⟨x*, x⟩ ≤ f(x) + f*(x*).

Lemma 1.36. Let f be proper. Then the symmetric form (1.79) of the Fenchel-Young inequality is valid for any (x, x*) in X × X*.

Proof. By (1.77), f*(x*) > −∞, and f(x) > −∞, so that (1.79) makes sense and is equivalent to the Fenchel-Young inequality (1.75).
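The conjugate (1.74) can be approximated on a grid; the sketch below is ours and checks both that the conjugate of x²/2 is y²/2 (see Example 1.37 below) and that the Fenchel-Young inequality (1.75) holds at every grid pair.

import numpy as np

xs = np.linspace(-5, 5, 2001)
f = 0.5 * xs**2                     # f(x) = x^2 / 2

def conjugate(fvals, xs, ys):
    # Discrete version of (1.74): f*(y) = max_x (y*x - f(x)).
    return np.max(np.outer(ys, xs) - fvals, axis=1)

ys = np.linspace(-4, 4, 9)
fstar = conjugate(f, xs, ys)
print(np.allclose(fstar, 0.5 * ys**2, atol=1e-3))   # f* = f here

# Fenchel-Young (1.75): f*(y) >= y*x - f(x) for every grid pair.
print(np.all(fstar[:, None] >= np.outer(ys, xs) - f - 1e-9))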

Example 1.37. Let X be a Hilbert space, and define f : X → R by f(x) := ½‖x‖². We identify X with its dual. Then f*(x) = f(x) for all x in X, leading to the well-known inequality (where the l.h.s. is the scalar product of x and y):

(1.80)   (x, y) ≤ ½‖x‖² + ½‖y‖², for all x, y in X.

Example 1.38. Let p > 1. Define f : R → R by f(x) := |x|^p/p. For y ∈ R, the maximum in x ↦ xy − f(x) is attained at 0 if y = 0, and otherwise at some x ≠ 0 of the same sign as y such that |x|^{p−1} = |y|. Introducing the conjugate exponent p* such that 1/p + 1/p* = 1, we get f*(y) = |y|^{p*}/p*, so that xy ≤ |x|^p/p + |y|^{p*}/p*, for all x, y in R. Similarly, for some p > 1, let f : R^n → R be defined by f(x) := ‖x‖_p^p/p, where ‖x‖_p^p = Σ_{i=1}^n |x_i|^p. We easily obtain that f*(y) = ‖y‖_{p*}^{p*}/p*, leading to the Young inequality

(1.81)   Σ_{i=1}^n x_i y_i ≤ (1/p)‖x‖_p^p + (1/p*)‖y‖_{p*}^{p*}, for all x, y in R^n.

Exercise 1.39. Let A be a symmetric, positive definite n × n matrix. (i) Check that the conjugate of f(x) := ½xᵀAx is f*(y) = ½yᵀA⁻¹y. Taking x = y, deduce the Young inequality

(1.82)   xᵀx ≤ ½xᵀAx + ½xᵀA⁻¹x, for all x ∈ R^n.

Conclude that

(1.83)   A + A⁻¹ − 2I is positive semidefinite, if A is symmetric and positive definite.

Exercise 1.40. Check that the conjugate of the indicatrix of the (open or closed) unit ball of X is the dual norm.

Exercise 1.41. Let f(x) := αg(x) with α > 0 and g : X → R̄. Show that

(1.84)   f*(x*) = αg*(x*/α).

Exercise 1.42. Show that the Fenchel conjugate of the exponential is the entropy function H with value H(x) = x(log x − 1) if x > 0, H(0) = 0, and H(x) = +∞ if x < 0. Deduce the inequality xy ≤ eˣ + y(log y − 1), for all x ∈ R and y > 0.

The biconjugate of f is the function f** : X → R̄ defined by

(1.85)   f**(x) := sup_{x*∈X*} ⟨x*, x⟩ − f*(x*).

Proposition 1.43. The biconjugate f** is the supremum of the affine minorants of f.

Proof. Let x* ∈ X*. If f*(x*) = +∞, then f has no affine minorant with slope x*. Otherwise, as we already observed, x ↦ ⟨x*, x⟩ − f*(x*) is an affine minorant of f with the best possible constant term. The conclusion follows.
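A numerical check of Exercise 1.39 (our sketch, assuming NumPy and SciPy): the supremum defining f*(y) is computed by an unconstrained solver and compared with ½yᵀA⁻¹y, and the eigenvalues of A + A⁻¹ − 2I are verified to be nonnegative.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
B = rng.normal(size=(3, 3))
A = B @ B.T + 3 * np.eye(3)     # a symmetric positive definite matrix
y = rng.normal(size=3)

# f*(y) = sup_x (y.x - x.A.x/2); the sup is attained at x = A^{-1} y.
num = -minimize(lambda x: 0.5 * x @ A @ x - y @ x, np.zeros(3)).fun
print(np.isclose(num, 0.5 * y @ np.linalg.solve(A, y)))

# (1.83): A + A^{-1} - 2I is positive semidefinite.
M = A + np.linalg.inv(A) - 2 * np.eye(3)
print(np.all(np.linalg.eigvalsh(M) >= -1e-9))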

A hyperplane (x̄*, β) ∈ X* × R separating (x₀, α₀) from epi(f) is such that

(1.86)   ⟨x̄*, x₀⟩ + βα₀ ≤ ⟨x̄*, x⟩ + βα, for all (x, α) ∈ epi(f).

If dom(f) ≠ ∅, then for some x ∈ dom(f) we can let α → +∞, and it follows that β ≥ 0. If β = 0, we say that the hyperplane is vertical, and the above relation reduces to

(1.87)   ⟨x̄*, x₀⟩ ≤ ⟨x̄*, x⟩, for all x ∈ dom(f),

i.e., x̄* separates x₀ from dom(f). Otherwise, we say that the separating hyperplane is oblique. We may then assume that β = 1, and we obtain that

(1.88)   ⟨x̄*, x₀⟩ + α₀ ≤ ⟨x̄*, x⟩ + f(x), for all x ∈ dom(f).

This is equivalent to saying that x ↦ −⟨x̄*, x − x₀⟩ + α₀ is an affine minorant of f.

Theorem 1.44. Let f : X → R̄ be proper, l.s.c. and convex. Then f = f**.

Proof. a) It suffices to prove that any (x₀, α₀) ∉ epi(f) can be strictly separated from epi(f) by an oblique hyperplane. Indeed, the corresponding affine minorant then guarantees that (x₀, α₀) ∉ epi(f**). Since f** ≤ f, it follows that epi(f) = epi(f**), as is to be proved.

b) Since epi(f) is a closed convex subset of X × R, by corollary 1.16 it can be strictly separated from (x₀, α₀). Note that not all hyperplanes separating points from epi(f) are vertical, since otherwise f would have value −∞ over its (nonempty) domain. It follows that f has an affine minorant, say x ↦ ⟨x̂*, x⟩ + γ. If the hyperplane strictly separating (x₀, α₀) and epi(f) is oblique, it provides an affine minorant of f with value greater than α₀ at the point x₀, as required. If, on the contrary, the hyperplane strictly separating (x₀, α₀) and epi(f) is vertical, say ⟨y*, x − x₀⟩ ≥ ε for all x ∈ dom(f), with y* ≠ 0 and ε > 0, then we have that, for any β > 0 and all x ∈ dom(f):

(1.89)   f(x) ≥ −β(⟨y*, x − x₀⟩ − ε) + ⟨x̂*, x⟩ + γ,

meaning that the above r.h.s. is an affine minorant of f. At the same time, its value at x₀ is βε + ⟨x̂*, x₀⟩ + γ, which for β > 0 large enough is larger than α₀. So this r.h.s. corresponds to an oblique hyperplane strictly separating (x₀, α₀) from epi(f). The conclusion follows.

Definition 1.45. (i) Let E ⊂ X. The convex hull conv(E) is the smallest convex set containing E, i.e., the set of finite convex combinations (linear combinations with nonnegative weights whose sum is 1) of elements of E. The convex closure of E, denoted by cl conv(E), is the smallest closed convex set containing E (i.e., the intersection of the closed convex sets containing E). (ii) Let f : X → R̄. The convex closure of f is the function cl conv(f) : X → R̄ whose epigraph is cl conv(epi(f)) (note that cl conv(f) is the supremum of the l.s.c. convex minorants of f). We have obviously that f = cl conv(f) iff f is convex and l.s.c.

Theorem 1.46 (Fenchel-Moreau-Rockafellar). Let f : X → R̄. We have the following alternative: either (i) f** = −∞ identically, and cl conv(f) has no finite value, taking the value −∞ at some point; or (ii) f** = cl conv(f), and cl conv(f)(x) > −∞ for all x ∈ X.

Proof. If f is identically equal to +∞, the conclusion is obvious. So we may assume that dom(f) ≠ ∅. Since f** is a l.s.c. convex minorant of f, we have that f** ≤ cl conv(f). So, if cl conv(f)(x₁) = −∞ for some x₁ ∈ X, then f has no affine minorant, and f** = −∞ identically. In addition, since cl conv(f) is l.s.c. convex, for any x ∈ dom(cl conv(f)), setting x_θ := θx + (1−θ)x₁, we have that

(1.90)   cl conv(f)(x) ≤ lim inf_{θ↑1} cl conv(f)(x_θ) ≤ lim inf_{θ↑1} [θ cl conv(f)(x) + (1−θ) cl conv(f)(x₁)] = −∞,

so that (i) holds. If, on the contrary, (i) does not hold, then f has a continuous affine minorant, so that cl conv(f)(x) > −∞ for all x ∈ X. Being proper, l.s.c. and convex, cl conv(f) is, by theorem 1.44 and proposition 1.43, the supremum of its affine minorants, which coincide with the affine minorants of f. The conclusion follows.

Corollary 1.47. Let f be convex X → R̄. Then (i) cl conv(f)(x) = lim inf_{x′→x} f(x′), for all x ∈ X; (ii) if f is finitely valued and l.s.c. at some x₀ ∈ X, then f(x₀) = f**(x₀).

Proof. (i) Set g(x) := lim inf_{x′→x} f(x′). It is easily checked that g is a l.s.c. convex minorant of f, and therefore g ≤ cl conv(f). On the other hand, since cl conv(f) is a l.s.c. minorant of f, we have that cl conv(f)(x) ≤ lim inf_{x′→x} f(x′) = g(x), proving (i). (ii) By point (i), since f is finitely valued and l.s.c. at x₀, we have that f(x₀) = cl conv(f)(x₀) > −∞, and we conclude with theorem 1.46.

Example 1.48. Let K be a nonempty closed convex subset of X, and set f(x) = −∞ if x ∈ K, and f(x) = +∞ otherwise. Then f is l.s.c. convex, and f** has value −∞ everywhere, so that f** ≠ f whenever K ≠ X.
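Theorem 1.46(ii) can be visualized with a discrete Legendre transform: biconjugating a nonconvex function on a grid produces, up to discretization error, its convex closure. The double-well example below is ours, not from the notes.

import numpy as np

def conj(vals, grid, dual_grid):
    # Discrete Legendre transform: g(s) = max_t (s*t - vals(t)).
    return np.max(np.outer(dual_grid, grid) - vals, axis=1)

xs = np.linspace(-2, 2, 801)
f = (xs**2 - 1.0) ** 2              # a nonconvex double well
ss = np.linspace(-30, 30, 2401)     # slope grid wide enough for f

fstar = conj(f, xs, ss)             # f* on the slope grid
fbi = conj(fstar, ss, xs)           # biconjugate f** on the x grid

# f** is the supremum of affine minorants: between the two wells it
# flattens f down to 0, recovering the convex closure of f.
i = np.searchsorted(xs, 0.0)
print(f[i], fbi[i])                 # about 1.0 versus about 0.0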

1.5.2. Conjugacy in dual spaces. Let X be a Banach space, and g : X* → R̄. Its (Legendre-Fenchel) conjugate (in the dual sense) is the function g* : X → R̄ defined by

(1.91)   g*(x) := sup_{x*∈X*} ⟨x*, x⟩ − g(x*).

So we have the dual Fenchel-Young inequality

(1.92)   g*(x) ≥ ⟨x*, x⟩ − g(x*), for all x ∈ X and x* ∈ X*.

Being a supremum of affine functions, g* is l.s.c. convex. The biconjugate g** is the Legendre-Fenchel conjugate (in the sense of section 1.5.1) of g*, i.e.,

(1.93)   g**(x*) := sup_{x∈X} ⟨x*, x⟩ − g*(x).

Let us call affine function on X* a function of the form x* ↦ ⟨x*, x⟩ + α, with (x, α) ∈ X × R; note that this excludes the affine functions of the form x* ↦ ⟨x**, x*⟩ + α, where x** ∈ X** \ X. We call affine minorants of g the affine functions that minorize g. By the same arguments as in section 1.5.1, we obtain that

(1.94)   g** is the supremum of the affine minorants of g,

and so we get the following result:

Lemma 1.49. Let g : X* → R̄. We have that g = g** iff g is a supremum of affine functions.

Remark 1.50. For any f : X → R̄, the Fenchel conjugate f* is a supremum of affine functions. It follows that

(1.95)   f*** = f*.

Example 1.51. Remember the definition (1.6) of the indicatrix function. The support function σ_K : X* → R̄ (sometimes also denoted by σ(·, K)) is defined by

(1.96)   σ_K(x*) := sup{⟨x*, x⟩; x ∈ K}.

Clearly σ_K is the conjugate of I_K. If K is closed and convex, then I_K is proper, l.s.c. and convex, and hence is equal to its biconjugate, so that the conjugate of σ_K is I_K. Otherwise, let K̄ := cl conv(K) be the smallest closed convex set containing K. It is easily checked that σ_{K̄} = σ_K. Since I_{K̄} is l.s.c. convex and proper, it is equal to its biconjugate. We have proved that

(1.97)   I_{cl conv(K)} and σ_K are conjugate to each other.

1.5.3. Continuity and subdifferentiability. Let f : X → R̄ have a finite value at some x ∈ X. We define the subdifferential of f at x as the set

(1.98)   ∂f(x) := {x* ∈ X*; f(x′) ≥ f(x) + ⟨x*, x′ − x⟩, for all x′ ∈ X}.

Equivalently, ∂f(x) is the set of slopes of affine minorants of f that are exact (i.e., equal to f) at the point x. The inequality in (1.98) may be written as

(1.99)   ⟨x*, x⟩ − f(x) ≥ ⟨x*, x′⟩ − f(x′), for all x′ ∈ X.

Therefore x* ∈ ∂f(x) iff x attains the maximum in the definition of f*(x*), i.e., we have that

(1.100)   {x* ∈ ∂f(x)} ⟺ {f*(x*) + f(x) = ⟨x*, x⟩}.

In other words, x* ∈ ∂f(x) iff equality holds in the Fenchel-Young inequality (1.79). By (1.85), f** is the supremum of its affine minorants, which are of the form x ↦ ⟨x*, x⟩ − β, β ≥ f*(x*), for x* ∈ dom(f*). It follows that the affine minorants that are exact at x for f** are those which attain the supremum in (1.85), i.e.,

(1.101)   {x* ∈ ∂f**(x)} ⟺ {f*(x*) + f**(x) = ⟨x*, x⟩}.

Also, if ∂f(x) ≠ ∅, then the corresponding affine minorants of f, exact at x, are also minorants of f** exact at x, and therefore

(1.102)   {∂f(x) ≠ ∅} ⟹ {f**(x) = f(x)} ⟹ {∂f**(x) = ∂f(x)}.

We may also define the subdifferential of a function g : X* → R̄, with finite value at x*, as

(1.103)   ∂g(x*) := {x ∈ X; g(y*) ≥ g(x*) + ⟨y* − x*, x⟩, for all y* ∈ X*}.

Similarly to what was done before, we can express the above inequality as

(1.104)   ⟨x*, x⟩ − g(x*) ≥ ⟨y*, x⟩ − g(y*), for all y* ∈ X*.

This means that x ∈ ∂g(x*) iff x* attains the maximum in the definition of g*(x), i.e., we have that

(1.105)   {x ∈ ∂g(x*)} ⟺ {g(x*) + g*(x) = ⟨x*, x⟩}.

With similar arguments we obtain that

(1.106)   {x ∈ ∂g**(x*)} ⟺ {g**(x*) + g*(x) = ⟨x*, x⟩}.

When g is itself a conjugate function, we deduce the following.

Lemma 1.52. Let f : X → R̄ have a finite value at some x ∈ X. If equality holds in the Fenchel-Young inequality (1.79), then x ∈ ∂f*(x*); the converse holds if f is proper, l.s.c. and convex.

Proof. If (1.79) holds with equality, then by (1.100) and (1.102) we know that f**(x) = f(x), and so, by (1.105) applied to g = f*, x ∈ ∂f*(x*). Conversely, if x ∈ ∂f*(x*), then by (1.105) applied to g = f*, we have that f*(x*) + f**(x) = ⟨x*, x⟩. When f is proper, l.s.c. and convex, f**(x) = f(x), so that equality holds in the Fenchel-Young inequality, as was to be proved.

Remark 1.53. So, if f is proper, l.s.c. and convex, we have that

(1.107)   x* ∈ ∂f(x) iff x ∈ ∂f*(x*).

In this sense the Legendre-Fenchel transform is an extension of the classical Legendre transform which, under certain conditions, associates with a smooth function f over R^n another smooth function f̂ over R^{n*} such that y = f′(x) iff x = f̂′(y).

In the analysis of stochastic problems we will need the following sensitivity analysis results for linear programs.

Example 1.54. Given d ∈ R^m (acting as a row vector), b ∈ R^p, and matrices A and M of sizes p × m and p × n resp., let f : R^n → R̄ be defined by

(1.108)   f(x) := inf_{y∈R^m_+} {dy; Ay = b + Mx}.

This is the value of a linear program, whose dual is

(D_x)   Max_{λ∈R^p} −λ(b + Mx);  d + λA ≥ 0.

The next lemma gives an expression of ∂f(x).

Lemma 1.55. Let f have a finite value at x̄ ∈ R^n. Then ∂f(x̄) is nonempty and satisfies

(1.109)   ∂f(x̄) = {−λM; λ ∈ S(D_{x̄})}.

28 28 1. A CONVEX OPTIMIZATION TOOLBOX We also may define the subdifferential of a function g : X R with finite value at x as (1.103) g(x ) := {x X; g(y ) g(x ) + y x, x, for all y X }. Similarly to what was done before we can express the above inequality as (1.104) x, x g(x ) y, x g(y ), for all y X. This means that x g(x ) iff x attains the maximum in the definition of g (x), i.e., we have that (1.105) {x g(x )} {g(x ) + g (x) = x, x }. With similar arguments we obtain that (1.106) {x g (x )} {g (x ) + g (x) = x, x }. When g is itself a conjugate function we deduce the following. Lemma Let f : X R have a finite value at some x X. That the equality holds in the Fenchel-Young inequality (1.79) implies that x f (x ); the converse holds if f is proper, l.s.c. convex. Proof. If (1.79) holds with equality, we know that f(x) = f (x) and so, by (1.105) applied to g = f, x f (x ) holds. Conversely, if x f (x ), then by (1.105) applied to g = f, we have that f (x ) + f (x) = x, x. When f is proper, l.s.c. convex, f (x) = f(x), so that the equality holds in the Fenchel-Young inequality, as was to be proved. Remark So, if f is proper, l.s.c. convex, we have that (1.107) x f(x) iff x f (x ). In this sense the Fenchel Legendre transform is an extension of the property of the classical Legendre transform which, under certain conditions, associates with a smooth function f over R n another smooth function ˆf over R n such that y = f (x) iff x = ˆf (y). In the analysis of stochastic problems we will need the following sensitivity analysis results for linear programs. Example Given d R m, b R p and matrices A and M of size p m and p n resp., let f : R n R be defined by (1.108) f(x) := inf {dy; Ay = b + Mx}. y R m + This is the value of a linear program whose dual is (D x ) Max λ(b + Mx); d + λa 0. λ Rp The next lemma gives an expression of f(x). Lemma Let f have a finite value at x X. Then f( x) is non empty and satisfies (1.109) f( x) = { λm; λ S(D x )}.

29 Proof. The conjugate of f is 1. CONVEX FUNCTIONS 29 (1.110) f (x ) = sup x;y 0 {x x dy; Ay = b + Mx} = inf x;y 0 {dy x x; Ay = b + Mx}. Since f( x) is finite, the linear program involved in the above r.h.s. is feasible. By linear programmming duality (lemma 1.26) it has the same value as its dual, and hence, (1.111) f (x ) = sup λ R p { λb; x = λm; d + λa 0}. The Fenchel Young inequality implies (1.112) 0 f( x) + f (x ) x x = f( x) x x + inf λ R p {λb; x = λm; d + λa 0}. Since f( x) is the finite value of a feasible linear program, it is equal to val(d x ). So, let λ S(D x ). The Fenchel Young inequality (1.112) is equivalent to (1.113) λ(b + M x) x x + inf λ R p {λb; x = λm; d + λa 0}. When equality holds, the linear program on the r.h.s. has a solution say λ, and x x = λm x, so that the equality holds iff (1.114) λ(b + M x) = min λ R p {λ(b + M x); x = λm; d + λa 0}. Remember that this is the case of equality in the Fenchel Young inequality, and therefore it holds iff x f( x). Since the cost function and last constraint correspond to those of (D x ), it follows that any solution ˆλ of the linear program on the r.h.s. belongs to S(D x ). We have proved that, if x f(x), then x = λm for some λ S(D x ). The converse obviously holds in view of (1.114). Remark Consider the particular case when b = 0 and M is the opposite of the identity. Rewriting as b the variable x, we obtain that the function (1.115) f(b) := inf {dy; Ay + b = 0} y R m + has, over its domain, a subdifferential equal to the set of solutions of the dual problem (1.116) Max λb; d + λa 0. λ Rp We now show that, for convex functions, a local uniform upper bound implies a Lipschitz property as well as subdifferentiability. Lemma Let f : X R be convex, finitely valued at x 0, and uniformly upper bounded near x 0, i.e., for some a R and r > 0: (1.117) f(x) a whenever x x 0 r. Then f is Lipschitz of constant say L on B(x 0, 1 2 r).

30 30 1. A CONVEX OPTIMIZATION TOOLBOX Proof. Let ε ]0, 1[ and h X, h = εr. Since x 0 ± ε 1 h B(x 0, r), we have by convexity of f that It follows that f(x 0 + h) (1 ε)f(x 0 ) + εf(x 0 + ε 1 h) (1 ε)f(x 0 ) + εa, f(x 0 + h) (1 + ε)f(x 0 ) εf(x 0 ε 1 h) (1 + ε)f(x 0 ) εa. (1.118) f(x 0 + h) f(x 0 ) ε(a f(x 0 )) = r 1 (a f(x 0 )) h. Therefore, for all x B(x 0, r), we have that f(x) b, with b := f(x 0 ) (a f(x 0 )). Let x 1 B(x 0, r 1 ), where r 1 := 1 2 r. Then b f(x) a, for all x B(x 1, r 1 ). Applying (1.118) at the point x 1, with r = r 1, we get (1.119) f(x 1 + h) f(x 1 ) r 1 1 (a b) h, for all h < r 1. Therefore f is Lipschitz with constant r 1 1 (a b) over B(x 0, r 1 ), as was to be proved. Lemma Let f : X R be convex, and Lipschitz of constant L near x 0 X. Then f(x 0 ) is non empty and included in B(0, L). Proof. Let ˆx B(x 0, 1 2r). Set E = {(x, γ) X R; γ > f(x)}. Since f is continuous at x 0, for ε > 0 small enough, (1.120) B(x 0, ε) [f(x 0 ) + 1, ) E, so that E has a non empty interior. By theorem 1.12, there exists (λ, α) X R separating (ˆx, f(ˆx)) and E, i.e., such that (1.121) λ, ˆx + αf(ˆx) λ, x + αγ, for all x dom(f), γ > f(x). Taking x = x 0 and γ > f(x), γ +, we see that α 0. The separating hyperplane cannot be vertical, since ˆx int(dom(f)), so that we may take α = 1. Minimizing w.r.t. γ we obtain that f(x) f(ˆx) λ, x ˆx, proving that λ f(ˆx). We now check that f(x) B(0, L). Assume that x f(x), with x > L. Then there exists d X with d = 1 and x, d > L. Therefore by the definition of a subdifferential, f(x + σd) f(x) (1.122) lim x, d > L, σ 0 σ in contradiction with the fact that L is a local Lipschitz constant. Example Consider the entropy function f(x) = x log x if x 0 (with value 0 at zero), and f(x) = + otherwise. Then f is l.s.c. convex, and the subdifferential is empty at x = 0. So, in general, even in an Euclidean space, a l.s.c. convex function may have an empty subdifferential at some points of its domain. We next introduce a concept that in some sense is an algebraic variant of the interior. Definition Let S X, where X is a vector space. Then we say that x S belongs to the core of S and write x core(s) if, for each h X, there exists ε > 0 such that [x εh, x + εh] S.


More information

Lecture Note 5: Semidefinite Programming for Stability Analysis

Lecture Note 5: Semidefinite Programming for Stability Analysis ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State

More information

Semicontinuous functions and convexity

Semicontinuous functions and convexity Semicontinuous functions and convexity Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto April 3, 2014 1 Lattices If (A, ) is a partially ordered set and S is a subset

More information

In Progress: Summary of Notation and Basic Results Convex Analysis C&O 663, Fall 2009

In Progress: Summary of Notation and Basic Results Convex Analysis C&O 663, Fall 2009 In Progress: Summary of Notation and Basic Results Convex Analysis C&O 663, Fall 2009 Intructor: Henry Wolkowicz November 26, 2009 University of Waterloo Department of Combinatorics & Optimization Abstract

More information

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε 1. Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional

More information

Overview of normed linear spaces

Overview of normed linear spaces 20 Chapter 2 Overview of normed linear spaces Starting from this chapter, we begin examining linear spaces with at least one extra structure (topology or geometry). We assume linearity; this is a natural

More information

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45 Division of the Humanities and Social Sciences Supergradients KC Border Fall 2001 1 The supergradient of a concave function There is a useful way to characterize the concavity of differentiable functions.

More information

Extreme points of compact convex sets

Extreme points of compact convex sets Extreme points of compact convex sets In this chapter, we are going to show that compact convex sets are determined by a proper subset, the set of its extreme points. Let us start with the main definition.

More information

A NICE PROOF OF FARKAS LEMMA

A NICE PROOF OF FARKAS LEMMA A NICE PROOF OF FARKAS LEMMA DANIEL VICTOR TAUSK Abstract. The goal of this short note is to present a nice proof of Farkas Lemma which states that if C is the convex cone spanned by a finite set and if

More information

Elements of Convex Optimization Theory

Elements of Convex Optimization Theory Elements of Convex Optimization Theory Costis Skiadas August 2015 This is a revised and extended version of Appendix A of Skiadas (2009), providing a self-contained overview of elements of convex optimization

More information

DEFINABLE VERSIONS OF THEOREMS BY KIRSZBRAUN AND HELLY

DEFINABLE VERSIONS OF THEOREMS BY KIRSZBRAUN AND HELLY DEFINABLE VERSIONS OF THEOREMS BY KIRSZBRAUN AND HELLY MATTHIAS ASCHENBRENNER AND ANDREAS FISCHER Abstract. Kirszbraun s Theorem states that every Lipschitz map S R n, where S R m, has an extension to

More information

Convex Analysis and Optimization Chapter 2 Solutions

Convex Analysis and Optimization Chapter 2 Solutions Convex Analysis and Optimization Chapter 2 Solutions Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

CONTENTS. 4 Hausdorff Measure Introduction The Cantor Set Rectifiable Curves Cantor Set-Like Objects...

CONTENTS. 4 Hausdorff Measure Introduction The Cantor Set Rectifiable Curves Cantor Set-Like Objects... Contents 1 Functional Analysis 1 1.1 Hilbert Spaces................................... 1 1.1.1 Spectral Theorem............................. 4 1.2 Normed Vector Spaces.............................. 7 1.2.1

More information

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

Victoria Martín-Márquez

Victoria Martín-Márquez A NEW APPROACH FOR THE CONVEX FEASIBILITY PROBLEM VIA MONOTROPIC PROGRAMMING Victoria Martín-Márquez Dep. of Mathematical Analysis University of Seville Spain XIII Encuentro Red de Análisis Funcional y

More information

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions Angelia Nedić and Asuman Ozdaglar April 16, 2006 Abstract In this paper, we study a unifying framework

More information

Lecture: Duality of LP, SOCP and SDP

Lecture: Duality of LP, SOCP and SDP 1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:

More information

An introduction to some aspects of functional analysis

An introduction to some aspects of functional analysis An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 20 Subgradients Assumptions

More information

2) Let X be a compact space. Prove that the space C(X) of continuous real-valued functions is a complete metric space.

2) Let X be a compact space. Prove that the space C(X) of continuous real-valued functions is a complete metric space. University of Bergen General Functional Analysis Problems with solutions 6 ) Prove that is unique in any normed space. Solution of ) Let us suppose that there are 2 zeros and 2. Then = + 2 = 2 + = 2. 2)

More information

Subdifferential representation of convex functions: refinements and applications

Subdifferential representation of convex functions: refinements and applications Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential

More information

AN INTRODUCTION TO CONVEXITY

AN INTRODUCTION TO CONVEXITY AN INTRODUCTION TO CONVEXITY GEIR DAHL NOVEMBER 2010 University of Oslo, Centre of Mathematics for Applications, P.O.Box 1053, Blindern, 0316 Oslo, Norway (geird@math.uio.no) Contents 1 The basic concepts

More information

Optimality Conditions for Nonsmooth Convex Optimization

Optimality Conditions for Nonsmooth Convex Optimization Optimality Conditions for Nonsmooth Convex Optimization Sangkyun Lee Oct 22, 2014 Let us consider a convex function f : R n R, where R is the extended real field, R := R {, + }, which is proper (f never

More information

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32 Functional Analysis Martin Brokate Contents 1 Normed Spaces 2 2 Hilbert Spaces 2 3 The Principle of Uniform Boundedness 32 4 Extension, Reflexivity, Separation 37 5 Compact subsets of C and L p 46 6 Weak

More information

Semi-infinite programming, duality, discretization and optimality conditions

Semi-infinite programming, duality, discretization and optimality conditions Semi-infinite programming, duality, discretization and optimality conditions Alexander Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205,

More information

Optimization Theory. A Concise Introduction. Jiongmin Yong

Optimization Theory. A Concise Introduction. Jiongmin Yong October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

Self-equilibrated Functions in Dual Vector Spaces: a Boundedness Criterion

Self-equilibrated Functions in Dual Vector Spaces: a Boundedness Criterion Self-equilibrated Functions in Dual Vector Spaces: a Boundedness Criterion Michel Théra LACO, UMR-CNRS 6090, Université de Limoges michel.thera@unilim.fr reporting joint work with E. Ernst and M. Volle

More information

B. Appendix B. Topological vector spaces

B. Appendix B. Topological vector spaces B.1 B. Appendix B. Topological vector spaces B.1. Fréchet spaces. In this appendix we go through the definition of Fréchet spaces and their inductive limits, such as they are used for definitions of function

More information

CHAPTER 5. Birkhoff Orthogonality and a Revisit to the Characterization of Best Approximations. 5.1 Introduction

CHAPTER 5. Birkhoff Orthogonality and a Revisit to the Characterization of Best Approximations. 5.1 Introduction CHAPTER 5 Birkhoff Orthogonality and a Revisit to the Characterization of Best Approximations 5.1 Introduction The notion of orthogonality in an arbitrary normed space, with the norm not necessarily coming

More information

Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide

Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide aliprantis.tex May 10, 2011 Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide Notes from [AB2]. 1 Odds and Ends 2 Topology 2.1 Topological spaces Example. (2.2) A semimetric = triangle

More information

Linear Programming. Larry Blume Cornell University, IHS Vienna and SFI. Summer 2016

Linear Programming. Larry Blume Cornell University, IHS Vienna and SFI. Summer 2016 Linear Programming Larry Blume Cornell University, IHS Vienna and SFI Summer 2016 These notes derive basic results in finite-dimensional linear programming using tools of convex analysis. Most sources

More information

Appendix B Convex analysis

Appendix B Convex analysis This version: 28/02/2014 Appendix B Convex analysis In this appendix we review a few basic notions of convexity and related notions that will be important for us at various times. B.1 The Hausdorff distance

More information

Lecture 3. Optimization Problems and Iterative Algorithms

Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

More information

Strongly convex functions, Moreau envelopes and the generic nature of convex functions with strong minimizers

Strongly convex functions, Moreau envelopes and the generic nature of convex functions with strong minimizers University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers: Part B Faculty of Engineering and Information Sciences 206 Strongly convex functions, Moreau envelopes

More information

Convex Functions and Optimization

Convex Functions and Optimization Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized

More information

On duality gap in linear conic problems

On duality gap in linear conic problems On duality gap in linear conic problems C. Zălinescu Abstract In their paper Duality of linear conic problems A. Shapiro and A. Nemirovski considered two possible properties (A) and (B) for dual linear

More information

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences

More information

Introduction to Convex and Quasiconvex Analysis

Introduction to Convex and Quasiconvex Analysis Introduction to Convex and Quasiconvex Analysis J.B.G.Frenk Econometric Institute, Erasmus University, Rotterdam G.Kassay Faculty of Mathematics, Babes Bolyai University, Cluj August 27, 2001 Abstract

More information

Math 5052 Measure Theory and Functional Analysis II Homework Assignment 7

Math 5052 Measure Theory and Functional Analysis II Homework Assignment 7 Math 5052 Measure Theory and Functional Analysis II Homework Assignment 7 Prof. Wickerhauser Due Friday, February 5th, 2016 Please do Exercises 3, 6, 14, 16*, 17, 18, 21*, 23*, 24, 27*. Exercises marked

More information

Best approximations in normed vector spaces

Best approximations in normed vector spaces Best approximations in normed vector spaces Mike de Vries 5699703 a thesis submitted to the Department of Mathematics at Utrecht University in partial fulfillment of the requirements for the degree of

More information

Functional Analysis I

Functional Analysis I Functional Analysis I Course Notes by Stefan Richter Transcribed and Annotated by Gregory Zitelli Polar Decomposition Definition. An operator W B(H) is called a partial isometry if W x = X for all x (ker

More information

LECTURE SLIDES ON BASED ON CLASS LECTURES AT THE CAMBRIDGE, MASS FALL 2007 BY DIMITRI P. BERTSEKAS.

LECTURE SLIDES ON BASED ON CLASS LECTURES AT THE CAMBRIDGE, MASS FALL 2007 BY DIMITRI P. BERTSEKAS. LECTURE SLIDES ON CONVEX ANALYSIS AND OPTIMIZATION BASED ON 6.253 CLASS LECTURES AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY CAMBRIDGE, MASS FALL 2007 BY DIMITRI P. BERTSEKAS http://web.mit.edu/dimitrib/www/home.html

More information

A Dual Condition for the Convex Subdifferential Sum Formula with Applications

A Dual Condition for the Convex Subdifferential Sum Formula with Applications Journal of Convex Analysis Volume 12 (2005), No. 2, 279 290 A Dual Condition for the Convex Subdifferential Sum Formula with Applications R. S. Burachik Engenharia de Sistemas e Computacao, COPPE-UFRJ

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction Functional analysis can be seen as a natural extension of the real analysis to more general spaces. As an example we can think at the Heine - Borel theorem (closed and bounded is

More information

Inequality Constraints

Inequality Constraints Chapter 2 Inequality Constraints 2.1 Optimality Conditions Early in multivariate calculus we learn the significance of differentiability in finding minimizers. In this section we begin our study of the

More information

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

More information

Chapter 1. Preliminaries

Chapter 1. Preliminaries Introduction This dissertation is a reading of chapter 4 in part I of the book : Integer and Combinatorial Optimization by George L. Nemhauser & Laurence A. Wolsey. The chapter elaborates links between

More information

LECTURE SLIDES ON CONVEX OPTIMIZATION AND DUALITY THEORY TATA INSTITUTE FOR FUNDAMENTAL RESEARCH MUMBAI, INDIA JANUARY 2009 PART I

LECTURE SLIDES ON CONVEX OPTIMIZATION AND DUALITY THEORY TATA INSTITUTE FOR FUNDAMENTAL RESEARCH MUMBAI, INDIA JANUARY 2009 PART I LECTURE SLIDES ON CONVEX OPTIMIZATION AND DUALITY THEORY TATA INSTITUTE FOR FUNDAMENTAL RESEARCH MUMBAI, INDIA JANUARY 2009 PART I BY DIMITRI P. BERTSEKAS M.I.T. http://web.mit.edu/dimitrib/www/home.html

More information

On John type ellipsoids

On John type ellipsoids On John type ellipsoids B. Klartag Tel Aviv University Abstract Given an arbitrary convex symmetric body K R n, we construct a natural and non-trivial continuous map u K which associates ellipsoids to

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian

More information

Recall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm

Recall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm Chapter 13 Radon Measures Recall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm (13.1) f = sup x X f(x). We want to identify

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

I teach myself... Hilbert spaces

I teach myself... Hilbert spaces I teach myself... Hilbert spaces by F.J.Sayas, for MATH 806 November 4, 2015 This document will be growing with the semester. Every in red is for you to justify. Even if we start with the basic definition

More information

POLARS AND DUAL CONES

POLARS AND DUAL CONES POLARS AND DUAL CONES VERA ROSHCHINA Abstract. The goal of this note is to remind the basic definitions of convex sets and their polars. For more details see the classic references [1, 2] and [3] for polytopes.

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

Almost Convex Functions: Conjugacy and Duality

Almost Convex Functions: Conjugacy and Duality Almost Convex Functions: Conjugacy and Duality Radu Ioan Boţ 1, Sorin-Mihai Grad 2, and Gert Wanka 3 1 Faculty of Mathematics, Chemnitz University of Technology, D-09107 Chemnitz, Germany radu.bot@mathematik.tu-chemnitz.de

More information

Lecture 6 - Convex Sets

Lecture 6 - Convex Sets Lecture 6 - Convex Sets Definition A set C R n is called convex if for any x, y C and λ [0, 1], the point λx + (1 λ)y belongs to C. The above definition is equivalent to saying that for any x, y C, the

More information

arxiv:math/ v1 [math.fa] 4 Feb 1993

arxiv:math/ v1 [math.fa] 4 Feb 1993 Lectures on Maximal Monotone Operators R. R. Phelps Dept. Math. GN 50, Univ. of Wash., Seattle WA 98195; phelps@math.washington.edu (Lectures given at Prague/Paseky Summer School, Czech Republic, August

More information

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction J. Korean Math. Soc. 38 (2001), No. 3, pp. 683 695 ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE Sangho Kum and Gue Myung Lee Abstract. In this paper we are concerned with theoretical properties

More information

Spectral Theory, with an Introduction to Operator Means. William L. Green

Spectral Theory, with an Introduction to Operator Means. William L. Green Spectral Theory, with an Introduction to Operator Means William L. Green January 30, 2008 Contents Introduction............................... 1 Hilbert Space.............................. 4 Linear Maps

More information

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

In English, this means that if we travel on a straight line between any two points in C, then we never leave C. Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Set, functions and Euclidean space. Seungjin Han

Set, functions and Euclidean space. Seungjin Han Set, functions and Euclidean space Seungjin Han September, 2018 1 Some Basics LOGIC A is necessary for B : If B holds, then A holds. B A A B is the contraposition of B A. A is sufficient for B: If A holds,

More information

Convex Analysis and Economic Theory AY Elementary properties of convex functions

Convex Analysis and Economic Theory AY Elementary properties of convex functions Division of the Humanities and Social Sciences Ec 181 KC Border Convex Analysis and Economic Theory AY 2018 2019 Topic 6: Convex functions I 6.1 Elementary properties of convex functions We may occasionally

More information

Convex Optimization Conjugate, Subdifferential, Proximation

Convex Optimization Conjugate, Subdifferential, Proximation 1 Lecture Notes, HCI, 3.11.211 Chapter 6 Convex Optimization Conjugate, Subdifferential, Proximation Bastian Goldlücke Computer Vision Group Technical University of Munich 2 Bastian Goldlücke Overview

More information

USING FUNCTIONAL ANALYSIS AND SOBOLEV SPACES TO SOLVE POISSON S EQUATION

USING FUNCTIONAL ANALYSIS AND SOBOLEV SPACES TO SOLVE POISSON S EQUATION USING FUNCTIONAL ANALYSIS AND SOBOLEV SPACES TO SOLVE POISSON S EQUATION YI WANG Abstract. We study Banach and Hilbert spaces with an eye towards defining weak solutions to elliptic PDE. Using Lax-Milgram

More information

Handout 2: Elements of Convex Analysis

Handout 2: Elements of Convex Analysis ENGG 5501: Foundations of Optimization 2018 19 First Term Handout 2: Elements of Convex Analysis Instructor: Anthony Man Cho So September 10, 2018 As briefly mentioned in Handout 1, the notion of convexity

More information

A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06

A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06 A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06 CHRISTIAN LÉONARD Contents Preliminaries 1 1. Convexity without topology 1 2. Convexity

More information

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions Angelia Nedić and Asuman Ozdaglar April 15, 2006 Abstract We provide a unifying geometric framework for the

More information