Optimality Conditions for Nonsmooth Convex Optimization


Sangkyun Lee
Oct 22, 2014

Let us consider a convex function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$, where $\overline{\mathbb{R}}$ is the extended real line, $\overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty, +\infty\}$, and $f$ is proper ($f$ never takes the value $-\infty$ and is not identically equal to $+\infty$). Convexity of extended-real-valued functions is defined in the same way as for their real-valued counterparts, but with additional operations defined on $\overline{\mathbb{R}}$ (e.g. $\infty + \alpha = \infty$ for $\alpha > 0$). This article discusses the optimality conditions for minimizing extended-real-valued, nonsmooth convex functions. For this purpose we need two ingredients: (i) the continuity of convex functions and (ii) the existence of directional derivatives of convex functions.

1. Terminology

A proper function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is convex if
\[
f(\alpha x + (1-\alpha) x') \le \alpha f(x) + (1-\alpha) f(x'), \quad \forall x, x' \in \mathbb{R}^n, \ \forall \alpha \in (0,1),
\]
where the inequality is understood in $\overline{\mathbb{R}}$ (defining the additional necessary operations such as $\alpha \cdot \infty = \infty$, $\infty + \infty = \infty$, etc.).

The (effective) domain of a function $f$ is $\operatorname{dom} f := \{x \in \mathbb{R}^n : f(x) < +\infty\}$, which is nonempty when $f$ is proper.

The relative interior of a convex set $C \subseteq \mathbb{R}^n$ is the interior of $C$ in the topology relative to the affine hull of $C$, i.e.
\[
\operatorname{ri} C = \{x \in \operatorname{aff} C : \exists \delta > 0 \text{ s.t. } (\operatorname{aff} C) \cap B(x, \delta) \subseteq C\}.
\]
Here $B(x, \delta)$ is the closed ball with radius $\delta$ centered at $x$. To see why we need this notion, imagine a flat filled disk in $\mathbb{R}^3$ as our convex set $C$. The interior of $C$ is empty, since $C$ contains no 3-D ball centered at a point of $C$. The affine hull $\operatorname{aff} C$ is the plane containing $C$, relative to which we can consider another notion of interior using 2-D balls within that plane. Whenever $C \neq \emptyset$, we have $\operatorname{ri} C \neq \emptyset$ ([HUL96], Theorem 2.1.3).

The epigraph of $f$ is the set $\operatorname{epi} f := \{(x, r) \in \mathbb{R}^n \times \mathbb{R} : f(x) \le r\}$, which is nonempty if $f$ is proper. $f$ is convex if and only if $\operatorname{epi} f$ is a convex set in $\mathbb{R}^n \times \mathbb{R}$.
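As a quick numerical sanity check of the definitions above, here is a minimal sketch (in Python, with a hypothetical test function not taken from these notes) that evaluates the convexity inequality for a proper extended-real-valued function, letting IEEE infinity arithmetic play the role of the extended operations:

```python
import numpy as np

def f(x):
    """A proper convex function on the extended reals:
    f(x) = -log(x) for x > 0, +inf otherwise, so dom f = (0, inf)."""
    return -np.log(x) if x > 0 else np.inf

# Check f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) on random points.
# Floating-point inf follows the rules used above: a*inf = inf for a > 0,
# and inf + inf = inf.
rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(-1.0, 3.0, size=2)
    a = rng.uniform(0.01, 0.99)  # alpha in (0, 1)
    assert f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-9
```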

2. Continuity of Convex Functions

Lemma 1 ([HUL96], Lemma IV.3.1.1) For a proper convex function $f$, suppose that there exist $x_0$, $\delta$, $m$, and $M$ such that
\[
m \le f(x) \le M, \quad \forall x \in B(x_0, 2\delta).
\]
Then $f$ is Lipschitz continuous on $B(x_0, \delta)$; that is,
\[
|f(x') - f(x)| \le \frac{M - m}{\delta} \|x' - x\|, \quad \forall x, x' \in B(x_0, \delta).
\]

Proof. For two different points $x, x'$ in $B(x_0, \delta)$, take
\[
x'' = x' + \delta \frac{x' - x}{\|x' - x\|} \in B(x_0, 2\delta).
\]
Then $x'$ lies on the line segment $[x, x'']$:
\[
x' = \frac{\|x' - x\|}{\delta + \|x' - x\|} x'' + \frac{\delta}{\delta + \|x' - x\|} x.
\]
Using the convexity of $f$ and the given bounds, this leads to
\[
f(x') - f(x) \le \frac{\|x' - x\|}{\delta + \|x' - x\|} \left[ f(x'') - f(x) \right] \le \frac{\|x' - x\|}{\delta} (M - m).
\]
A similar inequality is obtained by exchanging $x$ and $x'$, which proves the claim.

Theorem 2 ([HUL96], Theorem IV.3.1.2, modified) For a proper convex $f$, there exists a compact convex subset $S$ of $\operatorname{ri} \operatorname{dom} f$ such that, with some $L = L(S) \ge 0$,
\[
|f(x') - f(x)| \le L \|x' - x\|, \quad \forall x, x' \in S.
\]

Proof. First, note that $\operatorname{ri} \operatorname{dom} f \neq \emptyset$. Since the theorem statement ignores points outside of $\operatorname{aff} \operatorname{dom} f$, we can work in $\mathbb{R}^d$ instead of $\mathbb{R}^n$, where $d$ is the dimension of $\operatorname{dom} f$.¹ Then we may assume $\operatorname{ri} \operatorname{dom} f = \operatorname{int} \operatorname{dom} f$ to simplify the proof.

Under this assumption, $\operatorname{dom} f$ ($\operatorname{ri} \operatorname{dom} f$ without the assumption, for example) contains $d + 1$ affinely independent points $v_0, v_1, \ldots, v_d$. Consider the simplex defined as the convex hull of these points,
\[
\Delta := \operatorname{conv}\{v_0, v_1, \ldots, v_d\} \subseteq \operatorname{dom} f,
\]
and pick a point $x_0$ in its interior. Then we can find $\delta > 0$ such that $B(x_0, 2\delta) \subseteq \Delta$. Any $x \in B(x_0, 2\delta)$ can then be written $x = \sum_{i=0}^d \alpha_i v_i$ for some $\alpha_i \ge 0$ with $\sum_{i=0}^d \alpha_i = 1$, so the convexity of $f$ gives
\[
f(x) \le \sum_{i=0}^d \alpha_i f(v_i) \le M := \max\{f(v_0), \ldots, f(v_d)\}.
\]
On the other hand, the convexity of $f$ tells us that, for all $y \in B(x_0, 2\delta)$ and a subgradient $g(y)$ of $f$ at $y$ (cf. Theorem 3 below),
\[
f(x) \ge f(y) + g(y)^T (x - y) \ge m := \min_{z \in B(x_0, 2\delta)} \left\{ f(z) + g(z)^T (x - z) \right\}.
\]
Note that the minimum exists since $B(x_0, 2\delta)$ is a compact set. The claim follows from Lemma 1.

This result implies that a proper convex function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is continuous (even Lipschitz continuous) at any $x$ well inside $\operatorname{ri} \operatorname{dom} f$. However, as $x$ approaches the relative boundary $\operatorname{rbd} \operatorname{dom} f := \operatorname{cl} \operatorname{dom} f \setminus \operatorname{ri} \operatorname{dom} f$, continuity may break down.²

¹ The dimension of a set $C$ is defined as the dimension of a subspace parallel to $\operatorname{aff} C$.
² The relative closure is just the closure, since an affine hull is always closed. If $f$ is finite everywhere, continuity at the boundary of $\operatorname{dom} f$ is easy to check from the definitions; it becomes nontrivial in $\overline{\mathbb{R}}$.
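To make the boundary caveat concrete, here is a minimal sketch (the function below is a hypothetical example, not from the notes) of a proper convex function that is Lipschitz on compact subsets of the interior of its domain yet discontinuous at the boundary:

```python
import numpy as np

def f(x):
    """Convex with dom f = [0, 1]: f(x) = x**2 on (0, 1], but f(0) = 1.
    Raising the value at the boundary point keeps the epigraph convex,
    yet f is no longer continuous at 0."""
    if x < 0.0 or x > 1.0:
        return np.inf
    return 1.0 if x == 0.0 else x * x

# Lipschitz well inside dom f ...
print(abs(f(0.5) - f(0.500001)))   # tiny
# ... but lim_{x -> 0+} f(x) = 0, while f(0) = 1: discontinuous at rbd dom f.
print(f(1e-9), f(0.0))
```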

3. Directional Derivatives and Subdifferential

3.1 Subdifferential is Nonempty

We define a vector $g$ to be a subgradient of a function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ at $x \in \operatorname{ri} \operatorname{dom} f$ if
\[
f(y) \ge f(x) + g^T (y - x), \quad \forall y \in \mathbb{R}^n.
\]
This is called the subgradient inequality. The set of all subgradients defines the subdifferential, that is,
\[
\partial f(x) := \{g \in \mathbb{R}^n : f(y) \ge f(x) + g^T (y - x) \ \forall y \in \mathbb{R}^n\}.
\]

Theorem 3 ([HUL96], Proposition IV.1.2.1) For a proper convex function $f$, $\partial f(x)$ is nonempty at any $x \in \operatorname{ri} \operatorname{dom} f$.

This theorem is intuitively plausible, but its proof requires the existence of separating hyperplanes for convex sets, which involves several other theorems. We skip the proof here; interested readers are referred to the reference above.

3.2 Existence of Directional Derivatives

For any function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$, the one-sided directional derivative of $f$ at $x$ (where $f(x)$ is finite) with respect to a direction $d \in \mathbb{R}^n$ is defined as the limit
\[
f'(x; d) := \lim_{\epsilon \downarrow 0} \frac{f(x + \epsilon d) - f(x)}{\epsilon},
\]
if the limit exists (we allow $+\infty$ or $-\infty$ as limits); here $\epsilon \downarrow 0$ means $\epsilon \to 0$ with $\epsilon > 0$.

Theorem 4 ([Roc97], Theorem 23.1) Let $f$ be convex and finite near $x$. For each direction $d$, the difference quotient in the definition of $f'(x; d)$ is a nondecreasing function of $\epsilon > 0$, so that $f'(x; d)$ exists and
\[
f'(x; d) = \inf_{\epsilon > 0} \frac{f(x + \epsilon d) - f(x)}{\epsilon}.
\]
Moreover, $f'(x; d)$ is a positively homogeneous convex function of $d$, with $f'(x; 0) = 0$ and $-f'(x; -d) \le f'(x; d)$ for all $d$.

Proof. We can express the difference quotient as $\epsilon^{-1} h(\epsilon d)$ for $\epsilon > 0$, where $h(d) = f(x + d) - f(x)$. The convex set $\operatorname{epi} h$ is obtained by translating $\operatorname{epi} f$ so that $(x, f(x))$ is moved to $(0, 0)$. On the other hand, $\epsilon^{-1} h(\epsilon d) = (h \epsilon^{-1})(d)$, where $h \epsilon^{-1}$ is defined as the convex function whose epigraph is $\epsilon^{-1} \operatorname{epi} h$. Since $\operatorname{epi} h$ contains the origin, $\epsilon^{-1} \operatorname{epi} h$ grows as $\epsilon^{-1}$ increases; that is, for each $d$, the difference quotient $(h \epsilon^{-1})(d)$ decreases as $\epsilon$ decreases.

[Figure: $\operatorname{epi} h$ and $\epsilon^{-1} \operatorname{epi} h$, with the values $h(d)$ and $(h \epsilon^{-1})(d)$ at a direction $d$.]

Furthermore, $(h \epsilon^{-1})(d)$ is bounded below: since $f$ is convex, $(h \epsilon^{-1})(d) \ge g^T d$ for some subgradient $g \in \partial f(x)$. Then by the monotone convergence theorem, the limit $f'(x; d)$ exists and equals $\inf_{\epsilon > 0} (h \epsilon^{-1})(d)$. This also shows that $f'(x; d)$ is a finite-valued convex function of $d$. It is positively homogeneous in $d$ since, for any $\alpha > 0$,
\[
f'(x; \alpha d) = \inf_{\epsilon > 0} \frac{f(x + \epsilon (\alpha d)) - f(x)}{\epsilon} = \alpha \inf_{\epsilon > 0} \frac{f(x + (\epsilon \alpha) d) - f(x)}{\epsilon \alpha} = \alpha f'(x; d).
\]
Finally, $f'(x; 0) = 0$ is obvious from the definition, and from convexity,
\[
\frac{1}{2} f'(x; d) + \frac{1}{2} f'(x; -d) \ge f'\!\left(x; \frac{d}{2} - \frac{d}{2}\right) = 0,
\]
which gives the final claim.
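The monotonicity in Theorem 4 is easy to observe numerically. The sketch below (using a hypothetical nonsmooth test function, not from the notes) prints the difference quotients of $f(x) = \max(x^2 - 1, 0)$ at the kink $x = 1$ as $\epsilon$ shrinks:

```python
# Difference quotients of the nonsmooth convex f(x) = max(x^2 - 1, 0)
# at the kink x = 1.  By Theorem 4 the quotient is nondecreasing in eps,
# so its infimum over eps > 0 is the directional derivative f'(x; d).
f = lambda x: max(x * x - 1.0, 0.0)
x = 1.0
for d in (+1.0, -1.0):
    qs = [(f(x + e * d) - f(x)) / e for e in (1.0, 0.1, 0.01, 0.001)]
    assert all(a >= b for a, b in zip(qs, qs[1:]))  # monotone in eps
    print(f"d = {d:+}: quotients {qs}")
# Limits: f'(1; +1) = 2 and f'(1; -1) = 0, so -f'(1; -1) <= f'(1; +1).
```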

3.3 An Alternative Definition of Subdifferential

The previous theorem allows us to characterize the subdifferential using directional derivatives.

Theorem 5 The subdifferential $\partial f(x)$ at $x \in \operatorname{ri} \operatorname{dom} f$ has the equivalent characterization
\[
\partial f(x) = \{g \in \mathbb{R}^n : g^T d \le f'(x; d) \ \forall d \in \mathbb{R}^n\}.
\]

Proof. For any direction $d$ and $\epsilon > 0$, writing $y = x + \epsilon d$, the subgradient inequality implies that
\[
g^T d \le \frac{f(x + \epsilon d) - f(x)}{\epsilon}.
\]
That is,
\[
g^T d \le \inf_{\epsilon > 0} \frac{f(x + \epsilon d) - f(x)}{\epsilon} = f'(x; d).
\]
The converse is immediate (take $d = y - x$ and $\epsilon = 1$).

This reveals that the subdifferential $\partial f(x)$ is a compact convex set: it is an intersection of halfspaces, hence closed and convex, and it is bounded since $f'(x; \cdot)$ is finite.

4. Optimality Conditions

4.1 Unconstrained Case

Let us consider the optimization problem
\[
\min_{x \in \mathbb{R}^n} f(x) \tag{1}
\]
where $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is proper and convex.

Theorem 6 The following statements are equivalent:
(i) $x^*$ is a solution of (1).
(ii) $0 \in \partial f(x^*)$.
(iii) $f'(x^*; d) \ge 0$ for all $d \in \mathbb{R}^n$.

Proof. [(ii) ⇒ (i)] Suppose that $g = 0 \in \partial f(x^*)$. Then the subgradient inequality
\[
f(x) \ge f(x^*) + g^T (x - x^*) = f(x^*), \quad \forall x \in \operatorname{dom} f,
\]
implies that $x^*$ is a solution.

[(i) ⇒ (iii)] Suppose that $x^*$ is a solution, i.e., $f(x^*) \le f(x)$ for all $x \in \operatorname{dom} f$. Then for any direction $d \in \mathbb{R}^n$, the difference quotient $(f(x^* + \epsilon d) - f(x^*))/\epsilon$ is nonnegative for every $\epsilon > 0$, and hence
\[
f'(x^*; d) = \inf_{\epsilon > 0} \frac{f(x^* + \epsilon d) - f(x^*)}{\epsilon} \ge 0.
\]

[(iii) ⇒ (ii)] For each $d$, the inequality $g^T d \le f'(x^*; d)$, viewed as a condition on $g$, defines a halfspace that contains the origin, since $f'(x^*; d) \ge 0$. The subdifferential
\[
\partial f(x^*) = \{g : g^T d \le f'(x^*; d) \ \forall d \in \mathbb{R}^n\}
\]
is the intersection of all such halfspaces, and therefore contains the origin.

The above is my own proof.
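As a concrete instance of Theorem 6 (with a hypothetical objective, not from the notes), consider $f(x) = |x - 1| + 0.5x$: at the kink $x^* = 1$ the subdifferential is $[-1, 1] + 0.5 = [-0.5, 1.5] \ni 0$, so $x^* = 1$ should be a global minimizer. A minimal numerical check:

```python
import numpy as np

f = lambda x: abs(x - 1.0) + 0.5 * x
x_star, eps = 1.0, 1e-8

# Condition (iii): one-sided directional derivatives at x* are nonnegative.
for d in (+1.0, -1.0):
    quotient = (f(x_star + eps * d) - f(x_star)) / eps
    assert quotient >= 0.0   # f'(1; +1) = 1.5, f'(1; -1) = 0.5

# Cross-check optimality by a brute-force grid search.
grid = np.linspace(-5.0, 5.0, 100001)
print(grid[np.argmin([f(x) for x in grid])])   # ~ 1.0
```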

4.2 Constrained Case

Let us first define some geometric objects useful for describing optimality conditions in this case.

Definition 7 (Tangent Cone) Let $X \subseteq \mathbb{R}^n$ be a nonempty set. A direction $d \in \mathbb{R}^n$ is a tangent direction to $X$ at $x \in X$ if there exist a sequence $\{x_k\} \subseteq X$ and a scalar sequence $\{t_k\}$ such that
\[
x_k \to x, \quad t_k \downarrow 0, \quad \frac{x_k - x}{t_k} \to d \quad \text{as } k \to \infty.
\]
The set of all such directions is called the tangent cone to $X$ at $x$, denoted by $T_X(x)$.

- $T_X(x)$ is indeed a cone.
- $T_X(x)$ always contains the origin.
- $T_X(x)$ is always closed.
- If $X$ is a convex set, $T_X(x)$ is also convex.

Definition 8 (Normal Cone) Let $X \subseteq \mathbb{R}^n$ be a nonempty set. A direction $s \in \mathbb{R}^n$ is called normal to $X$ at $x \in X$ if
\[
\langle s, x' - x \rangle \le 0, \quad \forall x' \in X.
\]
The set of all such directions is called the normal cone to $X$ at $x$, denoted by $N_X(x)$.

Definition 9 (Polar Cone) For a cone $K \subseteq \mathbb{R}^n$, the polar cone of $K$ is defined by
\[
K^\circ = \{s \in \mathbb{R}^n : \langle s, d \rangle \le 0 \ \forall d \in K\}.
\]

It is easy to check the following (see the sketch below):

- $N_X(x) = (T_X(x))^\circ$.
- $T_X(x) = (N_X(x))^\circ$ if $X$ is convex.³

³ For a convex cone $C$, $(C^\circ)^\circ = \operatorname{cl} C$ (the closure of $C$); thus if $C$ is convex and closed, then $(C^\circ)^\circ = C$ (the polar cone theorem).
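A small sampled check of these polarity relations, using the box $X = [0,1]^2$ and the face point $\bar{x} = (0, 0.5)$ as hypothetical example data: there $T_X(\bar{x}) = \{d : d_1 \ge 0\}$ and $N_X(\bar{x}) = \{s : s_1 \le 0, s_2 = 0\}$.

```python
import numpy as np

rng = np.random.default_rng(0)
xbar = np.array([0.0, 0.5])

# Samples from N_X(xbar) = {(s1, 0) : s1 <= 0} and T_X(xbar) = {d : d1 >= 0}.
normals = np.c_[-rng.uniform(0.0, 5.0, 1000), np.zeros(1000)]
tangents = np.c_[rng.uniform(0.0, 5.0, 1000), rng.uniform(-5.0, 5.0, 1000)]

# Definition 8: <s, x - xbar> <= 0 for all x in X (spot-checked by sampling X).
X_samples = rng.uniform(0.0, 1.0, size=(1000, 2))
assert (normals @ (X_samples - xbar).T <= 1e-12).all()

# Polarity N_X(xbar) = (T_X(xbar))^o: <s, d> <= 0 for all sampled pairs.
assert (normals @ tangents.T <= 1e-12).all()
```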

Definition 10 (Indicator Function) The indicator function $I_S : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ of a nonempty set $S \subseteq \mathbb{R}^n$ is defined by
\[
I_S(x) := \begin{cases} 0 & \text{if } x \in S, \\ +\infty & \text{otherwise.} \end{cases}
\]
Note that the indicator function of a convex set is convex.

Definition 11 (Support Function) Let $S$ be a nonempty set in $\mathbb{R}^n$. The function $\sigma_S : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ defined by
\[
\sigma_S(x) = \sup\{\langle s, x \rangle : s \in S\}
\]
is called the support function of $S$. Note that, by the continuity and linearity of the function $s \mapsto \langle s, x \rangle$ being maximized over $S$,
\[
\sigma_S = \sigma_{\operatorname{cl} S} = \sigma_{\operatorname{conv} S}.
\]

Proposition 12 ([HUL96], Example V.2.3.1) For a closed convex cone $K \subseteq \mathbb{R}^n$, the support function becomes
\[
\sigma_K(d) = \begin{cases} 0 & \text{if } \langle s, d \rangle \le 0 \text{ for all } s \in K, \\ +\infty & \text{otherwise.} \end{cases}
\]
That is, $\sigma_K$ is the indicator function of the polar cone $K^\circ$. Since $(K^\circ)^\circ = K$, the support function of $K^\circ$ is the indicator of $K$.

Theorem 13 ([HUL96], Theorem V.3.3.3 (i)) Let $\sigma_1$ and $\sigma_2$ be the support functions of the nonempty closed convex sets $S_1$ and $S_2$. For positive scalars $t_1$ and $t_2$, $t_1 \sigma_1 + t_2 \sigma_2$ is the support function of $\operatorname{cl}(t_1 S_1 + t_2 S_2)$.

Proof. Let $S = \operatorname{cl}(t_1 S_1 + t_2 S_2)$, which is a closed convex set. By definition (and since $\sigma_S = \sigma_{\operatorname{cl} S}$), its support function is
\[
\sigma_S(d) = \sup\{\langle t_1 s_1 + t_2 s_2, d \rangle : s_1 \in S_1, s_2 \in S_2\}.
\]
In this expression, $s_1$ and $s_2$ run independently over their index sets $S_1$ and $S_2$. Therefore,
\[
\sigma_S(d) = t_1 \sup_{s \in S_1} \langle s, d \rangle + t_2 \sup_{s \in S_2} \langle s, d \rangle.
\]

The following separation theorem is also useful.

Theorem 14 ([HUL96], Theorem III.4.1.1) Let $C \subseteq \mathbb{R}^n$ be nonempty, closed, and convex, and let $x \notin C$. Then there exists $s \in \mathbb{R}^n$, $s \neq 0$, such that
\[
\langle s, x \rangle > \sup_{y \in C} \langle s, y \rangle.
\]

Proof. Let $P_C(x)$ be the Euclidean projection of $x$ onto $C$, that is,
\[
P_C(x) = \operatorname*{arg\,min}_{y \in C} \frac{1}{2} \|y - x\|_2^2.
\]
From the optimality condition of smooth convex minimization (see e.g. lecture notes on numerical optimization), $P_C(x)$ is the projection if and only if
\[
\langle x - P_C(x), y - P_C(x) \rangle \le 0, \quad \forall y \in C.
\]
Let $s := x - P_C(x) \neq 0$. The above inequality gives
\[
0 \ge \langle s, y - x + s \rangle = \langle s, y \rangle - \langle s, x \rangle + \|s\|^2.
\]
That is,
\[
\langle s, x \rangle \ge \|s\|^2 + \langle s, y \rangle > \langle s, y \rangle, \quad \forall y \in C.
\]
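Theorem 14's proof is constructive, and for simple sets the projection is available in closed form. A minimal sketch (the unit ball and the point below are hypothetical example data): project $x$ onto the closed unit ball $C$ and use $s = x - P_C(x)$ as the separating vector.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([2.0, 1.0])                       # a point outside the unit ball C
proj = x / max(1.0, float(np.linalg.norm(x)))  # closed-form P_C(x) for the ball
s = x - proj                                   # separating vector, s != 0

# For the unit ball, sup_{y in C} <s, y> = ||s||; spot-check by sampling C.
ys = rng.normal(size=(1000, 2))
ys /= np.maximum(1.0, np.linalg.norm(ys, axis=1, keepdims=True))
assert s @ x > (ys @ s).max()
print(s @ x, np.linalg.norm(s))                # <s, x> vs. sup = ||s||
```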

Now we consider optimization over a nonempty convex set $X \subseteq \mathbb{R}^n$,
\[
\min_{x \in X} f(x) \tag{2}
\]
where $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is convex on $X$.

Theorem 15 ([HUL96], Theorem VII.1.1.1) The following statements are equivalent:
(i) $x^*$ is a solution of (2).
(ii) $f'(x^*; x - x^*) \ge 0$ for all $x \in X$.
(iii) $f'(x^*; d) \ge 0$ for all $d \in T_X(x^*)$.
(iv) $0 \in \partial f(x^*) + N_X(x^*)$.

Proof. [(i) ⇒ (ii) ⇒ (iii)] Choose an $x \in X$. By convexity, $x^* + t(x - x^*) \in X$ for all $t \in [0, 1]$. If (i) is true, then (ii) follows since
\[
\frac{f(x^* + t(x - x^*)) - f(x^*)}{t} \ge 0, \quad \forall t \in (0, 1],
\]
and $f'(x^*; x - x^*)$ is the limit of this quantity as $t \downarrow 0$. Letting $d = x - x^*$ and using positive homogeneity, we have $f'(x^*; d) \ge 0$ for all $d = \alpha(x - x^*)$ with $x \in X$ and $\alpha > 0$ (the set of all such directions is the cone generated by $X - \{x^*\}$). $T_X(x^*)$ is the closure of this cone ([HUL96], Proposition III.5.2.1), and (iii) follows from the continuity of the finite convex function $f'(x^*; \cdot)$.

[(iii) ⇒ (i)] For any $x \in X$, we have $x - x^* \in T_X(x^*)$, and therefore (iii) implies that
\[
0 \le f'(x^*; x - x^*) \le f(x) - f(x^*),
\]
where the second inequality is due to Theorem 4. This implies (i).

[(iii) ⇔ (iv)] Since $f'(x^*; \cdot)$ is finite everywhere (Theorem 4), we can rewrite (iii) as
\[
f'(x^*; d) + I_{T_X(x^*)}(d) \ge 0, \quad \forall d \in \mathbb{R}^n.
\]
The indicator $I_{T_X(x^*)}$ is $\sigma_{N_X(x^*)}$, the support function of the polar cone $N_X(x^*) = (T_X(x^*))^\circ$ (Proposition 12). Also, $f'(x^*; \cdot)$ is the support function of $\partial f(x^*)$, by the characterization in Theorem 5. Using Theorem 13, (iii) is therefore equivalent to the condition
\[
0 \le \sigma_{\partial f(x^*)}(d) + \sigma_{N_X(x^*)}(d) = \sigma_{\partial f(x^*) + N_X(x^*)}(d), \quad \forall d \in \mathbb{R}^n.
\]
The sum $S$ of the compact convex set $\partial f(x^*)$ and the closed convex set $N_X(x^*)$ is a closed convex set. Suppose that $0 \notin S$. By Theorem 14, there would then exist a nonzero vector $d_0 \in \mathbb{R}^n$ such that
\[
\langle 0, d_0 \rangle = 0 > \sup\{\langle s, d_0 \rangle : s \in S\} = \sigma_S(d_0),
\]
a contradiction. Therefore $0 \in S = \partial f(x^*) + N_X(x^*)$.
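For a concrete instance of Theorem 15 (with a hypothetical problem, not from the notes), take $f(x) = |x|$ and $X = [1, 2]$. The minimizer is $x^* = 1$, where $\nabla f(1) = 1$ and $N_X(1) = (-\infty, 0]$, so $0 = 1 + (-1) \in \partial f(x^*) + N_X(x^*)$. A minimal numerical check of condition (ii):

```python
import numpy as np

f = lambda x: abs(x)
x_star, eps = 1.0, 1e-8

# Condition (ii): f'(x*; x - x*) >= 0 for all x in X = [1, 2],
# checked on a grid via small-step difference quotients.
for x in np.linspace(1.0, 2.0, 101):
    d = x - x_star
    quotient = (f(x_star + eps * d) - f(x_star)) / eps
    assert quotient >= 0.0

# Condition (iv): 0 in df(x*) + N_X(x*) = {1} + (-inf, 0], e.g. 0 = 1 + (-1).
print("x* = 1 satisfies 0 in df(x*) + N_X(x*).")
```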

The sum S of the compact convex set f(x ) and the closed convex set N X (x ) is a closed convex set. Suppose that 0 / S. By Theorem 14, then there exists a nonzero vector d 0 R n such that 0, d 0 > sup{ s, d 0 : s S} = σ S (d 0 ) This is a contradiction. Therefore 0 S = f(x ) + N X (x ). References [HUL96] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Convex Analysis and Minimization Algorithms I. Springer-Verlag, 1996. [Roc97] R. T. Rockafellar. Convex Analysis. Princeton Landmarks in Mathematics and Physics. Princeton University Press, 1997. 8