Lecture 2: Convex functions


f : R^n → R is convex if dom f is a convex set and for all x, y ∈ dom f, θ ∈ [0, 1],

    f(θx + (1−θ)y) ≤ θf(x) + (1−θ)f(y)

f is concave if −f is convex

[figure: graphs of a convex, a concave, and a neither-convex-nor-concave function]

examples (on R):
- f(x) = x^2 is convex
- f(x) = log x is concave (dom f = R++)
- f(x) = 1/x is convex (dom f = R++)
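The defining inequality is easy to probe numerically. The sketch below (assuming NumPy is available; `convexity_gap` is our own helper, not from the lecture) samples random points and weights for the three examples:

```python
import numpy as np

def convexity_gap(f, x, y, theta):
    """f(theta*x + (1-theta)*y) - (theta*f(x) + (1-theta)*f(y)).
    Nonpositive at every triple iff the convexity inequality holds."""
    return f(theta * x + (1 - theta) * y) - (theta * f(x) + (1 - theta) * f(y))

rng = np.random.default_rng(0)

# f(x) = x^2 is convex on R: the gap is never positive
assert all(convexity_gap(lambda x: x**2, rng.normal(), rng.normal(), rng.uniform()) <= 1e-12
           for _ in range(1000))

# f(x) = log x is concave on R++: apply the definition to -log x
assert all(convexity_gap(lambda x: -np.log(x), rng.uniform(0.1, 10), rng.uniform(0.1, 10), rng.uniform()) <= 1e-12
           for _ in range(1000))

# f(x) = 1/x is convex on R++
assert all(convexity_gap(lambda x: 1 / x, rng.uniform(0.1, 10), rng.uniform(0.1, 10), rng.uniform()) <= 1e-12
           for _ in range(1000))
```

Random sampling of course cannot prove convexity, but a single violating triple disproves it.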

Extended-valued extensions

for f convex, it is convenient to define the extension

    f̃(x) = f(x) if x ∈ dom f,  +∞ if x ∉ dom f

the inequality

    f̃(θx + (1−θ)y) ≤ θf̃(x) + (1−θ)f̃(y)

holds for all x, y ∈ R^n, 0 ≤ θ ≤ 1 (as an inequality in R ∪ {+∞})

we will use the same symbol for f and its extension, i.e., we implicitly assume convex functions are extended

Epigraph & sublevel sets

the epigraph of a function f is

    epi f = {(x, t) | x ∈ dom f, f(x) ≤ t}

[figure: epi f is the region on and above the graph of f]

f convex function ⟺ epi f convex set

the (α-)sublevel set of f is

    C(α) = {x ∈ dom f | f(x) ≤ α}

f convex ⟹ all sublevel sets are convex (the converse is false)

Differentiable convex functions

gradient of f : R^n → R:

    ∇f(x) = [∂f/∂x1  ∂f/∂x2  …  ∂f/∂xn]^T   (evaluated at x)

first-order Taylor approximation at x0:

    f(x) ≈ f(x0) + ∇f(x0)^T (x − x0)

first-order condition: for f differentiable,

    f is convex ⟺ for all x, x0 ∈ dom f,  f(x) ≥ f(x0) + ∇f(x0)^T (x − x0)

i.e., the first-order approximation is a global underestimator

[figure: the graph of f lies above its tangent line f(x0) + ∇f(x0)^T (x − x0)]
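The global-underestimator property can be illustrated numerically; a minimal sketch (assuming NumPy; `first_order_model` is our own helper) for f(x) = ‖x‖², whose gradient is 2x:

```python
import numpy as np

def first_order_model(f, grad, x0):
    """Return the affine function x -> f(x0) + grad(x0)^T (x - x0)."""
    fx0, g0 = f(x0), grad(x0)
    return lambda x: fx0 + g0 @ (x - x0)

f = lambda x: x @ x      # f(x) = ||x||^2, convex
grad = lambda x: 2 * x   # its gradient

rng = np.random.default_rng(0)
x0 = rng.normal(size=3)
model = first_order_model(f, grad, x0)

# first-order condition: the tangent model never exceeds f, anywhere
assert all(f(x) >= model(x) - 1e-12 for x in rng.normal(size=(1000, 3)))
```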

epigraph interpretation: for all (x, t) ∈ epi f,

    (∇f(x0), −1)^T (x − x0, t − f(x0)) ≤ 0,

i.e., (∇f(x0), −1) defines a supporting hyperplane to epi f at (x0, f(x0))

[figure: the hyperplane with normal (∇f(x0), −1) supporting epi f at (x0, f(x0))]

Hessian of a twice differentiable function:

    ∇²f(x) = [∂²f/∂x_i∂x_j]   (the n × n matrix of second partial derivatives, evaluated at x)

second-order Taylor series expansion around x0:

    f(x) ≈ f(x0) + ∇f(x0)^T (x − x0) + (1/2)(x − x0)^T ∇²f(x0)(x − x0)

second-order condition: for f twice differentiable,

    f is convex ⟺ for all x ∈ dom f,  ∇²f(x) ⪰ 0
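The second-order condition can be checked numerically by testing the Hessian's eigenvalues. A sketch (assuming NumPy; log-sum-exp is our choice of example, not from this slide) using f(x) = log ∑ exp x_i, whose Hessian is diag(z) − zz^T with z the softmax of x:

```python
import numpy as np

def logsumexp_hessian(x):
    """Hessian of f(x) = log sum_i exp(x_i): diag(z) - z z^T with z = softmax(x)."""
    z = np.exp(x - x.max())   # shift for numerical stability
    z /= z.sum()
    return np.diag(z) - np.outer(z, z)

# second-order condition: the Hessian is PSD at every point, so log-sum-exp is convex
rng = np.random.default_rng(0)
for _ in range(100):
    H = logsumexp_hessian(rng.normal(size=4))
    assert np.linalg.eigvalsh(H).min() >= -1e-12
```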

Simple examples

- linear and affine functions are convex and concave
- quadratic function f(x) = x^T P x + 2q^T x + r (P = P^T): convex ⟺ P ⪰ 0; concave ⟺ P ⪯ 0
- any norm is convex

examples on R:
- x^α is convex on R++ for α ≥ 1 or α ≤ 0; concave for 0 ≤ α ≤ 1
- log x is concave on R++; x log x is convex on R+
- e^{αx} is convex
- |x|, max(0, x), max(0, −x) are convex
- log ∫_{−∞}^{x} e^{−t²} dt is concave

Elementary properties

- a function is convex iff it is convex on all lines:
    f convex ⟺ f(x0 + th) convex in t for all x0, h
- a positive multiple of a convex function is convex:
    f convex, α ≥ 0 ⟹ αf convex
- a sum of convex functions is convex:
    f1, f2 convex ⟹ f1 + f2 convex
- extends to infinite sums and integrals:
    g(x, y) convex in x ⟹ ∫ g(x, y) dy convex

- pointwise maximum:
    f1, f2 convex ⟹ max{f1(x), f2(x)} convex
  (corresponds to intersection of epigraphs)

[figure: epi max{f1, f2} is the intersection of the epigraphs of f1 and f2]

- pointwise supremum:
    f_α convex ⟹ sup_{α ∈ A} f_α convex
- affine transformation of domain:
    f convex ⟹ f(Ax + b) convex

More examples

- piecewise-linear functions: f(x) = max_i {a_i^T x + b_i} is convex in x (epi f is a polyhedron)
- maximum distance to any set, sup_{s ∈ S} ‖x − s‖, is convex in x
- f(x) = x[1] + x[2] + x[3] is convex on R^n (x[i] is the ith largest x_j)
- f(x) = (∏_i x_i)^{1/n} is concave on R^n_+
- f(x) = ∑_{i=1}^m log(b_i − a_i^T x)^{−1} is convex (dom f = {x | a_i^T x < b_i, i = 1, …, m})
- the least-squares cost as a function of the weights,
    f(w) = inf_x ∑_i w_i (a_i^T x − b_i)²,
  is concave in w

Convex functions of matrices

- Tr A^T X = ∑_{i,j} A_ij X_ij is linear in X on R^{n×n}
- log det X^{−1} is convex on {X ∈ S^n | X ≻ 0}

  proof: let λ_i be the eigenvalues of X0^{−1/2} H X0^{−1/2}; then

      f(t) = log det(X0 + tH)^{−1}
           = log det X0^{−1} + log det(I + t X0^{−1/2} H X0^{−1/2})^{−1}
           = log det X0^{−1} − ∑_i log(1 + tλ_i)

  is a convex function of t

- (det X)^{1/n} is concave on {X ∈ S^n | X ≻ 0}
- λmax(X) is convex on S^n; proof: λmax(X) = sup_{‖y‖2 = 1} y^T X y
- ‖X‖2 = σ1(X) = (λmax(X^T X))^{1/2} is convex on R^{m×n}; proof: ‖X‖2 = sup_{‖u‖2 = 1} ‖Xu‖2
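The convexity of log det X^{−1} along a line X0 + tH can be observed numerically; a sketch (assuming NumPy; the specific X0, H, and the interval are our own choices) checking midpoint convexity of g(t) = log det(X0 + tH)^{−1}:

```python
import numpy as np

def neg_logdet_on_line(X0, H, t):
    """g(t) = log det (X0 + t H)^{-1}, defined where X0 + t H is positive definite."""
    sign, logdet = np.linalg.slogdet(X0 + t * H)
    assert sign > 0, "X0 + t H must be positive definite"
    return -logdet

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
X0 = A @ A.T + 4 * np.eye(4)   # positive definite, eigenvalues >= 4
B = rng.normal(size=(4, 4))
H = (B + B.T) / 2              # symmetric direction

# midpoint convexity of g on an interval where X0 + t H stays positive definite
for _ in range(100):
    s, t = rng.uniform(-0.5, 0.5, size=2)
    lhs = neg_logdet_on_line(X0, H, (s + t) / 2)
    rhs = (neg_logdet_on_line(X0, H, s) + neg_logdet_on_line(X0, H, t)) / 2
    assert lhs <= rhs + 1e-10
```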

Minimizing over some variables

if h(x, y) is jointly convex in (x, y), then

    f(x) = inf_y h(x, y)

is convex in x

corresponds to projection of the epigraph, (x, y, t) ↦ (x, t)

[figure: f(x) is the lower boundary of the projection of epi h onto the (x, t)-plane]

examples

- if S ⊆ R^n is convex then the (minimum) distance to S,

      dist(x, S) = inf_{s ∈ S} ‖x − s‖,

  is convex in x

- if g is convex, then

      f(y) = inf {g(x) | Ax = y}

  is convex in y

  proof (assume A ∈ R^{m×n} has rank m): find B such that R(B) = N(A); then Ax = y iff

      x = A^T (AA^T)^{−1} y + Bz

  for some z, and hence

      f(y) = inf_z g(A^T (AA^T)^{−1} y + Bz)

Composition

one-dimensional case: f(x) = h(g(x)) (g : R^n → R, h : R → R) is convex if
- g convex; h convex, nondecreasing
- g concave; h convex, nonincreasing

proof (differentiable functions, x ∈ R):

    f'' = h''·(g')² + h'·g''

examples:
- f(x) = exp g(x) is convex if g is convex
- f(x) = 1/g(x) is convex if g is concave and positive
- f(x) = g(x)^p, p ≥ 1, is convex if g is convex and positive
- f(x) = −∑_i log(−f_i(x)) is convex on {x | f_i(x) < 0} if the f_i are convex
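The composition rules, and the need for the monotonicity hypothesis, can be illustrated numerically; a sketch (assuming NumPy; `midpoint_convex_gap` and the counterexample are our own):

```python
import numpy as np

def midpoint_convex_gap(f, x, y):
    """f((x+y)/2) - (f(x)+f(y))/2; nonpositive everywhere for a convex f."""
    return f((x + y) / 2) - (f(x) + f(y)) / 2

rng = np.random.default_rng(0)

# g(x) = x^2 convex, h(u) = e^u convex nondecreasing  =>  exp(x^2) convex
assert all(midpoint_convex_gap(lambda x: np.exp(x**2), rng.normal(), rng.normal()) <= 1e-12
           for _ in range(1000))

# g(x) = sqrt(x) concave positive, h(u) = 1/u convex nonincreasing  =>  1/sqrt(x) convex on R++
assert all(midpoint_convex_gap(lambda x: 1 / np.sqrt(x), rng.uniform(0.1, 10), rng.uniform(0.1, 10)) <= 1e-12
           for _ in range(1000))

# monotonicity matters: h(u) = u^2 is convex but not monotone, and
# f(x) = (x^2 - 1)^2 (h of the convex g(x) = x^2 - 1) is not convex
assert midpoint_convex_gap(lambda x: (x**2 - 1)**2, -1.0, 1.0) > 0
```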

Composition

k-dimensional case: f(x) = h(g1(x), …, gk(x)) with h : R^k → R, g_i : R^n → R is convex if
- h convex, nondecreasing in each argument; g_i convex
- h convex, nonincreasing in each argument; g_i concave
- etc.

proof (differentiable functions, n = 1): with g = (g1, …, gk),

    f'' = ∇h(g)^T g'' + (g')^T ∇²h(g) g'

examples:
- f(x) = max_i g_i(x) is convex if each g_i is
- f(x) = log ∑_i exp g_i(x) is convex if each g_i is

Jensen's inequality

f : R^n → R convex

- two points: θ1 + θ2 = 1, θ_i ≥ 0 ⟹ f(θ1 x1 + θ2 x2) ≤ θ1 f(x1) + θ2 f(x2)
- more than two points: ∑_i θ_i = 1, θ_i ≥ 0 ⟹ f(∑_i θ_i x_i) ≤ ∑_i θ_i f(x_i)
- continuous version: p(x) ≥ 0, ∫ p(x) dx = 1 ⟹ f(∫ x p(x) dx) ≤ ∫ f(x) p(x) dx
- most general form: for any probability distribution on x, f(E x) ≤ E f(x)

these are all called Jensen's inequality
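Both the finite and the probabilistic forms are easy to observe numerically; a Monte Carlo sketch (assuming NumPy; the exponential distribution is our arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# most general form: f(E x) <= E f(x) for convex f, checked by sampling
x = rng.exponential(size=100_000)

f = np.square                  # convex
assert f(x.mean()) <= f(x).mean()

g = np.log                     # concave: the inequality flips
assert g(x.mean()) >= np.mean(g(x))

# finite form: f(sum_i theta_i x_i) <= sum_i theta_i f(x_i)
theta = rng.dirichlet(np.ones(5))   # nonnegative weights summing to 1
pts = rng.normal(size=5)
assert f(theta @ pts) <= theta @ f(pts) + 1e-12
```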

interpretation of Jensen's inequality:
- (zero-mean) randomization or dithering increases the average value of a convex function
- many (some people claim most) inequalities can be derived from Jensen's inequality

example: arithmetic-geometric mean inequality: a, b ≥ 0 ⟹ √(ab) ≤ (a + b)/2

proof: f(x) = log x is concave on {x | x > 0}, so for a, b > 0,

    (1/2)(log a + log b) ≤ log((a + b)/2)

Conjugate functions

the conjugate function of f : R^n → R is

    f*(y) = sup_{x ∈ dom f} (y^T x − f(x))

[figure: f*(y) is the maximum gap between the linear function y^T x and f(x)]

- f* is convex (even if f isn't)
- f* will be useful later

Examples

- f(x) = −log x (dom f = {x | x > 0}):

      f*(y) = sup_{x > 0} (xy + log x) = −1 − log(−y) if y < 0,  +∞ otherwise

- f(x) = x^T P x (P ≻ 0):

      f*(y) = sup_x (y^T x − x^T P x) = (1/4) y^T P^{−1} y
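Both closed forms can be cross-checked by computing the supremum on a fine grid; a sketch (assuming NumPy; `conjugate_1d` and the grids are our own, and the grid sup only approximates the true supremum):

```python
import numpy as np

def conjugate_1d(f, y, xs):
    """Approximate f*(y) = sup_x (y*x - f(x)) by maximizing over the grid xs."""
    return np.max(y * xs - f(xs))

# f(x) = p x^2 (p > 0) has conjugate f*(y) = y^2 / (4p)
p = 2.0
f = lambda x: p * x**2
xs = np.linspace(-50, 50, 200_001)
for y in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    assert abs(conjugate_1d(f, y, xs) - y**2 / (4 * p)) < 1e-6

# f(x) = -log x has conjugate f*(y) = -1 - log(-y) for y < 0
g = lambda x: -np.log(x)
xp = np.linspace(1e-4, 100, 1_000_001)
for y in [-0.5, -1.0, -4.0]:
    assert abs(conjugate_1d(g, y, xp) - (-1 - np.log(-y))) < 1e-4
```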

Quasiconvex functions

f : R^n → R is quasiconvex if every sublevel set

    S_α = {x ∈ dom f | f(x) ≤ α}

is convex

[figure: a quasiconvex function f with convex sublevel set S_α]

- quasiconvex functions can have locally flat regions
- f is quasiconcave if −f is quasiconvex, i.e., the superlevel sets {x | f(x) ≥ α} are convex
- a function which is both quasiconvex and quasiconcave is called quasilinear
- f convex (concave) ⟹ f quasiconvex (quasiconcave)

Examples

- f(x) = √|x| is quasiconvex on R
- f(x) = log x is quasilinear on R++
- the linear-fractional function

      f(x) = (a^T x + b)/(c^T x + d)

  is quasilinear on the halfspace c^T x + d > 0
- f(x) = ‖x − a‖2 / ‖x − b‖2 is quasiconvex on the halfspace {x | ‖x − a‖2 ≤ ‖x − b‖2}
- f(a) = degree(a0 + a1 t + ⋯ + ak t^k) is quasiconvex on R^{k+1}

Properties

- f is quasiconvex if and only if it is quasiconvex on lines, i.e., f(x0 + th) is quasiconvex in t for all x0, h
- modified Jensen's inequality: f is quasiconvex iff for all x, y ∈ dom f, θ ∈ [0, 1],

      f(θx + (1−θ)y) ≤ max{f(x), f(y)}

[figure: on the segment between x and y, f never exceeds max{f(x), f(y)}]
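The modified Jensen inequality gives a direct numerical test; a sketch (assuming NumPy; `quasi_jensen_gap` and the sum counterexample are our own):

```python
import numpy as np

def quasi_jensen_gap(f, x, y, theta):
    """f(theta*x + (1-theta)*y) - max(f(x), f(y)).
    Nonpositive at every triple iff f is quasiconvex."""
    return f(theta * x + (1 - theta) * y) - max(f(x), f(y))

rng = np.random.default_rng(0)

# f(x) = sqrt(|x|) is quasiconvex on R (though not convex)
f = lambda x: np.sqrt(abs(x))
assert all(quasi_jensen_gap(f, rng.normal(), rng.normal(), rng.uniform()) <= 1e-12
           for _ in range(1000))

# a sum of quasiconvex functions need not be quasiconvex:
# h(x) = sqrt(|x-1|) + sqrt(|x+1|) violates the inequality at x = -1, y = 1, theta = 1/2
h = lambda x: np.sqrt(abs(x - 1)) + np.sqrt(abs(x + 1))
assert quasi_jensen_gap(h, -1.0, 1.0, 0.5) > 0
```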

- for f differentiable,

      f quasiconvex ⟺ for all x, y ∈ dom f:  f(y) ≤ f(x) ⟹ (y − x)^T ∇f(x) ≤ 0

[figure: nested sublevel sets S_α1 ⊆ S_α2 ⊆ S_α3 for α1 < α2 < α3, with ∇f(x) normal to the boundary at x]

- positive multiples: f quasiconvex, α ≥ 0 ⟹ αf quasiconvex

- pointwise maximum:
    f1, f2 quasiconvex ⟹ max{f1, f2} quasiconvex
  (extends to the supremum over an arbitrary set)
- affine transformation of domain:
    f quasiconvex ⟹ f(Ax + b) quasiconvex
- linear-fractional transformation of domain:
    f quasiconvex ⟹ f((Ax + b)/(c^T x + d)) quasiconvex on c^T x + d > 0
- composition with a monotone increasing function:
    f quasiconvex, g monotone increasing ⟹ g(f(x)) quasiconvex
- sums of quasiconvex functions are not quasiconvex in general
- f quasiconvex in (x, y) ⟹ g(x) = inf_y f(x, y) quasiconvex in x

Nested sets characterization

- f quasiconvex ⟹ the sublevel sets S_α are convex and nested, i.e., α1 ≤ α2 ⟹ S_α1 ⊆ S_α2
- converse: if T_α is a nested family of convex sets, then f(x) = inf{α | x ∈ T_α} is quasiconvex
- engineering interpretation: the T_α are specs, tighter for smaller α

Examples

FIR filter: H(ω) = a0 + ∑_{k=1}^N a_k cos kω

[figure: magnitude response |H(ω)/H(0)| in dB over ω ∈ [0, π], with the 0 dB, −3 dB, and −50 dB levels and the 3dB-bandwidth f(a) marked]

3dB-bandwidth:

    f(a) = inf {ω > 0 | 20 log10(|H(ω)|/|H(0)|) ≤ −3.0}

is a quasiconcave function on {a ∈ R^{N+1} | H(0) > 0}

why? for H(0) > 0,

    f(a) ≥ ω0 ⟺ H(ω) > H(0)/√2 for 0 ≤ ω < ω0

… an (infinite) intersection of halfspaces

electron-beam lithography

- E ⊆ [0, 1] × [0, 1]: desired exposure region
- E^c = [0, 1] × [0, 1] \ E: desired non-exposure region

[figure: the unit square partitioned into E and E^c]

- I(p): e-beam intensity at position p ∈ [0, 1] × [0, 1]:

      I(p) = ∑_i x_i g(p − p_i),  i = 1, …, N

- x_i: intensity of the electron beam directed at pixel i
- g(p): given (point-spread) function

pattern transition width

define φ(x) as the minimum α such that

    I(p) ≥ 0.9 for dist(p, E^c) ≥ α
    I(p) ≤ 0.1 for dist(p, E) ≥ α

[figure: intensity profile crossing from E^c to E and back, with transition regions of width 2φ(x) between the 0.1 and 0.9 levels]

φ(x) is quasiconvex

Log-concave functions

f : R^n → R+ is log-concave (log-convex) if log f is concave (convex)

- log-convex ⟹ convex; concave ⟹ log-concave

examples:
- normal density, f(x) = e^{−(1/2)(x − x0)^T Σ^{−1} (x − x0)}
- erfc, f(x) = (2/√π) ∫_x^∞ e^{−t²} dt
- indicator function of a convex set C:

      I_C(x) = 1 if x ∈ C,  0 if x ∉ C

Properties

- a sum of log-concave functions is not always log-concave (but a sum of log-convex functions is log-convex)
- products: f, g log-concave ⟹ fg log-concave (immediate)
- integrals: f(x, y) log-concave in (x, y) ⟹ ∫ f(x, y) dy log-concave (not easy to show!)
- convolutions: f, g log-concave ⟹ ∫ f(x − y) g(y) dy log-concave (immediate from the properties above)

Log-concave probability densities

many common probability density functions are log-concave

- normal (Σ ≻ 0):

      f(x) = (1/√((2π)^n det Σ)) e^{−(1/2)(x − x̄)^T Σ^{−1} (x − x̄)}

- exponential (λ_i > 0):

      f(x) = (∏_{i=1}^n λ_i) e^{−(λ1 x1 + ⋯ + λn xn)},  x ∈ R^n_+

- uniform distribution on a convex (bounded) set C:

      f(x) = 1/α if x ∈ C,  0 if x ∉ C

  where α is the Lebesgue measure of C (i.e., length, area, volume, …)
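For the normal density, log f is a concave quadratic, so midpoint concavity of the log-density holds exactly; a numerical sketch (assuming NumPy; the covariance and test points are our own):

```python
import numpy as np

def normal_logpdf(x, mean, cov):
    """log of the multivariate normal density; a concave quadratic in x,
    so the density itself is log-concave."""
    d = x - mean
    n = len(mean)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
cov = A @ A.T + np.eye(3)   # positive definite covariance
mean = rng.normal(size=3)

# midpoint concavity of log f  =>  f is log-concave
for _ in range(200):
    x, y = rng.normal(size=(2, 3))
    mid = normal_logpdf((x + y) / 2, mean, cov)
    assert mid >= (normal_logpdf(x, mean, cov) + normal_logpdf(y, mean, cov)) / 2 - 1e-10
```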

Example: manufacturing yield

x_manu = x + v

- x ∈ R^n: nominal value of the design parameters
- v ∈ R^n: manufacturing errors; a zero-mean random variable
- S ⊆ R^n: specs, i.e., acceptable values of x_manu

the yield Y(x) = Prob(x + v ∈ S) is log-concave if
- S is a convex set
- the probability density of v is log-concave

[figure: yield contours from 10% to 80% over the design-parameter plane]

example

S = {x ∈ R^2 | x1 ≥ 1, x2 ≥ 1}
v1, v2: independent, normal with σ = 1

    yield(x) = Prob(x + v ∈ S) = (1/(2π)) (∫_{1−x1}^∞ e^{−t²/2} dt) (∫_{1−x2}^∞ e^{−t²/2} dt)

[figure: yield contours at 10%, 30%, 50%, 70%, 90%, 95%, 99% and the region S, over (x1, x2) ∈ [0, 3]²]
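This yield can be evaluated in closed form with the complementary error function, and its log-concavity along a fixed-cost line observed numerically; a sketch (assuming specs of the form x1 ≥ 1, x2 ≥ 1 with independent unit-variance normal errors, and using only the standard library):

```python
import math

def yield_prob(x1, x2):
    """Y(x) = Prob(x1 + v1 >= 1) * Prob(x2 + v2 >= 1) for independent
    standard normal errors v1, v2."""
    tail = lambda a: 0.5 * math.erfc(a / math.sqrt(2))   # Prob(v >= a)
    return tail(1 - x1) * tail(1 - x2)

# check midpoint concavity of log Y along the cost line x1 + 2*x2 = c
c = 4.0
def logY_on_line(x1):
    return math.log(yield_prob(x1, (c - x1) / 2))

pts = [0.1 + 0.05 * k for k in range(70)]   # x1 ranging over (0, c)
for a, b in zip(pts, pts[2:]):
    assert logY_on_line((a + b) / 2) >= (logY_on_line(a) + logY_on_line(b)) / 2 - 1e-9
```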

example (continued): max yield vs. cost

manufacturing cost c = x1 + 2x2; the maximum yield for a given cost is

    Y_opt(c) = sup_{x1 + 2x2 = c, x1, x2 ≥ 0} Y(x)

Y_opt is log-concave:

    log Y_opt(c) = sup_{x1 + 2x2 = c, x1, x2 ≥ 0} log Y(x1, x2)

[figure: Y_opt(c) vs. cost c, rising from 0.01% to 100% over c ∈ [0, 6] on a logarithmic yield axis]

K-convexity

a convex cone K ⊆ R^m induces the generalized inequality ⪯_K

f : R^n → R^m is K-convex if

    0 ≤ θ ≤ 1 ⟹ f(θx + (1−θ)y) ⪯_K θf(x) + (1−θ)f(y)

example: K is the PSD cone (called matrix convexity); f(X) = X² is K-convex on S^m

let's show that for θ ∈ [0, 1],

    (θX + (1−θ)Y)² ⪯ θX² + (1−θ)Y²   (1)

for any u ∈ R^m, u^T X² u = ‖Xu‖²₂ is a (quadratic) convex function of X, so

    u^T (θX + (1−θ)Y)² u ≤ θ u^T X² u + (1−θ) u^T Y² u

which implies (1)