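Lecture 2: Convex functions

$f : \mathbf{R}^n \to \mathbf{R}$ is convex if $\operatorname{dom} f$ is convex and, for all $x, y \in \operatorname{dom} f$ and $\theta \in [0, 1]$,

$$f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y)$$

$f$ is concave if $-f$ is convex

[Figure: three graphs over $x$, labeled convex, concave, and neither]

examples (on $\mathbf{R}$)
- $f(x) = x^2$ is convex
- $f(x) = \log x$ is concave ($\operatorname{dom} f = \mathbf{R}_{++}$)
- $f(x) = 1/x$ is convex ($\operatorname{dom} f = \mathbf{R}_{++}$)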
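The defining inequality can be spot-checked numerically. The sketch below is an illustration added here, not part of the slides: the helper name `violates_convexity`, the sampling intervals, and the tolerance are all assumptions. Random sampling can only find counterexamples; it cannot prove convexity.

```python
import math
import random

def violates_convexity(f, lo, hi, trials=2000, seed=0, tol=1e-9):
    """Return True if some sampled (x, y, theta) breaks
    f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        y = rng.uniform(lo, hi)
        theta = rng.random()
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + tol:
            return True
    return False

# The three scalar examples; log x is tested through -log x, which should be convex.
square_ok = not violates_convexity(lambda x: x * x, -5.0, 5.0)
inverse_ok = not violates_convexity(lambda x: 1.0 / x, 0.1, 5.0)
neg_log_ok = not violates_convexity(lambda x: -math.log(x), 0.1, 5.0)

# log x itself is concave, so the convexity inequality should fail for it.
log_violates = violates_convexity(math.log, 0.1, 5.0)
```

The intervals stay away from 0 for $1/x$ and $\log x$ because both are defined only on $\mathbf{R}_{++}$.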

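Extended-valued extensions

for $f$ convex, it's convenient to define the extension

$$\tilde f(x) = \begin{cases} f(x) & x \in \operatorname{dom} f \\ +\infty & x \notin \operatorname{dom} f \end{cases}$$

the inequality

$$\tilde f(\theta x + (1-\theta) y) \le \theta \tilde f(x) + (1-\theta) \tilde f(y)$$

holds for all $x, y \in \mathbf{R}^n$, $0 \le \theta \le 1$ (as an inequality in $\mathbf{R} \cup \{+\infty\}$)

we'll use the same symbol for $f$ and its extension, i.e., we'll implicitly assume convex functions are extended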

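Epigraph & sublevel sets

the epigraph of a function $f$ is

$$\operatorname{epi} f = \{(x, t) \mid x \in \operatorname{dom} f,\ f(x) \le t\}$$

[Figure: graph of $f(x)$ with the region $\operatorname{epi} f$ above it]

$f$ convex function $\iff$ $\operatorname{epi} f$ convex set

the ($\alpha$-)sublevel set of $f$ is

$$C(\alpha) = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$$

$f$ convex $\implies$ sublevel sets are convex (converse false)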

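Differentiable convex functions

gradient of $f : \mathbf{R}^n \to \mathbf{R}$:

$$\nabla f(x) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \cdots & \dfrac{\partial f}{\partial x_n} \end{bmatrix}^T \quad \text{(evaluated at } x\text{)}$$

first-order Taylor approximation at $x_0$:

$$f(x) \approx f(x_0) + \nabla f(x_0)^T (x - x_0)$$

first-order condition: for $f$ differentiable, $f$ is convex $\iff$ for all $x, x_0 \in \operatorname{dom} f$,

$$f(x) \ge f(x_0) + \nabla f(x_0)^T (x - x_0)$$

i.e., the 1st order approximation is a global underestimator

[Figure: graph of $f(x)$ lying above the tangent line $f(x_0) + \nabla f(x_0)^T (x - x_0)$ at $x_0$]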

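epigraph interpretation

for all $(x, t) \in \operatorname{epi} f$,

$$\begin{bmatrix} \nabla f(x_0) \\ -1 \end{bmatrix}^T \begin{bmatrix} x - x_0 \\ t - f(x_0) \end{bmatrix} \le 0,$$

i.e., $(\nabla f(x_0), -1)$ defines a supporting hyperplane to $\operatorname{epi} f$ at $(x_0, f(x_0))$

[Figure: graph of $f(x)$ with the normal vector $(\nabla f(x_0), -1)$ at the point of support]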
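As a numerical illustration of the first-order condition (added here, not from the slides), the sketch below takes the two-variable log-sum-exp function as an assumed example of a differentiable convex function and checks that its tangent plane never overestimates it at randomly sampled point pairs. The names `tangent_gap` and `min_gap` are hypothetical.

```python
import math
import random

def f(x):
    """Log-sum-exp of a 2-vector: log(exp(x[0]) + exp(x[1]))."""
    return math.log(math.exp(x[0]) + math.exp(x[1]))

def grad_f(x):
    """Analytic gradient of f (the softmax weights)."""
    e0, e1 = math.exp(x[0]), math.exp(x[1])
    s = e0 + e1
    return (e0 / s, e1 / s)

def tangent_gap(x, x0):
    """f(x) minus its first-order Taylor approximation built at x0."""
    g = grad_f(x0)
    linear = f(x0) + g[0] * (x[0] - x0[0]) + g[1] * (x[1] - x0[1])
    return f(x) - linear

rng = random.Random(1)
min_gap = math.inf
for _ in range(1000):
    x = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    x0 = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    min_gap = min(min_gap, tangent_gap(x, x0))
```

If the function is convex, `min_gap` should be nonnegative up to rounding error, and the gap is zero when $x = x_0$.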

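Hessian of a twice differentiable function:

$$\nabla^2 f(x) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix} \quad \text{(evaluated at } x\text{)}$$

2nd order Taylor series expansion around $x_0$:

$$f(x) \approx f(x_0) + \nabla f(x_0)^T (x - x_0) + \tfrac{1}{2} (x - x_0)^T \nabla^2 f(x_0) (x - x_0)$$

second-order condition: for $f$ twice differentiable, $f$ is convex $\iff$ for all $x \in \operatorname{dom} f$, $\nabla^2 f(x) \succeq 0$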
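As a quick worked check of the second-order condition (added here for illustration, not part of the slides), the scalar examples from the first slide can be verified by differentiating twice:

```latex
\begin{align*}
f(x) = x^2 &: \quad f''(x) = 2 \ge 0 && \text{convex on } \mathbf{R} \\
f(x) = 1/x &: \quad f''(x) = 2/x^3 > 0 \ \text{for } x > 0 && \text{convex on } \mathbf{R}_{++} \\
f(x) = \log x &: \quad f''(x) = -1/x^2 < 0 \ \text{for } x > 0 && \text{concave on } \mathbf{R}_{++}
\end{align*}
```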

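Simple examples

- linear and affine functions are convex and concave
- quadratic function $f(x) = x^T P x + 2 q^T x + r$ ($P = P^T$): convex $\iff P \succeq 0$; concave $\iff P \preceq 0$
- any norm is convex

examples on $\mathbf{R}$:
- $x^\alpha$ is convex on $\mathbf{R}_{++}$ for $\alpha \ge 1$, $\alpha \le 0$; concave for $0 \le \alpha \le 1$
- $\log x$ is concave on $\mathbf{R}_{++}$, $x \log x$ is convex on $\mathbf{R}_+$
- $e^{\alpha x}$ is convex
- $|x|$, $\max(0, x)$, $\max(0, -x)$ are convex
- $\log \int_{-\infty}^{x} e^{-t^2}\, dt$ is concave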
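The quadratic criterion can be illustrated in the $2 \times 2$ case, where the eigenvalues of a symmetric matrix have a closed form. This is a sketch added for illustration: the matrices `P_psd` and `P_indef` and the helper names are assumptions, and only the pure quadratic term $x^T P x$ is evaluated.

```python
import math

def eig_sym_2x2(p):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = p[0][0], p[0][1], p[1][1]
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return (mean - radius, mean + radius)

def quad(p, x):
    """Evaluate x^T P x for a 2-vector x and symmetric P."""
    return p[0][0] * x[0] ** 2 + 2 * p[0][1] * x[0] * x[1] + p[1][1] * x[1] ** 2

P_psd = [[2.0, 1.0], [1.0, 2.0]]     # both eigenvalues nonnegative
P_indef = [[1.0, 2.0], [2.0, 1.0]]   # one negative eigenvalue

eigs_psd = eig_sym_2x2(P_psd)
eigs_indef = eig_sym_2x2(P_indef)

# For the indefinite matrix, the points x and -x have midpoint 0, and the
# convexity inequality would require quad(0) <= average of the endpoint values.
x = (1.0, -1.0)
midpoint_value = quad(P_indef, (0.0, 0.0))
endpoint_average = 0.5 * (quad(P_indef, x) + quad(P_indef, (-x[0], -x[1])))
```

Here `midpoint_value` exceeds `endpoint_average`, so the quadratic form of `P_indef` is not convex, which matches its negative eigenvalue.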

Elementary properties

- a function is convex iff it is convex on all lines: $f$ convex $\iff$ $f(x_0 + th)$ convex in $t$ for all $x_0, h$
- positive multiple of convex function is convex: $f$ convex, $\alpha \ge 0$ $\implies$ $\alpha f$ convex
- sum of convex functions is convex: $f_1, f_2$ convex $\implies$ $f_1 + f_2$ convex
- extends to infinite sums, integrals: $g(x, y)$ convex in $x$ $\implies$ $\int g(x, y)\, dy$ convex

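- pointwise maximum: $f_1, f_2$ convex $\implies$ $\max\{f_1(x), f_2(x)\}$ convex (corresponds to intersection of epigraphs)

[Figure: graphs of $f_1(x)$ and $f_2(x)$ with the region $\operatorname{epi} \max\{f_1, f_2\}$ above both]

- pointwise supremum: $f_\alpha$ convex $\implies$ $\sup_{\alpha \in \mathcal{A}} f_\alpha$ convex
- affine transformation of domain: $f$ convex $\implies$ $f(Ax + b)$ convex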

More examples

- piecewise-linear functions: $f(x) = \max_i \{a_i^T x + b_i\}$ is convex in $x$ ($\operatorname{epi} f$ is a polyhedron)
- max distance to any set, $\sup_{s \in S} \|x - s\|$, is convex in $x$
- $f(x) = x_{[1]} + x_{[2]} + x_{[3]}$ is convex on $\mathbf{R}^n$ ($x_{[i]}$ is the $i$th largest $x_j$)
- $f(x) = \left(\prod_i x_i\right)^{1/n}$ is concave on $\mathbf{R}^n_+$
- $f(x) = \sum_{i=1}^m \log (b_i - a_i^T x)^{-1}$ is convex ($\operatorname{dom} f = \{x \mid a_i^T x < b_i,\ i = 1, \ldots, m\}$)
- least-squares cost as a function of weights, $f(w) = \inf_x \sum_i w_i (a_i^T x - b_i)^2$, is concave in $w$
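One way to see why the sum of the three largest entries is convex is that it equals the pointwise maximum of the linear functions $x_i + x_j + x_k$ over all index triples. The sketch below (an added illustration; the function names and test sizes are assumptions) checks that the two expressions agree on random vectors.

```python
from itertools import combinations
import random

def sum_three_largest(x):
    """x_[1] + x_[2] + x_[3]: sum of the three largest entries of x."""
    return sum(sorted(x, reverse=True)[:3])

def max_over_triples(x):
    """Pointwise maximum of the linear functions x_i + x_j + x_k over all index triples."""
    return max(sum(triple) for triple in combinations(x, 3))

rng = random.Random(6)
largest_mismatch = 0.0
for _ in range(200):
    x = [rng.uniform(-10.0, 10.0) for _ in range(7)]
    diff = abs(sum_three_largest(x) - max_over_triples(x))
    largest_mismatch = max(largest_mismatch, diff)
```

Since each $x_i + x_j + x_k$ is linear, the pointwise maximum rule gives convexity directly.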

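Convex functions of matrices

- $\operatorname{Tr} A^T X = \sum_{i,j} A_{ij} X_{ij}$ is linear in $X$ on $\mathbf{R}^{n \times n}$
- $\log \det X^{-1}$ is convex on $\{X \in \mathbf{S}^n \mid X \succ 0\}$

  proof: let $\lambda_i$ be the eigenvalues of $X_0^{-1/2} H X_0^{-1/2}$; then

  $$\begin{aligned} f(t) = \log \det (X_0 + tH)^{-1} &= \log \det X_0^{-1} + \log \det \left(I + t X_0^{-1/2} H X_0^{-1/2}\right)^{-1} \\ &= \log \det X_0^{-1} - \sum_i \log(1 + t \lambda_i) \end{aligned}$$

  is a convex function of $t$
- $(\det X)^{1/n}$ is concave on $\{X \in \mathbf{S}^n \mid X \succeq 0\}$
- $\lambda_{\max}(X)$ is convex on $\mathbf{S}^n$

  proof: $\lambda_{\max}(X) = \sup_{\|y\|_2 = 1} y^T X y$
- $\|X\|_2 = \sigma_1(X) = \left(\lambda_{\max}(X^T X)\right)^{1/2}$ is convex on $\mathbf{R}^{m \times n}$

  proof: $\|X\|_2 = \sup_{\|u\|_2 = 1} \|Xu\|_2$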

Minimizing over some variables

if $h(x, y)$ is jointly convex in $(x, y)$, then

$$f(x) = \inf_y h(x, y)$$

is convex in $x$

corresponds to projection of the epigraph, $(x, y, t) \to (x, t)$

[Figure: surface $h(x, y)$ over the $(x, y)$ plane and its partial minimum $f(x)$]

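examples

- if $S \subseteq \mathbf{R}^n$ is convex then the (min) distance to $S$,

  $$\operatorname{dist}(x, S) = \inf_{s \in S} \|x - s\|,$$

  is convex in $x$
- if $g$ is convex, then

  $$f(y) = \inf \{ g(x) \mid Ax = y \}$$

  is convex in $y$

  proof: (assume $A \in \mathbf{R}^{m \times n}$ has rank $m$) find $B$ s.t. $\mathcal{R}(B) = \mathcal{N}(A)$; then $Ax = y$ iff

  $$x = A^T (A A^T)^{-1} y + B z$$

  for some $z$, and hence

  $$f(y) = \inf_z g\left(A^T (A A^T)^{-1} y + B z\right)$$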
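A small worked instance of partial minimization (added here as an illustration; the function $h$ below is an assumed example, not one from the slides):

```latex
\begin{align*}
h(x, y) &= x^2 + xy + y^2, \qquad
\nabla^2 h = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \succeq 0
\quad \text{(eigenvalues 1 and 3, so $h$ is jointly convex)} \\
\frac{\partial h}{\partial y} &= x + 2y = 0 \ \implies\ y = -\tfrac{x}{2} \\
f(x) &= \inf_y h(x, y) = x^2 - \tfrac{x^2}{2} + \tfrac{x^2}{4} = \tfrac{3}{4} x^2
\end{align*}
```

The result $f(x) = \tfrac{3}{4} x^2$ is convex in $x$, as the rule predicts.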

Composition: one-dimensional case

$f(x) = h(g(x))$ ($g : \mathbf{R}^n \to \mathbf{R}$, $h : \mathbf{R} \to \mathbf{R}$) is convex if
- $g$ convex; $h$ convex, nondecreasing
- $g$ concave; $h$ convex, nonincreasing

proof: (differentiable functions, $x \in \mathbf{R}$)

$$f'' = h'' (g')^2 + g'' h'$$

examples
- $f(x) = \exp g(x)$ is convex if $g$ is convex
- $f(x) = 1/g(x)$ is convex if $g$ is concave, positive
- $f(x) = g(x)^p$, $p \ge 1$, is convex if $g(x)$ convex, positive
- $f(x) = -\sum_i \log(-f_i(x))$ is convex on $\{x \mid f_i(x) < 0\}$ if $f_i$ are convex
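To make the first two examples concrete, the second-derivative formula $f'' = h''(g')^2 + g'' h'$ can be evaluated for each choice of $h$. This worked check is added for illustration and assumes the scalar, twice differentiable case:

```latex
\begin{align*}
h(z) = e^z &: \quad h' = h'' = e^z, \qquad
f'' = e^{g} \left( (g')^2 + g'' \right) \ge 0 \quad \text{when } g'' \ge 0 \\
h(z) = 1/z,\ z > 0 &: \quad h' = -\frac{1}{z^2},\ h'' = \frac{2}{z^3}, \qquad
f'' = \frac{2 (g')^2}{g^3} - \frac{g''}{g^2} \ge 0 \quad \text{when } g > 0,\ g'' \le 0
\end{align*}
```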

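Composition: k-dimensional case

$f(x) = h(g_1(x), \ldots, g_k(x))$ with $h : \mathbf{R}^k \to \mathbf{R}$, $g_i : \mathbf{R}^n \to \mathbf{R}$ is convex if
- $h$ convex, nondecreasing in each arg.; $g_i$ convex
- $h$ convex, nonincreasing in each arg.; $g_i$ concave
- etc.

proof: (differentiable functions, $n = 1$)

$$f'' = \nabla h^T \begin{bmatrix} g_1'' \\ \vdots \\ g_k'' \end{bmatrix} + \begin{bmatrix} g_1' \\ \vdots \\ g_k' \end{bmatrix}^T \nabla^2 h \begin{bmatrix} g_1' \\ \vdots \\ g_k' \end{bmatrix}$$

examples
- $f(x) = \max_i g_i(x)$ is convex if each $g_i$ is
- $f(x) = \log \sum_i \exp g_i(x)$ is convex if each $g_i$ is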

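Jensen's inequality

$f : \mathbf{R}^n \to \mathbf{R}$ convex
- two points: $\theta_1 + \theta_2 = 1$, $\theta_i \ge 0$ $\implies$ $f(\theta_1 x_1 + \theta_2 x_2) \le \theta_1 f(x_1) + \theta_2 f(x_2)$
- more than two points: $\sum_i \theta_i = 1$, $\theta_i \ge 0$ $\implies$ $f\left(\sum_i \theta_i x_i\right) \le \sum_i \theta_i f(x_i)$
- continuous version: $p(x) \ge 0$, $\int p(x)\, dx = 1$ $\implies$ $f\left(\int x\, p(x)\, dx\right) \le \int f(x)\, p(x)\, dx$
- most general form: for any prob. distr. on $x$, $f(\mathbf{E}\, x) \le \mathbf{E}\, f(x)$

these are all called Jensen's inequality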
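The finite-sum form can be checked numerically. The sketch below is an added illustration: it uses $f = \exp$ as an assumed convex function with random points and normalized random weights, and the names `jensen_gap` and `smallest_gap` are hypothetical.

```python
import math
import random

def jensen_gap(f, points, weights):
    """sum_i w_i f(x_i) - f(sum_i w_i x_i); nonnegative when f is convex."""
    mean_point = sum(w * x for w, x in zip(weights, points))
    mean_value = sum(w * f(x) for w, x in zip(weights, points))
    return mean_value - f(mean_point)

rng = random.Random(5)
smallest_gap = math.inf
for _ in range(500):
    points = [rng.uniform(-2.0, 2.0) for _ in range(6)]
    raw = [rng.random() for _ in range(6)]
    total = sum(raw)
    weights = [r / total for r in raw]   # nonnegative, sum to 1
    smallest_gap = min(smallest_gap, jensen_gap(math.exp, points, weights))
```

The gap vanishes when all points coincide, which corresponds to a distribution concentrated at a single point.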

interpretation of Jensen's inequality:
- (zero mean) randomization, dithering increases the average value of a convex function

many (some people claim most) inequalities can be derived from Jensen's inequality

example: arithmetic-geometric mean inequality

$$a, b \ge 0 \implies \sqrt{ab} \le (a + b)/2$$

proof: $f(x) = \log x$ is concave on $\{x \mid x > 0\}$, so for $a, b > 0$,

$$\frac{1}{2} (\log a + \log b) \le \log \left( \frac{a + b}{2} \right)$$

exponentiating both sides gives $\sqrt{ab} \le (a + b)/2$; the case $a = 0$ or $b = 0$ is immediate

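Conjugate functions

the conjugate function of $f : \mathbf{R}^n \to \mathbf{R}$ is

$$f^*(y) = \sup_{x \in \operatorname{dom} f} \left( y^T x - f(x) \right)$$

[Figure: graph of $f(x)$ and the line $y^T x$; $f^*(y)$ is the largest gap between the line and $f$]

- $f^*$ is convex (even if $f$ isn't)
- $f^*$ will be useful later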

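Examples

- $f(x) = -\log x$ ($\operatorname{dom} f = \{x \mid x > 0\}$):

  $$f^*(y) = \sup_{x > 0} (xy + \log x) = \begin{cases} -1 - \log(-y) & \text{if } y < 0 \\ +\infty & \text{otherwise} \end{cases}$$

- $f(x) = x^T P x$ ($P \succ 0$):

  $$f^*(y) = \sup_x \left( y^T x - x^T P x \right) = \frac{1}{4} y^T P^{-1} y$$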
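The closed form for the conjugate of $-\log x$ can be compared against a direct numerical maximization. This is a sketch under stated assumptions: ternary search is used because the objective $xy + \log x$ is concave in $x$ for $y < 0$, and the search bracket and iteration count are arbitrary choices made for this illustration.

```python
import math

def conjugate_neg_log_numeric(y, lo=1e-6, hi=1e6, iters=200):
    """Approximate sup_{x>0} (x*y + log x) for y < 0 by ternary search."""
    def objective(x):
        return x * y + math.log(x)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1          # maximizer lies to the right of m1
        else:
            hi = m2          # maximizer lies to the left of m2
    return objective(0.5 * (lo + hi))

def conjugate_neg_log_closed(y):
    """Closed form from the slide: -1 - log(-y) for y < 0, +inf otherwise."""
    return -1.0 - math.log(-y) if y < 0 else math.inf

errors = [abs(conjugate_neg_log_numeric(y) - conjugate_neg_log_closed(y))
          for y in (-0.5, -2.0, -10.0)]
```

Setting the derivative $y + 1/x$ to zero gives the maximizer $x = -1/y$, which is where the closed form comes from.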

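Quasiconvex functions

$f : \mathbf{R}^n \to \mathbf{R}$ is quasiconvex if every sublevel set

$$S_\alpha = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$$

is convex

[Figure: graph of $f(x)$ with a level $\alpha$ and the corresponding sublevel set $S_\alpha$ on the $x$ axis]

- can have locally flat regions
- $f$ is quasiconcave if $-f$ is quasiconvex, i.e., superlevel sets $\{x \mid f(x) \ge \alpha\}$ are convex
- a function which is both quasiconvex and quasiconcave is called quasilinear
- $f$ convex (concave) $\implies$ $f$ quasiconvex (quasiconcave)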

Examples

- $f(x) = \sqrt{|x|}$ is quasiconvex on $\mathbf{R}$
- $f(x) = \log x$ is quasilinear on $\mathbf{R}_{++}$
- linear fractional function,

  $$f(x) = \frac{a^T x + b}{c^T x + d},$$

  is quasilinear on the halfspace $c^T x + d > 0$
- $f(x) = \dfrac{\|x - a\|_2}{\|x - b\|_2}$ is quasiconvex on the halfspace $\{x \mid \|x - a\|_2 \le \|x - b\|_2\}$
- $f(a) = \operatorname{degree}(a_0 + a_1 t + \cdots + a_k t^k)$ on $\mathbf{R}^{k+1}$

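Properties

- $f$ is quasiconvex if and only if it is quasiconvex on lines, i.e., $f(x_0 + th)$ quasiconvex in $t$ for all $x_0, h$
- modified Jensen's inequality: $f$ is quasiconvex iff for all $x, y \in \operatorname{dom} f$, $\theta \in [0, 1]$,

  $$f(\theta x + (1-\theta) y) \le \max\{f(x), f(y)\}$$

[Figure: graph of $f(x)$ between two points $x$ and $y$, staying below the larger endpoint value]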
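The difference between the modified and the ordinary Jensen inequality can be seen numerically. The sketch below is an added illustration that takes $f(x) = \sqrt{|x|}$ as an assumed example of a function that is quasiconvex but not convex; the sampling range and tolerances are arbitrary.

```python
import math
import random

def f(x):
    """sqrt(|x|): quasiconvex on R, but not convex."""
    return math.sqrt(abs(x))

rng = random.Random(2)
quasiconvex_ok = True
convexity_violations = 0
for _ in range(2000):
    x = rng.uniform(-4.0, 4.0)
    y = rng.uniform(-4.0, 4.0)
    theta = rng.random()
    z = theta * x + (1 - theta) * y
    # Modified Jensen: value on the segment never exceeds the larger endpoint value.
    if f(z) > max(f(x), f(y)) + 1e-12:
        quasiconvex_ok = False
    # Ordinary Jensen: expected to fail, e.g. whenever x and y have the same sign.
    if f(z) > theta * f(x) + (1 - theta) * f(y) + 1e-9:
        convexity_violations += 1
```

The modified inequality should hold on every sample, while the ordinary convexity inequality fails on many of them.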

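- for $f$ differentiable, $f$ quasiconvex $\iff$ for all $x, y \in \operatorname{dom} f$,

  $$f(y) \le f(x) \implies (y - x)^T \nabla f(x) \le 0$$

[Figure: nested sublevel sets $S_{\alpha_1}, S_{\alpha_2}, S_{\alpha_3}$ with $\alpha_1 < \alpha_2 < \alpha_3$, and the gradient $\nabla f(x)$ at a boundary point $x$]

- positive multiples: $f$ quasiconvex, $\alpha \ge 0$ $\implies$ $\alpha f$ quasiconvex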

- pointwise maximum: $f_1, f_2$ quasiconvex $\implies$ $\max\{f_1, f_2\}$ quasiconvex (extends to supremum over an arbitrary set)
- affine transformation of domain: $f$ quasiconvex $\implies$ $f(Ax + b)$ quasiconvex
- linear-fractional transformation of domain: $f$ quasiconvex $\implies$ $f\left( \dfrac{Ax + b}{c^T x + d} \right)$ quasiconvex on $c^T x + d > 0$
- composition with monotone increasing function: $f$ quasiconvex, $g$ monotone increasing $\implies$ $g(f(x))$ quasiconvex
- sums of quasiconvex functions are not quasiconvex in general
- $f$ quasiconvex in $x, y$ $\implies$ $g(x) = \inf_y f(x, y)$ quasiconvex in $x$

Nested sets characterization

$f$ quasiconvex $\implies$ sublevel sets $S_\alpha$ are convex and nested, i.e.,

$$\alpha_1 \le \alpha_2 \implies S_{\alpha_1} \subseteq S_{\alpha_2}$$

converse: if $T_\alpha$ is a nested family of convex sets, then

$$f(x) = \inf \{\alpha \mid x \in T_\alpha\}$$

is quasiconvex

engineering interpretation: $T_\alpha$ are specs, tighter for smaller $\alpha$

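Examples

FIR filter: $H(\omega) = a_0 + \sum_{k=1}^{N} a_k \cos k\omega$

[Figure: $|H(\omega)| / |H(0)|$ in dB versus $\omega$ from $-\pi$ to $\pi$, with levels 0 dB, $-3$ dB, and $-50$ dB marked; the 3dB-bandwidth $f(a)$ is where the response first reaches $-3$ dB]

3dB-bandwidth

$$f(a) = \inf \left\{ \omega > 0 \;\middle|\; 20 \log_{10} \left( |H(\omega)| / |H(0)| \right) \le -3.0 \right\}$$

is a quasiconcave function on $\{a \in \mathbf{R}^{N+1} \mid H(0) > 0\}$

why? for $H(0) > 0$,

$$f(a) \ge \omega_0 \iff H(\omega) > H(0)/\sqrt{2} \ \text{ for } 0 \le \omega < \omega_0$$

... an (infinite) intersection of halfspaces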
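A rough numerical sketch of the 3dB-bandwidth follows. It is an illustration under assumptions, not the lecture's method: the frequency axis is sampled on a uniform grid over $(0, \pi]$, the threshold is $10^{-3/20}$ (the exact $-3.0$ dB level), and the test compares $H(\omega)$ rather than $|H(\omega)|$ against the threshold, relying on the equivalence stated in the slide. All function names are hypothetical.

```python
import math
import random

def freq_response(a, w):
    """H(w) = a[0] + sum_{k>=1} a[k] * cos(k * w)."""
    return sum(ak * math.cos(k * w) for k, ak in enumerate(a))

def bandwidth_3db(a, grid=1000):
    """Smallest grid frequency in (0, pi] with H(w) <= H(0) * 10**(-3/20).

    Returns math.inf if no grid point qualifies. Requires H(0) > 0.
    """
    h0 = freq_response(a, 0.0)
    if h0 <= 0:
        raise ValueError("requires H(0) > 0")
    threshold = h0 * 10.0 ** (-3.0 / 20.0)
    for i in range(1, grid + 1):
        w = math.pi * i / grid
        if freq_response(a, w) <= threshold:
            return w
    return math.inf

bw_simple = bandwidth_3db([1.0, 1.0])   # H(w) = 1 + cos(w)

def random_coeffs(rng, n=4):
    """Random coefficient vector with H(0) = sum(a) comfortably positive."""
    while True:
        a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if sum(a) > 0.1:
            return a

# Quasiconcavity spot check: f(mix) >= min(f(a1), f(a2)) for convex combinations.
rng = random.Random(3)
quasiconcave_ok = True
for _ in range(50):
    a1, a2 = random_coeffs(rng), random_coeffs(rng)
    t = rng.random()
    mix = [t * u + (1 - t) * v for u, v in zip(a1, a2)]
    if bandwidth_3db(mix) < min(bandwidth_3db(a1), bandwidth_3db(a2)):
        quasiconcave_ok = False
```

On the grid, each condition $H(\omega_i) > H(0) \cdot 10^{-3/20}$ is a strict linear inequality in $a$, so the superlevel sets of the gridded bandwidth are finite intersections of halfspaces, mirroring the argument in the slide.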

electron-beam lithography

- $E \subseteq [0, 1] \times [0, 1]$: desired exposure region
- $E^c = [0, 1] \times [0, 1] \setminus E$: desired non-exposure region

[Figure: unit square with exposure region $E$ surrounded by $E^c$]

- $I(p)$: e-beam intensity at position $p \in [0, 1] \times [0, 1]$,

  $$I(p) = \sum_i x_i\, g(p - p_i), \quad i = 1, \ldots, N$$

- $x_i$: intensity of electron beam directed at pixel $i$
- $g(p)$: given (point-spread) function

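pattern transition width

define $\varphi(x)$ as the minimum $\alpha$ s.t.

$$\begin{aligned} I(p) &\ge 0.9 \quad \text{for } \operatorname{dist}(p, E^c) \ge \alpha \\ I(p) &\le 0.1 \quad \text{for } \operatorname{dist}(p, E) \ge \alpha \end{aligned}$$

[Figure: intensity profile across $E^c$, $E$, $E^c$, with level 0.9 where $\operatorname{dist}(p, E^c) \ge \alpha$, level 0.1 where $\operatorname{dist}(p, E) \ge \alpha$, and transition regions of width $2\varphi(x)$ between them]

$\varphi(x)$ is quasiconvex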

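Log-concave functions

$f : \mathbf{R}^n \to \mathbf{R}_+$ is log-concave (log-convex) if $\log f$ is concave (convex)

log-convex $\implies$ convex; concave $\implies$ log-concave

examples
- normal density, $f(x) = e^{-(1/2)(x - x_0)^T \Sigma^{-1} (x - x_0)}$
- erfc, $f(x) = \dfrac{2}{\sqrt{\pi}} \displaystyle\int_x^\infty e^{-t^2}\, dt$
- indicator function of convex set $C$:

  $$I_C(x) = \begin{cases} 1 & x \in C \\ 0 & x \notin C \end{cases}$$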

Properties

- sum of log-concave functions is not always log-concave (but sum of log-convex functions is log-convex)
- products: $f, g$ log-concave $\implies$ $fg$ log-concave (immediate)
- integrals: $f(x, y)$ log-concave in $x, y$ $\implies$ $\int f(x, y)\, dy$ log-concave (not easy to show!)
- convolutions: $f, g$ log-concave $\implies$ $\int f(x - y)\, g(y)\, dy$ log-concave (immediate from the properties above)
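The first two properties can be illustrated with two Gaussian bumps. The sketch below is an added example with assumed centers at $\pm 4$: their sum is bimodal and fails a midpoint log-concavity test, while their product passes it.

```python
import math

def bump(center):
    """Unnormalized Gaussian exp(-(x - center)^2 / 2), which is log-concave."""
    return lambda x: math.exp(-0.5 * (x - center) ** 2)

f = bump(-4.0)
g = bump(4.0)

def total(x):
    return f(x) + g(x)

def product(x):
    return f(x) * g(x)

def midpoint_log_gap(func, x, y):
    """log func(midpoint) minus the average of log func at x and y.

    Nonnegative for every pair (x, y) when func is log-concave.
    """
    mid = 0.5 * (x + y)
    return math.log(func(mid)) - 0.5 * (math.log(func(x)) + math.log(func(y)))

sum_gap = midpoint_log_gap(total, -4.0, 4.0)        # negative: the sum dips between the peaks
product_gap = midpoint_log_gap(product, -1.0, 3.0)  # positive: log of the product is -(x^2 + 16)
```

For the product, $\log(fg) = -(x^2 + 16)$ is a concave quadratic, which is the "immediate" argument in the slide: the log of a product is a sum of concave functions.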

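Log-concave probability densities

many common probability density functions are log-concave
- normal ($\Sigma \succ 0$):

  $$f(x) = \frac{1}{\sqrt{(2\pi)^n \det \Sigma}}\, e^{-\frac{1}{2} (x - \bar x)^T \Sigma^{-1} (x - \bar x)}$$

- exponential ($\lambda_i > 0$):

  $$f(x) = \left( \prod_{i=1}^n \lambda_i \right) e^{-(\lambda_1 x_1 + \cdots + \lambda_n x_n)}, \quad x \in \mathbf{R}^n_+$$

- uniform distribution on convex (bounded) set $C$:

  $$f(x) = \begin{cases} 1/\alpha & x \in C \\ 0 & x \notin C \end{cases}$$

  where $\alpha$ is the Lebesgue measure of $C$ (i.e., length, area, volume, ...)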

Example: manufacturing yield

$x_{\mathrm{manu}} = x + v$
- $x \in \mathbf{R}^n$: nominal value of design parameters
- $v \in \mathbf{R}^n$: manufacturing errors; zero mean random variable
- $S \subseteq \mathbf{R}^n$: specs, i.e., acceptable values of $x_{\mathrm{manu}}$

the yield $Y(x) = \operatorname{Prob}(x + v \in S)$ is log-concave if
- $S$ is a convex set
- the probability density of $v$ is log-concave

[Figure: yield contours from 10% to 80%]

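example

- $S = \{x \in \mathbf{R}^2 \mid x_1 \ge 1,\ x_2 \ge 1\}$
- $v_1, v_2$: independent, normal with $\sigma = 1$

$$\mathrm{yield}(x) = \operatorname{Prob}(x + v \in S) = \frac{1}{2\pi} \left( \int_{1 - x_1}^{\infty} e^{-t^2/2}\, dt \right) \left( \int_{1 - x_2}^{\infty} e^{-t^2/2}\, dt \right)$$

[Figure: yield contours (10% to 99%) over $x_1, x_2 \in [0, 3]$, with the set $S$ marked]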
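The yield in this example can be evaluated with the standard normal CDF, since each factor $\frac{1}{\sqrt{2\pi}} \int_{1 - x_i}^{\infty} e^{-t^2/2}\, dt$ equals $\Phi(x_i - 1)$. The sketch below is an added illustration: the function names are hypothetical and the sampling box $[0, 3]^2$ is an assumption. It also spot-checks midpoint log-concavity.

```python
import math
import random

def std_normal_cdf(z):
    """Phi(z) for a standard normal variable, written via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def yield_prob(x1, x2):
    """Prob(x1 + v1 >= 1 and x2 + v2 >= 1) for independent standard normal v1, v2."""
    return std_normal_cdf(x1 - 1.0) * std_normal_cdf(x2 - 1.0)

rng = random.Random(4)
worst_gap = math.inf
for _ in range(1000):
    a = (rng.uniform(0.0, 3.0), rng.uniform(0.0, 3.0))
    b = (rng.uniform(0.0, 3.0), rng.uniform(0.0, 3.0))
    mid = (0.5 * (a[0] + b[0]), 0.5 * (a[1] + b[1]))
    # Log-concavity at the midpoint: log Y(mid) >= average of log Y(a), log Y(b).
    gap = math.log(yield_prob(*mid)) - 0.5 * (
        math.log(yield_prob(*a)) + math.log(yield_prob(*b)))
    worst_gap = min(worst_gap, gap)

corner_yield = yield_prob(1.0, 1.0)   # nominal design on the corner of S
```

At the corner $x = (1, 1)$ each spec is met with probability $1/2$, so the yield is $1/4$.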

example (continued): max yield vs. cost

manufacturing cost $c = x_1 + 2 x_2$; the max yield for a given cost is

$$Y_{\mathrm{opt}}(c) = \sup_{\substack{x_1 + 2 x_2 = c \\ x_1, x_2 \ge 0}} Y(x)$$

$Y_{\mathrm{opt}}$ is log-concave:

$$-\log Y_{\mathrm{opt}}(c) = \inf_{\substack{x_1 + 2 x_2 = c \\ x_1, x_2 \ge 0}} -\log Y(x_1, x_2)$$

[Figure: $Y_{\mathrm{opt}}(c)$ on a log scale (0.01% to 100%) versus cost $c$ from 0 to 6]

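K-convexity

a convex cone $K \subseteq \mathbf{R}^m$ induces the generalized inequality $\preceq_K$

$f : \mathbf{R}^n \to \mathbf{R}^m$ is $K$-convex if

$$0 \le \theta \le 1 \implies f(\theta x + (1-\theta) y) \preceq_K \theta f(x) + (1-\theta) f(y)$$

example. $K$ is the PSD cone (called matrix convexity). $f(X) = X^2$ is $K$-convex on $\mathbf{S}^m$

let's show that for $\theta \in [0, 1]$,

$$(\theta X + (1-\theta) Y)^2 \preceq \theta X^2 + (1-\theta) Y^2 \qquad (1)$$

for any $u \in \mathbf{R}^m$, $u^T X^2 u = \|Xu\|_2^2$ is a (quadratic) convex function of $X$, so

$$u^T (\theta X + (1-\theta) Y)^2 u \le \theta\, u^T X^2 u + (1-\theta)\, u^T Y^2 u$$

which implies (1)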
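Inequality (1) can also be confirmed by expanding the difference of the two sides directly. This is an alternative check added for illustration, not the argument used in the slides:

```latex
\begin{align*}
\theta X^2 + (1-\theta) Y^2 - (\theta X + (1-\theta) Y)^2
&= \theta (1-\theta) X^2 + \theta (1-\theta) Y^2 - \theta (1-\theta) (XY + YX) \\
&= \theta (1-\theta) (X - Y)^2 \succeq 0
\end{align*}
```

The last step uses $\theta (1-\theta) \ge 0$ for $\theta \in [0, 1]$ and the fact that the square of the symmetric matrix $X - Y$ is positive semidefinite.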