
Convex Optimization (EE227A: UC Berkeley)
Lecture 4: Conjugates, subdifferentials
Suvrit Sra
31 Jan, 2013

Organizational
HW1 due: 14 Feb 2013, in class.
- Please LaTeX your solutions (contact the TA if this is an issue)
- Discussion with classmates is OK
- Each person must submit his/her individual solutions
- Acknowledge any help you receive
- Do not copy! Make sure you understand the solution you submit
- Cite any source that you use
- Have fun solving problems!

Recap
- Eigenvalues, singular values, positive definiteness
- Convex sets: $\theta_1 x + \theta_2 y \in C$ whenever $\theta_1 + \theta_2 = 1$, $\theta_i \ge 0$
- Convex functions, midpoint convexity, recognizing convexity
- Norms, mixed norms, matrix norms, dual norms
- Indicator, distance function, minimum of jointly convex functions
- Brief mention of other forms of convexity
- $f\bigl(\tfrac{x+y}{2}\bigr) \le \tfrac12[f(x) + f(y)]$ plus continuity $\Rightarrow$ $f$ is cvx
- $\nabla^2 f(x) \succeq 0$ implies $f$ is cvx

Fenchel Conjugate

Dual norms
Def. Let $\|\cdot\|$ be a norm on $\mathbb{R}^n$. Its dual norm is
  $\|u\|_* := \sup\,\{ u^T x : \|x\| \le 1 \}$.
- Exercise: Verify that we may write $\|u\|_* = \sup_{x \ne 0} \frac{u^T x}{\|x\|}$.
- Exercise: Verify that $\|u\|_*$ is a norm. (For the triangle inequality: $\|u + v\|_* = \sup\,\{ (u+v)^T x : \|x\| \le 1 \}$, and $\sup (A + B) \le \sup A + \sup B$.)
- Exercise: Let $1/p + 1/q = 1$, where $p, q \ge 1$. Show that $\|\cdot\|_q$ is dual to $\|\cdot\|_p$. In particular, the $\ell_2$-norm is self-dual. Hint: use Hölder's inequality: $u^T v \le \|u\|_p \|v\|_q$.
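A quick numerical sanity check of the $\ell_p/\ell_q$ duality (a sketch; `dual_norm_lp` is my own helper, and random search only lower-bounds the sup, so the estimate approaches $\|u\|_q$ from below):

```python
import numpy as np

def dual_norm_lp(u, p, n_samples=200000, seed=0):
    """Estimate ||u||_* = sup { u^T x : ||x||_p <= 1 } by random search.

    Only a lower bound on the sup, but with many samples it approaches
    ||u||_q for 1/p + 1/q = 1 (Hoelder duality).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, u.size))
    # Project every sample onto the unit l_p sphere.
    x /= np.linalg.norm(x, ord=p, axis=1, keepdims=True)
    return np.max(x @ u)

u = np.array([3.0, -1.0, 2.0])
p, q = 1.5, 3.0                       # 1/1.5 + 1/3 = 1
print(dual_norm_lp(u, p))             # estimate, slightly below ...
print(np.linalg.norm(u, ord=q))       # ... the exact dual norm ||u||_q
```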

Fenchel conjugate
Def. The Fenchel conjugate of a function $f$ is
  $f^*(z) := \sup_{x \in \operatorname{dom} f}\; x^T z - f(x)$.
Note: $f^*$ is a pointwise (over $x$) sup of linear functions of $z$, so it is cvx!

Example. Let $f(x) = \|x\|$. We have $f^*(z) = \mathbb{I}_{\{\|z\|_* \le 1\}}(z)$. That is, the conjugate of a norm is the indicator function of the dual-norm unit ball.
Consider two cases: (i) $\|z\|_* > 1$; (ii) $\|z\|_* \le 1$.
- Case (i): by definition of the dual norm (a sup over $z^T u$) there is a $u$ s.t. $\|u\| \le 1$ and $z^T u > 1$. In $f^*(z) = \sup_x\, x^T z - f(x)$, rewrite $x = \alpha u$ and let $\alpha \to \infty$. Then $z^T x - \|x\| = \alpha z^T u - \alpha\|u\| = \alpha(z^T u - \|u\|) \to \infty$.
- Case (ii): since $z^T x \le \|x\|\,\|z\|_*$, we get $x^T z - \|x\| \le \|x\|(\|z\|_* - 1) \le 0$. Here $x = 0$ maximizes $\|x\|(\|z\|_* - 1)$, hence $f^*(z) = 0$.
Thus $f^*(z) = +\infty$ in case (i) and $0$ in case (ii), as desired.
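A small numerical illustration of this dichotomy (a sketch, using $f = \|\cdot\|_1$, whose dual norm is $\ell_\infty$; `conjugate_of_l1_box` is my own helper that computes the sup exactly over a growing box, since the objective is separable across coordinates):

```python
import numpy as np

def conjugate_of_l1_box(z, r):
    """Exact sup of z^T x - ||x||_1 over the box [-r, r]^n.

    The objective is separable: each coordinate contributes
    sup over |x_i| <= r of (x_i z_i - |x_i|) = r * max(|z_i| - 1, 0).
    """
    return r * np.sum(np.maximum(np.abs(z) - 1.0, 0.0))

z_in = np.array([0.5, -0.8])    # ||z||_inf <= 1: value stays 0 for every r
z_out = np.array([1.5, 0.2])    # ||z||_inf > 1: value grows linearly in r
for r in (1.0, 10.0, 100.0):
    print(r, conjugate_of_l1_box(z_in, r), conjugate_of_l1_box(z_out, r))
# As r -> infinity: f*(z_in) = 0 but f*(z_out) = +infinity,
# i.e. the indicator of the dual (l_inf) unit ball.
```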

Fenchel conjugate
Example. $f(x) = ax + b$; then $f^*(z) = \sup_x\, zx - (ax + b) = +\infty$ if $z - a \ne 0$. Thus $\operatorname{dom} f^* = \{a\}$, and $f^*(a) = -b$.

Example. Let $a \ge 0$, and set $f(x) = -\sqrt{a^2 - x^2}$ if $|x| \le a$, and $+\infty$ otherwise. Then $f^*(z) = a\sqrt{1 + z^2}$.

Example. $f(x) = \frac12 x^T A x$, where $A \succ 0$. Then $f^*(z) = \frac12 z^T A^{-1} z$.

Exercise: If $f(x) = \max(0,\, 1 - x)$, then $\operatorname{dom} f^*$ is $[-1, 0]$, and within this domain, $f^*(z) = z$. Hint: analyze the cases $\max(0, 1-x) = 0$ and $\max(0, 1-x) = 1 - x$.
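The quadratic example can be checked numerically: the inner problem $\sup_x\, x^T z - \frac12 x^T A x$ is solved by $x^\star = A^{-1} z$ (set the gradient to zero). A minimal sketch, with a randomly generated positive definite $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
A = B @ B.T + 3 * np.eye(3)        # a random A > 0
z = rng.standard_normal(3)

# Maximizer of x^T z - 0.5 x^T A x is x* = A^{-1} z, so f*(z) = 0.5 z^T A^{-1} z.
x_star = np.linalg.solve(A, z)
f_star = x_star @ z - 0.5 * x_star @ A @ x_star
print(f_star)
print(0.5 * z @ np.linalg.solve(A, z))   # the two values agree
```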

Subdifferentials

First order global underestimator
If $f$ is convex and differentiable, its first-order Taylor expansion is a global underestimator:
  $f(x) \ge f(y) + \langle \nabla f(y),\, x - y \rangle$ for all $x$.
[Figure: a convex graph with its tangent line at $y$ lying below the graph.]
If $f$ is not differentiable at $y$, the same picture holds with $\nabla f(y)$ replaced by a vector $g$:
  $f(x) \ge f(y) + \langle g,\, x - y \rangle$.
[Figure: several lines $f(y) + \langle g_i,\, x - y \rangle$ through $(y, f(y))$, all below the graph.]

Subgradients
[Figure: lines $f(y) + \langle g_i,\, x - y \rangle$ through $(y, f(y))$; the vectors $g_1, g_2, g_3$ are subgradients at $y$.]

Subgradients: basic facts
- If $f$ is convex and differentiable: $\nabla f(y)$ is the unique subgradient at $y$.
- A vector $g$ is a subgradient at a point $y$ if and only if $f(y) + \langle g,\, x - y \rangle$ globally underestimates $f(x)$.
- Usually, one subgradient costs approximately as much to compute as $f(x)$.
- Determining all subgradients at a given point is difficult. Subgradient calculus is a great achievement in convex analysis.
- Without convexity, things become wild! (advanced course)

Subgradients: example
$f(x) := \max(f_1(x), f_2(x))$; both $f_1$, $f_2$ convex and differentiable.
[Figure: graphs of $f_1$ and $f_2$ and their pointwise max $f$, with a kink at the crossing point $y$.]
- $f_1(x) > f_2(x)$: the unique subgradient of $f$ is $\nabla f_1(x)$.
- $f_1(x) < f_2(x)$: the unique subgradient of $f$ is $\nabla f_2(x)$.
- $f_1(y) = f_2(y)$: the subgradients form the segment $[\nabla f_1(y),\, \nabla f_2(y)]$ (imagine all supporting lines turning about the point $y$).
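A sketch of this case analysis in code, with illustrative choices $f_1(x) = x^2$ and $f_2(x) = (x-1)^2$ (my own example functions, not from the slides):

```python
import numpy as np

# f(x) = max(f1(x), f2(x)) with smooth convex pieces (illustrative choices).
f1, df1 = lambda x: x**2, lambda x: 2*x
f2, df2 = lambda x: (x - 1)**2, lambda x: 2*(x - 1)

def subdiff_of_max(x, tol=1e-12):
    """Subdifferential of max(f1, f2) at x, returned as an interval [lo, hi]."""
    if f1(x) > f2(x) + tol:
        g = df1(x); return (g, g)        # f1 active: unique subgradient df1(x)
    if f2(x) > f1(x) + tol:
        g = df2(x); return (g, g)        # f2 active: unique subgradient df2(x)
    g1, g2 = df1(x), df2(x)              # tie: the whole segment [g1, g2]
    return (min(g1, g2), max(g1, g2))

print(subdiff_of_max(0.5))   # f1 = f2 at x = 1/2: the segment [-1, 1]
print(subdiff_of_max(2.0))   # f1 > f2: the single slope 4.0
```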

Subgradients
Def. A vector $g \in \mathbb{R}^n$ is called a subgradient of $f$ at a point $y$ if for all $x \in \operatorname{dom} f$ it holds that
  $f(x) \ge f(y) + \langle g,\, x - y \rangle$.

Subdifferential
Def. The set of all subgradients at $y$ is denoted $\partial f(y)$. This set is called the subdifferential of $f$ at $y$.
If $f$ is convex, $\partial f(x)$ is nice:
- If $x$ is in the relative interior of $\operatorname{dom} f$, then $\partial f(x)$ is nonempty.
- If $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$.
- Conversely, if $\partial f(x) = \{g\}$ is a singleton, then $f$ is differentiable at $x$ and $g = \nabla f(x)$.

Subdifferential example
$f(x) = |x|$.
[Figure: graph of $|x|$ with slopes $-1$ and $+1$.]
  $\partial f(x) = \{-1\}$ for $x < 0$; $\{+1\}$ for $x > 0$; $[-1, 1]$ at $x = 0$.

More examples
Example. $f(x) = \|x\|_2$. Then
  $\partial f(x) = \{ x/\|x\|_2 \}$ if $x \ne 0$, and $\{ z : \|z\|_2 \le 1 \}$ if $x = 0$.
Proof (case $x = 0$): $g \in \partial f(0)$ means $\|z\|_2 \ge \|0\|_2 + \langle g,\, z - 0 \rangle$, i.e., $\|z\|_2 \ge \langle g, z \rangle$ for all $z$, which holds iff $\|g\|_2 \le 1$.
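A quick randomized check of the $x = 0$ case (a sketch; `is_subgradient_at_zero` is my own helper, and testing the inequality on sample points is only a necessary condition, but it illustrates the claim):

```python
import numpy as np

rng = np.random.default_rng(0)

def is_subgradient_at_zero(g, n_trials=10000):
    """Check ||z||_2 >= <g, z> on random test points z (necessary condition)."""
    z = rng.standard_normal((n_trials, g.size))
    return bool(np.all(np.linalg.norm(z, axis=1) >= z @ g - 1e-12))

g_ok = np.array([0.6, 0.8])    # ||g||_2 = 1: a valid subgradient of ||.||_2 at 0
g_bad = np.array([0.9, 0.9])   # ||g||_2 > 1: not a subgradient
print(is_subgradient_at_zero(g_ok))    # True
print(is_subgradient_at_zero(g_bad))   # False with high probability
                                       # (e.g. z = g_bad itself violates it)
```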

More examples
Example. A convex function need not be subdifferentiable everywhere. Let
  $f(x) := -(1 - \|x\|_2^2)^{1/2}$ if $\|x\|_2 \le 1$, and $+\infty$ otherwise.
$f$ is differentiable for all $x$ with $\|x\|_2 < 1$, but $\partial f(x) = \emptyset$ whenever $\|x\|_2 \ge 1$ (the slope blows up at the boundary).

Calculus

Recall basic calculus
If $f$ and $k$ are differentiable, we know that
- Addition: $\nabla(f + k)(x) = \nabla f(x) + \nabla k(x)$
- Scaling: $\nabla(\alpha f)(x) = \alpha \nabla f(x)$
Chain rule. If $f : \mathbb{R}^n \to \mathbb{R}^m$ and $k : \mathbb{R}^m \to \mathbb{R}^p$, let $h : \mathbb{R}^n \to \mathbb{R}^p$ be the composition $h(x) = (k \circ f)(x) = k(f(x))$. Then
  $Dh(x) = Dk(f(x))\, Df(x)$.
Example. If $f : \mathbb{R}^n \to \mathbb{R}$ and $k : \mathbb{R} \to \mathbb{R}$, then using the fact that $\nabla h(x) = [Dh(x)]^T$, we obtain $\nabla h(x) = k'(f(x))\, \nabla f(x)$.
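A finite-difference sanity check of this smooth chain rule, with hypothetical choices $f(x) = x_1^2 + 3x_2$ and $k(t) = \sin t$ (my own example functions):

```python
import numpy as np

# h(x) = k(f(x)) with f: R^2 -> R and k: R -> R.
f  = lambda x: x[0]**2 + 3*x[1]
df = lambda x: np.array([2*x[0], 3.0])     # gradient of f
k, dk = np.sin, np.cos

x = np.array([0.7, -0.2])
grad_h = dk(f(x)) * df(x)                  # chain rule: grad h = k'(f(x)) grad f(x)

# Central finite-difference approximation of grad h for comparison.
eps = 1e-6
fd = np.array([(k(f(x + eps*e)) - k(f(x - eps*e))) / (2*eps) for e in np.eye(2)])
print(np.allclose(grad_h, fd, atol=1e-6))  # True
```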

Subgradient calculus
- Finding one subgradient within $\partial f(x)$
- Determining the entire subdifferential $\partial f(x)$ at a point $x$
- Do we have a chain rule? Usually not easy!

Subgradient calculus
- If $f$ is differentiable: $\partial f(x) = \{\nabla f(x)\}$.
- Scaling: for $\alpha > 0$, $\partial(\alpha f)(x) = \alpha\, \partial f(x) = \{\alpha g : g \in \partial f(x)\}$.
- Addition: $\partial(f + k)(x) = \partial f(x) + \partial k(x)$ (set addition).
- Chain rule (affine): let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $f : \mathbb{R}^m \to \mathbb{R}$, and $h : \mathbb{R}^n \to \mathbb{R}$ be given by $h(x) = f(Ax + b)$. Then $\partial h(x) = A^T \partial f(Ax + b)$.
- Chain rule (composition): $h = f \circ k$, where $k : X \to Y$ is differentiable. Then $\partial h(x) = [Dk(x)]^T \partial f(k(x))$.
- Max function: if $f(x) := \max_{1 \le i \le m} f_i(x)$, then
  $\partial f(x) = \operatorname{conv}\,\{ \partial f_i(x) : f_i(x) = f(x) \}$,
  the convex hull of the subdifferentials of the active functions at $x$.
- Conjugation: $z \in \partial f(x)$ if and only if $x \in \partial f^*(z)$.
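A sketch of the affine chain rule in code, using $f = \|\cdot\|_1$ (so $\operatorname{sign}(y) \in \partial f(y)$, choosing $0$ at kinks, since $0 \in [-1,1]$) and checking the subgradient inequality at random points; the helper names are my own:

```python
import numpy as np

def subgrad_l1(y):
    """One subgradient of ||.||_1 at y: sign(y), with 0 chosen at kinks."""
    return np.sign(y)

def subgrad_affine_composition(A, b, x):
    """One subgradient of h(x) = ||Ax + b||_1 via dh(x) = A^T df(Ax + b)."""
    return A.T @ subgrad_l1(A @ x + b)

# Check the subgradient inequality h(z) >= h(x) + g^T (z - x) at random z.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((4, 3)), rng.standard_normal(4)
x = rng.standard_normal(3)
g = subgrad_affine_composition(A, b, x)
h = lambda v: np.linalg.norm(A @ v + b, ord=1)
zs = rng.standard_normal((1000, 3))
print(all(h(z) >= h(x) + g @ (z - x) - 1e-9 for z in zs))  # True
```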

Examples
It can happen that $\partial(f_1 + f_2)(x) \supsetneq \partial f_1(x) + \partial f_2(x)$.
Example. Define $f_1$ and $f_2$ by
  $f_1(x) := -2\sqrt{x}$ if $x \ge 0$, $+\infty$ if $x < 0$;  and  $f_2(x) := +\infty$ if $x > 0$, $-2\sqrt{-x}$ if $x \le 0$.
Then $f_1 + f_2 = \max\{f_1, f_2\} = \mathbb{I}_{\{0\}}$, whereby $\partial(f_1 + f_2)(0) = \mathbb{R}$. But $\partial f_1(0) = \partial f_2(0) = \emptyset$ (the one-sided slopes are infinite at $0$).
However, $\partial f_1(x) + \partial f_2(x) \subseteq \partial(f_1 + f_2)(x)$ always holds.

Examples
Example. $f(x) = \|x\|_\infty$. Then $\partial f(0) = \operatorname{conv}\,\{\pm e_1, \ldots, \pm e_n\}$, where $e_i$ is the $i$-th canonical basis vector (column of the identity matrix).
To prove this, notice that $f(x) = \max_{1 \le i \le n} |e_i^T x|$.

Example (S. Boyd)
Example. Let $f(x) = \max\,\{ s^T x : s_i \in \{-1, 1\} \}$ (a max over $2^n$ members); this is exactly $\|x\|_1$.
[Figure, $n = 2$: $\partial f$ at $x = (0,0)$ is the square $\operatorname{conv}\{(\pm 1, \pm 1)\}$; at $x = (1,0)$ it is the segment $\{1\} \times [-1, +1]$; at $x = (1,1)$ it is the single point $(1,1)$.]
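Since this $f$ is the $\ell_1$ norm, $\partial f(x)$ is a product of coordinate-wise intervals; a sketch (with my own helper `subdiff_l1`) that reproduces the three sets in the figure:

```python
import numpy as np

def subdiff_l1(x, tol=1e-12):
    """Coordinate-wise subdifferential of ||x||_1 as a list of intervals:
    {sign(x_i)} where x_i != 0, and [-1, 1] where x_i = 0."""
    return [(np.sign(xi), np.sign(xi)) if abs(xi) > tol else (-1.0, 1.0)
            for xi in x]

for pt in ([0.0, 0.0], [1.0, 0.0], [1.0, 1.0]):
    print(pt, subdiff_l1(np.array(pt)))
# (0,0): [-1,1] x [-1,1], the square; (1,0): {1} x [-1,1], a segment;
# (1,1): {(1,1)}, a single point.
```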

Rules for subgradients

Subgradient for pointwise sup
  $f(x) := \sup_{y \in Y} h(x, y)$
Getting $\partial f(x)$ is complicated! A simple way to obtain some $g \in \partial f(x)$:
- Pick any $y^\star$ for which $h(x, y^\star) = f(x)$
- Pick any subgradient $g \in \partial_x h(x, y^\star)$
- This $g \in \partial f(x)$:
  $h(z, y^\star) \ge h(x, y^\star) + g^T(z - x)$
  $h(z, y^\star) \ge f(x) + g^T(z - x)$
  $f(z) \ge h(z, y^\star)$ (because of the sup)
  $f(z) \ge f(x) + g^T(z - x)$.

Example
Suppose $a_i \in \mathbb{R}^n$ and $b_i \in \mathbb{R}$, and $f(x) := \max_{1 \le i \le n} (a_i^T x + b_i)$.
- This $f$ is a max (in fact, over a finite number of terms).
- Suppose $f(x) = a_k^T x + b_k$ for some index $k$.
- Here the active function is $f_k(x) = a_k^T x + b_k$, and $\partial f_k(x) = \{\nabla f_k(x)\} = \{a_k\}$.
- Hence, $g = a_k \in \partial f(x)$ works!
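This recipe is essentially one line of code for the max-affine case (a sketch; any maximizing index gives a valid subgradient, and `subgrad_max_affine` is my own helper):

```python
import numpy as np

def subgrad_max_affine(A, b, x):
    """One subgradient of f(x) = max_i (a_i^T x + b_i): the a_k of an active term."""
    k = np.argmax(A @ x + b)   # any index attaining the max works
    return A[k]

rng = np.random.default_rng(0)
A, b = rng.standard_normal((5, 3)), rng.standard_normal(5)
x = rng.standard_normal(3)
g = subgrad_max_affine(A, b, x)
f = lambda v: np.max(A @ v + b)
z = rng.standard_normal(3)
print(f(z) >= f(x) + g @ (z - x) - 1e-12)  # subgradient inequality holds: True
```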

Subgradient of expectation
Suppose $f = \mathbb{E}\, f(x, u)$, where $f(\cdot, u)$ is convex in $x$ for each $u$ (a random variable):
  $f(x) := \int f(x, u)\, p(u)\, du$.
- For each $u$, choose any $g(x, u) \in \partial_x f(x, u)$.
- Then $g = \int g(x, u)\, p(u)\, du = \mathbb{E}\, g(x, u) \in \partial f(x)$.
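In practice the integral is replaced by sampling, which is the basis of stochastic subgradient methods. A sketch with the hypothetical choice $f(x, u) = |x - u|$, $u \sim N(0,1)$ (my own example), where the averaged subgradient $\mathbb{E}\,\operatorname{sign}(x - u) = 2\Phi(x) - 1$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_subgradient(x, n_samples=100000):
    """Monte Carlo estimate of a subgradient of F(x) = E|x - u|, u ~ N(0,1):
    average of sign(x - u), each of which is a subgradient of |. - u| at x."""
    u = rng.standard_normal(n_samples)
    return np.mean(np.sign(x - u))

print(sampled_subgradient(0.0))   # near 0 (= 2*Phi(0) - 1)
print(sampled_subgradient(1.0))   # near 2*Phi(1) - 1 ~ 0.683
```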

Subgradient of composition
Suppose $h : \mathbb{R}^n \to \mathbb{R}$ is cvx and nondecreasing, and each $f_i$ is cvx:
  $f(x) := h(f_1(x), f_2(x), \ldots, f_n(x))$.
To find a vector $g \in \partial f(x)$, we may:
- For $i = 1$ to $n$, compute $g_i \in \partial f_i(x)$
- Compute $u \in \partial h(f_1(x), \ldots, f_n(x))$
- Set $g = u_1 g_1 + u_2 g_2 + \cdots + u_n g_n$; this $g \in \partial f(x)$
Compare with the smooth chain rule $\nabla f(x) = J^T \nabla h(f_1(x), \ldots, f_n(x))$, where $J$ is the matrix whose rows are the $\nabla f_i(x)^T$.
Exercise: Verify $g \in \partial f(x)$ by showing $f(z) \ge f(x) + g^T(z - x)$.
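A sketch of this recipe with $h = \max$ (convex and nondecreasing) and illustrative scalar $f_i$ of my own choosing; the weight vector $u$ is taken as the indicator of one active coordinate, which lies in $\partial h$:

```python
import numpy as np

# f(x) = h(f_1(x), f_2(x), f_3(x)) with h = max (cvx, nondecreasing).
fs = [lambda x: (x - 1)**2, lambda x: (x + 1)**2, lambda x: abs(x)]
gs = [lambda x: 2*(x - 1),  lambda x: 2*(x + 1),  lambda x: np.sign(x)]  # g_i in df_i

def subgrad_composition(x):
    vals = np.array([f(x) for f in fs])
    u = np.zeros(len(fs))
    u[np.argmax(vals)] = 1.0      # u in dh(vals): indicator of an active coordinate
    return sum(ui * gi(x) for ui, gi in zip(u, gs))   # g = u_1 g_1 + ... + u_n g_n

print(subgrad_composition(0.5))   # f_2 = (x+1)^2 is active here, so g = 2*(1.5) = 3.0
```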

References
1. R. T. Rockafellar. Convex Analysis.
2. S. Boyd (Stanford). EE364b Lecture Notes.