Convex Optimization. (EE227A: UC Berkeley) Lecture 4. Suvrit Sra. (Conjugates, subdifferentials) 31 Jan, 2013



Organizational

HW1 due: 14 Feb 2013, in class. Please LaTeX your solutions (contact the TA if this is an issue).
- Discussion with classmates is OK
- Each person must submit his/her individual solutions
- Acknowledge any help you receive; do not copy!
- Make sure you understand the solution you submit
- Cite any source that you use
- Have fun solving problems!

Recap

- Eigenvalues, singular values, positive definiteness
- Convex sets: θ1 x + θ2 y ∈ C, where θ1 + θ2 = 1, θi ≥ 0
- Convex functions, midpoint convexity, recognizing convexity
- Norms, mixed norms, matrix norms, dual norms
- Indicator, distance function, minimum of jointly convex functions
- Brief mention of other forms of convexity
- f((x + y)/2) ≤ (1/2)[f(x) + f(y)] plus continuity ⟹ f is convex
- ∇²f(x) ⪰ 0 implies f is convex

Fenchel Conjugate

Dual norms

Def. Let ||·|| be a norm on R^n. Its dual norm is

    ||u||_* := sup { u^T x : ||x|| ≤ 1 }.

Exercise: Verify that we may write ||u||_* = sup_{x ≠ 0} u^T x / ||x||.

Exercise: Verify that ||·||_* is a norm.

    ||u + v||_* = sup { (u + v)^T x : ||x|| ≤ 1 },  and  sup(A + B) ≤ sup A + sup B.

Exercise: Let 1/p + 1/q = 1, where p, q ≥ 1. Show that the l_q-norm is dual to the l_p-norm. In particular, the l_2-norm is self-dual.
Hint: Use Hölder's inequality: u^T v ≤ ||u||_p ||v||_q.
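A quick numerical sanity check of the p/q duality (a NumPy sketch; the choice p = 3, q = 3/2 and the closed-form maximizer x_i ∝ sign(u_i)|u_i|^(q−1), which is the equality case of Hölder's inequality, are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(5)
p, q = 3.0, 1.5   # conjugate exponents: 1/p + 1/q = 1

# Hölder equality case: x_i proportional to sign(u_i) |u_i|^(q-1), normalized
x = np.sign(u) * np.abs(u) ** (q - 1)
x /= np.linalg.norm(x, ord=p)

dual_via_sup = u @ x                    # attains sup { u^T x : ||x||_p <= 1 }
lq_norm = np.linalg.norm(u, ord=q)      # ||u||_q

assert np.isclose(dual_via_sup, lq_norm)
```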

Fenchel conjugate

Def. The Fenchel conjugate of a function f is

    f*(z) := sup_{x ∈ dom f} ( x^T z − f(x) ).

Note: f* is a pointwise (over x) sup of linear functions of z, so it is convex!

Example. Let f(x) = ||x||. Then f*(z) = I_{||z||_* ≤ 1}(z); that is, the conjugate of a norm is the indicator function of the dual-norm ball.

Consider two cases: (i) ||z||_* > 1; (ii) ||z||_* ≤ 1.

Case (i): by the definition of the dual norm (a sup over z^T u), there is a u such that ||u|| ≤ 1 and z^T u > 1. In f*(z) = sup_x ( x^T z − f(x) ), rewrite x = αu and let α → ∞. Then

    z^T x − ||x|| = α z^T u − α ||u|| = α ( z^T u − ||u|| ) → ∞.

Case (ii): since z^T x ≤ ||x|| ||z||_*,

    x^T z − ||x|| ≤ ||x|| ( ||z||_* − 1 ) ≤ 0.

Here x = 0 maximizes ||x|| ( ||z||_* − 1 ), hence f*(z) = 0.

Thus f*(z) = +∞ in case (i) and 0 in case (ii), as desired.
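In one dimension (f(x) = |x|, dual ball [−1, 1]) the two cases can be seen numerically; a rough sketch, where conj_abs and the truncation radius R are illustrative names (the sup is taken over a bounded grid, so "blows up" shows as growth proportional to R):

```python
import numpy as np

def conj_abs(z, R=100.0, n=200_001):
    """Approximate f*(z) = sup_x (x*z - |x|) for f(x) = |x|, over x in [-R, R]."""
    x = np.linspace(-R, R, n)           # grid includes 0 exactly (n is odd)
    return np.max(x * z - np.abs(x))

print(conj_abs(0.5))   # case (ii): |z| <= 1, sup attained at x = 0, value 0
print(conj_abs(2.0))   # case (i): |z| > 1, value grows like (|z| - 1) * R
```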

Fenchel conjugate

Example. f(x) = ax + b; then

    f*(z) = sup_x ( zx − (ax + b) ) = +∞, if z − a ≠ 0.

Thus, dom f* = {a}, and f*(a) = −b.

Example. Let a ≥ 0, and set f(x) = −(a² − x²)^{1/2} if |x| ≤ a, and +∞ otherwise. Then f*(z) = a (1 + z²)^{1/2}.

Example. f(x) = (1/2) x^T A x, where A ≻ 0. Then f*(z) = (1/2) z^T A^{-1} z.

Exercise: If f(x) = max(0, 1 − x), then dom f* is [−1, 0], and within this domain, f*(z) = z.
Hint: Analyze the cases max(0, 1 − x) = 0 and max(0, 1 − x) = 1 − x.
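The quadratic example can be checked directly: the inner maximization sup_x ( z^T x − (1/2) x^T A x ) is solved by setting its gradient z − Ax to zero. A NumPy sketch (the random construction of A is just a convenient way to get A ≻ 0):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)     # positive definite A
z = rng.standard_normal(4)

# sup_x ( z^T x - 0.5 x^T A x ) is attained where the gradient z - A x vanishes
x_star = np.linalg.solve(A, z)
f_conj_via_sup = z @ x_star - 0.5 * x_star @ (A @ x_star)
f_conj_formula = 0.5 * z @ np.linalg.solve(A, z)

assert np.isclose(f_conj_via_sup, f_conj_formula)
```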

Subdifferentials

First-order global underestimator

If f is convex and differentiable, its first-order Taylor expansion is a global underestimator:

    f(x) ≥ f(y) + ⟨∇f(y), x − y⟩.

More generally, a vector g gives a global underestimator at y via

    f(x) ≥ f(y) + ⟨g, x − y⟩.

(Figure: the graph of f with several supporting lines f(y) + ⟨g_i, x − y⟩ at a point y; the slopes g_1, g_2, g_3 are all subgradients at y.)

Subgradients: basic facts

- If f is convex and differentiable: ∇f(y) is the unique subgradient at y
- A vector g is a subgradient at a point y if and only if f(y) + ⟨g, x − y⟩ is a global underestimator of f(x)
- Usually, one subgradient costs approximately as much as evaluating f(x)
- Determining all subgradients at a given point is difficult; subgradient calculus is a great achievement in convex analysis
- Without convexity, things become wild! (an advanced-course topic)

Subgradients: example

f(x) := max(f_1(x), f_2(x)); both f_1, f_2 convex and differentiable.

(Figure: the graphs of f_1 and f_2, with f their pointwise max; the two graphs cross at a point y.)

- If f_1(x) > f_2(x): the unique subgradient of f at x is ∇f_1(x)
- If f_1(x) < f_2(x): the unique subgradient of f at x is ∇f_2(x)
- If f_1(y) = f_2(y): the subgradients form the segment [∇f_1(y), ∇f_2(y)] (imagine all the supporting lines turning about the point y)

Subgradients

Def. A vector g ∈ R^n is called a subgradient at a point y if, for all x ∈ dom f, it holds that

    f(x) ≥ f(y) + ⟨g, x − y⟩.

Subdifferential

Def. The set of all subgradients at y is denoted ∂f(y). This set is called the subdifferential of f at y.

If f is convex, ∂f(x) is nice:
- If x is in the relative interior of dom f, then ∂f(x) is nonempty
- If f is differentiable at x, then ∂f(x) = {∇f(x)}
- If ∂f(x) = {g}, then f is differentiable at x and g = ∇f(x)

Subdifferential example

f(x) = |x|.

(Figure: the graph of |x|, with slope −1 to the left of 0 and +1 to the right.)

    ∂f(x) = {−1}      x < 0,
            {+1}      x > 0,
            [−1, 1]   x = 0.
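In code, "one subgradient of |x|" is just a sign function with an arbitrary valid choice at 0; a minimal sketch (subgrad_abs is a hypothetical helper name):

```python
def subgrad_abs(x):
    """Return one subgradient of f(x) = |x|; at 0, any g in [-1, 1] is valid."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0   # arbitrary valid choice from [-1, 1]

# The subgradient inequality |z| >= |x| + g*(z - x) holds for every z:
x, g = 0.0, subgrad_abs(0.0)
assert all(abs(z) >= abs(x) + g * (z - x) for z in [-2.0, -0.5, 0.0, 1.0, 3.0])
```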

More examples

Example. f(x) = ||x||_2. Then

    ∂f(x) = { x / ||x||_2 }        if x ≠ 0,
    ∂f(0) = { z : ||z||_2 ≤ 1 }.

Proof (for x = 0). g ∈ ∂f(0) means ||z||_2 ≥ ||0||_2 + ⟨g, z − 0⟩, i.e.

    ||z||_2 ≥ ⟨g, z⟩   for all z,

which holds if and only if ||g||_2 ≤ 1 (take z = g to see necessity).
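The same formula in NumPy; a sketch (subgrad_l2 is an illustrative name; at 0 we return the zero vector, one valid choice from the unit ball):

```python
import numpy as np

def subgrad_l2(x):
    """One subgradient of f(x) = ||x||_2; at 0 the whole unit ball is valid."""
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.zeros_like(x)

x = np.array([3.0, 4.0])
g = subgrad_l2(x)                 # a unit vector: x / ||x||_2
rng = np.random.default_rng(2)
for z in rng.standard_normal((5, 2)):
    # subgradient inequality: ||z||_2 >= ||x||_2 + <g, z - x>
    assert np.linalg.norm(z) >= np.linalg.norm(x) + g @ (z - x) - 1e-12
```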

More examples

Example. A convex function need not be subdifferentiable everywhere. Let

    f(x) := −(1 − ||x||_2²)^{1/2}   if ||x||_2 ≤ 1,   +∞ otherwise.

f is differentiable for all x with ||x||_2 < 1, but ∂f(x) = ∅ whenever ||x||_2 ≥ 1.

Calculus

Recall basic calculus

If f and k are differentiable, we know that:
- Addition: ∇(f + k)(x) = ∇f(x) + ∇k(x)
- Scaling: ∇(αf)(x) = α ∇f(x)

Chain rule. If f : R^n → R^m and k : R^m → R^p, let h : R^n → R^p be the composition h(x) = (k ∘ f)(x) = k(f(x)). Then

    Dh(x) = Dk(f(x)) Df(x).

Example. If f : R^n → R and k : R → R, then using the fact that ∇h(x) = [Dh(x)]^T, we obtain

    ∇h(x) = k'(f(x)) ∇f(x).

Subgradient calculus

- Finding one subgradient within ∂f(x)
- Determining the entire subdifferential ∂f(x) at a point x
- Do we have the chain rule? Usually not easy!

Subgradient calculus

- If f is differentiable: ∂f(x) = {∇f(x)}
- Scaling: for α > 0, ∂(αf)(x) = α ∂f(x) = {αg : g ∈ ∂f(x)}
- Addition: ∂(f + k)(x) = ∂f(x) + ∂k(x) (set addition)
- Chain rule: let A ∈ R^{m×n}, b ∈ R^m, f : R^m → R, and h : R^n → R be given by h(x) = f(Ax + b). Then ∂h(x) = A^T ∂f(Ax + b)
- Chain rule: h = f ∘ k, where k : X → Y is differentiable:
      ∂h(x) = ∂f(k(x)) Dk(x) = [Dk(x)]^T ∂f(k(x))
- Max function: if f(x) := max_{1 ≤ i ≤ m} f_i(x), then
      ∂f(x) = conv ∪ { ∂f_i(x) : f_i(x) = f(x) },
  the convex hull of the subdifferentials of the active functions at x
- Conjugation: z ∈ ∂f(x) if and only if x ∈ ∂f*(z)
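The max-function rule suggests the standard recipe for computing one subgradient of a finite max: evaluate all pieces, pick any active index, and return that piece's gradient. A sketch under the assumption that each f_i is differentiable (subgrad_max and the toy pieces are illustrative):

```python
import numpy as np

def subgrad_max(x, fs, grads):
    """One subgradient of f = max_i f_i (each f_i convex, differentiable):
    the gradient of any active piece (any convex combination also works)."""
    i = int(np.argmax([f(x) for f in fs]))   # an active index
    return grads[i](x)

# f(x) = max(x^2, 2x - 1); the two pieces touch at x = 1
fs = [lambda x: x**2, lambda x: 2 * x - 1]
grads = [lambda x: 2 * x, lambda x: 2.0]
assert subgrad_max(0.0, fs, grads) == 0.0   # x^2 is active, gradient 2x = 0
assert subgrad_max(1.0, fs, grads) == 2.0   # both active; either slope (= 2) is valid
```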

Examples

It can happen that ∂(f_1 + f_2) ≠ ∂f_1 + ∂f_2.

Example. Define f_1 and f_2 by

    f_1(x) := −2√x      if x ≥ 0,   +∞ if x < 0,
    f_2(x) := −2√(−x)   if x ≤ 0,   +∞ if x > 0.

Then f = f_1 + f_2 = I_{{0}}, whereby ∂f(0) = R. But ∂f_1(0) = ∂f_2(0) = ∅ (the one-sided slopes are infinite at 0).

However, ∂f_1(x) + ∂f_2(x) ⊆ ∂(f_1 + f_2)(x) always holds.

Examples

Example. f(x) = ||x||_∞. Then

    ∂f(0) = conv {±e_1, ..., ±e_n},

where e_i is the i-th canonical basis vector (column of the identity matrix).

To prove this, notice that f(x) = max_{1 ≤ i ≤ n} |e_i^T x|.
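A subgradient of ||x||_∞ can be computed from any coordinate attaining the max, matching the conv{±e_i} picture at 0; a sketch (subgrad_linf is an illustrative name):

```python
import numpy as np

def subgrad_linf(x):
    """One subgradient of f(x) = ||x||_inf."""
    if not np.any(x):
        return np.zeros_like(x)        # 0 = (e_1 + (-e_1))/2 lies in conv{±e_i}
    i = int(np.argmax(np.abs(x)))      # a coordinate attaining the max
    g = np.zeros_like(x)
    g[i] = np.sign(x[i])               # gradient of the active term ±(e_i^T x)
    return g

x = np.array([0.5, -2.0, 1.0])
g = subgrad_linf(x)                    # picks coordinate 1: g = (0, -1, 0)
rng = np.random.default_rng(3)
for z in rng.standard_normal((5, 3)):
    assert np.max(np.abs(z)) >= np.max(np.abs(x)) + g @ (z - x) - 1e-12
```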

Example (S. Boyd)

Example. Let f(x) = max { s^T x : s_i ∈ {−1, 1} } (a max over 2^n members); this is just ||x||_1.

(Figures, for n = 2: ∂f at x = (0, 0) is the square conv{(±1, ±1)}; ∂f at x = (1, 0) is the vertical segment {1} × [−1, 1]; ∂f at x = (1, 1) is the single point {(1, 1)}.)

Rules for subgradients

Subgradient for pointwise sup

    f(x) := sup_{y ∈ Y} h(x, y)

Getting ∂f(x) is complicated! A simple way to obtain some g ∈ ∂f(x):
- Pick any y* for which h(x, y*) = f(x)
- Pick any subgradient g ∈ ∂_x h(x, y*)
- This g ∈ ∂f(x):

      h(z, y*) ≥ h(x, y*) + g^T (z − x)
      h(z, y*) ≥ f(x) + g^T (z − x)
      f(z) ≥ h(z, y*)          (because of the sup)
      ⟹ f(z) ≥ f(x) + g^T (z − x).
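The three-step recipe above, sketched for a toy family h(x, y) = yx − y²/2 over a finite grid Y (all names here are illustrative; h is linear in x, so ∂_x h(x, y*) = {y*}):

```python
import numpy as np

Y = np.linspace(-2.0, 2.0, 9)                 # finite index set for the sup

def f(x):
    return np.max(Y * x - Y**2 / 2)           # f(x) = max_y h(x, y), convex in x

def subgrad_f(x):
    y_star = Y[np.argmax(Y * x - Y**2 / 2)]   # step 1: a maximizing y*
    return y_star                             # step 2: d/dx h(x, y*) = y*

x, g = 0.7, subgrad_f(0.7)
for z in np.linspace(-3.0, 3.0, 13):          # step 3: g is a global subgradient
    assert f(z) >= f(x) + g * (z - x) - 1e-12
```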

Example

Suppose a_i ∈ R^n and b_i ∈ R, and

    f(x) := max_{1 ≤ i ≤ n} ( a_i^T x + b_i ).

- This f is a max (in fact, over a finite number of terms)
- Suppose f(x) = a_k^T x + b_k for some index k
- The active function is f_k(x) = a_k^T x + b_k, and ∂f_k(x) = {∇f_k(x)}
- Hence, g = a_k ∈ ∂f(x) works!
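The same idea as a function: stack the a_i as rows of a matrix and return the row of any active term (a sketch; subgrad_max_affine and the data are illustrative):

```python
import numpy as np

def subgrad_max_affine(x, A, b):
    """One subgradient of f(x) = max_i (a_i^T x + b_i): return a_k for an active k."""
    k = int(np.argmax(A @ x + b))
    return A[k]

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.5, 0.0])
x = np.array([2.0, 1.0])
g = subgrad_max_affine(x, A, b)               # the first row is active here

f = lambda v: np.max(A @ v + b)
rng = np.random.default_rng(4)
for z in rng.standard_normal((5, 2)):
    assert f(z) >= f(x) + g @ (z - x) - 1e-12
```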

Subgradient of expectation

Suppose f = E f(x, u), where f(·, u) is convex in x for each u (a random variable):

    f(x) := ∫ f(x, u) p(u) du.

- For each u, choose any g(x, u) ∈ ∂_x f(x, u)
- Then, g = ∫ g(x, u) p(u) du = E g(x, u) ∈ ∂f(x)
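A Monte Carlo version of this rule, for the illustrative choice f(x, u) = |x − u| with u ~ N(0, 1): sign(x − u) is a subgradient of |x − u| in x, so its empirical mean approximates a subgradient of E|x − u|.

```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.standard_normal(200_000)              # samples of the random variable u

def f(x):                                     # Monte Carlo estimate of E|x - u|
    return np.mean(np.abs(x - u))

def subgrad(x):                               # Monte Carlo estimate of E sign(x - u)
    return np.mean(np.sign(x - u))

x = 0.5
g = subgrad(x)                                # analytically E sign(0.5 - u) ≈ 0.383
for z in [-1.0, 0.0, 1.0, 2.0]:               # subgradient inequality, up to MC noise
    assert f(z) >= f(x) + g * (z - x) - 1e-2
```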

Subgradient of composition

Suppose h : R^n → R is convex and nondecreasing, and each f_i is convex:

    f(x) := h(f_1(x), f_2(x), ..., f_n(x)).

To find a vector g ∈ ∂f(x), we may:
- For i = 1 to n, compute g_i ∈ ∂f_i(x)
- Compute u ∈ ∂h(f_1(x), ..., f_n(x))
- Set g = u_1 g_1 + u_2 g_2 + ... + u_n g_n; this g ∈ ∂f(x)

Compare with the smooth case ∇f(x) = J^T ∇h(f_1(x), ..., f_n(x)), where J is the Jacobian whose rows are the ∇f_i(x)^T.

Exercise: Verify that g ∈ ∂f(x) by showing f(z) ≥ f(x) + g^T (z − x).
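The recipe, sketched for the illustrative choice h(y) = log(exp(y_1) + exp(y_2)) (convex and nondecreasing in each argument, with ∇h the softmax) and f_i(x) = |x_i|:

```python
import numpy as np

def f(x):
    # h(f_1(x), f_2(x)) = logsumexp(|x_1|, |x_2|): convex by the composition rule
    return np.log(np.sum(np.exp(np.abs(x))))

def subgrad(x):
    u = np.exp(np.abs(x)) / np.sum(np.exp(np.abs(x)))   # u = grad h (softmax weights)
    g_inner = np.sign(x)                                # g_i, a subgradient of |x_i|
    return u * g_inner                                  # g = sum_i u_i g_i

x = np.array([1.0, -0.5])
g = subgrad(x)
rng = np.random.default_rng(6)
for z in rng.standard_normal((5, 2)):
    assert f(z) >= f(x) + g @ (z - x) - 1e-12
```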

References

1. R. T. Rockafellar. Convex Analysis.
2. S. Boyd (Stanford). EE364b Lecture Notes.


More information

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45 Division of the Humanities and Social Sciences Supergradients KC Border Fall 2001 1 The supergradient of a concave function There is a useful way to characterize the concavity of differentiable functions.

More information

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016 Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)

More information

Convex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)

Convex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs) ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexity-preserving operations Convex envelopes, cardinality

More information

Dedicated to Michel Théra in honor of his 70th birthday

Dedicated to Michel Théra in honor of his 70th birthday VARIATIONAL GEOMETRIC APPROACH TO GENERALIZED DIFFERENTIAL AND CONJUGATE CALCULI IN CONVEX ANALYSIS B. S. MORDUKHOVICH 1, N. M. NAM 2, R. B. RECTOR 3 and T. TRAN 4. Dedicated to Michel Théra in honor of

More information

Dual Decomposition.

Dual Decomposition. 1/34 Dual Decomposition http://bicmr.pku.edu.cn/~wenzw/opt-2017-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/34 1 Conjugate function 2 introduction:

More information

GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, Dedicated to Franco Giannessi and Diethard Pallaschke with great respect

GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, Dedicated to Franco Giannessi and Diethard Pallaschke with great respect GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, 2018 BORIS S. MORDUKHOVICH 1 and NGUYEN MAU NAM 2 Dedicated to Franco Giannessi and Diethard Pallaschke with great respect Abstract. In

More information

In Progress: Summary of Notation and Basic Results Convex Analysis C&O 663, Fall 2009

In Progress: Summary of Notation and Basic Results Convex Analysis C&O 663, Fall 2009 In Progress: Summary of Notation and Basic Results Convex Analysis C&O 663, Fall 2009 Intructor: Henry Wolkowicz November 26, 2009 University of Waterloo Department of Combinatorics & Optimization Abstract

More information

Convex analysis and profit/cost/support functions

Convex analysis and profit/cost/support functions Division of the Humanities and Social Sciences Convex analysis and profit/cost/support functions KC Border October 2004 Revised January 2009 Let A be a subset of R m Convex analysts may give one of two

More information

Continuity of convex functions in normed spaces

Continuity of convex functions in normed spaces Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional

More information

Lecture 2: Convex functions

Lecture 2: Convex functions Lecture 2: Convex functions f : R n R is convex if dom f is convex and for all x, y dom f, θ [0, 1] f is concave if f is convex f(θx + (1 θ)y) θf(x) + (1 θ)f(y) x x convex concave neither x examples (on

More information

Helly's Theorem and its Equivalences via Convex Analysis

Helly's Theorem and its Equivalences via Convex Analysis Portland State University PDXScholar University Honors Theses University Honors College 2014 Helly's Theorem and its Equivalences via Convex Analysis Adam Robinson Portland State University Let us know

More information

IE 521 Convex Optimization Homework #1 Solution

IE 521 Convex Optimization Homework #1 Solution IE 521 Convex Optimization Homework #1 Solution your NAME here your NetID here February 13, 2019 Instructions. Homework is due Wednesday, February 6, at 1:00pm; no late homework accepted. Please use the

More information

minimize x x2 2 x 1x 2 x 1 subject to x 1 +2x 2 u 1 x 1 4x 2 u 2, 5x 1 +76x 2 1,

minimize x x2 2 x 1x 2 x 1 subject to x 1 +2x 2 u 1 x 1 4x 2 u 2, 5x 1 +76x 2 1, 4 Duality 4.1 Numerical perturbation analysis example. Consider the quadratic program with variables x 1, x 2, and parameters u 1, u 2. minimize x 2 1 +2x2 2 x 1x 2 x 1 subject to x 1 +2x 2 u 1 x 1 4x

More information

Hence, (f(x) f(x 0 )) 2 + (g(x) g(x 0 )) 2 < ɛ

Hence, (f(x) f(x 0 )) 2 + (g(x) g(x 0 )) 2 < ɛ Matthew Straughn Math 402 Homework 5 Homework 5 (p. 429) 13.3.5, 13.3.6 (p. 432) 13.4.1, 13.4.2, 13.4.7*, 13.4.9 (p. 448-449) 14.2.1, 14.2.2 Exercise 13.3.5. Let (X, d X ) be a metric space, and let f

More information

Lecture 3. Optimization Problems and Iterative Algorithms

Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

More information

Convex Optimization Theory. Chapter 4 Exercises and Solutions: Extended Version

Convex Optimization Theory. Chapter 4 Exercises and Solutions: Extended Version Convex Optimization Theory Chapter 4 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

Convex Functions. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Convex Functions. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Convex Functions Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Definition convex function Examples

More information

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1, Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,

More information

Math 361: Homework 1 Solutions

Math 361: Homework 1 Solutions January 3, 4 Math 36: Homework Solutions. We say that two norms and on a vector space V are equivalent or comparable if the topology they define on V are the same, i.e., for any sequence of vectors {x

More information

Lagrangian Duality and Convex Optimization

Lagrangian Duality and Convex Optimization Lagrangian Duality and Convex Optimization David Rosenberg New York University February 11, 2015 David Rosenberg (New York University) DS-GA 1003 February 11, 2015 1 / 24 Introduction Why Convex Optimization?

More information

FALL 2018 MATH 4211/6211 Optimization Homework 1

FALL 2018 MATH 4211/6211 Optimization Homework 1 FALL 2018 MATH 4211/6211 Optimization Homework 1 This homework assignment is open to textbook, reference books, slides, and online resources, excluding any direct solution to the problem (such as solution

More information

Subdifferential representation of convex functions: refinements and applications

Subdifferential representation of convex functions: refinements and applications Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential

More information

Functions of Several Variables

Functions of Several Variables Jim Lambers MAT 419/519 Summer Session 2011-12 Lecture 2 Notes These notes correspond to Section 1.2 in the text. Functions of Several Variables We now generalize the results from the previous section,

More information

Convex Optimization Conjugate, Subdifferential, Proximation

Convex Optimization Conjugate, Subdifferential, Proximation 1 Lecture Notes, HCI, 3.11.211 Chapter 6 Convex Optimization Conjugate, Subdifferential, Proximation Bastian Goldlücke Computer Vision Group Technical University of Munich 2 Bastian Goldlücke Overview

More information

LECTURE 4 LECTURE OUTLINE

LECTURE 4 LECTURE OUTLINE LECTURE 4 LECTURE OUTLINE Relative interior and closure Algebra of relative interiors and closures Continuity of convex functions Closures of functions Reading: Section 1.3 All figures are courtesy of

More information

8. Geometric problems

8. Geometric problems 8. Geometric problems Convex Optimization Boyd & Vandenberghe extremal volume ellipsoids centering classification placement and facility location 8 Minimum volume ellipsoid around a set Löwner-John ellipsoid

More information

Math 273a: Optimization Subgradient Methods

Math 273a: Optimization Subgradient Methods Math 273a: Optimization Subgradient Methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Nonsmooth convex function Recall: For ˉx R n, f(ˉx) := {g R

More information

Lecture 8. Strong Duality Results. September 22, 2008

Lecture 8. Strong Duality Results. September 22, 2008 Strong Duality Results September 22, 2008 Outline Lecture 8 Slater Condition and its Variations Convex Objective with Linear Inequality Constraints Quadratic Objective over Quadratic Constraints Representation

More information

Convex Optimization Lecture 6: KKT Conditions, and applications

Convex Optimization Lecture 6: KKT Conditions, and applications Convex Optimization Lecture 6: KKT Conditions, and applications Dr. Michel Baes, IFOR / ETH Zürich Quick recall of last week s lecture Various aspects of convexity: The set of minimizers is convex. Convex

More information

The sum of two maximal monotone operator is of type FPV

The sum of two maximal monotone operator is of type FPV CJMS. 5(1)(2016), 17-21 Caspian Journal of Mathematical Sciences (CJMS) University of Mazandaran, Iran http://cjms.journals.umz.ac.ir ISSN: 1735-0611 The sum of two maximal monotone operator is of type

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

Midterm 1 Review. Distance = (x 1 x 0 ) 2 + (y 1 y 0 ) 2.

Midterm 1 Review. Distance = (x 1 x 0 ) 2 + (y 1 y 0 ) 2. Midterm 1 Review Comments about the midterm The midterm will consist of five questions and will test on material from the first seven lectures the material given below. No calculus either single variable

More information

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

More information

MAXIMALITY OF SUMS OF TWO MAXIMAL MONOTONE OPERATORS

MAXIMALITY OF SUMS OF TWO MAXIMAL MONOTONE OPERATORS MAXIMALITY OF SUMS OF TWO MAXIMAL MONOTONE OPERATORS JONATHAN M. BORWEIN, FRSC Abstract. We use methods from convex analysis convex, relying on an ingenious function of Simon Fitzpatrick, to prove maximality

More information

EE Applications of Convex Optimization in Signal Processing and Communications Dr. Andre Tkacenko, JPL Third Term

EE Applications of Convex Optimization in Signal Processing and Communications Dr. Andre Tkacenko, JPL Third Term EE 150 - Applications of Convex Optimization in Signal Processing and Communications Dr. Andre Tkacenko JPL Third Term 2011-2012 Due on Thursday May 3 in class. Homework Set #4 1. (10 points) (Adapted

More information

A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06

A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06 A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06 CHRISTIAN LÉONARD Contents Preliminaries 1 1. Convexity without topology 1 2. Convexity

More information

5. Subgradient method

5. Subgradient method L. Vandenberghe EE236C (Spring 2016) 5. Subgradient method subgradient method convergence analysis optimal step size when f is known alternating projections optimality 5-1 Subgradient method to minimize

More information

Chapter 2 Convex Analysis

Chapter 2 Convex Analysis Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

More information

Convex optimization COMS 4771

Convex optimization COMS 4771 Convex optimization COMS 4771 1. Recap: learning via optimization Soft-margin SVMs Soft-margin SVM optimization problem defined by training data: w R d λ 2 w 2 2 + 1 n n [ ] 1 y ix T i w. + 1 / 15 Soft-margin

More information

Convex Functions. Wing-Kin (Ken) Ma The Chinese University of Hong Kong (CUHK)

Convex Functions. Wing-Kin (Ken) Ma The Chinese University of Hong Kong (CUHK) Convex Functions Wing-Kin (Ken) Ma The Chinese University of Hong Kong (CUHK) Course on Convex Optimization for Wireless Comm. and Signal Proc. Jointly taught by Daniel P. Palomar and Wing-Kin (Ken) Ma

More information

9. Geometric problems

9. Geometric problems 9. Geometric problems EE/AA 578, Univ of Washington, Fall 2016 projection on a set extremal volume ellipsoids centering classification 9 1 Projection on convex set projection of point x on set C defined

More information

Linear and non-linear programming

Linear and non-linear programming Linear and non-linear programming Benjamin Recht March 11, 2005 The Gameplan Constrained Optimization Convexity Duality Applications/Taxonomy 1 Constrained Optimization minimize f(x) subject to g j (x)

More information

ALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS

ALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS ALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS Mau Nam Nguyen (joint work with D. Giles and R. B. Rector) Fariborz Maseeh Department of Mathematics and Statistics Portland State

More information

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

1 Sparsity and l 1 relaxation

1 Sparsity and l 1 relaxation 6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

Assignment 1: From the Definition of Convexity to Helley Theorem

Assignment 1: From the Definition of Convexity to Helley Theorem Assignment 1: From the Definition of Convexity to Helley Theorem Exercise 1 Mark in the following list the sets which are convex: 1. {x R 2 : x 1 + i 2 x 2 1, i = 1,..., 10} 2. {x R 2 : x 2 1 + 2ix 1x

More information

Convex Analysis and Economic Theory Fall 2018 Winter 2019

Convex Analysis and Economic Theory Fall 2018 Winter 2019 Division of the Humanities and Social Sciences Ec 181 KC Border Convex Analysis and Economic Theory Fall 2018 Winter 2019 Topic 18: Differentiability 18.1 Differentiable functions In this section I want

More information

Final exam.

Final exam. EE364b Convex Optimization II June 018 Prof. John C. Duchi Final exam By now, you know how it works, so we won t repeat it here. (If not, see the instructions for the EE364a final exam.) Since you have

More information

Math 273a: Optimization Convex Conjugacy

Math 273a: Optimization Convex Conjugacy Math 273a: Optimization Convex Conjugacy Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Convex conjugate (the Legendre transform) Let f be a closed proper

More information

converges as well if x < 1. 1 x n x n 1 1 = 2 a nx n

converges as well if x < 1. 1 x n x n 1 1 = 2 a nx n Solve the following 6 problems. 1. Prove that if series n=1 a nx n converges for all x such that x < 1, then the series n=1 a n xn 1 x converges as well if x < 1. n For x < 1, x n 0 as n, so there exists

More information

g 2 (x) (1/3)M 1 = (1/3)(2/3)M.

g 2 (x) (1/3)M 1 = (1/3)(2/3)M. COMPACTNESS If C R n is closed and bounded, then by B-W it is sequentially compact: any sequence of points in C has a subsequence converging to a point in C Conversely, any sequentially compact C R n is

More information

Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control

Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control RTyrrell Rockafellar and Peter R Wolenski Abstract This paper describes some recent results in Hamilton- Jacobi theory

More information

8. Geometric problems

8. Geometric problems 8. Geometric problems Convex Optimization Boyd & Vandenberghe extremal volume ellipsoids centering classification placement and facility location 8 1 Minimum volume ellipsoid around a set Löwner-John ellipsoid

More information

Homework 2. Convex Optimization /36-725

Homework 2. Convex Optimization /36-725 Homework 2 Convex Optimization 0-725/36-725 Due Monday October 3 at 5:30pm submitted to Christoph Dann in Gates 803 Remember to a submit separate writeup for each problem, with your name at the top) Total:

More information

Definition of convex function in a vector space

Definition of convex function in a vector space Convex functions Nonlinear optimization Instituto Superior Técnico and Carnegie Mellon University PhD course João Xavier TAs: Brian Swenson, Shanghang Zhang, Lucas Balthazar Convex functions Nonconvex

More information

Math 328 Course Notes

Math 328 Course Notes Math 328 Course Notes Ian Robertson March 3, 2006 3 Properties of C[0, 1]: Sup-norm and Completeness In this chapter we are going to examine the vector space of all continuous functions defined on the

More information

4 Linear operators and linear functionals

4 Linear operators and linear functionals 4 Linear operators and linear functionals The next section is devoted to studying linear operators between normed spaces. Definition 4.1. Let V and W be normed spaces over a field F. We say that T : V

More information

What can be expressed via Conic Quadratic and Semidefinite Programming?

What can be expressed via Conic Quadratic and Semidefinite Programming? What can be expressed via Conic Quadratic and Semidefinite Programming? A. Nemirovski Faculty of Industrial Engineering and Management Technion Israel Institute of Technology Abstract Tremendous recent

More information

MATHEMATICAL ECONOMICS: OPTIMIZATION. Contents

MATHEMATICAL ECONOMICS: OPTIMIZATION. Contents MATHEMATICAL ECONOMICS: OPTIMIZATION JOÃO LOPES DIAS Contents 1. Introduction 2 1.1. Preliminaries 2 1.2. Optimal points and values 2 1.3. The optimization problems 3 1.4. Existence of optimal points 4

More information

SOME REMARKS ON THE SPACE OF DIFFERENCES OF SUBLINEAR FUNCTIONS

SOME REMARKS ON THE SPACE OF DIFFERENCES OF SUBLINEAR FUNCTIONS APPLICATIONES MATHEMATICAE 22,3 (1994), pp. 419 426 S. G. BARTELS and D. PALLASCHKE (Karlsruhe) SOME REMARKS ON THE SPACE OF DIFFERENCES OF SUBLINEAR FUNCTIONS Abstract. Two properties concerning the space

More information

Duality Theory of Constrained Optimization

Duality Theory of Constrained Optimization Duality Theory of Constrained Optimization Robert M. Freund April, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive

More information

WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE

WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE Fixed Point Theory, Volume 6, No. 1, 2005, 59-69 http://www.math.ubbcluj.ro/ nodeacj/sfptcj.htm WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE YASUNORI KIMURA Department

More information

Online Convex Optimization

Online Convex Optimization Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed

More information

h(x) lim H(x) = lim Since h is nondecreasing then h(x) 0 for all x, and if h is discontinuous at a point x then H(x) > 0. Denote

h(x) lim H(x) = lim Since h is nondecreasing then h(x) 0 for all x, and if h is discontinuous at a point x then H(x) > 0. Denote Real Variables, Fall 4 Problem set 4 Solution suggestions Exercise. Let f be of bounded variation on [a, b]. Show that for each c (a, b), lim x c f(x) and lim x c f(x) exist. Prove that a monotone function

More information

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014 Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

More information

MATH 680 Fall November 27, Homework 3

MATH 680 Fall November 27, Homework 3 MATH 680 Fall 208 November 27, 208 Homework 3 This homework is due on December 9 at :59pm. Provide both pdf, R files. Make an individual R file with proper comments for each sub-problem. Subgradients and

More information

Lecture 7: September 17

Lecture 7: September 17 10-725: Optimization Fall 2013 Lecture 7: September 17 Lecturer: Ryan Tibshirani Scribes: Serim Park,Yiming Gu 7.1 Recap. The drawbacks of Gradient Methods are: (1) requires f is differentiable; (2) relatively

More information