Optimization and Machine Learning, Chih-Jen Lin (National Taiwan Univ.)

Transcription

1-5: Least-squares I

$A \in R^{k \times n}$. Usually $k > n$; otherwise the minimum is easily zero. Analytical solution:
$$f(x) = (Ax - b)^T (Ax - b) = x^T A^T A x - 2 b^T A x + b^T b$$
$$\nabla f(x) = 2 A^T A x - 2 A^T b = 0$$
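
Below is a minimal numerical sketch (not from the slides) of this analytical solution: the normal equations $A^T A x = A^T b$ obtained by setting the gradient to zero, checked against NumPy's least-squares solver. The dimensions and random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 20, 5                       # k > n: overdetermined system
A = rng.standard_normal((k, n))
b = rng.standard_normal(k)

# Setting grad f(x) = 2 A^T A x - 2 A^T b to zero gives the normal equations.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Compare with a library least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_normal, x_lstsq))  # True
```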

1-5: Least-squares II

Regularization, weights:
$$\frac{1}{2}\lambda x^T x + w_1 (Ax - b)_1^2 + \cdots + w_k (Ax - b)_k^2$$

2-4: Convex Combination and Convex Hull I

Convex hull is convex. Let
$$x = \theta_1 x_1 + \cdots + \theta_k x_k,\quad \bar x = \bar\theta_1 \bar x_1 + \cdots + \bar\theta_k \bar x_k$$
Then
$$\alpha x + (1-\alpha)\bar x = \alpha\theta_1 x_1 + \cdots + \alpha\theta_k x_k + (1-\alpha)\bar\theta_1 \bar x_1 + \cdots + (1-\alpha)\bar\theta_k \bar x_k$$

2-4: Convex Combination and Convex Hull II

Each coefficient is nonnegative, and
$$\alpha\theta_1 + \cdots + \alpha\theta_k + (1-\alpha)\bar\theta_1 + \cdots + (1-\alpha)\bar\theta_k = \alpha + (1-\alpha) = 1$$

2-7: Euclidean Balls and Ellipsoid I

We prove that any $x = x_c + Au$ with $\|u\|_2 \le 1$ satisfies
$$(x - x_c)^T P^{-1}(x - x_c) \le 1$$
Take $A = P^{1/2}$, which exists because $P$ is symmetric positive definite. Then
$$u^T A^T P^{-1} A u = u^T P^{1/2} P^{-1} P^{1/2} u \le 1$$

2-10: Positive Semidefinite Cone I

$S^n_+$ is a convex cone. For any $\theta_1 \ge 0$, $\theta_2 \ge 0$, $X_1, X_2 \in S^n_+$, and any $z$,
$$z^T(\theta_1 X_1 + \theta_2 X_2)z = \theta_1 z^T X_1 z + \theta_2 z^T X_2 z \ge 0$$

2-10: Positive Semidefinite Cone II

Example:
$$\begin{bmatrix} x & y \\ y & z \end{bmatrix} \in S^2_+$$
is equivalent to
$$x \ge 0,\quad z \ge 0,\quad xz - y^2 \ge 0$$
If $x > 0$ (or $z > 0$) is fixed, we can see that $z \ge y^2/x$ has a parabolic shape.
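
As a quick illustration (random data, my own check rather than the slides'), the eigenvalue test for membership in $S^2_+$ agrees with the scalar conditions $x \ge 0$, $z \ge 0$, $xz - y^2 \ge 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y, z = rng.uniform(-1, 1, 3)
    # Skip near-boundary draws where floating-point tolerances could disagree.
    if min(abs(x), abs(z), abs(x * z - y * y)) < 1e-9:
        continue
    M = np.array([[x, y], [y, z]])
    psd = bool(np.all(np.linalg.eigvalsh(M) >= -1e-12))
    assert psd == (x >= 0 and z >= 0 and x * z - y * y >= 0)
print("ok")
```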

2-12: Intersection I

When $t$ is fixed,
$$\{(x_1, x_2) \mid -1 \le x_1 \cos t + x_2 \cos 2t \le 1\}$$
gives a region between two parallel lines. This region is convex.

2-13: Affine Function I

$f(S)$ is convex: let $f(x_1), f(x_2) \in f(S)$. Then
$$\alpha f(x_1) + (1-\alpha)f(x_2) = \alpha(Ax_1 + b) + (1-\alpha)(Ax_2 + b) = A(\alpha x_1 + (1-\alpha)x_2) + b \in f(S)$$

2-13: Affine Function II

$f^{-1}(C)$ convex: $x_1, x_2 \in f^{-1}(C)$ means that
$$Ax_1 + b \in C,\quad Ax_2 + b \in C$$
Because $C$ is convex,
$$\alpha(Ax_1 + b) + (1-\alpha)(Ax_2 + b) = A(\alpha x_1 + (1-\alpha)x_2) + b \in C$$
Thus $\alpha x_1 + (1-\alpha)x_2 \in f^{-1}(C)$.

2-13: Affine Function III

Scaling:
$$\alpha S = \{\alpha x \mid x \in S\}$$
Translation:
$$S + a = \{x + a \mid x \in S\}$$
Projection:
$$T = \{x_1 \in R^m \mid (x_1, x_2) \in S \text{ for some } x_2 \in R^n\},\quad S \subseteq R^m \times R^n$$
Scaling, translation, and projection are all affine functions.

2-13: Affine Function IV

For example, for projection
$$f(x) = \begin{bmatrix} I & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix},\quad I: \text{identity matrix}$$
Solution set of a linear matrix inequality: $C = \{S \mid S \succeq 0\}$ is convex, and
$$f(x) = x_1 A_1 + \cdots + x_m A_m - B = \text{``}Ax + b\text{''}$$
$$f^{-1}(C) = \{x \mid f(x) \succeq 0\} \text{ is convex}$$

2-13: Affine Function V

But this isn't rigorous because of some problems in arguing $f(x) = Ax + b$. A more formal explanation:
$$C = \{s \in R^{p^2} \mid \text{mat}(s) \in S^p \text{ and } \text{mat}(s) \succeq 0\}$$
is convex, and
$$f(x) = x_1 \text{vec}(A_1) + \cdots + x_m \text{vec}(A_m) - \text{vec}(B) = \begin{bmatrix} \text{vec}(A_1) & \cdots & \text{vec}(A_m) \end{bmatrix} x + (-\text{vec}(B))$$

2-13: Affine Function VI

$$f^{-1}(C) = \{x \mid \text{mat}(f(x)) \in S^p \text{ and } \text{mat}(f(x)) \succeq 0\}$$
is convex. Hyperbolic cone:
$$C = \{(z, t) \mid z^T z \le t^2,\ t \ge 0\}$$
is convex (seen by drawing a figure in 2- or 3-dimensional space).

2-13: Affine Function VII

We have that
$$f(x) = \begin{bmatrix} P^{1/2} x \\ c^T x \end{bmatrix} = \begin{bmatrix} P^{1/2} \\ c^T \end{bmatrix} x$$
is affine. Then
$$f^{-1}(C) = \{x \mid f(x) \in C\} = \{x \mid x^T P x \le (c^T x)^2,\ c^T x \ge 0\}$$
is convex.

Perspective and linear-fractional function I

Image convex: if $S$ is convex, check whether $\{P(x, t) \mid (x, t) \in S\}$ is convex or not. Note that $S$ is in the domain of $P$.

Perspective and linear-fractional function II

Assume
$$(x_1, t_1), (x_2, t_2) \in S$$
We hope
$$\alpha\frac{x_1}{t_1} + (1-\alpha)\frac{x_2}{t_2} = P(A, B),\quad \text{where } (A, B) \in S$$

Perspective and linear-fractional function III

We have
$$\alpha\frac{x_1}{t_1} + (1-\alpha)\frac{x_2}{t_2} = \frac{\alpha t_2 x_1 + (1-\alpha)t_1 x_2}{t_1 t_2} = \frac{\alpha t_2 x_1 + (1-\alpha)t_1 x_2}{\alpha t_1 t_2 + (1-\alpha)t_1 t_2} = \frac{\frac{\alpha t_2}{\alpha t_2 + (1-\alpha)t_1} x_1 + \frac{(1-\alpha)t_1}{\alpha t_2 + (1-\alpha)t_1} x_2}{\frac{\alpha t_2}{\alpha t_2 + (1-\alpha)t_1} t_1 + \frac{(1-\alpha)t_1}{\alpha t_2 + (1-\alpha)t_1} t_2}$$

Perspective and linear-fractional function IV

Let
$$\theta = \frac{\alpha t_2}{\alpha t_2 + (1-\alpha)t_1}$$
We have
$$\frac{\theta x_1 + (1-\theta)x_2}{\theta t_1 + (1-\theta)t_2} = \frac{A}{B},\quad (A, B) \in S$$
Further, $(A, B) \in S$ because $(x_1, t_1), (x_2, t_2) \in S$

Perspective and linear-fractional function V

and $S$ is convex.
Inverse image is convex: given a convex set $C$,
$$P^{-1}(C) = \{(x, t) \mid P(x, t) = x/t \in C\}$$
is convex.

Perspective and linear-fractional function VI

Let
$$(x_1, t_1): x_1/t_1 \in C,\quad (x_2, t_2): x_2/t_2 \in C$$
Do we have
$$\theta(x_1, t_1) + (1-\theta)(x_2, t_2) \in P^{-1}(C)?$$
That is,
$$\frac{\theta x_1 + (1-\theta)x_2}{\theta t_1 + (1-\theta)t_2} \in C?$$

Perspective and linear-fractional function VII

Earlier we had
$$\frac{\theta x_1 + (1-\theta)x_2}{\theta t_1 + (1-\theta)t_2} = \alpha\frac{x_1}{t_1} + (1-\alpha)\frac{x_2}{t_2},\quad \theta = \frac{\alpha t_2}{\alpha t_2 + (1-\alpha)t_1}$$
Then
$$(\alpha(t_2 - t_1) + t_1)\theta = \alpha t_2$$

Perspective and linear-fractional function VIII

$$t_1\theta = \alpha t_2 - \alpha t_2\theta + \alpha t_1\theta$$
$$\alpha = \frac{t_1\theta}{t_1\theta + (1-\theta)t_2}$$

2-16: Generalized inequalities I

$K$ contains no line: ($x \in K$ and $-x \in K$) $\Rightarrow x = 0$.
Nonnegative polynomial on $[0, 1]$, when $n = 2$:
$$x_1 + t x_2 \ge 0,\quad t \in [0, 1]$$
At $t = 1$: $x_1 + x_2 \ge 0$ (figure).

2-16: Generalized inequalities II

At $t = 0$: $x_1 \ge 0$ (figure).

2-16: Generalized inequalities III

Over all $t \in [0, 1]$, it really becomes a proper cone.

2-17: I

Properties: $x \preceq_K y$, $u \preceq_K v$ implies that
$$x + u \preceq_K y + v$$
Proof: $y - x \in K$, $v - u \in K$. From the definition of a convex cone,
$$(y - x) + (v - u) \in K$$
Then $x + u \preceq_K y + v$.

2-18: Minimum and minimal elements I

The minimum element:
$$S \subseteq x_1 + K$$
A minimal element:
$$(x_2 - K) \cap S = \{x_2\}$$

2-19: Separating hyperplane theorem I

We consider a simplified situation and omit part of the proof. Assume
$$\inf_{u \in C, v \in D} \|u - v\| > 0$$
and the minimum is attained at $c, d$. We will show that
$$a \equiv d - c,\quad b \equiv \frac{\|d\|^2 - \|c\|^2}{2}$$
forms a separating hyperplane $a^T x = b$.

2-19: Separating hyperplane theorem II

(Figure: $c$, $d$, and the hyperplane $a^T x = b$, with)
$$a = d - c,\quad b = \frac{d^T d - c^T c}{2}$$

2-19: Separating hyperplane theorem III

Assume the result is wrong, so there is $u \in D$ such that
$$a^T u - b < 0$$
We will derive a point $u'$ in $D$ that is closer to $c$ than $d$ is; that is, $\|u' - c\| < \|d - c\|$. Then we have a contradiction. The concept: (figure).

2-19: Separating hyperplane theorem IV

(Figure: $c$, $u$, $d$, and the hyperplane $a^T x = b$.)
$$a^T u - b = (d - c)^T u - \frac{d^T d - c^T c}{2} < 0$$
implies that
$$(d - c)^T(u - d) + \frac{\|d - c\|^2}{2} < 0$$

2-19: Separating hyperplane theorem V

$$\left.\frac{d}{dt}\|d + t(u - d) - c\|^2\right|_{t=0} = \left.2(d + t(u - d) - c)^T(u - d)\right|_{t=0} = 2(d - c)^T(u - d) < 0$$
There exists a small $t \in (0, 1)$ such that
$$\|d + t(u - d) - c\| < \|d - c\|$$
However, $d + t(u - d) \in D$, so there is a contradiction.

2-20: Supporting hyperplane theorem I

Case 1: $C$ has an interior region. Consider two sets: the interior of $C$ versus $\{x_0\}$, where $x_0$ is any boundary point. If $C$ is convex, then the interior of $C$ is also convex. Then both sets are convex. We can apply the results on slide 2-19, so there exists $a$ such that
$$a^T x \le a^T x_0,\quad \forall x \in \text{interior of } C$$

2-20: Supporting hyperplane theorem II

Then for any boundary point $x$ we also have
$$a^T x \le a^T x_0$$
because any boundary point is the limit of interior points.
Case 2: $C$ has no interior region. In this situation, $C$ is like a line in $R^3$ (so no interior). Then of course it has a supporting hyperplane.

3-4: Examples on $R^n$ and $R^{m \times n}$ I

$A, X \in R^{m \times n}$:
$$\text{tr}(A^T X) = \sum_j (A^T X)_{jj} = \sum_j \sum_i (A^T)_{ji} X_{ij} = \sum_i \sum_j A_{ij} X_{ij}$$
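
A one-line numerical check of this identity, with random matrices as an assumed example:

```python
import numpy as np

rng = np.random.default_rng(2)
A, X = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
# tr(A^T X) equals the component-wise inner product sum_ij A_ij X_ij.
print(np.isclose(np.trace(A.T @ X), np.sum(A * X)))  # True
```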

3-7: First-order Condition I

An open set: for any $x$, there is a ball covering $x$ such that this ball is in the set.
Global underestimator (figure):
$$\frac{z - f(x)}{y - x} = f'(x) \quad\Rightarrow\quad z = f(x) + f'(x)(y - x)$$

3-7: First-order Condition II

"$\Rightarrow$": Because $\text{dom} f$ is convex, for all $0 < t \le 1$,
$$x + t(y - x) \in \text{dom} f$$
Then
$$f(x + t(y - x)) \le (1-t)f(x) + t f(y)$$
$$f(y) \ge f(x) + \frac{f(x + t(y - x)) - f(x)}{t}$$
When $t \to 0$,
$$f(y) \ge f(x) + \nabla f(x)^T(y - x)$$

3-7: First-order Condition III

"$\Leftarrow$": For any $0 \le \theta \le 1$, let $z = \theta x + (1-\theta)y$. Then
$$f(x) \ge f(z) + \nabla f(z)^T(x - z) = f(z) + \nabla f(z)^T(1-\theta)(x - y)$$
$$f(y) \ge f(z) + \nabla f(z)^T(y - z) = f(z) + \nabla f(z)^T\theta(y - x)$$
$$\theta f(x) + (1-\theta)f(y) \ge f(z)$$

3-7: First-order Condition IV

First-order condition for a strictly convex function: $f$ is strictly convex if and only if
$$f(y) > f(x) + \nabla f(x)^T(y - x),\quad \forall x \ne y$$
"$\Leftarrow$": it's easy, by directly modifying $\ge$ to $>$:
$$f(x) > f(z) + \nabla f(z)^T(x - z) = f(z) + \nabla f(z)^T(1-\theta)(x - y)$$
$$f(y) > f(z) + \nabla f(z)^T(y - z) = f(z) + \nabla f(z)^T\theta(y - x)$$

3-7: First-order Condition V

"$\Rightarrow$": Assume the result is wrong. From the first-order condition of a convex function, $\exists x, y$ such that $x \ne y$ and
$$\nabla f(x)^T(y - x) = f(y) - f(x) \quad (1)$$
For this $(x, y)$, from the strict convexity,
$$f(x + t(y - x)) - f(x) < t f(y) - t f(x) = \nabla f(x)^T t(y - x),\quad \forall t \in (0, 1)$$

3-7: First-order Condition VI

Therefore,
$$f(x + t(y - x)) < f(x) + \nabla f(x)^T t(y - x),\quad \forall t \in (0, 1)$$
However, this contradicts the first-order condition:
$$f(x + t(y - x)) \ge f(x) + \nabla f(x)^T t(y - x),\quad \forall t \in (0, 1)$$
This proof was given by a student of this course before.

3-8: Second-order condition I

Proof of the second-order condition. We consider only the simpler situation of $n = 1$. From the first-order condition, $f(x + t) \ge f(x) + f'(x)t$, so
$$\lim_{t \to 0} \frac{2(f(x + t) - f(x) - f'(x)t)}{t^2} = \lim_{t \to 0} \frac{2(f'(x + t) - f'(x))}{2t} = f''(x) \ge 0$$

3-8: Second-order condition II

$$f(x + t) = f(x) + f'(x)t + \frac{1}{2}f''(\tilde x)t^2 \ge f(x) + f'(x)t$$
so $f$ is convex by the first-order condition. The extension to general $n$ is straightforward.
If $\nabla^2 f(x) \succ 0$, then $f$ is strictly convex.

3-8: Second-order condition III

Using the first-order condition for strictly convex functions:
$$f(y) = f(x) + \nabla f(x)^T(y - x) + \frac{1}{2}(y - x)^T \nabla^2 f(\tilde x)(y - x) > f(x) + \nabla f(x)^T(y - x)$$
Note that in our proof of the first-order condition for strictly convex functions we used the second-order condition of a convex function, so there is no contradiction here.

3-8: Second-order condition IV

It's possible that $f$ is strictly convex but $\nabla^2 f(x) \not\succ 0$. Example:
$$f(x) = x^4$$
Details omitted.

3-9: Examples I

Quadratic-over-linear: $f(x, y) = x^2/y$.
$$\frac{\partial f}{\partial x} = \frac{2x}{y},\quad \frac{\partial f}{\partial y} = -\frac{x^2}{y^2}$$
$$\frac{\partial^2 f}{\partial x^2} = \frac{2}{y},\quad \frac{\partial^2 f}{\partial x \partial y} = -\frac{2x}{y^2},\quad \frac{\partial^2 f}{\partial y^2} = \frac{2x^2}{y^3}$$
$$\nabla^2 f = \frac{2}{y^3}\begin{bmatrix} y^2 & -xy \\ -xy & x^2 \end{bmatrix} = \frac{2}{y^3}\begin{bmatrix} y \\ -x \end{bmatrix}\begin{bmatrix} y \\ -x \end{bmatrix}^T \succeq 0$$

3-10 I

$$f(x) = \log\sum_k \exp x_k$$
$$\nabla f(x) = \frac{1}{\sum_k e^{x_k}}\begin{bmatrix} e^{x_1} \\ \vdots \\ e^{x_n} \end{bmatrix}$$
$$\nabla^2_{ii} f = \frac{(\sum_k e^{x_k})e^{x_i} - e^{x_i}e^{x_i}}{(\sum_k e^{x_k})^2},\quad \nabla^2_{ij} f = \frac{-e^{x_i}e^{x_j}}{(\sum_k e^{x_k})^2},\ i \ne j$$
Note that if $z_k = \exp x_k$,

3-10 II

then
$$(zz^T)_{ij} = z_i(z^T)_j = z_i z_j$$
Cauchy-Schwarz inequality:
$$(a_1 b_1 + \cdots + a_n b_n)^2 \le (a_1^2 + \cdots + a_n^2)(b_1^2 + \cdots + b_n^2)$$
Note that we take
$$a_k = v_k\sqrt{z_k},\quad b_k = \sqrt{z_k},\quad z_k > 0$$
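
A small sketch (assumed random input) of the Hessian formula above: with $z_k = e^{x_k}$ and $s = \sum_k z_k$, the entries given on 3-10 assemble into $\nabla^2 f = \text{diag}(z)/s - zz^T/s^2$, which is positive semidefinite, as the Cauchy-Schwarz argument shows:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(5)
z = np.exp(x)
s = z.sum()
# diag(z)/s - z z^T / s^2 matches the entries on slide 3-10.
H = np.diag(z) / s - np.outer(z, z) / s**2
print(np.all(np.linalg.eigvalsh(H) >= -1e-12))  # True: H is PSD
```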

3-12: Jensen's inequality I

General form:
$$f\Big(\int p(z)z\,dz\Big) \le \int p(z)f(z)\,dz$$
Discrete situation:
$$f\Big(\sum p_i z_i\Big) \le \sum p_i f(z_i),\quad \sum p_i = 1$$

3-12: Jensen's inequality II

Proof (three points):
$$f(p_1 z_1 + p_2 z_2 + p_3 z_3) \le (1 - p_3)f\Big(\frac{p_1 z_1 + p_2 z_2}{1 - p_3}\Big) + p_3 f(z_3) \le (1 - p_3)\Big(\frac{p_1}{1 - p_3}f(z_1) + \frac{p_2}{1 - p_3}f(z_2)\Big) + p_3 f(z_3) = p_1 f(z_1) + p_2 f(z_2) + p_3 f(z_3)$$
Note that
$$\frac{p_1}{1 - p_3} + \frac{p_2}{1 - p_3} = \frac{1 - p_3}{1 - p_3} = 1$$

Positive weighted sum & composition with affine function I

Composition with affine function: we know $f(x)$ is convex. Is
$$g(x) = f(Ax + b)$$

Positive weighted sum & composition with affine function II

convex?
$$g((1-\alpha)x_1 + \alpha x_2) = f(A((1-\alpha)x_1 + \alpha x_2) + b) = f((1-\alpha)(Ax_1 + b) + \alpha(Ax_2 + b)) \le (1-\alpha)f(Ax_1 + b) + \alpha f(Ax_2 + b) = (1-\alpha)g(x_1) + \alpha g(x_2)$$

3-15: Pointwise maximum I

Proof of the convexity:
$$f((1-\alpha)x_1 + \alpha x_2) = \max(f_1((1-\alpha)x_1 + \alpha x_2), \ldots, f_m((1-\alpha)x_1 + \alpha x_2))$$
$$\le \max((1-\alpha)f_1(x_1) + \alpha f_1(x_2), \ldots, (1-\alpha)f_m(x_1) + \alpha f_m(x_2))$$
$$\le (1-\alpha)\max(f_1(x_1), \ldots, f_m(x_1)) + \alpha\max(f_1(x_2), \ldots, f_m(x_2))$$
$$= (1-\alpha)f(x_1) + \alpha f(x_2)$$

3-15: Pointwise maximum II

For
$$f(x) = x_{[1]} + \cdots + x_{[r]}$$
consider all $\binom{n}{r}$ combinations.

3-16: Pointwise supremum I

The proof is similar to pointwise maximum.
Support function of a set $C$: when $y$ is fixed, $f(x, y) = y^T x$ is linear (convex) in $x$.
Maximum eigenvalue of a symmetric matrix: $f(X, y) = y^T X y$ is a linear function of $X$ when $y$ is fixed.

3-19: Minimization I

Proof: Let $\epsilon > 0$. $\exists y_1, y_2 \in C$ such that
$$f(x_1, y_1) \le g(x_1) + \epsilon,\quad f(x_2, y_2) \le g(x_2) + \epsilon$$
$$g(\theta x_1 + (1-\theta)x_2) = \inf_{y \in C} f(\theta x_1 + (1-\theta)x_2, y) \le f(\theta x_1 + (1-\theta)x_2, \theta y_1 + (1-\theta)y_2)$$
$$\le \theta f(x_1, y_1) + (1-\theta)f(x_2, y_2) \le \theta g(x_1) + (1-\theta)g(x_2) + \epsilon$$

3-19: Minimization II

Note that the first inequality uses the property that $C$ is convex to have
$$\theta y_1 + (1-\theta)y_2 \in C$$
Because the above inequality holds for all $\epsilon > 0$,
$$g(\theta x_1 + (1-\theta)x_2) \le \theta g(x_1) + (1-\theta)g(x_2)$$

3-19: Minimization III

First example: the goal is to prove
$$A - BC^{-1}B^T \succeq 0$$
Instead of a direct proof, here we use the property in this slide. First we have that $f(x, y)$ is convex in $(x, y)$ because
$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succeq 0$$
Consider $\min_y f(x, y)$.

3-19: Minimization IV

Because $C \succ 0$, the minimum occurs at
$$2Cy + 2B^T x = 0$$
Then $y = -C^{-1}B^T x$ and
$$g(x) = x^T A x - 2x^T BC^{-1}B^T x + x^T BC^{-1}CC^{-1}B^T x = x^T(A - BC^{-1}B^T)x$$

3-19: Minimization V

is convex. The second-order condition implies that
$$A - BC^{-1}B^T \succeq 0$$
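
A numerical illustration of this conclusion (random PSD block matrix, my own assumed example): the Schur complement $A - BC^{-1}B^T$ inherits positive semidefiniteness.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 6))
M = M @ M.T                            # a PSD block matrix [[A, B], [B^T, C]]
A, B, C = M[:3, :3], M[:3, 3:], M[3:, 3:]
S = A - B @ np.linalg.solve(C, B.T)    # Schur complement
print(np.all(np.linalg.eigvalsh(S) >= -1e-10))  # True
```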

3-21: the conjugate function I

This function is useful later. When $y$ is fixed, the maximum happens at
$$y = \nabla f(x) \quad (2)$$
by taking the derivative on $x$.

3-21: the conjugate function II

Explanation of the figure: when $y$ is fixed, $z = xy$ is a straight line passing through the origin, where $y$ is the slope of the line. Check under which $x$, $yx$ and $f(x)$ have the largest distance. From the figure, the largest distance happens when (2) holds.

3-21: the conjugate function III

About the point $(0, -f^*(y))$: the tangent line at $x_0$ is
$$\frac{z - f(x_0)}{x - x_0} = f'(x_0)$$
where $x_0$ is the point satisfying $y = f'(x_0)$. When $x = 0$,
$$z = -x_0 f'(x_0) + f(x_0) = -x_0 y + f(x_0) = -f^*(y)$$

3-21: the conjugate function IV

$f^*$ is convex: given $x$, $y^T x - f(x)$ is linear (convex) in $y$. Then we apply the property of pointwise supremum.

3-22: examples I

Negative logarithm: $f(x) = -\log x$.
$$\frac{\partial}{\partial x}(xy + \log x) = y + \frac{1}{x} = 0$$
If $y < 0$: (picture of $xy + \log x$).

3-22: examples II

Then
$$\sup_x\,(xy + \log x) = -1 - \log(-y)$$

3-22: examples III

Strictly convex quadratic: $f(x) = \frac{1}{2}x^T Q x$.
$$Qx = y,\quad x = Q^{-1}y$$
$$y^T x - \frac{1}{2}x^T Q x = y^T Q^{-1} y - \frac{1}{2}y^T Q^{-1}QQ^{-1}y = \frac{1}{2}y^T Q^{-1} y$$
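
A sketch verifying $f^*(y) = \frac{1}{2}y^T Q^{-1} y$ by numerically maximizing $y^T x - f(x)$; the SciPy call and random data are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
Q = rng.standard_normal((4, 4))
Q = Q @ Q.T + 4 * np.eye(4)        # symmetric positive definite
y = rng.standard_normal(4)

# Maximize y^T x - (1/2) x^T Q x by minimizing its negative.
res = minimize(lambda x: 0.5 * x @ Q @ x - y @ x, np.zeros(4))
closed_form = 0.5 * y @ np.linalg.solve(Q, y)
print(np.isclose(-res.fun, closed_form, atol=1e-5))  # True
```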

3-23: quasiconvex functions I

Figure on slide:
$$S_\alpha = [a, b],\quad S_\beta = (-\infty, c]$$
Both are convex. The figure is an example showing that a quasiconvex function may not be convex.

3-26: properties of quasiconvex functions I

Modified Jensen inequality: $f$ quasiconvex if and only if
$$f(\theta x + (1-\theta)y) \le \max\{f(x), f(y)\},\quad \forall x, y,\ \theta \in [0, 1]$$
"$\Rightarrow$": Let $\alpha = \max\{f(x), f(y)\}$, so $x \in S_\alpha$, $y \in S_\alpha$. Because $S_\alpha$ is convex,
$$\theta x + (1-\theta)y \in S_\alpha$$

3-26: properties of quasiconvex functions II

so
$$f(\theta x + (1-\theta)y) \le \alpha$$
and the result is obtained.
"$\Leftarrow$": If the result is wrong, there exists $\alpha$ such that $S_\alpha$ is not convex: $\exists x, y, \theta$ with $x, y \in S_\alpha$, $\theta \in [0, 1]$ such that
$$\theta x + (1-\theta)y \notin S_\alpha$$

3-26: properties of quasiconvex functions III

Then
$$f(\theta x + (1-\theta)y) > \alpha \ge \max\{f(x), f(y)\}$$
This violates the assumption.
First-order condition (this is exercise 3.43):

3-26: properties of quasiconvex functions IV

"$\Rightarrow$": assume $f(y) \le f(x)$. Then
$$f((1-t)x + ty) \le \max(f(x), f(y)) = f(x)$$
$$\frac{f(x + t(y - x)) - f(x)}{t} \le 0$$
$$\lim_{t \to 0}\frac{f(x + t(y - x)) - f(x)}{t} = \nabla f(x)^T(y - x) \le 0$$

3-26: properties of quasiconvex functions V

"$\Leftarrow$": If the result is wrong, there exists $\alpha$ such that $S_\alpha$ is not convex: $\exists x, y, \theta$ with $x, y \in S_\alpha$, $\theta \in [0, 1]$ such that
$$\theta x + (1-\theta)y \notin S_\alpha$$
Then
$$f(\theta x + (1-\theta)y) > \alpha \ge \max\{f(x), f(y)\} \quad (3)$$

3-26: properties of quasiconvex functions VI

Because $f$ is differentiable, it is continuous. Without loss of generality, we have
$$f(z) \ge f(x), f(y),\quad \forall z \text{ between } x \text{ and } y$$
Then, for $z = x + \theta(y - x)$, $\theta \in (0, 1)$,
$$\nabla f(z)^T(-\theta(y - x)) \le 0$$
$$\nabla f(z)^T(y - x - \theta(y - x)) \le 0$$
$$\Rightarrow \nabla f(z)^T(y - x) = 0,\quad \forall \theta \in (0, 1)$$

3-26: properties of quasiconvex functions VII

$$f(x + \theta(y - x)) = f(x) + \nabla f(\tilde z)^T\theta(y - x) = f(x),\quad \forall \theta \in [0, 1)$$
This contradicts (3).

3-27: Log-concave and log-convex functions I

Powers: $\log(x^a) = a\log x$, and $\log x$ is concave.
Probability densities:
$$\log f(x) = -\frac{1}{2}(x - \bar x)^T\Sigma^{-1}(x - \bar x) + \text{constant}$$
$\Sigma^{-1}$ is positive definite. Thus $\log f(x)$ is concave.

3-27: Log-concave and log-convex functions II

Cumulative Gaussian distribution:
$$\frac{d}{dx}\log\Phi(x) = \frac{e^{-x^2/2}}{\int_{-\infty}^x e^{-u^2/2}\,du}$$
$$\frac{d^2}{dx^2}\log\Phi(x) = \frac{\big(\int_{-\infty}^x e^{-u^2/2}\,du\big)e^{-x^2/2}(-x) - e^{-x^2/2}e^{-x^2/2}}{\big(\int_{-\infty}^x e^{-u^2/2}\,du\big)^2}$$

3-27: Log-concave and log-convex functions III

Need to prove that
$$\Big(\int_{-\infty}^x e^{-u^2/2}\,du\Big)x + e^{-x^2/2} > 0$$
Because $x \ge u$ for all $u \in (-\infty, x]$,

3-27: Log-concave and log-convex functions IV

we have
$$\Big(\int_{-\infty}^x e^{-u^2/2}\,du\Big)x + e^{-x^2/2} = \int_{-\infty}^x x e^{-u^2/2}\,du + e^{-x^2/2} \ge \int_{-\infty}^x u e^{-u^2/2}\,du + e^{-x^2/2} = \left.-e^{-u^2/2}\right|_{-\infty}^x + e^{-x^2/2} = -e^{-x^2/2} + e^{-x^2/2} = 0$$

3-27: Log-concave and log-convex functions V

This proof was given by a student (and polished by another student) of this course before.

4-3: Optimal and locally optimal points I

$$f_0(x) = 1/x,\quad f_0(x) = x\log x$$

4-3: Optimal and locally optimal points II

$$f_0(x) = x^3 - 3x$$
For $f_0(x) = x\log x$:
$$f_0'(x) = 1 + \log x = 0 \quad\Rightarrow\quad x = e^{-1} = 1/e$$

4-3: Optimal and locally optimal points III

For $f_0(x) = x^3 - 3x$:
$$f_0'(x) = 3x^2 - 3 = 0 \quad\Rightarrow\quad x = \pm 1$$

4-9: Optimality criterion for differentiable $f_0$ I

$x$ is optimal if and only if
$$f_0(y) \ge f_0(x),\quad \text{for all feasible } y$$
"$\Leftarrow$": easy. From the first-order condition,
$$f_0(y) \ge f_0(x) + \nabla f_0(x)^T(y - x)$$
Together with
$$\nabla f_0(x)^T(y - x) \ge 0$$
we have $f_0(y) \ge f_0(x)$.

4-9: Optimality criterion for differentiable $f_0$ II

"$\Rightarrow$": Assume the result is wrong. Then
$$\nabla f_0(x)^T(y - x) < 0$$
for some feasible $y$. Let
$$z(t) = ty + (1-t)x$$
$$\frac{d}{dt}f_0(z(t)) = \nabla f_0(z(t))^T(y - x)$$
$$\left.\frac{d}{dt}f_0(z(t))\right|_{t=0} = \nabla f_0(x)^T(y - x) < 0$$

4-9: Optimality criterion for differentiable $f_0$ III

There exists $t$ such that
$$f_0(z(t)) < f_0(x)$$
Note that $z(t)$ is feasible because
$$f_i(z(t)) \le t f_i(y) + (1-t)f_i(x) \le 0$$
and
$$A(ty + (1-t)x) = tAy + (1-t)Ax = tb + (1-t)b = b$$

4-9: Optimality criterion for differentiable $f_0$ IV

(Figure.)

4-10 I

Unconstrained problem: let
$$y = x - t\nabla f_0(x)$$
It is feasible (unconstrained problem). The optimality condition implies
$$\nabla f_0(x)^T(y - x) = -t\|\nabla f_0(x)\|^2 \ge 0$$
Thus $\nabla f_0(x) = 0$.
Equality constrained problem:

4-10 II

"$\Leftarrow$": easy. For any feasible $y$, $Ay = b$ and
$$\nabla f_0(x)^T(y - x) = -\nu^T A(y - x) = -\nu^T(b - b) = 0 \ge 0$$
So $x$ is optimal.
"$\Rightarrow$": more complicated. We only do a rough explanation.

4-10 III

From the optimality condition,
$$\nabla f_0(x)^T\nu = \nabla f_0(x)^T((x + \nu) - x) \ge 0,\quad \forall \nu \in N(A)$$
$N(A)$ is a subspace (illustrated in 2-D). Thus
$$\nu \in N(A) \;\Rightarrow\; -\nu \in N(A)$$

4-10 IV

(Figure: $\nabla f_0 \perp N(A)$.)

4-10 V

We have
$$\nabla f_0(x)^T\nu = 0,\quad \forall \nu \in N(A)$$
$$\nabla f_0(x) \perp N(A),\quad \nabla f_0(x) \in R(A^T)$$
$$\exists \nu \text{ such that } \nabla f_0(x) + A^T\nu = 0$$
Minimization over the nonnegative orthant:
"$\Leftarrow$": easy.

4-10 VI

For any $y \succeq 0$,
$$\nabla_i f_0(x)(y_i - x_i) = \begin{cases} \nabla_i f_0(x)\,y_i \ge 0 & \text{if } x_i = 0 \\ 0 & \text{if } x_i > 0 \end{cases}$$
Therefore,
$$\nabla f_0(x)^T(y - x) \ge 0$$
and $x$ is optimal.

4-10 VII

"$\Rightarrow$": If $x_i = 0$, we claim $\nabla_i f_0(x) \ge 0$. Otherwise, $\nabla_i f_0(x) < 0$. Let $y = x$ except $y_i > 0$. Then
$$\nabla f_0(x)^T(y - x) = \nabla_i f_0(x)(y_i - x_i) < 0$$
This violates the optimality condition.

4-10 VIII

If $x_i > 0$, we claim $\nabla_i f_0(x) = 0$. Otherwise, assume $\nabla_i f_0(x) > 0$. Consider $y = x$ except
$$y_i = x_i/2 > 0$$

4-10 IX

It is feasible. Then
$$\nabla f_0(x)^T(y - x) = \nabla_i f_0(x)(y_i - x_i) = -\nabla_i f_0(x)\,x_i/2 < 0$$
violates the optimality condition. The situation for $\nabla_i f_0(x) < 0$ is similar.

4-23: examples I

$$\bar c = E(C),\quad \Sigma = E_C\big((C - \bar c)(C - \bar c)^T\big)$$
$$\text{Var}(C^T x) = E_C\big((C^T x - \bar c^T x)(C^T x - \bar c^T x)\big) = E_C\big(x^T(C - \bar c)(C - \bar c)^T x\big) = x^T\Sigma x$$

4-25: second-order cone programming I

The cone was defined on slide 2-8:
$$\{(x, t) \mid \|x\| \le t\}$$

4-35: generalized inequality constraint I

$f_i: R^n \to R^{k_i}$ is $K_i$-convex:
$$f_i(\theta x + (1-\theta)y) \preceq_{K_i} \theta f_i(x) + (1-\theta)f_i(y)$$
See page 3-31.

4-37: LP and SOCP as SDP I

LP and equivalent SDP:
$$Ax = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$
$$x_1\begin{bmatrix} a_{11} & & \\ & \ddots & \\ & & a_{m1} \end{bmatrix} + \cdots + x_n\begin{bmatrix} a_{1n} & & \\ & \ddots & \\ & & a_{mn} \end{bmatrix} - \begin{bmatrix} b_1 & & \\ & \ddots & \\ & & b_m \end{bmatrix} \preceq 0$$

4-37: LP and SOCP as SDP II

For SOCP as SDP we will use the result on 4-39:
$$\begin{bmatrix} tI_{p \times p} & A_{p \times q} \\ A^T & tI_{q \times q} \end{bmatrix} \succeq 0 \quad\Leftrightarrow\quad A^T A \preceq t^2 I_{q \times q},\ t \ge 0$$
Now
$$p = m,\quad q = 1,\quad A = A_i x + b_i,\quad t = c_i^T x + d_i$$
Thus
$$\|A_i x + b_i\|^2 \le (c_i^T x + d_i)^2,\ c_i^T x + d_i \ge 0 \quad\Leftrightarrow\quad \|A_i x + b_i\| \le c_i^T x + d_i$$

4-39: matrix norm minimization I

Following 4-38, we have the following equivalent problem:
$$\min\ t \quad\text{subject to}\ \|A\|_2 \le t$$
We then use
$$\|A\|_2 \le t \quad\Leftrightarrow\quad A^T A \preceq t^2 I,\ t \ge 0 \quad\Leftrightarrow\quad \begin{bmatrix} tI & A \\ A^T & tI \end{bmatrix} \succeq 0$$

4-39: matrix norm minimization II

to have the SDP
$$\min\ t \quad\text{subject to}\ \begin{bmatrix} tI & A(x) \\ A(x)^T & tI \end{bmatrix} \succeq 0$$
Next we prove
$$\begin{bmatrix} tI_{p \times p} & A_{p \times q} \\ A^T & tI_{q \times q} \end{bmatrix} \succeq 0 \quad\Leftrightarrow\quad A^T A \preceq t^2 I_{q \times q},\ t \ge 0$$
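
A numerical sanity check (random $A$, assumed example) of the equivalence to be proved: the block matrix is PSD exactly when $t$ is at least the largest singular value of $A$.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 2))
sigma_max = np.linalg.norm(A, 2)   # ||A||_2: largest singular value

for t in [0.5 * sigma_max, 2.0 * sigma_max]:
    M = np.block([[t * np.eye(3), A], [A.T, t * np.eye(2)]])
    psd = bool(np.all(np.linalg.eigvalsh(M) >= -1e-10))
    print(psd == (sigma_max <= t))  # True, True
```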

4-39: matrix norm minimization III

"$\Rightarrow$": we immediately have $t \ge 0$. If $t > 0$,
$$\begin{bmatrix} -v^T A^T & tv^T \end{bmatrix}\begin{bmatrix} tI_{p \times p} & A_{p \times q} \\ A^T & tI_{q \times q} \end{bmatrix}\begin{bmatrix} -Av \\ tv \end{bmatrix} = \begin{bmatrix} -v^T A^T & tv^T \end{bmatrix}\begin{bmatrix} -tAv + tAv \\ -A^T Av + t^2 v \end{bmatrix} = t(t^2 v^T v - v^T A^T A v) \ge 0$$
$$\Rightarrow v^T(t^2 I - A^T A)v \ge 0,\quad \forall v$$

4-39: matrix norm minimization IV

and hence
$$t^2 I - A^T A \succeq 0$$
If $t = 0$,
$$\begin{bmatrix} v^T A^T & -v^T \end{bmatrix}\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}\begin{bmatrix} Av \\ -v \end{bmatrix} = \begin{bmatrix} v^T A^T & -v^T \end{bmatrix}\begin{bmatrix} -Av \\ A^T Av \end{bmatrix} = -2v^T A^T A v \ge 0,\quad \forall v$$
Therefore $A^T A \preceq 0$.

4-39: matrix norm minimization V

"$\Leftarrow$": Consider
$$\begin{bmatrix} u^T & v^T \end{bmatrix}\begin{bmatrix} tI & A \\ A^T & tI \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} u^T & v^T \end{bmatrix}\begin{bmatrix} tu + Av \\ A^T u + tv \end{bmatrix} = tu^T u + 2v^T A^T u + tv^T v$$
We hope to have
$$tu^T u + 2v^T A^T u + tv^T v \ge 0,\quad \forall (u, v)$$

4-39: matrix norm minimization VI

If $t > 0$,
$$\min_u\ tu^T u + 2v^T A^T u + tv^T v$$
has its optimum at
$$u = -\frac{Av}{t}$$
We have
$$tu^T u + 2v^T A^T u + tv^T v = tv^T v - \frac{v^T A^T Av}{t} = \frac{1}{t}v^T(t^2 I - A^T A)v \ge 0.$$

4-39: matrix norm minimization VII

Hence
$$\begin{bmatrix} tI & A \\ A^T & tI \end{bmatrix} \succeq 0$$
If $t = 0$, $A^T A \preceq 0$. Thus
$$v^T A^T Av \le 0,\quad v^T A^T Av = \|Av\|^2 = 0,\quad Av = 0,\ \forall v$$
$$\begin{bmatrix} u^T & v^T \end{bmatrix}\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} u^T & v^T \end{bmatrix}\begin{bmatrix} Av \\ A^T u \end{bmatrix} = 0$$

4-39: matrix norm minimization VIII

Thus
$$\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix} \succeq 0$$

4-40: Vector optimization I

Though $f_0(x)$ is a vector, note that each $f_i(x)$ is still $R^n \to R^1$.
$K$-convex: see 3-31, though we didn't discuss it earlier.

4-41: optimal and pareto optimal points I

Optimal:
$$O \subseteq \{x\} + K$$
Pareto optimal:
$$(x - K) \cap O = \{x\}$$

5-3: Lagrange dual function I

Note that $g$ is concave no matter whether the original problem is convex or not.
$$f_0(x) + \sum \lambda_i f_i(x) + \sum \nu_i h_i(x)$$
is convex (linear) in $(\lambda, \nu)$ for each $x$.

5-3: Lagrange dual function II

Use pointwise supremum on 3-16:
$$\sup_{x \in D}\Big({-f_0(x)} - \sum \lambda_i f_i(x) - \sum \nu_i h_i(x)\Big)$$
is convex. Hence
$$\inf_{x \in D}\Big(f_0(x) + \sum \lambda_i f_i(x) + \sum \nu_i h_i(x)\Big)$$
is concave. Note that $\inf(\cdot) = -\sup(-(\cdot))$, so "$\sup(-(\cdot))$ convex" gives "$\inf(\cdot)$ concave".

5-8: Lagrange dual and conjugate function I

$$f_0^*(-A^T\lambda - C^T\nu) = \sup_x\big((-A^T\lambda - C^T\nu)^T x - f_0(x)\big) = -\inf_x\big(f_0(x) + (A^T\lambda + C^T\nu)^T x\big)$$

5-10: weak and strong duality I

We don't discuss the SDP problem on this slide because we omitted 5-7.

5-11: Slater's constraint qualification I

We omit the proof because of no time.
"Linear inequalities do not need to hold with strict inequality": for linear inequalities we DO NOT need a constraint qualification. We will see some explanation later.

5-12: inequality form LP I

If we have only linear constraints, then the constraint qualification holds.

5-15: geometric interpretation I

Explanation of $g(\lambda)$: when $\lambda$ is fixed,
$$\lambda u + t = \Delta$$
is a line. We lower $\Delta$ until it touches the boundary of $G$. The value $\Delta$ then becomes $g(\lambda)$. When $u = 0$, $t = \Delta$, so we see the point marked as $g(\lambda)$ on the $t$-axis.

5-15: geometric interpretation II

We have $\lambda \ge 0$, so the line $\lambda u + t = \Delta$ must have nonpositive slope (figure) rather than positive slope (figure).

5-15: geometric interpretation III

Explanation of $p^*$: in $G$, only points satisfying $u \le 0$ are feasible.

5-16 I

We do not discuss a formal proof of "Slater condition $\Rightarrow$ strong duality". Instead, we explain this result by figures.
Reason of using $A$: $G$ may not be convex. Example:
$$\min\ x^2 \quad\text{subject to}\ x + 2 \le 0$$

5-16 II

This is a convex optimization problem.
$$G = \{(x + 2, x^2) \mid x \in R\}$$
is only a quadratic curve.

5-16 III

The curve is not convex. However, $A$ is convex. (Figure of $A$.)

5-16 IV

Primal problem: $x^* = -2$, optimal objective value $= 4$.
Dual problem:
$$g(\lambda) = \min_x\ x^2 + \lambda(x + 2) \quad\Rightarrow\quad x = -\lambda/2$$
$$\max_{\lambda \ge 0}\ -\frac{\lambda^2}{4} + 2\lambda \quad\Rightarrow\quad \text{optimal } \lambda = 4$$

5-16 V

Optimal objective value $= -4 + 8 = 4$.
Proving that $A$ is convex: let $(u_1, t_1) \in A$, $(u_2, t_2) \in A$. Then $\exists x_1, x_2$ such that
$$f_1(x_1) \le u_1,\ f_0(x_1) \le t_1,\quad f_1(x_2) \le u_2,\ f_0(x_2) \le t_2$$
Consider
$$x = \theta x_1 + (1-\theta)x_2$$

5-16 VI

We have
$$f_1(x) \le \theta u_1 + (1-\theta)u_2,\quad f_0(x) \le \theta t_1 + (1-\theta)t_2$$
So
$$\begin{bmatrix} u \\ t \end{bmatrix} = \theta\begin{bmatrix} u_1 \\ t_1 \end{bmatrix} + (1-\theta)\begin{bmatrix} u_2 \\ t_2 \end{bmatrix} \in A$$
Note that we have "Slater condition $\Rightarrow$ strong duality". However, it's possible that the Slater condition doesn't hold but strong duality holds.
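
A tiny numerical sketch of the example above ($\min x^2$ subject to $x + 2 \le 0$), evaluating the dual $g(\lambda) = -\lambda^2/4 + 2\lambda$ on a grid:

```python
import numpy as np

lam = np.linspace(0, 8, 10001)
g = -lam**2 / 4 + 2 * lam          # dual function derived above
print(lam[np.argmax(g)], g.max())  # lambda* = 4.0, d* = 4.0 = p*
```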

5-16 VII

Example from exercise 5.22:
$$\min\ x \quad\text{subject to}\ x^2 \le 0$$
The Slater condition doesn't hold because no $x$ satisfies $x^2 < 0$.
$$G = \{(x^2, x) \mid x \in R\}$$

5-16 VIII

(Figure: $G$ in the $(u, t)$ plane.) There is only one feasible point, $(0, 0)$.

5-16 IX

(Figure: the set $A$ in the $(u, t)$ plane.)

5-16 X

Dual problem:
$$g(\lambda) = \min_x\ x + x^2\lambda,\quad x = \begin{cases} -1/(2\lambda) & \text{if } \lambda > 0 \\ -\infty & \text{if } \lambda = 0 \end{cases}$$
$$\max_{\lambda \ge 0}\ -\frac{1}{4\lambda},\quad \text{objective value} \to 0$$
Strong duality holds:
$$d^* = 0,\quad p^* = 0$$

5-17: complementary slackness I

In deriving the inequality we use
$$h_i(x^*) = 0 \quad\text{and}\quad f_i(x^*) \le 0$$
Complementary slackness: compare with the earlier results in 4-10.

5-19: KKT conditions for convex problem I

For the problem on p5-16 ($\min x$ subject to $x^2 \le 0$), neither the Slater condition nor the KKT condition holds:
$$1 + \lambda \cdot 0 \ne 0$$
Therefore, for convex problems, KKT $\Rightarrow$ optimality, but not vice versa.
Next we explain why for linear constraints we don't need a constraint qualification.

5-19: KKT conditions for convex problem II

Consider the situation of inequality constraints only:
$$\min\ f_0(x) \quad\text{subject to}\ f_i(x) \le 0,\ i = 1, \ldots, m$$
Consider an optimal solution $x$. We would like to prove that $x$ satisfies the KKT condition. Because $x$ is optimal, from the optimality condition on slide 4-9, for any feasible direction $\delta x$,
$$\nabla f_0(x)^T\delta x \ge 0.$$

5-19: KKT conditions for convex problem III

A feasible $\delta x$ means
$$f_i(x + \delta x) \le 0,\quad \forall i$$
Because
$$f_i(x + \delta x) \ge f_i(x) + \nabla f_i(x)^T\delta x,$$
from $f_i(x) \le 0,\ \forall i$, we have
$$\nabla f_i(x)^T\delta x \le 0 \quad\text{if } f_i(x) = 0.$$

5-19: KKT conditions for convex problem IV

We claim that
$$\nabla f_0(x) = -\sum_{\lambda_i \ge 0,\, f_i(x) = 0} \lambda_i \nabla f_i(x) \quad (4)$$
Assume the result is wrong. First let's consider
$$\nabla f_0(x) = \text{linear combination of } \{\nabla f_i(x) \mid f_i(x) = 0\} + \Delta,$$
where $\Delta \ne 0$ and
$$\Delta^T\nabla f_i(x) = 0,\quad \forall i: f_i(x) = 0$$

5-19: KKT conditions for convex problem V

Then there exists $\alpha < 0$ such that
$$\delta x \equiv \alpha\Delta$$
satisfies
$$\nabla f_i(x)^T\delta x = 0 \quad\text{if } f_i(x) = 0$$
and
$$f_i(x + \delta x) \le 0 \quad\text{if } f_i(x) < 0$$
and
$$\nabla f_0(x)^T\delta x = \alpha\Delta^T\Delta < 0$$
This contradicts the optimality condition.

5-19: KKT conditions for convex problem VI

By a similar setting we can further prove (4). Assume
$$\nabla f_0(x) = -\sum_{i: f_i(x) = 0} \lambda_i \nabla f_i(x)$$
and there exists $\bar i$ such that
$$\lambda_{\bar i} < 0,\quad \nabla f_{\bar i}(x) \ne 0,\quad f_{\bar i}(x) = 0$$

5-19: KKT conditions for convex problem VII

Let
$$\bar\lambda = \arg\min_\lambda\ \Big\|\nabla f_{\bar i}(x) - \sum_{i: i \ne \bar i,\, f_i(x) = 0} \lambda_i\nabla f_i(x)\Big\|^2,\quad \Delta = \nabla f_{\bar i}(x) - \sum_{i: i \ne \bar i,\, f_i(x) = 0} \bar\lambda_i\nabla f_i(x)$$
From the optimality of $\bar\lambda$,
$$\nabla f_i(x)^T\Delta = 0,\quad \forall i \ne \bar i,\ f_i(x) = 0 \quad (5)$$

5-19: KKT conditions for convex problem VIII

We have
$$\nabla f_{\bar i}(x)^T\Delta \ge 0$$
If
$$\nabla f_{\bar i}(x)^T\Delta = 0,$$
then $\lambda_{\bar i}\nabla f_{\bar i}(x)$ can be rearranged to use a linear combination of
$$\{\nabla f_i(x) \mid i \ne \bar i,\ f_i(x) = 0\}$$

5-19: KKT conditions for convex problem IX

Otherwise,
$$\nabla f_{\bar i}(x)^T\Delta > 0$$
Let
$$\delta x = \alpha\Delta,\quad \alpha < 0.$$
From (5),
$$\nabla f_i(x)^T\delta x = 0,\quad \forall i \ne \bar i,\ f_i(x) = 0$$
$$\nabla f_{\bar i}(x)^T\delta x = \alpha\nabla f_{\bar i}(x)^T\Delta < 0.$$
Hence $\delta x$ is a feasible direction.

5-19: KKT conditions for convex problem X

However,
$$\nabla f_0(x)^T\delta x = -\alpha\lambda_{\bar i}\nabla f_{\bar i}(x)^T\Delta < 0$$
contradicts the optimality condition.
This proof is not rigorous because of the informal feasible-direction arguments; for linear constraints the proof becomes rigorous.

5-25 I

Explanation of $-f_0^*(\nu)$:
$$\inf_y\big(f_0(y) - \nu^T y\big) = -\sup_y\big(\nu^T y - f_0(y)\big) = -f_0^*(\nu)$$
where $f_0^*(\nu)$ is the conjugate function.

5-26 I

The original problem is a norm minimization; its dual function $g(\lambda, \nu)$ reduces to a constant plus a term of the form $\inf_y(\|y\| + \nu^T y)$.
Dual norm:
$$\|\nu\|_* = \sup\{\nu^T y \mid \|y\| \le 1\}$$
If $\|\nu\|_* > 1$, there exists $y^*$ with
$$\nu^T y^* > 1,\quad \|y^*\| \le 1$$

5-26 II

Then
$$\|y^*\| - \nu^T y^* < 0$$
Hence, taking $y = -t y^*$,
$$\|y\| + \nu^T y = t(\|y^*\| - \nu^T y^*) \to -\infty \quad\text{as}\ t \to \infty,$$
so $\inf_y(\|y\| + \nu^T y) = -\infty$.
If $\|\nu\|_* \le 1$, we claim that
$$\inf_y\ \|y\| + \nu^T y = 0,$$
since $y = 0$ gives $\|y\| + \nu^T y = 0$.

5-26 III

If $\exists y$ such that
$$\|y\| + \nu^T y < 0,$$
then
$$\|y\| < -\nu^T y$$
We can scale $y$ so that
$$\sup\{\nu^T y \mid \|y\| \le 1\} > 1,$$
but this causes a contradiction.

5-27: implicit constraint I

The dual function:
$$c^T x + \nu^T(Ax - b) = -b^T\nu + x^T(A^T\nu + c)$$
$$\inf_{-1 \le x_i \le 1}\ x_i(A^T\nu + c)_i = -\big|(A^T\nu + c)_i\big|$$
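
A brute-force sketch (random small instance, assumed data) that the per-coordinate infimum above yields $g(\nu) = -b^T\nu - \|A^T\nu + c\|_1$:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((2, 5))
b, c, nu = rng.standard_normal(2), rng.standard_normal(5), rng.standard_normal(2)

g_closed = -b @ nu - np.abs(A.T @ nu + c).sum()

# Enumerate the box [-1, 1]^5 coarsely; a linear function attains its
# minimum at a vertex, and all vertices are on the grid.
xs = np.array(np.meshgrid(*[np.linspace(-1, 1, 3)] * 5)).reshape(5, -1)
g_grid = (c @ xs + nu @ (A @ xs - b[:, None])).min()
print(np.isclose(g_closed, g_grid))  # True
```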

5-30: semidefinite program I

From 5-29 we need that $Z$ is non-negative in the dual cone of $S^k_+$. The dual cone of $S^k_+$ is $S^k_+$ itself (we didn't discuss dual cones, so we assume this result).
Why $\text{tr}(Z(\cdot))$? We are supposed to do the component-wise product between $Z$ and
$$x_1 F_1 + \cdots + x_n F_n + G$$

5-30: semidefinite program II

Trace is the component-wise product:
$$\text{tr}(AB) = \sum_i (AB)_{ii} = \sum_i\sum_j A_{ij}B_{ji} = \sum_i\sum_j A_{ij}B_{ij}$$
Note that we use the property that $B$ is symmetric.

7-4 I

Uniform noise:
$$p(z) = \begin{cases} \frac{1}{2a} & \text{if } |z| \le a \\ 0 & \text{otherwise} \end{cases}$$

8-10: Dual of maximum margin problem I

Lagrangian:
$$\frac{\|a\|}{2} - \sum_i \lambda_i(a^T x_i + b - 1) + \sum_i \mu_i(a^T y_i + b + 1)$$
$$= \frac{\|a\|}{2} + a^T\Big({-\sum_i}\lambda_i x_i + \sum_i \mu_i y_i\Big) + b\Big({-\sum_i}\lambda_i + \sum_i \mu_i\Big) + \sum_i \lambda_i + \sum_i \mu_i$$
Because of the term $b\big({-\sum_i}\lambda_i + \sum_i \mu_i\big),$

8-10: Dual of maximum margin problem II

we have
$$\inf_{a,b} L = \begin{cases} \inf_a\ \frac{\|a\|}{2} - \sum_i \lambda_i a^T x_i + \sum_i \mu_i a^T y_i + \sum_i \lambda_i + \sum_i \mu_i & \text{if } \sum_i \lambda_i = \sum_i \mu_i \\ -\infty & \text{if } \sum_i \lambda_i \ne \sum_i \mu_i \end{cases}$$
For
$$\inf_a\ \frac{\|a\|}{2} - \sum_i \lambda_i a^T x_i + \sum_i \mu_i a^T y_i$$

8-10: Dual of maximum margin problem III

we can denote it as
$$\inf_a\ \frac{\|a\|}{2} + v^T a$$
where $v$ is a vector. We cannot take the derivative because $\|a\|$ is not differentiable. Formal solution:

8-10: Dual of maximum margin problem IV

Case 1: If $\|v\| \le 1/2$:
$$a^T v \ge -\|a\|\|v\| \ge -\frac{\|a\|}{2},$$
so
$$\frac{\|a\|}{2} + v^T a \ge 0.$$
However, $a = 0$ gives
$$\frac{\|a\|}{2} + v^T a = 0.$$

8-10: Dual of maximum margin problem V

Therefore,
$$\inf_a\ \frac{\|a\|}{2} + v^T a = 0.$$
If $\|v\| > 1/2$, let $a = -tv$:
$$\frac{\|a\|}{2} + v^T a = \frac{t}{2}\|v\| - t\|v\|^2 = t\|v\|\Big(\frac{1}{2} - \|v\|\Big) \to -\infty \quad\text{as}\ t \to \infty.$$

8-10: Dual of maximum margin problem VI

Thus
$$\inf_a\ \frac{\|a\|}{2} + v^T a = \begin{cases} 0 & \text{if } \|v\| \le 1/2 \\ -\infty & \text{otherwise} \end{cases}$$
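
A numerical illustration of this case analysis: along the ray $a = -tv$ the objective equals $t\|v\|(1/2 - \|v\|)$, so it stays nonnegative when $\|v\| \le 1/2$ (the infimum $0$ is attained at $a = 0$) and decreases without bound when $\|v\| > 1/2$. The values below are an assumed example.

```python
import numpy as np

rng = np.random.default_rng(8)
d = rng.standard_normal(3)
for norm_v in [0.3, 0.5, 0.9]:
    v = d * norm_v / np.linalg.norm(d)   # rescale so ||v||_2 = norm_v
    for t in [1.0, 1e3, 1e6]:
        a = -t * v
        print(norm_v, t, np.linalg.norm(a) / 2 + v @ a)
```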

8-14

$$\theta = \begin{bmatrix} \text{vec}(P) \\ q \\ r \end{bmatrix},\quad F(z) = \begin{bmatrix} \vdots \\ z_i z_j \\ \vdots \\ z_i \\ \vdots \\ 1 \end{bmatrix}$$

10-3: initial point and sublevel set I

The condition: $S$ is closed if
$$f(x) \to \infty \text{ as } x \to \text{boundary of } \text{dom} f$$
Proof: if not (i.e., if $S$ is not closed), consider $\{x_i\} \subseteq S$ converging to a boundary point. Then eventually
$$f(x_i) > f(x_0),$$
a contradiction.

10-3: initial point and sublevel set II

Example where $S$ is closed:
$$f(x) = \log\Big(\sum_{i=1}^m \exp(a_i^T x + b_i)\Big),\quad \text{dom} f = R^n$$

10-3: initial point and sublevel set III

Example:
$$f(x) = -\sum_i \log(b_i - a_i^T x)$$
We use the condition that $\text{dom} f \ne R^n$ and
$$f(x) \to \infty \text{ as } x \to \text{boundary of } \text{dom} f$$

10-4: strong convexity and implications I

$S$ is bounded. Otherwise, there exists a set $\{y_i \mid y_i = x + \Delta_i\} \subseteq S$ satisfying
$$f(y_i) \le f(x_0) \quad\text{and}\quad \lim_i \|\Delta_i\| = \infty$$
Then
$$f(y_i) \ge f(x) + \nabla f(x)^T\Delta_i + \frac{m}{2}\|\Delta_i\|^2 \to \infty$$
This contradicts $f(y_i) \le f(x_0)$.

10-4: strong convexity and implications II

Proof of
$$p^* > -\infty \quad\text{and}\quad f(x) - p^* \le \frac{1}{2m}\|\nabla f(x)\|^2$$
From
$$f(y) \ge f(x) + \nabla f(x)^T(y - x) + \frac{m}{2}\|x - y\|^2$$
Minimize the right-hand side with respect to $y$:
$$\nabla f(x) + m(y - x) = 0$$

10-4: strong convexity and implications III

$$\tilde y = x - \frac{\nabla f(x)}{m}$$
$$f(y) \ge f(x) + \nabla f(x)^T(\tilde y - x) + \frac{m}{2}\|\tilde y - x\|^2 = f(x) - \frac{1}{2m}\|\nabla f(x)\|^2,\quad \forall y$$
Then
$$p^* \ge f(x) - \frac{1}{2m}\|\nabla f(x)\|^2 \quad\text{and}\quad f(x) - p^* \le \frac{1}{2m}\|\nabla f(x)\|^2$$
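
A quick check of the bound $f(x) - p^* \le \|\nabla f(x)\|^2/(2m)$ on a strongly convex quadratic $f(x) = \frac{1}{2}x^T Q x$ (so $p^* = 0$ and $m$ is the smallest eigenvalue of $Q$); the data is an assumed example.

```python
import numpy as np

rng = np.random.default_rng(9)
Q = rng.standard_normal((4, 4))
Q = Q @ Q.T + np.eye(4)          # positive definite
m = np.linalg.eigvalsh(Q).min()  # strong convexity constant
x = rng.standard_normal(4)
f, grad = 0.5 * x @ Q @ x, Q @ x
print(f <= grad @ grad / (2 * m))  # True
```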

10-5: descent methods I

If
$$f(x + t\Delta x) < f(x),$$
then
$$\nabla f(x)^T\Delta x < 0$$
Proof: from the first-order condition of a convex function,
$$f(x + t\Delta x) \ge f(x) + t\nabla f(x)^T\Delta x$$
Then
$$t\nabla f(x)^T\Delta x \le f(x + t\Delta x) - f(x) < 0$$

10-6: line search types I

Why $\alpha \in (0, \frac{1}{2})$? The use of $1/2$ is for convergence, though we won't discuss details.
Finite termination of backtracking line search: we argue that $\exists t^* > 0$ such that
$$f(x + t\Delta x) < f(x) + \alpha t\nabla f(x)^T\Delta x,\quad \forall t \in (0, t^*)$$
Otherwise, there exists $\{t_k\} \to 0$

10-6: line search types II

such that
$$f(x + t_k\Delta x) \ge f(x) + \alpha t_k\nabla f(x)^T\Delta x,\quad \forall k$$
However,
$$\lim_{t_k \to 0}\frac{f(x + t_k\Delta x) - f(x)}{t_k} = \nabla f(x)^T\Delta x \ge \alpha\nabla f(x)^T\Delta x$$
causes a contradiction:
$$\nabla f(x)^T\Delta x < 0 \quad\text{and}\quad \alpha > 0$$

10-6: line search types III

Geometric interpretation: the tangent line passes through $(0, f(x))$, so its equation is
$$\frac{y - f(x)}{t - 0} = \nabla f(x)^T\Delta x$$
Because $\nabla f(x)^T\Delta x < 0$, we see that the line of
$$f(x) + \alpha t\nabla f(x)^T\Delta x$$

10-6: line search types IV

is above that of
$$f(x) + t\nabla f(x)^T\Delta x$$
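
A minimal backtracking line search in the sense of slide 10-6; the test function, gradient, and parameters are illustrative assumptions. The finite-termination argument above guarantees the loop stops.

```python
import numpy as np

def backtracking(f, grad, x, dx, alpha=0.25, beta=0.5):
    t = 1.0
    # Shrink t until the sufficient-decrease condition holds.
    while f(x + t * dx) >= f(x) + alpha * t * grad(x) @ dx:
        t *= beta
    return t

f = lambda x: np.log(np.exp(x[0]) + np.exp(-x[0])) + x[1] ** 2
grad = lambda x: np.array([np.tanh(x[0]), 2 * x[1]])
x = np.array([2.0, -1.0])
dx = -grad(x)                    # a descent direction: grad(x)^T dx < 0
print(backtracking(f, grad, x, dx))
```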

10-7 I

Linear convergence. We consider exact line search; the proof for backtracking line search is more complicated.
$S$ closed and bounded $\Rightarrow$
$$\nabla^2 f(x) \preceq MI,\quad \forall x \in S$$
$$f(y) \le f(x) + \nabla f(x)^T(y - x) + \frac{M}{2}\|y - x\|^2$$
Solve
$$\min_t\ f(x) - t\nabla f(x)^T\nabla f(x) + \frac{t^2 M}{2}\nabla f(x)^T\nabla f(x)$$

10-7 II

$$t = \frac{1}{M}$$
$$f(x_{\text{next}}) \le f\Big(x - \frac{1}{M}\nabla f(x)\Big) \le f(x) - \frac{1}{2M}\nabla f(x)^T\nabla f(x)$$
The first inequality is from the fact that we use exact line search.
$$f(x_{\text{next}}) - p^* \le f(x) - p^* - \frac{1}{2M}\nabla f(x)^T\nabla f(x)$$
From slide 10-4,
$$\|\nabla f(x)\|^2 \ge 2m(f(x) - p^*)$$

10-7 III

Hence
$$f(x_{\text{next}}) - p^* \le \Big(1 - \frac{m}{M}\Big)(f(x) - p^*)$$

10-8 I

Assume
$$x_1^k = \gamma\Big(\frac{\gamma - 1}{\gamma + 1}\Big)^k,\quad x_2^k = \Big({-\frac{\gamma - 1}{\gamma + 1}}\Big)^k,\quad \nabla f(x_1, x_2) = \begin{bmatrix} x_1 \\ \gamma x_2 \end{bmatrix}$$
Exact line search:
$$\min_t\ \frac{1}{2}\big((x_1 - tx_1)^2 + \gamma(x_2 - t\gamma x_2)^2\big) = \min_t\ \frac{1}{2}\big(x_1^2(1 - t)^2 + \gamma x_2^2(1 - t\gamma)^2\big)$$

10-8 II

$$x_1^2(1 - t)(-1) + \gamma x_2^2(1 - t\gamma)(-\gamma) = 0$$
$$-x_1^2 + tx_1^2 - \gamma^2 x_2^2 + \gamma^3 t x_2^2 = 0$$
$$t = \frac{x_1^2 + \gamma^2 x_2^2}{x_1^2 + \gamma^3 x_2^2} = \frac{\gamma^2\big(\frac{\gamma-1}{\gamma+1}\big)^{2k} + \gamma^2\big(\frac{\gamma-1}{\gamma+1}\big)^{2k}}{\gamma^2\big(\frac{\gamma-1}{\gamma+1}\big)^{2k} + \gamma^3\big(\frac{\gamma-1}{\gamma+1}\big)^{2k}} = \frac{2\gamma^2}{\gamma^2 + \gamma^3} = \frac{2}{1 + \gamma}$$
$$x^{k+1} = x^k - t\nabla f(x^k) = \begin{bmatrix} x_1^k(1 - t) \\ x_2^k(1 - \gamma t) \end{bmatrix}$$

10-8 III

$$x_1^{k+1} = \gamma\Big(\frac{\gamma-1}{\gamma+1}\Big)^k\Big(1 - \frac{2}{\gamma+1}\Big) = \gamma\Big(\frac{\gamma-1}{\gamma+1}\Big)^{k+1}$$
$$x_2^{k+1} = \Big({-\frac{\gamma-1}{\gamma+1}}\Big)^k\Big(1 - \frac{2\gamma}{1+\gamma}\Big) = \Big({-\frac{\gamma-1}{\gamma+1}}\Big)^k\Big(\frac{1-\gamma}{1+\gamma}\Big) = \Big({-\frac{\gamma-1}{\gamma+1}}\Big)^{k+1}$$
Why is the gradient orthogonal to the tangent line of the contour curve?

10-8 IV

Assume $f(g(t))$ is the contour with $g(0) = x$. Then
$$0 = f(g(t)) - f(g(0))$$
$$0 = \lim_{t \to 0}\frac{f(g(t)) - f(g(0))}{t} = \nabla f(g(0))^T g'(0) = \nabla f(x)^T g'(0)$$

10-8 V

where
$$x + t g'(0)$$
is the tangent line.
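
A sketch reproducing the 10-8 iterates numerically: gradient descent with the exact step $t = g^T g / g^T Q g$ for the quadratic $f(x) = \frac{1}{2}(x_1^2 + \gamma x_2^2)$ matches the closed form derived above ($\gamma = 5$ is an assumed value).

```python
import numpy as np

gamma = 5.0
x = np.array([gamma, 1.0])       # x^0 = (gamma, 1)
r = (gamma - 1) / (gamma + 1)
for k in range(6):
    assert np.allclose(x, [gamma * r**k, (-r) ** k])  # closed form above
    g = np.array([x[0], gamma * x[1]])                # gradient
    t = (g @ g) / (g[0] ** 2 + gamma * g[1] ** 2)     # exact line search
    x = x - t * g
print("iterates match the closed form")
```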

10-10 I

Linear convergence: from slide 10-7,
$$f(x^k) - p^* \le c^k(f(x^0) - p^*)$$
$$\log\big(c^k(f(x^0) - p^*)\big) = k\log c + \log(f(x^0) - p^*)$$
is a straight line. Note that now $k$ is the x-axis.

10-11: steepest descent method I

(Unnormalized) steepest descent direction:
$$\Delta x_{\text{sd}} = \|\nabla f(x)\|_*\,\Delta x_{\text{nsd}}$$
Here $\|\cdot\|_*$ is the dual norm. We didn't discuss much about dual norms, but we can still explain some examples on 10-12.

10-12 I

Euclidean norm: $\Delta x_{\text{nsd}}$ is obtained by solving
$$\min\ \nabla f^T v \quad\text{subject to}\ \|v\| = 1$$
$$\nabla f^T v = \|\nabla f\|\|v\|\cos\theta = -\|\nabla f\| \quad\text{when}\ \theta = \pi$$
$$\Delta x_{\text{nsd}} = -\frac{\nabla f(x)}{\|\nabla f(x)\|}$$
$$\Delta x_{\text{sd}} = \|\nabla f(x)\|\Big({-\frac{\nabla f(x)}{\|\nabla f(x)\|}}\Big) = -\nabla f(x)$$

10-12 II

Quadratic norm: $\Delta x_{\text{nsd}}$ is obtained by solving
$$\min\ \nabla f^T v \quad\text{subject to}\ v^T P v = 1$$
Now
$$\|v\|_P = \sqrt{v^T P v},$$
where $P$ is symmetric positive definite.

10-12 III

Let
$$w = P^{1/2} v$$
The optimization problem becomes
$$\min_w\ \nabla f^T P^{-1/2} w \quad\text{subject to}\ \|w\| = 1$$
$$\text{optimal } w = -\frac{P^{-1/2}\nabla f}{\|P^{-1/2}\nabla f\|} = -\frac{P^{-1/2}\nabla f}{\sqrt{\nabla f^T P^{-1}\nabla f}}$$

10-12 IV

$$\text{optimal } v = -\frac{P^{-1}\nabla f}{\sqrt{\nabla f^T P^{-1}\nabla f}}$$
Dual norm:
$$\|z\|_* = \|P^{-1/2} z\|$$
Therefore,
$$\Delta x_{\text{sd}} = \sqrt{\nabla f^T P^{-1}\nabla f}\;\frac{-P^{-1}\nabla f}{\sqrt{\nabla f^T P^{-1}\nabla f}} = -P^{-1}\nabla f$$

10-12 V

Explanation of the figure:
$$-\nabla f(x)^T\Delta x_{\text{nsd}} = \|\nabla f(x)\|\|\Delta x_{\text{nsd}}\|\cos\theta$$
$\|\nabla f(x)\|$ is a constant. From a point $\Delta x_{\text{nsd}}$ on the boundary, the projected point on $-\nabla f(x)$ indicates $\|\Delta x_{\text{nsd}}\|\cos\theta$. In the figure, we see that the chosen $\Delta x_{\text{nsd}}$ has the largest $\|\Delta x_{\text{nsd}}\|\cos\theta$.
We omit the discussion of the $\ell_1$-norm.

10-13 I

The two figures are generated using two $P$ matrices; the left one has faster convergence.
Gradient descent after the change of variables $\bar x = P^{1/2} x$, $x = P^{-1/2}\bar x$:
$$\min_x f(x) \quad\to\quad \min_{\bar x} f(P^{-1/2}\bar x)$$
$$\bar x \leftarrow \bar x - \alpha P^{-1/2}\nabla f(P^{-1/2}\bar x)$$
$$P^{1/2} x \leftarrow P^{1/2} x - \alpha P^{-1/2}\nabla f(x)$$
$$x \leftarrow x - \alpha P^{-1}\nabla f(x)$$

10-14 I

(Figure: $f'(x)$ versus $x$.)

10-14 II

Solve
$$f'(x) = 0$$
Finding the tangent line at $x_k$ ($x_k$: the current iterate):
$$\frac{y - f'(x_k)}{x - x_k} = f''(x_k)$$
Let $y = 0$:
$$x_{k+1} = x_k - f'(x_k)/f''(x_k)$$
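
A sketch of this iteration on $f(x) = x\log x$ from 4-3 (so $f'(x) = 1 + \log x$, with root $x = 1/e$); the starting point is an assumed choice close enough for convergence.

```python
import numpy as np

fp = lambda x: 1 + np.log(x)     # f'(x)
fpp = lambda x: 1 / x            # f''(x)
x = 0.5
for _ in range(10):
    x = x - fp(x) / fpp(x)       # zero of the tangent line of f'
print(x, 1 / np.e)               # both ~0.36788
```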

10-16 I

$$\hat f(y) = f(x) + \nabla f(x)^T(y - x) + \frac{1}{2}(y - x)^T\nabla^2 f(x)(y - x)$$
$$\nabla\hat f(y) = 0 = \nabla f(x) + \nabla^2 f(x)(y - x) \quad\Rightarrow\quad y - x = -\nabla^2 f(x)^{-1}\nabla f(x)$$
$$\inf_y \hat f(y) = f(x) - \frac{1}{2}\nabla f(x)^T\nabla^2 f(x)^{-1}\nabla f(x)$$
$$f(x) - \inf_y \hat f(y) = \frac{1}{2}\lambda(x)^2$$

10-16 II

Norm of the Newton step in the quadratic Hessian norm:
$$\Delta x_{\text{nt}} = -\nabla^2 f(x)^{-1}\nabla f(x)$$
$$\Delta x_{\text{nt}}^T\nabla^2 f(x)\Delta x_{\text{nt}} = \nabla f(x)^T\nabla^2 f(x)^{-1}\nabla f(x) = \lambda(x)^2$$
Directional derivative in the Newton direction:
$$\lim_{t \to 0}\frac{f(x + t\Delta x_{\text{nt}}) - f(x)}{t} = \nabla f(x)^T\Delta x_{\text{nt}} = -\nabla f(x)^T\nabla^2 f(x)^{-1}\nabla f(x) = -\lambda(x)^2$$

10-16 III

Affine invariance:
$$\bar f(y) \equiv f(Ty) = f(x)$$
Assume $T$ is an invertible square matrix. Then
$$\bar\lambda(y) = \lambda(Ty)$$
Proof:
$$\nabla\bar f(y) = T^T\nabla f(Ty),\quad \nabla^2\bar f(y) = T^T\nabla^2 f(Ty)T$$

10-16 IV

$$\bar\lambda(y)^2 = \nabla\bar f(y)^T\nabla^2\bar f(y)^{-1}\nabla\bar f(y) = \nabla f(Ty)^T T T^{-1}\nabla^2 f(Ty)^{-1}T^{-T}T^T\nabla f(Ty) = \nabla f(Ty)^T\nabla^2 f(Ty)^{-1}\nabla f(Ty) = \lambda(Ty)^2$$

10-17 I

Affine invariance:
$$\Delta y_{\text{nt}} = -\nabla^2\bar f(y)^{-1}\nabla\bar f(y) = -T^{-1}\nabla^2 f(Ty)^{-1}\nabla f(Ty) = T^{-1}\Delta x_{\text{nt}}$$
Note that $y^k = T^{-1}x^k$, so
$$y^{k+1} = T^{-1}x^{k+1}$$

10-17 II

But how about line search?
$$\nabla\bar f(y)^T\Delta y_{\text{nt}} = \nabla f(Ty)^T T T^{-1}\Delta x_{\text{nt}} = \nabla f(x)^T\Delta x_{\text{nt}}$$

10-19 I

$$\eta \in \Big(0, \frac{m^2}{L}\Big)$$
$$\|\nabla f(x_k)\| \le \eta \le \frac{m^2}{L} \quad\Rightarrow\quad \frac{L}{2m^2}\|\nabla f(x_k)\| \le \frac{1}{2}$$

10-20 I

$$f(x_l) - f(x^*) \le \frac{1}{2m}\|\nabla f(x_l)\|^2 \le \frac{1}{2m}\,\frac{4m^4}{L^2}\Big(\Big(\frac{1}{2}\Big)^{2^{l-k}}\Big)^2 = \frac{2m^3}{L^2}\Big(\frac{1}{2}\Big)^{2^{l-k+1}} \le \epsilon$$
$$\epsilon_0 \equiv \frac{2m^3}{L^2} \quad\text{(from p10-4)}$$

10-20 II

$$\Big(\frac{1}{2}\Big)^{2^{l-k+1}}\epsilon_0 \le \epsilon \quad\Leftrightarrow\quad 2^{l-k+1} \ge \log_2(\epsilon_0/\epsilon) \quad\Leftrightarrow\quad l - k \ge -1 + \log_2\log_2(\epsilon_0/\epsilon)$$
In at most
$$k + \log_2\log_2(\epsilon_0/\epsilon) \le \frac{f(x_0) - p^*}{\gamma} + \log_2\log_2(\epsilon_0/\epsilon)$$
iterations, we have
$$f(x_l) - f(x^*) \le \epsilon$$

10-29: implementation I

With $\nabla^2 f(x) = LL^T$ (Cholesky factorization) and $g = \nabla f(x)$,
$$\lambda(x) = \big(\nabla f(x)^T\nabla^2 f(x)^{-1}\nabla f(x)\big)^{1/2} = (g^T L^{-T}L^{-1}g)^{1/2} = \|L^{-1}g\|_2$$
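
A sketch of this computation with a Cholesky factor ($H$ here is a random positive definite stand-in for the Hessian; `solve_triangular` is SciPy's triangular solver):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(10)
H = rng.standard_normal((5, 5))
H = H @ H.T + np.eye(5)                  # positive definite "Hessian"
g = rng.standard_normal(5)

L = np.linalg.cholesky(H)                # H = L L^T, L lower triangular
w = solve_triangular(L, g, lower=True)   # w = L^{-1} g
# lambda(x)^2 computed both ways:
print(np.isclose(w @ w, g @ np.linalg.solve(H, g)))  # True
```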

10-30: example of dense Newton systems with structure I

$$\nabla f(x) = \begin{bmatrix} \psi_1'(x_1) \\ \vdots \\ \psi_n'(x_n) \end{bmatrix} + A^T\psi_0'(Ax + b)$$
$$\nabla^2 f(x) = \begin{bmatrix} \psi_1''(x_1) & & \\ & \ddots & \\ & & \psi_n''(x_n) \end{bmatrix} + A^T\psi_0''(Ax + b)A$$
Method 2:
$$\Delta x = D^{-1}(-g - A^T L_0 w)$$

10-30: example of dense Newton systems with structure II

$$L_0^T A D^{-1}(-g - A^T L_0 w) = w$$
$$(I + L_0^T A D^{-1} A^T L_0)w = -L_0^T A D^{-1} g$$
Cost:
$$L_0: p \times p,\quad A^T L_0: n \times p,\ \text{cost } O(np),\quad (L_0^T A)D^{-1}(A^T L_0): O(p^2 n)$$

10-30: example of dense Newton systems with structure III

Note that the Cholesky factorization of $H_0$ costs
$$\frac{1}{3}p^3 \ll p^2 n \quad\text{as}\ p \ll n$$


More information

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014 Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

Lecture 2: Convex Sets and Functions

Lecture 2: Convex Sets and Functions Lecture 2: Convex Sets and Functions Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Network Optimization, Fall 2015 1 / 22 Optimization Problems Optimization problems are

More information

Unconstrained minimization: assumptions

Unconstrained minimization: assumptions Unconstrained minimization I terminology and assumptions I gradient descent method I steepest descent method I Newton s method I self-concordant functions I implementation IOE 611: Nonlinear Programming,

More information

Convex Optimization. 9. Unconstrained minimization. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University

Convex Optimization. 9. Unconstrained minimization. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University Convex Optimization 9. Unconstrained minimization Prof. Ying Cui Department of Electrical Engineering Shanghai Jiao Tong University 2017 Autumn Semester SJTU Ying Cui 1 / 40 Outline Unconstrained minimization

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 17 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 29, 2012 Andre Tkacenko

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

Convex Optimization in Communications and Signal Processing

Convex Optimization in Communications and Signal Processing Convex Optimization in Communications and Signal Processing Prof. Dr.-Ing. Wolfgang Gerstacker 1 University of Erlangen-Nürnberg Institute for Digital Communications National Technical University of Ukraine,

More information

subject to (x 2)(x 4) u,

subject to (x 2)(x 4) u, Exercises Basic definitions 5.1 A simple example. Consider the optimization problem with variable x R. minimize x 2 + 1 subject to (x 2)(x 4) 0, (a) Analysis of primal problem. Give the feasible set, the

More information

Duality. Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities

Duality. Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities Duality Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities Lagrangian Consider the optimization problem in standard form

More information

Lecture 2: Convex functions

Lecture 2: Convex functions Lecture 2: Convex functions f : R n R is convex if dom f is convex and for all x, y dom f, θ [0, 1] f is concave if f is convex f(θx + (1 θ)y) θf(x) + (1 θ)f(y) x x convex concave neither x examples (on

More information

Convex Functions. Pontus Giselsson

Convex Functions. Pontus Giselsson Convex Functions Pontus Giselsson 1 Today s lecture lower semicontinuity, closure, convex hull convexity preserving operations precomposition with affine mapping infimal convolution image function supremum

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

Linear and non-linear programming

Linear and non-linear programming Linear and non-linear programming Benjamin Recht March 11, 2005 The Gameplan Constrained Optimization Convexity Duality Applications/Taxonomy 1 Constrained Optimization minimize f(x) subject to g j (x)

More information

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness.

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness. CS/ECE/ISyE 524 Introduction to Optimization Spring 2016 17 14. Duality ˆ Upper and lower bounds ˆ General duality ˆ Constraint qualifications ˆ Counterexample ˆ Complementary slackness ˆ Examples ˆ Sensitivity

More information

Summer School: Semidefinite Optimization

Summer School: Semidefinite Optimization Summer School: Semidefinite Optimization Christine Bachoc Université Bordeaux I, IMB Research Training Group Experimental and Constructive Algebra Haus Karrenberg, Sept. 3 - Sept. 7, 2012 Duality Theory

More information

CSCI 1951-G Optimization Methods in Finance Part 09: Interior Point Methods

CSCI 1951-G Optimization Methods in Finance Part 09: Interior Point Methods CSCI 1951-G Optimization Methods in Finance Part 09: Interior Point Methods March 23, 2018 1 / 35 This material is covered in S. Boyd, L. Vandenberge s book Convex Optimization https://web.stanford.edu/~boyd/cvxbook/.

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

HW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given.

HW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given. HW1 solutions Exercise 1 (Some sets of probability distributions.) Let x be a real-valued random variable with Prob(x = a i ) = p i, i = 1,..., n, where a 1 < a 2 < < a n. Of course p R n lies in the standard

More information

Geometric problems. Chapter Projection on a set. The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as

Geometric problems. Chapter Projection on a set. The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as Chapter 8 Geometric problems 8.1 Projection on a set The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as dist(x 0,C) = inf{ x 0 x x C}. The infimum here is always achieved.

More information

Convex Optimization and l 1 -minimization

Convex Optimization and l 1 -minimization Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l

More information

Convex Optimization and Modeling

Convex Optimization and Modeling Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual

More information

The proximal mapping

The proximal mapping The proximal mapping http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/37 1 closed function 2 Conjugate function

More information

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

In English, this means that if we travel on a straight line between any two points in C, then we never leave C. Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from

More information

1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics

1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics 1. Introduction Convex Optimization Boyd & Vandenberghe mathematical optimization least-squares and linear programming convex optimization example course goals and topics nonlinear optimization brief history

More information

1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics

1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics 1. Introduction Convex Optimization Boyd & Vandenberghe mathematical optimization least-squares and linear programming convex optimization example course goals and topics nonlinear optimization brief history

More information

ORIE 6326: Convex Optimization. Quasi-Newton Methods

ORIE 6326: Convex Optimization. Quasi-Newton Methods ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted

More information

Lecture 18: Optimization Programming

Lecture 18: Optimization Programming Fall, 2016 Outline Unconstrained Optimization 1 Unconstrained Optimization 2 Equality-constrained Optimization Inequality-constrained Optimization Mixture-constrained Optimization 3 Quadratic Programming

More information

Lagrangian Duality and Convex Optimization

Lagrangian Duality and Convex Optimization Lagrangian Duality and Convex Optimization David Rosenberg New York University February 11, 2015 David Rosenberg (New York University) DS-GA 1003 February 11, 2015 1 / 24 Introduction Why Convex Optimization?

More information

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010 I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0

More information

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients and quasigradients subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE392o, Stanford University Basic inequality recall basic

More information

ICS-E4030 Kernel Methods in Machine Learning

ICS-E4030 Kernel Methods in Machine Learning ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This

More information

N. L. P. NONLINEAR PROGRAMMING (NLP) deals with optimization models with at least one nonlinear function. NLP. Optimization. Models of following form:

N. L. P. NONLINEAR PROGRAMMING (NLP) deals with optimization models with at least one nonlinear function. NLP. Optimization. Models of following form: 0.1 N. L. P. Katta G. Murty, IOE 611 Lecture slides Introductory Lecture NONLINEAR PROGRAMMING (NLP) deals with optimization models with at least one nonlinear function. NLP does not include everything

More information

ELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications

ELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications ELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications Professor M. Chiang Electrical Engineering Department, Princeton University March

More information

Lecture 8. Strong Duality Results. September 22, 2008

Lecture 8. Strong Duality Results. September 22, 2008 Strong Duality Results September 22, 2008 Outline Lecture 8 Slater Condition and its Variations Convex Objective with Linear Inequality Constraints Quadratic Objective over Quadratic Constraints Representation

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

Equality constrained minimization

Equality constrained minimization Chapter 10 Equality constrained minimization 10.1 Equality constrained minimization problems In this chapter we describe methods for solving a convex optimization problem with equality constraints, minimize

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian

More information

Additional Homework Problems

Additional Homework Problems Additional Homework Problems Robert M. Freund April, 2004 2004 Massachusetts Institute of Technology. 1 2 1 Exercises 1. Let IR n + denote the nonnegative orthant, namely IR + n = {x IR n x j ( ) 0,j =1,...,n}.

More information

Inequality Constraints

Inequality Constraints Chapter 2 Inequality Constraints 2.1 Optimality Conditions Early in multivariate calculus we learn the significance of differentiability in finding minimizers. In this section we begin our study of the

More information

Lecture Note 5: Semidefinite Programming for Stability Analysis

Lecture Note 5: Semidefinite Programming for Stability Analysis ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State

More information

Tutorial on Convex Optimization for Engineers Part II

Tutorial on Convex Optimization for Engineers Part II Tutorial on Convex Optimization for Engineers Part II M.Sc. Jens Steinwandt Communications Research Laboratory Ilmenau University of Technology PO Box 100565 D-98684 Ilmenau, Germany jens.steinwandt@tu-ilmenau.de

More information

10 Numerical methods for constrained problems

10 Numerical methods for constrained problems 10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside

More information

Convex functions. Definition. f : R n! R is convex if dom f is a convex set and. f ( x +(1 )y) < f (x)+(1 )f (y) f ( x +(1 )y) apple f (x)+(1 )f (y)

Convex functions. Definition. f : R n! R is convex if dom f is a convex set and. f ( x +(1 )y) < f (x)+(1 )f (y) f ( x +(1 )y) apple f (x)+(1 )f (y) Convex functions I basic properties and I operations that preserve convexity I quasiconvex functions I log-concave and log-convex functions IOE 611: Nonlinear Programming, Fall 2017 3. Convex functions

More information

Convex Functions. Wing-Kin (Ken) Ma The Chinese University of Hong Kong (CUHK)

Convex Functions. Wing-Kin (Ken) Ma The Chinese University of Hong Kong (CUHK) Convex Functions Wing-Kin (Ken) Ma The Chinese University of Hong Kong (CUHK) Course on Convex Optimization for Wireless Comm. and Signal Proc. Jointly taught by Daniel P. Palomar and Wing-Kin (Ken) Ma

More information

8. Geometric problems

8. Geometric problems 8. Geometric problems Convex Optimization Boyd & Vandenberghe extremal volume ellipsoids centering classification placement and facility location 8 Minimum volume ellipsoid around a set Löwner-John ellipsoid

More information

8. Geometric problems

8. Geometric problems 8. Geometric problems Convex Optimization Boyd & Vandenberghe extremal volume ellipsoids centering classification placement and facility location 8 1 Minimum volume ellipsoid around a set Löwner-John ellipsoid

More information

Lagrange duality. The Lagrangian. We consider an optimization program of the form

Lagrange duality. The Lagrangian. We consider an optimization program of the form Lagrange duality Another way to arrive at the KKT conditions, and one which gives us some insight on solving constrained optimization problems, is through the Lagrange dual. The dual is a maximization

More information

Optimality, Duality, Complementarity for Constrained Optimization

Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

More information

Lecture: Introduction to LP, SDP and SOCP

Lecture: Introduction to LP, SDP and SOCP Lecture: Introduction to LP, SDP and SOCP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2015.html wenzw@pku.edu.cn Acknowledgement:

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

Applications of Linear Programming

Applications of Linear Programming Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Duality in Nonlinear Optimization ) Tamás TERLAKY Computing and Software McMaster University Hamilton, January 2004 terlaky@mcmaster.ca Tel: 27780 Optimality

More information

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers.

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers. Chapter 3 Duality in Banach Space Modern optimization theory largely centers around the interplay of a normed vector space and its corresponding dual. The notion of duality is important for the following

More information

Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS

Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Here we consider systems of linear constraints, consisting of equations or inequalities or both. A feasible solution

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

15. Conic optimization

15. Conic optimization L. Vandenberghe EE236C (Spring 216) 15. Conic optimization conic linear program examples modeling duality 15-1 Generalized (conic) inequalities Conic inequality: a constraint x K where K is a convex cone

More information

Lecture 9 Sequential unconstrained minimization

Lecture 9 Sequential unconstrained minimization S. Boyd EE364 Lecture 9 Sequential unconstrained minimization brief history of SUMT & IP methods logarithmic barrier function central path UMT & SUMT complexity analysis feasibility phase generalized inequalities

More information

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

More information

The Lagrangian L : R d R m R r R is an (easier to optimize) lower bound on the original problem:

The Lagrangian L : R d R m R r R is an (easier to optimize) lower bound on the original problem: HT05: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford Convex Optimization and slides based on Arthur Gretton s Advanced Topics in Machine Learning course

More information

Duality Theory of Constrained Optimization

Duality Theory of Constrained Optimization Duality Theory of Constrained Optimization Robert M. Freund April, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive

More information

Lecture 7: Convex Optimizations

Lecture 7: Convex Optimizations Lecture 7: Convex Optimizations Radu Balan, David Levermore March 29, 2018 Convex Sets. Convex Functions A set S R n is called a convex set if for any points x, y S the line segment [x, y] := {tx + (1

More information

Homework Set #6 - Solutions

Homework Set #6 - Solutions EE 15 - Applications of Convex Optimization in Signal Processing and Communications Dr Andre Tkacenko JPL Third Term 11-1 Homework Set #6 - Solutions 1 a The feasible set is the interval [ 4] The unique

More information

Lecture 14 Barrier method

Lecture 14 Barrier method L. Vandenberghe EE236A (Fall 2013-14) Lecture 14 Barrier method centering problem Newton decrement local convergence of Newton method short-step barrier method global convergence of Newton method predictor-corrector

More information

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem: CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through

More information

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

More information

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1, Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,

More information