
1 Convex Analysis

Main references:
- Vandenberghe (UCLA): EE236C - Optimization methods for large-scale systems, vandenbe/ee236c.html
- Parikh and Boyd, Proximal algorithms, slides and note: boyd/papers/prox_algs.html
- Boyd, ADMM: boyd/admm.html
- Simon Foucart and Holger Rauhut, Appendix B.

1.1 Motivations: Convex optimization problems

In applications, we encounter many constrained optimization problems. Examples:
- Basis pursuit: the exact sparse recovery problem
  $\min_x \|x\|_1$ subject to $Ax = b$,
  or the robust recovery problem
  $\min_x \|x\|_1$ subject to $\|Ax - b\|_2^2 \le \epsilon$.
- Image processing:
  $\min_x \|x\|_1$ subject to $\|Ax - b\|_2^2 \le \epsilon$.

The constraint can be a convex set $C$; that is, we consider
  $\min_x f_0(x)$ subject to $Ax \in C$.
We can define an indicator function
  $\iota_C(y) = 0$ if $y \in C$, $+\infty$ otherwise,
and rewrite the constrained minimization problem as an unconstrained one:
  $\min_x f_0(x) + \iota_C(Ax)$.
This can be reformulated as
  $\min_{x,y} f_0(x) + \iota_C(y)$ subject to $Ax = y$.
In abstract form, we encounter
  $\min_x f(x) + g(Ax)$,
which we can express as
  $\min_{x,y} f(x) + g(y)$ subject to $Ax = y$.
For more applications, see Boyd's book.

A standard convex optimization problem can be formulated as
  $\min_{x \in X} f_0(x)$ subject to $Ax = b$ and $f_i(x) \le b_i$, $i = 1, \ldots, M$.
Here the $f_i$'s are convex. The space $X$ is a Hilbert space; here, we just take $X = \mathbb{R}^N$.
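The indicator-function rewriting is easy to see on a toy instance. The sketch below (with an illustrative one-dimensional $f_0$ and interval $C$, not taken from the notes) checks on a grid that the constrained problem and its unconstrained indicator-penalized form have the same minimum.

```python
import math

# Toy instance of the indicator reformulation: minimize f0(x) = (x - 3)**2
# subject to x in C = [0, 1] equals minimizing f0 + iota_C over all of R.
def f0(x):
    return (x - 3.0) ** 2

def iota_C(x, lo=0.0, hi=1.0):
    """Indicator of [lo, hi]: 0 inside, +infinity outside."""
    return 0.0 if lo <= x <= hi else math.inf

xs = [k / 1000.0 for k in range(-2000, 4001)]        # grid over [-2, 4]
constrained = min(f0(x) for x in xs if 0.0 <= x <= 1.0)
unconstrained = min(f0(x) + iota_C(x) for x in xs)
print(constrained == unconstrained, constrained)  # True 4.0
```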

1.2 Convex functions

Goal: we want to extend the theory of smooth convex analysis to non-differentiable convex functions. Let $X$ be a separable Hilbert space and $f : X \to (-\infty, +\infty]$ a function.

- Proper: $f$ is called proper if $f(x) < \infty$ for at least one $x$. The domain of $f$ is defined to be $\operatorname{dom} f = \{x : f(x) < \infty\}$.
- Lower semi-continuity: $f$ is called lower semi-continuous (l.s.c.) if $\liminf_{x_n \to x} f(x_n) \ge f(x)$. The set $\operatorname{epi} f := \{(x, \eta) : f(x) \le \eta\}$ is called the epigraph of $f$. Prop: $f$ is l.s.c. if and only if $\operatorname{epi} f$ is closed. Sometimes we call such $f$ closed. (proofwiki.org/wiki/Characterization_of_Lower_Semicontinuity) The indicator function $\iota_C$ of a set $C$ is closed if and only if $C$ is closed.
- Convex: $f$ is called convex if $\operatorname{dom} f$ is convex and Jensen's inequality holds:
  $f((1-\theta)x + \theta y) \le (1-\theta) f(x) + \theta f(y)$ for all $0 \le \theta \le 1$ and any $x, y \in X$.
  Proposition: $f$ is convex if and only if $\operatorname{epi} f$ is convex.
  First-order condition: for $f \in C^1$, convexity of $\operatorname{epi} f$ is equivalent to
  $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$ for all $x, y \in X$.
  Second-order condition: for $f \in C^2$, Jensen's inequality is equivalent to $\nabla^2 f(x) \succeq 0$.
  If $\{f_\alpha\}$ is a family of convex functions, then $\sup_\alpha f_\alpha$ is again a convex function.
- Strictly convex: $f$ is called strictly convex if the strict Jensen inequality holds: for $x \ne y$ and $t \in (0, 1)$,
  $f((1-t)x + t y) < (1-t) f(x) + t f(y)$.
  First-order condition: for $f \in C^1$, strict convexity is equivalent to
  $f(y) > f(x) + \langle \nabla f(x), y - x \rangle$ for all $x \ne y$.
  Second-order condition: for $f \in C^2$, $\nabla^2 f(x) \succ 0$ implies the strict Jensen inequality (this is sufficient but not necessary; $f(x) = x^4$ is strictly convex although $f''(0) = 0$).

Proposition 1.1. A convex function $f : \mathbb{R}^N \to \mathbb{R}$ is continuous.

Proposition 1.2. Let $f : \mathbb{R}^N \to (-\infty, \infty]$ be convex. Then
1. a local minimizer of $f$ is also a global minimizer;
2. the set of minimizers is convex;
3. if $f$ is strictly convex, then the minimizer is unique.
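Jensen's inequality is easy to sanity-check numerically. A minimal sketch, using $f(x) = x^4$ as an illustrative convex function (an arbitrary choice, not one from the notes):

```python
# Numerical check of Jensen's inequality for the convex function f(x) = x**4.
def f(x):
    return x ** 4

def jensen_gap(x, y, theta):
    """(1-theta)*f(x) + theta*f(y) - f((1-theta)*x + theta*y); >= 0 iff Jensen holds."""
    return (1 - theta) * f(x) + theta * f(y) - f((1 - theta) * x + theta * y)

# The gap is nonnegative for every sampled pair and every theta in [0, 1].
gaps = [jensen_gap(x / 7.0, y / 5.0, t / 10.0)
        for x in range(-10, 11) for y in range(-10, 11) for t in range(11)]
print(min(gaps) >= -1e-9)  # True
```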

1.3 Gradients of convex functions

Proposition 1.3 (Monotonicity of $\nabla f$). Suppose $f \in C^1$. Then $f$ is convex if and only if $\operatorname{dom} f$ is convex and $\nabla f$ is a monotone operator:
  $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge 0$.

Proof.
1. ($\Rightarrow$) From convexity,
   $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$ and $f(x) \ge f(y) + \langle \nabla f(y), x - y \rangle$.
   Adding these two, we get monotonicity of $\nabla f$.
2. ($\Leftarrow$) Let $g(t) = f(x + t(y - x))$. Then $g'(t) = \langle \nabla f(x + t(y - x)), y - x \rangle \ge g'(0)$ by monotonicity. Hence
   $f(y) = g(1) = g(0) + \int_0^1 g'(t)\,dt \ge g(0) + \int_0^1 g'(0)\,dt = f(x) + \langle \nabla f(x), y - x \rangle$.

Proposition 1.4. Suppose $f$ is convex and in $C^1$. The following statements are equivalent.
(a) Lipschitz continuity of $\nabla f$: there exists an $L > 0$ such that $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y \in \operatorname{dom} f$.
(b) $g(x) := \frac{L}{2}\|x\|^2 - f(x)$ is convex.
(c) Quadratic upper bound: $f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|^2$.
(d) Co-coercivity: $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \frac{1}{L}\|\nabla f(x) - \nabla f(y)\|^2$.

Proof.
1. (a) $\Rightarrow$ (b):
   $\langle \nabla f(x) - \nabla f(y), x - y \rangle \le \|\nabla f(x) - \nabla f(y)\|\,\|x - y\| \le L \|x - y\|^2$, hence
   $\langle \nabla g(x) - \nabla g(y), x - y \rangle = L\|x - y\|^2 - \langle \nabla f(x) - \nabla f(y), x - y \rangle \ge 0$.
   Therefore $\nabla g$ is monotone and thus $g$ is convex.
2. (b) $\Rightarrow$ (c): $g$ convex means $g(y) \ge g(x) + \langle \nabla g(x), y - x \rangle$, i.e.
   $\frac{L}{2}\|y\|^2 - f(y) \ge \frac{L}{2}\|x\|^2 - f(x) + \langle Lx - \nabla f(x), y - x \rangle$,
   which rearranges to $f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|^2$.

3. (b) $\Rightarrow$ (d): Define $f_x(z) = f(z) - \langle \nabla f(x), z \rangle$ and $f_y(z) = f(z) - \langle \nabla f(y), z \rangle$. From (b), both $\frac{L}{2}\|z\|^2 - f_x(z)$ and $\frac{L}{2}\|z\|^2 - f_y(z)$ are convex, and $z = x$ minimizes $f_x$. Thus, from Proposition 1.5 below,
   $f(y) - f(x) - \langle \nabla f(x), y - x \rangle = f_x(y) - f_x(x) \ge \frac{1}{2L}\|\nabla f_x(y)\|^2 = \frac{1}{2L}\|\nabla f(y) - \nabla f(x)\|^2$.
   Similarly, $z = y$ minimizes $f_y$, and we get
   $f(x) - f(y) - \langle \nabla f(y), x - y \rangle \ge \frac{1}{2L}\|\nabla f(x) - \nabla f(y)\|^2$.
   Adding these two together, we get the co-coercivity.
4. (d) $\Rightarrow$ (a): by the Cauchy-Schwarz inequality.

Proposition 1.5. Suppose $f$ is convex and in $C^1$ with $\nabla f$ Lipschitz continuous with parameter $L$. Suppose $x^*$ is a global minimum of $f$. Then
  $\frac{1}{2L}\|\nabla f(x)\|^2 \le f(x) - f(x^*) \le \frac{L}{2}\|x - x^*\|^2$.

Proof.
1. The right-hand inequality follows from the quadratic upper bound at $x^*$ (where $\nabla f(x^*) = 0$).
2. The left-hand inequality follows by minimizing the quadratic upper bound:
   $f(x^*) = \inf_y f(y) \le \inf_y \left( f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|^2 \right) = f(x) - \frac{1}{2L}\|\nabla f(x)\|^2$.

1.4 Strong convexity

$f$ is called strongly convex if $\operatorname{dom} f$ is convex and the strong Jensen inequality holds: there exists a constant $m > 0$ such that for any $x, y \in \operatorname{dom} f$ and $t \in [0, 1]$,
  $f(tx + (1-t)y) \le t f(x) + (1-t) f(y) - \frac{m}{2} t(1-t) \|x - y\|^2$.
This definition is equivalent to convexity of $g(x) := f(x) - \frac{m}{2}\|x\|^2$. This comes from the calculation
  $t\|x\|^2 + (1-t)\|y\|^2 - \|tx + (1-t)y\|^2 = t(1-t)\|x - y\|^2$.
When $f \in C^2$, strong convexity of $f$ is equivalent to $\nabla^2 f(x) \succeq mI$ for any $x \in \operatorname{dom} f$.

Proposition 1.6. Suppose $f \in C^1$. The following statements are equivalent:
(a) $f$ is strongly convex, i.e. $g(x) = f(x) - \frac{m}{2}\|x\|^2$ is convex;
(b) for any $x, y \in \operatorname{dom} f$, $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge m \|x - y\|^2$;
(c) (quadratic lower bound) $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{m}{2}\|y - x\|^2$.
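The quadratic upper bound of Proposition 1.4(c) and the quadratic lower bound of strong convexity can both be checked numerically on a quadratic $f(x) = \frac{1}{2} x^\top A x$, where $m = \lambda_{\min}(A)$ and $L = \lambda_{\max}(A)$. A sketch (the matrix, seed, and sample points are arbitrary illustrative choices):

```python
import numpy as np

# Check: for f(x) = 0.5 x^T A x with A symmetric positive definite,
#   f(x) + <grad f(x), y-x> + (m/2)||y-x||^2 <= f(y)
#                                          <= f(x) + <grad f(x), y-x> + (L/2)||y-x||^2
# with m = lambda_min(A) and L = lambda_max(A).
rng = np.random.default_rng(3)
B = rng.standard_normal((6, 6))
A = B @ B.T + np.eye(6)                      # symmetric positive definite
m, L = np.linalg.eigvalsh(A)[[0, -1]]        # smallest and largest eigenvalues

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

ok = True
for _ in range(200):
    x, y = rng.standard_normal(6), rng.standard_normal(6)
    lin = f(x) + grad(x) @ (y - x)           # linearization of f at x
    d2 = (y - x) @ (y - x)
    ok &= lin + 0.5 * m * d2 - 1e-9 <= f(y) <= lin + 0.5 * L * d2 + 1e-9
print(ok)  # True
```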

Proposition 1.7. If $f$ is strongly convex, then $f$ has a unique global minimizer $x^*$, which satisfies
  $\frac{m}{2}\|x - x^*\|^2 \le f(x) - f(x^*) \le \frac{1}{2m}\|\nabla f(x)\|^2$.

Proof.
1. For the left-hand inequality, we apply the quadratic lower bound at $x^*$ (where $\nabla f(x^*) = 0$): for all $x \in \operatorname{dom} f$,
   $f(x) \ge f(x^*) + \langle \nabla f(x^*), x - x^* \rangle + \frac{m}{2}\|x - x^*\|^2 = f(x^*) + \frac{m}{2}\|x - x^*\|^2$.
2. For the right-hand inequality, the quadratic lower bound gives
   $f(x^*) = \inf_y f(y) \ge \inf_y \left( f(x) + \langle \nabla f(x), y - x \rangle + \frac{m}{2}\|y - x\|^2 \right) = f(x) - \frac{1}{2m}\|\nabla f(x)\|^2$,
   where we take the infimum in $y$.

Proposition 1.8. Suppose $f$ is strongly convex with parameter $m$ and $\nabla f$ is Lipschitz continuous with parameter $L$. Then $\nabla f$ satisfies the stronger co-coercivity condition
  $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \frac{mL}{m+L}\|x - y\|^2 + \frac{1}{m+L}\|\nabla f(x) - \nabla f(y)\|^2$.

Proof.
1. Consider $g(x) = f(x) - \frac{m}{2}\|x\|^2$. From strong convexity of $f$, $g$ is convex.
2. From the Lipschitz continuity of $\nabla f$, $\nabla g$ is also Lipschitz continuous, with parameter $L - m$.
3. We apply co-coercivity to $g$:
   $\langle \nabla g(x) - \nabla g(y), x - y \rangle \ge \frac{1}{L-m}\|\nabla g(x) - \nabla g(y)\|^2$,
   i.e.
   $\langle \nabla f(x) - \nabla f(y) - m(x - y), x - y \rangle \ge \frac{1}{L-m}\|\nabla f(x) - \nabla f(y) - m(x - y)\|^2$.
   Expanding the square on the right and rearranging terms gives the stated inequality.

1.5 Subdifferential

Let $f$ be convex. The subdifferential of $f$ at a point $x$ is the set
  $\partial f(x) = \{u \in X : (\forall y \in X)\ f(y) \ge f(x) + \langle u, y - x \rangle\}$.
The elements of $\partial f(x)$ are called subgradients of $f$ at $x$.

Proposition 1. (a) If $f$ is convex and differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$. (b) If $f$ is convex, then $\partial f(x)$ is a closed convex set.

Examples.
- Let $f(x) = |x|$. Then $\partial f(0) = [-1, 1]$.
- Let $C$ be a closed convex set in $\mathbb{R}^N$. Then the boundary of $C$ is locally rectifiable. Moreover, for $x$ on the boundary,
  $\partial \iota_C(x) = \{\lambda n : \lambda \ge 0$, $n$ the unit outer normal of the boundary of $C$ at $x\}$.

Proposition 1.9. Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be convex and closed. Then $x^*$ is a minimum of $f$ if and only if $0 \in \partial f(x^*)$.

Proposition. The subdifferential of a convex function $f$ is a set-valued monotone operator. That is, if $u \in \partial f(x)$, $v \in \partial f(y)$, then $\langle u - v, x - y \rangle \ge 0$.

Proof. From
  $f(y) \ge f(x) + \langle u, y - x \rangle$ and $f(x) \ge f(y) + \langle v, x - y \rangle$,
combining these two inequalities, we get monotonicity.

Proposition. The following statements are equivalent.
(1) $f$ is strongly convex, i.e. $f - \frac{m}{2}\|\cdot\|^2$ is convex;
(2) (quadratic lower bound) $f(y) \ge f(x) + \langle u, y - x \rangle + \frac{m}{2}\|y - x\|^2$ for any $x, y$, where $u \in \partial f(x)$;
(3) strong monotonicity of $\partial f$: $\langle u - v, x - y \rangle \ge m\|x - y\|^2$ for any $x, y$, with any $u \in \partial f(x)$, $v \in \partial f(y)$.

1.6 Proximal operator

Definition 1.1. Given a convex function $f$, the proximal mapping of $f$ is defined as
  $\operatorname{prox}_f(x) := \operatorname{argmin}_u \left( f(u) + \frac{1}{2}\|u - x\|^2 \right)$.
Since $f(u) + \frac{1}{2}\|u - x\|^2$ is strongly convex in $u$, there is a unique minimizer; thus $\operatorname{prox}_f(x)$ is well-defined.

Examples.
- Let $C$ be a convex set and $\iota_C$ its indicator function ($\iota_C(x) = 0$ if $x \in C$, $\infty$ otherwise). Then $\operatorname{prox}_{\iota_C}$ is the projection $P_C$ onto $C$:
  $P_C(x) \in C$ and $\langle y - P_C(x), x - P_C(x) \rangle \le 0$ for all $y \in C$.
- $f(x) = \|x\|_1$: $\operatorname{prox}_f$ is the soft-thresholding operator
  $\operatorname{prox}_f(x)_i = x_i - 1$ if $x_i \ge 1$; $0$ if $|x_i| \le 1$; $x_i + 1$ if $x_i \le -1$.
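Both examples above fit in a few lines. A sketch, written for the scaled map $\operatorname{prox}_{tf}$ (threshold $t$; the notes' example is $t = 1$) and with a box as the illustrative convex set:

```python
import numpy as np

def prox_l1(x, t=1.0):
    """Prox of t*||.||_1: componentwise soft-thresholding with threshold t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proj_box(x, lo=-1.0, hi=1.0):
    """Prox of the indicator of the box [lo, hi]^n, i.e. projection onto it."""
    return np.clip(x, lo, hi)

x = np.array([-2.5, -0.3, 0.0, 0.7, 3.0])
print(prox_l1(x))    # large entries shrink toward 0 by 1, small ones become 0
print(proj_box(x))   # entries clipped into [-1, 1]
```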

Properties. Let $f$ be convex. Then $z = \operatorname{prox}_f(x) = \operatorname{argmin}_u \left( f(u) + \frac{1}{2}\|u - x\|^2 \right)$ if and only if
  $0 \in \partial f(z) + z - x$, i.e. $x - z \in \partial f(z)$.
Sometimes we express this as
  $\operatorname{prox}_f(x) = z = (I + \partial f)^{-1}(x)$.

Co-coercivity:
  $\langle \operatorname{prox}_f(x) - \operatorname{prox}_f(y), x - y \rangle \ge \|\operatorname{prox}_f(x) - \operatorname{prox}_f(y)\|^2$.
Let $x^+ := \operatorname{prox}_f(x)$; then $x - x^+ \in \partial f(x^+)$. Similarly, $y^+ := \operatorname{prox}_f(y)$ satisfies $y - y^+ \in \partial f(y^+)$. From monotonicity of $\partial f$, we get
  $\langle u - v, x^+ - y^+ \rangle \ge 0$ for any $u \in \partial f(x^+)$, $v \in \partial f(y^+)$.
Taking $u = x - x^+$ and $v = y - y^+$, we obtain co-coercivity.

The co-coercivity of $\operatorname{prox}_f$ implies that $\operatorname{prox}_f$ is Lipschitz continuous:
  $\|\operatorname{prox}_f(x) - \operatorname{prox}_f(y)\|^2 \le \langle \operatorname{prox}_f(x) - \operatorname{prox}_f(y), x - y \rangle$
implies $\|\operatorname{prox}_f(x) - \operatorname{prox}_f(y)\| \le \|x - y\|$.

1.7 Conjugate of a convex function

For a function $f : \mathbb{R}^N \to (-\infty, \infty]$, we define its conjugate $f^*$ by
  $f^*(y) = \sup_x \left( \langle x, y \rangle - f(x) \right)$.

Examples.
1. $f(x) = \langle a, x \rangle - b$: $f^*(y) = \sup_x \left( \langle x, y \rangle - \langle a, x \rangle + b \right) = b$ if $y = a$, $\infty$ otherwise.
2. $f(x) = ax$ for $x < 0$ and $f(x) = bx$ for $x > 0$, with $a < 0 < b$:
   $f^*(y) = 0$ if $a \le y \le b$, $\infty$ otherwise.

3. $f(x) = \frac{1}{2}\langle x, Ax \rangle + \langle b, x \rangle + c$, where $A$ is symmetric and non-singular:
   $f^*(y) = \frac{1}{2}\langle y - b, A^{-1}(y - b) \rangle - c$.
   In general, if $A \succeq 0$, then
   $f^*(y) = \frac{1}{2}\langle y - b, A^\dagger (y - b) \rangle - c$,
   where $A^\dagger$ is the pseudo-inverse of $A$, and $\operatorname{dom} f^* = \operatorname{range} A + b$.
4. $f(x) = \frac{1}{p}\|x\|^p$, $p \ge 1$: $f^*(u) = \frac{1}{p^*}\|u\|^{p^*}$, where $1/p + 1/p^* = 1$.
5. $f(x) = e^x$: $f^*(y) = \sup_x (xy - e^x) = y \ln y - y$ if $y > 0$; $0$ if $y = 0$; $\infty$ if $y < 0$.
6. $C = \{x : \langle Ax, x \rangle \le 1\}$, where $A$ is a symmetric positive definite matrix:
   $\iota_C^*(u) = \sqrt{\langle A^{-1} u, u \rangle}$.

Properties.
- $f^*$ is convex and l.s.c. Note that $f^*$ is a supremum of linear (affine) functions of $y$. We have seen that a supremum of a family of closed functions is closed, and a supremum of a family of convex functions is convex.
- Fenchel's inequality:
  $f(x) + f^*(y) \ge \langle x, y \rangle$.
  This follows directly from the definition of $f^*$:
  $f^*(y) = \sup_{x'} \left( \langle x', y \rangle - f(x') \right) \ge \langle x, y \rangle - f(x)$.
  It can be viewed as an extension of the Cauchy inequality $\frac{1}{2}\|x\|^2 + \frac{1}{2}\|y\|^2 \ge \langle x, y \rangle$.

Proposition. (1) $f^{**}$ is closed and convex. (2) $f^{**} \le f$. (3) $f^{**} = f$ if and only if $f$ is closed and convex.

Proof.
1. From Fenchel's inequality, $\langle x, y \rangle - f^*(y) \le f(x)$; taking the sup in $y$ gives $f^{**}(x) \le f(x)$.
2. $f^{**} = f$ if and only if $\operatorname{epi} f^{**} = \operatorname{epi} f$. We have seen $f^{**} \le f$, which gives $\operatorname{epi} f \subseteq \operatorname{epi} f^{**}$. Suppose $f$ is closed and convex and suppose $(x, f^{**}(x)) \notin \operatorname{epi} f$, i.e. $f^{**}(x) < f(x)$. Then there is a strictly separating hyperplane $\{(z, s) : \langle a, z - x \rangle + b(s - f^{**}(x)) = 0\}$ such that
   $\langle a, z - x \rangle + b \left( s - f^{**}(x) \right) \le c < 0$ for all $(z, s) \in \operatorname{epi} f$,
   with $b \le 0$.

3. If $b < 0$, we may normalize so that $(a, b) = (a, -1)$. Then we have
   $\langle a, z - x \rangle - s + f^{**}(x) \le c < 0$ for all $(z, s) \in \operatorname{epi} f$.
   Taking the supremum over $(z, s) \in \operatorname{epi} f$,
   $\sup_{(z,s) \in \operatorname{epi} f} \left( \langle a, z \rangle - s \right) = \sup_z \left( \langle a, z \rangle - f(z) \right) = f^*(a)$.
   Thus we get
   $f^*(a) - \langle a, x \rangle + f^{**}(x) \le c < 0$,
   which contradicts Fenchel's inequality $f^*(a) + f^{**}(x) \ge \langle a, x \rangle$.
4. If $b = 0$, choose $\hat{y} \in \operatorname{dom} f^*$ and add $\epsilon(\hat{y}, -1)$ to $(a, b)$: for small $\epsilon > 0$,
   $\langle a + \epsilon \hat{y}, z - x \rangle - \epsilon \left( s - f^{**}(x) \right) \le c' < 0$ for all $(z, s) \in \operatorname{epi} f$.
   Now we apply the argument for $b < 0$ and get a contradiction.
5. If $f^{**} = f$, then $f$ is closed and convex, because $f^{**}$ is closed and convex no matter what $f$ is.

Proposition. If $f$ is closed and convex, then
  $y \in \partial f(x) \iff x \in \partial f^*(y) \iff \langle x, y \rangle = f(x) + f^*(y)$.

Proof.
1. $y \in \partial f(x)$ means $f(z) \ge f(x) + \langle y, z - x \rangle$ for all $z$, i.e.
   $\langle y, x \rangle - f(x) \ge \langle y, z \rangle - f(z)$ for all $z$.
   Taking the supremum over $z$, this says exactly $\langle y, x \rangle - f(x) = \sup_z \left( \langle y, z \rangle - f(z) \right) = f^*(y)$, i.e. $\langle x, y \rangle = f(x) + f^*(y)$.
2. For the equivalence with $x \in \partial f^*(y)$, we use $f^{**} = f$ and apply the previous argument with $f$ replaced by $f^*$.

1.8 Method of Lagrange multipliers for constrained optimization problems

A standard convex optimization problem can be formulated as
  $\inf_x f_0(x)$ subject to $f_i(x) \le 0$, $i = 1, \ldots, m$, and $h_i(x) = 0$, $i = 1, \ldots, p$.
We assume the domain $D := \bigcap_i \operatorname{dom} f_i \cap \bigcap_i \operatorname{dom} h_i$ is a closed convex set in $\mathbb{R}^n$. A point $x \in D$ satisfying the constraints is called a feasible point. We assume a feasible point exists and denote by $p^*$ the optimal value.

The method of Lagrange multipliers introduces augmented variables $\lambda, \mu$ and a Lagrangian, so that the problem is transformed into an unconstrained optimization problem. Let us define the Lagrangian
  $L(x, \lambda, \mu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \mu_i h_i(x)$.
Here $\lambda$ and $\mu$ are the augmented variables, called the Lagrange multipliers or the dual variables.

Primal problem. From this Lagrangian, we notice that
  $\sup_{\lambda \ge 0} \sum_{i=1}^m \lambda_i f_i(x) = \iota_{C_f}(x)$, where $C_f = \bigcap_i \{x : f_i(x) \le 0\}$,
and
  $\sup_{\mu} \sum_{i=1}^p \mu_i h_i(x) = \iota_{C_h}(x)$, where $C_h = \bigcap_i \{x : h_i(x) = 0\}$.
Hence
  $\sup_{\lambda \ge 0, \mu} L(x, \lambda, \mu) = f_0(x) + \iota_{C_f}(x) + \iota_{C_h}(x)$.
Thus the original optimization problem can be written as
  $p^* = \inf_{x \in D} \left( f_0(x) + \iota_{C_f}(x) + \iota_{C_h}(x) \right) = \inf_{x \in D} \sup_{\lambda \ge 0, \mu} L(x, \lambda, \mu)$.
This problem is called the primal problem.

Dual problem. From this Lagrangian, we define the dual function
  $g(\lambda, \mu) := \inf_{x \in D} L(x, \lambda, \mu)$.
This is an infimum of a family of concave closed functions in $\lambda$ and $\mu$, thus $g(\lambda, \mu)$ is a concave closed function. The dual problem is
  $d^* = \sup_{\lambda \ge 0, \mu} g(\lambda, \mu)$,
which is the same as
  $\sup_{\lambda, \mu} g(\lambda, \mu)$ subject to $\lambda \ge 0$.
We refer to $(\lambda, \mu) \in \operatorname{dom} g$ with $\lambda \ge 0$ as dual feasible variables.

The primal problem and dual problem are connected by the following duality property.

Weak duality property.

Proposition 2. For any $\lambda \ge 0$ and any $\mu$, we have $g(\lambda, \mu) \le p^*$. In other words, $d^* \le p^*$.
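Before the proof, weak duality can be watched on a toy problem (a hypothetical instance chosen for illustration, not taken from the notes): minimize $x^2$ subject to $1 - x \le 0$, so $p^* = 1$ at $x^* = 1$. Minimizing the Lagrangian $L(x, \lambda) = x^2 + \lambda(1 - x)$ over $x$ (at $x = \lambda/2$) gives the dual function in closed form.

```python
# Dual function of: minimize x**2 subject to 1 - x <= 0   (p* = 1 at x* = 1).
# g(lam) = inf_x (x**2 + lam*(1 - x)) = lam - lam**2 / 4, attained at x = lam/2.
def g(lam):
    return lam - lam ** 2 / 4.0

p_star = 1.0
# Weak duality: g(lam) <= p* for every lam >= 0 ...
assert all(g(k / 10.0) <= p_star + 1e-12 for k in range(0, 100))
# ... and here the Slater condition holds (x = 2 is strictly feasible),
# so the gap closes: d* = g(2) = 1 = p*.
print(g(2.0))  # 1.0
```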

Proof. Suppose $x$ is a feasible point (i.e. $x \in D$, $f_i(x) \le 0$ and $h_i(x) = 0$). Then for any $\lambda_i \ge 0$ and any $\mu_i$, we have
  $\sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \mu_i h_i(x) \le 0$.
This leads to
  $L(x, \lambda, \mu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \mu_i h_i(x) \le f_0(x)$.
Hence
  $g(\lambda, \mu) := \inf_{z \in D} L(z, \lambda, \mu) \le f_0(x)$ for all feasible $x \in D$.
Hence, for all feasible pairs $(\lambda, \mu)$, $g(\lambda, \mu) \le p^*$. This is called the weak duality property. Thus, weak duality can also be read as
  $\sup_{\lambda \ge 0, \mu} \inf_{x \in D} L(x, \lambda, \mu) \le \inf_{x \in D} \sup_{\lambda \ge 0, \mu} L(x, \lambda, \mu)$.

Definition 1.2. (a) A point $x^*$ is called primal optimal if it minimizes $\sup_{\lambda \ge 0, \mu} L(x, \lambda, \mu)$. (b) A dual pair $(\lambda^*, \mu^*)$ with $\lambda^* \ge 0$ is said to be dual optimal if it maximizes $\inf_{x \in D} L(x, \lambda, \mu)$.

Strong duality.

Definition 1.3. When $d^* = p^*$, we say that strong duality holds.

A sufficient condition for strong duality is the Slater condition: there exists a feasible $x$ in the relative interior of $D$ with
  $f_i(x) < 0$, $i = 1, \ldots, m$, and $h_i(x) = 0$, $i = 1, \ldots, p$.
Such a point is called a strictly feasible point.

Theorem 1.1. Suppose $f_0, \ldots, f_m$ are convex, $h(x) = Ax - b$, and assume the Slater condition holds: there exists $x \in D$ with $Ax - b = 0$ and $f_i(x) < 0$ for all $i = 1, \ldots, m$. Then strong duality holds:
  $\sup_{\lambda \ge 0, \mu} \inf_{x \in D} L(x, \lambda, \mu) = \inf_{x \in D} \sup_{\lambda \ge 0, \mu} L(x, \lambda, \mu)$.

Proof. See pp. , Boyd's Convex Optimization.

Complementary slackness. Suppose there exist $x^*$, $\lambda^* \ge 0$ and $\mu^*$ such that $x^*$ is the optimal primal point, $(\lambda^*, \mu^*)$ is the optimal dual point, and the duality gap $p^* - d^* = 0$. In this case,
  $f_0(x^*) = g(\lambda^*, \mu^*) := \inf_x \left( f_0(x) + \sum_{i=1}^m \lambda_i^* f_i(x) + \sum_{i=1}^p \mu_i^* h_i(x) \right)$
  $\le f_0(x^*) + \sum_{i=1}^m \lambda_i^* f_i(x^*) + \sum_{i=1}^p \mu_i^* h_i(x^*)$
  $\le f_0(x^*)$.

The last line follows from
  $\sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \mu_i h_i(x) \le 0$
for any feasible pair $(x, \lambda, \mu)$. The chain of inequalities forces
  $\sum_{i=1}^m \lambda_i^* f_i(x^*) + \sum_{i=1}^p \mu_i^* h_i(x^*) = 0$.
Since $h_i(x^*) = 0$ for $i = 1, \ldots, p$, $\lambda_i^* \ge 0$ and $f_i(x^*) \le 0$, we then get
  $\lambda_i^* f_i(x^*) = 0$ for all $i = 1, \ldots, m$.
This is called complementary slackness. It holds for any optimal solutions $(x^*, \lambda^*, \mu^*)$.

KKT condition.

Proposition. When $f_0$, $f_i$ and $h_i$ are differentiable, the optimal point $x^*$ of the primal problem and $(\lambda^*, \mu^*)$ of the dual problem satisfy the Karush-Kuhn-Tucker (KKT) conditions:
  $f_i(x^*) \le 0$, $i = 1, \ldots, m$,
  $h_i(x^*) = 0$, $i = 1, \ldots, p$,
  $\lambda_i^* \ge 0$, $i = 1, \ldots, m$,
  $\lambda_i^* f_i(x^*) = 0$, $i = 1, \ldots, m$,
  $\nabla f_0(x^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^p \mu_i^* \nabla h_i(x^*) = 0$.

Remark. If $f_0, f_i$, $i = 1, \ldots, m$, are closed and convex but may not be differentiable, then the last KKT condition is replaced by
  $0 \in \partial f_0(x^*) + \sum_{i=1}^m \lambda_i^* \partial f_i(x^*) + \sum_{i=1}^p \mu_i^* \partial h_i(x^*)$.
We say the triple $(x^*, \lambda^*, \mu^*)$ satisfies the optimality condition.

Theorem 1.2. Suppose $f_0, f_i$ are closed and convex and the $h_i$ are affine. Then the KKT conditions are also sufficient for optimality. That is, if $(\hat{x}, \hat{\lambda}, \hat{\mu})$ satisfies the KKT conditions, then $\hat{x}$ is primal optimal, $(\hat{\lambda}, \hat{\mu})$ is dual optimal, and there is zero duality gap.

Proof.
1. From $f_i(\hat{x}) \le 0$ and $h(\hat{x}) = 0$, we get that $\hat{x}$ is feasible.
2. From $\hat{\lambda}_i \ge 0$, the $f_i$ being convex and the $h_i$ being linear, we get that
   $L(x, \hat{\lambda}, \hat{\mu}) = f_0(x) + \sum_i \hat{\lambda}_i f_i(x) + \sum_i \hat{\mu}_i h_i(x)$
   is also convex in $x$.

3. The last KKT condition states that $\hat{x}$ minimizes $L(x, \hat{\lambda}, \hat{\mu})$. Thus
   $g(\hat{\lambda}, \hat{\mu}) = L(\hat{x}, \hat{\lambda}, \hat{\mu}) = f_0(\hat{x}) + \sum_{i=1}^m \hat{\lambda}_i f_i(\hat{x}) + \sum_{i=1}^p \hat{\mu}_i h_i(\hat{x}) = f_0(\hat{x})$.
   This shows that $\hat{x}$ and $(\hat{\lambda}, \hat{\mu})$ have zero duality gap and therefore are primal optimal and dual optimal, respectively.

2 Optimization algorithms

2.1 Gradient methods

Assumptions:
- $f \in C^1(\mathbb{R}^N)$ and convex;
- $\nabla f$ is Lipschitz continuous with parameter $L$;
- the optimal value $f^* = \inf_x f(x)$ is finite and attained at $x^*$.

Gradient method (forward method):
  $x^k = x^{k-1} - t_k \nabla f(x^{k-1})$.
- Fixed step size: $t_k$ is constant.
- Backtracking line search: choose $0 < \beta < 1$, initialize $t_k = 1$; take $t_k := \beta t_k$ until
  $f(x - t_k \nabla f(x)) < f(x) - \frac{t_k}{2} \|\nabla f(x)\|^2$.
- Optimal line search: $t_k = \operatorname{argmin}_t f(x - t \nabla f(x))$.

Backward method:
  $x^k = x^{k-1} - t_k \nabla f(x^k)$.

Analysis for the fixed-step-size case.

Proposition. Suppose $f \in C^1$ is convex and $\nabla f$ is Lipschitz with constant $L$. If the step size $t$ satisfies $t \le 1/L$, then the fixed-step-size gradient descent method satisfies
  $f(x^k) - f(x^*) \le \frac{1}{2kt} \|x^0 - x^*\|^2$.

Remarks.
- The sequence $\{x^k\}$ is bounded. Thus it has a subsequence converging to some $\bar{x}$, which is an optimal solution.
- If in addition $f$ is strongly convex, then the sequence $\{x^k\}$ converges to the unique optimal solution linearly.

Proof.
1. Let $x^+ := x - t \nabla f(x)$.
2. From the quadratic upper bound, choosing $y = x^+$ and $t \le 1/L$, we get
   $f(x^+) \le f(x) + \langle \nabla f(x), x^+ - x \rangle + \frac{L}{2}\|x^+ - x\|^2 = f(x) - \left( t - \frac{L t^2}{2} \right) \|\nabla f(x)\|^2 \le f(x) - \frac{t}{2} \|\nabla f(x)\|^2$.
3. From $f(x^*) \ge f(x) + \langle \nabla f(x), x^* - x \rangle$, we get
   $f(x^+) - f(x^*) \le \langle \nabla f(x), x - x^* \rangle - \frac{t}{2} \|\nabla f(x)\|^2 = \frac{1}{2t} \left( \|x - x^*\|^2 - \|x - x^* - t \nabla f(x)\|^2 \right) = \frac{1}{2t} \left( \|x - x^*\|^2 - \|x^+ - x^*\|^2 \right)$.
4. Taking $x = x^{i-1}$, $x^+ = x^i$ and summing these inequalities over $i = 1, \ldots, k$, we get
   $\sum_{i=1}^k \left( f(x^i) - f(x^*) \right) \le \frac{1}{2t} \left( \|x^0 - x^*\|^2 - \|x^k - x^*\|^2 \right) \le \frac{1}{2t} \|x^0 - x^*\|^2$.
   Since $f(x^i) - f^*$ is a decreasing sequence, we then get
   $f(x^k) - f^* \le \frac{1}{k} \sum_{i=1}^k \left( f(x^i) - f^* \right) \le \frac{1}{2kt} \|x^0 - x^*\|^2$.

Proposition. Suppose $f \in C^1$ is convex. The fixed-step-size backward gradient method satisfies
  $f(x^k) - f(x^*) \le \frac{1}{2kt} \|x^0 - x^*\|^2$.
Here, no assumption on Lipschitz continuity of $\nabla f$ is needed.
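The $\frac{1}{2kt}\|x^0 - x^*\|^2$ bound just proved can be watched numerically. A minimal sketch on a quadratic $f(x) = \frac{1}{2} x^\top A x$ (matrix, seed, and iteration count are arbitrary illustrative choices; here $x^* = 0$ and $f^* = 0$):

```python
import numpy as np

# Fixed-step gradient descent on f(x) = 0.5 x^T A x with t = 1/L, checking
#   f(x_k) - f* <= ||x0 - x*||^2 / (2 k t)   at every iteration.
rng = np.random.default_rng(1)
B = rng.standard_normal((8, 8))
A = B @ B.T + np.eye(8)                  # symmetric positive definite
L = np.linalg.eigvalsh(A).max()          # Lipschitz constant of grad f = A x

f = lambda x: 0.5 * x @ A @ x
t = 1.0 / L
x0 = rng.standard_normal(8)
x = x0.copy()
for k in range(1, 51):
    x = x - t * (A @ x)                  # x_k = x_{k-1} - t * grad f(x_{k-1})
    assert f(x) <= (x0 @ x0) / (2 * k * t) + 1e-12
print("bound holds for 50 iterations")
```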

Proof.
1. Define $x^+ = x - t \nabla f(x^+)$.
2. For any $z$, convexity gives
   $f(z) \ge f(x^+) + \langle \nabla f(x^+), z - x^+ \rangle = f(x^+) + \langle \nabla f(x^+), z - x \rangle + t \|\nabla f(x^+)\|^2$.
3. Take $z = x$: we get $f(x^+) \le f(x) - t \|\nabla f(x^+)\|^2$. Thus $f(x^+) < f(x)$ unless $\nabla f(x^+) = 0$.
4. Take $z = x^*$: we obtain
   $f(x^+) \le f(x^*) + \langle \nabla f(x^+), x - x^* \rangle - t \|\nabla f(x^+)\|^2$
   $\le f(x^*) + \langle \nabla f(x^+), x - x^* \rangle - \frac{t}{2} \|\nabla f(x^+)\|^2$
   $= f(x^*) + \frac{1}{2t} \left( \|x - x^*\|^2 - \|x - x^* - t \nabla f(x^+)\|^2 \right) = f(x^*) + \frac{1}{2t} \left( \|x - x^*\|^2 - \|x^+ - x^*\|^2 \right)$,
   and summing over the iterations as before gives the stated bound.

Proposition. Suppose $f$ is strongly convex with parameter $m$ and $\nabla f$ is Lipschitz continuous with parameter $L$. Suppose the minimum of $f$ is attained at $x^*$. Then the gradient method converges linearly, namely
  $\|x^k - x^*\|^2 \le c^k \|x^0 - x^*\|^2$ and $f(x^k) - f(x^*) \le \frac{c^k L}{2} \|x^0 - x^*\|^2$,
where
  $c = 1 - t \frac{2mL}{m+L} < 1$, if the step size satisfies $t \le \frac{2}{m+L}$.

Proof.
1. For $0 < t \le 2/(m+L)$:
   $\|x^+ - x^*\|^2 = \|x - t \nabla f(x) - x^*\|^2 = \|x - x^*\|^2 - 2t \langle \nabla f(x), x - x^* \rangle + t^2 \|\nabla f(x)\|^2$
   $\le \left( 1 - t \frac{2mL}{m+L} \right) \|x - x^*\|^2 + t \left( t - \frac{2}{m+L} \right) \|\nabla f(x)\|^2$
   $\le \left( 1 - t \frac{2mL}{m+L} \right) \|x - x^*\|^2 = c \|x - x^*\|^2$,
   where we used the stronger co-coercivity condition (Proposition 1.8) with $\nabla f(x^*) = 0$. The step size $t$ is chosen so that $c < 1$; thus the sequence $x^k$ converges linearly with rate $c$.
2. From the quadratic upper bound,
   $f(x^k) - f(x^*) \le \frac{L}{2} \|x^k - x^*\|^2 \le \frac{c^k L}{2} \|x^0 - x^*\|^2$,
   so $f(x^k) - f(x^*)$ also converges to 0 at a linear rate.

2.2 Subgradient method

Assumptions:
- $f$ is closed and convex;
- the optimal value $f^* = \inf_x f(x)$ is finite and attained at $x^*$.

Subgradient method:
  $x^k = x^{k-1} - t_k v^{k-1}$, $v^{k-1} \in \partial f(x^{k-1})$,
where $t_k$ is chosen so that $f(x^k) < f(x^{k-1})$. This is a forward (sub)gradient method. It may not converge; when it converges, the optimal rate is $f(x^k) - f(x^*) = O(1/\sqrt{k})$, which is very slow.

2.3 Proximal point method

Assumptions:
- $f$ is closed and convex;
- the optimal value $f^* = \inf_x f(x)$ is finite and attained at $x^*$.

Proximal point method:
  $x^k = \operatorname{prox}_{tf}(x^{k-1}) = x^{k-1} - t G_t(x^{k-1})$,
where, writing $x^+ := \operatorname{prox}_{tf}(x) =: x - t G_t(x)$, the definition
  $\operatorname{prox}_{tf}(x) := \operatorname{argmin}_z \left( t f(z) + \frac{1}{2}\|z - x\|^2 \right)$
gives
  $G_t(x) \in \partial f(x^+)$.
Thus we may view the proximal point method as a backward subgradient method.

Proposition. Suppose $f$ is closed and convex and an optimal solution of $\min f$ is attainable. Then the proximal point method $x^k = \operatorname{prox}_{tf}(x^{k-1})$ with $t > 0$ satisfies
  $f(x^k) - f(x^*) \le \frac{1}{2kt} \|x^0 - x^*\|^2$.
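Before the proof, the method is easy to run in one dimension. A sketch for $f(x) = |x|$ (an illustrative choice), whose proximal map $\operatorname{prox}_{t|\cdot|}$ is soft-thresholding; starting from any $x^0$, the iterates reach the minimizer $x^* = 0$ after about $|x^0|/t$ steps and then stay there.

```python
# Proximal point iteration x_k = prox_{t f}(x_{k-1}) for f(x) = |x| in 1-D.
def prox_abs(x, t):
    """prox of t*|.| : soft-thresholding with threshold t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

x, t = 5.3, 1.0
traj = [x]
for _ in range(10):
    x = prox_abs(x, t)
    traj.append(x)
print(traj[-1])  # 0.0
```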

Convergence proof:
1. Given $x$, let $x^+ := \operatorname{prox}_{tf}(x)$ and $G_t(x) := (x - x^+)/t$. Then $G_t(x) \in \partial f(x^+)$, and for any $z$,
   $f(z) \ge f(x^+) + \langle G_t(x), z - x^+ \rangle = f(x^+) + \langle G_t(x), z - x \rangle + t \|G_t(x)\|^2$.
2. Take $z = x$: we get $f(x^+) \le f(x) - t \|G_t(x)\|^2$. Thus $f(x^+) < f(x)$ unless $G_t(x) = 0$.
3. Take $z = x^*$: we obtain
   $f(x^+) \le f(x^*) + \langle G_t(x), x - x^* \rangle - t \|G_t(x)\|^2$
   $\le f(x^*) + \langle G_t(x), x - x^* \rangle - \frac{t}{2} \|G_t(x)\|^2$
   $= f(x^*) + \frac{1}{2t} \left( \|x - x^*\|^2 - \|x - x^* - t G_t(x)\|^2 \right) = f(x^*) + \frac{1}{2t} \left( \|x - x^*\|^2 - \|x^+ - x^*\|^2 \right)$.
4. Taking $x = x^{i-1}$, $x^+ = x^i$ and summing over $i = 1, \ldots, k$, we get
   $\sum_{i=1}^k \left( f(x^i) - f(x^*) \right) \le \frac{1}{2t} \left( \|x^0 - x^*\|^2 - \|x^k - x^*\|^2 \right)$.
   Since $f(x^k)$ is non-increasing, we get
   $k \left( f(x^k) - f(x^*) \right) \le \sum_{i=1}^k \left( f(x^i) - f(x^*) \right) \le \frac{1}{2t} \|x^0 - x^*\|^2$.

2.4 Accelerated proximal point method

The proximal point method is a first-order method. With a small modification, it can be accelerated to an $O(1/k^2)$ convergence rate. This is the work of Nesterov in the 1980s.

2.5 Fixed point method

The proximal point method can be viewed as a fixed-point iteration of the proximal map
  $F(x) := \operatorname{prox}_f(x)$.
Let $G(x) = x - F(x)$, so $x^+ = x - G(x)$. Both $F$ and $G$ are firmly non-expansive, i.e.
  $\langle F(x) - F(y), x - y \rangle \ge \|F(x) - F(y)\|^2$,
  $\langle G(x) - G(y), x - y \rangle \ge \|G(x) - G(y)\|^2$.

Proof.
(1) Let $x^+ = \operatorname{prox}_f(x) = F(x)$ and $y^+ = \operatorname{prox}_f(y) = F(y)$. Then $G(x) = x - x^+ \in \partial f(x^+)$. From the monotonicity of $\partial f$, we have
  $\langle G(x) - G(y), x^+ - y^+ \rangle \ge 0$.
This gives
  $\langle (x - x^+) - (y - y^+), x^+ - y^+ \rangle \ge 0$,
that is,
  $\langle F(x) - F(y), x - y \rangle \ge \|F(x) - F(y)\|^2$.
(2) From $G = I - F$, we have
  $\langle G(x) - G(y), x - y \rangle = \langle G(x) - G(y), (F(x) + G(x)) - (F(y) + G(y)) \rangle$
  $= \|G(x) - G(y)\|^2 + \langle G(x) - G(y), F(x) - F(y) \rangle$
  $= \|G(x) - G(y)\|^2 + \langle x - F(x) - y + F(y), F(x) - F(y) \rangle$
  $= \|G(x) - G(y)\|^2 + \langle x - y, F(x) - F(y) \rangle - \|F(x) - F(y)\|^2$
  $\ge \|G(x) - G(y)\|^2$.

Theorem 2.3. Assume $F$ is firmly non-expansive. Let
  $x^k = (1 - t_k) x^{k-1} + t_k F(x^{k-1})$, with $x^0$ arbitrary,
and suppose a fixed point of $F$ exists and
  $t_k \in [t_{\min}, t_{\max}]$, $0 < t_{\min} \le t_{\max} < 2$.
Then $x^k$ converges to a fixed point of $F$.

Proof.
1. Let us define $G = I - F$. We have seen that $G$ is also firmly non-expansive, and
   $x^k = x^{k-1} - t_k G(x^{k-1})$.
2. Suppose $x^*$ is a fixed point of $F$, or equivalently $G(x^*) = 0$. From the firm non-expansiveness of $G$, we get (with $x = x^{k-1}$, $x^+ = x^k$, $t = t_k$)
   $\|x^+ - x^*\|^2 = \|x - x^* - t G(x)\|^2 = \|x - x^*\|^2 - 2t \langle G(x) - G(x^*), x - x^* \rangle + t^2 \|G(x)\|^2$
   $\le \|x - x^*\|^2 - 2t \|G(x) - G(x^*)\|^2 + t^2 \|G(x)\|^2$
   $= \|x - x^*\|^2 - t(2 - t) \|G(x)\|^2 \le \|x - x^*\|^2 - M \|G(x)\|^2$,
   where $M = t_{\min}(2 - t_{\max}) > 0$.
3. Let us sum this inequality over $k$:
   $M \sum_{l=0}^{\infty} \|G(x^l)\|^2 \le \|x^0 - x^*\|^2$.
   This implies $G(x^k) \to 0$ as $k \to \infty$; moreover $\|x^k - x^*\|$ is non-increasing, hence $x^k$ is bounded and $\|x^k - x^*\| \to C$ as $k \to \infty$.

4. Since the sequence $\{x^k\}$ is bounded, any convergent subsequence, say $y^k$, converges to some $\bar{y}$ satisfying
   $G(\bar{y}) = \lim_{k \to \infty} G(y^k) = 0$,
   by the continuity of $G$. Thus any cluster point $\bar{y}$ of $\{x^k\}$ satisfies $G(\bar{y}) = 0$. Hence, by the previous argument with $x^*$ replaced by $\bar{y}$, the sequence $\|x^k - \bar{y}\|$ is also non-increasing and has a limit.
5. We claim there is only one limit point of $\{x^k\}$. Suppose $\bar{y}_1$ and $\bar{y}_2$ are two cluster points. Then both sequences $\|x^k - \bar{y}_1\|$ and $\|x^k - \bar{y}_2\|$ are non-increasing and have limits. Since the $\bar{y}_i$ are cluster points, there exist subsequences $\{k_i^1\}$ and $\{k_i^2\}$ such that $x^{k_i^1} \to \bar{y}_1$ and $x^{k_i^2} \to \bar{y}_2$ as $i \to \infty$. We can choose subsequences again so that
   $k_{i-1}^2 < k_i^1 < k_i^2 < k_{i+1}^1$ for all $i$.
   With this and the non-increase of $\|x^k - \bar{y}_1\|$, we get
   $\|x^{k_{i+1}^1} - \bar{y}_1\| \le \|x^{k_i^2} - \bar{y}_1\| \le \|x^{k_i^1} - \bar{y}_1\| \to 0$ as $i \to \infty$.
   On the other hand, $x^{k_i^2} \to \bar{y}_2$. Therefore $\bar{y}_1 = \bar{y}_2$. This shows there is only one limit point, say $x^*$, and $x^k \to x^*$.

2.6 Proximal gradient method

This method is to minimize $h(x) := f(x) + g(x)$.

Assumptions:
- $g \in C^1$ is convex, with $\nabla g$ Lipschitz continuous with parameter $L$;
- $f$ is closed and convex.

Proximal gradient method:
  $x^k = \operatorname{prox}_{tf}\left( x^{k-1} - t \nabla g(x^{k-1}) \right)$.
We can express $\operatorname{prox}_{tf}$ as $(I + t\,\partial f)^{-1}$. Therefore the proximal gradient method can be expressed as
  $x^k = (I + t\,\partial f)^{-1} (I - t \nabla g)(x^{k-1})$.
Thus the proximal gradient method is also called the forward-backward method.

Theorem 2.4. The forward-backward method converges provided $Lt \le 1$.

Proof.
1. Given a point $x$, define
   $y = x - t \nabla g(x)$, $x^+ = \operatorname{prox}_{tf}(y)$.
   Then
   $\frac{x - y}{t} = \nabla g(x)$ and $\frac{y - x^+}{t} \in \partial f(x^+)$.
   Combining these two, we define the gradient map $G_t(x) := \frac{x - x^+}{t}$. Then $G_t(x) - \nabla g(x) \in \partial f(x^+)$.

2. From the quadratic upper bound of $g$, we have
   $g(x^+) \le g(x) + \langle \nabla g(x), x^+ - x \rangle + \frac{L}{2}\|x^+ - x\|^2 = g(x) + \langle \nabla g(x), x^+ - x \rangle + \frac{L t^2}{2} \|G_t(x)\|^2$
   $\le g(x) + \langle \nabla g(x), x^+ - x \rangle + \frac{t}{2} \|G_t(x)\|^2$,
   where the last inequality holds provided $Lt \le 1$. Combining this with $g(z) \ge g(x) + \langle \nabla g(x), z - x \rangle$, we get
   $g(x^+) \le g(z) + \langle \nabla g(x), x^+ - z \rangle + \frac{t}{2} \|G_t(x)\|^2$.
3. From the first-order condition at $x^+$ for $f$,
   $f(z) \ge f(x^+) + \langle p, z - x^+ \rangle$ for all $p \in \partial f(x^+)$.
   Choosing $p = G_t(x) - \nabla g(x)$, we get
   $f(x^+) \le f(z) + \langle G_t(x) - \nabla g(x), x^+ - z \rangle$.
   Adding the above two inequalities, we get
   $h(x^+) \le h(z) + \langle G_t(x), x^+ - z \rangle + \frac{t}{2} \|G_t(x)\|^2$.
4. Taking $z = x$, we get
   $h(x^+) \le h(x) - \frac{t}{2} \|G_t(x)\|^2$.
   Taking $z = x^*$, we get
   $h(x^+) \le h(x^*) + \langle G_t(x), x^+ - x^* \rangle + \frac{t}{2} \|G_t(x)\|^2 = h(x^*) + \frac{1}{2t} \left( \|x - x^*\|^2 - \|x^+ - x^*\|^2 \right)$,
   and summing over the iterations gives convergence as before.

2.7 Augmented Lagrangian method

Problem:
  $\min_x F_P(x) := f(x) + g(Ax)$,
equivalent to the primal problem with constraint
  $\min_{x,y} f(x) + g(y)$ subject to $Ax = y$.

Assumptions: $f$ and $g$ are closed and convex.

Examples:
- $g(y) = \iota_{\{b\}}(y)$, i.e. $g(y) = 0$ if $y = b$ and $\infty$ otherwise, which encodes the constraint $Ax = b$; the corresponding conjugate is $g^*(z) = \langle z, b \rangle$;
- $g(y) = \iota_C(y)$;
- $g(y) = \|y - b\|^2$.

The Lagrangian is
  $L(x, y, z) := f(x) + g(y) + \langle z, Ax - y \rangle$.
The primal function is
  $F_P(x, y) = \sup_z L(x, y, z)$,
and the primal problem is
  $\inf_{x,y} F_P(x, y) = \inf_{x,y} \sup_z L(x, y, z)$.
The dual problem is
  $\sup_z \inf_{x,y} L(x, y, z)$.
We compute
  $\inf_{x,y} L(x, y, z) = \inf_x \left( f(x) + \langle z, Ax \rangle \right) + \inf_y \left( g(y) - \langle z, y \rangle \right)$
  $= -\sup_x \left( \langle -A^\top z, x \rangle - f(x) \right) - \sup_y \left( \langle z, y \rangle - g(y) \right)$
  $= -f^*(-A^\top z) - g^*(z)$.
Thus the dual function $F_D$ is defined as
  $F_D(z) := \inf_{x,y} L(x, y, z) = -\left( f^*(-A^\top z) + g^*(z) \right)$,
and the dual problem is
  $\sup_z F_D(z)$.

We shall solve this dual problem by the proximal point method:
  $z^k = \operatorname{prox}_{t F_D}(z^{k-1}) = \operatorname{argmax}_u \left( -f^*(-A^\top u) - g^*(u) - \frac{1}{2t} \|u - z^{k-1}\|^2 \right)$.
We have
  $\sup_u \left( -f^*(-A^\top u) - g^*(u) - \frac{1}{2t} \|u - z\|^2 \right)$
  $= \sup_u \left( \inf_{x,y} L(x, y, u) - \frac{1}{2t} \|u - z\|^2 \right)$
  $= \sup_u \inf_{x,y} \left( f(x) + g(y) + \langle u, Ax - y \rangle - \frac{1}{2t} \|u - z\|^2 \right)$
  $= \inf_{x,y} \left( f(x) + g(y) + \sup_u \left( \langle u, Ax - y \rangle - \frac{1}{2t} \|u - z\|^2 \right) \right)$
  $= \inf_{x,y} \left( f(x) + g(y) + \langle z, Ax - y \rangle + \frac{t}{2} \|Ax - y\|^2 \right)$.

Here the maximizing $u$ is $u = z + t(Ax - y)$. Thus, we define the augmented Lagrangian to be
  $L_t(x, y, z) := f(x) + g(y) + \langle z, Ax - y \rangle + \frac{t}{2} \|Ax - y\|^2$.
The augmented Lagrangian method is
  $(x^k, y^k) = \operatorname{argmin}_{x,y} L_t(x, y, z^{k-1})$,
  $z^k = z^{k-1} + t (Ax^k - y^k)$.
Thus, the augmented Lagrangian method is equivalent to the proximal point method applied to the dual problem
  $\sup_z \left( -f^*(-A^\top z) - g^*(z) \right)$.

2.8 Alternating direction method of multipliers (ADMM)

Problem:
  $\min f_1(x_1) + f_2(x_2)$ subject to $A_1 x_1 + A_2 x_2 - b = 0$.

Assumptions: the $f_i$ are closed and convex.

ADMM. Define
  $L_t(x_1, x_2, z) := f_1(x_1) + f_2(x_2) + \langle z, A_1 x_1 + A_2 x_2 - b \rangle + \frac{t}{2} \|A_1 x_1 + A_2 x_2 - b\|^2$.
ADMM:
  $x_1^k = \operatorname{argmin}_{x_1} L_t(x_1, x_2^{k-1}, z^{k-1}) = \operatorname{argmin}_{x_1} \left( f_1(x_1) + \frac{t}{2} \|A_1 x_1 + A_2 x_2^{k-1} - b + \frac{1}{t} z^{k-1}\|^2 \right)$,
  $x_2^k = \operatorname{argmin}_{x_2} L_t(x_1^k, x_2, z^{k-1}) = \operatorname{argmin}_{x_2} \left( f_2(x_2) + \frac{t}{2} \|A_1 x_1^k + A_2 x_2 - b + \frac{1}{t} z^{k-1}\|^2 \right)$,
  $z^k = z^{k-1} + t (A_1 x_1^k + A_2 x_2^k - b)$.

ADMM is the Douglas-Rachford method applied to the dual problem
  $\max_z \left( \langle b, z \rangle - f_1^*(A_1^\top z) - f_2^*(A_2^\top z) \right)$,
i.e. $\min_z h_1(z) + h_2(z)$ with $h_1(z) := f_1^*(A_1^\top z) - \langle b, z \rangle$ and $h_2(z) := f_2^*(A_2^\top z)$.

Douglas-Rachford method for $\min_z h_1(z) + h_2(z)$:
  $y^k = \operatorname{prox}_{h_1}(z^{k-1})$,
  $z^k = z^{k-1} + \operatorname{prox}_{h_2}(2y^k - z^{k-1}) - y^k$.
If we call $(I + \partial h_1)^{-1} = A$ and $(I + \partial h_2)^{-1} = B$, these two operators are firmly non-expansive, and the Douglas-Rachford method is to find the fixed point of $z^k = T z^{k-1}$, where
  $T = I + B(2A - I) - A$.
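As a concrete instance of the scheme above, the sketch below applies ADMM to the lasso problem $\min \frac{1}{2}\|Ax - b\|^2 + \lambda \|z\|_1$ subject to $x - z = 0$ (so $A_1 = I$, $A_2 = -I$, $b = 0$ in the notation of the notes); the data, the penalty $\lambda$, and the step $t = \rho$ are illustrative choices, and the multiplier is kept in scaled form $u = z\text{-dual}/\rho$.

```python
import numpy as np

# ADMM for lasso: min 0.5*||A x - b||^2 + lam*||z||_1  s.t.  x - z = 0.
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [2.0, -3.0]
b = A @ x_true
lam, rho = 0.5, 1.0

def soft(v, t):
    """Soft-thresholding, the prox of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(10)
z = np.zeros(10)
u = np.zeros(10)                                 # scaled dual variable
M = np.linalg.inv(A.T @ A + rho * np.eye(10))    # cached x-update matrix
for _ in range(300):
    x = M @ (A.T @ b + rho * (z - u))            # x-minimization (least squares)
    z = soft(x + u, lam / rho)                   # z-minimization (prox of l1)
    u = u + x - z                                # dual update
print(np.linalg.norm(x - z))                     # primal residual, near 0
```

The x-update is a linear solve because $f_1$ is quadratic; for general $f_1$ it would itself be a proximal subproblem.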

2.9 Primal-dual formulation

Consider
  $\inf_x \left( f(x) + g(Ax) \right)$, and let $F_P(x) := f(x) + g(Ax)$.
Setting $y = Ax$, consider
  $\inf_{x,y} f(x) + g(y)$ subject to $y = Ax$.
Now introduce the method of Lagrange multipliers: consider
  $L_P(x, y, z) = f(x) + g(y) + \langle z, Ax - y \rangle$.
Then
  $F_P(x) = \inf_y \sup_z L_P(x, y, z)$,
and the problem is
  $\inf_{x,y} \sup_z L_P(x, y, z)$.
The dual problem is
  $\sup_z \inf_{x,y} L_P(x, y, z)$.
We find that
  $\inf_{x,y} L_P(x, y, z) = -f^*(-A^\top z) - g^*(z) =: F_D(z)$,
so the dual problem is
  $\sup_z \inf_{x,y} L_P(x, y, z) = \sup_z F_D(z)$.
If we take the infimum over $y$ first,
  $\inf_y L_P(x, y, z) = f(x) + \langle z, Ax \rangle - \sup_y \left( \langle z, y \rangle - g(y) \right) = f(x) + \langle z, Ax \rangle - g^*(z) =: L_{PD}(x, z)$.
Then the problem is
  $\inf_x \sup_z L_{PD}(x, z)$.

On the other hand, we can start from
  $F_D(z) := -f^*(-A^\top z) - g^*(z)$.
Consider
  $L_D(z, w, x) = -f^*(w) - g^*(z) + \langle x, w + A^\top z \rangle$;
then we have
  $\sup_w \inf_x L_D(z, w, x) = F_D(z)$,
since the infimum over $x$ forces $w = -A^\top z$. If instead we exchange the order of inf and sup,
  $\sup_{w, z} L_D(z, w, x) = \sup_w \left( \langle x, w \rangle - f^*(w) \right) + \sup_z \left( \langle Ax, z \rangle - g^*(z) \right) = f(x) + g(Ax) = F_P(x)$.
We can also take the sup over $w$ first, and then we get
  $\sup_w L_D(z, w, x) = f(x) - g^*(z) + \langle z, Ax \rangle = L_{PD}(x, z)$.

Let us summarize:
  $F_P(x) = f(x) + g(Ax)$,
  $F_D(z) = -f^*(-A^\top z) - g^*(z)$,
  $L_P(x, y, z) := f(x) + g(y) + \langle z, Ax - y \rangle$,
  $L_D(z, w, x) := -f^*(w) - g^*(z) + \langle x, w + A^\top z \rangle$,
  $L_{PD}(x, z) := f(x) + \langle z, Ax \rangle - g^*(z) = \sup_w L_D(z, w, x)$.
Then
  $F_P(x) = \sup_z L_{PD}(x, z)$ and $F_D(z) = \inf_x L_{PD}(x, z)$.
By assuming the optimality condition, we have
  $\inf_x \sup_z L_{PD}(x, z) = \sup_z \inf_x L_{PD}(x, z)$.


More information

Convex Optimization Boyd & Vandenberghe. 5. Duality

Convex Optimization Boyd & Vandenberghe. 5. Duality 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014 Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

More information

Lecture 14: Newton s Method

Lecture 14: Newton s Method 10-725/36-725: Conve Optimization Fall 2016 Lecturer: Javier Pena Lecture 14: Newton s ethod Scribes: Varun Joshi, Xuan Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Duality Uses and Correspondences. Ryan Tibshirani Convex Optimization

Duality Uses and Correspondences. Ryan Tibshirani Convex Optimization Duality Uses and Correspondences Ryan Tibshirani Conve Optimization 10-725 Recall that for the problem Last time: KKT conditions subject to f() h i () 0, i = 1,... m l j () = 0, j = 1,... r the KKT conditions

More information

Dual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Dual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Dual Methods Lecturer: Ryan Tibshirani Conve Optimization 10-725/36-725 1 Last time: proimal Newton method Consider the problem min g() + h() where g, h are conve, g is twice differentiable, and h is simple.

More information

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Primal-dual Subgradient Method for Convex Problems with Functional Constraints Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual

More information

Convex Optimization and Modeling

Convex Optimization and Modeling Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual

More information

Math 273a: Optimization Convex Conjugacy

Math 273a: Optimization Convex Conjugacy Math 273a: Optimization Convex Conjugacy Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Convex conjugate (the Legendre transform) Let f be a closed proper

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem: CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through

More information

ICS-E4030 Kernel Methods in Machine Learning

ICS-E4030 Kernel Methods in Machine Learning ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This

More information

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,

More information

Convex Functions. Pontus Giselsson

Convex Functions. Pontus Giselsson Convex Functions Pontus Giselsson 1 Today s lecture lower semicontinuity, closure, convex hull convexity preserving operations precomposition with affine mapping infimal convolution image function supremum

More information

Lecture 18: Optimization Programming

Lecture 18: Optimization Programming Fall, 2016 Outline Unconstrained Optimization 1 Unconstrained Optimization 2 Equality-constrained Optimization Inequality-constrained Optimization Mixture-constrained Optimization 3 Quadratic Programming

More information

Nash Equilibrium and the Legendre Transform in Optimal Stopping Games with One Dimensional Diffusions

Nash Equilibrium and the Legendre Transform in Optimal Stopping Games with One Dimensional Diffusions Nash Equilibrium and the Legendre Transform in Optimal Stopping Games with One Dimensional Diffusions J. L. Seton This version: 9 Januar 2014 First version: 2 December 2011 Research Report No. 9, 2011,

More information

Lecture 23: Conditional Gradient Method

Lecture 23: Conditional Gradient Method 10-725/36-725: Conve Optimization Spring 2015 Lecture 23: Conditional Gradient Method Lecturer: Ryan Tibshirani Scribes: Shichao Yang,Diyi Yang,Zhanpeng Fang Note: LaTeX template courtesy of UC Berkeley

More information

Improved Fast Dual Gradient Methods for Embedded Model Predictive Control

Improved Fast Dual Gradient Methods for Embedded Model Predictive Control Preprints of the 9th World Congress The International Federation of Automatic Control Improved Fast Dual Gradient Methods for Embedded Model Predictive Control Pontus Giselsson Electrical Engineering,

More information

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth

More information

m x n matrix with m rows and n columns is called an array of m.n real numbers

m x n matrix with m rows and n columns is called an array of m.n real numbers LINEAR ALGEBRA Matrices Linear Algebra Definitions m n matri with m rows and n columns is called an arra of mn real numbers The entr a a an A = a a an = ( a ij ) am am amn a ij denotes the element in the

More information

The Lagrangian L : R d R m R r R is an (easier to optimize) lower bound on the original problem:

The Lagrangian L : R d R m R r R is an (easier to optimize) lower bound on the original problem: HT05: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford Convex Optimization and slides based on Arthur Gretton s Advanced Topics in Machine Learning course

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

Dual Ascent. Ryan Tibshirani Convex Optimization

Dual Ascent. Ryan Tibshirani Convex Optimization Dual Ascent Ryan Tibshirani Conve Optimization 10-725 Last time: coordinate descent Consider the problem min f() where f() = g() + n i=1 h i( i ), with g conve and differentiable and each h i conve. Coordinate

More information

Convex Optimization Overview (cnt d)

Convex Optimization Overview (cnt d) Conve Optimization Overview (cnt d) Chuong B. Do November 29, 2009 During last week s section, we began our study of conve optimization, the study of mathematical optimization problems of the form, minimize

More information

Lecture 6: Conic Optimization September 8

Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

Convex Optimization. Convex Analysis - Functions

Convex Optimization. Convex Analysis - Functions Convex Optimization Convex Analsis - Functions p. 1 A function f : K R n R is convex, if K is a convex set and x, K,x, λ (,1) we have f(λx+(1 λ)) λf(x)+(1 λ)f(). (x, f(x)) (,f()) x - strictl convex,

More information

ABOUT THE WEAK EFFICIENCIES IN VECTOR OPTIMIZATION

ABOUT THE WEAK EFFICIENCIES IN VECTOR OPTIMIZATION ABOUT THE WEAK EFFICIENCIES IN VECTOR OPTIMIZATION CRISTINA STAMATE Institute of Mathematics 8, Carol I street, Iasi 6600, Romania cstamate@mail.com Abstract: We present the principal properties of the

More information

Lecture 16: October 22

Lecture 16: October 22 0-725/36-725: Conve Optimization Fall 208 Lecturer: Ryan Tibshirani Lecture 6: October 22 Scribes: Nic Dalmasso, Alan Mishler, Benja LeRoy Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

1.1 The Equations of Motion

1.1 The Equations of Motion 1.1 The Equations of Motion In Book I, balance of forces and moments acting on an component was enforced in order to ensure that the component was in equilibrium. Here, allowance is made for stresses which

More information

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients and quasigradients subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE392o, Stanford University Basic inequality recall basic

More information

Lecture Notes on Support Vector Machine

Lecture Notes on Support Vector Machine Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

UC Berkeley Department of Electrical Engineering and Computer Science. EECS 227A Nonlinear and Convex Optimization. Solutions 6 Fall 2009

UC Berkeley Department of Electrical Engineering and Computer Science. EECS 227A Nonlinear and Convex Optimization. Solutions 6 Fall 2009 UC Berkele Department of Electrical Engineering and Computer Science EECS 227A Nonlinear and Convex Optimization Solutions 6 Fall 2009 Solution 6.1 (a) p = 1 (b) The Lagrangian is L(x,, λ) = e x + λx 2

More information

BASICS OF CONVEX ANALYSIS

BASICS OF CONVEX ANALYSIS BASICS OF CONVEX ANALYSIS MARKUS GRASMAIR 1. Main Definitions We start with providing the central definitions of convex functions and convex sets. Definition 1. A function f : R n R + } is called convex,

More information

EE/AA 578, Univ of Washington, Fall Duality

EE/AA 578, Univ of Washington, Fall Duality 7. Duality EE/AA 578, Univ of Washington, Fall 2016 Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Self Equivalence of the Alternating Direction Method of Multipliers

Self Equivalence of the Alternating Direction Method of Multipliers Self Equivalence of the Alternating Direction Method of Multipliers Ming Yan and Wotao Yin Abstract The alternating direction method of multipliers (ADM or ADMM) breaks a comple optimization problem into

More information

Maximal Monotone Inclusions and Fitzpatrick Functions

Maximal Monotone Inclusions and Fitzpatrick Functions JOTA manuscript No. (will be inserted by the editor) Maximal Monotone Inclusions and Fitzpatrick Functions J. M. Borwein J. Dutta Communicated by Michel Thera. Abstract In this paper, we study maximal

More information

Introduction to Machine Learning Spring 2018 Note Duality. 1.1 Primal and Dual Problem

Introduction to Machine Learning Spring 2018 Note Duality. 1.1 Primal and Dual Problem CS 189 Introduction to Machine Learning Spring 2018 Note 22 1 Duality As we have seen in our discussion of kernels, ridge regression can be viewed in two ways: (1) an optimization problem over the weights

More information

Chapter 11 Optimization with Equality Constraints

Chapter 11 Optimization with Equality Constraints Ch. - Optimization with Equalit Constraints Chapter Optimization with Equalit Constraints Albert William Tucker 95-995 arold William Kuhn 95 oseph-ouis Giuseppe odovico comte de arane 76-. General roblem

More information

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

Convex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE

Convex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE Convex Analysis Notes Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE These are notes from ORIE 6328, Convex Analysis, as taught by Prof. Adrian Lewis at Cornell University in the

More information

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n

More information

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

More information

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =

More information

Lecture 3. Optimization Problems and Iterative Algorithms

Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

Numerical Optimization

Numerical Optimization Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences

More information

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010 I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

Descent methods. min x. f(x)

Descent methods. min x. f(x) Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained

More information

Bregman Divergence and Mirror Descent

Bregman Divergence and Mirror Descent Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

too, of course, but is perhaps overkill here.

too, of course, but is perhaps overkill here. LUNDS TEKNISKA HÖGSKOLA MATEMATIK LÖSNINGAR OPTIMERING 018-01-11 kl 08-13 1. a) CQ points are shown as A and B below. Graphically: they are the only points that share the same tangent line for both active

More information

1. f(β) 0 (that is, β is a feasible point for the constraints)

1. f(β) 0 (that is, β is a feasible point for the constraints) xvi 2. The lasso for linear models 2.10 Bibliographic notes Appendix Convex optimization with constraints In this Appendix we present an overview of convex optimization concepts that are particularly useful

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Duality in Nonlinear Optimization ) Tamás TERLAKY Computing and Software McMaster University Hamilton, January 2004 terlaky@mcmaster.ca Tel: 27780 Optimality

More information

Newton-Type Algorithms for Nonlinear Constrained Optimization

Newton-Type Algorithms for Nonlinear Constrained Optimization Newton-Type Algorithms for Nonlinear Constrained Optimization Angelika Altmann-Dieses Faculty of Management Science and Engineering Karlsruhe University of Applied Sciences Moritz Diehl Department of Microsystems

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Lecture 8: February 9

Lecture 8: February 9 0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we

More information

Convex Optimization Lecture 6: KKT Conditions, and applications

Convex Optimization Lecture 6: KKT Conditions, and applications Convex Optimization Lecture 6: KKT Conditions, and applications Dr. Michel Baes, IFOR / ETH Zürich Quick recall of last week s lecture Various aspects of convexity: The set of minimizers is convex. Convex

More information

UNIVERSIDAD CARLOS III DE MADRID MATHEMATICS II EXERCISES (SOLUTIONS )

UNIVERSIDAD CARLOS III DE MADRID MATHEMATICS II EXERCISES (SOLUTIONS ) UNIVERSIDAD CARLOS III DE MADRID MATHEMATICS II EXERCISES (SOLUTIONS ) CHAPTER : Limits and continuit of functions in R n. -. Sketch the following subsets of R. Sketch their boundar and the interior. Stud

More information

Proximal methods. S. Villa. October 7, 2014

Proximal methods. S. Villa. October 7, 2014 Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 20 Subgradients Assumptions

More information

Support Vector Machines: Maximum Margin Classifiers

Support Vector Machines: Maximum Margin Classifiers Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind

More information

QUADRATIC AND CONVEX MINIMAX CLASSIFICATION PROBLEMS

QUADRATIC AND CONVEX MINIMAX CLASSIFICATION PROBLEMS Journal of the Operations Research Societ of Japan 008, Vol. 51, No., 191-01 QUADRATIC AND CONVEX MINIMAX CLASSIFICATION PROBLEMS Tomonari Kitahara Shinji Mizuno Kazuhide Nakata Toko Institute of Technolog

More information

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS A Dissertation Submitted For The Award of the Degree of Master of Philosophy in Mathematics Neelam Patel School of Mathematics

More information

Review of Optimization Basics

Review of Optimization Basics Review of Optimization Basics. Introduction Electricity markets throughout the US are said to have a two-settlement structure. The reason for this is that the structure includes two different markets:

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

Extended Monotropic Programming and Duality 1

Extended Monotropic Programming and Duality 1 March 2006 (Revised February 2010) Report LIDS - 2692 Extended Monotropic Programming and Duality 1 by Dimitri P. Bertsekas 2 Abstract We consider the problem minimize f i (x i ) subject to x S, where

More information