Shiqian Ma, MAT-258A: Numerical Optimization. Chapter 9: Alternating Direction Method of Multipliers



Separable convex optimization

    \min_{x,y}\ f(x) + g(y) \quad \text{s.t.}\quad Ax + By = b

A special case is

    \min_x\ f(x) + g(x),

because it is equivalent to (by variable splitting)

    \min_{x,y}\ f(x) + g(y) \quad \text{s.t.}\quad x - y = 0.

- Both f and g are closed and convex.
- Both f and g have special structure: easy proximal mappings.
- Possibly both f and g are nonsmooth, so the proximal gradient method cannot be used.

Example: Robust PCA

    \min_{X \in \mathbb{R}^{m\times n}}\ \|X\|_* + \rho\|M - X\|_1

or, equivalently,

    \min_{X,Y \in \mathbb{R}^{m\times n}}\ \|X\|_* + \rho\|Y\|_1 \quad \text{s.t.}\quad X + Y = M.

Can we use ALM to solve this? The augmented Lagrangian function is

    L_t(X,Y;\Lambda) = \|X\|_* + \rho\|Y\|_1 - \langle\Lambda, X+Y-M\rangle + \tfrac{t}{2}\|X+Y-M\|_F^2.

ALM:

    (X^{k+1}, Y^{k+1}) = \mathrm{argmin}_{X,Y}\ L_t(X,Y;\Lambda^k)
    \Lambda^{k+1} = \Lambda^k - t(X^{k+1}+Y^{k+1}-M)

Any disadvantage?

Surveillance video background extraction

Example: Sparse Inverse Covariance Selection

    \min_X\ -\log\det(X) + \langle\Sigma, X\rangle + \rho\|X\|_1

Can we use PGM to solve this? Proximal gradient method (note that g(X) = -\log\det(X) + \langle\Sigma, X\rangle is smooth):

    X^{k+1} := \mathrm{argmin}_X\ \rho\|X\|_1 + \tfrac{1}{2\tau}\|X - (X^k - \tau\nabla g(X^k))\|_F^2

Any disadvantage?

The same problem

    \min_X\ -\log\det(X) + \langle\Sigma, X\rangle + \rho\|X\|_1

is equivalent to (by variable splitting)

    \min_{X,Y}\ -\log\det(X) + \langle\Sigma, X\rangle + \rho\|Y\|_1 \quad \text{s.t.}\quad X - Y = 0.

Can we use ALM to solve this? The augmented Lagrangian function is

    L_t(X,Y;\Lambda) = -\log\det(X) + \langle\Sigma,X\rangle + \rho\|Y\|_1 - \langle\Lambda, X-Y\rangle + \tfrac{t}{2}\|X-Y\|_F^2.

ALM:

    (X^{k+1}, Y^{k+1}) = \mathrm{argmin}_{X,Y}\ L_t(X,Y;\Lambda^k)
    \Lambda^{k+1} = \Lambda^k - t(X^{k+1}-Y^{k+1})

Any disadvantage?

Alternating Direction Method of Multipliers (ADMM)

Robust PCA:

    \min_{X,Y \in \mathbb{R}^{m\times n}}\ \|X\|_* + \rho\|Y\|_1 \quad \text{s.t.}\quad X + Y = M

Augmented Lagrangian function:

    L_t(X,Y;\Lambda) = \|X\|_* + \rho\|Y\|_1 - \langle\Lambda, X+Y-M\rangle + \tfrac{t}{2}\|X+Y-M\|_F^2

ALM:

    (X^{k+1}, Y^{k+1}) = \mathrm{argmin}_{X,Y}\ L_t(X,Y;\Lambda^k)
    \Lambda^{k+1} = \Lambda^k - t(X^{k+1}+Y^{k+1}-M)

Alternating Direction Method of Multipliers:

    X^{k+1} = \mathrm{argmin}_X\ L_t(X, Y^k; \Lambda^k)
    Y^{k+1} = \mathrm{argmin}_Y\ L_t(X^{k+1}, Y; \Lambda^k)
    \Lambda^{k+1} = \Lambda^k - t(X^{k+1}+Y^{k+1}-M)

The X-subproblem is

    \min_X\ \|X\|_* + \rho\|Y^k\|_1 - \langle\Lambda^k, X+Y^k-M\rangle + \tfrac{t}{2}\|X+Y^k-M\|_F^2,

which is equivalent to

    \min_X\ \|X\|_* + \tfrac{t}{2}\|X + Y^k - M - \Lambda^k/t\|_F^2,

i.e., the proximal mapping of the nuclear norm \|X\|_*.

The Y-subproblem is

    \min_Y\ \|X^{k+1}\|_* + \rho\|Y\|_1 - \langle\Lambda^k, X^{k+1}+Y-M\rangle + \tfrac{t}{2}\|X^{k+1}+Y-M\|_F^2,

which is equivalent to

    \min_Y\ \rho\|Y\|_1 + \tfrac{t}{2}\|X^{k+1} + Y - M - \Lambda^k/t\|_F^2,

i.e., the proximal mapping of \rho\|Y\|_1.
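To make the two proximal mappings concrete, here is a minimal NumPy sketch of this ADMM for robust PCA; the helper names soft_threshold, svt and admm_rpca are mine, not from the slides.

```python
import numpy as np

def soft_threshold(Z, tau):
    # proximal mapping of tau*||.||_1 (entrywise shrinkage)
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def svt(Z, tau):
    # proximal mapping of tau*||.||_* (singular value thresholding)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def admm_rpca(M, rho=0.1, t=1.0, iters=200):
    """ADMM for min ||X||_* + rho*||Y||_1  s.t.  X + Y = M."""
    X, Y, Lam = np.zeros_like(M), np.zeros_like(M), np.zeros_like(M)
    for _ in range(iters):
        X = svt(M - Y + Lam / t, 1.0 / t)             # X-subproblem
        Y = soft_threshold(M - X + Lam / t, rho / t)  # Y-subproblem
        Lam = Lam - t * (X + Y - M)                   # multiplier update
    return X, Y
```

Each iteration costs one SVD (for the nuclear-norm proximal mapping) plus an entrywise shrinkage.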

Alternating Direction Method of Multipliers

Sparse inverse covariance selection:

    \min_{X,Y}\ -\log\det(X) + \langle\Sigma, X\rangle + \rho\|Y\|_1 \quad \text{s.t.}\quad X - Y = 0

Augmented Lagrangian function:

    L_t(X,Y;\Lambda) = -\log\det(X) + \langle\Sigma,X\rangle + \rho\|Y\|_1 - \langle\Lambda, X-Y\rangle + \tfrac{t}{2}\|X-Y\|_F^2

ALM:

    (X^{k+1}, Y^{k+1}) = \mathrm{argmin}_{X,Y}\ L_t(X,Y;\Lambda^k)
    \Lambda^{k+1} = \Lambda^k - t(X^{k+1}-Y^{k+1})

ADMM:

    X^{k+1} = \mathrm{argmin}_X\ L_t(X, Y^k; \Lambda^k)
    Y^{k+1} = \mathrm{argmin}_Y\ L_t(X^{k+1}, Y; \Lambda^k)
    \Lambda^{k+1} = \Lambda^k - t(X^{k+1}-Y^{k+1})

The X-subproblem is

    \min_X\ -\log\det(X) + \langle\Sigma,X\rangle + \rho\|Y^k\|_1 - \langle\Lambda^k, X-Y^k\rangle + \tfrac{t}{2}\|X-Y^k\|_F^2,

which is equivalent to

    \min_X\ -\log\det(X) + \tfrac{t}{2}\|X - Y^k + (\Sigma - \Lambda^k)/t\|_F^2,

i.e., the proximal mapping of -\log\det(X).

The Y-subproblem is

    \min_Y\ -\log\det(X^{k+1}) + \langle\Sigma,X^{k+1}\rangle + \rho\|Y\|_1 - \langle\Lambda^k, X^{k+1}-Y\rangle + \tfrac{t}{2}\|X^{k+1}-Y\|_F^2,

which is equivalent to

    \min_Y\ \rho\|Y\|_1 + \tfrac{t}{2}\|X^{k+1} - Y - \Lambda^k/t\|_F^2,

i.e., the proximal mapping of \rho\|Y\|_1.
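The X-subproblem above has a closed-form solution via one eigenvalue decomposition: writing its input as B, the optimality condition tX - X^{-1} = tB can be solved eigenvalue-wise. A small NumPy sketch (the name prox_neg_logdet is mine):

```python
import numpy as np

def prox_neg_logdet(B, t):
    """Solve min_X -log det(X) + (t/2)*||X - B||_F^2 for symmetric B."""
    d, Q = np.linalg.eigh((B + B.T) / 2)         # eigendecomposition of (symmetrized) B
    x = (d + np.sqrt(d**2 + 4.0 / t)) / 2.0      # positive root of t*x^2 - t*d*x - 1 = 0
    return (Q * x) @ Q.T

# X-subproblem above: X_next = prox_neg_logdet(Y_k - (Sigma - Lam_k) / t, t)
```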

General form of ADMM

Convex minimization with two-block separable structure:

    \min\ f(x) + g(y) \quad \text{s.t.}\quad Ax + By = b

Augmented Lagrangian function:

    L_t(x,y;\lambda) = f(x) + g(y) - \langle\lambda, Ax+By-b\rangle + \tfrac{t}{2}\|Ax+By-b\|_2^2

ADMM:

    x^{k+1} = \mathrm{argmin}_x\ L_t(x, y^k; \lambda^k)
    y^{k+1} = \mathrm{argmin}_y\ L_t(x^{k+1}, y; \lambda^k)
    \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b)

The two subproblems are

    x^{k+1} = \mathrm{argmin}_x\ f(x) + \tfrac{t}{2}\|Ax + By^k - b - \lambda^k/t\|_2^2
    y^{k+1} = \mathrm{argmin}_y\ g(y) + \tfrac{t}{2}\|Ax^{k+1} + By - b - \lambda^k/t\|_2^2
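As a reference point, here is a generic two-block ADMM skeleton matching the updates above, with the two subproblem solvers supplied as callables (this interface is my own, illustrative choice):

```python
import numpy as np

def admm(argmin_x, argmin_y, A, B, b, y0, lam0, t=1.0, iters=100):
    """Two-block ADMM for min f(x) + g(y) s.t. Ax + By = b.

    argmin_x(v, t) must return argmin_x f(x) + (t/2)*||Ax - v||^2,
    argmin_y(v, t) must return argmin_y g(y) + (t/2)*||By - v||^2.
    """
    y, lam = y0, lam0
    for _ in range(iters):
        x = argmin_x(b - B @ y + lam / t, t)    # x-update
        y = argmin_y(b - A @ x + lam / t, t)    # y-update
        lam = lam - t * (A @ x + B @ y - b)     # multiplier update
    return x, y, lam
```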

Variable splitting and reformulation

For many applications, the objective is the sum of two structured functions (one of which could be an indicator function). One can apply variable splitting to reformulate the problem as

    \min\ f(x) + g(y) \quad \text{s.t.}\quad Ax + By = b

such that both f and g have easy proximal mappings; then one can apply ADMM. For example,

    \min\ f(x) + g(Ax - b)

is equivalent to

    \min\ f(x) + g(y) \quad \text{s.t.}\quad Ax - y = b.

Compressed sensing with noise

    \min\ \|x\|_1 \quad \text{s.t.}\quad \|Ax - b\|_2 \le \sigma

can be reformulated as

    \min\ \|x\|_1 \quad \text{s.t.}\quad Ax - y = b,\ \|y\|_2 \le \sigma,

or

    \min\ \|x\|_1 + I_{\{\|y\|_2 \le \sigma\}}(y) \quad \text{s.t.}\quad Ax - y = b.

Augmented Lagrangian function:

    L_t(x,y;\lambda) = \|x\|_1 + I_{\{\|y\|_2\le\sigma\}}(y) - \langle\lambda, Ax - y - b\rangle + \tfrac{t}{2}\|Ax - y - b\|_2^2

Apply ADMM:

    x^{k+1} = \mathrm{argmin}_x\ L_t(x, y^k; \lambda^k)
    y^{k+1} = \mathrm{argmin}_y\ L_t(x^{k+1}, y; \lambda^k)
    \lambda^{k+1} = \lambda^k - t(Ax^{k+1} - y^{k+1} - b)

The x-subproblem:

    x^{k+1} = \mathrm{argmin}_x\ \|x\|_1 + \tfrac{t}{2}\|Ax - y^k - b - \lambda^k/t\|_2^2

The y-subproblem:

    y^{k+1} = \mathrm{argmin}_y\ I_{\{\|y\|_2\le\sigma\}}(y) + \tfrac{t}{2}\|Ax^{k+1} - y - b - \lambda^k/t\|_2^2;

this is the projection onto the ℓ2-ball.
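The y-subproblem is just a Euclidean projection; a minimal sketch (the function name is mine):

```python
import numpy as np

def project_l2_ball(z, sigma):
    # projection of z onto {y : ||y||_2 <= sigma}
    nrm = np.linalg.norm(z)
    return z if nrm <= sigma else (sigma / nrm) * z

# y-subproblem above: y_next = project_l2_ball(A @ x_next - b - lam / t, sigma)
```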

Portfolio Selection

- r_i: random variable, the rate of return of stock i
- x_i: the relative amount invested in stock i
- Return: r = r_1 x_1 + r_2 x_2 + ... + r_n x_n
- Expected return: R = E(r) = \sum_i E(r_i) x_i = \sum_i \mu_i x_i
- Risk: V = \mathrm{Var}(r) = \sum_{ij} \sigma_{ij} x_i x_j = x^T \Sigma x

The model:

    \min\ \tfrac12 x^T\Sigma x
    \text{s.t.}\ \sum_i \mu_i x_i = r_0,\ \sum_i x_i = 1,\ x_i \ge 0,\ i = 1,\dots,n

The problem can be reformulated as (define the set C as the probability simplex)

    \min\ \tfrac12 x^T\Sigma x
    \text{s.t.}\ \mu^T x = r_0,\ x - y = 0,\ y \in C.

Augmented Lagrangian function:

    L_t(x,y;\lambda_1,\lambda_2) = \tfrac12 x^T\Sigma x + I_{\{y\in C\}}(y) - \langle\lambda_1, \mu^T x - r_0\rangle - \langle\lambda_2, x - y\rangle + \tfrac{t}{2}(\mu^T x - r_0)^2 + \tfrac{t}{2}\|x-y\|_2^2

ADMM:

    x^{k+1} = \mathrm{argmin}_x\ L_t(x, y^k; \lambda_1^k, \lambda_2^k)
    y^{k+1} = \mathrm{argmin}_y\ L_t(x^{k+1}, y; \lambda_1^k, \lambda_2^k)
    \lambda_1^{k+1} = \lambda_1^k - t(\mu^T x^{k+1} - r_0)
    \lambda_2^{k+1} = \lambda_2^k - t(x^{k+1} - y^{k+1})
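The y-subproblem here is the Euclidean projection of x^{k+1} - \lambda_2^k/t onto the probability simplex C, for which the standard sort-based procedure works; a short NumPy sketch (names are mine):

```python
import numpy as np

def project_simplex(z):
    """Euclidean projection of z onto {y : y >= 0, sum(y) = 1}."""
    u = np.sort(z)[::-1]                   # sort in decreasing order
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(z) + 1) > css - 1.0)[0][-1]
    tau = (css[k] - 1.0) / (k + 1)
    return np.maximum(z - tau, 0.0)

# y-subproblem above: y_next = project_simplex(x_next - lam2 / t)
```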

Total variation image deblurring

Use u ∈ R^{n²} to denote an n × n gray-scale image, and use K ∈ R^{n²×n²} to represent a blurring operator. An observation of the image is obtained by (ε is noise)

    b = Ku + \epsilon,

so one wants to minimize \|Ku - b\|_2^2. A widely used technique in image processing is to add a total variation (TV) term to preserve sharp edges:

    TV(u) = \sum_{i,j=1}^{n} \sqrt{(u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2}.

By a slight abuse of notation (now u is an n²-dimensional vector), TV can also be written as

    TV(u) = \sum_{i=1}^{n^2} \|D_i u\|_2,

and the TV image deblurring model is

    \min_u\ \sum_{i=1}^{n^2} \|D_i u\|_2 + \tfrac{\rho}{2}\|Ku - b\|_2^2.

By variable splitting, reformulate it as

    \min_{u,w}\ \sum_{i=1}^{n^2} \|w_i\|_2 + \tfrac{\rho}{2}\|Ku - b\|_2^2 \quad \text{s.t.}\quad D_i u - w_i = 0,\ i = 1,\dots,n^2.

Augmented Lagrangian function:

    L_t(u,w;\lambda) = \sum_{i=1}^{n^2}\|w_i\|_2 + \tfrac{\rho}{2}\|Ku-b\|_2^2 - \sum_{i=1}^{n^2}\langle\lambda_i, D_i u - w_i\rangle + \sum_{i=1}^{n^2}\tfrac{t}{2}\|D_i u - w_i\|_2^2

ADMM:

    u^{k+1} = \mathrm{argmin}_u\ L_t(u, w^k; \lambda^k)
    w^{k+1} = \mathrm{argmin}_w\ L_t(u^{k+1}, w; \lambda^k)
    \lambda_i^{k+1} = \lambda_i^k - t(D_i u^{k+1} - w_i^{k+1}),\ i = 1,\dots,n^2

The w-subproblem is separable over the w_i.
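Because the w-subproblem is separable, each w_i is obtained by a two-dimensional shrinkage (the proximal mapping of (1/t) times the 2-norm); a vectorized NumPy sketch (names are mine):

```python
import numpy as np

def shrink2(Z, tau):
    """Row-wise proximal mapping of tau*||.||_2: argmin_w tau*||w||_2 + 0.5*||w - z_i||^2."""
    nrm = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(nrm, 1e-12), 0.0)
    return scale * Z

# w-subproblem above, with z_i = D_i u^{k+1} - lambda_i^k / t stacked as rows of Z:
# W_next = shrink2(Z, 1.0 / t)
```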

TV+L1 model for image reconstruction

The image u is sparse under a wavelet transform Ψ, i.e., Ψu is sparse:

    \min_u\ \sum_{i=1}^{n^2}\|D_i u\|_2 + \gamma\|\Psi u\|_1 + \tfrac{\rho}{2}\|Ku - b\|_2^2

Reformulate as

    \min\ \sum_{i=1}^{n^2}\|w_i\|_2 + \gamma\|v\|_1 + \tfrac{\rho}{2}\|Ku-b\|_2^2
    \text{s.t.}\ D_i u - w_i = 0,\ i=1,\dots,n^2;\quad \Psi u - v = 0.

Augmented Lagrangian function:

    L_t(u,w,v;\lambda,\mu) = \sum_{i=1}^{n^2}\|w_i\|_2 + \gamma\|v\|_1 + \tfrac{\rho}{2}\|Ku-b\|_2^2 - \sum_{i=1}^{n^2}\langle\lambda_i, D_i u - w_i\rangle + \sum_{i=1}^{n^2}\tfrac{t}{2}\|D_i u - w_i\|_2^2 - \langle\mu, \Psi u - v\rangle + \tfrac{t}{2}\|\Psi u - v\|_2^2

ADMM:

    u^{k+1} = \mathrm{argmin}_u\ L_t(u, w^k, v^k; \lambda^k, \mu^k)
    (w^{k+1}, v^{k+1}) = \mathrm{argmin}_{w,v}\ L_t(u^{k+1}, w, v; \lambda^k, \mu^k)
    \lambda_i^{k+1} = \lambda_i^k - t(D_i u^{k+1} - w_i^{k+1}),\ i=1,\dots,n^2
    \mu^{k+1} = \mu^k - t(\Psi u^{k+1} - v^{k+1})

Note that the subproblem for (w, v) is separable in w and v.

Semidefinite Programming

The standard SDP:

    \min_{X\in\mathcal{S}^n}\ \langle C, X\rangle
    \text{s.t.}\ \langle A^{(i)}, X\rangle = b_i,\ i=1,\dots,m;\quad X \succeq 0,

where C, A^{(i)} ∈ S^n, i = 1, ..., m. The dual problem:

    \min_{y\in\mathbb{R}^m,\, S\in\mathcal{S}^n}\ -b^T y
    \text{s.t.}\ \mathcal{A}^*(y) + S = C,\quad S \succeq 0

Augmented Lagrangian function (X is the Lagrange multiplier):

    L_t(y,S;X) = -b^T y + I_{\{S\succeq 0\}}(S) - \langle X, \mathcal{A}^*(y)+S-C\rangle + \tfrac{t}{2}\|\mathcal{A}^*(y)+S-C\|_F^2

ADMM:

    y^{k+1} = \mathrm{argmin}_y\ L_t(y, S^k; X^k)
    S^{k+1} = \mathrm{argmin}_S\ L_t(y^{k+1}, S; X^k)
    X^{k+1} = X^k - t(\mathcal{A}^*(y^{k+1}) + S^{k+1} - C)

Sparse covariance matrix estimation

    \min_{X\in\mathcal{S}^n}\ \tfrac12\|X-\Sigma\|_F^2 + \rho\|X\|_1 \quad \text{s.t.}\quad X \succeq 0,

where Σ is the sample covariance matrix, which may be neither sparse nor positive semidefinite. Reformulation (by variable splitting):

    \min_{X,Y\in\mathcal{S}^n}\ \tfrac12\|X-\Sigma\|_F^2 + \rho\|X\|_1 \quad \text{s.t.}\quad X - Y = 0,\ Y \succeq 0.

Augmented Lagrangian function:

    L_t(X,Y;\Lambda) = \tfrac12\|X-\Sigma\|_F^2 + \rho\|X\|_1 + I_{\{Y\succeq 0\}}(Y) - \langle\Lambda, X-Y\rangle + \tfrac{t}{2}\|X-Y\|_F^2

ADMM:

    X^{k+1} = \mathrm{argmin}_X\ L_t(X, Y^k; \Lambda^k)
    Y^{k+1} = \mathrm{argmin}_Y\ L_t(X^{k+1}, Y; \Lambda^k)
    \Lambda^{k+1} = \Lambda^k - t(X^{k+1} - Y^{k+1})
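Both subproblems here are explicit: the X-update is an entrywise soft-thresholding of a weighted average of Σ and Y^k, and the Y-update is a projection onto the positive semidefinite cone. A minimal NumPy sketch, using the same sign convention as the other slides (multiplier term subtracted); the names are mine:

```python
import numpy as np

def soft_threshold(Z, tau):
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def project_psd(Z):
    # projection onto the PSD cone: zero out negative eigenvalues
    d, Q = np.linalg.eigh((Z + Z.T) / 2)
    return (Q * np.maximum(d, 0.0)) @ Q.T

def admm_sparse_cov(Sigma, rho=0.1, t=1.0, iters=200):
    """ADMM for min 0.5*||X - Sigma||_F^2 + rho*||X||_1  s.t.  X = Y, Y PSD."""
    n = Sigma.shape[0]
    X, Y, Lam = np.zeros((n, n)), np.zeros((n, n)), np.zeros((n, n))
    for _ in range(iters):
        X = soft_threshold((Sigma + Lam + t * Y) / (1.0 + t), rho / (1.0 + t))  # X-subproblem
        Y = project_psd(X - Lam / t)                                            # Y-subproblem
        Lam = Lam - t * (X - Y)                                                 # multiplier update
    return X
```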

Nonconvex model: optimization on the sphere

    \min_x\ f(x) + \|x\|_1 \quad \text{s.t.}\quad \|x\|_2 = 1,

where f(x) is a differentiable function. Reformulation:

    \min_{x,y}\ f(x) + \|x\|_1 \quad \text{s.t.}\quad x - y = 0,\ \|y\|_2 = 1.

Augmented Lagrangian function:

    L_t(x,y;\lambda) = f(x) + \|x\|_1 + I_{\{\|y\|_2=1\}}(y) - \langle\lambda, x-y\rangle + \tfrac{t}{2}\|x-y\|_2^2

ADMM:

    x^{k+1} = \mathrm{argmin}_x\ L_t(x, y^k; \lambda^k)
    y^{k+1} = \mathrm{argmin}_y\ L_t(x^{k+1}, y; \lambda^k)
    \lambda^{k+1} = \lambda^k - t(x^{k+1} - y^{k+1})

Linearized ADMM

The standard form of the problem:

    \min_{x,y}\ f(x) + g(y) \quad \text{s.t.}\quad Ax + By = b

Augmented Lagrangian function:

    L_t(x,y;\lambda) = f(x) + g(y) - \langle\lambda, Ax+By-b\rangle + \tfrac{t}{2}\|Ax+By-b\|_2^2

ADMM:

    x^{k+1} = \mathrm{argmin}_x\ f(x) + \tfrac{t}{2}\|Ax + By^k - b - \lambda^k/t\|_2^2
    y^{k+1} = \mathrm{argmin}_y\ g(y) + \tfrac{t}{2}\|Ax^{k+1} + By - b - \lambda^k/t\|_2^2
    \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b)

The two subproblems are not easy if A and B are not identity matrices.

Use the proximal gradient method to solve them. For example, the x-subproblem has the form

    \min_x\ f(x) + h(x),

where h(x) = \tfrac{t}{2}\|Ax + By^k - b - \lambda^k/t\|_2^2 is smooth. PGM iterates

    x^{i+1} = \mathrm{argmin}_x\ f(x) + \tfrac{1}{2\tau}\|x - (x^i - \tau\nabla h(x^i))\|_2^2,

where τ < 1/L and L is the Lipschitz constant of ∇h. But, in the end, this is just a subproblem; we do not want to solve it to very high accuracy. In fact, one iteration of PGM is enough. This leads to the linearized ADMM.

Linearized ADMM:

    x^{k+1} = \mathrm{argmin}_x\ f(x) + \tfrac{1}{2\tau_1}\|x - (x^k - \tau_1 t A^T(Ax^k + By^k - b - \lambda^k/t))\|_2^2
    y^{k+1} = \mathrm{argmin}_y\ g(y) + \tfrac{1}{2\tau_2}\|y - (y^k - \tau_2 t B^T(Ax^{k+1} + By^k - b - \lambda^k/t))\|_2^2
    \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b)

where \tau_1 < 1/\lambda_{\max}(A^TA) and \tau_2 < 1/\lambda_{\max}(B^TB). Now the two subproblems are easy: they are the proximal mappings of f and g.
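A sketch of linearized ADMM with the two proximal mappings supplied as callables. To stay conservative numerically, the step sizes below also divide by the penalty parameter t, which is a more cautious choice than the bound stated above; the interface and names are mine.

```python
import numpy as np

def linearized_admm(prox_f, prox_g, A, B, b, x0, y0, lam0, t=1.0, iters=300):
    """prox_f(v, s) must return argmin_x f(x) + (1/(2s))*||x - v||^2; similarly prox_g."""
    x, y, lam = x0, y0, lam0
    tau1 = 0.9 / (t * np.linalg.norm(A, 2) ** 2)   # conservative step sizes
    tau2 = 0.9 / (t * np.linalg.norm(B, 2) ** 2)
    for _ in range(iters):
        r = A @ x + B @ y - b - lam / t
        x = prox_f(x - tau1 * t * (A.T @ r), tau1)   # linearized x-update
        r = A @ x + B @ y - b - lam / t
        y = prox_g(y - tau2 * t * (B.T @ r), tau2)   # linearized y-update
        lam = lam - t * (A @ x + B @ y - b)          # multiplier update
    return x, y, lam
```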

Global Convergence of ADMM

The problem:

    \min_{x,y}\ f(x) + g(y) \quad \text{s.t.}\quad Ax + By = b

Lagrangian function:

    L(x,y;\lambda) = f(x) + g(y) - \langle\lambda, Ax+By-b\rangle

Optimality conditions: (x^*, y^*; \lambda^*) is optimal if

    A^T\lambda^* \in \partial f(x^*),\quad B^T\lambda^* \in \partial g(y^*),\quad Ax^* + By^* = b.

ADMM:

    x^{k+1} = \mathrm{argmin}_x\ f(x) + \tfrac{t}{2}\|Ax + By^k - b - \lambda^k/t\|_2^2
    y^{k+1} = \mathrm{argmin}_y\ g(y) + \tfrac{t}{2}\|Ax^{k+1} + By - b - \lambda^k/t\|_2^2
    \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b)

Theorem: If A and B have full column rank, ADMM converges globally to an optimal solution (x^*, y^*; \lambda^*) for any t > 0 and any initial point (y^0, \lambda^0).

Proof. The optimality conditions for the two subproblems are

    0 \in \partial f(x^{k+1}) + tA^T(Ax^{k+1} + By^k - b - \lambda^k/t)
    0 \in \partial g(y^{k+1}) + tB^T(Ax^{k+1} + By^{k+1} - b - \lambda^k/t)

Using the updating formula for \lambda^{k+1}, we have

    A^T(\lambda^{k+1} - tB(y^k - y^{k+1})) \in \partial f(x^{k+1})    (1)
    B^T\lambda^{k+1} \in \partial g(y^{k+1})                          (2)

Because \partial f(\cdot) and \partial g(\cdot) are monotone operators, we have

    \langle x^{k+1} - x^*,\ A^T(\lambda^{k+1} - \lambda^* - tB(y^k - y^{k+1}))\rangle \ge 0
    \langle y^{k+1} - y^*,\ B^T(\lambda^{k+1} - \lambda^*)\rangle \ge 0

Summing these two inequalities, we have

    (x^{k+1}-x^*)^T A^T(\lambda^{k+1}-\lambda^*) - t(x^{k+1}-x^*)^T A^T B(y^k-y^{k+1}) + (y^{k+1}-y^*)^T B^T(\lambda^{k+1}-\lambda^*) \ge 0,

which is equivalent to

    (\lambda^{k+1}-\lambda^*)^T(Ax^{k+1}+By^{k+1}-b) - t(x^{k+1}-x^*)^T A^T B(y^k-y^{k+1}) \ge 0.    (*)

Note that by

    Ax^{k+1} + By^{k+1} - b = (\lambda^k - \lambda^{k+1})/t,\quad Ax^* + By^* - b = 0,

we get

    A(x^{k+1} - x^*) = -B(y^{k+1} - y^*) + (\lambda^k - \lambda^{k+1})/t.

Substituting this into (*), we get

    \tfrac{1}{t}(\lambda^{k+1}-\lambda^*)^T(\lambda^k-\lambda^{k+1}) + t(By^{k+1}-By^*)^T(By^k-By^{k+1}) \ge (\lambda^k-\lambda^{k+1})^T(By^k-By^{k+1}).

Define

    u = \begin{pmatrix} y \\ \lambda \end{pmatrix},\qquad H = \begin{pmatrix} tB^TB & 0 \\ 0 & \tfrac{1}{t}I \end{pmatrix};

we get

    \langle u^{k+1} - u^*,\ u^k - u^{k+1}\rangle_H \ge \langle \lambda^k - \lambda^{k+1},\ By^k - By^{k+1}\rangle.

Because

    B^T\lambda^{k+1} \in \partial g(y^{k+1}),\quad B^T\lambda^k \in \partial g(y^k),

we have

    \langle y^k - y^{k+1},\ B^T\lambda^k - B^T\lambda^{k+1}\rangle \ge 0.

Thus

    \langle u^{k+1} - u^*,\ u^k - u^{k+1}\rangle_H \ge 0.

Because

    \|u^{k+1}-u^*\|_H^2 = \|u^{k+1}-u^k\|_H^2 - 2\langle u^k - u^{k+1},\ u^k - u^*\rangle_H + \|u^k - u^*\|_H^2,

we have

    \|u^k-u^*\|_H^2 - \|u^{k+1}-u^*\|_H^2 = 2\langle u^k-u^{k+1},\ u^k-u^*\rangle_H - \|u^{k+1}-u^k\|_H^2
      = 2\langle u^k-u^{k+1},\ (u^k-u^{k+1}) + (u^{k+1}-u^*)\rangle_H - \|u^{k+1}-u^k\|_H^2
      = \|u^{k+1}-u^k\|_H^2 + 2\langle u^k-u^{k+1},\ u^{k+1}-u^*\rangle_H
      \ge \|u^{k+1}-u^k\|_H^2.    (**)

From (**) we have the following conclusions:
(i) \|u^k - u^{k+1}\|_H \to 0;
(ii) {u^k} lies in a compact region;
(iii) \|u^k - u^*\|_H^2 is monotonically non-increasing and thus converges.

From (i) we have By^k - By^{k+1} \to 0 and \lambda^k - \lambda^{k+1} \to 0. Then Ax^k + By^k - b \to 0 and Ax^k - Ax^{k+1} \to 0. Since A and B have full column rank, we have x^k - x^{k+1} \to 0 and y^k - y^{k+1} \to 0.

From (ii) we know {u^k} has a subsequence {u^{k_j}} that converges to û = (ŷ, λ̂). Therefore x^{k_j} \to x̂. So (x̂, ŷ, λ̂) is a limit point of {(x^k, y^k, λ^k)} and Ax̂ + Bŷ - b = 0. From (1) and (2) we know that

    0 \in \partial f(\hat{x}) - A^T\hat{\lambda},\qquad 0 \in \partial g(\hat{y}) - B^T\hat{\lambda};

thus (x̂, ŷ, λ̂) satisfies the KKT conditions and hence is an optimal solution. Therefore we have shown that any limit point of {(x^k, y^k, λ^k)} is an optimal solution.

To complete the proof, it remains to show that {(x^k, y^k, λ^k)} has a unique limit point. Let (x̂_1, ŷ_1, λ̂_1) and (x̂_2, ŷ_2, λ̂_2) be any two limit points of {(x^k, y^k, λ^k)}. As shown above, both are optimal solutions. Thus u^* in (**) can be replaced by û_1 := (ŷ_1, λ̂_1) and û_2 := (ŷ_2, λ̂_2). This results in

    \|u^{k+1} - \hat{u}_i\|_H^2 \le \|u^k - \hat{u}_i\|_H^2,\quad i = 1, 2,

and we thus obtain the existence of the limits

    \lim_{k\to\infty} \|u^k - \hat{u}_i\|_H = \eta_i < +\infty,\quad i = 1, 2.

Now using the identity

    \|u^k - \hat{u}_1\|_H^2 - \|u^k - \hat{u}_2\|_H^2 = -2\langle u^k,\ \hat{u}_1 - \hat{u}_2\rangle_H + \|\hat{u}_1\|_H^2 - \|\hat{u}_2\|_H^2

and passing to the limit along the two subsequences, we get

    \eta_1^2 - \eta_2^2 = -2\langle\hat{u}_1,\ \hat{u}_1-\hat{u}_2\rangle_H + \|\hat{u}_1\|_H^2 - \|\hat{u}_2\|_H^2 = -\|\hat{u}_1-\hat{u}_2\|_H^2

and

    \eta_1^2 - \eta_2^2 = -2\langle\hat{u}_2,\ \hat{u}_1-\hat{u}_2\rangle_H + \|\hat{u}_1\|_H^2 - \|\hat{u}_2\|_H^2 = \|\hat{u}_1-\hat{u}_2\|_H^2.

Thus we must have \|\hat{u}_1 - \hat{u}_2\|_H^2 = 0, and hence the limit point of {(x^k, y^k, λ^k)} is unique.

Convergence of Linearized ADMM

Linearized ADMM:

    x^{k+1} = \mathrm{argmin}_x\ f(x) + \tfrac{1}{2\tau_1}\|x - (x^k - \tau_1 t A^T(Ax^k + By^k - b - \lambda^k/t))\|_2^2
    y^{k+1} = \mathrm{argmin}_y\ g(y) + \tfrac{1}{2\tau_2}\|y - (y^k - \tau_2 t B^T(Ax^{k+1} + By^k - b - \lambda^k/t))\|_2^2
    \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b)

Theorem: If \tau_1 < 1/\lambda_{\max}(A^TA) and \tau_2 < 1/\lambda_{\max}(B^TB), linearized ADMM converges globally to an optimal solution (x^*, y^*; \lambda^*) for any t > 0 and any initial point (y^0, \lambda^0).

Proof. See the posted paper for the proof.

Extensions: Multi-block ADMM

What if the objective and variables have three parts (blocks)?

    \min\ f_1(x_1) + f_2(x_2) + f_3(x_3) \quad \text{s.t.}\quad A_1x_1 + A_2x_2 + A_3x_3 = b

Augmented Lagrangian function:

    L_t(x_1,x_2,x_3;\lambda) = f_1(x_1)+f_2(x_2)+f_3(x_3) - \langle\lambda,\ A_1x_1+A_2x_2+A_3x_3-b\rangle + \tfrac{t}{2}\|A_1x_1+A_2x_2+A_3x_3-b\|_2^2

Multi-block ADMM:

    x_1^{k+1} = \mathrm{argmin}_{x_1}\ L_t(x_1, x_2^k, x_3^k; \lambda^k)
    x_2^{k+1} = \mathrm{argmin}_{x_2}\ L_t(x_1^{k+1}, x_2, x_3^k; \lambda^k)
    x_3^{k+1} = \mathrm{argmin}_{x_3}\ L_t(x_1^{k+1}, x_2^{k+1}, x_3; \lambda^k)
    \lambda^{k+1} = \lambda^k - t(A_1x_1^{k+1} + A_2x_2^{k+1} + A_3x_3^{k+1} - b)

Applications

RPCA with noise:

    \min\ \|X\|_* + \rho\|Y\|_1 \quad \text{s.t.}\quad X + Y + Z = M,\ \|Z\|_F \le \sigma

Latent variable graphical model selection (see Lecture 1):

    \min_{R,S,L}\ \langle R, \hat\Sigma_X\rangle - \log\det(R) + \alpha\|S\|_1 + \beta\,\mathrm{tr}(L)
    \text{s.t.}\ R = S - L,\ R \succ 0,\ L \succeq 0.

Convergence

Without further conditions, multi-block ADMM is not necessarily convergent. Counter-example by Chen, He, Ye and Yuan (2013):

    \min\ 0 \quad \text{s.t.}\quad A_1x_1 + A_2x_2 + A_3x_3 = 0,

where

    (A_1, A_2, A_3) = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 2 \\ 1 & 2 & 2 \end{pmatrix}.

With t = 1, each multi-block ADMM iteration is a linear map taking (x_1^k, x_2^k, x_3^k, \lambda^k) to (x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, \lambda^{k+1}). Eliminating x_1, the iteration can be written equivalently as

    (x_2^{k+1};\ x_3^{k+1};\ \lambda^{k+1}) = M\,(x_2^k;\ x_3^k;\ \lambda^k)

for a fixed iteration matrix M (the explicit matrix is worked out in Chen, He, Ye and Yuan (2013)). Note that \rho(M) > 1, so the iteration diverges from generic starting points.

Theorem (Chen-He-Ye-Yuan, 2013): There exists an example for which the direct extension of ADMM to three blocks, started from some real initial point, fails to converge for any choice of t > 0.

Sufficient conditions for convergence of multi-block ADMM

This is a trendy topic for ADMM and is still under development:
- Han and Yuan (2012): global convergence if f_1, ..., f_N are all strongly convex and t is restricted to be small.
- Lin, Ma and Zhang (2014): sublinear convergence rate if f_2, ..., f_N are strongly convex and t is restricted to be small.
- Lin, Ma and Zhang (2014): globally linear convergence rate if f_2, ..., f_N are strongly convex, ∇f_N is Lipschitz continuous, A_N has full row rank, and t is restricted to be small.
- Cai, Han and Yuan (2014): sublinear convergence rate for N = 3 if f_3 is strongly convex and t is restricted to be small.

- Li, Sun and Toh (2014): global convergence with proximal terms for N = 3 if f_3 is strongly convex.

A lot of follow-up work is going on...

Variants: transform multi-block to two-block

What if one does not want to impose additional conditions? There are many variants of multi-block ADMM with guaranteed convergence, but they usually perform worse than the original multi-block ADMM, although the latter is not theoretically guaranteed.

One variant is the following (Wang, Hong, Ma and Luo (2013)): first transform the original problem

    \min\ f_1(x_1) + f_2(x_2) + \dots + f_N(x_N) \quad \text{s.t.}\quad A_1x_1 + A_2x_2 + \dots + A_Nx_N = b

into

    \min\ f_1(x_1) + f_2(x_2) + \dots + f_N(x_N)
    \text{s.t.}\ A_ix_i - b/N = y_i,\ i = 1,\dots,N;\quad y_1 + y_2 + \dots + y_N = 0.

Then apply two-block ADMM to the transformed problem, with augmented Lagrangian function

    L_t(x, y; \lambda) = \sum_{i=1}^N f_i(x_i) - \sum_{i=1}^N\langle\lambda_i,\ A_ix_i - b/N - y_i\rangle + \sum_{i=1}^N\tfrac{t}{2}\|A_ix_i - b/N - y_i\|^2.

- The x-subproblems are separable.
- The y-subproblem is an easy projection:

    \min\ \tfrac12\|y - z\|^2 \quad \text{s.t.}\quad y_1 + \dots + y_N = 0.

Its Lagrangian function is

    L(y, \mu) = \sum_{i=1}^N \tfrac12\|y_i - z_i\|^2 - \langle\mu,\ y_1 + \dots + y_N\rangle,

and the KKT conditions are

    y_i - z_i - \mu = 0,\quad y_1 + \dots + y_N = 0.

So we get

    \mu = -\tfrac{1}{N}\sum_{i=1}^N z_i \quad\text{and}\quad y_i = z_i - \tfrac{1}{N}\sum_{i=1}^N z_i.

This variant is theoretically guaranteed to converge.
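In code, the y-subproblem of this variant is just mean subtraction; a one-line NumPy sketch (the name is mine):

```python
import numpy as np

def project_zero_sum(Z):
    """Project rows z_1, ..., z_N of Z onto {y : y_1 + ... + y_N = 0} (subtract the mean)."""
    return Z - Z.mean(axis=0, keepdims=True)
```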

Gradient-based ADMM

    \min\ f(x) + g(y) \quad \text{s.t.}\quad Ax + By = b

What if f has an easy proximal mapping but g does not? Assume g is smooth.

ADMM:

    x^{k+1} = \mathrm{argmin}_x\ L_t(x, y^k; \lambda^k)
    y^{k+1} = \mathrm{argmin}_y\ L_t(x^{k+1}, y; \lambda^k)
    \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b)

Gradient-based ADMM: take a gradient step for the y-subproblem:

    x^{k+1} = \mathrm{argmin}_x\ L_t(x, y^k; \lambda^k)
    y^{k+1} = y^k - t\,\nabla_y L_t(x^{k+1}, y^k; \lambda^k)
    \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b)

Sparse logistic regression:

    \min\ \|x\|_1 + l(x, c),

where

    l(x, c) = \tfrac{1}{m}\sum_{i=1}^m \log(1 + \exp(-b_i(x^Ta_i + c))).

Reformulation of sparse logistic regression:

    \min\ \|x\|_1 + l(y, c) \quad \text{s.t.}\quad x - y = 0;

take a gradient step for the (y, c)-subproblem.

Fused logistic regression:

    \min\ \|x\|_1 + \sum_{i=2}^n |x_i - x_{i-1}| + l(x, c)

Reformulation of fused logistic regression (B is the matrix of first-order differences, so (By)_i = y_i - y_{i-1}):

    \min\ \|x\|_1 + \|w\|_1 + l(y, c) \quad \text{s.t.}\quad w = By,\ x = y.

Augmented Lagrangian function:

    L_t(x,w,y,c;\lambda_1,\lambda_2) = \|x\|_1 + \|w\|_1 + l(y,c) - \langle\lambda_1, w - By\rangle + \tfrac{t}{2}\|w - By\|^2 - \langle\lambda_2, x - y\rangle + \tfrac{t}{2}\|x - y\|^2

Exactly solve the (x, w)-subproblem; take a gradient step for the (y, c)-subproblem.

One more example on a nonconvex problem

Semidefinite programming:

    \min_{X\in\mathcal{S}^n}\ \langle C, X\rangle
    \text{s.t.}\ \langle A^{(i)}, X\rangle = b_i,\ i = 1,\dots,m;\quad X \succeq 0

Any positive semidefinite matrix X can be written as X = VV^T, where V ∈ R^{n×n}. Reformulation of the SDP:

    \min_{V\in\mathbb{R}^{n\times n}}\ \langle C, VV^T\rangle \quad \text{s.t.}\quad \langle A^{(i)}, VV^T\rangle = b_i,\ i = 1,\dots,m.

This is a nonconvex equality-constrained problem: one can use the augmented Lagrangian method as long as there is a good way to minimize the augmented Lagrangian function

    L_t(V; \lambda) = \langle C, VV^T\rangle - \langle\lambda,\ \mathcal{A}(VV^T) - b\rangle + \tfrac{t}{2}\|\mathcal{A}(VV^T) - b\|^2.

The augmented Lagrangian method:

    V^{k+1} := \mathrm{argmin}_V\ L_t(V, \lambda^k)
    \lambda^{k+1} := \lambda^k - t(\mathcal{A}(V^{k+1}(V^{k+1})^T) - b)

Two-block reformulation (X = UV^T and U = V):

    \min_{U,V\in\mathbb{R}^{n\times n}}\ \langle C, UV^T\rangle
    \text{s.t.}\ \langle A^{(i)}, UV^T\rangle = b_i,\ i = 1,\dots,m;\quad U - V = 0

Augmented Lagrangian function:

    L_t(U,V;\lambda,\Lambda) = \langle C, UV^T\rangle - \langle\lambda,\ \mathcal{A}(UV^T) - b\rangle + \tfrac{t}{2}\|\mathcal{A}(UV^T) - b\|^2 - \langle\Lambda, U - V\rangle + \tfrac{t}{2}\|U - V\|_F^2

ADMM:

    U^{k+1} := \mathrm{argmin}_U\ L_t(U, V^k; \lambda^k, \Lambda^k)
    V^{k+1} := \mathrm{argmin}_V\ L_t(U^{k+1}, V; \lambda^k, \Lambda^k)
    \lambda^{k+1} := \lambda^k - t(\mathcal{A}(U^{k+1}(V^{k+1})^T) - b)
    \Lambda^{k+1} := \Lambda^k - t(U^{k+1} - V^{k+1})

Lots of recent developments of ADMM:
- sufficient conditions for multi-block ADMM for convex problems
- convergence analysis of ADMM for nonconvex problems
- stochastic ADMM
- online ADMM
- ...

Relation with operator-splitting methods

Operator-splitting methods solve the inclusion problem for monotone operators:

    find u such that 0 \in S(u) + T(u),

where S, T : R^n → R^n are maximal monotone operators.
- T is a monotone operator if (u - v)^T(T(u) - T(v)) \ge 0 for all u, v.
- T is called maximal monotone if there is no monotone operator that properly contains it.

Douglas-Rachford operator splitting method

    find u such that 0 \in S(u) + T(u)

Douglas-Rachford operator splitting method:

    v^{k+1} = J_S^\tau(2J_T^\tau - I)v^k + (I - J_T^\tau)v^k
    u^{k+1} = J_T^\tau v^{k+1}

J_T^\tau = (I + \tau T)^{-1} is called the resolvent of the operator T.

Example:

    \min\ f(x) + g(x).

The optimality condition is

    0 \in \partial f(x) + \partial g(x),

so S = \partial f and T = \partial g.

Now

    y = J_S^\tau(x) = (I + \tau S)^{-1}(x) = (I + \tau\partial f)^{-1}(x)

means that

    x \in y + \tau\partial f(y).

This is the optimality condition of

    \min_y\ \tau f(y) + \tfrac12\|y - x\|_2^2,

i.e., the proximal mapping of f.
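A compact sketch of the Douglas-Rachford iteration above, with the two resolvents passed in as proximal-mapping callables (the interface and names are mine):

```python
def douglas_rachford(prox_f, prox_g, v0, tau=1.0, iters=300):
    """Douglas-Rachford splitting for min f(x) + g(x), with S = df and T = dg.

    prox_f(x, tau) must return argmin_y tau*f(y) + 0.5*||y - x||^2 (the resolvent J_S^tau);
    prox_g plays the role of J_T^tau.
    """
    v = v0
    for _ in range(iters):
        w = prox_g(v, tau)                     # J_T^tau v^k
        v = prox_f(2 * w - v, tau) + (v - w)   # v^{k+1} = J_S^tau(2 J_T^tau - I)v^k + (I - J_T^tau)v^k
    return prox_g(v, tau)                      # u = J_T^tau v
```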

Separable convex minimization

Primal problem:

    \min\ f(x) + g(y) \quad \text{s.t.}\quad Ax + By = b

Dual problem:

    \min_\lambda\ f^*(A^T\lambda) + g^*(B^T\lambda) - b^T\lambda

Optimality condition of the dual problem:

    find \lambda such that 0 \in A\,\partial f^*(A^T\lambda) + B\,\partial g^*(B^T\lambda) - b.

Define S(\cdot) = A\,\partial f^*(A^T\cdot) and T(\cdot) = B\,\partial g^*(B^T\cdot) - b. Applying the Douglas-Rachford splitting method to

    find \lambda such that 0 \in S(\lambda) + T(\lambda)

is equivalent to applying ADMM to the primal problem.

Peaceman-Rachford operator splitting method

    find u such that 0 \in S(u) + T(u)

Peaceman-Rachford operator splitting method:

    v^{k+1} = (2J_S^\tau - I)(2J_T^\tau - I)v^k
    u^{k+1} = J_T^\tau v^{k+1}

If applied to the dual problem of

    \min\ f(x) + g(y) \quad \text{s.t.}\quad Ax + By = b,

it is equivalent to the following algorithm (symmetric ADMM):

    x^{k+1} = \mathrm{argmin}_x\ L_t(x, y^k; \lambda^k)
    \lambda^{k+1/2} = \lambda^k - t(Ax^{k+1} + By^k - b)
    y^{k+1} = \mathrm{argmin}_y\ L_t(x^{k+1}, y; \lambda^{k+1/2})
    \lambda^{k+1} = \lambda^{k+1/2} - t(Ax^{k+1} + By^{k+1} - b)

Lots of recent developments

Operator splitting methods have broader applications. Recent research questions:
- three (or more) operators?
- other operator-splitting schemes?
- sufficient conditions?
- convergence rates?
- ...
