Shiqian Ma, MAT-258A: Numerical Optimization. Chapter 9: Alternating Direction Method of Multipliers
Separable convex optimization

$$\min\; f(x) + g(y) \quad\text{s.t.}\quad Ax + By = b$$

A special case is $\min\, f(x) + g(x)$, because it is equivalent (by variable splitting) to

$$\min\; f(x) + g(y) \quad\text{s.t.}\quad x - y = 0$$

- both $f$ and $g$ are closed and convex
- both $f$ and $g$ have special structures: easy proximal mappings
- possibly both $f$ and $g$ are nonsmooth, so the proximal gradient method cannot be used
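The "easy proximal mapping" property can be made concrete with the most common example (an illustrative sketch, not part of the original slides): the proximal mapping of $\rho\|\cdot\|_1$ is componentwise soft-thresholding.

```python
import numpy as np

def prox_l1(z, alpha):
    """Proximal mapping of alpha * ||.||_1, i.e. the minimizer of
    alpha * ||x||_1 + (1/2) * ||x - z||_2^2: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

z = np.array([3.0, -0.5, 1.2, 0.0])
x = prox_l1(z, 1.0)  # shrinks each entry toward zero by 1 and zeroes small entries
```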
Example: Robust PCA

$$\min_{X\in\mathbb{R}^{m\times n}}\; \|X\|_* + \rho\|M - X\|_1 \qquad\text{or}\qquad \min_{X,Y\in\mathbb{R}^{m\times n}}\; \|X\|_* + \rho\|Y\|_1 \quad\text{s.t.}\quad X + Y = M$$

Can we use ALM to solve this? Augmented Lagrangian function:
$$L_t(X,Y;\Lambda) = \|X\|_* + \rho\|Y\|_1 - \langle\Lambda, X+Y-M\rangle + \frac{t}{2}\|X+Y-M\|_F^2$$

ALM:
$$\begin{cases} (X^{k+1}, Y^{k+1}) = \operatorname{argmin}_{X,Y} L_t(X,Y;\Lambda^k)\\ \Lambda^{k+1} = \Lambda^k - t(X^{k+1}+Y^{k+1}-M) \end{cases}$$

Any disadvantage?
Surveillance video background extraction (figures omitted)
Example: Sparse Inverse Covariance Selection

$$\min_X\; -\log\det(X) + \langle\Sigma, X\rangle + \rho\|X\|_1$$

Can we use PGM to solve this? Proximal gradient method (note that $g(X) = -\log\det(X) + \langle\Sigma, X\rangle$ is smooth on its domain):
$$X^{k+1} := \operatorname{argmin}_X\; \tau\rho\|X\|_1 + \frac12\|X - (X^k - \tau\nabla g(X^k))\|_F^2$$

Any disadvantage?
Example: Sparse Inverse Covariance Selection

$$\min_X\; -\log\det(X) + \langle\Sigma, X\rangle + \rho\|X\|_1$$

is equivalent to (by variable splitting)
$$\min_{X,Y}\; -\log\det(X) + \langle\Sigma, X\rangle + \rho\|Y\|_1 \quad\text{s.t.}\quad X - Y = 0$$

Can we use ALM to solve this? Augmented Lagrangian function:
$$L_t(X,Y;\Lambda) = -\log\det(X) + \langle\Sigma,X\rangle + \rho\|Y\|_1 - \langle\Lambda, X-Y\rangle + \frac{t}{2}\|X-Y\|_F^2$$

ALM:
$$\begin{cases} (X^{k+1}, Y^{k+1}) = \operatorname{argmin}_{X,Y} L_t(X,Y;\Lambda^k)\\ \Lambda^{k+1} = \Lambda^k - t(X^{k+1}-Y^{k+1}) \end{cases}$$

Any disadvantage?
Alternating Direction Method of Multipliers (ADMM)

Robust PCA: $\min_{X,Y\in\mathbb{R}^{m\times n}}\; \|X\|_* + \rho\|Y\|_1$, s.t. $X + Y = M$.

Augmented Lagrangian function:
$$L_t(X,Y;\Lambda) = \|X\|_* + \rho\|Y\|_1 - \langle\Lambda, X+Y-M\rangle + \frac{t}{2}\|X+Y-M\|_F^2$$

ALM:
$$\begin{cases} (X^{k+1}, Y^{k+1}) = \operatorname{argmin}_{X,Y} L_t(X,Y;\Lambda^k)\\ \Lambda^{k+1} = \Lambda^k - t(X^{k+1}+Y^{k+1}-M) \end{cases}$$

Alternating Direction Method of Multipliers:
$$\begin{cases} X^{k+1} = \operatorname{argmin}_X L_t(X, Y^k; \Lambda^k)\\ Y^{k+1} = \operatorname{argmin}_Y L_t(X^{k+1}, Y; \Lambda^k)\\ \Lambda^{k+1} = \Lambda^k - t(X^{k+1}+Y^{k+1}-M) \end{cases}$$
The $X$-subproblem is
$$\min_X\; \|X\|_* + \rho\|Y^k\|_1 - \langle\Lambda^k, X + Y^k - M\rangle + \frac{t}{2}\|X + Y^k - M\|_F^2,$$
equivalent to
$$\min_X\; \|X\|_* + \frac{t}{2}\|X + Y^k - M - \Lambda^k/t\|_F^2,$$
the proximal mapping of $\|X\|_*$.

The $Y$-subproblem is
$$\min_Y\; \|X^{k+1}\|_* + \rho\|Y\|_1 - \langle\Lambda^k, X^{k+1} + Y - M\rangle + \frac{t}{2}\|X^{k+1} + Y - M\|_F^2,$$
equivalent to
$$\min_Y\; \rho\|Y\|_1 + \frac{t}{2}\|X^{k+1} + Y - M - \Lambda^k/t\|_F^2,$$
the proximal mapping of $\|Y\|_1$.
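The proximal mapping of the nuclear norm has a closed form known as singular value thresholding (SVT): soft-threshold the singular values. A minimal numpy sketch (illustrative, not code from the course):

```python
import numpy as np

def svt(Z, alpha):
    """Prox of alpha * ||.||_*: minimizer of alpha*||X||_* + (1/2)*||X - Z||_F^2,
    obtained by soft-thresholding the singular values of Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - alpha, 0.0)) @ Vt

# In the X-update one would call, e.g., X = svt(M - Y + Lam / t, 1.0 / t),
# where M, Y, Lam, t are the iterates of the ADMM above.
```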
Alternating Direction Method of Multipliers

Sparse inverse covariance selection:
$$\min_{X,Y}\; -\log\det(X) + \langle\Sigma, X\rangle + \rho\|Y\|_1 \quad\text{s.t.}\quad X - Y = 0$$

Augmented Lagrangian function:
$$L_t(X,Y;\Lambda) = -\log\det(X) + \langle\Sigma,X\rangle + \rho\|Y\|_1 - \langle\Lambda, X-Y\rangle + \frac{t}{2}\|X-Y\|_F^2$$

ALM:
$$\begin{cases} (X^{k+1}, Y^{k+1}) = \operatorname{argmin}_{X,Y} L_t(X,Y;\Lambda^k)\\ \Lambda^{k+1} = \Lambda^k - t(X^{k+1}-Y^{k+1}) \end{cases}$$

ADMM:
$$\begin{cases} X^{k+1} = \operatorname{argmin}_X L_t(X, Y^k; \Lambda^k)\\ Y^{k+1} = \operatorname{argmin}_Y L_t(X^{k+1}, Y; \Lambda^k)\\ \Lambda^{k+1} = \Lambda^k - t(X^{k+1}-Y^{k+1}) \end{cases}$$
The $X$-subproblem is
$$\min_X\; -\log\det(X) + \langle\Sigma, X\rangle + \rho\|Y^k\|_1 - \langle\Lambda^k, X - Y^k\rangle + \frac{t}{2}\|X - Y^k\|_F^2,$$
equivalent to
$$\min_X\; -\log\det(X) + \frac{t}{2}\|X - Y^k + (\Sigma - \Lambda^k)/t\|_F^2,$$
the proximal mapping of $-\log\det(X)$.

The $Y$-subproblem is
$$\min_Y\; -\log\det(X^{k+1}) + \langle\Sigma, X^{k+1}\rangle + \rho\|Y\|_1 - \langle\Lambda^k, X^{k+1} - Y\rangle + \frac{t}{2}\|X^{k+1} - Y\|_F^2,$$
equivalent to
$$\min_Y\; \rho\|Y\|_1 + \frac{t}{2}\|X^{k+1} - Y - \Lambda^k/t\|_F^2,$$
the proximal mapping of $\|Y\|_1$.
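The proximal mapping of $-\log\det$, i.e. $\min_X -\log\det(X) + \frac{t}{2}\|X - W\|_F^2$ for symmetric $W$, has a closed form via an eigenvalue decomposition: the optimality condition $-X^{-1} + t(X - W) = 0$ decouples along the eigenvectors of $W$, and each eigenvalue of $X$ solves a scalar quadratic. A sketch (illustrative, not code from the course):

```python
import numpy as np

def prox_neg_logdet(W, t):
    """Solve argmin_X -log det(X) + (t/2)*||X - W||_F^2 for symmetric W.

    Diagonalizing W = Q diag(w) Q^T, each eigenvalue gamma of X solves
    t*gamma^2 - t*w*gamma - 1 = 0, whose positive root is
    gamma = (w + sqrt(w^2 + 4/t)) / 2, so X is positive definite."""
    w, Q = np.linalg.eigh(W)
    gamma = (w + np.sqrt(w**2 + 4.0 / t)) / 2.0
    return Q @ np.diag(gamma) @ Q.T
```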
General form of ADMM

Convex minimization with two-block separable structure:
$$\min\; f(x) + g(y) \quad\text{s.t.}\quad Ax + By = b$$

Augmented Lagrangian function:
$$L_t(x,y;\lambda) = f(x) + g(y) - \langle\lambda, Ax + By - b\rangle + \frac{t}{2}\|Ax + By - b\|_2^2$$

ADMM:
$$\begin{cases} x^{k+1} = \operatorname{argmin}_x L_t(x, y^k; \lambda^k)\\ y^{k+1} = \operatorname{argmin}_y L_t(x^{k+1}, y; \lambda^k)\\ \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b) \end{cases}$$

The two subproblems:
$$x^{k+1} = \operatorname{argmin}_x\; f(x) + \frac{t}{2}\|Ax + By^k - b - \lambda^k/t\|_2^2$$
$$y^{k+1} = \operatorname{argmin}_y\; g(y) + \frac{t}{2}\|Ax^{k+1} + By - b - \lambda^k/t\|_2^2$$
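A compact implementation of this iteration, instantiated on a lasso-style splitting $\min \frac12\|Cx - d\|_2^2 + \rho\|y\|_1$ s.t. $x - y = 0$ (so $A = I$, $B = -I$): an illustrative sketch under these assumptions, not code from the course.

```python
import numpy as np

def soft(z, a):
    return np.sign(z) * np.maximum(np.abs(z) - a, 0.0)

def admm_lasso(C, d, rho, t=1.0, iters=500):
    """Two-block ADMM for min (1/2)||Cx - d||^2 + rho*||y||_1  s.t.  x - y = 0."""
    n = C.shape[1]
    y, lam = np.zeros(n), np.zeros(n)
    CtC_tI = C.T @ C + t * np.eye(n)   # matrix for the x-subproblem's linear solve
    Ctd = C.T @ d
    for _ in range(iters):
        x = np.linalg.solve(CtC_tI, Ctd + t * y + lam)  # x-subproblem
        y = soft(x - lam / t, rho / t)                  # y-subproblem (l1 prox)
        lam = lam - t * (x - y)                         # multiplier update
    return x, y

rng = np.random.default_rng(0)
C = rng.standard_normal((20, 5))
d = rng.standard_normal(20)
x, y = admm_lasso(C, d, rho=0.1)
```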
Variable splitting and reformulation

For many applications, one can apply variable splitting to reformulate the problem as
$$\min\; f(x) + g(y) \quad\text{s.t.}\quad Ax + By = b,$$
a sum of two structured functions (one could be an indicator function), such that both $f$ and $g$ have easy proximal mappings; then one can apply ADMM.

For example,
$$\min\; f(x) + g(Ax - b)$$
is equivalent to
$$\min\; f(x) + g(y) \quad\text{s.t.}\quad Ax - y = b$$
Compressed sensing with noise

$$\min\; \|x\|_1 \quad\text{s.t.}\quad \|Ax - b\|_2 \le \sigma$$

can be reformulated as
$$\min\; \|x\|_1 \quad\text{s.t.}\quad Ax - y = b,\;\; \|y\|_2 \le \sigma$$
or
$$\min\; \|x\|_1 + I_{\{\|y\|_2 \le \sigma\}}(y) \quad\text{s.t.}\quad Ax - y = b$$

Augmented Lagrangian function:
$$L_t(x,y;\lambda) = \|x\|_1 + I_{\{\|y\|_2\le\sigma\}}(y) - \langle\lambda, Ax - y - b\rangle + \frac{t}{2}\|Ax - y - b\|_2^2$$
Apply ADMM:
$$\begin{cases} x^{k+1} = \operatorname{argmin}_x L_t(x, y^k; \lambda^k)\\ y^{k+1} = \operatorname{argmin}_y L_t(x^{k+1}, y; \lambda^k)\\ \lambda^{k+1} = \lambda^k - t(Ax^{k+1} - y^{k+1} - b) \end{cases}$$

$x$-subproblem:
$$x^{k+1} = \operatorname{argmin}_x\; \|x\|_1 + \frac{t}{2}\|Ax - y^k - b - \lambda^k/t\|_2^2$$

$y$-subproblem:
$$y^{k+1} = \operatorname{argmin}_y\; I_{\{\|y\|_2\le\sigma\}}(y) + \frac{t}{2}\|Ax^{k+1} - y - b - \lambda^k/t\|_2^2;$$
this is the projection onto the $\ell_2$-ball.
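The $y$-subproblem amounts to projecting $z = Ax^{k+1} - b - \lambda^k/t$ onto the $\ell_2$-ball of radius $\sigma$, which is a one-line rescaling (sketch, not from the slides):

```python
import numpy as np

def proj_l2_ball(z, sigma):
    """Euclidean projection of z onto {y : ||y||_2 <= sigma}: rescale if outside."""
    nz = np.linalg.norm(z)
    return z if nz <= sigma else (sigma / nz) * z

# In the y-update one would project z = A @ x_new - b - lam / t.
```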
Portfolio Selection

- $r_i$: random variable, the rate of return for stock $i$
- $x_i$: the relative amount invested in stock $i$
- Return: $r = r_1 x_1 + r_2 x_2 + \cdots + r_n x_n$
- Expected return: $R = E(r) = \sum_i E(r_i) x_i = \sum_i \mu_i x_i$
- Risk: $V = \mathrm{Var}(r) = \sum_{ij} \sigma_{ij} x_i x_j = x^\top \Sigma x$

$$\min\; \tfrac12 x^\top\Sigma x \quad\text{s.t.}\quad \textstyle\sum_i \mu_i x_i = r_0,\;\; \sum_i x_i = 1,\;\; x_i \ge 0,\; i = 1,\dots,n$$
Can be reformulated as (define the set $C$ as the probability simplex)
$$\min\; \tfrac12 x^\top\Sigma x \quad\text{s.t.}\quad \mu^\top x = r_0,\;\; x - y = 0,\;\; y \in C$$

Augmented Lagrangian function:
$$L_t(x,y;\lambda_1,\lambda_2) = \tfrac12 x^\top\Sigma x + I_{\{y\in C\}}(y) - \langle\lambda_1, \mu^\top x - r_0\rangle - \langle\lambda_2, x - y\rangle + \frac{t}{2}\|\mu^\top x - r_0\|_2^2 + \frac{t}{2}\|x - y\|_2^2$$

ADMM:
$$\begin{cases} x^{k+1} = \operatorname{argmin}_x L_t(x, y^k; \lambda_1^k, \lambda_2^k)\\ y^{k+1} = \operatorname{argmin}_y L_t(x^{k+1}, y; \lambda_1^k, \lambda_2^k)\\ \lambda_1^{k+1} = \lambda_1^k - t(\mu^\top x^{k+1} - r_0)\\ \lambda_2^{k+1} = \lambda_2^k - t(x^{k+1} - y^{k+1}) \end{cases}$$
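The $y$-subproblem here is the Euclidean projection onto the probability simplex $C$; a standard sort-and-threshold routine computes it exactly (an illustrative sketch, not from the slides):

```python
import numpy as np

def proj_simplex(z):
    """Project z onto the probability simplex {y : y >= 0, sum(y) = 1}.

    Sort-and-threshold: find the largest k such that the top-k entries
    remain positive after a common shift tau, then clip at zero."""
    u = np.sort(z)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(z) + 1)
    k = ks[u - (css - 1.0) / ks > 0][-1]
    tau = (css[k - 1] - 1.0) / k
    return np.maximum(z - tau, 0.0)
```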
Total variation image deblurring

- Use $u \in \mathbb{R}^{n^2}$ to denote an $n\times n$ gray-scale image.
- Use $K \in \mathbb{R}^{n^2\times n^2}$ to represent a blurring operator.
- An observation of the image is obtained by $b = Ku + \epsilon$ ($\epsilon$ is noise).
- So one wants to minimize $\|Ku - b\|_2^2$.

A widely used technique in image processing is to add a Total Variation term to preserve sharp edges:
$$TV(u) = \sum_{i,j=1}^{n} \sqrt{(u_{i+1,j} - u_{ij})^2 + (u_{i,j+1} - u_{ij})^2}$$
By slight abuse of notation (now $u$ is an $n^2$-dimensional vector), TV can also be written as
$$TV(u) = \sum_{i=1}^{n^2} \|D_i u\|_2.$$

The TV image deblurring model is
$$\min_u\; \sum_{i=1}^{n^2} \|D_i u\|_2 + \frac{\rho}{2}\|Ku - b\|_2^2.$$

By variable splitting, reformulate it as
$$\min_{u,w}\; \sum_{i=1}^{n^2} \|w_i\|_2 + \frac{\rho}{2}\|Ku - b\|_2^2 \quad\text{s.t.}\quad D_i u - w_i = 0,\; i = 1,\dots,n^2$$
Augmented Lagrangian function:
$$L_t(u,w;\lambda) = \sum_{i=1}^{n^2}\|w_i\|_2 + \frac{\rho}{2}\|Ku-b\|_2^2 - \sum_{i=1}^{n^2}\langle\lambda_i, D_i u - w_i\rangle + \sum_{i=1}^{n^2}\frac{t}{2}\|D_i u - w_i\|_2^2$$

ADMM:
$$\begin{cases} u^{k+1} = \operatorname{argmin}_u L_t(u, w^k; \lambda^k)\\ w^{k+1} = \operatorname{argmin}_w L_t(u^{k+1}, w; \lambda^k)\\ \lambda_i^{k+1} = \lambda_i^k - t(D_i u^{k+1} - w_i^{k+1}),\; i = 1,\dots,n^2 \end{cases}$$

The $w$-subproblem is separable in the $w_i$.
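Separability of the $w$-subproblem means each $w_i$ is an independent proximal mapping of $\|\cdot\|_2$, i.e. a vector ("group") shrinkage applied pixel by pixel (sketch, not from the slides):

```python
import numpy as np

def shrink2(z, alpha):
    """Proximal mapping of alpha * ||.||_2: the minimizer of
    alpha*||w||_2 + (1/2)*||w - z||_2^2, shrinking the whole vector toward 0."""
    nz = np.linalg.norm(z)
    if nz <= alpha:
        return np.zeros_like(z)
    return (1.0 - alpha / nz) * z

# w-update for pixel i: w_i = shrink2(D_i @ u_new - lam_i / t, 1.0 / t)
```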
TV+L1 model for image reconstruction

The image $u$ is sparse under a wavelet transform $\Psi$, i.e., $\Psi u$ is sparse:
$$\min_u\; \sum_{i=1}^{n^2}\|D_i u\|_2 + \gamma\|\Psi u\|_1 + \frac{\rho}{2}\|Ku - b\|_2^2$$

Reformulate as
$$\min\; \sum_{i=1}^{n^2}\|w_i\|_2 + \gamma\|v\|_1 + \frac{\rho}{2}\|Ku - b\|_2^2 \quad\text{s.t.}\quad D_i u - w_i = 0,\; i=1,\dots,n^2,\quad \Psi u - v = 0$$

Augmented Lagrangian function:
$$L_t(u,w,v;\lambda,\mu) = \sum_{i=1}^{n^2}\|w_i\|_2 + \gamma\|v\|_1 + \frac{\rho}{2}\|Ku-b\|_2^2 - \sum_{i=1}^{n^2}\langle\lambda_i, D_i u - w_i\rangle + \sum_{i=1}^{n^2}\frac{t}{2}\|D_i u - w_i\|_2^2 - \langle\mu, \Psi u - v\rangle + \frac{t}{2}\|\Psi u - v\|_2^2$$
ADMM:
$$\begin{cases} u^{k+1} = \operatorname{argmin}_u L_t(u, w^k, v^k; \lambda^k, \mu^k)\\ (w^{k+1}, v^{k+1}) = \operatorname{argmin}_{w,v} L_t(u^{k+1}, w, v; \lambda^k, \mu^k)\\ \lambda_i^{k+1} = \lambda_i^k - t(D_i u^{k+1} - w_i^{k+1}),\; i=1,\dots,n^2\\ \mu^{k+1} = \mu^k - t(\Psi u^{k+1} - v^{k+1}) \end{cases}$$

Note that the subproblem for $(w, v)$ is separable in $w$ and $v$.
Semidefinite Programming

The standard SDP:
$$\min_{X\in S^n}\; \langle C, X\rangle \quad\text{s.t.}\quad \langle A^{(i)}, X\rangle = b_i,\; i=1,\dots,m,\quad X \succeq 0,$$
where $C, A^{(i)} \in S^n$, $i=1,\dots,m$.

The dual problem:
$$\min_{y\in\mathbb{R}^m,\, S\in S^n}\; -b^\top y \quad\text{s.t.}\quad A^*(y) + S = C,\quad S \succeq 0$$

Augmented Lagrangian function ($X$ is the Lagrange multiplier):
$$L_t(y, S; X) = -b^\top y + I_{\{S\succeq 0\}}(S) - \langle X, A^*(y) + S - C\rangle + \frac{t}{2}\|A^*(y) + S - C\|_F^2$$
ADMM:
$$\begin{cases} y^{k+1} = \operatorname{argmin}_y L_t(y, S^k; X^k)\\ S^{k+1} = \operatorname{argmin}_S L_t(y^{k+1}, S; X^k)\\ X^{k+1} = X^k - t(A^*(y^{k+1}) + S^{k+1} - C) \end{cases}$$
Sparse covariance matrix estimation

$$\min_{X\in S^n}\; \frac12\|X - \Sigma\|_F^2 + \rho\|X\|_1 \quad\text{s.t.}\quad X \succeq 0,$$
where $\Sigma$ is the sample covariance matrix, which may be neither sparse nor positive semidefinite.

Reformulation (by variable splitting):
$$\min_{X,Y\in S^n}\; \frac12\|X-\Sigma\|_F^2 + \rho\|X\|_1 \quad\text{s.t.}\quad X - Y = 0,\quad Y \succeq 0$$

Augmented Lagrangian function:
$$L_t(X,Y;\Lambda) = \frac12\|X-\Sigma\|_F^2 + \rho\|X\|_1 + I_{\{Y\succeq 0\}}(Y) - \langle\Lambda, X - Y\rangle + \frac{t}{2}\|X - Y\|_F^2$$
ADMM:
$$\begin{cases} X^{k+1} = \operatorname{argmin}_X L_t(X, Y^k; \Lambda^k)\\ Y^{k+1} = \operatorname{argmin}_Y L_t(X^{k+1}, Y; \Lambda^k)\\ \Lambda^{k+1} = \Lambda^k - t(X^{k+1} - Y^{k+1}) \end{cases}$$
Nonconvex model: optimization on the sphere

$$\min_x\; f(x) + \|x\|_1 \quad\text{s.t.}\quad \|x\|_2 = 1,$$
where $f(x)$ is a differentiable function.

Reformulation:
$$\min_{x,y}\; f(x) + \|x\|_1 \quad\text{s.t.}\quad x - y = 0,\quad \|y\|_2 = 1$$

Augmented Lagrangian function:
$$L_t(x,y;\lambda) = f(x) + \|x\|_1 + I_{\{\|y\|_2=1\}}(y) - \langle\lambda, x - y\rangle + \frac{t}{2}\|x - y\|_2^2$$

ADMM:
$$\begin{cases} x^{k+1} = \operatorname{argmin}_x L_t(x, y^k; \lambda^k)\\ y^{k+1} = \operatorname{argmin}_y L_t(x^{k+1}, y; \lambda^k)\\ \lambda^{k+1} = \lambda^k - t(x^{k+1} - y^{k+1}) \end{cases}$$
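Even though the sphere constraint is nonconvex, the $y$-subproblem has a closed form: it is the projection of $z = x^{k+1} - \lambda^k/t$ onto the unit sphere, i.e. a normalization (sketch, not from the slides):

```python
import numpy as np

def proj_sphere(z):
    """Project z onto the unit sphere {y : ||y||_2 = 1} (assumes z != 0)."""
    return z / np.linalg.norm(z)

# y-update: y = proj_sphere(x_new - lam / t)
```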
Linearized ADMM

The standard form of the problem:
$$\min_{x,y}\; f(x) + g(y) \quad\text{s.t.}\quad Ax + By = b$$

Augmented Lagrangian function:
$$L_t(x,y;\lambda) = f(x) + g(y) - \langle\lambda, Ax + By - b\rangle + \frac{t}{2}\|Ax + By - b\|_2^2$$

ADMM:
$$\begin{cases} x^{k+1} = \operatorname{argmin}_x\; f(x) + \frac{t}{2}\|Ax + By^k - b - \lambda^k/t\|_2^2\\ y^{k+1} = \operatorname{argmin}_y\; g(y) + \frac{t}{2}\|Ax^{k+1} + By - b - \lambda^k/t\|_2^2\\ \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b) \end{cases}$$

The two subproblems are not easy if $A$ and $B$ are not identity matrices.
One could use the proximal gradient method (PGM) to solve them. For example, the $x$-subproblem has the form
$$\min_x\; f(x) + h(x)$$
with $h$ smooth, and PGM iterates
$$x^{i+1} = \operatorname{argmin}_x\; f(x) + \frac{1}{2\tau}\|x - (x^i - \tau\nabla h(x^i))\|_2^2,$$
where $\tau < 1/L$ and $L$ is the Lipschitz constant of $\nabla h$.

But, in the end, it is just a subproblem; we do not want to solve it to very high accuracy. In fact, one iteration of PGM is enough. This leads to the linearized ADMM.
Linearized ADMM:
$$\begin{cases} x^{k+1} = \operatorname{argmin}_x\; f(x) + \frac{1}{2\tau_1}\|x - (x^k - \tau_1 t A^\top(Ax^k + By^k - b - \lambda^k/t))\|_2^2\\ y^{k+1} = \operatorname{argmin}_y\; g(y) + \frac{1}{2\tau_2}\|y - (y^k - \tau_2 t B^\top(Ax^{k+1} + By^k - b - \lambda^k/t))\|_2^2\\ \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b) \end{cases}$$
where $\tau_1 < 1/\lambda_{\max}(A^\top A)$ and $\tau_2 < 1/\lambda_{\max}(B^\top B)$.

Now the two subproblems are easy: they are the proximal mappings of $f$ and $g$.
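A sketch of the linearized iteration on $\min \|x\|_1 + \frac12\|y - d\|_2^2$ s.t. $Ax - y = 0$: only the $x$-update needs linearization here (since $B = -I$, the $y$-subproblem is solvable exactly), and a conservative step size $\tau_1 < 1/(t\,\lambda_{\max}(A^\top A))$ is used. Illustrative code under these assumptions, not from the course:

```python
import numpy as np

def soft(z, a):
    return np.sign(z) * np.maximum(np.abs(z) - a, 0.0)

def linearized_admm(A, d, t=1.0, iters=3000):
    """Linearized ADMM for min ||x||_1 + (1/2)||y - d||^2  s.t.  Ax - y = 0.
    The x-update takes one proximal-gradient step on the augmented Lagrangian,
    so only the prox of ||.||_1 is needed even though A is not the identity."""
    m, n = A.shape
    x, y, lam = np.zeros(n), np.zeros(m), np.zeros(m)
    tau1 = 0.9 / (t * np.linalg.norm(A, 2) ** 2)  # conservative step size
    for _ in range(iters):
        r = A @ x - y - lam / t
        x = soft(x - tau1 * t * (A.T @ r), tau1)   # prox of tau1 * ||.||_1
        y = (d + t * (A @ x) - lam) / (1.0 + t)    # exact y-update (B = -I)
        lam = lam - t * (A @ x - y)
    return x, y

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
d = rng.standard_normal(5)
x, y = linearized_admm(A, d)
```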
Global Convergence of ADMM

The problem:
$$\min_{x,y}\; f(x) + g(y) \quad\text{s.t.}\quad Ax + By = b$$

Lagrangian function:
$$L(x,y;\lambda) = f(x) + g(y) - \langle\lambda, Ax + By - b\rangle$$

Optimality conditions: $(x^*, y^*; \lambda^*)$ is optimal if
$$A^\top\lambda^* \in \partial f(x^*), \qquad B^\top\lambda^* \in \partial g(y^*), \qquad Ax^* + By^* = b$$
ADMM:
$$\begin{cases} x^{k+1} = \operatorname{argmin}_x\; f(x) + \frac{t}{2}\|Ax + By^k - b - \lambda^k/t\|_2^2\\ y^{k+1} = \operatorname{argmin}_y\; g(y) + \frac{t}{2}\|Ax^{k+1} + By - b - \lambda^k/t\|_2^2\\ \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b) \end{cases}$$

Theorem: If $A$ and $B$ are full column rank, ADMM globally converges to an optimal solution $(x^*, y^*; \lambda^*)$ for any $t > 0$ and any initial point $(y^0, \lambda^0)$.

Proof. The optimality conditions for the two subproblems are:
$$0 \in \partial f(x^{k+1}) + tA^\top(Ax^{k+1} + By^k - b - \lambda^k/t)$$
$$0 \in \partial g(y^{k+1}) + tB^\top(Ax^{k+1} + By^{k+1} - b - \lambda^k/t)$$
Using the updating formula for $\lambda^{k+1}$, we have
$$A^\top(\lambda^{k+1} - tB(y^k - y^{k+1})) \in \partial f(x^{k+1}) \tag{1}$$
$$B^\top\lambda^{k+1} \in \partial g(y^{k+1}) \tag{2}$$

Because $\partial f(\cdot)$ and $\partial g(\cdot)$ are monotone operators, we have
$$\langle x^{k+1} - x^*,\; A^\top(\lambda^{k+1} - \lambda^* - tB(y^k - y^{k+1}))\rangle \ge 0$$
$$\langle y^{k+1} - y^*,\; B^\top(\lambda^{k+1} - \lambda^*)\rangle \ge 0$$

Summing these two inequalities, we have
$$(x^{k+1}-x^*)^\top A^\top(\lambda^{k+1}-\lambda^*) - t(x^{k+1}-x^*)^\top A^\top B(y^k - y^{k+1}) + (y^{k+1}-y^*)^\top B^\top(\lambda^{k+1}-\lambda^*) \ge 0,$$
which is equivalent to
$$(\lambda^{k+1}-\lambda^*)^\top(Ax^{k+1}+By^{k+1}-b) - t(x^{k+1}-x^*)^\top A^\top B(y^k-y^{k+1}) \ge 0 \tag{$*$}$$
Note that by
$$Ax^{k+1} + By^{k+1} - b = (\lambda^k - \lambda^{k+1})/t, \qquad Ax^* + By^* - b = 0,$$
we get
$$A(x^{k+1} - x^*) = -B(y^{k+1} - y^*) + (\lambda^k - \lambda^{k+1})/t.$$
Substituting this into ($*$), we get
$$\frac{1}{t}(\lambda^{k+1}-\lambda^*)^\top(\lambda^k-\lambda^{k+1}) + t(By^{k+1}-By^*)^\top(By^k-By^{k+1}) \ge (\lambda^k-\lambda^{k+1})^\top(By^k-By^{k+1}).$$
Defining
$$u = \begin{pmatrix} y \\ \lambda \end{pmatrix}, \qquad H = \begin{pmatrix} tB^\top B & 0 \\ 0 & \frac{1}{t}I \end{pmatrix},$$
we get
$$\langle u^{k+1} - u^*,\; u^k - u^{k+1}\rangle_H \ge \langle \lambda^k - \lambda^{k+1},\; By^k - By^{k+1}\rangle$$
Because
$$B^\top\lambda^{k+1} \in \partial g(y^{k+1}), \qquad B^\top\lambda^k \in \partial g(y^k),$$
we have
$$\langle y^k - y^{k+1},\; B^\top\lambda^k - B^\top\lambda^{k+1}\rangle \ge 0.$$
Thus
$$\langle u^{k+1} - u^*,\; u^k - u^{k+1}\rangle_H \ge 0.$$
Because
$$\|u^{k+1}-u^*\|_H^2 = \|u^{k+1}-u^k\|_H^2 - 2\langle u^k - u^{k+1},\; u^k - u^*\rangle_H + \|u^k - u^*\|_H^2,$$
we have
$$\begin{aligned} \|u^k - u^*\|_H^2 - \|u^{k+1}-u^*\|_H^2 &= 2\langle u^k - u^{k+1},\; u^k - u^*\rangle_H - \|u^{k+1}-u^k\|_H^2 \\ &= 2\langle u^k - u^{k+1},\; (u^k - u^{k+1}) + (u^{k+1} - u^*)\rangle_H - \|u^{k+1}-u^k\|_H^2 \\ &= \|u^{k+1}-u^k\|_H^2 + 2\langle u^k - u^{k+1},\; u^{k+1}-u^*\rangle_H \\ &\ge \|u^{k+1}-u^k\|_H^2 \end{aligned} \tag{$**$}$$
From ($**$) we have the following conclusions:

(i) $\|u^k - u^{k+1}\|_H \to 0$;
(ii) $\{u^k\}$ lies in a compact region;
(iii) $\|u^k - u^*\|_H^2$ is monotonically non-increasing and thus converges.

From (i) we have $By^k - By^{k+1} \to 0$ and $\lambda^k - \lambda^{k+1} \to 0$. Then $Ax^k + By^k - b \to 0$ and $Ax^k - Ax^{k+1} \to 0$. Since $A$ and $B$ are full column rank, we have $x^k - x^{k+1} \to 0$ and $y^k - y^{k+1} \to 0$.

From (ii) we know $\{u^k\}$ has a subsequence $\{u^{k_j}\}$ that converges to $\hat u = (\hat y, \hat\lambda)$. Therefore $x^{k_j} \to \hat x$. So $(\hat x, \hat y, \hat\lambda)$ is a limit point of $\{(x^k, y^k, \lambda^k)\}$ and $A\hat x + B\hat y - b = 0$. From (1) and (2) we know that
$$0 \in \partial f(\hat x) - A^\top\hat\lambda, \qquad 0 \in \partial g(\hat y) - B^\top\hat\lambda.$$
Thus $(\hat x, \hat y, \hat\lambda)$ satisfies the KKT conditions and hence is an optimal solution. Therefore, we have shown that any limit point of $\{(x^k, y^k, \lambda^k)\}$ is an optimal solution.

To complete the proof, it remains to show that $\{(x^k, y^k, \lambda^k)\}$ has a unique limit point. Let $(\hat x_1, \hat y_1, \hat\lambda_1)$ and $(\hat x_2, \hat y_2, \hat\lambda_2)$ be any two limit points of $\{(x^k, y^k, \lambda^k)\}$. As we have shown, both are optimal solutions. Thus $u^*$ in ($**$) can be replaced by $\hat u_1 := (\hat y_1, \hat\lambda_1)$ and $\hat u_2 := (\hat y_2, \hat\lambda_2)$. This results in
$$\|u^{k+1} - \hat u_i\|_H^2 \le \|u^k - \hat u_i\|_H^2, \quad i = 1, 2,$$
and we thus get the existence of the limits
$$\lim_{k\to\infty} \|u^k - \hat u_i\|_H = \eta_i < +\infty, \quad i = 1, 2.$$
Now using the identity
$$\|u^k - \hat u_1\|_H^2 - \|u^k - \hat u_2\|_H^2 = -2\langle u^k,\; \hat u_1 - \hat u_2\rangle_H + \|\hat u_1\|_H^2 - \|\hat u_2\|_H^2$$
and passing to the limit along the two subsequences, we get
$$\eta_1^2 - \eta_2^2 = -2\langle \hat u_1,\; \hat u_1 - \hat u_2\rangle_H + \|\hat u_1\|_H^2 - \|\hat u_2\|_H^2 = -\|\hat u_1 - \hat u_2\|_H^2$$
and
$$\eta_1^2 - \eta_2^2 = -2\langle \hat u_2,\; \hat u_1 - \hat u_2\rangle_H + \|\hat u_1\|_H^2 - \|\hat u_2\|_H^2 = \|\hat u_1 - \hat u_2\|_H^2.$$
Thus we must have $\|\hat u_1 - \hat u_2\|_H^2 = 0$, and hence the limit point of $\{(x^k, y^k, \lambda^k)\}$ is unique. $\square$
Convergence of Linearized ADMM

Linearized ADMM:
$$\begin{cases} x^{k+1} = \operatorname{argmin}_x\; f(x) + \frac{1}{2\tau_1}\|x - (x^k - \tau_1 t A^\top(Ax^k + By^k - b - \lambda^k/t))\|_2^2\\ y^{k+1} = \operatorname{argmin}_y\; g(y) + \frac{1}{2\tau_2}\|y - (y^k - \tau_2 t B^\top(Ax^{k+1} + By^k - b - \lambda^k/t))\|_2^2\\ \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b) \end{cases}$$

Theorem: If $\tau_1 < 1/\lambda_{\max}(A^\top A)$ and $\tau_2 < 1/\lambda_{\max}(B^\top B)$, linearized ADMM globally converges to an optimal solution $(x^*, y^*; \lambda^*)$ for any $t > 0$ and any initial point $(y^0, \lambda^0)$.

Proof. See the posted paper for the proof.
Extensions: Multi-block ADMM

What if the function and variables have 3 parts (blocks)?
$$\min\; f_1(x_1) + f_2(x_2) + f_3(x_3) \quad\text{s.t.}\quad A_1 x_1 + A_2 x_2 + A_3 x_3 = b$$

Augmented Lagrangian function:
$$L_t(x_1,x_2,x_3;\lambda) = f_1(x_1)+f_2(x_2)+f_3(x_3) - \langle\lambda, A_1x_1+A_2x_2+A_3x_3-b\rangle + \frac{t}{2}\|A_1x_1+A_2x_2+A_3x_3-b\|_2^2$$

Multi-block ADMM:
$$\begin{cases} x_1^{k+1} = \operatorname{argmin}_{x_1} L_t(x_1, x_2^k, x_3^k; \lambda^k)\\ x_2^{k+1} = \operatorname{argmin}_{x_2} L_t(x_1^{k+1}, x_2, x_3^k; \lambda^k)\\ x_3^{k+1} = \operatorname{argmin}_{x_3} L_t(x_1^{k+1}, x_2^{k+1}, x_3; \lambda^k)\\ \lambda^{k+1} = \lambda^k - t(A_1x_1^{k+1} + A_2x_2^{k+1} + A_3x_3^{k+1} - b) \end{cases}$$
Applications

RPCA with noise:
$$\min\; \|X\|_* + \rho\|Y\|_1 \quad\text{s.t.}\quad X + Y + Z = M,\quad \|Z\|_F \le \sigma$$

Latent variable graphical model (see Lecture 1):
$$\min_{R,S,L}\; \langle R, \hat\Sigma_X\rangle - \log\det(R) + \alpha\|S\|_1 + \beta\,\mathrm{tr}(L) \quad\text{s.t.}\quad R = S - L,\quad R \succ 0,\quad L \succeq 0.$$
Convergence

Without further conditions, multi-block ADMM is not necessarily convergent. Counter-example by Chen, He, Ye and Yuan (2013):
$$\min\; 0 \quad\text{s.t.}\quad A_1x_1 + A_2x_2 + A_3x_3 = 0,$$
where
$$(A_1, A_2, A_3) = \begin{pmatrix} 1 & 1 & 1\\ 1 & 1 & 2\\ 1 & 2 & 2 \end{pmatrix}.$$
The update of multi-block ADMM with $t = 1$ is a linear map of $(x_1^k, x_2^k, x_3^k; \lambda^k)$; eliminating $x_1$, it can be written equivalently as the fixed-point iteration
$$\begin{pmatrix} x_2^{k+1} \\ x_3^{k+1} \\ \lambda^{k+1} \end{pmatrix} = M \begin{pmatrix} x_2^k \\ x_3^k \\ \lambda^k \end{pmatrix},$$
where $M$ is a matrix determined by $A_1, A_2, A_3$.
Note that the spectral radius satisfies $\rho(M) > 1$, so the iteration diverges.

Theorem (Chen-He-Ye-Yuan, 2013): There exists an example for which the direct extension of ADMM to three blocks, started from a suitable initial point, is not convergent for any choice of $t > 0$.
Sufficient conditions for convergence of multi-block ADMM

This is a trendy topic for ADMM and still under development:

- Han and Yuan (2012): global convergence, if $f_1, \dots, f_N$ are all strongly convex and $t$ is restricted to be small
- Lin, Ma and Zhang (2014): sublinear convergence rate, if $f_2, \dots, f_N$ are strongly convex and $t$ is restricted to be small
- Lin, Ma and Zhang (2014): globally linear convergence rate, if $f_2, \dots, f_N$ are strongly convex, $\nabla f_N$ is Lipschitz continuous, $A_N$ is full row rank, and $t$ is restricted to be small
- Cai, Han and Yuan (2014): sublinear convergence rate for $N = 3$, if $f_3$ is strongly convex and $t$ is restricted to be small
- Li, Sun and Toh (2014): global convergence with proximal terms for $N = 3$, if $f_3$ is strongly convex

Many follow-up works are ongoing...
Variants: transform multi-block to two-block

What if we do not want to impose additional conditions? There are many variants of multi-block ADMM with guaranteed convergence, but they usually perform worse than the original multi-block ADMM, even though the latter is not theoretically guaranteed.

One variant is the following (Wang, Hong, Ma and Luo (2013)): first transform the original problem
$$\min\; f_1(x_1) + f_2(x_2) + \cdots + f_N(x_N) \quad\text{s.t.}\quad A_1x_1 + A_2x_2 + \cdots + A_Nx_N = b$$
into
$$\min\; f_1(x_1) + f_2(x_2) + \cdots + f_N(x_N) \quad\text{s.t.}\quad A_ix_i - b/N = y_i,\; i = 1,\dots,N,\quad y_1 + y_2 + \cdots + y_N = 0.$$
Then apply two-block ADMM to the transformed problem:
$$L_t(x, y; \lambda) = \sum_{i=1}^N f_i(x_i) - \sum_{i=1}^N\langle\lambda_i, A_ix_i - b/N - y_i\rangle + \sum_{i=1}^N \frac{t}{2}\|A_ix_i - b/N - y_i\|^2$$

- the $x$-subproblems are separable
- the $y$-subproblem is an easy projection:
$$\min\; \frac12\|y - z\|^2 \quad\text{s.t.}\quad y_1 + \cdots + y_N = 0$$

Lagrangian function:
$$L(y,\mu) = \sum_{i=1}^N \frac12\|y_i - z_i\|^2 - \langle\mu, y_1 + \cdots + y_N\rangle$$

KKT conditions:
$$y_i - z_i - \mu = 0 \quad\text{and}\quad y_1 + \cdots + y_N = 0$$
So we get
$$\mu = -\frac{1}{N}\sum_{i=1}^N z_i \qquad\text{and}\qquad y_i = z_i - \frac{1}{N}\sum_{i=1}^N z_i.$$

This variant is theoretically guaranteed to converge.
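The closed form can be checked numerically: projecting onto $\{y: y_1+\cdots+y_N = 0\}$ just subtracts the block average (sketch, not from the slides):

```python
import numpy as np

def proj_zero_sum(z):
    """Project blocks z_1,...,z_N (rows of z) onto {y : y_1 + ... + y_N = 0}
    using the closed form above: subtract the average block from each block."""
    return z - z.mean(axis=0)

z = np.random.default_rng(1).standard_normal((4, 3))  # N = 4 blocks in R^3
y = proj_zero_sum(z)
```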
Gradient-based ADMM

$$\min\; f(x) + g(y) \quad\text{s.t.}\quad Ax + By = b$$

What if $f$ has an easy proximal mapping, but $g$ does not? Assume $g$ is smooth.

ADMM:
$$\begin{cases} x^{k+1} = \operatorname{argmin}_x L_t(x, y^k; \lambda^k)\\ y^{k+1} = \operatorname{argmin}_y L_t(x^{k+1}, y; \lambda^k)\\ \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b) \end{cases}$$

Gradient-based ADMM: take a gradient step for the $y$-subproblem:
$$\begin{cases} x^{k+1} = \operatorname{argmin}_x L_t(x, y^k; \lambda^k)\\ y^{k+1} = y^k - t\nabla_y L_t(x^{k+1}, y^k; \lambda^k)\\ \lambda^{k+1} = \lambda^k - t(Ax^{k+1} + By^{k+1} - b) \end{cases}$$
Sparse logistic regression:
$$\min\; \|x\|_1 + l(x, c), \qquad\text{where}\quad l(x,c) = \frac{1}{m}\sum_{i=1}^m \log\bigl(1 + \exp(-b_i(x^\top a_i + c))\bigr)$$

Reformulation of sparse logistic regression:
$$\min\; \|x\|_1 + l(y, c) \quad\text{s.t.}\quad x - y = 0;$$
take a gradient step for the $(y, c)$-subproblem.

Fused logistic regression:
$$\min\; \|x\|_1 + \sum_{i=2}^n |x_i - x_{i-1}| + l(x, c)$$
Reformulation of fused logistic regression:
$$\min\; \|x\|_1 + \|w\|_1 + l(y, c) \quad\text{s.t.}\quad w = By,\quad x = y$$

Augmented Lagrangian function:
$$L_t(x,w,y,c;\lambda_1,\lambda_2) = \|x\|_1 + \|w\|_1 + l(y,c) - \langle\lambda_1, w - By\rangle + \frac{t}{2}\|w - By\|^2 - \langle\lambda_2, x - y\rangle + \frac{t}{2}\|x - y\|^2$$
Exactly solve the $(x, w)$-subproblem; take a gradient step for the $(y, c)$-subproblem.
One more example on a nonconvex problem

Semidefinite programming:
$$\min_{X\in S^n}\; \langle C, X\rangle \quad\text{s.t.}\quad \langle A^{(i)}, X\rangle = b_i,\; i = 1,\dots,m,\quad X \succeq 0$$

Any positive semidefinite matrix $X$ can be rewritten as $X = VV^\top$, where $V \in \mathbb{R}^{n\times n}$.

Reformulation of SDP:
$$\min_{V\in\mathbb{R}^{n\times n}}\; \langle C, VV^\top\rangle \quad\text{s.t.}\quad \langle A^{(i)}, VV^\top\rangle = b_i,\; i = 1,\dots,m$$

This is a nonconvex equality-constrained problem: you can use the augmented Lagrangian method as long as you have a good way to minimize the augmented Lagrangian function:
$$L_t(V; \lambda) = \langle C, VV^\top\rangle - \langle\lambda, \mathcal{A}(VV^\top) - b\rangle + \frac{t}{2}\|\mathcal{A}(VV^\top) - b\|^2$$

The augmented Lagrangian method:
$$\begin{cases} V^{k+1} := \operatorname{argmin}_V L_t(V; \lambda^k)\\ \lambda^{k+1} := \lambda^k - t(\mathcal{A}(V^{k+1}(V^{k+1})^\top) - b) \end{cases}$$

Two-block reformulation ($X = UV^\top$ and $U = V$):
$$\min_{U,V\in\mathbb{R}^{n\times n}}\; \langle C, UV^\top\rangle \quad\text{s.t.}\quad \langle A^{(i)}, UV^\top\rangle = b_i,\; i = 1,\dots,m,\quad U - V = 0$$

Augmented Lagrangian function:
$$L_t(U, V; \lambda, \Lambda) = \langle C, UV^\top\rangle - \langle\lambda, \mathcal{A}(UV^\top) - b\rangle + \frac{t}{2}\|\mathcal{A}(UV^\top) - b\|^2 - \langle\Lambda, U - V\rangle + \frac{t}{2}\|U - V\|_F^2$$
ADMM:
$$\begin{cases} U^{k+1} := \operatorname{argmin}_U L_t(U, V^k; \lambda^k, \Lambda^k)\\ V^{k+1} := \operatorname{argmin}_V L_t(U^{k+1}, V; \lambda^k, \Lambda^k)\\ \lambda^{k+1} := \lambda^k - t(\mathcal{A}(U^{k+1}(V^{k+1})^\top) - b)\\ \Lambda^{k+1} := \Lambda^k - t(U^{k+1} - V^{k+1}) \end{cases}$$
Lots of recent developments of ADMM:

- sufficient conditions for multi-block ADMM for convex problems
- convergence analysis for ADMM for nonconvex problems
- stochastic ADMM
- online ADMM
- ...
Relation with operator-splitting methods

Operator-splitting methods for the inclusion problem of monotone operators:
$$\text{Find } u \text{ s.t. } 0 \in S(u) + T(u),$$
where $S, T: \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ are maximal monotone operators.

- $T$ is a monotone operator if $(u - v)^\top(T(u) - T(v)) \ge 0$ for all $u, v$.
- $T$ is called maximal monotone if there is no monotone operator that properly contains it.
Douglas-Rachford operator splitting method

Find $u$ s.t. $0 \in S(u) + T(u)$.

Douglas-Rachford operator splitting method:
$$\begin{cases} v^{k+1} = J_S^\tau(2J_T^\tau - I)v^k + (I - J_T^\tau)v^k\\ u^{k+1} = J_T^\tau v^{k+1} \end{cases}$$

$J_T^\tau = (I + \tau T)^{-1}$ is called the resolvent of the operator $T$.

Example:
$$\min\; f(x) + g(x).$$
Optimality condition: $0 \in \partial f(x) + \partial g(x)$, so $S = \partial f$ and $T = \partial g$.
Now
$$y = J_S^\tau(x) = (I + \tau S)^{-1}(x) = (I + \tau\partial f)^{-1}(x)$$
means that
$$x \in y + \tau\partial f(y).$$
This is the optimality condition of
$$\min_y\; \tau f(y) + \frac12\|y - x\|_2^2.$$
This is the proximal mapping of $f$.
Separable convex minimization

Primal problem:
$$\min\; f(x) + g(y) \quad\text{s.t.}\quad Ax + By = b$$

Dual problem:
$$\min_\lambda\; f^*(A^\top\lambda) + g^*(B^\top\lambda) - b^\top\lambda$$

Optimality condition of the dual problem:
$$\text{Find } \lambda \text{ s.t. } 0 \in A\,\partial f^*(A^\top\lambda) + B\,\partial g^*(B^\top\lambda) - b.$$

Define $S(\cdot) = A\,\partial f^*(A^\top\cdot)$ and $T(\cdot) = B\,\partial g^*(B^\top\cdot) - b$. Applying the Douglas-Rachford splitting method to
$$\text{Find } \lambda \text{ s.t. } 0 \in S(\lambda) + T(\lambda)$$
is equivalent to applying ADMM to the primal problem.
Peaceman-Rachford operator splitting method

Find $u$ s.t. $0 \in S(u) + T(u)$.

Peaceman-Rachford operator splitting method:
$$\begin{cases} v^{k+1} = (2J_S^\tau - I)(2J_T^\tau - I)v^k\\ u^{k+1} = J_T^\tau v^{k+1} \end{cases}$$

If applied to the dual problem of
$$\min\; f(x) + g(y) \quad\text{s.t.}\quad Ax + By = b,$$
it is equivalent to the following algorithm (symmetric ADMM):
$$\begin{cases} x^{k+1} = \operatorname{argmin}_x L_t(x, y^k; \lambda^k)\\ \lambda^{k+1/2} = \lambda^k - t(Ax^{k+1} + By^k - b)\\ y^{k+1} = \operatorname{argmin}_y L_t(x^{k+1}, y; \lambda^{k+1/2})\\ \lambda^{k+1} = \lambda^{k+1/2} - t(Ax^{k+1} + By^{k+1} - b) \end{cases}$$
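A small numerical check of the Douglas-Rachford iteration (the method that Peaceman-Rachford modifies): take $S = \partial(\alpha\|\cdot\|_1)$ and $T = \partial(\frac12\|\cdot - a\|_2^2)$, whose resolvents are soft-thresholding and a simple averaging; the iterates should approach the known minimizer $\mathrm{soft}(a, \alpha)$ of $\alpha\|u\|_1 + \frac12\|u - a\|_2^2$. Illustrative sketch, not from the slides:

```python
import numpy as np

def soft(z, a):
    return np.sign(z) * np.maximum(np.abs(z) - a, 0.0)

def douglas_rachford(a, alpha, tau=1.0, iters=300):
    """DR splitting for 0 in S(u)+T(u) with S = d(alpha*||.||_1) and
    T = d((1/2)*||. - a||^2).  Resolvents: J_S = soft-thresholding,
    J_T(v) = (v + tau*a)/(1 + tau).
    Iteration: v <- J_S(2*J_T(v) - v) + (v - J_T(v));  u = J_T(v)."""
    JT = lambda v: (v + tau * a) / (1.0 + tau)
    JS = lambda v: soft(v, tau * alpha)
    v = np.zeros_like(a)
    for _ in range(iters):
        u = JT(v)
        v = JS(2 * u - v) + (v - u)
    return JT(v)

a = np.array([2.0, -0.3, 1.0])
u = douglas_rachford(a, alpha=0.5)  # should approach soft(a, 0.5)
```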
Lots of recent developments

Operator splitting methods have broader applications. Recent research questions:

- three (or more) operators?
- other operator splitting schemes?
- sufficient conditions?
- convergence rate?
- ...
More informationOn the equivalence of the primal-dual hybrid gradient method and Douglas Rachford splitting
Mathematical Programming manuscript No. (will be inserted by the editor) On the equivalence of the primal-dual hybrid gradient method and Douglas Rachford splitting Daniel O Connor Lieven Vandenberghe
More informationFirst-order methods for structured nonsmooth optimization
First-order methods for structured nonsmooth optimization Sangwoon Yun Department of Mathematics Education Sungkyunkwan University Oct 19, 2016 Center for Mathematical Analysis & Computation, Yonsei University
More informationADMM and Fast Gradient Methods for Distributed Optimization
ADMM and Fast Gradient Methods for Distributed Optimization João Xavier Instituto Sistemas e Robótica (ISR), Instituto Superior Técnico (IST) European Control Conference, ECC 13 July 16, 013 Joint work
More informationDoes Alternating Direction Method of Multipliers Converge for Nonconvex Problems?
Does Alternating Direction Method of Multipliers Converge for Nonconvex Problems? Mingyi Hong IMSE and ECpE Department Iowa State University ICCOPT, Tokyo, August 2016 Mingyi Hong (Iowa State University)
More informationApplications of Linear Programming
Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal
More informationOptimization for Learning and Big Data
Optimization for Learning and Big Data Donald Goldfarb Department of IEOR Columbia University Department of Mathematics Distinguished Lecture Series May 17-19, 2016. Lecture 1. First-Order Methods for
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationAccelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)
Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan
More informationOperator Splitting for Parallel and Distributed Optimization
Operator Splitting for Parallel and Distributed Optimization Wotao Yin (UCLA Math) Shanghai Tech, SSDS 15 June 23, 2015 URL: alturl.com/2z7tv 1 / 60 What is splitting? Sun-Tzu: (400 BC) Caesar: divide-n-conquer
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization
More informationMatrix Factorization with Applications to Clustering Problems: Formulation, Algorithms and Performance
Date: Mar. 3rd, 2017 Matrix Factorization with Applications to Clustering Problems: Formulation, Algorithms and Performance Presenter: Songtao Lu Department of Electrical and Computer Engineering Iowa
More informationConvex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE
Convex Analysis Notes Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE These are notes from ORIE 6328, Convex Analysis, as taught by Prof. Adrian Lewis at Cornell University in the
More information5. Duality. Lagrangian
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationConvex Optimization Boyd & Vandenberghe. 5. Duality
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationLinearized Alternating Direction Method of Multipliers via Positive-Indefinite Proximal Regularization for Convex Programming.
Linearized Alternating Direction Method of Multipliers via Positive-Indefinite Proximal Regularization for Convex Programming Bingsheng He Feng Ma 2 Xiaoming Yuan 3 July 3, 206 Abstract. The alternating
More informationSolving DC Programs that Promote Group 1-Sparsity
Solving DC Programs that Promote Group 1-Sparsity Ernie Esser Contains joint work with Xiaoqun Zhang, Yifei Lou and Jack Xin SIAM Conference on Imaging Science Hong Kong Baptist University May 14 2014
More informationPrimal/Dual Decomposition Methods
Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients
More informationSEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS
SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS XIANTAO XIAO, YONGFENG LI, ZAIWEN WEN, AND LIWEI ZHANG Abstract. The goal of this paper is to study approaches to bridge the gap between
More informationINERTIAL PRIMAL-DUAL ALGORITHMS FOR STRUCTURED CONVEX OPTIMIZATION
INERTIAL PRIMAL-DUAL ALGORITHMS FOR STRUCTURED CONVEX OPTIMIZATION RAYMOND H. CHAN, SHIQIAN MA, AND JUNFENG YANG Abstract. The primal-dual algorithm recently proposed by Chambolle & Pock (abbreviated as
More informationExpanding the reach of optimal methods
Expanding the reach of optimal methods Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with C. Kempton (UW), M. Fazel (UW), A.S. Lewis (Cornell), and S. Roy (UW) BURKAPALOOZA! WCOM
More informationOn the acceleration of augmented Lagrangian method for linearly constrained optimization
On the acceleration of augmented Lagrangian method for linearly constrained optimization Bingsheng He and Xiaoming Yuan October, 2 Abstract. The classical augmented Lagrangian method (ALM plays a fundamental
More informationSolving Dual Problems
Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationPrimal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions
Primal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions Olivier Fercoq and Pascal Bianchi Problem Minimize the convex function
More informationNewton s Method. Javier Peña Convex Optimization /36-725
Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and
More information1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method
L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More informationSelected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 2018
Selected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 08 Instructor: Quoc Tran-Dinh Scriber: Quoc Tran-Dinh Lecture 4: Selected
More informationDual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)
More informationLecture: Duality of LP, SOCP and SDP
1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:
More informationConvex Optimization M2
Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization
More informationLecture: Algorithms for Compressed Sensing
1/56 Lecture: Algorithms for Compressed Sensing Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:
More informationPrimal-dual algorithms for the sum of two and three functions 1
Primal-dual algorithms for the sum of two and three functions 1 Ming Yan Michigan State University, CMSE/Mathematics 1 This works is partially supported by NSF. optimization problems for primal-dual algorithms
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationSparse Covariance Selection using Semidefinite Programming
Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support
More informationr=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J
7 Appendix 7. Proof of Theorem Proof. There are two main difficulties in proving the convergence of our algorithm, and none of them is addressed in previous works. First, the Hessian matrix H is a block-structured
More informationLecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem
Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R
More informationMinimizing the Difference of L 1 and L 2 Norms with Applications
1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:
More informationA Two-Stage Moment Robust Optimization Model and its Solution Using Decomposition
A Two-Stage Moment Robust Optimization Model and its Solution Using Decomposition Sanjay Mehrotra and He Zhang July 23, 2013 Abstract Moment robust optimization models formulate a stochastic problem with
More informationInexact Newton Methods and Nonlinear Constrained Optimization
Inexact Newton Methods and Nonlinear Constrained Optimization Frank E. Curtis EPSRC Symposium Capstone Conference Warwick Mathematics Institute July 2, 2009 Outline PDE-Constrained Optimization Newton
More informationDLM: Decentralized Linearized Alternating Direction Method of Multipliers
1 DLM: Decentralized Linearized Alternating Direction Method of Multipliers Qing Ling, Wei Shi, Gang Wu, and Alejandro Ribeiro Abstract This paper develops the Decentralized Linearized Alternating Direction
More informationLectures 9 and 10: Constrained optimization problems and their optimality conditions
Lectures 9 and 10: Constrained optimization problems and their optimality conditions Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lectures 9 and 10: Constrained
More informationLINEARIZED AUGMENTED LAGRANGIAN AND ALTERNATING DIRECTION METHODS FOR NUCLEAR NORM MINIMIZATION
LINEARIZED AUGMENTED LAGRANGIAN AND ALTERNATING DIRECTION METHODS FOR NUCLEAR NORM MINIMIZATION JUNFENG YANG AND XIAOMING YUAN Abstract. The nuclear norm is widely used to induce low-rank solutions for
More informationProximal ADMM with larger step size for two-block separable convex programming and its application to the correlation matrices calibrating problems
Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., (7), 538 55 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa Proximal ADMM with larger step size
More informationConvergence of Fixed-Point Iterations
Convergence of Fixed-Point Iterations Instructor: Wotao Yin (UCLA Math) July 2016 1 / 30 Why study fixed-point iterations? Abstract many existing algorithms in optimization, numerical linear algebra, and
More informationDonald Goldfarb IEOR Department Columbia University UCLA Mathematics Department Distinguished Lecture Series May 17 19, 2016
Optimization for Tensor Models Donald Goldfarb IEOR Department Columbia University UCLA Mathematics Department Distinguished Lecture Series May 17 19, 2016 1 Tensors Matrix Tensor: higher-order matrix
More informationGeneralized ADMM with Optimal Indefinite Proximal Term for Linearly Constrained Convex Optimization
Generalized ADMM with Optimal Indefinite Proximal Term for Linearly Constrained Convex Optimization Fan Jiang 1 Zhongming Wu Xingju Cai 3 Abstract. We consider the generalized alternating direction method
More informationA Solution Method for Semidefinite Variational Inequality with Coupled Constraints
Communications in Mathematics and Applications Volume 4 (2013), Number 1, pp. 39 48 RGN Publications http://www.rgnpublications.com A Solution Method for Semidefinite Variational Inequality with Coupled
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming
E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program
More informationLagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)
Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual
More informationSignal Processing and Networks Optimization Part VI: Duality
Signal Processing and Networks Optimization Part VI: Duality Pierre Borgnat 1, Jean-Christophe Pesquet 2, Nelly Pustelnik 1 1 ENS Lyon Laboratoire de Physique CNRS UMR 5672 pierre.borgnat@ens-lyon.fr,
More informationSolving large Semidefinite Programs - Part 1 and 2
Solving large Semidefinite Programs - Part 1 and 2 Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Singapore workshop 2006 p.1/34 Overview Limits of Interior
More informationminimize x subject to (x 2)(x 4) u,
Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for
More informationConvex Optimization and l 1 -minimization
Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l
More informationIntroduction to Alternating Direction Method of Multipliers
Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction
More informationThe Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent
The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent Caihua Chen Bingsheng He Yinyu Ye Xiaoming Yuan 4 December, Abstract. The alternating direction method
More informationDistributed Optimization and Statistics via Alternating Direction Method of Multipliers
Distributed Optimization and Statistics via Alternating Direction Method of Multipliers Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato Stanford University Stanford Statistics Seminar, September 2010
More informationPrimal-dual coordinate descent
Primal-dual coordinate descent Olivier Fercoq Joint work with P. Bianchi & W. Hachem 15 July 2015 1/28 Minimize the convex function f, g, h convex f is differentiable Problem min f (x) + g(x) + h(mx) x
More informationSparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28
Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:
More informationLecture: Duality.
Lecture: Duality http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/35 Lagrange dual problem weak and strong
More informationLecture 3. Optimization Problems and Iterative Algorithms
Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex
More information