Lecture: Algorithms for Compressed Sensing


Lecture: Algorithms for Compressed Sensing
Zaiwen Wen
Beijing International Center For Mathematical Research, Peking University
Acknowledgement: part of these slides is based on Prof. Lieven Vandenberghe's and Prof. Wotao Yin's lecture notes

Outline
- Proximal gradient method
- Complexity analysis of the proximal gradient method
- Alternating direction method of multipliers (ADMM)
- Linearized alternating direction method of multipliers
Check lecture notes at http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html

l1-regularized least squares problem
Consider
    min ψ_µ(x) := µ‖x‖_1 + (1/2)‖Ax − b‖_2^2
Approaches:
- Interior point method: l1_ls
- Spectral gradient method: GPSR
- Fixed-point continuation method: FPC
- Active set method: FPC_AS
- Alternating direction augmented Lagrangian method
- Nesterov's optimal first-order method
- many others

Subgradient
Recall the basic inequality for a convex differentiable f: f(y) ≥ f(x) + ∇f(x)^T(y − x).
g is a subgradient of a convex function f at x ∈ dom f if
    f(y) ≥ f(x) + g^T(y − x), ∀ y ∈ dom f.
(Figure: g_2, g_3 are subgradients at x_2; g_1 is a subgradient at x_1.)

Optimality conditions: unconstrained
x* minimizes f(x) if and only if 0 ∈ ∂f(x*).
Proof: by definition, f(y) ≥ f(x*) + 0^T(y − x*) for all y ⟺ 0 ∈ ∂f(x*).

Optimality conditions: constrained
    min f_0(x)  s.t. f_i(x) ≤ 0, i = 1, ..., m.
From Lagrange duality: if strong duality holds, then x*, λ* are primal, dual optimal if and only if
- x* is primal feasible
- λ* ≥ 0
- complementarity: λ_i* f_i(x*) = 0 for i = 1, ..., m
- x* is a minimizer of min_x L(x, λ*) = f_0(x) + ∑_i λ_i* f_i(x), i.e.,
    0 ∈ ∂_x L(x*, λ*) = ∂f_0(x*) + ∑_i λ_i* ∂f_i(x*)

Proximal Gradient Method
Let f(x) = (1/2)‖Ax − b‖_2^2. The gradient is ∇f(x) = A^T(Ax − b). Consider
    min ψ_µ(x) := µ‖x‖_1 + f(x).
First-order approximation + proximal term:
    x^{k+1} := argmin_{x ∈ R^n} µ‖x‖_1 + (∇f(x^k))^T(x − x^k) + (1/(2τ))‖x − x^k‖_2^2
             = shrink(x^k − τ∇f(x^k), µτ)
Forward-backward operator splitting: x^{k+1} := (I + τT_1)^{-1}(I − τT_2)x^k
    0 ∈ ∂φ(x) := ∂(µ‖x‖_1) + ∇f(x) = T_1(x) + T_2(x) = T(x)
    0 ∈ (x + τT_1(x)) − (x − τT_2(x))  ⟺  x = (I + τT_1)^{-1}(I − τT_2)x
- gradient step: bring in candidates for nonzero components
- shrinkage step: eliminate some of them by soft thresholding

Shrinkage (soft thresholding)
    shrink(y, ν) := argmin_{x ∈ R} ν|x| + (1/2)(x − y)^2
                  = sgn(y) max(|y| − ν, 0)
                  = y − ν sgn(y) if |y| > ν, and 0 otherwise
(applied componentwise to vectors)
References: Chambolle, DeVore, Lee and Lucier; Figueiredo, Nowak and Wright; Elad, Matalon and Zibulevsky; Hale, Yin and Zhang; Darbon, Osher; many others
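As a quick illustration, the shrinkage operator is a componentwise one-liner in numpy; the sketch below (the function name shrink and the choice of numpy are illustrative assumptions) applies the closed form above.

```python
import numpy as np

def shrink(y, nu):
    """Soft-thresholding: argmin_x nu*|x| + 0.5*(x - y)^2, applied componentwise."""
    return np.sign(y) * np.maximum(np.abs(y) - nu, 0.0)
```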

Proximal gradient method for general problems
Consider the model
    min F(x) := f(x) + r(x)
- f(x) is convex, differentiable
- r(x) is convex but may be nondifferentiable
General scheme: linearize f(x) and add a proximal term:
    x^{k+1} := argmin_{x ∈ R^n} r(x) + (∇f(x^k))^T(x − x^k) + (1/(2τ))‖x − x^k‖_2^2
             = argmin_{x ∈ R^n} τ r(x) + (1/2)‖x − (x^k − τ∇f(x^k))‖_2^2
             = prox_{τr}(x^k − τ∇f(x^k))
Proximal operator:
    prox_r(y) := argmin_x r(x) + (1/2)‖x − y‖_2^2

Proximal mapping
If r is convex and closed (has a closed epigraph), then
    prox_r(y) := argmin_x r(x) + (1/2)‖x − y‖_2^2
exists and is unique for all y.
From the optimality conditions of the minimization in the definition:
    x = prox_r(y)  ⟺  y − x ∈ ∂r(x)  ⟺  r(z) ≥ r(x) + (y − x)^T(z − x), ∀ z
The subdifferential of a convex function is a monotone operator:
    (u − v)^T(x − y) ≥ 0, ∀ x, y, u ∈ ∂f(x), v ∈ ∂f(y)
Proof: combine the following two inequalities
    f(y) ≥ f(x) + u^T(y − x),  f(x) ≥ f(y) + v^T(x − y)

Nonexpansiveness
If u = prox_r(x) and v = prox_r(y), then
    (u − v)^T(x − y) ≥ ‖u − v‖_2^2
prox_r is firmly nonexpansive, or co-coercive with constant 1.
This follows from the previous page and monotonicity of subgradients:
    x − u ∈ ∂r(u), y − v ∈ ∂r(v)  ⟹  (x − u − y + v)^T(u − v) ≥ 0
It implies (from the Cauchy-Schwarz inequality)
    ‖prox_r(x) − prox_r(y)‖_2 ≤ ‖x − y‖_2

Global convergence
- The proximal mapping is nonexpansive.
- Under certain assumptions, h(x) = x − τ∇f(x) is nonexpansive: ‖h(x) − h(x*)‖ ≤ ‖x − x*‖, and ∇f(x) = ∇f(x*) whenever the equality holds.
- If x* is a fixed point, then
    ‖prox_{τr}(h(x)) − x*‖ = ‖prox_{τr}(h(x)) − prox_{τr}(h(x*))‖ ≤ ‖h(x) − h(x*)‖ ≤ ‖x − x*‖

Proximal gradient method for l1-minimization
Advantages:
- Low computational cost (first-order method)
- Finite convergence of the nondegenerate support
- Global q-linear convergence
Disadvantages:
- Takes a lot of steps when the support is relatively large
- Slow convergence: degenerate support identification or magnitude recovery?
Strategies for improving shrinkage:
- Adjust λ dynamically (use λ_k)
- Line search: x^{k+1} = x^k + α_k (shrink(x^k − λ_k ∇f^k, µλ_k) − x^k)
- Continuation: approximately solve for µ_0 > µ_1 > ... > µ_p = µ (FPC)
- Debiasing or subspace optimization

Line search approach
Let x^k(τ_k) = shrink(x^k − τ_k ∇f^k, µτ_k). Then set
    x^{k+1} = x^k + α_k (x^k(τ_k) − x^k) = x^k + α_k d^k
Choosing τ_k: Barzilai-Borwein method. With ∇f^k = A^T(Ax^k − b), s^{k−1} = x^k − x^{k−1} and y^{k−1} = ∇f^k − ∇f^{k−1},
    τ_{k,BB1} = ((s^{k−1})^T s^{k−1}) / ((s^{k−1})^T y^{k−1})  or  τ_{k,BB2} = ((s^{k−1})^T y^{k−1}) / ((y^{k−1})^T y^{k−1}).
Choosing α_k: Armijo-like line search
    ψ_µ(x^k + α_k d^k) ≤ C_k + σ α_k Δ_k
- FPC: Δ_k := (∇f^k)^T d^k
- FPC_AS: Δ_k := (∇f^k)^T d^k + µ‖x^k(τ_k)‖_1 − µ‖x^k‖_1
- C_k = (ηQ_{k−1} C_{k−1} + ψ_µ(x^k))/Q_k, Q_k = ηQ_{k−1} + 1, with C_0 = ψ_µ(x^0) and Q_0 = 1 (Zhang and Hager)
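The sketch below shows one such line-search shrinkage step in numpy, combining a BB1 step size, the FPC-style Δ_k, and the Zhang-Hager reference value; the variable names, the backtracking factor and the safeguards are illustrative assumptions, not the exact FPC/FPC_AS implementation.

```python
import numpy as np

def shrinkage_bb_step(A, b, x, x_prev, g_prev, mu, Ck, Qk, eta=0.85, sigma=1e-4):
    """One nonmonotone line-search shrinkage step (sketch). Returns x_new, grad, C, Q."""
    psi = lambda z: mu * np.linalg.norm(z, 1) + 0.5 * np.linalg.norm(A @ z - b) ** 2
    g = A.T @ (A @ x - b)                               # gradient of 0.5*||Ax - b||^2
    s, y = x - x_prev, g - g_prev
    tau = (s @ s) / (s @ y) if s @ y > 0 else 1.0       # BB1 step size (safeguarded)
    xt = np.sign(x - tau * g) * np.maximum(np.abs(x - tau * g) - mu * tau, 0.0)
    d = xt - x                                          # search direction d^k
    delta = g @ d                                       # FPC-style Delta_k
    alpha = 1.0
    while psi(x + alpha * d) > Ck + sigma * alpha * delta and alpha > 1e-10:
        alpha *= 0.5                                    # Armijo-like backtracking
    x_new = x + alpha * d
    Q_new = eta * Qk + 1.0                              # Zhang-Hager reference update
    C_new = (eta * Qk * Ck + psi(x_new)) / Q_new
    return x_new, g, C_new, Q_new
```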

Outline: Complexity Analysis
- Amir Beck and Marc Teboulle, A fast iterative shrinkage thresholding algorithm for linear inverse problems
- Paul Tseng, On accelerated proximal gradient methods for convex-concave optimization
- Paul Tseng, Approximation accuracy, gradient methods and error bound for structured convex optimization
- N. S. Aybat and G. Iyengar, A first-order augmented Lagrangian method for compressed sensing

Proximal Gradient / ISTA / FPC
The following terminologies refer to the same algorithm:
- proximal gradient method
- ISTA: iterative shrinkage thresholding algorithm
- FPC: fixed-point continuation method
Consider
    min F(x) := µ‖x‖_1 + (1/2)‖Ax − b‖_2^2
ISTA/FPC computes
    x^{k+1} := argmin µ‖x‖_1 + (∇f^k)^T(x − x^k) + (1/(2τ))‖x − x^k‖_2^2 = shrink(x^k − τ∇f^k, µτ)
Complexity? How can we speed it up?
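For concreteness, a minimal ISTA/FPC loop might look as follows; it is a sketch, not the FPC code, and it assumes a fixed step τ = 1/‖A‖_2^2 and an arbitrary iteration count.

```python
import numpy as np

def ista(A, b, mu, n_iter=500):
    """Proximal gradient (ISTA/FPC) for min mu*||x||_1 + 0.5*||Ax - b||_2^2."""
    tau = 1.0 / np.linalg.norm(A, 2) ** 2                      # step size <= 1/L, L = ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - tau * (A.T @ (A @ x - b))                      # gradient (forward) step
        x = np.sign(z) * np.maximum(np.abs(z) - mu * tau, 0.0) # shrinkage (backward) step
    return x
```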

Generalization of ISTA
Consider a solvable model
    min F(x) := f(x) + r(x)
- r(x) continuous convex, may be nonsmooth
- f(x) continuously differentiable with Lipschitz continuous gradient: ‖∇f(x) − ∇f(y)‖_2 ≤ L_f ‖x − y‖_2
The cost of the following problem is tractable:
    p_L(y) := argmin_x Q_L(x, y) := f(y) + ⟨∇f(y), x − y⟩ + (L/2)‖x − y‖_2^2 + r(x)
            = argmin_x r(x) + (L/2)‖x − (y − (1/L)∇f(y))‖_2^2
            = prox_{r/L}(y − (1/L)∇f(y))
ISTA basic iteration: x^k = p_L(x^{k−1})

Quadratic upper bound
Suppose ∇f is Lipschitz continuous with parameter L and dom f is convex. Then
    f(y) ≤ f(x) + ∇f(x)^T(y − x) + (L/2)‖y − x‖_2^2, ∀ x, y.
Proof: Let v = y − x. Then
    f(y) = f(x) + ∇f(x)^T v + ∫_0^1 (∇f(x + tv) − ∇f(x))^T v dt
         ≤ f(x) + ∇f(x)^T v + ∫_0^1 ‖∇f(x + tv) − ∇f(x)‖_2 ‖v‖_2 dt
         ≤ f(x) + ∇f(x)^T v + ∫_0^1 L t ‖v‖_2^2 dt
         = f(x) + ∇f(x)^T v + (L/2)‖y − x‖_2^2
The above inequality also implies that F(p_L(y)) ≤ Q(p_L(y), y).

Lemma: Assume F(p_L(y)) ≤ Q(p_L(y), y). Then for any x ∈ R^n,
    F(x) − F(p_L(y)) ≥ (L/2)‖p_L(y) − y‖^2 + L⟨y − x, p_L(y) − y⟩ = (L/2)(‖p_L(y) − x‖_2^2 − ‖x − y‖_2^2)
Proof: Optimality of p_L(y) gives ∇f(y) + L(p_L(y) − y) + z = 0 for some z ∈ ∂r(p_L(y)). The convexity of f and r gives
    f(x) ≥ f(y) + ∇f(y)^T(x − y),  r(x) ≥ r(p_L(y)) + z^T(x − p_L(y)).
Using the definition of Q(p_L(y), y) and z:
    F(x) − F(p_L(y)) ≥ F(x) − Q(p_L(y), y) ≥ −(L/2)‖p_L(y) − y‖^2 + (x − p_L(y))^T(∇f(y) + z)
Substituting ∇f(y) + z = −L(p_L(y) − y) gives the claim.

Complexity analysis of ISTA
Complexity result:
    F(x^k) − F(x*) ≤ L_f ‖x^0 − x*‖_2^2 / (2k)
Apply the lemma with x = x*, y = x^j (so p_L(x^j) = x^{j+1}):
    (2/L)(F(x*) − F(x^{j+1})) ≥ ‖x* − x^{j+1}‖_2^2 − ‖x* − x^j‖_2^2
Summing over j = 0, ..., k−1 gives
    (2/L)(k F(x*) − ∑_{j=0}^{k−1} F(x^{j+1})) ≥ ‖x* − x^k‖_2^2 − ‖x* − x^0‖_2^2
Applying the lemma with x = y = x^j yields
    (2/L)(F(x^j) − F(x^{j+1})) ≥ ‖x^j − x^{j+1}‖_2^2
Multiplying the last inequality by j and summing over j = 0, ..., k−1 gives
    (2/L)(−k F(x^k) + ∑_{j=0}^{k−1} F(x^{j+1})) ≥ ∑_{j=0}^{k−1} j‖x^j − x^{j+1}‖_2^2
Therefore, adding the two summed inequalities,
    (2k/L)(F(x*) − F(x^k)) ≥ ‖x* − x^k‖_2^2 + ∑_{j=0}^{k−1} j‖x^j − x^{j+1}‖_2^2 − ‖x* − x^0‖_2^2
Since the first two terms on the right are nonnegative, the complexity bound follows.

FISTA: Accelerated proximal gradient
Given y^1 = x^0 and t_1 = 1, compute:
    x^k = p_L(y^k)
    t_{k+1} = (1 + √(1 + 4t_k^2)) / 2
    y^{k+1} = x^k + ((t_k − 1)/t_{k+1})(x^k − x^{k−1})
Complexity result:
    F(x^k) − F(x*) ≤ 2L_f ‖x^0 − x*‖_2^2 / (k + 1)^2
Lipschitz constant L_f unknown? Choose L by backtracking so that F(p_L(y^k)) ≤ Q_L(p_L(y^k), y^k).
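A minimal FISTA sketch with a fixed constant L = ‖A‖_2^2 (no backtracking); names and iteration count are illustrative assumptions.

```python
import numpy as np

def fista(A, b, mu, n_iter=500):
    """FISTA for min mu*||x||_1 + 0.5*||Ax - b||_2^2 with fixed L = ||A||_2^2."""
    L = np.linalg.norm(A, 2) ** 2
    x = x_prev = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        z = y - A.T @ (A @ y - b) / L                        # gradient step at y^k
        x = np.sign(z) * np.maximum(np.abs(z) - mu / L, 0.0) # x^k = p_L(y^k)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)          # extrapolation step
        x_prev, t = x, t_next
    return x
```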

Complexity analysis of FISTA
Let v_k := F(x^k) − F(x*) and u^k := t_k x^k − (t_k − 1)x^{k−1} − x*. The key estimate is
    (2/L)(t_k^2 v_k − t_{k+1}^2 v_{k+1}) ≥ ‖u^{k+1}‖_2^2 − ‖u^k‖_2^2
If a_k − a_{k+1} ≥ b_{k+1} − b_k with a_1 + b_1 ≤ c, then a_k ≤ c. Also t_k ≥ (k + 1)/2.
Let a_k = (2/L) t_k^2 v_k, b_k = ‖u^k‖_2^2, and c = ‖x^0 − x*‖_2^2.
Check a_1 + b_1 ≤ c: with t_1 = 1 and y^1 = x^0,
    F(x*) − F(x^1) = F(x*) − F(p_L(y^1)) ≥ (L/2)‖p_L(y^1) − y^1‖_2^2 + L⟨y^1 − x*, p_L(y^1) − y^1⟩
                   = (L/2)(‖x^1 − x*‖_2^2 − ‖y^1 − x*‖_2^2)

Complexity analysis of FISTA
Apply the lemma with (x = x^k, y = y^{k+1}) and with (x = x*, y = y^{k+1}). Note x^{k+1} = p_L(y^{k+1}):
    2L^{-1}(v_k − v_{k+1}) ≥ ‖x^{k+1} − y^{k+1}‖^2 + 2⟨x^{k+1} − y^{k+1}, y^{k+1} − x^k⟩   (1)
    −2L^{-1} v_{k+1} ≥ ‖x^{k+1} − y^{k+1}‖^2 + 2⟨x^{k+1} − y^{k+1}, y^{k+1} − x*⟩   (2)
Forming ((1)·(t_{k+1} − 1) + (2))·t_{k+1} and using t_k^2 = t_{k+1}^2 − t_{k+1}:
    (2t_{k+1}/L)((t_{k+1} − 1)v_k − t_{k+1}v_{k+1}) = (2/L)(t_k^2 v_k − t_{k+1}^2 v_{k+1})
        ≥ ‖t_{k+1}(x^{k+1} − y^{k+1})‖^2 + 2t_{k+1}⟨x^{k+1} − y^{k+1}, t_{k+1}y^{k+1} − (t_{k+1} − 1)x^k − x*⟩
Using the identity ‖b − a‖^2 + 2⟨b − a, a − c⟩ = ‖b − c‖_2^2 − ‖a − c‖_2^2 and y^{k+1} = x^k + ((t_k − 1)/t_{k+1})(x^k − x^{k−1}):
    (2/L)(t_k^2 v_k − t_{k+1}^2 v_{k+1}) ≥ ‖t_{k+1}x^{k+1} − (t_{k+1} − 1)x^k − x*‖_2^2 − ‖t_{k+1}y^{k+1} − (t_{k+1} − 1)x^k − x*‖_2^2 = ‖u^{k+1}‖_2^2 − ‖u^k‖_2^2

Proximal gradient method
Consider the model
    min F(x) := f(x) + r(x)
Linearize f(x):
    l_f(x, y) := f(y) + ⟨∇f(y), x − y⟩ + r(x)
Given a strictly convex function h(x), the Bregman distance is
    D(x, y) := h(x) − h(y) − ⟨∇h(y), x − y⟩.
For example, take h(x) = ‖x‖_2^2. Then D(x, y) = ‖x − y‖_2^2.
The proximal gradient method can also be written as
    x^{k+1} := argmin_x l_f(x, x^k) + (L/2) D(x, x^k)

APG Variant 1
Accelerated proximal gradient (APG): set x^{−1} = x^0 and θ_{−1} = θ_0 = 1:
    y^k = x^k + θ_k(θ_{k−1}^{−1} − 1)(x^k − x^{k−1})
    x^{k+1} = argmin_x l_f(x, y^k) + (L/2)‖x − y^k‖_2^2
    θ_{k+1} = (√(θ_k^4 + 4θ_k^2) − θ_k^2)/2
Question: what is the difference between θ_k and t_k?

APG Variant 2
Replace (1/2)‖x − y^k‖_2^2 by a Bregman distance D(x, z^k). Set x^0, z^0 and θ_0 = 1:
    y^k = (1 − θ_k)x^k + θ_k z^k
    z^{k+1} = argmin_x l_f(x, y^k) + θ_k L D(x, z^k)
    x^{k+1} = (1 − θ_k)x^k + θ_k z^{k+1}
    θ_{k+1} = (√(θ_k^4 + 4θ_k^2) − θ_k^2)/2

APG Variant 3
Set x^0, z^0 := argmin_x h(x) and θ_0 = 1:
    y^k = (1 − θ_k)x^k + θ_k z^k
    z^{k+1} = argmin_x ∑_{i=0}^{k} l_f(x, y^i)/θ_i + L h(x)
    x^{k+1} = (1 − θ_k)x^k + θ_k z^{k+1}
    θ_{k+1} = (√(θ_k^4 + 4θ_k^2) − θ_k^2)/2

Complexity analysis
- Proximal gradient method: F(x^k) ≤ F(x) + (1/k) L D(x, x^0)
- APG1: F(x^k) ≤ F(x) + (4/(k + 1)^2) L D(x, x^0)
- APG2: F(x^k) ≤ F(x) + (4/(k + 1)^2) L D(x, z^0)
- APG3: F(x^k) ≤ F(x) + (4/(k + 1)^2) L (h(x) − h(x^0))

Complexity analysis
Let f(x) be convex and assume ∇f(x) is Lipschitz continuous. Then
    F(x) ≥ l_f(x, y) ≥ F(x) − (L/2)‖x − y‖_2^2.
For any proper lsc convex function ψ(x), if z^+ = argmin_x ψ(x) + D(x, z) and h(x) is differentiable at z^+, then
    ψ(x) + D(x, z) ≥ ψ(z^+) + D(x, z^+) + D(z^+, z).
Proof: The optimality at z^+ and the definition of the subgradient give
    ψ(x) + ⟨∇_x D(z^+, z), x − z^+⟩ ≥ ψ(z^+).
Note ∇_x D(z^+, z) = ∇h(z^+) − ∇h(z). Rearranging terms yields
    ψ(x) − ⟨∇h(z), x − z⟩ ≥ ψ(z^+) − ⟨∇h(z), z^+ − z⟩ − ⟨∇h(z^+), x − z^+⟩.
Add h(x) − h(z) to both sides.

Complexity analysis of APG2
    F(x^{k+1}) ≤ l_f(x^{k+1}, y^k) + (L/2)‖x^{k+1} − y^k‖_2^2
              = l_f((1 − θ_k)x^k + θ_k z^{k+1}, y^k) + (Lθ_k^2/2)‖z^{k+1} − z^k‖_2^2
              ≤ (1 − θ_k) l_f(x^k, y^k) + θ_k l_f(z^{k+1}, y^k) + θ_k^2 L D(z^{k+1}, z^k)
              ≤ (1 − θ_k) l_f(x^k, y^k) + θ_k (l_f(x, y^k) + θ_k L D(x, z^k) − θ_k L D(x, z^{k+1}))
              ≤ (1 − θ_k) F(x^k) + θ_k F(x) + θ_k^2 L D(x, z^k) − θ_k^2 L D(x, z^{k+1})
The third inequality applies the lemma of the previous page to l_f(·, y^k) + θ_k L D(·, z^k).
Let e_k = F(x^k) − F(x) and Δ_k = L D(x, z^k). Then
    e_{k+1} ≤ (1 − θ_k)e_k + θ_k^2 Δ_k − θ_k^2 Δ_{k+1}

Complexity analysis of APG2
Divide both sides by θ_k^2 and use 1/θ_k^2 = (1 − θ_{k+1})/θ_{k+1}^2:
    ((1 − θ_{k+1})/θ_{k+1}^2) e_{k+1} + Δ_{k+1} ≤ ((1 − θ_k)/θ_k^2) e_k + Δ_k
Hence
    ((1 − θ_{k+1})/θ_{k+1}^2) e_{k+1} + Δ_{k+1} ≤ ((1 − θ_0)/θ_0^2) e_0 + Δ_0
Using 1/θ_k^2 = (1 − θ_{k+1})/θ_{k+1}^2 and θ_0 = 1:
    (1/θ_k^2) e_{k+1} ≤ Δ_0 = L D(x, z^0)

An application to the Basis Pursuit problem
Consider
    min ‖x‖_1  s.t. Ax = b
Augmented Lagrangian (Bregman) framework (b^1 = b, k = 1):
    x^k := argmin_x F_k(x) := µ_k‖x‖_1 + (1/2)‖Ax − b^k‖_2^2
    b^{k+1} := b + (µ_{k+1}/µ_k)(b^k − Ax^k)
- Obtaining x^{k+1} exactly is difficult; inexact solver: how do we control the accuracy?
- Analysis of the Bregman approach (see Wotao Yin's work) gives no complexity bound.
- Solving each subproblem by the APG algorithms? (A sketch of the outer loop is given below.)
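The following sketch implements the outer loop with a fixed µ_k ≡ µ, in which case the update reduces to b^{k+1} = b + (b^k − Ax^k); the inner subproblem is handed to any l1 solver, for example the FISTA sketch above, and the tolerance and iteration counts are arbitrary illustrative choices.

```python
import numpy as np

def bregman_bp(A, b, mu=1.0, n_outer=50, inner_solver=None):
    """Outer augmented Lagrangian (Bregman) loop for min ||x||_1 s.t. Ax = b (sketch)."""
    bk = b.copy()
    x = np.zeros(A.shape[1])
    for _ in range(n_outer):
        # inner problem: min mu*||x||_1 + 0.5*||Ax - bk||_2^2, solved inexactly
        x = inner_solver(A, bk, mu)
        bk = b + (bk - A @ x)                      # add back the residual
        if np.linalg.norm(A @ x - b) <= 1e-8 * np.linalg.norm(b):
            break
    return x

# usage (reusing the fista sketch above):
# x = bregman_bp(A, b, mu=1.0, inner_solver=lambda A, bk, mu: fista(A, bk, mu, n_iter=200))
```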

Outline: ADMM
- Alternating direction augmented Lagrangian methods
- Variable splitting method
- Convergence for problems with two blocks of variables

References
- Wotao Yin, Stanley Osher, Donald Goldfarb, Jerome Darbon, Bregman Iterative Algorithms for l1-Minimization with Applications to Compressed Sensing
- Junfeng Yang, Yin Zhang, Alternating Direction Algorithms for l1-Problems in Compressive Sensing
- Tom Goldstein, Stanley Osher, The Split Bregman Method for L1-Regularized Problems
- B.S. He, H. Yang, S.L. Wang, Alternating Direction Method with Self-Adaptive Penalty Parameters for Monotone Variational Inequalities

Basis pursuit problem
Primal: min ‖x‖_1, s.t. Ax = b
Dual: max b^T λ, s.t. ‖A^T λ‖_∞ ≤ 1
The dual problem is equivalent to
    max b^T λ, s.t. A^T λ = s, ‖s‖_∞ ≤ 1.

Augmented Lagrangian (Bregman) framework
Augmented Lagrangian function:
    L(λ, s, x) := −b^T λ + x^T(A^T λ − s) + (1/(2µ))‖A^T λ − s‖_2^2
Algorithmic framework: compute λ^{k+1} and s^{k+1} at the k-th iteration
    (DL)  min_{λ,s} L(λ, s, x^k), s.t. ‖s‖_∞ ≤ 1
Update the Lagrange multiplier:
    x^{k+1} = x^k + (A^T λ^{k+1} − s^{k+1})/µ
Pros and cons:
- Pros: rich theory, well understood, and a lot of algorithms
- Cons: L(λ, s, x^k) is not separable in λ and s, and the subproblem (DL) is difficult to minimize

An alternating direction minimization scheme
ADMM:
- Divide variables into different blocks according to their roles
- Minimize the augmented Lagrangian function with respect to one block at a time while all other blocks are fixed
    λ^{k+1} = argmin_λ L(λ, s^k, x^k)
    s^{k+1} = argmin_s L(λ^{k+1}, s, x^k), s.t. ‖s‖_∞ ≤ 1
    x^{k+1} = x^k + (A^T λ^{k+1} − s^{k+1})/µ

An alternating direction minimization scheme
Explicit solutions:
    λ^{k+1} = (AA^T)^{-1}(As^k − µ(Ax^k − b))
    s^{k+1} = argmin_s ‖s − A^T λ^{k+1} − µx^k‖_2^2, s.t. ‖s‖_∞ ≤ 1, i.e., s^{k+1} = P_{[−1,1]}(A^T λ^{k+1} + µx^k)
    x^{k+1} = x^k + (A^T λ^{k+1} − s^{k+1})/µ
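These explicit updates transcribe directly into a short loop; the sketch below assumes AA^T is nonsingular, uses a dense solve each iteration (in practice one would factor AA^T once), and follows the sign conventions reconstructed above.

```python
import numpy as np

def admm_bp_dual(A, b, mu=1.0, n_iter=300):
    """ADMM on the dual of basis pursuit; the multiplier x approaches the BP solution."""
    m, n = A.shape
    x, s = np.zeros(n), np.zeros(n)
    AAt = A @ A.T                                    # assumes full row rank
    for _ in range(n_iter):
        lam = np.linalg.solve(AAt, A @ s - mu * (A @ x - b))
        s = np.clip(A.T @ lam + mu * x, -1.0, 1.0)   # projection onto [-1, 1]^n
        x = x + (A.T @ lam - s) / mu                 # multiplier update
    return x
```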

ADMM for BP-denoising
Primal: min ‖x‖_1, s.t. ‖Ax − b‖_2 ≤ σ, which is equivalent to
    min ‖x‖_1, s.t. Ax − b + r = 0, ‖r‖_2 ≤ σ
Lagrangian function:
    L(x, r, λ) := ‖x‖_1 − λ^T(Ax − b + r) + π(‖r‖_2 − σ)
                = ‖x‖_1 − (A^T λ)^T x + π‖r‖_2 − λ^T r + b^T λ − πσ
Hence, the dual problem is:
    max b^T λ − πσ, s.t. ‖A^T λ‖_∞ ≤ 1, ‖λ‖_2 ≤ π

ADMM for BP-denoising
The dual problem is equivalent to:
    max b^T λ − πσ, s.t. A^T λ = s, ‖s‖_∞ ≤ 1, ‖λ‖_2 ≤ π
Augmented Lagrangian function:
    L(λ, s, x) := −b^T λ + πσ + x^T(A^T λ − s) + (1/(2µ))‖A^T λ − s‖_2^2
ADM scheme:
    λ^{k+1} = argmin_λ (1/(2µ))‖A^T λ − s^k‖^2 + (Ax^k − b)^T λ, s.t. ‖λ‖_2 ≤ π_k
    s^{k+1} = argmin_s ‖s − A^T λ^{k+1} − µx^k‖_2^2, s.t. ‖s‖_∞ ≤ 1, i.e., s^{k+1} = P_{[−1,1]}(A^T λ^{k+1} + µx^k)
    π_{k+1} = ‖λ^{k+1}‖_2
    x^{k+1} = x^k + (A^T λ^{k+1} − s^{k+1})/µ

ADMM for the l1-regularized problem
Primal: min µ‖x‖_1 + (1/2)‖Ax − b‖_2^2, which is equivalent to
    min µ‖x‖_1 + (1/2)‖r‖_2^2, s.t. Ax − b = r.
Lagrangian function:
    L(x, r, λ) := µ‖x‖_1 + (1/2)‖r‖_2^2 − λ^T(Ax − b − r)
                = µ‖x‖_1 − (A^T λ)^T x + (1/2)‖r‖_2^2 + λ^T r + b^T λ
Hence, the dual problem is:
    max b^T λ − (1/2)‖λ‖^2, s.t. ‖A^T λ‖_∞ ≤ µ

ADMM for the l1-regularized problem
The dual problem is equivalent to
    max b^T λ − (1/2)‖λ‖^2, s.t. A^T λ = s, ‖s‖_∞ ≤ µ.
Augmented Lagrangian function:
    L(λ, s, x) := −b^T λ + (1/2)‖λ‖^2 + x^T(A^T λ − s) + (1/(2µ))‖A^T λ − s‖_2^2
ADM scheme:
    λ^{k+1} = (AA^T + µI)^{-1}(As^k − µ(Ax^k − b))
    s^{k+1} = argmin_s ‖s − A^T λ^{k+1} − µx^k‖_2^2, s.t. ‖s‖_∞ ≤ µ, i.e., s^{k+1} = P_{[−µ,µ]}(A^T λ^{k+1} + µx^k)
    x^{k+1} = x^k + (A^T λ^{k+1} − s^{k+1})/µ
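The same pattern applies to the regularized dual, now with an (AA^T + µI) solve and a projection onto [−µ, µ]; again a sketch under the sign conventions reconstructed above, with illustrative iteration counts.

```python
import numpy as np

def admm_l1reg_dual(A, b, mu, n_iter=300):
    """ADMM on the dual of min mu*||x||_1 + 0.5*||Ax - b||_2^2 (sketch of the updates above)."""
    m, n = A.shape
    x, s = np.zeros(n), np.zeros(n)
    M = A @ A.T + mu * np.eye(m)                     # regularized normal matrix
    for _ in range(n_iter):
        lam = np.linalg.solve(M, A @ s - mu * (A @ x - b))
        s = np.clip(A.T @ lam + mu * x, -mu, mu)     # projection onto [-mu, mu]^n
        x = x + (A.T @ lam - s) / mu                 # multiplier update
    return x
```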

YALL1
Derive ADMM for the following problems:
- BP: min_{x ∈ C^n} ‖Wx‖_{w,1}, s.t. Ax = b
- L1/L1: min_{x ∈ C^n} ‖Wx‖_{w,1} + (1/ν)‖Ax − b‖_1
- L1/L2: min_{x ∈ C^n} ‖Wx‖_{w,1} + (1/(2ρ))‖Ax − b‖_2^2
- BP+: min_{x ∈ R^n} ‖x‖_{w,1}, s.t. Ax = b, x ≥ 0
- L1/L1+: min_{x ∈ R^n} ‖x‖_{w,1} + (1/ν)‖Ax − b‖_1, s.t. x ≥ 0
- L1/L2+: min_{x ∈ R^n} ‖x‖_{w,1} + (1/(2ρ))‖Ax − b‖_2^2, s.t. x ≥ 0
Here ν, ρ ≥ 0, A ∈ C^{m×n}, b ∈ C^m, x ∈ C^n for the first three and x ∈ R^n for the last three, W ∈ C^{n×n} is a unitary matrix serving as a sparsifying basis, and ‖x‖_{w,1} := ∑_{i=1}^n w_i|x_i|.

Variable splitting
Given A ∈ R^{m×n}, consider min f(x) + g(Ax), which is
    min f(x) + g(y), s.t. Ax = y
Augmented Lagrangian function:
    L(x, y, λ) = f(x) + g(y) − λ^T(Ax − y) + (1/(2µ))‖Ax − y‖_2^2
ADM:
    (P_x): x^{k+1} := argmin_{x ∈ X} L(x, y^k, λ^k)
    (P_y): y^{k+1} := argmin_{y ∈ Y} L(x^{k+1}, y, λ^k)
    (P_λ): λ^{k+1} := λ^k − γ(Ax^{k+1} − y^{k+1})/µ

Variable splitting
Split Bregman (Goldstein and Osher) for anisotropic TV:
    min α‖Du‖_1 + β‖Ψu‖_1 + (1/2)‖Au − f‖_2^2
Introduce y = Du and w = Ψu to obtain
    min α‖y‖_1 + β‖w‖_1 + (1/2)‖Au − f‖_2^2, s.t. y = Du, w = Ψu
Augmented Lagrangian function:
    L := α‖y‖_1 + β‖w‖_1 + (1/2)‖Au − f‖_2^2 − p^T(Du − y) + (1/(2µ))‖Du − y‖_2^2 − q^T(Ψu − w) + (1/(2µ))‖Ψu − w‖_2^2

Variable splitting
The variable u can be obtained from
    (A^T A + (1/µ)(D^T D + I)) u = A^T f + (1/µ)(D^T y + Ψ^T w) + D^T p + Ψ^T q
If A and D are diagonalizable by FFT, then the computational cost is very cheap. For example, A = RF, where both R and D are circulant matrices.
Variables y and w:
    y := shrink(Du − µp, αµ)
    w := shrink(Ψu − µq, βµ)
Apply a few iterations before updating the Lagrange multipliers p and q.
Exercise: isotropic TV
    min α‖Du‖_2 + β‖Ψu‖_1 + (1/2)‖Au − f‖_2^2
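To illustrate the FFT solve, here is a hedged 1-D sketch: if A and D are circulant with first columns a and d, the linear system for u diagonalizes under the DFT (the 2-D imaging case works the same way with 2-D FFTs); the helper name and the 1-D simplification are assumptions for illustration only.

```python
import numpy as np

def solve_u_circulant(a, d, mu, rhs):
    """Solve (A^T A + (1/mu)(D^T D + I)) u = rhs when A, D are circulant (1-D sketch).
    a, d are the first columns of A and D; both matrices diagonalize under the FFT."""
    ah, dh = np.fft.fft(a), np.fft.fft(d)
    denom = np.abs(ah) ** 2 + (np.abs(dh) ** 2 + 1.0) / mu   # eigenvalues of the system matrix
    return np.real(np.fft.ifft(np.fft.fft(rhs) / denom))
```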

FTVd: Fast TV deconvolution
Wang-Yang-Yin-Zhang consider:
    min_u ∑_i ‖D_i u‖ + (1/(2µ))‖Ku − f‖_2^2
Introducing w and a quadratic penalty:
    min_{u,w} ∑_i (‖w_i‖ + (β/2)‖w_i − D_i u‖_2^2) + (1/(2µ))‖Ku − f‖_2^2
Alternating minimization:
- For fixed u, {w_i} can be solved by shrinkage at O(N)
- For fixed {w_i}, u can be solved by FFT at O(N log N)

Outline: Linearized ADMM
- Linearized Bregman and Bregmanized operator splitting
- ADM + proximal point method
Reference: Xiaoqun Zhang, Martin Burger, Stanley Osher, A unified primal-dual algorithm framework based on Bregman iteration

Review of Bregman method
Consider BP: min ‖x‖_1, s.t. Ax = b
Bregman method:
    D_J^{p^k}(x, x^k) := ‖x‖_1 − ‖x^k‖_1 − ⟨p^k, x − x^k⟩
    x^{k+1} := argmin_x µ D_J^{p^k}(x, x^k) + (1/2)‖Ax − b‖_2^2
    p^{k+1} = p^k + (1/µ) A^T(b − Ax^{k+1})
Augmented Lagrangian (updating the multiplier or b):
    x^{k+1} := argmin_x µ‖x‖_1 + (1/2)‖Ax − b^k‖_2^2
    b^{k+1} = b + (b^k − Ax^{k+1})
They are equivalent; see Yin-Osher-Goldfarb-Darbon.

Linearized approaches
Linearized Bregman method:
    x^{k+1} := argmin_x µ D_J^{p^k}(x, x^k) + (A^T(Ax^k − b))^T(x − x^k) + (1/(2δ))‖x − x^k‖_2^2,
    p^{k+1} := p^k − (1/(µδ))(x^{k+1} − x^k) − (1/µ) A^T(Ax^k − b),
which is equivalent to
    x^{k+1} := argmin_x µ‖x‖_1 + (1/(2δ))‖x − v^k‖_2^2
    v^{k+1} := v^k − δ A^T(Ax^{k+1} − b)
Bregmanized operator splitting:
    x^{k+1} := argmin_x µ‖x‖_1 + (A^T(Ax^k − b^k))^T(x − x^k) + (1/(2δ))‖x − x^k‖_2^2
    b^{k+1} = b + (b^k − Ax^{k+1})
Are they equivalent?

Linearized approaches
Linearized Bregman method:
    x^{k+1} := argmin_x µ D_J^{p^k}(x, x^k) + (A^T(Ax^k − b))^T(x − x^k) + (1/(2δ))‖x − x^k‖_2^2,
    p^{k+1} := p^k − (1/(µδ))(x^{k+1} − x^k) − (1/µ) A^T(Ax^k − b),
which is equivalent to
    x^{k+1} := shrink(v^k, µδ)
    v^{k+1} := v^k − δ A^T(Ax^{k+1} − b)
or
    x^{k+1} := shrink(δ A^T b^k, µδ)
    b^{k+1} := b + (b^k − Ax^{k+1})
Bregmanized operator splitting:
    x^{k+1} := shrink(x^k − δ A^T(Ax^k − b^k), µδ) = shrink(δ A^T b^k + x^k − δ A^T A x^k, µδ)
    b^{k+1} = b + (b^k − Ax^{k+1})
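The two-line form of linearized Bregman is easy to prototype; the sketch below fixes δ = 1/‖A‖_2^2 and the iteration count arbitrarily.

```python
import numpy as np

def linearized_bregman(A, b, mu=5.0, delta=None, n_iter=2000):
    """Linearized Bregman: x^{k+1} = shrink(v^k, mu*delta), v^{k+1} = v^k - delta*A^T(Ax^{k+1} - b)."""
    if delta is None:
        delta = 1.0 / np.linalg.norm(A, 2) ** 2
    v = np.zeros(A.shape[1])
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = np.sign(v) * np.maximum(np.abs(v) - mu * delta, 0.0)  # shrinkage step
        v = v - delta * A.T @ (A @ x - b)                         # residual accumulation
    return x
```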

Linearized approaches
Linearized Bregman: if the sequence x^k converges and p^k is bounded, then the limit of x^k is the unique solution of
    min µ‖x‖_1 + (1/(2δ))‖x‖_2^2  s.t. Ax = b.
For µ large enough, the limit solution solves BP.
Exact regularization: this holds once δ exceeds a (problem-dependent) threshold.
What about Bregmanized operator splitting?

Primal ADM for the l1-regularized problem
Primal: min µ‖x‖_1 + (1/2)‖Ax − b‖_2^2, which is equivalent to
    min µ‖x‖_1 + (1/2)‖r‖_2^2, s.t. Ax − b = r.
Augmented Lagrangian function:
    L(x, r, λ) = µ‖x‖_1 + (1/2)‖r‖_2^2 − λ^T(Ax − b − r) + (1/(2δ))‖Ax − b − r‖_2^2
ADM scheme:
    x^{k+1} = argmin_x µ‖x‖_1 + (1/(2δ))‖Ax − b − r^k − δλ^k‖_2^2   (a subproblem of the same form as the original problem)
    r^{k+1} = argmin_r (1/2)‖r‖_2^2 + (1/(2δ))‖Ax^{k+1} − b − r − δλ^k‖_2^2
    λ^{k+1} = λ^k − (Ax^{k+1} − b − r^{k+1})/δ

Primal ADM for the l1-regularized problem
Primal: min µ‖x‖_1 + (1/2)‖Ax − b‖_2^2, which is equivalent to
    min µ‖x‖_1 + (1/2)‖r‖_2^2, s.t. Ax − b = r.
Augmented Lagrangian function:
    L(x, r, λ) = µ‖x‖_1 + (1/2)‖r‖_2^2 − λ^T(Ax − b − r) + (1/(2δ))‖Ax − b − r‖_2^2
Linearized ADM scheme (with g^k the gradient at x^k of the smooth term (1/(2δ))‖Ax − b − r^k − δλ^k‖_2^2):
    x^{k+1} = argmin_x µ‖x‖_1 + (g^k)^T(x − x^k) + (1/(2τ))‖x − x^k‖_2^2
    r^{k+1} = argmin_r (1/2)‖r‖_2^2 + (1/(2δ))‖Ax^{k+1} − b − r − δλ^k‖_2^2
    λ^{k+1} = λ^k − (Ax^{k+1} − b − r^{k+1})/δ
Convergence of the linearized scheme?
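A sketch of this linearized scheme under the conventions reconstructed above (constraint Ax − b = r, g^k the gradient of the smooth x-term, multiplier step λ^{k+1} = λ^k − (Ax^{k+1} − b − r^{k+1})/δ); the step size τ = δ/‖A‖_2^2, the value of δ and the iteration count are illustrative choices.

```python
import numpy as np

def linearized_primal_adm(A, b, mu, delta=1.0, n_iter=500):
    """Linearized primal ADM for min mu*||x||_1 + 0.5*||Ax - b||_2^2 (sketch)."""
    m, n = A.shape
    x, r, lam = np.zeros(n), np.zeros(m), np.zeros(m)
    tau = delta / np.linalg.norm(A, 2) ** 2                   # proximal step for the x-update
    for _ in range(n_iter):
        g = A.T @ (A @ x - b - r - delta * lam) / delta       # gradient of the smooth x-term
        z = x - tau * g
        x = np.sign(z) * np.maximum(np.abs(z) - mu * tau, 0.0)
        r = (A @ x - b - delta * lam) / (1.0 + delta)         # closed-form r-update
        lam = lam - (A @ x - b - r) / delta                   # multiplier update
    return x
```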

Generalized algorithm
Consider
    min f(x), s.t. Ax = b
Proximal-point method:
    x^{k+1} := argmin_x f(x) − ⟨λ^k, Ax − b⟩ + (1/(2δ))‖Ax − b‖_2^2 + (1/2)‖x − x^k‖_Q^2
    Cλ^{k+1} := Cλ^k + (b − Ax^{k+1})
- Augmented Lagrangian or Bregman method if Q = 0 and C = δI
- Proximal point method by Rockafellar if Q = I and C = γI
- Bregmanized operator splitting if Q = (1/δ)(I − A^T A)
Theoretical results: lim_k ‖Ax^k − b‖_2 = 0, lim_k f(x^k) = f(x̄), and all limit points of (x^k, λ^k) are saddle points.

Convergence proof
From the x-subproblem:
    ∂f(x^{k+1}) ∋ s^{k+1} := A^T λ^k − (1/δ) A^T(Ax^{k+1} − b) − Q(x^{k+1} − x^k)
Assume (x̄, λ̄) is a saddle point; then Ax̄ = b and s̄ − A^T λ̄ = 0 for some s̄ ∈ ∂f(x̄).
Let s_e^{k+1} = s^{k+1} − s̄, x_e^{k+1} = x^{k+1} − x̄ and λ_e^{k+1} = λ^{k+1} − λ̄:
    s_e^{k+1} + (1/δ) A^T A x_e^{k+1} + Q x_e^{k+1} = Q x_e^k + A^T λ_e^k,
    C λ_e^{k+1} = C λ_e^k − A x_e^{k+1}
Taking the inner product with x_e^{k+1} on both sides of the first equality:
    (1/2)(‖x_e^{k+1}‖_Q^2 + ‖x^{k+1} − x^k‖_Q^2 − ‖x_e^k‖_Q^2) = −⟨s_e^{k+1}, x_e^{k+1}⟩ − (1/δ)‖A x_e^{k+1}‖_2^2 + ⟨λ_e^k, A x_e^{k+1}⟩
Taking the inner product with λ_e^{k+1} on both sides of the second equality:
    (1/2)(‖λ_e^{k+1}‖_C^2 − ‖λ_e^k‖_C^2 − ‖A x_e^{k+1}‖_{C^{-1}}^2) = −⟨λ_e^k, A x_e^{k+1}⟩
Adding the above (with w^k = (x^k, λ^k) and ‖w_e^k‖^2 := ‖x_e^k‖_Q^2 + ‖λ_e^k‖_C^2):
    ‖w_e^{k+1}‖^2 + ‖x^{k+1} − x^k‖_Q^2 + 2⟨s_e^{k+1}, x_e^{k+1}⟩ + (2/δ)‖A x_e^{k+1}‖_2^2 − ‖A x_e^{k+1}‖_{C^{-1}}^2 = ‖w_e^k‖^2

Convergence proof
The convexity of f(x) gives ⟨s_e^{k+1}, x_e^{k+1}⟩ ≥ 0. If 2/δ > 1/λ_min(C), then
    2⟨s_e^{k+1}, x_e^{k+1}⟩ + (2/δ)‖A x_e^{k+1}‖_2^2 − ‖A x_e^{k+1}‖_{C^{-1}}^2 ≥ 0
Hence ‖w_e^{k+1}‖^2 ≤ ‖w_e^k‖^2, which implies the boundedness of w^k = (x^k, λ^k). Furthermore,
    ∑_k ( 2⟨s_e^{k+1}, x_e^{k+1}⟩ + ‖x^{k+1} − x^k‖_Q^2 + (2/δ)‖A x_e^{k+1}‖_2^2 − ‖A x_e^{k+1}‖_{C^{-1}}^2 ) ≤ ‖w_e^0‖^2 < ∞
Hence 0 = lim_k A x_e^k = lim_k (Ax^k − b), and lim_k (s^k − A^T λ^k) = 0 follows from
    s^{k+1} := A^T λ^k − (1/δ) A^T(Ax^{k+1} − b) − Q(x^{k+1} − x^k)

Generalized algorithm
Consider
    min f(x) + g(z), s.t. Bx = z
Proximal-point method:
    x^{k+1} := argmin_x f(x) − ⟨λ^k, Bx⟩ + (1/(2δ))‖Bx − z^k‖^2 + (1/2)‖x − x^k‖_Q^2
    z^{k+1} := argmin_z g(z) + ⟨λ^k, z⟩ + (1/(2δ))‖Bx^{k+1} − z‖^2 + (1/2)‖z − z^k‖_M^2
    Cλ^{k+1} := Cλ^k − (Bx^{k+1} − z^{k+1})
- ADM if Q = M = 0 and C = δI
- Proximal point method by Rockafellar if Q = I and C = γI
- Bregmanized operator splitting if Q = (1/δ)(I − A^T A)
Theoretical results: lim_k ‖Bx^k − z^k‖_2 = 0, lim_k f(x^k) = f(x̄), lim_k g(z^k) = g(z̄), and all limit points of (x^k, z^k, λ^k) are saddle points.
