Lecture: Algorithms for Compressed Sensing
1 1/56 Lecture: Algorithms for Compressed Sensing Zaiwen Wen Beijing International Center for Mathematical Research Peking University Acknowledgement: part of these slides is based on Prof. Lieven Vandenberghe's and Prof. Wotao Yin's lecture notes
2 Outline 2/56 Proximal gradient method Complexity analysis of the proximal gradient method Alternating direction method of multipliers (ADMM) Linearized alternating direction method of multipliers Check lecture notes at http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html
3 3/56 l1-regularized least squares problem
Consider min ψ_µ(x) := µ‖x‖_1 + (1/2)‖Ax − b‖_2^2.
Approaches: interior point method: l1_ls; spectral gradient method: GPSR; fixed-point continuation method: FPC; active set method: FPC_AS; alternating direction augmented Lagrangian method; Nesterov's optimal first-order method; many others.
4 Subgradient 4/56
Recall the basic inequality for a convex differentiable f: f(y) ≥ f(x) + ∇f(x)ᵀ(y − x).
g is a subgradient of a convex function f at x ∈ dom f if f(y) ≥ f(x) + gᵀ(y − x), ∀y ∈ dom f.
(Figure: g_2, g_3 are subgradients at x_2; g_1 is a subgradient at x_1.)
5 Optimality conditions: unconstrained 5/56
x* minimizes f(x) if and only if 0 ∈ ∂f(x*).
Proof: by definition, f(y) ≥ f(x*) + 0ᵀ(y − x*) for all y ⇔ 0 ∈ ∂f(x*).
6 6/56 Optimality conditions: constrained
min f_0(x) s.t. f_i(x) ≤ 0, i = 1, ..., m.
From Lagrange duality: if strong duality holds, then x*, λ* are primal, dual optimal if and only if:
x* is primal feasible;
λ* ≥ 0;
complementarity: λ_i* f_i(x*) = 0 for i = 1, ..., m;
x* is a minimizer of min_x L(x, λ*) = f_0(x) + Σ_i λ_i* f_i(x), i.e., 0 ∈ ∂_x L(x*, λ*) = ∂f_0(x*) + Σ_i λ_i* ∂f_i(x*).
7 Proximal Gradient Method 7/56
Let f(x) = (1/2)‖Ax − b‖_2^2, with gradient ∇f(x) = Aᵀ(Ax − b). Consider min ψ_µ(x) := µ‖x‖_1 + f(x).
First-order approximation + proximal term:
x^{k+1} := argmin_{x∈R^n} µ‖x‖_1 + ∇f(x^k)ᵀ(x − x^k) + (1/2τ)‖x − x^k‖_2^2 = shrink(x^k − τ∇f(x^k), µτ)
Forward-backward operator splitting: x^{k+1} := (I + τT_1)^{-1}(I − τT_2)x^k, since
0 ∈ ∂φ(x) := ∂(µ‖x‖_1) + ∇f(x) = T_1(x) + T_2(x) = T(x)
⇔ 0 ∈ (x + τT_1(x)) − (x − τT_2(x)) ⇔ x = (I + τT_1)^{-1}(I − τT_2)x.
Gradient step: bring in candidates for nonzero components. Shrinkage step: eliminate some of them by soft thresholding.
8 Shrinkage (soft thresholding) 8/56
shrink(y, ν) := argmin_{x∈R} ν|x| + (1/2)(x − y)^2 = sgn(y) max(|y| − ν, 0) = y − ν sgn(y) if |y| > ν, and 0 otherwise.
Chambolle, DeVore, Lee and Lucier; Figueiredo, Nowak and Wright; Elad, Matalon and Zibulevsky; Hale, Yin and Zhang; Darbon, Osher; many others.
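The scalar formula extends componentwise to vectors; a minimal NumPy sketch (the name `shrink` follows the slides, the implementation is ours):

```python
import numpy as np

def shrink(y, nu):
    # soft-thresholding: argmin_x nu*||x||_1 + 0.5*||x - y||_2^2,
    # applied componentwise: sgn(y) * max(|y| - nu, 0)
    return np.sign(y) * np.maximum(np.abs(y) - nu, 0.0)
```

For example, shrink([3.0, -0.5, 1.5], 1.0) gives [2.0, 0.0, 0.5]: entries with magnitude below the threshold are zeroed, the rest are pulled toward zero by ν.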
9 Proximal gradient method for general problems 9/56
Consider the model min F(x) := f(x) + r(x), where f(x) is convex and differentiable, and r(x) is convex but may be nondifferentiable.
General scheme: linearize f(x) and add a proximal term:
x^{k+1} := argmin_{x∈R^n} r(x) + ∇f(x^k)ᵀ(x − x^k) + (1/2τ)‖x − x^k‖_2^2
= argmin_{x∈R^n} τ r(x) + (1/2)‖x − (x^k − τ∇f(x^k))‖_2^2
= prox_{τr}(x^k − τ∇f(x^k))
Proximal operator: prox_r(y) := argmin_x r(x) + (1/2)‖x − y‖_2^2.
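A minimal sketch of this general scheme, applied to the l1-regularized least squares example from earlier slides (the synthetic A, b and the step choice τ = 1/L are our illustrative assumptions):

```python
import numpy as np

def shrink(y, nu):
    # soft-thresholding: prox of nu*||.||_1
    return np.sign(y) * np.maximum(np.abs(y) - nu, 0.0)

def proximal_gradient(grad_f, prox_r, x0, tau, n_iter=500):
    # x_{k+1} = prox_{tau r}(x_k - tau * grad_f(x_k))
    x = x0.copy()
    for _ in range(n_iter):
        x = prox_r(x - tau * grad_f(x))
    return x

# l1-regularized least squares: mu*||x||_1 + 0.5*||Ax - b||_2^2
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))
x_true = np.zeros(60)
x_true[:4] = [1.0, -2.0, 3.0, -1.5]
b = A @ x_true
mu = 0.1
tau = 1.0 / np.linalg.norm(A, 2) ** 2      # step tau <= 1/L with L = ||A||_2^2
grad_f = lambda x: A.T @ (A @ x - b)
prox_r = lambda y: shrink(y, mu * tau)     # prox of tau * mu * ||.||_1
x_hat = proximal_gradient(grad_f, prox_r, np.zeros(60), tau)
```

With τ ≤ 1/L the iteration decreases F monotonically, so the objective at `x_hat` is below the objective at the zero starting point.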
10 10/56 Proximal mapping
If r is convex and closed (has a closed epigraph), then prox_r(y) := argmin_x r(x) + (1/2)‖x − y‖_2^2 exists and is unique for all y.
From the optimality conditions of the minimization in the definition:
x = prox_r(y) ⇔ y − x ∈ ∂r(x) ⇔ r(z) ≥ r(x) + (y − x)ᵀ(z − x), ∀z.
The subdifferential of a convex function is a monotone operator: (u − v)ᵀ(x − y) ≥ 0, ∀x, y, u ∈ ∂f(x), v ∈ ∂f(y).
Proof: combine the two inequalities f(y) ≥ f(x) + uᵀ(y − x) and f(x) ≥ f(y) + vᵀ(x − y).
11 11/56 Nonexpansiveness
If u = prox_r(x) and v = prox_r(y), then (u − v)ᵀ(x − y) ≥ ‖u − v‖_2^2, i.e., prox_r is firmly nonexpansive (co-coercive with constant 1).
This follows from the last page and monotonicity of subgradients:
x − u ∈ ∂r(u), y − v ∈ ∂r(v) ⇒ ((x − u) − (y − v))ᵀ(u − v) ≥ 0,
which implies (by the Cauchy-Schwarz inequality) ‖prox_r(x) − prox_r(y)‖_2 ≤ ‖x − y‖_2.
12 12/56 Global convergence
The proximal mapping is nonexpansive. Under certain assumptions, h(x) = x − τ∇f(x) is also nonexpansive: ‖h(x) − h(x*)‖ ≤ ‖x − x*‖, and r(x) = r(x*) whenever equality holds.
If x* is a fixed point, i.e., prox_{τr}(h(x*)) = x*, then
‖prox_{τr}(h(x)) − x*‖ = ‖prox_{τr}(h(x)) − prox_{τr}(h(x*))‖ ≤ ‖x − x*‖.
13 Proximal gradient method for l1-minimization 13/56
Advantages: low computational cost (first-order method); finite convergence of the nondegenerate support; global q-linear convergence.
Disadvantages: takes many steps when the support is relatively large; slow convergence: is the bottleneck degenerate support identification or magnitude recovery?
Strategies for improving shrinkage: adjust the step τ dynamically (use τ_k); line search x^{k+1} = x^k + α_k(shrink(x^k − τ_k∇f^k, µτ_k) − x^k); continuation: approximately solve for µ_0 > µ_1 > ... > µ_p = µ (FPC); debiasing or subspace optimization.
14 Line search approach 14/56
Let x^k(τ_k) = shrink(x^k − τ_k∇f^k, µτ_k), and set x^{k+1} = x^k + α_k(x^k(τ_k) − x^k) = x^k + α_k d^k.
Choosing τ_k: Barzilai-Borwein method. With ∇f^k = Aᵀ(Ax^k − b), s^{k−1} = x^k − x^{k−1} and y^{k−1} = ∇f^k − ∇f^{k−1},
τ_{k,BB1} = ((s^{k−1})ᵀ s^{k−1}) / ((s^{k−1})ᵀ y^{k−1}) or τ_{k,BB2} = ((s^{k−1})ᵀ y^{k−1}) / ((y^{k−1})ᵀ y^{k−1}).
Choosing α_k: Armijo-like line search ψ_µ(x^k + α_k d^k) ≤ C_k + σα_k Δ_k, where
FPC: Δ_k := (∇f^k)ᵀ d^k; FPC_AS: Δ_k := (∇f^k)ᵀ d^k + µ‖x^k(τ_k)‖_1 − µ‖x^k‖_1;
C_k = (ηQ_{k−1}C_{k−1} + ψ_µ(x^k))/Q_k, Q_k = ηQ_{k−1} + 1, C_0 = ψ_µ(x^0) and Q_0 = 1 (Zhang and Hager).
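The two BB formulas are computed directly from the difference vectors; a small sketch (the helper name `bb_steps` is ours, not from FPC):

```python
import numpy as np

def bb_steps(x, x_prev, g, g_prev):
    # Barzilai-Borwein step sizes from s^{k-1} = x^k - x^{k-1}, y^{k-1} = g^k - g^{k-1}
    s = x - x_prev
    y = g - g_prev
    tau_bb1 = (s @ s) / (s @ y)   # (s's) / (s'y)
    tau_bb2 = (s @ y) / (y @ y)   # (s'y) / (y'y)
    return tau_bb1, tau_bb2
```

For a quadratic f(x) = (1/2)xᵀHx we have y = Hs, so both step sizes are inverse Rayleigh quotients of H and lie between 1/λ_max(H) and 1/λ_min(H).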
15 15/56 Outline: Complexity Analysis Amir Beck and Marc Teboulle, A fast iterative shrinkage thresholding algorithm for linear inverse problems Paul Tseng, On accelerated proximal gradient methods for convex-concave optimization Paul Tseng, Approximation accuracy, gradient methods and error bound for structured convex optimization N. S. Aybat and G. Iyengar, A first-order augmented Lagrangian method for compressed sensing
16 Proximal Gradient/ISTA/FPC 16/56
The following terminologies refer to the same algorithm: proximal gradient method; ISTA: iterative shrinkage-thresholding algorithm; FPC: fixed-point continuation method.
For min F(x) := µ‖x‖_1 + (1/2)‖Ax − b‖_2^2, ISTA/FPC computes
x^{k+1} := argmin µ‖x‖_1 + (∇f^k)ᵀ(x − x^k) + (1/2τ)‖x − x^k‖_2^2 = shrink(x^k − τ∇f^k, µτ).
Complexity? How can we speed it up?
17 Generalization of ISTA 17/56
Consider a solvable model min F(x) := f(x) + r(x), where r(x) is continuous and convex, possibly nonsmooth, and f(x) is continuously differentiable with Lipschitz continuous gradient: ‖∇f(x) − ∇f(y)‖_2 ≤ L_f‖x − y‖_2.
The cost of the following subproblem is assumed tractable:
p_L(y) := argmin_x Q_L(x, y) := f(y) + ⟨∇f(y), x − y⟩ + (L/2)‖x − y‖_2^2 + r(x)
= argmin_x r(x) + (L/2)‖x − (y − (1/L)∇f(y))‖_2^2 = prox_{r/L}(y − (1/L)∇f(y))
ISTA basic iteration: x^k = p_L(x^{k−1}).
18 Quadratic upper bound 18/56
Suppose ∇f is Lipschitz continuous with parameter L and dom f is convex. Then
f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (L/2)‖y − x‖_2^2, ∀x, y.
Proof: let v = y − x. Then
f(y) = f(x) + ∇f(x)ᵀv + ∫_0^1 (∇f(x + tv) − ∇f(x))ᵀ v dt
≤ f(x) + ∇f(x)ᵀv + ∫_0^1 ‖∇f(x + tv) − ∇f(x)‖_2 ‖v‖_2 dt
≤ f(x) + ∇f(x)ᵀv + ∫_0^1 Lt‖v‖_2^2 dt = f(x) + ∇f(x)ᵀv + (L/2)‖y − x‖_2^2.
The above inequality also implies that F(p_L(y)) ≤ Q_L(p_L(y), y).
19 19/56 Lemma: assume F(p_L(y)) ≤ Q_L(p_L(y), y). Then for any x ∈ R^n,
F(x) − F(p_L(y)) ≥ (L/2)‖p_L(y) − y‖^2 + L⟨y − x, p_L(y) − y⟩ = (L/2)(‖p_L(y) − x‖_2^2 − ‖x − y‖_2^2).
Proof: ∇f(y) + L(p_L(y) − y) + z = 0 for some z ∈ ∂r(p_L(y)). The convexity of f and r gives
f(x) ≥ f(y) + ∇f(y)ᵀ(x − y), r(x) ≥ r(p_L(y)) + zᵀ(x − p_L(y)).
Using the definition of Q_L(p_L(y), y) and z:
F(x) − F(p_L(y)) ≥ F(x) − Q_L(p_L(y), y) ≥ (L/2)‖p_L(y) − y‖^2 + (x − p_L(y))ᵀ(∇f(y) + z).
20 Complexity analysis of ISTA 20/56
Complexity result: F(x^k) − F(x*) ≤ L_f‖x^0 − x*‖_2^2 / (2k).
Let x = x*, y = x^j in the lemma; then (2/L)(F(x*) − F(x^{j+1})) ≥ ‖x* − x^{j+1}‖_2^2 − ‖x* − x^j‖_2^2.
Summing over j = 0, ..., k−1 gives (2/L)(kF(x*) − Σ_{j=0}^{k−1} F(x^{j+1})) ≥ ‖x* − x^k‖_2^2 − ‖x* − x^0‖_2^2.
Letting x = y = x^j yields (2/L)(F(x^j) − F(x^{j+1})) ≥ ‖x^j − x^{j+1}‖_2^2.
Multiplying the last inequality by j and summing over j = 0, ..., k−1 gives
(2/L)(−kF(x^k) + Σ_{j=0}^{k−1} F(x^{j+1})) ≥ Σ_{j=0}^{k−1} j‖x^j − x^{j+1}‖_2^2.
Adding the two summed inequalities, therefore
(2k/L)(F(x*) − F(x^k)) ≥ ‖x* − x^k‖_2^2 + Σ_{j=0}^{k−1} j‖x^j − x^{j+1}‖_2^2 − ‖x* − x^0‖_2^2 ≥ −‖x* − x^0‖_2^2.
21 FISTA: Accelerated proximal gradient 21/56
Given y^1 = x^0 and t_1 = 1, compute:
x^k = p_L(y^k)
t_{k+1} = (1 + √(1 + 4t_k^2)) / 2
y^{k+1} = x^k + ((t_k − 1)/t_{k+1})(x^k − x^{k−1})
Complexity result: F(x^k) − F(x*) ≤ 2L_f‖x^0 − x*‖_2^2 / (k + 1)^2.
Lipschitz constant L_f unknown? Choose L by backtracking so that F(p_L(y^k)) ≤ Q_L(p_L(y^k), y^k).
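The three-line iteration above can be sketched in a few lines of NumPy for the l1-regularized least squares model; the fixed step 1/L with L = ‖A‖_2^2 and the synthetic data in the test are our illustrative assumptions:

```python
import numpy as np

def fista(A, b, mu, n_iter=300):
    # FISTA for min mu*||x||_1 + 0.5*||Ax - b||_2^2, fixed step 1/L
    L = np.linalg.norm(A, 2) ** 2                      # L_f = ||A'A||_2
    shrink = lambda y, nu: np.sign(y) * np.maximum(np.abs(y) - nu, 0.0)
    x = np.zeros(A.shape[1])
    y, t = x, 1.0
    for _ in range(n_iter):
        x_new = shrink(y - A.T @ (A @ y - b) / L, mu / L)   # x_k = p_L(y_k)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)       # momentum/extrapolation
        x, t = x_new, t_new
    return x
```

Unlike plain ISTA, the objective is not guaranteed to decrease at every step, but the O(1/k^2) bound above applies to the x iterates.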
22 22/56 Complexity analysis of FISTA
Let v_k := F(x^k) − F(x*) and u_k := t_k x^k − (t_k − 1)x^{k−1} − x*. Then
(2/L)(t_k^2 v_k − t_{k+1}^2 v_{k+1}) ≥ ‖u_{k+1}‖_2^2 − ‖u_k‖_2^2.
If a_k − a_{k+1} ≥ b_{k+1} − b_k with a_1 + b_1 ≤ c, then a_k ≤ c. Moreover t_k ≥ (k + 1)/2.
Let a_k = (2/L)t_k^2 v_k, b_k = ‖u_k‖_2^2, and c = ‖x^0 − x*‖_2^2. Check a_1 + b_1 ≤ c:
F(x*) − F(x^1) = F(x*) − F(p_L(y^1)) ≥ (L/2)‖p_L(y^1) − y^1‖_2^2 + L⟨y^1 − x*, p_L(y^1) − y^1⟩ = (L/2)(‖x^1 − x*‖_2^2 − ‖y^1 − x*‖_2^2).
23 Complexity analysis of FISTA 23/56
Apply the lemma with (x = x^k, y = y^{k+1}) and (x = x*, y = y^{k+1}), and note x^{k+1} = p_L(y^{k+1}):
(2/L)(v_k − v_{k+1}) ≥ ‖x^{k+1} − y^{k+1}‖^2 + 2⟨x^{k+1} − y^{k+1}, y^{k+1} − x^k⟩ (1)
−(2/L)v_{k+1} ≥ ‖x^{k+1} − y^{k+1}‖^2 + 2⟨x^{k+1} − y^{k+1}, y^{k+1} − x*⟩ (2)
Computing ((1) × (t_{k+1} − 1) + (2)) × t_{k+1} and using t_k^2 = t_{k+1}^2 − t_{k+1}:
(2/L)(t_k^2 v_k − t_{k+1}^2 v_{k+1}) ≥ ‖t_{k+1}(x^{k+1} − y^{k+1})‖^2 + 2t_{k+1}⟨x^{k+1} − y^{k+1}, t_{k+1}y^{k+1} − (t_{k+1} − 1)x^k − x*⟩.
With the identity ‖b − a‖^2 + 2⟨b − a, a − c⟩ = ‖b − c‖_2^2 − ‖a − c‖_2^2 and y^{k+1} = x^k + ((t_k − 1)/t_{k+1})(x^k − x^{k−1}):
(2/L)(t_k^2 v_k − t_{k+1}^2 v_{k+1}) ≥ ‖t_{k+1}x^{k+1} − (t_{k+1} − 1)x^k − x*‖_2^2 − ‖t_{k+1}y^{k+1} − (t_{k+1} − 1)x^k − x*‖_2^2 = ‖u_{k+1}‖_2^2 − ‖u_k‖_2^2.
24 Proximal gradient method 24/56
Consider the model min F(x) := f(x) + r(x). Linearize f(x):
l_f(x, y) := f(y) + ⟨∇f(y), x − y⟩ + r(x)
Given a strictly convex function h(x), the Bregman distance is D(x, y) := h(x) − h(y) − ⟨∇h(y), x − y⟩.
For example, take h(x) = (1/2)‖x‖_2^2; then D(x, y) = (1/2)‖x − y‖_2^2.
The proximal gradient method can also be written as x^{k+1} := argmin_x l_f(x, x^k) + L D(x, x^k).
25 APG Variant 1 25/56
Accelerated proximal gradient (APG): set x^{−1} = x^0 and θ_{−1} = θ_0 = 1:
y^k = x^k + θ_k(θ_{k−1}^{−1} − 1)(x^k − x^{k−1})
x^{k+1} = argmin_x l_f(x, y^k) + (L/2)‖x − y^k‖_2^2
θ_{k+1} = (√(θ_k^4 + 4θ_k^2) − θ_k^2) / 2
Question: what is the difference between θ_k and t_k?
26 APG Variant 2 26/56
Replace (1/2)‖x − y^k‖_2^2 by the Bregman distance D(x, z^k). Set x^0, z^0 and θ_0 = 1:
y^k = (1 − θ_k)x^k + θ_k z^k
z^{k+1} = argmin_x l_f(x, y^k) + θ_k L D(x, z^k)
x^{k+1} = (1 − θ_k)x^k + θ_k z^{k+1}
θ_{k+1} = (√(θ_k^4 + 4θ_k^2) − θ_k^2) / 2
27 APG Variant 3 27/56
Set x^0, z^0 := argmin_x h(x) and θ_0 = 1:
y^k = (1 − θ_k)x^k + θ_k z^k
z^{k+1} = argmin_x Σ_{i=0}^{k} l_f(x, y^i)/θ_i + L h(x)
x^{k+1} = (1 − θ_k)x^k + θ_k z^{k+1}
θ_{k+1} = (√(θ_k^4 + 4θ_k^2) − θ_k^2) / 2
28 Complexity analysis 28/56
Proximal gradient method: F(x^k) ≤ F(x) + (1/k) L D(x, x^0)
APG1: F(x^k) ≤ F(x) + (4/(k + 1)^2) L D(x, x^0)
APG2: F(x^k) ≤ F(x) + (4/(k + 1)^2) L D(x, z^0)
APG3: F(x^k) ≤ F(x) + (4/(k + 1)^2) L (h(x) − h(x^0))
29 Complexity analysis 29/56
Let f(x) be convex with Lipschitz continuous gradient ∇f(x); then F(x) ≥ l_f(x, y) ≥ F(x) − (L/2)‖x − y‖_2^2.
For any proper lsc convex function ψ(x), if z^+ = argmin_x ψ(x) + D(x, z) and h(x) is differentiable at z^+, then
ψ(x) + D(x, z) ≥ ψ(z^+) + D(x, z^+) + D(z^+, z).
Proof: the optimality at z^+ and the definition of the subgradient give ψ(x) ≥ ψ(z^+) − ⟨∇_x D(z^+, z), x − z^+⟩.
Note ∇_x D(z^+, z) = ∇h(z^+) − ∇h(z). Rearranging terms yields
ψ(x) − ∇h(z)ᵀ(x − z) ≥ ψ(z^+) − ∇h(z)ᵀ(z^+ − z) − ∇h(z^+)ᵀ(x − z^+).
Adding h(x) − h(z) to both sides completes the proof.
30 30/56 Complexity analysis of APG2
F(x^{k+1}) ≤ l_f(x^{k+1}, y^k) + (L/2)‖x^{k+1} − y^k‖_2^2
= l_f((1 − θ_k)x^k + θ_k z^{k+1}, y^k) + (Lθ_k^2/2)‖z^{k+1} − z^k‖_2^2
≤ (1 − θ_k) l_f(x^k, y^k) + θ_k l_f(z^{k+1}, y^k) + θ_k^2 L D(z^{k+1}, z^k)
≤ (1 − θ_k) l_f(x^k, y^k) + θ_k (l_f(x, y^k) + θ_k L D(x, z^k) − θ_k L D(x, z^{k+1}))
≤ (1 − θ_k) F(x^k) + θ_k F(x) + θ_k^2 L D(x, z^k) − θ_k^2 L D(x, z^{k+1})
The third inequality applies the previous lemma to the subproblem min_x l_f(x, y^k) + θ_k L D(x, z^k).
Let e_k = F(x^k) − F(x) and Δ_k = L D(x, z^k); then e_{k+1} ≤ (1 − θ_k)e_k + θ_k^2 Δ_k − θ_k^2 Δ_{k+1}.
31 Complexity analysis of APG2 31/56
Dividing both sides by θ_k^2 and using (1 − θ_{k+1})/θ_{k+1}^2 = 1/θ_k^2:
((1 − θ_{k+1})/θ_{k+1}^2) e_{k+1} + Δ_{k+1} ≤ ((1 − θ_k)/θ_k^2) e_k + Δ_k
Hence ((1 − θ_{k+1})/θ_{k+1}^2) e_{k+1} + Δ_{k+1} ≤ ((1 − θ_0)/θ_0^2) e_0 + Δ_0.
Using (1 − θ_{k+1})/θ_{k+1}^2 = 1/θ_k^2 and θ_0 = 1: (1/θ_k^2) e_{k+1} ≤ Δ_0 = L D(x, z^0).
32 An application to the basis pursuit problem 32/56
Consider min ‖x‖_1 s.t. Ax = b. Augmented Lagrangian (Bregman) framework (b^1 = b, k = 1):
x^k := argmin_x F_k(x) := µ_k‖x‖_1 + (1/2)‖Ax − b^k‖_2^2
b^{k+1} := b + (µ_{k+1}/µ_k)(b^k − Ax^k)
Obtaining x^k exactly is difficult. Inexact solver: how do we control the accuracy?
Analysis of the Bregman approach (see Yin-Osher-Goldfarb-Darbon) gives no complexity bound. Solve each subproblem by the APG algorithms?
33 33/56 Outline: ADMM Alternating direction augmented Lagrangian methods Variable splitting method Convergence for problems with two blocks of variables
34 34/56 References Wotao Yin, Stanley Osher, Donald Goldfarb, Jerome Darbon, Bregman Iterative Algorithms for l1-Minimization with Applications to Compressed Sensing Junfeng Yang, Yin Zhang, Alternating Direction Algorithms for l1-Problems in Compressed Sensing Tom Goldstein, Stanley Osher, The Split Bregman Method for L1-Regularized Problems B.S. He, H. Yang, S.L. Wang, Alternating Direction Method with Self-Adaptive Penalty Parameters for Monotone Variational Inequalities
35 35/56 Basis pursuit problem
Primal: min ‖x‖_1, s.t. Ax = b
Dual: max bᵀλ, s.t. ‖Aᵀλ‖_∞ ≤ 1
The dual problem is equivalent to max bᵀλ, s.t. Aᵀλ = s, ‖s‖_∞ ≤ 1.
36 36/56 Augmented Lagrangian (Bregman) framework
Augmented Lagrangian function: L(λ, s, x) := −bᵀλ + xᵀ(Aᵀλ − s) + (1/2µ)‖Aᵀλ − s‖^2
Algorithmic framework: at the k-th iteration compute λ^{k+1} and s^{k+1} from
(DL) min_{λ,s} L(λ, s, x^k), s.t. ‖s‖_∞ ≤ 1,
then update the Lagrangian multiplier: x^{k+1} = x^k + (Aᵀλ^{k+1} − s^{k+1})/µ.
Pros and cons: Pros: rich theory, well understood, and many algorithms. Cons: L(λ, s, x^k) is not separable in λ and s, and the subproblem (DL) is difficult to minimize.
37 An alternating direction minimization scheme 37/56
ADMM: divide the variables into blocks according to their roles, then minimize the augmented Lagrangian function with respect to one block at a time while all other blocks are fixed:
λ^{k+1} = argmin_λ L(λ, s^k, x^k)
s^{k+1} = argmin_s L(λ^{k+1}, s, x^k), s.t. ‖s‖_∞ ≤ 1
x^{k+1} = x^k + (Aᵀλ^{k+1} − s^{k+1})/µ
38 An alternating direction minimization scheme 38/56
Explicit solutions:
λ^{k+1} = (AAᵀ)^{−1}(As^k − µ(Ax^k − b))
s^{k+1} = argmin_s ‖Aᵀλ^{k+1} + µx^k − s‖^2, s.t. ‖s‖_∞ ≤ 1, = P_{[−1,1]}(Aᵀλ^{k+1} + µx^k)
x^{k+1} = x^k + (Aᵀλ^{k+1} − s^{k+1})/µ
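The three explicit updates translate directly into NumPy; a sketch under the assumption that A has full row rank so AAᵀ is invertible (the penalty value and iteration count are our illustrative choices):

```python
import numpy as np

def admm_bp(A, b, mu=1.0, n_iter=1000):
    # ADMM on the dual of basis pursuit: max b'lam  s.t.  A'lam = s, ||s||_inf <= 1.
    # x is the multiplier and tends to a solution of  min ||x||_1  s.t.  Ax = b.
    m, n = A.shape
    AAt = A @ A.T                      # assumed invertible (full row rank)
    x, s = np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        lam = np.linalg.solve(AAt, A @ s - mu * (A @ x - b))
        s = np.clip(A.T @ lam + mu * x, -1.0, 1.0)   # projection onto [-1,1]^n
        x = x + (A.T @ lam - s) / mu
    return x
```

The multiplier x is what converges to the primal basis pursuit solution, which is why the dual-based scheme still outputs a sparse x.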
39 39/56 ADMM for BP-denoising
Primal: min ‖x‖_1, s.t. ‖Ax − b‖_2 ≤ σ, which is equivalent to min ‖x‖_1, s.t. Ax − b + r = 0, ‖r‖_2 ≤ σ.
Lagrangian function:
L(x, r, λ, π) := ‖x‖_1 − λᵀ(Ax − b + r) + π(‖r‖_2 − σ) = ‖x‖_1 − (Aᵀλ)ᵀx + π‖r‖_2 − λᵀr + bᵀλ − πσ
Hence, the dual problem is max bᵀλ − πσ, s.t. ‖Aᵀλ‖_∞ ≤ 1, ‖λ‖_2 ≤ π.
40 ADMM for BP-denoising 40/56
The dual problem is equivalent to max bᵀλ − πσ, s.t. Aᵀλ = s, ‖s‖_∞ ≤ 1, ‖λ‖_2 ≤ π.
The augmented Lagrangian function is L(λ, s, x) := −bᵀλ + πσ + xᵀ(Aᵀλ − s) + (1/2µ)‖Aᵀλ − s‖^2.
ADM scheme:
λ^{k+1} = argmin_λ (1/2µ)‖Aᵀλ − s^k‖^2 + (Ax^k − b)ᵀλ, s.t. ‖λ‖_2 ≤ π_k
s^{k+1} = argmin_s ‖Aᵀλ^{k+1} + µx^k − s‖^2, s.t. ‖s‖_∞ ≤ 1, = P_{[−1,1]}(Aᵀλ^{k+1} + µx^k)
π^{k+1} = ‖λ^{k+1}‖_2
x^{k+1} = x^k + (Aᵀλ^{k+1} − s^{k+1})/µ
41 ADMM for the l1-regularized problem 41/56
Primal: min µ‖x‖_1 + (1/2)‖Ax − b‖_2^2, which is equivalent to min µ‖x‖_1 + (1/2)‖r‖_2^2, s.t. Ax − b = r.
Lagrangian function:
L(x, r, λ) := µ‖x‖_1 + (1/2)‖r‖_2^2 − λᵀ(Ax − b − r) = µ‖x‖_1 − (Aᵀλ)ᵀx + (1/2)‖r‖_2^2 + λᵀr + bᵀλ
Hence, the dual problem is max bᵀλ − (1/2)‖λ‖^2, s.t. ‖Aᵀλ‖_∞ ≤ µ.
42 ADMM for the l1-regularized problem 42/56
The dual problem is equivalent to max bᵀλ − (1/2)‖λ‖^2, s.t. Aᵀλ = s, ‖s‖_∞ ≤ µ.
The augmented Lagrangian function is L(λ, s, x) := −bᵀλ + (1/2)‖λ‖^2 + xᵀ(Aᵀλ − s) + (1/2µ)‖Aᵀλ − s‖^2.
ADM scheme:
λ^{k+1} = (AAᵀ + µI)^{−1}(As^k − µ(Ax^k − b))
s^{k+1} = argmin_s ‖Aᵀλ^{k+1} + µx^k − s‖^2, s.t. ‖s‖_∞ ≤ µ, = P_{[−µ,µ]}(Aᵀλ^{k+1} + µx^k)
x^{k+1} = x^k + (Aᵀλ^{k+1} − s^{k+1})/µ
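A sketch of this dual-based ADM scheme; following the slide, the augmented Lagrangian penalty is taken equal to µ (in practice one may use a separate penalty parameter), and the test data are our illustrative assumptions:

```python
import numpy as np

def admm_l1ls(A, b, mu, n_iter=1000):
    # ADMM on the dual of  min mu*||x||_1 + 0.5*||Ax - b||_2^2,
    # i.e.  max b'lam - 0.5*||lam||^2  s.t.  A'lam = s, ||s||_inf <= mu.
    m, n = A.shape
    M = A @ A.T + mu * np.eye(m)       # factor/solve once per iteration
    x, s = np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        lam = np.linalg.solve(M, A @ s - mu * (A @ x - b))
        s = np.clip(A.T @ lam + mu * x, -mu, mu)     # projection onto [-mu, mu]^n
        x = x + (A.T @ lam - s) / mu
    return x
```

Compared with the basis pursuit scheme on slide 38, only the λ-solve (AAᵀ + µI instead of AAᵀ) and the projection interval change.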
43 YALL1 43/56
Derive ADMM for the following problems:
BP: min_{x∈C^n} ‖Wx‖_{w,1}, s.t. Ax = b
L1/L1: min_{x∈C^n} ‖Wx‖_{w,1} + (1/ν)‖Ax − b‖_1
L1/L2: min_{x∈C^n} ‖Wx‖_{w,1} + (1/2ρ)‖Ax − b‖_2^2
BP+: min_{x∈R^n} ‖x‖_{w,1}, s.t. Ax = b, x ≥ 0
L1/L1+: min_{x∈R^n} ‖x‖_{w,1} + (1/ν)‖Ax − b‖_1, s.t. x ≥ 0
L1/L2+: min_{x∈R^n} ‖x‖_{w,1} + (1/2ρ)‖Ax − b‖_2^2, s.t. x ≥ 0
Here ν, ρ ≥ 0, A ∈ C^{m×n}, b ∈ C^m, with x ∈ C^n for the first three and x ∈ R^n for the last three; W ∈ C^{n×n} is a unitary matrix serving as a sparsifying basis, and ‖x‖_{w,1} := Σ_{i=1}^n w_i|x_i|.
44 Variable splitting 44/56
Given A ∈ R^{m×n}, consider min f(x) + g(Ax), which is equivalent to min f(x) + g(y), s.t. Ax = y.
Augmented Lagrangian function:
L(x, y, λ) = f(x) + g(y) − λᵀ(Ax − y) + (1/2µ)‖Ax − y‖_2^2
ADM:
(P_x): x^{k+1} := argmin_{x∈X} L(x, y^k, λ^k)
(P_y): y^{k+1} := argmin_{y∈Y} L(x^{k+1}, y, λ^k)
(P_λ): λ^{k+1} := λ^k − γ(Ax^{k+1} − y^{k+1})/µ
45 Variable splitting 45/56
Split Bregman (Goldstein and Osher) for anisotropic TV: min α‖Du‖_1 + β‖Ψu‖_1 + (1/2)‖Au − f‖_2^2.
Introduce y = Du and w = Ψu to obtain min α‖y‖_1 + β‖w‖_1 + (1/2)‖Au − f‖_2^2, s.t. y = Du, w = Ψu.
Augmented Lagrangian function:
L := α‖y‖_1 + β‖w‖_1 + (1/2)‖Au − f‖_2^2 − pᵀ(Du − y) + (1/2µ)‖Du − y‖_2^2 − qᵀ(Ψu − w) + (1/2µ)‖Ψu − w‖_2^2
46 Variable splitting 46/56
The variable u can be obtained by solving
(AᵀA + (1/µ)(DᵀD + I)) u = Aᵀf + (1/µ)(Dᵀy + Ψᵀw) + Dᵀp + Ψᵀq
If A and D are diagonalizable by the FFT, then the computational cost is very cheap; for example, A = RF with both R and D circulant matrices.
Variables y and w: y := shrink(Du − µp, αµ), w := shrink(Ψu − µq, βµ).
Apply a few iterations before updating the Lagrangian multipliers p and q.
Exercise: isotropic TV, min α Σ_i ‖D_i u‖_2 + β‖Ψu‖_1 + (1/2)‖Au − f‖_2^2.
47 FTVd: Fast TV deconvolution 47/56
Wang-Yang-Yin-Zhang consider min_u Σ_i ‖D_i u‖ + (1/2µ)‖Ku − f‖_2^2.
Introducing w and a quadratic penalty:
min_{u,w} Σ_i (‖w_i‖ + (β/2)‖w_i − D_i u‖_2^2) + (1/2µ)‖Ku − f‖_2^2
Alternating minimization: for fixed u, {w_i} can be solved by shrinkage at cost O(N); for fixed {w_i}, u can be solved by FFT at cost O(N log N).
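For fixed u, each w_i subproblem min_{w_i} ‖w_i‖ + (β/2)‖w_i − D_i u‖_2^2 has a closed-form 2-norm (isotropic) shrinkage solution; a sketch with ν = 1/β:

```python
import numpy as np

def vector_shrink(v, nu):
    # 2-norm shrinkage: argmin_w ||w||_2 + (1/(2*nu)) * ||w - v||_2^2
    #                 = max(||v||_2 - nu, 0) * v / ||v||_2   (w = 0 when ||v||_2 <= nu)
    norm_v = np.linalg.norm(v)
    if norm_v <= nu:
        return np.zeros_like(v)
    return (1.0 - nu / norm_v) * v
```

Unlike componentwise soft thresholding, this shrinks the whole vector along its own direction, which is what makes the TV term isotropic.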
48 48/56 Outline: Linearized ADMM Linearized Bregman and Bregmanized operator splitting ADM + proximal point method Xiaoqun Zhang, Martin Burger, Stanley Osher, A unified primal-dual algorithm framework based on Bregman iteration
49 Review of Bregman method 49/56
Consider BP: min ‖x‖_1, s.t. Ax = b.
Bregman method: D_J^{p^k}(x, x^k) := ‖x‖_1 − ‖x^k‖_1 − ⟨p^k, x − x^k⟩
x^{k+1} := argmin_x µ D_J^{p^k}(x, x^k) + (1/2)‖Ax − b‖_2^2
p^{k+1} = p^k + (1/µ)Aᵀ(b − Ax^{k+1})
Augmented Lagrangian (updating the multiplier or b):
x^{k+1} := argmin_x µ‖x‖_1 + (1/2)‖Ax − b^k‖_2^2
b^{k+1} = b + (b^k − Ax^{k+1})
They are equivalent; see Yin-Osher-Goldfarb-Darbon.
50 Linearized approaches 50/56
Linearized Bregman method:
x^{k+1} := argmin µ D_J^{p^k}(x, x^k) + ⟨Aᵀ(Ax^k − b), x − x^k⟩ + (1/2δ)‖x − x^k‖_2^2
p^{k+1} := p^k − (1/(µδ))(x^{k+1} − x^k) − (1/µ)Aᵀ(Ax^k − b)
which is equivalent to
x^{k+1} := argmin µ‖x‖_1 + (1/2δ)‖x − v^k‖_2^2
v^{k+1} := v^k − δAᵀ(Ax^{k+1} − b)
Bregmanized operator splitting:
x^{k+1} := argmin µ‖x‖_1 + ⟨Aᵀ(Ax^k − b^k), x − x^k⟩ + (1/2δ)‖x − x^k‖_2^2
b^{k+1} = b + (b^k − Ax^{k+1})
Are they equivalent?
51 50/56 Linearized approaches
Linearized Bregman method:
x^{k+1} := argmin µ D_J^{p^k}(x, x^k) + ⟨Aᵀ(Ax^k − b), x − x^k⟩ + (1/2δ)‖x − x^k‖_2^2
p^{k+1} := p^k − (1/(µδ))(x^{k+1} − x^k) − (1/µ)Aᵀ(Ax^k − b)
which is equivalent to
x^{k+1} := shrink(v^k, µδ), v^{k+1} := v^k − δAᵀ(Ax^{k+1} − b),
or, written in terms of b^k,
x^{k+1} := shrink(δAᵀb^k, µδ), b^{k+1} := b + (b^k − Ax^{k+1}).
Bregmanized operator splitting:
x^{k+1} := shrink(x^k − δAᵀ(Ax^k − b^k), µδ) = shrink(δAᵀb^k + x^k − δAᵀAx^k, µδ)
b^{k+1} = b + (b^k − Ax^{k+1})
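The two-line shrink form of linearized Bregman is easy to implement; a sketch (the step size restriction δ‖AᵀA‖ < 1 used in the test is our conservative assumption for convergence, and the data are synthetic):

```python
import numpy as np

def linearized_bregman(A, b, mu, delta, n_iter=3000):
    # Linearized Bregman in shrink form:
    #   x^{k+1} = shrink(v^k, mu*delta),  v^{k+1} = v^k - delta * A'(A x^{k+1} - b)
    shrink = lambda y, nu: np.sign(y) * np.maximum(np.abs(y) - nu, 0.0)
    v = np.zeros(A.shape[1])
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = shrink(v, mu * delta)
        v = v - delta * (A.T @ (A @ x - b))
    return x
```

Note the contrast with operator splitting: here v accumulates residual information across all iterations, while x itself is re-thresholded from scratch each step.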
52 51/56 Linearized approaches
Linearized Bregman: if the sequence x^k converges and p^k is bounded, then the limit of x^k is the unique solution of
min µ‖x‖_1 + (1/2δ)‖x‖_2^2, s.t. Ax = b.
For µ large enough, the limit solution solves BP. Exact regularization: the limit solves BP once δ exceeds a finite threshold δ̄.
What about Bregmanized operator splitting?
53 Primal ADM for the l1-regularized problem 52/56
Primal: min µ‖x‖_1 + (1/2)‖Ax − b‖_2^2, which is equivalent to min µ‖x‖_1 + (1/2)‖r‖_2^2, s.t. Ax − b = r.
Augmented Lagrangian function:
L(x, r, λ) = µ‖x‖_1 + (1/2)‖r‖_2^2 − λᵀ(Ax − b − r) + (1/2δ)‖Ax − b − r‖_2^2
ADM scheme:
x^{k+1} = argmin_x µ‖x‖_1 + (1/2δ)‖Ax − b − r^k − δλ^k‖_2^2 (same form as the original problem)
r^{k+1} = argmin_r (1/2)‖r‖_2^2 + (1/2δ)‖Ax^{k+1} − b − r − δλ^k‖_2^2
λ^{k+1} = λ^k − (Ax^{k+1} − b − r^{k+1})/δ
54 Primal ADM for the l1-regularized problem 52/56
Primal: min µ‖x‖_1 + (1/2)‖Ax − b‖_2^2, which is equivalent to min µ‖x‖_1 + (1/2)‖r‖_2^2, s.t. Ax − b = r.
Augmented Lagrangian function:
L(x, r, λ) = µ‖x‖_1 + (1/2)‖r‖_2^2 − λᵀ(Ax − b − r) + (1/2δ)‖Ax − b − r‖_2^2
ADM scheme with a linearized x-subproblem:
x^{k+1} = argmin_x µ‖x‖_1 + (g^k)ᵀ(x − x^k) + (1/2τ)‖x − x^k‖_2^2, where g^k is the gradient of (1/2δ)‖Ax − b − r^k − δλ^k‖_2^2 at x^k
r^{k+1} = argmin_r (1/2)‖r‖_2^2 + (1/2δ)‖Ax^{k+1} − b − r − δλ^k‖_2^2
λ^{k+1} = λ^k − (Ax^{k+1} − b − r^{k+1})/δ
Convergence of the linearized scheme?
55 53/56 Generalized algorithm
Consider min f(x), s.t. Ax = b. Proximal-point method:
x^{k+1} := argmin_x f(x) − ⟨λ^k, Ax − b⟩ + (1/2δ)‖Ax − b‖^2 + (1/2)‖x − x^k‖_Q^2
Cλ^{k+1} := Cλ^k + (b − Ax^{k+1})
Augmented Lagrangian or Bregman method if Q = 0 and C = δI; proximal point method of Rockafellar if Q = I and C = γI; Bregmanized operator splitting if Q = (1/δ)(I − AᵀA).
Theoretical results: lim_k ‖Ax^k − b‖_2 = 0, lim_k f(x^k) = f(x̄), and all limit points of (x^k, λ^k) are saddle points.
56 Convergence proof 54/56
From the x-subproblem: ∂f(x^{k+1}) ∋ s^{k+1} := Aᵀλ^k − (1/δ)Aᵀ(Ax^{k+1} − b) − Q(x^{k+1} − x^k).
Assume (x̄, λ̄) is a saddle point; then Ax̄ = b and s̄ := Aᵀλ̄ ∈ ∂f(x̄).
Let s_e^{k+1} = s^{k+1} − s̄, x_e^{k+1} = x^{k+1} − x̄ and λ_e^{k+1} = λ^{k+1} − λ̄:
s_e^{k+1} + (1/δ)AᵀA x_e^{k+1} + Q x_e^{k+1} = Q x_e^k + Aᵀλ_e^k, Cλ_e^{k+1} = Cλ_e^k − A x_e^{k+1}
Taking the inner product with x_e^{k+1} on both sides of the first equality:
(1/2)(‖x_e^{k+1}‖_Q^2 + ‖x^{k+1} − x^k‖_Q^2 − ‖x_e^k‖_Q^2) = −⟨s_e^{k+1}, x_e^{k+1}⟩ − (1/δ)‖A x_e^{k+1}‖^2 + ⟨λ_e^k, A x_e^{k+1}⟩
Taking the inner product with λ_e^{k+1} on both sides of the second equality:
(1/2)(‖λ_e^{k+1}‖_C^2 − ‖λ_e^k‖_C^2 + ‖A x_e^{k+1}‖_{C^{-1}}^2) = −⟨λ_e^{k+1}, A x_e^{k+1}⟩ = −⟨λ_e^k, A x_e^{k+1}⟩ + ‖A x_e^{k+1}‖_{C^{-1}}^2
Adding the above identities (with ‖w_e^k‖_Q^2 := ‖x_e^k‖_Q^2 + ‖λ_e^k‖_C^2):
‖w_e^{k+1}‖_Q^2 + ‖x^{k+1} − x^k‖_Q^2 + 2⟨s_e^{k+1}, x_e^{k+1}⟩ + (2/δ)‖A x_e^{k+1}‖_2^2 − ‖A x_e^{k+1}‖_{C^{-1}}^2 = ‖w_e^k‖_Q^2
57 Convergence proof 55/56
The convexity of f(x) gives ⟨s_e^{k+1}, x_e^{k+1}⟩ ≥ 0. If 2/δ > 1/λ_min(C), then
2⟨s_e^{k+1}, x_e^{k+1}⟩ + (2/δ)‖A x_e^{k+1}‖_2^2 − ‖A x_e^{k+1}‖_{C^{-1}}^2 ≥ 0.
Hence ‖w_e^{k+1}‖_Q^2 ≤ ‖w_e^k‖_Q^2, which implies the boundedness of w^k = (x^k, λ^k). Furthermore,
Σ_k (2⟨s_e^{k+1}, x_e^{k+1}⟩ + ‖x^{k+1} − x^k‖_Q^2 + (2/δ)‖A x_e^{k+1}‖_2^2 − ‖A x_e^{k+1}‖_{C^{-1}}^2) ≤ ‖w_e^0‖_Q^2 < ∞.
Hence 0 = lim_k A x_e^k = lim_k (Ax^k − b), and lim_k (s^k − Aᵀλ^k) = 0 follows from s^{k+1} := Aᵀλ^k − (1/δ)Aᵀ(Ax^{k+1} − b) − Q(x^{k+1} − x^k).
58 Generalized algorithm 56/56
Consider min f(x) + g(z), s.t. Bx = z. Proximal-point method:
x^{k+1} := argmin_x f(x) − ⟨λ^k, Bx⟩ + (1/2δ)‖Bx − z^k‖^2 + (1/2)‖x − x^k‖_Q^2
z^{k+1} := argmin_z g(z) + ⟨λ^k, z⟩ + (1/2δ)‖Bx^{k+1} − z‖^2 + (1/2)‖z − z^k‖_M^2
Cλ^{k+1} := Cλ^k + (z^{k+1} − Bx^{k+1})
ADM if Q = M = 0 and C = δI; proximal point method of Rockafellar if Q = I and C = γI; Bregmanized operator splitting if Q = (1/δ)(I − AᵀA).
Theoretical results: lim_k ‖Bx^k − z^k‖_2 = 0, lim_k f(x^k) = f(x̄), lim_k g(z^k) = g(z̄), and all limit points of (x^k, z^k, λ^k) are saddle points.
XII - 1 Contraction Methods for Convex Optimization and monotone variational inequalities No.12 Linearized alternating direction methods of multipliers for separable convex programming Bingsheng He Department
More informationON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS
ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS WEI DENG AND WOTAO YIN Abstract. The formulation min x,y f(x) + g(y) subject to Ax + By = b arises in
More informationANALYSIS AND GENERALIZATIONS OF THE LINEARIZED BREGMAN METHOD
ANALYSIS AND GENERALIZATIONS OF THE LINEARIZED BREGMAN METHOD WOTAO YIN Abstract. This paper analyzes and improves the linearized Bregman method for solving the basis pursuit and related sparse optimization
More informationFAST FIRST-ORDER METHODS FOR COMPOSITE CONVEX OPTIMIZATION WITH BACKTRACKING
FAST FIRST-ORDER METHODS FOR COMPOSITE CONVEX OPTIMIZATION WITH BACKTRACKING KATYA SCHEINBERG, DONALD GOLDFARB, AND XI BAI Abstract. We propose new versions of accelerated first order methods for convex
More informationDual Decomposition.
1/34 Dual Decomposition http://bicmr.pku.edu.cn/~wenzw/opt-2017-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/34 1 Conjugate function 2 introduction:
More informationProximal splitting methods on convex problems with a quadratic term: Relax!
Proximal splitting methods on convex problems with a quadratic term: Relax! The slides I presented with added comments Laurent Condat GIPSA-lab, Univ. Grenoble Alpes, France Workshop BASP Frontiers, Jan.
More informationContraction Methods for Convex Optimization and Monotone Variational Inequalities No.16
XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of
More informationKarush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =
More informationCompressive Sensing Theory and L1-Related Optimization Algorithms
Compressive Sensing Theory and L1-Related Optimization Algorithms Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, USA CAAM Colloquium January 26, 2009 Outline:
More informationPrimal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions
Primal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions Olivier Fercoq and Pascal Bianchi Problem Minimize the convex function
More informationMotivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:
CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More informationCoordinate Update Algorithm Short Course Subgradients and Subgradient Methods
Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n
More informationAlternating Direction Augmented Lagrangian Algorithms for Convex Optimization
Alternating Direction Augmented Lagrangian Algorithms for Convex Optimization Joint work with Bo Huang, Shiqian Ma, Tony Qin, Katya Scheinberg, Zaiwen Wen and Wotao Yin Department of IEOR, Columbia University
More informationSparsity Regularization
Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation
More informationMath 273a: Optimization Subgradient Methods
Math 273a: Optimization Subgradient Methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Nonsmooth convex function Recall: For ˉx R n, f(ˉx) := {g R
More informationPrimal-dual Subgradient Method for Convex Problems with Functional Constraints
Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual
More informationAlgorithms for Constrained Optimization
1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic
More informationIterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem
Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences
More informationLecture 8 Plus properties, merit functions and gap functions. September 28, 2008
Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:
More informationThe Alternating Direction Method of Multipliers
The Alternating Direction Method of Multipliers With Adaptive Step Size Selection Peter Sutor, Jr. Project Advisor: Professor Tom Goldstein December 2, 2015 1 / 25 Background The Dual Problem Consider
More informationImage restoration: numerical optimisation
Image restoration: numerical optimisation Short and partial presentation Jean-François Giovannelli Groupe Signal Image Laboratoire de l Intégration du Matériau au Système Univ. Bordeaux CNRS BINP / 6 Context
More informationBlock Coordinate Descent for Regularized Multi-convex Optimization
Block Coordinate Descent for Regularized Multi-convex Optimization Yangyang Xu and Wotao Yin CAAM Department, Rice University February 15, 2013 Multi-convex optimization Model definition Applications Outline
More informationLecture: Smoothing.
Lecture: Smoothing http://bicmr.pku.edu.cn/~wenzw/opt-2018-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Smoothing 2/26 introduction smoothing via conjugate
More informationA Tutorial on Primal-Dual Algorithm
A Tutorial on Primal-Dual Algorithm Shenlong Wang University of Toronto March 31, 2016 1 / 34 Energy minimization MAP Inference for MRFs Typical energies consist of a regularization term and a data term.
More informationInexact Alternating Direction Method of Multipliers for Separable Convex Optimization
Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization Hongchao Zhang hozhang@math.lsu.edu Department of Mathematics Center for Computation and Technology Louisiana State
More informationGradient Sliding for Composite Optimization
Noname manuscript No. (will be inserted by the editor) Gradient Sliding for Composite Optimization Guanghui Lan the date of receipt and acceptance should be inserted later Abstract We consider in this
More informationLecture 6: Conic Optimization September 8
IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions
More informationProximal methods. S. Villa. October 7, 2014
Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem
More informationPrimal-dual fixed point algorithms for separable minimization problems and their applications in imaging
1/38 Primal-dual fixed point algorithms for separable minimization problems and their applications in imaging Xiaoqun Zhang Department of Mathematics and Institute of Natural Sciences Shanghai Jiao Tong
More informationEnhanced Compressive Sensing and More
Enhanced Compressive Sensing and More Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Nonlinear Approximation Techniques Using L1 Texas A & M University
More informationLecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem
Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R
More informationConvergence of Fixed-Point Iterations
Convergence of Fixed-Point Iterations Instructor: Wotao Yin (UCLA Math) July 2016 1 / 30 Why study fixed-point iterations? Abstract many existing algorithms in optimization, numerical linear algebra, and
More informationLecture: Duality.
Lecture: Duality http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/35 Lagrange dual problem weak and strong
More informationPDE-Constrained and Nonsmooth Optimization
Frank E. Curtis October 1, 2009 Outline PDE-Constrained Optimization Introduction Newton s method Inexactness Results Summary and future work Nonsmooth Optimization Sequential quadratic programming (SQP)
More informationLasso: Algorithms and Extensions
ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions
More informationConvex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE
Convex Analysis Notes Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE These are notes from ORIE 6328, Convex Analysis, as taught by Prof. Adrian Lewis at Cornell University in the
More informationConvex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014
Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,
More informationSPARSE SIGNAL RESTORATION. 1. Introduction
SPARSE SIGNAL RESTORATION IVAN W. SELESNICK 1. Introduction These notes describe an approach for the restoration of degraded signals using sparsity. This approach, which has become quite popular, is useful
More informationPrimal/Dual Decomposition Methods
Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients
More informationRelative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent
Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent Haihao Lu August 3, 08 Abstract The usual approach to developing and analyzing first-order
More informationAn Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods
An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This
More informationOptimization Algorithms for Compressed Sensing
Optimization Algorithms for Compressed Sensing Stephen Wright University of Wisconsin-Madison SIAM Gator Student Conference, Gainesville, March 2009 Stephen Wright (UW-Madison) Optimization and Compressed
More informationWHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ????
DUALITY WHY DUALITY? No constraints f(x) Non-differentiable f(x) Gradient descent Newton s method Quasi-newton Conjugate gradients etc???? Constrained problems? f(x) subject to g(x) apple 0???? h(x) =0
More informationA first-order primal-dual algorithm with linesearch
A first-order primal-dual algorithm with linesearch Yura Malitsky Thomas Pock arxiv:608.08883v2 [math.oc] 23 Mar 208 Abstract The paper proposes a linesearch for a primal-dual method. Each iteration of
More informationSequential Unconstrained Minimization: A Survey
Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More informationContraction Methods for Convex Optimization and Monotone Variational Inequalities No.11
XI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.11 Alternating direction methods of multipliers for separable convex programming Bingsheng He Department of Mathematics
More informationSparse Regularization via Convex Analysis
Sparse Regularization via Convex Analysis Ivan Selesnick Electrical and Computer Engineering Tandon School of Engineering New York University Brooklyn, New York, USA 29 / 66 Convex or non-convex: Which
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 20 Subgradients Assumptions
More informationApplications of Linear Programming
Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal
More informationDuality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Duality in Linear Programs Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: proximal gradient descent Consider the problem x g(x) + h(x) with g, h convex, g differentiable, and
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationLecture 18: Optimization Programming
Fall, 2016 Outline Unconstrained Optimization 1 Unconstrained Optimization 2 Equality-constrained Optimization Inequality-constrained Optimization Mixture-constrained Optimization 3 Quadratic Programming
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationMath 273a: Optimization Lagrange Duality
Math 273a: Optimization Lagrange Duality Instructor: Wotao Yin Department of Mathematics, UCLA Winter 2015 online discussions on piazza.com Gradient descent / forward Euler assume function f is proper
More informationOn the acceleration of the double smoothing technique for unconstrained convex optimization problems
On the acceleration of the double smoothing technique for unconstrained convex optimization problems Radu Ioan Boţ Christopher Hendrich October 10, 01 Abstract. In this article we investigate the possibilities
More information