Majorization-Minimization Algorithm

1 Majorization-Minimization Algorithm: Theory and Applications. Ying Sun and Daniel P. Palomar, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology. ELEC Convex Optimization, Fall, HKUST, Hong Kong.

2 Acknowledgment. The slides of this lecture are mainly based on the following works:
[Hun-Lan J04] D. R. Hunter and K. Lange, "A Tutorial on MM Algorithms," Amer. Statistician, 2004.
[Raz-Hon-Luo J13] M. Razaviyayn, M. Hong, and Z.-Q. Luo, "A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization," SIAM J. Optim., vol. 23, no. 2, 2013.
[Scu-Fac-Son-Pal-Pan J14] G. Scutari, F. Facchinei, P. Song, D. P. Palomar, and J.-S. Pang, "Decomposition by Partial Linearization: Parallel Optimization of Multi-Agent Systems," IEEE Trans. Signal Process., vol. 62, no. 3, Feb. 2014.
[Sun-Bab-Pal J17] Y. Sun, P. Babu, and D. P. Palomar, "Majorization-Minimization Algorithms in Signal Processing, Communications, and Machine Learning," IEEE Trans. Signal Process., vol. 65, no. 3, Feb. 2017.

3 Outline
1. The Majorization-Minimization Algorithm
2. Block Coordinate Descent
3. Exact Jacobi Successive Convex Approximation
Extensions

5 Problem Statement. Consider the following optimization problem:
minimize_x f(x) subject to x ∈ X,
with X being a closed convex set and f(x) being continuous. f(x) is too complicated to manipulate directly. Idea: successively minimize an approximating function u(x, x^k),
x^{k+1} = arg min_{x ∈ X} u(x, x^k),
hoping the sequence of minimizers {x^k} will converge to an optimal x*. Question: how to construct u(x, x^k)?

8 Terminology.
Distance from a point to a set: d(x, S) = inf_{s ∈ S} ‖x − s‖.
Directional derivative: f'(x; d) ≜ liminf_{λ ↓ 0} [f(x + λd) − f(x)] / λ.
Stationary point: x is a stationary point if f'(x; d) ≥ 0 for all d such that x + d ∈ X.

9 Majorization-Minimization. Construction rules:
(A1) u(y, y) = f(y), ∀ y ∈ X
(A2) u(x, y) ≥ f(x), ∀ x, y ∈ X
(A3) u'(x, y; d)|_{x=y} = f'(y; d), ∀ d with y + d ∈ X
(A4) u(x, y) is continuous in x and y
Pictorially: u(x, x^k) lies above f(x) everywhere and touches it at x = x^k.

10 Algorithm. Majorization-Minimization (Successive Upper-Bound Minimization):
1: Find a feasible point x^0 ∈ X and set k = 0
2: repeat
3:   x^{k+1} = argmin_{x ∈ X} u(x, x^k)   (global minimum)
4:   k ← k + 1
5: until some convergence criterion is met
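
The loop above compresses to a few lines of code once a majorizer with an easy minimizer is available. Below is a minimal Python sketch (an illustration, not from the slides) for the toy problem f(x) = |x| + (x − 1)², using the standard quadratic majorizer |x| ≤ x²/(2|x^k|) + |x^k|/2, tight at x = x^k, so every surrogate minimizer is available in closed form.

```python
# Minimal MM sketch (toy illustration, not from the slides):
#   minimize f(x) = |x| + (x - 1)^2
# using the quadratic majorizer |x| <= x^2/(2|x^k|) + |x^k|/2 (tight at x^k),
# so u(x, x^k) = x^2/(2|x^k|) + |x^k|/2 + (x - 1)^2 is minimized in closed form.

def f(x):
    return abs(x) + (x - 1.0) ** 2

x = 1.0                              # feasible starting point x^0
for k in range(50):
    w = 1.0 / (2.0 * abs(x))         # curvature of the majorizer of |x| at x^k
    x = 1.0 / (1.0 + w)              # argmin_x  w*x^2 + (x - 1)^2
print(x, f(x))                       # approaches x* = 0.5, the true minimizer
```

Each iterate decreases f monotonically, which is the defining property of the MM scheme.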

11 Convergence. Under assumptions A1-A4, every limit point of the sequence {x^k} is a stationary point of the original problem. If we further assume that the level set X^0 = {x : f(x) ≤ f(x^0)} is compact, then
lim_{k→∞} d(x^k, X*) = 0,
where X* is the set of stationary points.

13 Construct Surrogate Function. The performance of the Majorization-Minimization algorithm depends crucially on the surrogate function u(x, x^k). Guideline: the global minimizer of u(x, x^k) should be easy to find. Suppose f(x) = f_1(x) + κ(x), where f_1(x) is some nice function and κ(x) is the one that needs to be approximated.

14 Construction by Convexity. Suppose κ(t) is convex; then
κ(∑_i α_i t_i) ≤ ∑_i α_i κ(t_i)
with α_i ≥ 0 and ∑_i α_i = 1.

15 For example:
κ(w^T x) = κ( w^T (x − x^k) + w^T x^k )
         = κ( ∑_i α_i [ w_i (x_i − x_i^k)/α_i + w^T x^k ] )
         ≤ ∑_i α_i κ( w_i (x_i − x_i^k)/α_i + w^T x^k ).
If we further assume that w and x are positive and choose α_i = w_i x_i^k / (w^T x^k):
κ(w^T x) ≤ ∑_i [ w_i x_i^k / (w^T x^k) ] κ( (w^T x^k) x_i / x_i^k ).
The surrogate functions are separable (parallel algorithm).

16 Construction by Taylor Expansion. Suppose κ(x) is concave and differentiable; then
κ(x) ≤ κ(x^k) + ∇κ(x^k)^T (x − x^k),
which is a linear upper bound. Suppose κ(x) is convex and twice differentiable; then
κ(x) ≤ κ(x^k) + ∇κ(x^k)^T (x − x^k) + (1/2)(x − x^k)^T M (x − x^k)
if M ⪰ ∇²κ(x), ∀ x.
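
A classical instance of this quadratic upper bound (a sketch under the stated assumptions, not an example from the slides) is logistic regression: the negative log-likelihood ℓ(β) satisfies ∇²ℓ(β) ⪯ M = (1/4) AᵀA for every β, so a single fixed matrix majorizes the curvature and each MM step reduces to one linear solve.

```python
import numpy as np

# Quadratic-majorization MM for logistic regression (sketch, not from the slides).
# Negative log-likelihood: l(b) = sum_i [log(1 + exp(a_i^T b)) - y_i a_i^T b],
# whose Hessian is bounded by M = (1/4) A^T A, giving the global upper bound
#   u(b, b^k) = l(b^k) + grad(b^k)^T (b - b^k) + 0.5 (b - b^k)^T M (b - b^k).

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
y = (A @ rng.standard_normal(d) + 0.3 * rng.standard_normal(n) > 0).astype(float)

def grad(b):
    p = 1.0 / (1.0 + np.exp(-A @ b))      # model probabilities
    return A.T @ (p - y)                  # gradient of the negative log-likelihood

M = 0.25 * A.T @ A + 1e-8 * np.eye(d)     # fixed majorizing curvature matrix
beta = np.zeros(d)
for k in range(200):
    beta = beta - np.linalg.solve(M, grad(beta))   # minimizer of u(., beta^k)
```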

17 Construction by Inequalities.
Arithmetic-geometric mean inequality: (∏_{i=1}^n x_i)^{1/n} ≤ (1/n) ∑_{i=1}^n x_i.
Cauchy-Schwarz inequality: ‖x‖ ≥ x^T x^k / ‖x^k‖.
Jensen's inequality: κ(E[x]) ≤ E[κ(x)], with κ(·) being convex.

19 EM Algorithm. Assume the complete data set {x, z} consists of the observed variable x and the latent variable z. Objective: estimate the parameter θ ∈ Θ from x. Maximum likelihood estimator: θ̂ = argmax_{θ ∈ Θ} log p(x|θ).
EM (Expectation-Maximization) algorithm:
E-step: evaluate p(z|x, θ^k) — guess z from the current estimate of θ.
M-step: update θ as θ^{k+1} = arg max_{θ ∈ Θ} u(θ, θ^k), where u(θ, θ^k) = E_{z|x,θ^k} log p(x, z|θ) — update θ from the guessed complete data set.
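
For concreteness, here is a minimal Python sketch (an illustration, not from the slides) of the two steps for a two-component one-dimensional Gaussian mixture with known unit variances: the latent variable z is the component label, the E-step computes the responsibilities p(z | x, θ^k), and the M-step maximizes the surrogate u(θ, θ^k) in closed form.

```python
import numpy as np

# EM sketch (not from the slides): two-component 1-D Gaussian mixture,
# unit variances, unknown mixing weight and means theta = (pi, mu1, mu2).

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

pi, mu1, mu2 = 0.5, -1.0, 1.0                    # initial theta^0
for k in range(100):
    # E-step: responsibility of component 1 for each sample, p(z=1 | x, theta^k)
    g1 = pi * np.exp(-0.5 * (x - mu1) ** 2)
    g2 = (1 - pi) * np.exp(-0.5 * (x - mu2) ** 2)
    r = g1 / (g1 + g2)
    # M-step: weighted averages maximize u(theta, theta^k) in closed form
    pi = r.mean()
    mu1 = (r * x).sum() / r.sum()
    mu2 = ((1 - r) * x).sum() / (1 - r).sum()

print(pi, mu1, mu2)                              # approx. 0.3, -2, 3
```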

20 An MM Interpretation of EM. The objective function can be written as
log p(x|θ) = log E_{z|θ} [ p(x|z, θ) ]
           = log E_{z|θ} [ p(z|x, θ^k) p(x|z, θ) / p(z|x, θ^k) ]
           = log E_{z|x,θ^k} [ p(x|z, θ) p(z|θ) / p(z|x, θ^k) ]
           ≥ E_{z|x,θ^k} log [ p(x|z, θ) p(z|θ) / p(z|x, θ^k) ]   (Jensen's inequality)
           = E_{z|x,θ^k} log p(x, z|θ) − E_{z|x,θ^k} log p(z|x, θ^k),
where the first term is denoted u(θ, θ^k) (the second term does not depend on θ).

21 Proximal Minimization. f(x) is convex. Solve min_{x ∈ X} f(x) by solving the equivalent problem
minimize_{x ∈ X, y ∈ X} f(x) + (1/2c) ‖x − y‖².
The objective function is strongly convex in x and in y separately. Algorithm:
x^{k+1} = arg min_{x ∈ X} { f(x) + (1/2c) ‖x − y^k‖² },   y^{k+1} = x^{k+1}.
An MM interpretation:
x^{k+1} = arg min_{x ∈ X} { f(x) + (1/2c) ‖x − x^k‖² }.
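
A minimal Python sketch of this iteration (an illustration, not from the slides), for an unconstrained one-dimensional convex f with the strongly convex subproblem solved numerically:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Proximal minimization sketch (not from the slides):
# f(x) = exp(x) + exp(-2x) is convex; each step minimizes the strongly
# convex surrogate f(x) + (1/2c)(x - x^k)^2.

f = lambda t: np.exp(t) + np.exp(-2 * t)
c = 1.0
x = 0.0
for k in range(100):
    x = minimize_scalar(lambda t, xk=x: f(t) + (t - xk) ** 2 / (2 * c)).x
print(x, f(x))     # converges to the minimizer of f (x* = ln(2)/3)
```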

22 DC Programming. Consider the unconstrained problem
minimize_{x ∈ R^n} f(x),
where f(x) = g(x) + h(x) with g(x) convex and h(x) concave. DC (Difference of Convex) programming generates {x^k} by solving
∇g(x^{k+1}) = −∇h(x^k).
An MM interpretation:
x^{k+1} = arg min_x { g(x) + ∇h(x^k)^T (x − x^k) }.
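
As a toy illustration (not from the slides), take g(x) = x⁴ (convex) and h(x) = −3x² (concave); the linearized subproblem then has a closed-form minimizer, and the iteration converges to a stationary point of f:

```python
import numpy as np

# DC programming / convex-concave procedure sketch (not from the slides):
# f(x) = g(x) + h(x), g(x) = x^4 (convex), h(x) = -3x^2 (concave).
# Each step minimizes g(x) + h'(x^k)(x - x^k), i.e. solves 4x^3 = -h'(x^k).

x = 2.0                                  # starting point
for k in range(50):
    slope = -6.0 * x                     # h'(x^k)
    x = np.cbrt(-slope / 4.0)            # argmin_x  x^4 + slope*x
print(x)                                 # -> sqrt(1.5) ~ 1.2247, a stationary point of f
```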

24 Power Control by GP [Chi-Tan-Pal-O'Ne-Jul J07]. Problem: maximize system throughput. Essentially we need to solve a problem of the form
minimize_P ∏_i ( ∑_{j≠i} G_ij P_j + n_i ) / ( ∑_j G_ij P_j + n_i ).
The objective function is a ratio of two posynomials. Minorize a posynomial, denoted g(x) = ∑_i m_i(x), by a monomial:
g(x) ≥ ∏_i ( m_i(x) / α_i )^{α_i},   where α_i = m_i(x^k) / g(x^k)   (arithmetic-geometric mean inequality).
Solution: approximate the denominator posynomial ∑_j G_ij P_j + n_i by a monomial.

25 Reweighted ℓ1-norm. Sparse signal recovery problem:
minimize_x ‖x‖_0 subject to Ax = b.
ℓ1-norm approximation:
minimize_x ‖x‖_1 subject to Ax = b.
General form:
minimize_x ∑_{i=1}^n φ(|x_i|) subject to Ax = b.

26 [Can-Wak-Boy J08]. Assume φ(t) is concave and nondecreasing; at |x_i^k|, φ(|x_i|) is majorized (up to a constant) by w_i^k |x_i| with w_i^k = φ'(t)|_{t=|x_i^k|}. At each iteration a weighted ℓ1-norm problem is solved:
minimize_x ∑_i w_i^k |x_i| subject to Ax = b.
(Figure: comparison of the penalties ‖x‖_0, |x|, and log(1 + |x|).)
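
A compact Python sketch of this scheme (an illustration, not from the slides), with φ(t) = log(ε + t) so that w_i^k = 1/(ε + |x_i^k|), and each weighted-ℓ1 subproblem solved as a linear program in the variables (x, t) with |x_i| ≤ t_i:

```python
import numpy as np
from scipy.optimize import linprog

# Iteratively reweighted l1 sketch (not from the slides):
#   min_x sum_i phi(|x_i|)  s.t.  Ax = b,   phi(t) = log(eps + t),
# majorized at x^k by the weighted l1 norm with w_i = 1/(eps + |x_i^k|).

rng = np.random.default_rng(2)
m, n, eps = 20, 50, 1e-3
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 4, replace=False)] = rng.standard_normal(4)
b = A @ x_true

w = np.ones(n)                                   # first pass = plain l1
I = np.eye(n)
for k in range(5):
    c = np.concatenate([np.zeros(n), w])         # minimize w^T t
    A_eq = np.hstack([A, np.zeros((m, n))])      # Ax = b (t does not enter)
    A_ub = np.block([[I, -I], [-I, -I]])         # encodes -t <= x <= t
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n), A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    x = res.x[:n]
    w = 1.0 / (eps + np.abs(x))                  # MM reweighting step

print(np.round(x, 3))                            # concentrates on the true support
```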

27 Sparse Generalized Eigenvalue Problem. ℓ0-norm regularized generalized eigenvalue problem:
maximize_x x^T A x − ρ ‖x‖_0 subject to x^T B x = 1.
Replace |x_i|_0 by some nicely behaved function g_p(|x_i|):
 |x_i|^p, 0 < p ≤ 1;
 log(1 + |x_i|/p) / log(1 + 1/p), p > 0;
 1 − e^{−|x_i|/p}, p > 0.
Take g_p(x_i) = |x_i|^p for example.

28 [Son-Bab-Pal J15a]. Majorize g_p(|x_i|) at x_i^k by a quadratic function w_i^k x_i² + c_i^k. The surrogate function for g_p(x_i) = |x_i|^p is
u(x_i, x_i^k) = (p/2) |x_i^k|^{p−2} x_i² + (1 − p/2) |x_i^k|^p.
Solve at each iteration the following GEVP:
maximize_x x^T A x − ρ x^T diag(w^k) x subject to x^T B x = 1.
However, as x_i → 0, w_i → +∞...

29 Smooth approximation of g_p(x):
g_p^ε(x) = (p/2) ε^{p−2} x²,  for |x| ≤ ε;
g_p^ε(x) = |x|^p − (1 − p/2) ε^p,  for |x| > ε.
When |x| ≤ ε, the weight w remains constant.

30 (Figure: objective value versus iterations, comparing IRQM (proposed) with DC and SGEP.)

31 Sequence Design. Complex unimodular sequence {x_n ∈ C}_{n=1}^N. Autocorrelation:
r_k = ∑_{n=k+1}^N x_n x*_{n−k} = r*_{−k},  k = 0, ..., N−1.
Integrated sidelobe level (ISL): ISL = ∑_{k=1}^{N−1} |r_k|².
Problem formulation:
minimize_{{x_n}_{n=1}^N} ISL subject to |x_n| = 1, n = 1, ..., N.

32 By Fourier transform:
ISL = (1/(4N)) ∑_{p=1}^{2N} [ |a_p^H x|² − N ]²,
with x = [x_1, ..., x_N]^T, a_p = [1, e^{jω_p}, ..., e^{jω_p(N−1)}]^T and ω_p = 2π(p−1)/(2N). Equivalent problem:
minimize_x ∑_{p=1}^{2N} ( a_p^H x x^H a_p )² subject to |x_n| = 1, ∀ n.

33 [Son-Bab-Pal J15b]. Define A = [a_1, ..., a_{2N}], p^k = [ |a_1^H x^k|², ..., |a_{2N}^H x^k|² ]^T, and Ã = A ( diag(p^k) − p_max^k I ) A^H. The quadratic surrogate function is, up to an additive constant,
p_max^k x^H A A^H x + 2 Re{ x^H ( Ã − 2N² x^k (x^k)^H ) x^k },
which under the unimodular constraint is equivalent to
minimize_x ‖x − y‖² subject to |x_n| = 1, ∀ n,
with y = −( Ã − 2N² x^k (x^k)^H ) x^k. Closed-form solution: x_n = e^{j arg(y_n)}.

34 (Figure: integrated sidelobe level (dB) versus iterations for the proposed method and the benchmark.)

35 Covariance Estimation. x_i ~ elliptical(0, Σ). Fit the normalized samples s_i = x_i / ‖x_i‖_2 to the angular central Gaussian distribution
f(s_i) ∝ det(Σ)^{−1/2} ( s_i^T Σ^{−1} s_i )^{−K/2}.
[Sun-Bab-Pal J14] Shrinkage penalty:
h(Σ) = log det(Σ) + Tr(Σ^{−1} T).
Solve the following problem:
minimize_{Σ ≻ 0} log det(Σ) + (K/N) ∑_{i=1}^N log( x_i^T Σ^{−1} x_i ) + α h(Σ).

36 At Σ^k, the objective function is majorized by
(1 + α) log det(Σ) + (K/N) ∑_{i=1}^N x_i^T Σ^{−1} x_i / ( x_i^T (Σ^k)^{−1} x_i ) + α Tr( Σ^{−1} T ).
The surrogate function is convex in Σ^{−1}. Setting the gradient to zero leads to the weighted sample average
Σ^{k+1} = 1/(1 + α) [ (K/N) ∑_{i=1}^N x_i x_i^T / ( x_i^T (Σ^k)^{−1} x_i ) + α T ].
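
A short Python sketch of this weighted-sample-average iteration (an illustration, not from the slides), with shrinkage target T = I:

```python
import numpy as np

# Regularized scatter-estimation sketch (not from the slides): iterate
#   Sigma <- [ (K/N) * sum_i x_i x_i^T / (x_i^T Sigma^{-1} x_i) + alpha*T ] / (1 + alpha)

rng = np.random.default_rng(3)
K, N, alpha = 5, 100, 0.2
X = rng.standard_normal((N, K)) @ np.diag([3.0, 2.0, 1.0, 1.0, 1.0])   # samples as rows
T = np.eye(K)                                                          # shrinkage target

Sigma = np.eye(K)
for k in range(100):
    Sinv = np.linalg.inv(Sigma)
    d = np.einsum('ij,jk,ik->i', X, Sinv, X)      # x_i^T Sigma^{-1} x_i
    S = (X.T * (1.0 / d)) @ X                     # sum_i x_i x_i^T / d_i
    Sigma = ((K / N) * S + alpha * T) / (1.0 + alpha)

print(np.round(Sigma, 2))
```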

37 (Figure: objective function value versus iterations.)

39 Problem Statement (Block Coordinate Descent). Consider the following problem:
minimize_{x ∈ X} f(x).
The set X possesses a Cartesian product structure X = ∏_{i=1}^m X_i. Observation: the problem
minimize_{x_i ∈ X_i} f(x_1^0, ..., x_{i−1}^0, x_i, x_{i+1}^0, ..., x_m^0),
with the x_j^0 taking some feasible values, is easy to solve.

41 Block Coordinate Descent (BCD). Denote x ≜ (x_1, ..., x_m) and f(x_1^0, ..., x_{i−1}^0, x_i, x_{i+1}^0, ..., x_m^0) ≜ f(x_i, x_{−i}^0).
Block Coordinate Descent (nonlinear Gauss-Seidel):
1: Initialize x^0 ∈ X and set k = 0.
2: repeat
3:   k = k + 1, i = (k mod m) + 1
4:   x_i^k = argmin_{x_i ∈ X_i} f(x_i, x_{−i}^{k−1})
5:   x_j^k = x_j^{k−1}, ∀ j ≠ i
6: until some convergence criterion is met
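
A minimal Python sketch of this cyclic scheme (an illustration, not from the slides) for least squares with two blocks of variables, where each block subproblem has a closed-form solution:

```python
import numpy as np

# BCD sketch (not from the slides): min_x ||A x - b||^2 with x split into
# two blocks of columns; each block update is an exact least-squares solve.

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
A1, A2 = A[:, :5], A[:, 5:]                     # the two blocks

x1, x2 = np.zeros(5), np.zeros(5)
for k in range(200):
    x1 = np.linalg.lstsq(A1, b - A2 @ x2, rcond=None)[0]   # block 1, block 2 fixed
    x2 = np.linalg.lstsq(A2, b - A1 @ x1, rcond=None)[0]   # block 2, block 1 fixed

x_full = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(np.concatenate([x1, x2]) - x_full))   # -> ~0
```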

42 Convergence (Block Coordinate Descent). [Ber B99] Assume that f(x) is continuously differentiable over the set X and that each subproblem x_i^k = argmin_{x_i ∈ X_i} f(x_i, x_{−i}^{k−1}) has a unique solution. Then every limit point of the sequence {x^k} is a stationary point.
[Gri-Sci J00] Generalizations:
 globally convergent for m = 2;
 f is component-wise strictly quasi-convex w.r.t. m − 2 components;
 f is pseudo-convex.

44 BS-MM Algorithm (Block Coordinate Descent). Combination of MM and BCD.
Block Successive Majorization-Minimization (BS-MM):
1: Initialize x^0 ∈ X and set k = 0.
2: repeat
3:   k = k + 1, i = (k mod m) + 1
4:   X^k = arg min_{x_i ∈ X_i} u_i(x_i, x^{k−1})
5:   Set x_i^k to be an arbitrary element in X^k
6:   x_j^k = x_j^{k−1}, ∀ j ≠ i
7: until some convergence criterion is met
Generalization of BCD.

45 Convergence (Block Coordinate Descent). The surrogate functions u_i(·,·) satisfy the following assumptions:
(B1) u_i(y_i, y) = f(y), ∀ y ∈ X, ∀ i
(B2) u_i(x_i, y) ≥ f(y_1, ..., y_{i−1}, x_i, y_{i+1}, ..., y_m), ∀ x_i ∈ X_i, ∀ y ∈ X, ∀ i
(B3) u_i'(x_i, y; d_i)|_{x_i = y_i} = f'(y; d), ∀ d = (0, ..., d_i, ..., 0) such that y_i + d_i ∈ X_i, ∀ i
(B4) u_i(x_i, y) is continuous in (x_i, y), ∀ i
In short, u_i(x_i, x^k) majorizes f(x) on the ith block.

46 (Block Coordinate Descent) Under assumptions B1-B4 (for simplicity additionally assume that f is continuously differentiable):
If u_i(x_i, y) is quasi-convex in x_i and each subproblem min_{x_i ∈ X_i} u_i(x_i, x^{k−1}) has a unique solution for any x^{k−1} ∈ X, then every limit point of {x^k} is a stationary point.
If the level set X^0 = {x : f(x) ≤ f(x^0)} is compact and each subproblem min_{x_i ∈ X_i} u_i(x_i, x^{k−1}) has a unique solution for any x^{k−1} ∈ X for at least m − 1 blocks, then lim_{k→∞} d(x^k, X*) = 0.
These assumptions are more restrictive than for MM due to the cyclic update behavior.

48 Alternating Proximal Minimization (Block Coordinate Descent). Consider the problem
minimize_x f(x_1, ..., x_m) subject to x_i ∈ X_i,
with f(·) being convex in each block. The convergence of BCD is not easy to establish since each subproblem may have multiple solutions. Alternating proximal minimization solves
minimize_{x_i ∈ X_i} f(x_1^k, ..., x_{i−1}^k, x_i, x_{i+1}^k, ..., x_m^k) + (1/2c) ‖x_i − x_i^k‖².
Strictly convex objective ⇒ unique minimizer.

49 Proximal Splitting Algorithm (Block Coordinate Descent). Consider the following problem:
minimize_x ∑_{i=1}^m f_i(x_i) + f_{m+1}(x_1, ..., x_m) subject to x_i ∈ X_i, i = 1, ..., m,
with the f_i convex and lower semicontinuous, and f_{m+1} convex with block-wise Lipschitz gradient,
‖∇_{x_i} f_{m+1}(x) − ∇_{x_i} f_{m+1}(y)‖ ≤ β_i ‖x_i − y_i‖.
Cyclically update:
x_i^{k+1} = prox_{γ f_i}( x_i^k − γ ∇_{x_i} f_{m+1}(x^k) ),
with the proximity operator defined as prox_f(x) = arg min_{y ∈ X} f(y) + (1/2) ‖x − y‖².
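
A minimal Python sketch of this update (an illustration, not from the slides) for the lasso, λ‖x‖₁ + (1/2)‖Ax − b‖², treated as a single block; the proximity operator of the ℓ1 norm is soft-thresholding:

```python
import numpy as np

# Proximal splitting / proximal gradient sketch (not from the slides):
#   x^{k+1} = prox_{gamma*lam*||.||_1}( x^k - gamma * A^T (A x^k - b) ),
# with gamma below 2/beta, beta = ||A^T A|| the gradient Lipschitz constant.

rng = np.random.default_rng(5)
A = rng.standard_normal((40, 100))
b = rng.standard_normal(40)
lam = 0.5

beta = np.linalg.norm(A.T @ A, 2)              # Lipschitz constant of the smooth gradient
gamma = 1.0 / beta
x = np.zeros(100)
for k in range(500):
    g = x - gamma * A.T @ (A @ x - b)          # gradient step on the smooth part
    x = np.sign(g) * np.maximum(np.abs(g) - gamma * lam, 0.0)   # soft-thresholding prox
print(np.count_nonzero(np.abs(x) > 1e-8))      # sparse solution
```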

50 Proximal Splitting Algorithm (Block Coordinate Descent). BS-MM interpretation:
u_i(x_i, x^k) = f_i(x_i) + f_{m+1}(x^k) + ∇_{x_i} f_{m+1}(x^k)^T (x_i − x_i^k) + (1/2γ) ‖x_i − x_i^k‖² + ∑_{j≠i} f_j(x_j^k).
Check (descent lemma):
f_{m+1}(x_{−i}^k, x_i) ≤ f_{m+1}(x^k) + ∇_{x_i} f_{m+1}(x^k)^T (x_i − x_i^k) + (β_i/2) ‖x_i − x_i^k‖²
                        ≤ f_{m+1}(x^k) + ∇_{x_i} f_{m+1}(x^k)^T (x_i − x_i^k) + (1/2γ) ‖x_i − x_i^k‖²,
with γ ∈ [ε_i, 2/β_i − ε_i] and ε_i ∈ (0, min{1, 1/β_i}).

52 Robust Estimation of Location and Scatter (Block Coordinate Descent). x_i ~ elliptical(μ, R). [Sun-Bab-Pal J15] Fit x_i to a Cauchy distribution with pdf
f(x_i) ∝ det(R)^{−1/2} ( 1 + (x_i − μ)^T R^{−1} (x_i − μ) )^{−(K+1)/2}.
Shrinkage penalty (toward a target location t and target scatter T):
h(μ, R) = K log( Tr(R^{−1} T) ) + log det(R) + log( 1 + (t − μ)^T R^{−1} (t − μ) ).
Solve the following problem:
minimize_{μ, R ≻ 0} log det(R) + ((K+1)/N) ∑_{i=1}^N log( 1 + (x_i − μ)^T R^{−1} (x_i − μ) ) + α h(μ, R).

53 BS-MM Algorithm update (Block Coordinate Descent):
μ_{t+1} = [ (K+1) ∑_{i=1}^N w_i(μ_t, R_t) x_i + Nα w_t(μ_t, R_t) t ] / [ (K+1) ∑_{i=1}^N w_i(μ_t, R_t) + Nα w_t(μ_t, R_t) ]
R_{t+1} = (K+1)/(N + Nα) ∑_{i=1}^N w_i(μ_{t+1}, R_t) (x_i − μ_{t+1})(x_i − μ_{t+1})^T
        + Nα/(N + Nα) w_t(μ_{t+1}, R_t) (μ_{t+1} − t)(μ_{t+1} − t)^T
        + NαK/(N + Nα) · T / Tr( T R_t^{−1} ),
where w_i(μ, R) = 1 / ( 1 + (x_i − μ)^T R^{−1} (x_i − μ) ) and w_t(μ, R) = 1 / ( 1 + (t − μ)^T R^{−1} (t − μ) ).
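
For intuition, here is a Python sketch (an illustration, not from the slides' regularized setting) of the same iteration in the special case α = 0, which reduces to the plain fixed-point iteration for the Cauchy maximum-likelihood estimates of μ and R:

```python
import numpy as np

# Cauchy location/scatter fixed point (alpha = 0 special case; sketch, not from the slides):
#   w_i = 1 / (1 + (x_i - mu)^T R^{-1} (x_i - mu)),
#   mu  <- sum_i w_i x_i / sum_i w_i,
#   R   <- (K+1)/N * sum_i w_i (x_i - mu)(x_i - mu)^T.

rng = np.random.default_rng(6)
K, N = 3, 500
X = rng.standard_normal((N, K)) / rng.standard_normal((N, 1))   # heavy-tailed samples

mu = np.median(X, axis=0)
R = np.eye(K)
for it in range(200):
    D = X - mu
    delta = np.einsum('ij,jk,ik->i', D, np.linalg.inv(R), D)     # squared Mahalanobis distances
    w = 1.0 / (1.0 + delta)                                      # per-sample weights
    mu = (w[:, None] * X).sum(axis=0) / w.sum()
    D = X - mu
    R = (K + 1) / N * (D.T * w) @ D
print(mu)
```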

54 (Figure: objective function value versus iterations.)

56 Problem Statement (Exact Jacobi Successive Convex Approximation). Consider the following problem:
minimize_x f(x) subject to x_i ∈ X_i,
where the X_i's are closed and convex sets and f(x) = ∑_{l=1}^L f_l(x_1, ..., x_m).
Conditional gradient update (Frank-Wolfe): direction d^k ≜ x̄^k − x^k with
x̄_i^k = arg min_{x_i ∈ X_i} ∇_{x_i} f(x^k)^T (x_i − x_i^k),
then x^{k+1} = x^k + γ^k d^k, with step size γ^k ∈ (0, 1] chosen to guarantee convergence.
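
A small Python sketch of the conditional gradient update (an illustration, not from the slides) for a convex quadratic over the box [−1, 1]ⁿ, a Cartesian product of intervals, with the diminishing step size γ^k = 2/(k + 2):

```python
import numpy as np

# Frank-Wolfe / conditional gradient sketch (not from the slides):
#   min_x 0.5*||A x - b||^2   s.t.   x in [-1, 1]^n.
# The linearized subproblem over the box is solved coordinate-wise by a sign.

rng = np.random.default_rng(7)
A = rng.standard_normal((30, 20))
b = rng.standard_normal(30)

x = np.zeros(20)
for k in range(200):
    grad = A.T @ (A @ x - b)
    x_bar = -np.sign(grad)                 # argmin of the linearization over the box
    x = x + 2.0 / (k + 2) * (x_bar - x)    # convex combination keeps x feasible
print(0.5 * np.linalg.norm(A @ x - b) ** 2)
```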

57 Exact Jacobi SCA Algorithm. Idea: the conditional gradient update linearizes all the f_l's at x^k, but each function f_l might be convex w.r.t. some block x_i, and we want to preserve the convexity of f_l(x_i, x_{−i}^k). Solution: keep the convex f_l(x_i, x_{−i}^k)'s and linearize the others.

58 Define C_i as the set of indices l such that f_l(x_i, x_{−i}^k) is convex. Approximate f(x) on the ith block at the point x^k by
f̃_i(x_i, x^k) = ∑_{l ∈ C_i} f_l(x_i, x_{−i}^k)   [convex terms, kept]
              + π_i(x^k)^T (x_i − x_i^k)   [linear approximation of the non-convex terms]
              + (τ_i/2) (x_i − x_i^k)^T H_i(x^k) (x_i − x_i^k)   [proximal term],
with π_i(x^k) ≜ ∑_{l ∉ C_i} ∇_{x_i} f_l(x^k) and H_i(x^k) ⪰ c_{H_i} I.

59 Exact Jacobi SCA update:
x̂_i(x^k, τ_i) = arg min_{x_i ∈ X_i} f̃_i(x_i, x^k),   x^{k+1} = x^k + γ^k ( x̂ − x^k ).
Step-size rules:
 constant step size that depends on the Lipschitz constant of ∇f;
 diminishing step size.
Remark: the update of the blocks can also be done sequentially (Gauss-Seidel SCA algorithm).

61 Extensions.
FLEXA: non-smooth objective function; inexact update direction; flexible block update choice.
HyFLEXA.

62 Comparison (BS-MM vs. FLEXA):
 convergence: stationary point (both).
 objective function: continuous, may not be smooth (both).
 constraint set: Cartesian (BS-MM); Cartesian and convex (FLEXA).
 update rule: sequential (BS-MM); sequential or parallel (FLEXA).
 approx. function: global upper bound, unique minimizer, can be non-convex (BS-MM); local approximation, unique minimizer not required, convex approximation (FLEXA).

63 Summary. We have studied:
 the Majorization-Minimization algorithm;
 the Block Coordinate Descent algorithm;
 the Block Successive Majorization-Minimization algorithm.
We have briefly introduced the distributed Successive Convex Approximation algorithm.

64 References I
D. R. Hunter and K. Lange. A tutorial on MM algorithms. Amer. Statistician, vol. 58, 2004.
M. Razaviyayn, M. Hong, and Z.-Q. Luo. A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim., vol. 23, no. 2, 2013.
G. Scutari, F. Facchinei, P. Song, D. Palomar, and J.-S. Pang. Decomposition by partial linearization: Parallel optimization of multi-agent systems. IEEE Trans. Signal Process., vol. 62, no. 3, 2014.
Y. Sun, P. Babu, and D. Palomar. Majorization-minimization algorithms in signal processing, communications, and machine learning. IEEE Trans. Signal Process., vol. 65, no. 3, 2017.
M. Chiang, C. W. Tan, D. Palomar, D. O'Neill, and D. Julian. Power control by geometric programming. IEEE Trans. Wireless Commun., vol. 6, no. 7, 2007.

65 References II
E. J. Candes, M. Wakin, and S. Boyd. Enhancing sparsity by reweighted l1 minimization. J. Fourier Anal. Appl., vol. 14, no. 5-6, 2008.
J. Song, P. Babu, and D. P. Palomar. Sparse generalized eigenvalue problem via smooth optimization. IEEE Trans. Signal Process., vol. 63, no. 7, 2015.
J. Song, P. Babu, and D. P. Palomar. Optimization methods for designing sequences with low autocorrelation sidelobes. IEEE Trans. Signal Process., vol. 63, no. 15, 2015.
Y. Sun, P. Babu, and D. P. Palomar. Regularized Tyler's scatter estimator: Existence, uniqueness, and algorithms. IEEE Trans. Signal Process., vol. 62, no. 19, 2014.
D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
L. Grippo and M. Sciandrone. On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Oper. Res. Lett., vol. 26, no. 3, 2000.

66 References III
Y. Sun, P. Babu, and D. P. Palomar. Regularized robust estimation of mean and covariance matrix under heavy-tailed distributions. IEEE Trans. Signal Process., vol. 63, no. 12, 2015.

67 Thanks! For more information visit:
