First-order methods for structured nonsmooth optimization


1 First-order methods for structured nonsmooth optimization Sangwoon Yun Department of Mathematics Education Sungkyunkwan University Oct 19, 2016 Center for Mathematical Analysis & Computation, Yonsei University

2 Outline I. Coordinate (Gradient) Descent Method II. Incremental Gradient Method III. (Linearized) Alternating Direction Method of Multipliers

3 I. Coordinate (Gradient) Descent Method

4 Structured Nonsmooth Optimization

min_x F(x) = f(x) + P(x)

f: real-valued, (convex) smooth on dom f; P: proper, convex, lsc. In particular, P is separable, i.e., P(x) = Σ_{j=1}^n P_j(x_j).

Bound constraint: P(x) = 0 if l ≤ x ≤ u, ∞ else, where l ≤ u (possibly with −∞ or ∞ components).
l_1-norm: P(x) = λ||x||_1 with λ > 0.
Or the indicator function of linear constraints (Ax = b).

5 Bound-constrained Optimization

min_{l ≤ x ≤ u} f(x), where f: R^N → R is smooth and l ≤ u (possibly with −∞ or ∞ components).

Can be reformulated as the following unconstrained optimization problem:

min_x f(x) + P(x), where P(x) = 0 if l ≤ x ≤ u and P(x) = ∞ else.

6 l_1-regularized Convex Minimization

1. l_1-regularized linear least squares problem: find x so that Ax − b ≈ 0 and x has few nonzeros. Formulate this as an unconstrained convex optimization problem:

min_{x ∈ R^n} ||Ax − b||_2^2 + λ||x||_1

2. l_1-regularized logistic regression problem:

min_{w ∈ R^{n−1}, v ∈ R} (1/m) Σ_{i=1}^m log(1 + exp(−(w^T a_i + v b_i))) + λ||w||_1,

where a_i = b_i z_i and (z_i, b_i) ∈ R^{n−1} × {−1, 1}, i = 1, ..., m, are a given set of (observed or training) examples.
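
To make the effect of the l_1 term concrete, consider the scalar model problem min_x (1/2)(x − b)^2 + λ|x|; its minimizer is the soft-thresholding operation that reappears in the methods below. A minimal NumPy sketch (the scalar model, the function name soft_threshold, and the numerical check are illustrative additions, not from the slides):

```python
import numpy as np

def soft_threshold(b, lam):
    """Closed-form minimizer of 0.5*(x - b)**2 + lam*|x|, applied elementwise."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

# Numerical check for one scalar instance against a dense grid.
lam, b = 0.5, 1.3
grid = np.linspace(-3.0, 3.0, 60001)
obj = 0.5 * (grid - b) ** 2 + lam * np.abs(grid)
assert abs(grid[obj.argmin()] - soft_threshold(b, lam)) < 1e-3
```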

7 Support Vector Machines

8 Support Vector Machines

Support Vector Classification. Training points: z_i ∈ R^p, i = 1, ..., n. Consider a simple case with two classes (linearly separable case). Define a vector a by

a_i = 1 if z_i is in class 1, a_i = −1 if z_i is in class 2.

A hyperplane (w^T z − b = 0) separates the data with the maximal margin. The margin is the distance of the hyperplane to the nearest of the positive and negative points. The nearest points lie on the planes w^T z − b = ±1.

9 Support Vector Machines

Convex Quadratic Programming Problem: Support Vector Classification

The (original) optimization problem:

min_{w,b} (1/2)||w||_2^2 subject to a_i (w^T z_i − b) ≥ 1, i = 1, ..., n.

The modified optimization problem (allows, but penalizes, the failure of a point to reach the correct margin):

min_{w,b,ξ} (1/2)||w||_2^2 + C Σ_{i=1}^n ξ_i subject to a_i (w^T z_i − b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, ..., n.

10 Support Vector Machines

SVM (Dual) Optimization Problem (Convex Quadratic Program):

min_x (1/2) x^T Q x − e^T x subject to 0 ≤ x_i ≤ C, i = 1, ..., n, a^T x = 0,

where a ∈ {−1, 1}^n, 0 < C, e = [1, ..., 1]^T, Q ∈ R^{n×n} is symmetric positive semidefinite with Q_ij = a_i a_j K(z_i, z_j), K: R^p × R^p → R (kernel function), and z_i ∈ R^p (i-th data point), i = 1, ..., n.

Popular choices of K:
linear kernel K(z_i, z_j) = z_i^T z_j
radial basis function kernel K(z_i, z_j) = exp(−γ ||z_i − z_j||_2^2)
sigmoid kernel K(z_i, z_j) = tanh(γ z_i^T z_j)
where γ is a constant.

Q is an n × n fully dense matrix and can even be indefinite (n ≥ 5000).
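
A minimal sketch of assembling the dual QP data Q from training points with the RBF kernel above (the random data, sizes, and the function name rbf_kernel_matrix are illustrative assumptions; only the formulas Q_ij = a_i a_j K(z_i, z_j) and K(z_i, z_j) = exp(−γ||z_i − z_j||_2^2) come from the slide):

```python
import numpy as np

def rbf_kernel_matrix(Z, gamma):
    """K[i, j] = exp(-gamma * ||z_i - z_j||_2^2) for the rows z_i of Z."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
n, p, gamma, C = 200, 5, 0.5, 1.0
Z = rng.standard_normal((n, p))                    # data points z_i
a = rng.choice([-1.0, 1.0], size=n)                # labels a_i
Q = np.outer(a, a) * rbf_kernel_matrix(Z, gamma)   # Q_ij = a_i a_j K(z_i, z_j)
# Dual problem: min 0.5 x^T Q x - e^T x  s.t.  0 <= x_i <= C, a^T x = 0.
```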

11 Rank Minimization

Now, imagine that we only observe a few entries of a data matrix. Then is it possible to accurately guess the entries that we have not seen?

Netflix problem: Given a sparse matrix where M_ij is the rating given by user i on movie j, predict the rating a user would assign to a movie he has not seen, i.e., we would like to infer users' preferences for unrated movies. (Impossible, in general! The problem is ill-posed.)

Intuitively, users' preferences depend only on a few factors, i.e., rank(M) is small. Thus it can be formulated as the low-rank matrix completion problem (affine rank minimization):

min_{X ∈ R^{m×n}} { rank(X) : X_ij = M_ij, (i, j) ∈ Ω },   (NP hard!)

where Ω is an index set of p observed entries.

12 Rank Minimization

Nuclear norm minimization:

min_{X ∈ R^{m×n}} { ||X||_* := Σ_{i=1}^m σ_i(X) : X_ij = M_ij, (i, j) ∈ Ω },

where the σ_i(X)'s are the singular values of X.

A more general nuclear norm minimization problem:

min_{X ∈ R^{m×n}} { ||X||_* : A(X) = b }.

When the matrix variable is restricted to be diagonal, the above problem reduces to the following l_1-minimization problem:

min_{x ∈ R^n} { ||x||_1 : Ax = b }.

13 Nuclear Norm Regularized Least Squares Problem

If the observation b is contaminated with noise, consider the following nuclear norm regularized least squares problem:

min_{X ∈ R^{m×n}} (1/2)||A(X) − b||_2^2 + µ||X||_*,

where µ > 0 is a given parameter.

It appears in many applications of engineering and science, including collaborative filtering, global positioning, system identification, remote sensing, and computer vision.

14 Sparse Portfolio Selection

How to allocate an investor's available capital into a prefixed set of assets with the aims of maximizing the expected return and minimizing the investment risk.

Traditional Markowitz portfolio selection model:

min_x x^T Q x subject to µ^T x = β, e^T x = 1, x ≥ 0,

where β is the desired expected return of the portfolio.

Modified models:

min_x x^T Q x subject to µ^T x = β, e^T x = 1, x ≥ 0, ||x||_0 ≤ K.

min_x ||x||_0 subject to µ^T x = β, x^T Q x ≤ α, e^T x = 1, x ≥ 0.

15 Sparse Covariance Selection

Undirected graphical models offer a way to describe and explain the relationships among a set of variables, a central element of multivariate data analysis.

From a sample covariance matrix, we wish to estimate the true covariance matrix, in which some of the entries of its inverse are zero, by maximizing the l_1-regularized log-likelihood:

max_{X ∈ S^n} log det X − ⟨Σ̂, X⟩ − Σ_{(i,j) ∉ V} ρ_ij |X_ij| subject to X_ij = 0, (i, j) ∈ V,

where
Σ̂ ∈ S^n_+ is an empirical covariance matrix, and Σ̂ is singular or nearly so.
One may want to impose structural conditions on Σ^{−1}, such as conditional independence, which is reflected as zero entries in Σ^{−1}.
V is a collection of all pairs of conditionally independent nodes.
ρ_ij > 0 is a parameter controlling the trade-off between the goodness-of-fit and the sparsity of X.

16 Sparse Covariance Selection

−log det X + ⟨Σ̂, X⟩ is strictly convex and continuously differentiable on its domain S^n_{++}; it takes O(n^3) operations to evaluate. In applications, n can be very large.

The dual problem can be expressed as:

min_{X ∈ S^n} −log det X − n subject to |(X − Σ̂)_ij| ≤ υ_ij, i, j = 1, ..., n,

where υ_ij = ρ_ij for all (i, j) ∉ V and υ_ij = ∞ for all (i, j) ∈ V.

17 Coordinate Descent Method

When P ≡ 0. Given x ∈ R^n:
Choose i ∈ N = {1, ..., n}.
Update x_new = argmin_{u : u_j = x_j, j ≠ i} f(u).
Repeat until convergence.

Gauss-Seidel rule: Choose i cyclically, 1, 2, ..., n, 1, 2, ...
Gauss-Southwell rule: Choose i with |∂f/∂x_i (x)| maximum.
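
A minimal sketch of this exact coordinate descent loop with the Gauss-Seidel rule, for the special case of a convex quadratic f(x) = (1/2)x^T Q x − c^T x where the one-dimensional minimization has a closed form (the quadratic test problem is an illustrative assumption, not from the slides):

```python
import numpy as np

def coordinate_descent_quadratic(Q, c, x0, sweeps=100):
    """Cyclic (Gauss-Seidel) exact coordinate descent on f(x) = 0.5 x^T Q x - c^T x."""
    x = x0.copy()
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):                  # cycle i = 1, 2, ..., n
            grad_i = Q[i] @ x - c[i]        # partial derivative of f w.r.t. x_i
            x[i] -= grad_i / Q[i, i]        # exact minimizer along coordinate i
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
Q = M.T @ M + 0.1 * np.eye(5)               # symmetric positive definite test matrix
c = rng.standard_normal(5)
x = coordinate_descent_quadratic(Q, c, np.zeros(5))
print(np.linalg.norm(Q @ x - c))            # should be near zero at the minimizer
```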

18 Coordinate Descent Method

Properties:
If f is convex, then every cluster point of the x-sequence is a minimizer.
If f is nonconvex, then Gauss-Seidel can cycle (Powell 1973), but Gauss-Southwell still converges.
Convergence is possible when P ≢ 0 (Tseng 2001).

M. J. D. Powell, On search directions for minimization algorithms, Math. Program. 4 (1973).
P. Tseng, Convergence of block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl. 109 (2001).

19 Coord. Gradient Descent Method

Descent direction. For x ∈ dom P, choose a nonempty J ⊆ N and H ≻ 0, then solve the direction subproblem

min_{d : d_j = 0, j ∉ J} ∇f(x)^T d + (1/2) d^T H d + P(x + d) − P(x).

Let d_H(x; J) and q_H(x; J) be the optimal solution and optimal objective value of the direction subproblem.

Facts:
d_H(x; N) = 0 ⟺ F'(x; d) ≥ 0 for all d ∈ R^n.
If H is diagonal, then d_H(x; J) = Σ_{j ∈ J} d_H(x; j) and q_H(x; J) = Σ_{j ∈ J} q_H(x; j).
q_H(x; J) ≤ −(1/2) d^T H d, where d = d_H(x; J).

20 Coord. Gradient Descent Method

This coordinate gradient descent approach may be viewed as a hybrid of gradient projection and coordinate descent. In particular:

If J = N and P(x) = 0 if l ≤ x ≤ u, ∞ else, then d_H(x; N) is a scaled gradient-projection direction for bound-constrained optimization.
If f is quadratic and we choose H = ∇^2 f(x), then d_H(x; J) is a (block) coordinate descent direction.
If H is diagonal, then the subproblems can be solved in parallel, and:
If P ≡ 0, then d_H(x)_j = −∇f(x)_j / H_jj.
If P(x) = 0 if l ≤ x ≤ u, ∞ else, then d_H(x)_j = median{ l_j − x_j, −∇f(x)_j / H_jj, u_j − x_j }.
If P = λ||·||_1, then d_H(x)_j = −median{ (∇f(x)_j − λ)/H_jj, x_j, (∇f(x)_j + λ)/H_jj }.
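
A minimal sketch of the three closed-form directions above for diagonal H, written with NumPy broadcasting (the function names are illustrative; the formulas are the ones stated on this slide):

```python
import numpy as np

def cgd_direction_smooth(grad, Hdiag):
    """d_j = -grad_j / H_jj when P = 0."""
    return -grad / Hdiag

def cgd_direction_box(grad, Hdiag, x, lower, upper):
    """d_j = median{ l_j - x_j, -grad_j / H_jj, u_j - x_j } for a box constraint."""
    return np.clip(-grad / Hdiag, lower - x, upper - x)

def cgd_direction_l1(grad, Hdiag, x, lam):
    """d_j = -median{ (grad_j - lam)/H_jj, x_j, (grad_j + lam)/H_jj } for P = lam*||.||_1."""
    stacked = np.stack([(grad - lam) / Hdiag, x, (grad + lam) / Hdiag])
    return -np.median(stacked, axis=0)
```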

21 Coord. Gradient Descent Method

Stepsize: Armijo rule. Choose α to be the largest element of {β^k}_{k=0,1,...} satisfying

F(x + αd) − F(x) ≤ σ α q_H(x; J)   (0 < β < 1, 0 < σ < 1).

For the l_1-regularized linear least squares problem, the minimization rule

α ∈ argmin{ F(x + td) : t ≥ 0 }

or the limited minimization rule

α ∈ argmin{ F(x + td) : 0 ≤ t ≤ s }, where 0 < s < ∞,

can also be used.
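
A minimal backtracking sketch of the Armijo rule above; F is assumed to be a callable returning F(x) = f(x) + P(x), and q_val is the subproblem value q_H(x; J) (which is nonpositive), both supplied by the caller:

```python
def armijo_stepsize(F, x, d, q_val, beta=0.5, sigma=0.1, max_backtracks=50):
    """Largest alpha in {beta^k} with F(x + alpha*d) - F(x) <= sigma * alpha * q_val."""
    Fx = F(x)
    alpha = 1.0
    for _ in range(max_backtracks):
        if F(x + alpha * d) - Fx <= sigma * alpha * q_val:
            return alpha
        alpha *= beta           # backtrack: try the next power of beta
    return alpha
```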

22 Coord. Gradient Descent Method

Choose J:
Gauss-Seidel rule: J cycles through {1}, {2}, ..., {n}.
Gauss-Southwell-r rule: ||d_D(x; J)||_∞ ≥ υ ||d_D(x; N)||_∞, where 0 < υ ≤ 1 and D ≻ 0 is diagonal (e.g., D = diag(H)).
Gauss-Southwell-q rule: q_D(x; J) ≤ υ q_D(x; N), where 0 < υ ≤ 1 and D ≻ 0 is diagonal (e.g., D = diag(H)).
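
A minimal sketch of the Gauss-Southwell-q rule for the separable l_1 case: compute the per-coordinate values q_D(x; j) via the closed-form direction, then greedily keep coordinates until their combined value reaches a fraction υ of q_D(x; N) (the greedy selection strategy is one simple way to satisfy the rule, an assumption rather than the slides' prescription):

```python
import numpy as np

def gauss_southwell_q(grad, Ddiag, x, lam, upsilon=0.9):
    """Pick an index set J with q_D(x; J) <= upsilon * q_D(x; N) (P = lam*||.||_1)."""
    d = -np.median(np.stack([(grad - lam) / Ddiag, x, (grad + lam) / Ddiag]), axis=0)
    q = grad * d + 0.5 * Ddiag * d ** 2 + lam * (np.abs(x + d) - np.abs(x))  # q_D(x; j) <= 0
    order = np.argsort(q)                # most negative (largest model decrease) first
    total = q.sum()
    J, acc = [], 0.0
    for j in order:
        J.append(j)
        acc += q[j]
        if acc <= upsilon * total:       # captured at least a fraction upsilon of q_D(x; N)
            break
    return np.array(J)
```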

23 Coord. Gradient Descent Method Advantage of CGD CGD method is simple, highly parallelizable, and is suited for solving large-scale problems. CGD not only has cheaper iterations than exact coordinate descent, it also has stronger global convergence properties.

24 Convergence Results

Global convergence: If λI ⪯ D ⪯ λ̄I and λI ⪯ H ⪯ λ̄I (0 < λ ≤ λ̄), J is chosen by the Gauss-Seidel, Gauss-Southwell-r, or Gauss-Southwell-q rule, and α is chosen by the Armijo rule, then every cluster point of the x-sequence generated by the CGD method is a stationary point of F.

25 Convergence Results

Local convergence rate: Assume λI ⪯ D ⪯ λ̄I and λI ⪯ H ⪯ λ̄I (0 < λ ≤ λ̄). If J is chosen by the Gauss-Seidel or Gauss-Southwell-q rule, α is chosen by the Armijo rule, and, in addition, P and f satisfy any of the following assumptions, then the x-sequence generated by the CGD method converges at an R-linear rate.

C1. f is strongly convex and ∇f is Lipschitz continuous on dom P.
C2. f is (nonconvex) quadratic; P is polyhedral.
C3. f(x) = g(Ex) + q^T x, where E ∈ R^{m×N}, q ∈ R^N, g is strongly convex and ∇g is Lipschitz continuous on R^m; P is polyhedral.
C4. f(x) = max_{y ∈ Y} {(Ex)^T y − g(y)} + q^T x, where Y ⊆ R^m is polyhedral, E ∈ R^{m×N}, q ∈ R^N, g is strongly convex and ∇g is Lipschitz continuous on R^m; P is polyhedral.

26 Convergence Results

Complexity bound: If f is convex with Lipschitz continuous gradient, then the number of iterations for achieving ε-optimality is

Gauss-Seidel rule: O( n^2 L r^0 / ε ),

where L is a Lipschitz constant and r^0 = max{ dist(x, X)^2 : F(x) ≤ F(x^0) }.

Gauss-Southwell-q rule: O( L r^0 / (υ ε) + max{ 0, L υ ln( e^0 / r^0 ) } ),

where e^0 = F(x^0) − min_{x ∈ X} F(x).

27 II. Incremental Gradient Method

28 Sum of several functions

min_x F_c(x) := f(x) + cP(x),

where c > 0, P: R^n → (−∞, ∞] is a proper, convex, lower semicontinuous (lsc) function, and

f(x) := Σ_{i=1}^m f_i(x),

where each function f_i is real-valued and smooth (i.e., continuously differentiable) on R^n.

29 Sum of several functions

In applications, m is often large (exceeding 10^4). In this case, traditional gradient methods would be inefficient since they require evaluating ∇f_i(x) for all i before updating x.

Incremental gradient methods (IGM), in contrast, update x after evaluating ∇f_i(x) for only one or a few i. In the unconstrained case P ≡ 0, this method has the basic form

x^{k+1} = x^k − α_k ∇f_{i_k}(x^k), k = 0, 1, ...,

where i_k is chosen to cycle through 1, ..., m (i.e., i_0 = 1, i_1 = 2, ..., i_{m−1} = m, i_m = 1, ...) and α_k > 0.
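
A minimal sketch of this basic cyclic incremental gradient iteration for the least squares sum f_i(x) = (1/2)(a_i^T x − b_i)^2, with a diminishing stepsize (the stepsize schedule, problem data, and constants are illustrative assumptions):

```python
import numpy as np

def incremental_gradient(A, b, x0, passes=30, alpha0=0.05):
    """Cyclic IGM on f(x) = sum_i 0.5*(a_i^T x - b_i)^2: x <- x - alpha_k * grad f_{i_k}(x)."""
    x, m = x0.copy(), A.shape[0]
    k = 0
    for _ in range(passes):
        for i in range(m):                      # i_k cycles through the components
            alpha = alpha0 / (1.0 + k / m)      # diminishing stepsize
            grad_i = (A[i] @ x - b[i]) * A[i]   # gradient of the i-th component
            x = x - alpha * grad_i
            k += 1
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 5))
x_true = rng.standard_normal(5)
b = A @ x_true                                  # consistent system, so the exact solution is x_true
print(np.linalg.norm(incremental_gradient(A, b, np.zeros(5)) - x_true))  # should be close to zero
```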

30 Sum of several functions

For global convergence of IGMs, the stepsize α_k (also called the "learning rate") needs to diminish to zero, which can lead to slow convergence [3]. If a constant stepsize is used, only convergence to an approximate solution can be shown [4]. Methods [5] to overcome this difficulty have been proposed. However, these methods need additional assumptions, such as ∇f_i(x̄) = 0 for all i at a stationary point x̄, to achieve global convergence without the stepsize tending to zero. Moreover, their extension to P ≢ 0 is problematic.

[3] D. P. Bertsekas, A new class of incremental gradient methods for least squares problems, SIAM J. Optim., 7 (1997).
[4] M. V. Solodov, Incremental gradient algorithms with stepsizes bounded away from zero, Comput. Optim. Appl., 11 (1998).
[5] P. Tseng, An incremental gradient(-projection) method with momentum term and adaptive stepsize rule, SIAM J. Optim., 8 (1998).

31 Sum of several functions

For the case P ≡ 0, Blatt, Hero and Gauchman [6] proposed the method

g^k = g^{k−1} + ∇f_{i_k}(x^k) − ∇f_{i_k}(x^{k−m}),
x^{k+1} = x^k − α g^k,

with α > 0, g^{−1} = Σ_{i=1}^m ∇f_i(x^{i−m−1}), and x^0, x^{−1}, ..., x^{−m} ∈ R^n given.

It computes the gradient of a single component function at each iteration. But instead of updating x using this gradient, it uses the sum of the m most recently computed gradients. This method requires more storage (O(mn) instead of O(n)) and slightly more communication/computation per iteration.

[6] D. Blatt, A. O. Hero, and H. Gauchman, A convergent incremental gradient method with a constant step size, SIAM J. Optim., 18 (2007).
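
A minimal sketch of this aggregated-gradient idea: store the most recently computed gradient of each component, maintain their sum, and replace one entry per iteration. For simplicity the table is initialized with all gradients evaluated at x^0 rather than at m distinct past points, so this is a simplification of the initialization g^{−1} above, not the exact scheme of Blatt et al.:

```python
import numpy as np

def incremental_aggregated_gradient(grad_fns, x0, alpha=1e-2, iters=5000):
    """x <- x - alpha * g, where g sums the most recently seen component gradients."""
    m = len(grad_fns)
    x = x0.copy()
    last = [gf(x) for gf in grad_fns]   # O(m n) storage: one stored gradient per component
    g = np.sum(last, axis=0)
    for k in range(iters):
        i = k % m                       # cycle through the components
        new_grad = grad_fns[i](x)
        g += new_grad - last[i]         # update the aggregated gradient in O(n)
        last[i] = new_grad
        x = x - alpha * g               # constant stepsize (must be sufficiently small)
    return x
```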

32 IGM: Constant Stepsize

0. Choose x^0, x^{−1} ∈ dom P and α ∈ (0, 1]. Initialize k = 0. Go to 1.
1. Choose H^k ≻ 0 and 0 ≤ τ_i^k ≤ k for i = 1, ..., m, and compute g^k, d^k, and x^{k+1} by

g^k = Σ_{i=1}^m ∇f_i(x^{τ_i^k}),
d^k = argmin_{d ∈ R^n} { ⟨g^k, d⟩ + (1/2)⟨d, H^k d⟩ + P(x^k + d) },
x^{k+1} = x^k + α d^k.

Increment k by 1 and return to 1.

The method of Blatt et al. corresponds to the special case of P ≡ 0, H^k = I, K = m − 1, and

τ_i^k = k if i = (k mod m) + 1; τ_i^k = τ_i^{k−1} otherwise, 1 ≤ i ≤ m, k ≥ m.

33 IGM: Constant Stepsize

Assumption 1
(a) τ_i^k ≥ k − K for all i and k, where K ≥ 0 is an integer.
(b) λI ⪯ H^k ⪯ λ̄I for all k, where 0 < λ ≤ λ̄.

Assumption 2
||∇f_i(y) − ∇f_i(z)|| ≤ L_i ||y − z|| for all y, z ∈ dom P, for some L_i ≥ 0, i = 1, ..., m. Let L = Σ_{i=1}^m L_i.

34 IGM: Constant Stepsize

Global Convergence: Let {x^k}, {d^k}, {H^k} be sequences generated by Algorithm 1 under Assumptions 1 and 2, with α < 2λ / (L(2K + 1)). Then {d^k} → 0 and every cluster point of {x^k} is a stationary point.

35 IGM: Adaptive Stepsize

0. Choose x^0, x^{−1} ∈ dom P, β ∈ (0, 1), σ > 1/2, and α ∈ (0, 1]. Initialize k = 0. Go to 1.
1. Choose H^k ≻ 0 and 0 ≤ τ_i^k ≤ k for i = 1, ..., m, and compute g^k, d^k by

g^k = Σ_{i=1}^m ∇f_i(x^{τ_i^k}),
d^k = argmin_{d ∈ R^n} { ⟨g^k, d⟩ + (1/2)⟨d, H^k d⟩ + cP(x^k + d) }.

Choose α_k^init ∈ [α, 1] and let α_k be the largest element of {α_k^init β^j}_{j=0,1,...} satisfying the descent-like condition

F_c(x^k + α_k d^k) ≤ F_c(x^k) − σ K L α_k ||d^k||_2^2 + L Σ_{j=(k−K)_+}^{k−1} α_j ||d^j||_2^2,

and set x^{k+1} = x^k + α_k d^k. Increment k by 1 and return to 1.

36 IGM: Adaptive Stepsize

Global Convergence: Let {x^k}, {d^k}, {H^k}, {α_k} be sequences generated by Algorithm 2 under Assumptions 1 and 2. Then the following results hold.
(a) For each k ≥ 0, the descent-like condition holds whenever α_k ≤ ᾱ, where ᾱ = λ / (L(σK + K/2 + 1/2)).
(b) We have α_k ≥ min{α, βᾱ} for all k.
(c) {d^k} → 0 and every cluster point of {x^k} is a stationary point.

37 IGM: 1-memory with Adaptive Stepsize

0. Choose x^0 ∈ dom P. Initialize k = 0 and g^{−1} = 0. Go to 1.
1. Choose H^k ≻ 0 and compute g^k, d^k, and x^{k+1} by

g^k = (k/(k+1)) g^{k−1} + (m/(k+1)) ∇f_{i_k}(x^k), with i_k = (k mod m) + 1,
d^k = argmin_{d ∈ R^n} { ⟨g^k, d⟩ + (1/2)⟨d, H^k d⟩ + cP(x^k + d) },
x^{k+1} = x^k + α_k d^k, with α_k ∈ (0, 1].

Increment k by 1 and return to 1.
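
A minimal sketch of Algorithm 3 for the special case H^k = I and cP(·) = λ||·||_1, in which the direction subproblem reduces to a soft-thresholding step; the stepsize schedule below is an illustrative choice that keeps α_k in (0, 1] with Σ α_k = ∞ (Assumption 3(a)):

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def igm_one_memory(grad_fns, x0, lam, iters=10000, alpha0=0.5):
    """1-memory IGM with H = I: g^k averages component gradients, d^k is a prox step."""
    m = len(grad_fns)
    x, g = x0.copy(), np.zeros_like(x0)
    for k in range(iters):
        i = k % m                                      # i_k = (k mod m) + 1
        g = (k * g + m * grad_fns[i](x)) / (k + 1.0)   # g^k = k/(k+1) g^{k-1} + m/(k+1) grad f_{i_k}(x^k)
        d = soft_threshold(x - g, lam) - x             # argmin <g,d> + 0.5||d||^2 + lam*||x + d||_1
        alpha = alpha0 / (1.0 + k / (10.0 * m))        # stepsizes in (0, 1] whose sum diverges
        x = x + alpha * d
    return x
```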

38 IGM: 1-memory with Adaptive Stepsize

Assumption 3
(a) Σ_{k=0}^∞ α_k = ∞.
(b) lim_{l→∞} Σ_{j=0}^{l} ((j+1)/(l+1)) δ_j = 0, where δ_j := max_{i=0,1,...,m} ||x^{k+i} − x^{k+m}|| with k = jm − 1 (and x^{−1} = x^0).

39 IGM: 1-memory with Adaptive Stepsize

Global Convergence: Let {x^k}, {d^k}, {H^k}, {α_k} be sequences generated by Algorithm 3 under Assumptions 1(b), 2, and 3. Then the following results hold.
(a) {||x^{k+1} − x^k||} → 0 and {||∇f(x^k) − g^k||} → 0.
(b) lim inf_{k→∞} ||d^k|| = 0.
(c) If {x^k} is bounded, then there exists a cluster point of {x^k} that is a stationary point.

40 III. (Linearized) Alternating Direction Method of Multipliers

41 Total Variation Regularized Linear Least Squares Problem

Image restoration tasks such as image deconvolution, image inpainting and image denoising are often formulated as an inverse problem

b = Au + η,

where u ∈ R^n is the unknown true image, b ∈ R^l is the observed image (or measurements), η is Gaussian noise, and A ∈ R^{l×n} is a linear operator: typically a convolution operator in deconvolution, a projection in inpainting, and the identity in denoising.

The unknown image can be recovered by solving the TV regularized linear least squares problem (Primal):

min_{u ∈ R^n} (1/2)||Au − b||_2^2 + µ||∇u||,

where µ > 0 and ||∇u|| = Σ_{i=1}^n ||(∇u)_i||_2 with (∇u)_i ∈ R^2.

42 Gaussian Noise Figure: denoising (a) original: (b) noised image (c) recovered image

43 Gaussian Noise Figure: deblurring (a) original: (b) motion blurred image (c) recovered image

44 Gaussian Noise Figure: inpainting (a) original: (b) scratched image (c) recovered image

45 Algorithms

Saddle Point Formulation and Dual Problem

1. Dual norm:

min_u max_{||p|| ≤ 1} ⟨∇u, p⟩ + (1/2)||Au − b||_2^2

2. Convex conjugate (Legendre-Fenchel transform) of J(u) = ||∇u||:

min_u sup_p ⟨p, ∇u⟩ − J*(p) + (1/2)||Au − b||_2^2,

where J*(p) = sup_z ⟨p, z⟩ − J(z).

3. Lagrangian function:

max_p inf_{u,z} L_1(u, p, z), where L_1(u, p, z) := (1/2)||Au − b||_2^2 + ||z|| + ⟨p, ∇u − z⟩.

46 Algorithms

4. Dual problem:

max_p { inf_u ⟨p, ∇u⟩ − J*(p) + (1/2)||Au − b||_2^2 } = max_p { −J*(p) − H*(div p) },

where H(u) = (1/2)||Au − b||_2^2.

5. Lagrangian function with y = div p:

max_u inf_{p,y} L_2(u, p, y), where L_2(u, p, y) := J*(p) + H*(y) + ⟨u, div p − y⟩.

47 ADMM

Alternating Direction Method of Multipliers [7]

Augmented Lagrangian function:

L_α(u, p, z) := (1/2)||Au − b||_2^2 + ||z|| + ⟨p, ∇u − z⟩ + (α/2)||∇u − z||_2^2

u^{k+1} = argmin_u (1/2)||Au − b||_2^2 + ⟨p^k, ∇u − z^k⟩ + (α/2)||∇u − z^k||_2^2
z^{k+1} = argmin_z ||z|| + ⟨p^k, ∇u^{k+1} − z⟩ + (α/2)||∇u^{k+1} − z||_2^2
p^{k+1} = p^k + α(∇u^{k+1} − z^{k+1})

ADMM on the primal (or 3) is equivalent to Douglas-Rachford [8] on the dual (4).
ADMM on the dual (or 5) is equivalent to Douglas-Rachford on the primal.

[7] E. Esser, X. Zhang, and T. Chan, A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science, SIAM J. Imaging Sci.
[8] J. Douglas and H. H. Rachford, On the numerical solution of heat conduction problems in two or three space variables, Trans. Amer. Math. Soc., 1956.

48 ADMM

Alternating Minimization Algorithm & Forward-Backward Splitting Method

AMA [9] on the primal (or 3):

u^{k+1} = argmin_u (1/2)||Au − b||_2^2 + ⟨p^k, ∇u − z^k⟩
z^{k+1} = argmin_z ||z|| + ⟨p^k, ∇u^{k+1} − z⟩ + (α/2)||∇u^{k+1} − z||_2^2
p^{k+1} = p^k + α(∇u^{k+1} − z^{k+1})

FBS [10] on the dual (4):

u^{k+1} = argmin_u (1/2)||Au − b||_2^2 + ⟨p^k, ∇u⟩
p^{k+1} = argmin_p J*(p) − ⟨p, ∇u^{k+1}⟩ + (1/(2α))||p − p^k||_2^2

FBS on the dual is equivalent to AMA on the primal (or 3).

[9] P. Tseng, Applications of a splitting algorithm to decomposition in convex programming and variational inequalities, SIAM J. Control and Optimization.
[10] P. L. Combettes and V. R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul., 2005.
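
For reference, a minimal sketch of forward-backward splitting in its generic composite form min_x h(x) + g(x) with h smooth: x^{k+1} = prox_{αg}(x^k − α∇h(x^k)). The l_1-regularized least squares instance below is only an illustration; the slides apply FBS to the dual TV problem, whose prox is different:

```python
import numpy as np

def forward_backward(grad_h, prox_g, x0, alpha, iters=500):
    """x <- prox_{alpha g}(x - alpha * grad h(x)): forward (gradient) step, then backward (prox) step."""
    x = x0.copy()
    for _ in range(iters):
        x = prox_g(x - alpha * grad_h(x), alpha)
    return x

# Illustrative instance: h(x) = 0.5*||Ax - b||_2^2, g(x) = lam*||x||_1.
rng = np.random.default_rng(3)
A, b, lam = rng.standard_normal((30, 10)), rng.standard_normal(30), 0.1
grad_h = lambda x: A.T @ (A @ x - b)
prox_l1 = lambda v, a: np.sign(v) * np.maximum(np.abs(v) - a * lam, 0.0)
alpha = 1.0 / np.linalg.norm(A, 2) ** 2     # 1/L, with L the Lipschitz constant of grad h
x_hat = forward_backward(grad_h, prox_l1, np.zeros(10), alpha)
```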

49 Proximal Splitting Method

Linearly constrained separable convex programming problem:

min_{x,y} { f(x) + g(y) : Qx = y }.

Augmented Lagrangian function:

L_α(x, y, z) := f(x) + g(y) + ⟨z, y − Qx⟩ + (α/2)||Qx − y||_2^2.

x^{k+1} = argmin_x L_α(x, y^k, z^k)
y^{k+1} = argmin_y L_α(x^{k+1}, y, z^k)
z^{k+1} = z^k + α(Qx^{k+1} − y^{k+1})
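
A minimal sketch of these three updates for the concrete instance f(x) = (1/2)||Ax − b||_2^2 and g(y) = λ||y||_1, written with the multiplier term ⟨z, Qx − y⟩ (the slide carries the multiplier with the opposite sign, which only flips the sign of z); the function name and parameters are illustrative assumptions:

```python
import numpy as np

def admm_l1(A, b, Q, lam, alpha=1.0, iters=300):
    """ADMM for min_x 0.5*||Ax - b||_2^2 + lam*||y||_1  subject to  Qx = y."""
    n, p = A.shape[1], Q.shape[0]
    x, y, z = np.zeros(n), np.zeros(p), np.zeros(p)
    M = A.T @ A + alpha * Q.T @ Q    # the x-update solves a linear system with this fixed matrix
    for _ in range(iters):
        x = np.linalg.solve(M, A.T @ b + Q.T @ (alpha * y - z))    # x-update (linear solve)
        v = Q @ x + z / alpha
        y = np.sign(v) * np.maximum(np.abs(v) - lam / alpha, 0.0)  # y-update: prox of (1/alpha)*g (soft-thresholding)
        z = z + alpha * (Q @ x - y)                                # multiplier update
    return x
```

With Q = I this is the standard splitting for the l_1-regularized least squares problem; for the TV problem, Q is the discrete gradient and the y-update uses the groupwise shrinkage associated with Σ_i ||y_i||_2 instead of the elementwise soft-thresholding.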

50 Proximal Splitting Method

If f(x) = (1/2)||Ax − b||_2^2, then

x^{k+1} = (A^T A + α Q^T Q)^{−1} (A^T b + Q^T z^k + α Q^T y^k).

If f(x) = ⟨1, Ax⟩ − ⟨b, log(Ax)⟩ (Poisson noise), one needs an inner solver or an inversion involving Q and/or A.

If f(x) = ⟨x + b e^{−x}, 1⟩ (multiplicative noise), one needs an inner solver.

51 Poisson Noise Figure: deblurring (a) original (b) blurred image (c) recovered image

52 Multiplicative Noise

53 Multiplicative Noise Copyright: Sandia Nat. Lab.

54 Multiplicative Noise 4M (2^22), 20 sec.

55 Multiplicative Noise

56 Multiplicative Noise

57 Multiplicative Noise

58 Proximal Splitting Method

To avoid any inner iterations or inversions involving the Laplacian operator required in algorithms based on the augmented Lagrangian:

Alternating minimization algorithm:
x^{k+1} = argmin_x L_0(x, y^k, z^k)
y^{k+1} = argmin_y L_α(x^{k+1}, y, z^k)
z^{k+1} = z^k + α(Qx^{k+1} − y^{k+1})

Linearized augmented Lagrangian with proximal function:
x^{k+1} = argmin_x LL_α(x; x^k, y^k, z^k)
y^{k+1} = argmin_y L_α(x^{k+1}, y, z^k)
z^{k+1} = z^k + α(Qx^{k+1} − y^{k+1})

LL_α(x; x^k, y^k, z^k) := ⟨∇f(x^k), x − x^k⟩ + g(y^k) + ⟨z^k, y^k − Qx⟩ + α⟨Q^T(Qx^k − y^k), x − x^k⟩ + (1/(2δ))||x − x^k||_2^2
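
A minimal sketch of this linearized variant in the same setting and with the same sign convention as the ADMM sketch after slide 49: the x-update becomes a single explicit gradient-like step with proximal parameter δ, so no linear system or inner solver is needed (the stepsize heuristic for δ is an assumption, not from the slides):

```python
import numpy as np

def linearized_admm_l1(A, b, Q, lam, alpha=1.0, delta=None, iters=1000):
    """Linearized ADMM: linearize f and the coupling term in the x-update, avoiding (A^T A + alpha Q^T Q)^{-1}."""
    n, p = A.shape[1], Q.shape[0]
    x, y, z = np.zeros(n), np.zeros(p), np.zeros(p)
    if delta is None:
        # Heuristic: delta <= 1 / (||A||^2 + alpha*||Q||^2) keeps the explicit x-step stable.
        delta = 1.0 / (np.linalg.norm(A, 2) ** 2 + alpha * np.linalg.norm(Q, 2) ** 2)
    for _ in range(iters):
        grad_x = A.T @ (A @ x - b) + Q.T @ z + alpha * Q.T @ (Q @ x - y)
        x = x - delta * grad_x                                     # linearized x-update: one explicit step
        v = Q @ x + z / alpha
        y = np.sign(v) * np.maximum(np.abs(v) - lam / alpha, 0.0)  # y-update: soft-thresholding, as before
        z = z + alpha * (Q @ x - y)                                # multiplier update
    return x
```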

59 Thank you!
