First-order methods for structured nonsmooth optimization
|
|
- Irene Jacobs
- 5 years ago
- Views:
Transcription
1 First-order methods for structured nonsmooth optimization Sangwoon Yun Department of Mathematics Education Sungkyunkwan University Oct 19, 2016 Center for Mathematical Analysis & Computation, Yonsei University
2 Outline I. Coordinate (Gradient) Descent Method II. Incremental Gradient Method III. (Linearized) Alternating Direction Method of Multipliers
3 Thank you! I. Coordinate (Gradient) Descent Method
4 Structured Nonsmooth Optimization min x F (X) = f (x) + P(x), f : real-valued, (convex) smooth on domf P: proper, convex, lsc In particular, P: separable i.e., P(x) = n j=1 P j(x j ) Bound constraint: P(x) = where l u (possibly with or ). l 1 -norm: P(x) = λ x 1 with λ > 0 { 0 if l x u; else, or indicator function of linear constraints (Ax = b).
5 Bound-constrained Optimization min l x u f (x), where f : R N R is smooth, l u (possibly with or components). Can be reformulated as the following unconstrained optimization problem: min x { 0 if l x u where P(x) =. else f (x) + P(x),
6 l 1 -regularized Convex Minimization 1. l 1 -regularized linear least squares problem Find x so that Ax b 0 and x has few nonzeros. Formulate this as an unconstrained convex optimization problem: min x R n Ax b λ x 1 2. l 1 -regularized logistic regression problem min w R n 1,v R 1 m m log(1 + exp( (w T a i + vb i ))) + λ w 1, i=1 where a i = b i z i and (z i, b i ) R n 1 { 1, 1}, i = 1,..., m are a given set of (observed or training) examples.
7 Support Vector Machines
8 Support Vector Machines Support Vector Classification Training points : z i R p, i = 1,..., n. Consider a simple case with two classes (linear separable case): Define a vector a : a i = { 1 if z i in class 1 1 if z i in class 2 A hyperplane (0 = w T z b) separates data with the maximal margin. Margin is the distance of the hyperplane to the nearest of the positive and negative points. Nearest points lie on the planes ±1 = w T z b
9 Support Vector Machines Convex Quadratic Programming Problem: Support Vector Classification The (original) Optimization Problem min a,b 1 2 a 2 2 subject to z i ( a T x i b ) 1, i = 1,..., n. The Modified Optimization Problem (allows, but penalizes, the failure of a point to reach the correct margin) min a,b,ξ 1 2 a C n i=1 ξ i subject to z i ( a T x i b ) 1 ξ i, ξ i 0, i = 1,..., n.
10 Support Vector Machines SVM (Dual) Optimization Problem (Convex Quadratic Program) min x 1 2 x T Qx e T x subject to 0 x i C, i = 1,..., n, a T x = 0, where a { 1, 1} n, 0 < C, e = [1,..., 1] T, Q R n n is a sym. pos. semidef. with Q ij = a i a j K (z i, z j ), K : R p R p R ( kernel function ), and z i R p ( ith data point ), i = 1,..., n. Popular Choices of K : linear kernel K (z i, z j ) = z T i z j radial basis function kernel K (z i, z j ) = exp( γ z i z j 2 2 ) sigmoid kernel K (z i, z j ) = tanh(γz T i z j ) where γ is a constant. Q is an n n fully dense matrix and even indefinite. (n 5000)
11 Rank Minimization Now, imagine that we only observe a few entries of a data matrix. Then is it possible to accurately guess the entries that we have not seen? Netflix problem: Given a sparse matrix where M ij is the rating given by user i on movie j, predict the rating a user would assign to a movie he has not seen, i.e., we would like to infer users preference for unrated movies. (impossible! in general) The problem is ill-posed. Intuitively, users preferences depend only on a few factors, i.e., rank(m) is small. Thus can be formulated as the low-rank matrix completion problem (affine rank minimization): X R m n min { rank(x) X ij = M ij, (i, j) Ω }, (NP hard!) where Ω is an index set of p observed entries.
12 Rank Minimization Nuclear norm minimization: { min X := X R m n m i=1 where σ i (X) s are singular values of X. } σ i (X) X ij = M ij, (i, j) Ω. a more general nuclear norm minimization problem: X R m n min { X : A(X) = b }. When the matrix variable is restricted to be diagonal, the above problem reduces to the following l 1 -minimization problem: x R n min { x 1 : Ax = b }.
13 Nuclear Norm Regularized Least Squares Problem If the observation b is contaminated with noise consider the following nuclear norm regularized least squares problem: where µ > 0 is a given parameter. 1 min X R m n 2 A(X) b µ X. appeared in many applications of engineering and science including collaborative filtering global positioning system identification remote sensing computer vision
14 Sparse Portfolio Selection How to allocate an investor s available capital into a prefixed set of assets with the aims of maximizing the expected return and minimizing the investment risk. Traditional Markowitz portfolio selection model: min x x T Qx subject to µ T x = β, e T x = 1, x 0, where β is the desired expected return of the portfolio. Modified models: min x x T Qx subject to µ T x = β, e T x = 1, x 0, x 0 K. min x x 0 subject to µ T x = β, x T Qx α, e T x = 1, x 0.
15 Sparse Covariance Selection Undirected graphical models offer a way to describe and explain relationships among a set of variables, central element of multivariate data analysis From a sample covariance matrix, wish to estimate the true covariance matrix, in which some of entries of its inverse are zero, by maximizing l 1 -regularized log-likelihood: where max X S n log detx subject to X ij = 0, (i, j) V, ˆΣ, X (i,j) V ρ ij X ij ˆΣ S n + is an empirical covariance matrix and ˆΣ is singular or nearly so May want to impose structural conditions on Σ 1, such as conditional independence, which is reflected as zero entries in Σ 1. V is a collection of all pairs of conditional independent nodes ρ ij > 0: parameter controlling the trade-off between the goodness-of-fit and the sparsity of X
16 Sparse Covariance Selection log det X + ˆΣ, X is strictly convex, cont. diff. on its domain S++, n O(n 3 ) opers. to evaluate. In applications, n can exceed Dual problem can be expressed: min X S n log det X n subject to (X ˆΣ) ij υ ij, i, j = 1,..., n, where υ ij = ρ ij for all (i, j) V and υ ij = for all (i, j) V.
17 Coordinate Descent Method When P 0. Given x R n, Choose i N = {1,..., n}. Update Repeat until convergence. x new = arg min f (u). u u j =x j j i Gauss-Seidel rule: Choose i cyclically, 1, 2,..., n, 1, 2,... Gauss-Southwell rule: Choose i with f x i (x) maximum.
18 Coordinate Descent Method Properties: If f convex, then every cluster point of the x-sequence is a minimizer. If f nonconvex, then G-Seidel can cycle 1 but G-Southwell still converges. Convergence is possible when P M. J. D. Powell, On search directions for minimization algorithms, Math. Program. 4, (1973), P. Tseng, Convergence of block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl. 109, (2001),
19 Coord. Gradient Descent Method Descent direction. For x domp, choose J ( ) N and H 0 n, Then solve min { f d d j =0 j J (x)t d d T Hd + P(x + d) P(x)} direc. subprob Let d H (x; J ) and q H (x; J ) be the opt. soln and obj. value of the direc. subprob. Facts: d H (x; N ) = 0 F (x; d) 0 d R n. H is diagonal d H (x; J ) = d H (x; j), q H (x; J ) = q H (x; j). j J j J q H (x; J ) 1 2 d T Hd where d = d H (x; J ).
20 Coord. Gradient Descent Method This coord. grad. descent approach may be viewed as a hybrid of gradient-projection and coordinate descent. In particular, { 0 if l x u if J = N and P(x) =, then d H (x; N ) is a scaled else gradient-projection direction for bound-constrained optimization. if f is quadratic and we choose H = 2 f (x), then d H (x; J ) is a (block) coordinate descent direction. If H is diagonal, then subproblems can be solved in parallel. If P 0, then d H (x) j = f (x) j /H jj. { 0 if l x u If P(x) =, then else d H (x) j = median{l j x j, f (x) j /H jj, u j x j }. If P is the 1-norm, then d H (x) j = median{( f (x) j λ)/h jj, x j, ( f (x) j + λ)/h jj }.
21 Coord. Gradient Descent Method Stepsize: Armijo rule Choose α to be the largest element of {β k } k=0,1,... satisfying F (x + αd) F(x) σαq H (x; J ) (0 < β < 1, 0 < σ < 1). For the l 1 -regularized linear least squares problem, the minimization rule α arg min{f (x + td) t 0} or the limited minimization rule α arg min{f(x + td) 0 t s}, where 0 < s <, can also be used.
22 Coord. Gradient Descent Method Choose J : Gauss-Seidel rule: J cycles through {1}, {2},..., {n}. Gauss-Southwell-r rule: d D (x; J ) υ d D (x; N ) where 0 < υ 1, D 0 n is diagonal (e.g., D = diag(h)). Gauss-Southwell-q rule: q D (x; J ) υ q D (x; N ), Where 0 < υ 1, D 0 n is diagonal (e.g., D = diag(h)).
23 Coord. Gradient Descent Method Advantage of CGD CGD method is simple, highly parallelizable, and is suited for solving large-scale problems. CGD not only has cheaper iterations than exact coordinate descent, it also has stronger global convergence properties.
24 Convergence Results Global convergence If 0 λi D, H λi, J is chosen by G-Seidel, G-Southwell-r, G-Southwell-q rule, is chosen by Armijo rule, then every cluster point of the x-sequence generated by CGD method is a stationary point of F.
25 Convergence Results Local convergence rate 0 λi D, H λi, If J is chosen by G-Seidel or Gauss-Southwell-q rule, α is chosen by Armijo rule, in addition, if P and f satisfy any of the following assumptions, then the x-sequence generated by CGD method converges at R-linear rate. C1 f is strongly convex, f is Lipschitz cont. on domp. C2 f is (nonconvex) quadratic. P is polyhedral. C3 f (x) = g(ex) + q T x, where E R m N, q R N, g is strongly convex, g is Lipschitz cont. on R m. P is polyhedral. C4 f (x) = max y Y {(Ex) T y g(y)} + q T x, where Y R m is polyhedral, E R m N, q R N, g is strongly convex, g is Lipschitz cont. on R m. P is polyhedral.
26 Convergence Results Complexity Bound If f is convex with Lipschitz cont. grad., then the number of iterations for achieving ɛ-optimality is Gauss-Seidel rule: ( n 2 Lr 0 ) O, ɛ where L is a Lipschitz constant and r 0 = max { dist(x, X ) 2 F(x) F (x 0 ) }. Gauss-Southwell-q rule: ( Lr 0 O {0, υɛ + max Lυ ( )}) e 0 ln r 0, where e 0 = F (x 0 ) min x X F (x).
27 Thank you! II. Incremental Gradient Method
28 Sum of several functions min x F (x) := f (x) + P(x), where c > 0, P : R n (, ] is a proper, convex, lower semicontinuous (lsc) function, and m f (x) := f i (x), where each function f i is real-valued and smooth (i.e., continuously differentiable) on R n. i=1
29 Sum of several functions In applications, m is often large (exceeding 10 4 ). In this case, traditional gradient methods would be inefficient since they require evaluating f i (x) for all i before updating x. Incremental gradient methods (IGM), in contrast, update x after evaluation of f i (x) for only one or a few i. In the unconstrained case P 0, this method has the basic form x k+1 = x k + α k f ik (x k ), k = 0, 1,..., where i k is chosen to cycle through 1,..., m (i.e., i 0 = 1, i 1 = 2,..., i m 1 = m, i m = 1,... ) and α k > 0.
30 Sum of several functions For global convergence of IGMs, stepsize α k (also called learning rate ) needs to diminish to zero, which can lead to slow convergence 3. If a constant stepsize is used, only convergence to an approximate solution can be shown 4. Methods 5 to overcome this difficulty were proposed. However, these methods need additional assumptions such as f i (x) = 0 for all i at a stationary point x to achieve global convergence without the stepsize tending to zero. Moreover, its extension to P 0 is problematic. 3 D. P. Bertsekas, A new class of incremental gradient methods for least squares problems, SIAM J. Optim., 7 (1997), M. V. Solodov, Incremental gradient algorithms with stepsizes bounded away from zero, Comput. Optim. Appl., 11 (1998), P. Tseng, An incremental gradient(-projection) method with momentum term and adaptive stepsize rule, SIAM J. Optim., 8 (1998),
31 Sum of several functions For the case of P 0, Blatt, Hero and Gauchman 6 proposed a method: g k = g k 1 + f ik (x k ) f ik (x k m ), x k+1 = x k αg k, with α > 0, g 1 = m i=1 f i(x i m 1 ), x 0, x 1,..., x m R n given. Computes the gradient of a single component function at each iteration. But instead of updating x using this gradient, it uses the sum of the m most recently computed gradients. This method requires more storage (O(mn) instead of O(n)) and slightly more communication/computation per iteration. 6 D. Blatt, A. O. Hero, and H. Gauchman, A convergent incremental gradient method with a constant step size, SIAM J. Optim., 18 (2007), pp
32 IGM: Constant Stepsize 0. Choose x 0, x 1, domp and α (0, 1]. Initialize k = 0. Go to Choose H k 0 and 0 τ k i k for i = 1,..., m, and compute g k, d k, and x k+1 by g k = m i=1 d k = arg min d R n f i (x τ k i ), x k+1 = x k + αd k. Increment k by 1 and return to 1. τ k i = { g k, d + 1 } 2 d, Hk d + P(x k + d), The method of Blatt et al. corresponds to the special case of P 0, H k = I, K = m 1, and { k if i = (k mod m) + 1; τ k 1 i otherwise, 1 i m, k m,
33 IGM: Constant Stepsize Assumption 1 (a) τi k k K for all i and k, where K 0 is an integer. (b) λi H k λi for all k, where 0 < λ λ. Assumption 2 f i (y) f i (z) L i y z y, z domp, for some L i 0, i = 1,..., m. Let L = m i=1 L i.
34 IGM: Constant Stepsize Global Convergence Let {x k }, {d k }, {H k } be sequences generated by Algorithm 1 under Assumptions 1 and 2, and with α < 2λ/(L(2K + 1)). Then {d k } 0 and every cluster point of {x k } is a stationary point.
35 IGM: Adaptive Stepsize 0. Choose x 0, x 1, domp, β (0, 1), σ > 1 2, and α (0, 1]. Initialize k = 0. Go to Choose H k 0 and 0 τ k i g k = m i=1 d k = arg min d R n f i (x τ k i ), k for i = 1,..., m, compute g k, d k by { g k, d + 1 } 2 d, Hk d + cp(x k + d). Choose α init k [α, 1] and let α k be the largest element of {α init k βj } j=0,1,... satisfying the descent-like condition k 1 F c (x k + α k d k ) F c (x k ) σkl α k d k 2 + L α j d j 2 2 j=(k K ) + and set x k+1 = x k + αd k. Increment k by 1 and return to 1.
36 IGM: Adaptive Stepsize Global Convergence Let {x k }, {d k }, {H k }, {α k } be sequences generated by Algorithm 2 under Assumptions 1 and 2. Then the following results hold. (a) For each k 0, the descent-like condition holds whenever α k ᾱ, where ᾱ = λ L(σK +K /2+1/2). (b) We have α k min{α, βᾱ} for all k. (c) {d k } 0 and every cluster point of {x k } is a stationary point.
37 IGM: 1-memory with Adaptive Stepsize 0. Choose x 0 domp. Initialize k = 0 and g 1 = 0. Go to Choose H k 0 and compute g k, d k, and x k+1 by g k k = k + 1 gk 1 + m k + 1 f i k (x k ) with i k = (k mod m) + 1, { d k = arg min g k, d + 1 } d R n 2 d, Hk d + cp(x k + d), x k+1 = x k + α k d k, with α k (0, 1]. Increment k by 1 and return to 1.
38 IGM: 1-memory with Adaptive Stepsize Assumption 3 (a) α k =. k=0 (b) lim l l j=0 (x 1 = x 0 ). j + 1 l + 1 δ j = 0, where δ j := max x k+i x k+m i=0,1,...,m k=jm 1
39 IGM: 1-memory with Adaptive Stepsize Global Convergence Let {x k }, {d k }, {H k }, {α k } be sequences generated by Algorithm 3 under Assumptions 1(b), 2, and 3. Then the following results hold. (a) { x k+1 x k } 0 and { f (x k ) g k } 0. (b) lim inf k d k = 0. (c) If {x k } is bounded, then there exists a cluster point of {x k } that is a stationary point.
40 Thank you! III. (Linearized) Alternating Direction Method of Multipliers
41 Total Variation Regularized Linear Least Squares Problem Image restorations such as image deconvolution, image inpainting and image denoising is often formulated as an inverse problem unknown true image u R n. b = Au + η, observed image (or measurements) b R l. η is a Gaussian noise. A R l n is a linear operator, typically a convolution operator in deconvolution, a projection in inpainting and the identity in denoising. The unknown images can be recovered by solving TV regularized linear least squares problem (Primal): 1 min x R n 2 Ax b µ x, where µ > 0 and x = n i=1 ( x) i 2 with ( x) i R 2.
42 Gaussian Noise Figure: denoising (a) original: (b) noised image (c) recovered image
43 Gaussian Noise Figure: deblurring (a) original: (b) motion blurred image (c) recovered image
44 Gaussian Noise Figure: inpainting (a) original: (b) scratched image (c) recovered image
45 Algorithms Saddle Point Formulation and Dual Problem 1. Dual Norm: min u max p 1 u, p Au b Convex Conjugate (Legendre-Fenchel transform) of J(u) = u : min sup p, u J (p) + 1 u p 2 Au b 2 2. where J (p) = sup z p, z J(z). 3. Lagrangian Function: max inf L 1(u, p, z). p u,z where L 1 (u, p, z) =: 1 2 Au b z + p, u z
46 Algorithms 4. Dual Problem: max p { inf p, u J (p) + 1 u 2 Au b 2 2 = J (p) H (div p) where H(u) = 1 2 Au b Lagrangian Function with y = div p: max inf L 2(u, p, y). u p,y }. where L 2 (u, p, y) =: J (p) + H (div p) + u, div p y
47 ADMM 7 E. Esser, X. Zhang, and T. Chan, A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science, SIAM J. Imaging Sci., J. Douglas and H. H. Rachford, On the numerical solution of heat conduction problems in two or three space variables, Trans. Amer. Math. Soc., 1956 Alternating Direction Method of Multipliers 7 Augmented Lagrangian Function: L α (u, p, z) =: 1 2 Au b z + p, u z + α 2 u z 2 2 u k+1 z k+1 = arg min u = arg min d p k+1 = p k + α(z k+1 u k+1 ) 1 2 Au b p k, u z k + α 2 u zk 2 2 z + p k, u k+1 z + α 2 uk+1 z 2 2 ADMM on primal (or 3) is equivalent to Douglas-Rachford 8 on dual (4). ADMM on dual (or 5) is equivalent to Douglas-Rachford on primal.
48 ADMM Alternating Minimization Algorithm & Forward-Backward Splitting Method AMA 9 on primal (or 3) u k+1 z k+1 = arg min u 1 2 Au b p k, u z k = arg min d p k+1 = p k + α(z k+1 u k+1 ) z + p k, u k+1 z + α 2 uk+1 z 2 2 FBS 10 on dual (4) u k+1 p k+1 = arg min u = arg min d 1 2 Au b p k, u J (p) p, u k α p pk 2 2 FBS on dual is equivalent to AMA on primal (or 3). 9 P. Tseng, Applications of a splitting algorithm to decomposition in convex programming and variational inequalities, SIAM J. Control and Optimization, P. L. Combettes and V. R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul., 2005
49 Proximal Splitting Method Linearly Constrained Separable Convex Programming Problem: min x,y {f (x) + g(y) Qx = y}. Augmented Lagrangian Function: L α (x, y, z) := f (x) + g(y) + z, y Qx + α 2 Qx y 2 2. x k+1 = arg min L α (x, y k, z k ) u y k+1 = arg min L α (x k+1, y, z k ) z z k+1 = z k + α(qx k+1 y k+1 )
50 Proximal Splitting Method If f (x) = 1 2 Ax b 2 2, x k+1 = (A T A + αq T Q) 1 (A T b + Q T z k + αq T y k ). If f (x) = 1, Ax b, log(ax) Need inner solver or inversion involving Q and/or A. If f (x) = x + be x, 1 Need inner solver.
51 Poisson Noise Figure: deblurring (a) original (b) blurred image (c) recovered image
52 Multiplicative Noise
53 Multiplicative Noise Copyright : Sandia Nat. Lab. (
54 Multiplicative Noise 4M (2 22 ) 20sec.
55 Multiplicative Noise
56 Multiplicative Noise
57 Multiplicative Noise
58 Proximal Splitting Method In order to avoid any inner iterations or inversions involving Laplacian operator required in algorithms based on augment Lagrangian. Alternating minimization algorithm u k+1 = L 0 (x, y k, z k ) z k+1 = L α (x k+1, y, z k ) p k+1 = p k + α(z k+1 u k+1 ) Linearized augmented Lagrangian with proximal function u k+1 = LL α (x, y k, z k ) z k+1 = L α (x k+1, y, z k ) p k+1 = p k + α(z k+1 u k+1 ) LL α (x, x k, y k, z z ) := x f (x k ), x x k + g(y k ) + z k, y k Qx + αq T (Qx k y k ), x x k + 1 2δ x x k 2 2
59 Thank you! Thank you!
Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers
Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)
More informationA General Framework for a Class of Primal-Dual Algorithms for TV Minimization
A General Framework for a Class of Primal-Dual Algorithms for TV Minimization Ernie Esser UCLA 1 Outline A Model Convex Minimization Problem Main Idea Behind the Primal Dual Hybrid Gradient (PDHG) Method
More informationConvex Optimization and l 1 -minimization
Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l
More informationAdaptive Primal Dual Optimization for Image Processing and Learning
Adaptive Primal Dual Optimization for Image Processing and Learning Tom Goldstein Rice University tag7@rice.edu Ernie Esser University of British Columbia eesser@eos.ubc.ca Richard Baraniuk Rice University
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More informationAn accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems
An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems Kim-Chuan Toh Sangwoon Yun March 27, 2009; Revised, Nov 11, 2009 Abstract The affine rank minimization
More informationSparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28
Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:
More informationMath 273a: Optimization Overview of First-Order Optimization Algorithms
Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationSolving DC Programs that Promote Group 1-Sparsity
Solving DC Programs that Promote Group 1-Sparsity Ernie Esser Contains joint work with Xiaoqun Zhang, Yifei Lou and Jack Xin SIAM Conference on Imaging Science Hong Kong Baptist University May 14 2014
More informationAccelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)
Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan
More informationDistributed Optimization via Alternating Direction Method of Multipliers
Distributed Optimization via Alternating Direction Method of Multipliers Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato Stanford University ITMANET, Stanford, January 2011 Outline precursors dual decomposition
More informationA Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices
A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22
More informationCoordinate Descent and Ascent Methods
Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:
More informationARock: an algorithmic framework for asynchronous parallel coordinate updates
ARock: an algorithmic framework for asynchronous parallel coordinate updates Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin ( UCLA Math, U.Waterloo DCO) UCLA CAM Report 15-37 ShanghaiTech SSDS 15 June 25,
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationCoordinate Update Algorithm Short Course Operator Splitting
Coordinate Update Algorithm Short Course Operator Splitting Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 25 Operator splitting pipeline 1. Formulate a problem as 0 A(x) + B(x) with monotone operators
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationSparse Optimization Lecture: Dual Methods, Part I
Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration
More informationSparse and Regularized Optimization
Sparse and Regularized Optimization In many applications, we seek not an exact minimizer of the underlying objective, but rather an approximate minimizer that satisfies certain desirable properties: sparsity
More informationOptimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method
Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Davood Hajinezhad Iowa State University Davood Hajinezhad Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method 1 / 35 Co-Authors
More informationRecent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables
Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop
More informationA Coordinate Gradient Descent Method for l 1 -regularized Convex Minimization
A Coordinate Gradient Descent Method for l 1 -regularized Convex Minimization Sangwoon Yun, and Kim-Chuan Toh January 30, 2009 Abstract In applications such as signal processing and statistics, many problems
More informationMachine Learning. Support Vector Machines. Fabio Vandin November 20, 2017
Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training
More informationAccelerated primal-dual methods for linearly constrained convex problems
Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize
More informationNumerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen
Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen
More informationSOR- and Jacobi-type Iterative Methods for Solving l 1 -l 2 Problems by Way of Fenchel Duality 1
SOR- and Jacobi-type Iterative Methods for Solving l 1 -l 2 Problems by Way of Fenchel Duality 1 Masao Fukushima 2 July 17 2010; revised February 4 2011 Abstract We present an SOR-type algorithm and a
More informationA Coordinate Gradient Descent Method for Nonsmooth Nonseparable Minimization
A Coordinate Gradient Descent Method for Nonsmooth Nonseparable Minimization Zheng-Jian Bai Michael K. Ng Liqun Qi September 3, 2009 Abstract This paper presents a coordinate gradient descent approach
More informationSplitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches
Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Patrick L. Combettes joint work with J.-C. Pesquet) Laboratoire Jacques-Louis Lions Faculté de Mathématiques
More informationConvex Optimization Algorithms for Machine Learning in 10 Slides
Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,
More informationMinimizing the Difference of L 1 and L 2 Norms with Applications
1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationA GENERAL FRAMEWORK FOR A CLASS OF FIRST ORDER PRIMAL-DUAL ALGORITHMS FOR TV MINIMIZATION
A GENERAL FRAMEWORK FOR A CLASS OF FIRST ORDER PRIMAL-DUAL ALGORITHMS FOR TV MINIMIZATION ERNIE ESSER XIAOQUN ZHANG TONY CHAN Abstract. We generalize the primal-dual hybrid gradient (PDHG) algorithm proposed
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationInexact Alternating Direction Method of Multipliers for Separable Convex Optimization
Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization Hongchao Zhang hozhang@math.lsu.edu Department of Mathematics Center for Computation and Technology Louisiana State
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationA GENERAL FRAMEWORK FOR A CLASS OF FIRST ORDER PRIMAL-DUAL ALGORITHMS FOR CONVEX OPTIMIZATION IN IMAGING SCIENCE
A GENERAL FRAMEWORK FOR A CLASS OF FIRST ORDER PRIMAL-DUAL ALGORITHMS FOR CONVEX OPTIMIZATION IN IMAGING SCIENCE ERNIE ESSER XIAOQUN ZHANG TONY CHAN Abstract. We generalize the primal-dual hybrid gradient
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More informationA Coordinate Gradient Descent Method for l 1 -regularized Convex Minimization
A Coordinate Gradient Descent Method for l 1 -regularized Convex Minimization Sangwoon Yun, and Kim-Chuan Toh April 16, 2008 Abstract In applications such as signal processing and statistics, many problems
More informationAccelerated Proximal Gradient Methods for Convex Optimization
Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS
More informationYou should be able to...
Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More informationMachine Learning Support Vector Machines. Prof. Matteo Matteucci
Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationOptimization for Machine Learning
Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html
More informationSEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS
SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS XIANTAO XIAO, YONGFENG LI, ZAIWEN WEN, AND LIWEI ZHANG Abstract. The goal of this paper is to study approaches to bridge the gap between
More informationVariational Image Restoration
Variational Image Restoration Yuling Jiao yljiaostatistics@znufe.edu.cn School of and Statistics and Mathematics ZNUFE Dec 30, 2014 Outline 1 1 Classical Variational Restoration Models and Algorithms 1.1
More informationAbout Split Proximal Algorithms for the Q-Lasso
Thai Journal of Mathematics Volume 5 (207) Number : 7 http://thaijmath.in.cmu.ac.th ISSN 686-0209 About Split Proximal Algorithms for the Q-Lasso Abdellatif Moudafi Aix Marseille Université, CNRS-L.S.I.S
More informationDual and primal-dual methods
ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationFrist order optimization methods for sparse inverse covariance selection
Frist order optimization methods for sparse inverse covariance selection Katya Scheinberg Lehigh University ISE Department (joint work with D. Goldfarb, Sh. Ma, I. Rish) Introduction l l l l l l The field
More informationProximal methods. S. Villa. October 7, 2014
Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem
More informationEE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)
EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement to the material discussed in
More informationMethods for Unconstrained Optimization Numerical Optimization Lectures 1-2
Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationCoordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /
Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods
More informationStochastic Optimization Algorithms Beyond SG
Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationA memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration
A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration E. Chouzenoux, A. Jezierska, J.-C. Pesquet and H. Talbot Université Paris-Est Lab. d Informatique Gaspard
More informationSelf-dual Smooth Approximations of Convex Functions via the Proximal Average
Chapter Self-dual Smooth Approximations of Convex Functions via the Proximal Average Heinz H. Bauschke, Sarah M. Moffat, and Xianfu Wang Abstract The proximal average of two convex functions has proven
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationSparse Covariance Selection using Semidefinite Programming
Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support
More informationADMM and Fast Gradient Methods for Distributed Optimization
ADMM and Fast Gradient Methods for Distributed Optimization João Xavier Instituto Sistemas e Robótica (ISR), Instituto Superior Técnico (IST) European Control Conference, ECC 13 July 16, 013 Joint work
More informationImage restoration: numerical optimisation
Image restoration: numerical optimisation Short and partial presentation Jean-François Giovannelli Groupe Signal Image Laboratoire de l Intégration du Matériau au Système Univ. Bordeaux CNRS BINP / 6 Context
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationCS6375: Machine Learning Gautam Kunapuli. Support Vector Machines
Gautam Kunapuli Example: Text Categorization Example: Develop a model to classify news stories into various categories based on their content. sports politics Use the bag-of-words representation for this
More informationOptimization for Learning and Big Data
Optimization for Learning and Big Data Donald Goldfarb Department of IEOR Columbia University Department of Mathematics Distinguished Lecture Series May 17-19, 2016. Lecture 1. First-Order Methods for
More informationCS295: Convex Optimization. Xiaohui Xie Department of Computer Science University of California, Irvine
CS295: Convex Optimization Xiaohui Xie Department of Computer Science University of California, Irvine Course information Prerequisites: multivariate calculus and linear algebra Textbook: Convex Optimization
More informationHomework 5. Convex Optimization /36-725
Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationCoordinate descent methods
Coordinate descent methods Master Mathematics for data science and big data Olivier Fercoq November 3, 05 Contents Exact coordinate descent Coordinate gradient descent 3 3 Proximal coordinate descent 5
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationA derivative-free nonmonotone line search and its application to the spectral residual method
IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationSparse Gaussian conditional random fields
Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More informationConvex Optimization and Support Vector Machine
Convex Optimization and Support Vector Machine Problem 0. Consider a two-class classification problem. The training data is L n = {(x 1, t 1 ),..., (x n, t n )}, where each t i { 1, 1} and x i R p. We
More informationLecture 3. Optimization Problems and Iterative Algorithms
Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex
More informationACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING
ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely
More informationConjugate-Gradient. Learn about the Conjugate-Gradient Algorithm and its Uses. Descent Algorithms and the Conjugate-Gradient Method. Qx = b.
Lab 1 Conjugate-Gradient Lab Objective: Learn about the Conjugate-Gradient Algorithm and its Uses Descent Algorithms and the Conjugate-Gradient Method There are many possibilities for solving a linear
More informationConvex Optimization M2
Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial
More information2 Regularized Image Reconstruction for Compressive Imaging and Beyond
EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationInvestigating the Influence of Box-Constraints on the Solution of a Total Variation Model via an Efficient Primal-Dual Method
Article Investigating the Influence of Box-Constraints on the Solution of a Total Variation Model via an Efficient Primal-Dual Method Andreas Langer Department of Mathematics, University of Stuttgart,
More informationLarge-scale Stochastic Optimization
Large-scale Stochastic Optimization 11-741/641/441 (Spring 2016) Hanxiao Liu hanxiaol@cs.cmu.edu March 24, 2016 1 / 22 Outline 1. Gradient Descent (GD) 2. Stochastic Gradient Descent (SGD) Formulation
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS
ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS WEI DENG AND WOTAO YIN Abstract. The formulation min x,y f(x) + g(y) subject to Ax + By = b arises in
More informationSparsity Regularization
Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation
More informationMax Margin-Classifier
Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization
More informationKernel Methods. Machine Learning A W VO
Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance
More informationMS&E 318 (CME 338) Large-Scale Numerical Optimization
Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods
More informationA New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction
A New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with H. Bauschke and J. Bolte Optimization
More informationminimize x subject to (x 2)(x 4) u,
Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for
More informationBeyond Heuristics: Applying Alternating Direction Method of Multipliers in Nonconvex Territory
Beyond Heuristics: Applying Alternating Direction Method of Multipliers in Nonconvex Territory Xin Liu(4Ð) State Key Laboratory of Scientific and Engineering Computing Institute of Computational Mathematics
More informationLecture Notes on Support Vector Machine
Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More informationNewton s Method. Javier Peña Convex Optimization /36-725
Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and
More information