Linearized Alternating Direction Method: Two Blocks and Multiple Blocks. Zhouchen Lin (林宙辰), Peking University


1 Linearized Alternating Direction Method: Two Blocks and Multiple Blocks. Zhouchen Lin (林宙辰), Peking University. Dec. 3, 2014.

2 Outline. Alternating Direction Method (ADM); Linearized Alternating Direction Method (LADM), Two Blocks; LADM, Multiple Blocks; Proximal LADM, Multiple Blocks; Conclusions.
References: Lin et al., Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation, NIPS 2011. Lin et al., Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015.

3 Background. Optimization is everywhere:
Compressed Sensing: min_x ||x||_1, s.t. Ax = b.
RPCA with Missing Values: min_{A,E} ||A||_* + λ||E||_1, s.t. π_Ω(A + E) = d.
LASSO: min_x ||Ax - b||^2, s.t. ||x||_1 ≤ ε.
Image Restoration: min_x ||Ax - b||^2 + λ||∇x||_1, s.t. 0 ≤ x ≤ 255.
Covariance Selection: min_X tr(ΣX) - log det(X) + ρ e^T|X|e, s.t. X ∈ S_n, where S_n = {X ⪰ 0 : αI ⪯ X ⪯ α_max I}.
Pose Estimation: min_Q tr(WQ), s.t. tr(A_i Q) = 0, i = 1, ..., m; Q ⪰ 0; rank(Q) ≤ 1.

4 Alternating Direction Method (ADM). Model problem:
min_{x_1,x_2} f_1(x_1) + f_2(x_2), s.t. A_1(x_1) + A_2(x_2) = b,
where the f_i are convex functions and the A_i are linear mappings.
Augmented Lagrangian function:
L~(x_1, x_2, λ) = f_1(x_1) + f_2(x_2) + ⟨λ, A_1(x_1) + A_2(x_2) - b⟩ + (β/2)||A_1(x_1) + A_2(x_2) - b||_F^2.
ADM iterations:
x_1^{k+1} = argmin_{x_1} L~(x_1, x_2^k, λ_k);
x_2^{k+1} = argmin_{x_2} L~(x_1^{k+1}, x_2, λ_k);
λ_{k+1} = λ_k + β_k [A_1(x_1^{k+1}) + A_2(x_2^{k+1}) - b]   (update of λ_k).
Assumption: each subproblem is easy to solve.
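
To fix notation, here is a minimal sketch of the ADM loop, assuming the two subproblem minimizers of the augmented Lagrangian are supplied as callables and the penalty is kept fixed for simplicity (all names are illustrative, not from the cited papers):

```python
import numpy as np

def adm(solve_x1, solve_x2, A1, A2, b, x1, x2, beta=1.0, n_iter=100):
    """Alternating Direction Method for
         min f1(x1) + f2(x2)  s.t.  A1(x1) + A2(x2) = b.
    solve_x1(x2, lam, beta) and solve_x2(x1, lam, beta) are assumed to
    return the minimizers of the augmented Lagrangian over x1 and x2."""
    lam = np.zeros_like(b)
    for _ in range(n_iter):
        x1 = solve_x1(x2, lam, beta)                  # minimize over x1
        x2 = solve_x2(x1, lam, beta)                  # minimize over x2
        lam = lam + beta * (A1(x1) + A2(x2) - b)      # dual ascent step
    return x1, x2, lam
```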

5 Linearized Alternating Direction Method (LADM). The ADM subproblems are
x_1^{k+1} = argmin_{x_1} f_1(x_1) + (β_k/2)||A_1(x_1) + A_2(x_2^k) - b + λ_k/β_k||^2,
x_2^{k+1} = argmin_{x_2} f_2(x_2) + (β_k/2)||A_1(x_1^{k+1}) + A_2(x_2) - b + λ_k/β_k||^2.
They are easy when they reduce to proximal operations of the form
min_x f_1(x) + (σ/2)||x - w||_F^2 and min_x f_2(x) + (σ/2)||x - w||_F^2.
Examples:
argmin_x ε||x||_1 + (1/2)||x - w||^2 = T_ε(w), where T_ε(x) = sgn(x) max(|x| - ε, 0) (soft thresholding);
argmin_X ε||X||_* + (1/2)||X - W||_F^2 = Θ_ε(W) = U T_ε(S) V^T, where W = U S V^T is the singular value decomposition (SVD) of W (singular value thresholding).
Lin et al., Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation, NIPS 2011.
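
The two thresholding operators can be implemented in a few lines. A minimal numpy sketch (illustrative code, not from the cited paper; function names are ours):

```python
import numpy as np

def soft_threshold(w, eps):
    # Entrywise soft thresholding: argmin_x eps*||x||_1 + 0.5*||x - w||^2
    return np.sign(w) * np.maximum(np.abs(w) - eps, 0.0)

def singular_value_threshold(W, eps):
    # Singular value thresholding: argmin_X eps*||X||_* + 0.5*||X - W||_F^2
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - eps, 0.0)) @ Vt
```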

6 Linearized Alternating Direction Method (LADM). An alternative: introduce auxiliary variables,
min_{x_1,x_2,x_3,x_4} f_1(x_1) + f_2(x_2), s.t. x_1 = x_3, x_2 = x_4, A_1(x_3) + A_2(x_4) = b,
L~(x_1, x_2, x_3, x_4, λ_1, λ_2, λ_3) = f_1(x_1) + f_2(x_2) + ⟨λ_1, x_1 - x_3⟩ + ⟨λ_2, x_2 - x_4⟩ + ⟨λ_3, A_1(x_3) + A_2(x_4) - b⟩ + (β/2)(||x_1 - x_3||_F^2 + ||x_2 - x_4||_F^2 + ||A_1(x_3) + A_2(x_4) - b||_F^2).
Three drawbacks: 1. More blocks, hence more memory and slower convergence. 2. Matrix inversion is expensive. 3. Convergence is NOT guaranteed!
Lin et al., Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation, NIPS 2011.

7 Linearized Alternating Direction Method (LADM). ADM subproblems:
x_1^{k+1} = argmin_{x_1} f_1(x_1) + (β_k/2)||A_1(x_1) + A_2(x_2^k) - b + λ_k/β_k||^2,
x_2^{k+1} = argmin_{x_2} f_2(x_2) + (β_k/2)||A_1(x_1^{k+1}) + A_2(x_2) - b + λ_k/β_k||^2.
Linearize the quadratic term:
x_1^{k+1} = argmin_{x_1} f_1(x_1) + ⟨A_1^*(λ_k) + β_k A_1^*(A_1(x_1^k) + A_2(x_2^k) - b), x_1 - x_1^k⟩ + (β_k η_1/2)||x_1 - x_1^k||^2
          = argmin_{x_1} f_1(x_1) + (β_k η_1/2)||x_1 - x_1^k + A_1^*(λ_k + β_k(A_1(x_1^k) + A_2(x_2^k) - b))/(β_k η_1)||^2,
x_2^{k+1} = argmin_{x_2} f_2(x_2) + (β_k η_2/2)||x_2 - x_2^k + A_2^*(λ_k + β_k(A_1(x_1^{k+1}) + A_2(x_2^k) - b))/(β_k η_2)||^2.
Here A^* is the adjoint operator: ⟨A(x), y⟩ = ⟨x, A^*(y)⟩ for all x, y. Each subproblem is now a proximal operation of f_1 or f_2, of the form min_x f_i(x) + (σ/2)||x - w||_F^2.
Lin et al., Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation, NIPS 2011.
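
Putting the two linearized subproblems together, one LADM iteration can be sketched as follows, assuming prox_f1/prox_f2 are the proximal operators of f_1/f_2 (e.g., the thresholdings above) and A1T/A2T are the adjoints; all names are illustrative:

```python
import numpy as np

def ladm_step(x1, x2, lam, beta, eta1, eta2,
              A1, A2, A1T, A2T, b, prox_f1, prox_f2):
    # Linearized update of x1: proximal step with step size 1/(beta*eta1)
    u1 = x1 - A1T(lam + beta * (A1(x1) + A2(x2) - b)) / (beta * eta1)
    x1 = prox_f1(u1, 1.0 / (beta * eta1))
    # Linearized update of x2, using the new x1
    u2 = x2 - A2T(lam + beta * (A1(x1) + A2(x2) - b)) / (beta * eta2)
    x2 = prox_f2(u2, 1.0 / (beta * eta2))
    # Dual update
    lam = lam + beta * (A1(x1) + A2(x2) - b)
    return x1, x2, lam
```

Here prox_f(u, t) is understood as argmin_x f(x) + (1/(2t))||x - u||^2, so for f = ||.||_1 it is soft_threshold(u, t).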

8 LADM with Adaptive Penalty (LADMAP). Theorem: If {β_k} is non-decreasing and upper bounded and η_i > ||A_i||^2, i = 1, 2, then the sequence {(x_1^k, x_2^k, λ_k)} converges to a KKT point of the model problem.
Lin et al., Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation, NIPS 2011.

9 LADM with Adaptive Penalty (LADMAP). Where does the adaptive penalty come from? The updates
x_1^{k+1} = argmin_{x_1} f_1(x_1) + (β_k η_1/2)||x_1 - x_1^k + A_1^*(λ_k + β_k(A_1(x_1^k) + A_2(x_2^k) - b))/(β_k η_1)||^2,
x_2^{k+1} = argmin_{x_2} f_2(x_2) + (β_k η_2/2)||x_2 - x_2^k + A_2^*(λ_k + β_k(A_1(x_1^{k+1}) + A_2(x_2^k) - b))/(β_k η_2)||^2
imply the optimality conditions
-β_k η_1 (x_1^{k+1} - x_1^k) - A_1^*(λ_k + β_k(A_1(x_1^k) + A_2(x_2^k) - b)) ∈ ∂f_1(x_1^{k+1}),
-β_k η_2 (x_2^{k+1} - x_2^k) - A_2^*(λ_k + β_k(A_1(x_1^{k+1}) + A_2(x_2^k) - b)) ∈ ∂f_2(x_2^{k+1}).
KKT condition: there exists (x_1^*, x_2^*, λ^*) such that
A_1(x_1^*) + A_2(x_2^*) - b = 0, -A_1^*(λ^*) ∈ ∂f_1(x_1^*), -A_2^*(λ^*) ∈ ∂f_2(x_2^*).
Lin et al., Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation, NIPS 2011.

10 LADM with Adaptive Penalty (LADMAP). Both β_k η_1 ||x_1^{k+1} - x_1^k||/||A_1^*(b)|| and β_k η_2 ||x_2^{k+1} - x_2^k||/||A_2^*(b)|| should be small. Since η_i ≥ ||A_i||^2, approximate ||A_i^*(b)|| by √η_i ||b||.
Adaptive penalty:
β_{k+1} = min(β_max, ρ β_k), where
ρ = ρ_0 if β_k max(√η_1 ||x_1^{k+1} - x_1^k||_F, √η_2 ||x_2^{k+1} - x_2^k||_F)/||b||_F < ε_2, and ρ = 1 otherwise,
where ρ_0 ≥ 1 is a constant.
Loop until both
||A_1(x_1^{k+1}) + A_2(x_2^{k+1}) - b||_F/||b||_F < ε_1 and
β_k max(√η_1 ||x_1^{k+1} - x_1^k||_F, √η_2 ||x_2^{k+1} - x_2^k||_F)/||b||_F < ε_2.
Lin et al., Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation, NIPS 2011.
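
A sketch of the adaptive penalty rule and the two stopping criteria, with the same illustrative naming as above:

```python
import numpy as np

def update_beta(beta, beta_max, rho0, eps2, dx1, dx2, eta1, eta2, b_norm):
    # beta_{k+1} = min(beta_max, rho * beta_k), where rho = rho0 only when
    # the (scaled) change of the variables is already small enough.
    crit = beta * max(np.sqrt(eta1) * np.linalg.norm(dx1),
                      np.sqrt(eta2) * np.linalg.norm(dx2)) / b_norm
    rho = rho0 if crit < eps2 else 1.0
    return min(beta_max, rho * beta), crit

def converged(residual_norm, crit, b_norm, eps1, eps2):
    # Stop when both the feasibility residual and the variable change are small.
    return residual_norm / b_norm < eps1 and crit < eps2
```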

11 LADM with Adaptive Penalty (LADMAP). Choice of parameters:
1. β_0 = α ε_2, where α is proportional to the size of b. β_0 should not be too large, so that β_k increases in the first few iterations.
2. ρ_0 ≥ 1 should be chosen such that β_k increases steadily (but not necessarily at every iteration).
Lin et al., Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation, NIPS 2011.

12 LADM with Adaptive Penalty (LADMAP). An example (Low-Rank Representation, LRR):
min_{Z,E} ||Z||_* + μ||E||_1, s.t. X = XZ + E.
Here A_1(Z) = XZ, A_2(E) = E, the adjoints are A_1^*(.) = X^T(.), A_2^*(.) = (.), and η_1 = ||X||^2, η_2 = 1.
Lin et al., Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation, NIPS 2011.
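
For LRR these ingredients can be written down directly. A small sketch assuming the data matrix X is given as a numpy array (names illustrative):

```python
import numpy as np

X = np.random.randn(50, 100)          # data matrix (columns are samples)
b = X                                 # constraint: XZ + E = X

A1  = lambda Z: X @ Z                 # A1(Z) = XZ
A2  = lambda E: E                     # A2(E) = E
A1T = lambda L: X.T @ L               # adjoint A1*(.) = X^T(.)
A2T = lambda L: L                     # adjoint A2*(.) = identity

eta1 = np.linalg.norm(X, 2) ** 2      # eta1 = ||X||_2^2 (squared spectral norm)
eta2 = 1.0                            # eta2 = 1
```

The proximal operators of ||Z||_* and μ||E||_1 are the singular value and soft thresholdings from slide 5.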

13 Experiment. Table 1: Comparison among APG, ADM, LADM and LADMAP on synthetic data. For each quadruple (s, p, d, r~), the LRR problem with μ = 0.1 was solved for the same data using the different algorithms. Reported are typical running time (in 10^3 seconds), iteration number, relative errors ||Z^ - Z_0||/||Z_0|| and ||E^ - E_0||/||E_0|| (%) of the output solution (Z^, E^), and the clustering accuracy (%). Problem sizes: (10, 20, 200, 5), (15, 20, 300, 5), (20, 25, 500, 5), (30, 30, 900, 5). (Numerical entries are not reproduced here; for the largest size the LADM entries were N.A.)

14 LADM with Parallel Splitting and Adaptive Penalty (LADMPSAP). Model problem:
min_{x_1,...,x_n} Σ_{i=1}^n f_i(x_i), s.t. Σ_{i=1}^n A_i(x_i) = b.
Example (noisy nonnegative matrix completion), rewritten step by step into this form:
min_X ||X||_* + (1/(2μ))||b - P_Ω(X)||^2, s.t. X ≥ 0;
min_{X,e} ||X||_* + (1/(2μ))||e||^2, s.t. b = P_Ω(X) + e, X ≥ 0;
min_{X,Y,e} ||X||_* + (1/(2μ))||e||^2, s.t. b = P_Ω(Y) + e, X = Y, Y ≥ 0;
min_{X,Y,e} ||X||_* + (1/(2μ))||e||^2 + χ_{Y≥0}(Y), s.t. b = P_Ω(Y) + e, X = Y.
Lin et al., Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015.
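
The extra blocks introduced by the reformulation have cheap proximal operators: the prox of the indicator χ_{Y≥0} is the projection onto the nonnegative orthant, and the prox of (1/(2μ))||e||^2 is a simple scaling. A small illustrative sketch:

```python
import numpy as np

def prox_nonneg_indicator(W, t):
    # prox of chi_{Y >= 0} is the projection onto {Y >= 0},
    # independent of the step size t.
    return np.maximum(W, 0.0)

def prox_squared_l2(w, t, mu):
    # prox of (1/(2*mu))*||e||^2:
    #   argmin_e (1/(2*mu))*||e||^2 + (1/(2*t))*||e - w||^2 = w / (1 + t/mu)
    return w / (1.0 + t / mu)
```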

15 LADM with Parallel Splitting and Adaptive Penalty (LADMPSAP). Can we naively generalize two-block LADMAP to multi-block problems? No! The naive generalization of LADMAP may diverge, e.g., when applied to the following problem with n ≥ 5:
min_{x_1,...,x_n} Σ_{i=1}^n ||x_i||_1, s.t. Σ_{i=1}^n A_i x_i = b.
Lin et al., Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015. C. Chen et al., The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent. Preprint.

16 LADM with Parallel Splitting and Adaptive Penalty (LADMPSAP). The iterations:
x_i^{k+1} = argmin_{x_i} f_i(x_i) + (β_k η_i/2)||x_i - x_i^k + A_i^*(λ_k + β_k(Σ_{j=1}^n A_j(x_j^k) - b))/(β_k η_i)||^2, i = 1, ..., n   (parallel!);
λ_{k+1} = λ_k + β_k (Σ_{i=1}^n A_i(x_i^{k+1}) - b);
β_{k+1} = min(β_max, ρ β_k), where
ρ = ρ_0 if β_k max{√η_i ||x_i^{k+1} - x_i^k||, i = 1, ..., n}/||b|| < ε_2, and ρ = 1 otherwise,
with ρ_0 > 1 a constant and 0 < ε_2 ≪ 1 a threshold.
Lin et al., Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015.
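
One LADMPSAP iteration can be sketched as below, assuming lists of linear maps A[i], adjoints AT[i], proximal operators prox[i] and parameters eta[i] (names illustrative). Every block uses only the previous iterate, so the inner loop could run in parallel:

```python
import numpy as np

def ladmpsap_step(xs, lam, beta, A, AT, prox, eta, b):
    n = len(xs)
    residual = sum(A[i](xs[i]) for i in range(n)) - b
    lam_hat = lam + beta * residual                  # predicted multiplier
    new_xs = []
    for i in range(n):                               # embarrassingly parallel
        u = xs[i] - AT[i](lam_hat) / (beta * eta[i])
        new_xs.append(prox[i](u, 1.0 / (beta * eta[i])))
    new_residual = sum(A[i](new_xs[i]) for i in range(n)) - b
    new_lam = lam + beta * new_residual              # dual update
    return new_xs, new_lam
```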

17 LADM with Parallel Splitting and Adaptive Penalty (LADMPSAP). Theorem: If {β_k} is non-decreasing and upper bounded and η_i > n||A_i||^2, i = 1, ..., n, then {({x_i^k}, λ_k)} generated by LADMPSAP converges to a KKT point of the problem.
Remark: When n = 2, LADMPSAP is weaker than LADMAP: η_i > 2||A_i||^2 vs. η_i > ||A_i||^2.
Related work: He & Yuan, Linearized Alternating Direction Method with Gaussian Back Substitution for Separable Convex Programming, preprint.
Lin et al., Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015.

18 LADM with Parallel Splitting and Adaptive Penalty (LADMPSAP). Model problem with convex set constraints:
min_{x_1,...,x_n} Σ_{i=1}^n f_i(x_i), s.t. Σ_{i=1}^n A_i(x_i) = b, x_i ∈ X_i, i = 1, ..., n,
where each X_i ⊆ R^{d_i} is a closed convex set. Reformulate with auxiliary variables x_{n+1}, ..., x_{2n}:
min_{x_1,...,x_{2n}} Σ_{i=1}^n f_i(x_i) + Σ_{i=n+1}^{2n} χ_{X_{i-n}}(x_i), s.t. Σ_{i=1}^n A_i(x_i) = b, x_i = x_{n+i}, i = 1, ..., n.
Theorem: If {β_k} is non-decreasing and upper bounded, x_{n+1}, ..., x_{2n} are the auxiliary variables above, and η_i > n(||A_i||^2 + 1), η_{n+i} > n, i = 1, ..., n, then {({x_i^k}, λ_k)} generated by LADMPSAP converges to a KKT point of the problem.
Lin et al., Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015.

19 Experiment. Noisy nonnegative matrix completion, original and reformulated models:
min_X ||X||_* + (1/(2μ))||b - P_Ω(X)||^2, s.t. X ≥ 0;
min_{X,Y,e} ||X||_* + (1/(2μ))||e||^2 + χ_{Y≥0}(Y), s.t. b = P_Ω(Y) + e, X = Y.
Lin et al., Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015.

20 Experiment. Table 1: Numerical comparison on the NMC problem with synthetic data, averaged over 10 runs. q, t and d_r denote, respectively, the sample ratio, the number of measurements t = q(mn), and the "degree of freedom" d_r = r(m + n - r) for a matrix with rank r. Here m = n and r = 10 in all tests; LADM and LADMPSAP are compared on #Iter., Time(s), RelErr and FA for several (n, q, t/d_r). Table 2: Numerical comparison on the image inpainting problem (#Iter., Time(s), PSNR, FA for FPCA, LADM and LADMPSAP); LADMPSAP attained FA = 0, versus 9.41E-4 for FPCA and 2.9E-3 for LADM. (Remaining numerical entries are not reproduced here.)
Lin et al., Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015.

21 LADM with Parallel Splitting and Adaptive Penalty (LADMPSAP). Enhanced convergence results:
Theorem 1: If {β_k} is non-decreasing with Σ_{k=1}^{+∞} β_k^{-1} = +∞, η_i > n||A_i||^2, and ∂f_i(x) is bounded, i = 1, ..., n, then the sequence {x_i^k} generated by LADMPSAP converges to an optimal solution of the model problem.
Theorem 2: If {β_k} is non-decreasing, η_i > n||A_i||^2, and ∂f_i(x) is bounded, i = 1, ..., n, then Σ_{k=1}^{+∞} β_k^{-1} = +∞ is also a necessary condition for the global convergence of {x_i^k} generated by LADMPSAP to an optimal solution of the model problem.
Consequently, when all the subgradients of the component objective functions are bounded, the upper bound β_max can be removed.
Lin et al., Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015.

22 LADM with Parallel Splitting and Adaptive Penalty (LADMPSAP). Define x = (x_1^T, ..., x_n^T)^T, x^* = ((x_1^*)^T, ..., (x_n^*)^T)^T and f(x) = Σ_{i=1}^n f_i(x_i), where (x_1^*, ..., x_n^*, λ^*) is a KKT point of the model problem.
Proposition: x~ is an optimal solution to the model problem iff there exists α > 0 such that
f(x~) - f(x^*) + Σ_{i=1}^n ⟨A_i^*(λ^*), x~_i - x_i^*⟩ + α ||Σ_{i=1}^n A_i(x~_i) - b||^2 = 0.
Our criterion for checking the optimality of a solution is much simpler than that of He & Yuan 2011, which has to compare with all (x_1, ..., x_n, λ) ∈ R^{d_1} × ... × R^{d_n} × R^m.
Lin et al., Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015. B. S. He and X. Yuan, On the O(1/t) convergence rate of alternating direction method. Preprint, 2011.

23 LADM with Parallel Splitting and Adaptive Penalty (LADMPSAP). Theorem 3 (convergence rate, in the ergodic sense): Define x̄^K = Σ_{k=0}^K γ_k x^{k+1}, where γ_k = β_k^{-1} / Σ_{j=0}^K β_j^{-1}. Then
f(x̄^K) - f(x^*) + Σ_{i=1}^n ⟨A_i^*(λ^*), x̄_i^K - x_i^*⟩ + α ||Σ_{i=1}^n A_i(x̄_i^K) - b||^2 ≤ C_0 / (2 Σ_{k=0}^K β_k^{-1}),   (1)
where α depends only on n, ||A_i|| and η_i (with α^{-1} proportional to (n+1) max{1, ||A_i||^2/(η_i - n||A_i||^2), i = 1, ..., n}), and C_0 collects the initialization terms Σ_{i=1}^n η_i ||x_i^0 - x_i^*||^2 plus a term depending on the initial multiplier.
A much simpler proof of the convergence rate (in the ergodic sense)!
Lin et al., Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015. B. S. He and X. Yuan, On the O(1/t) convergence rate of alternating direction method. Preprint, 2011.

24 Proximal LADMPSAP. Even more general problem:
min_{x_1,...,x_n} Σ_{i=1}^n f_i(x_i), s.t. Σ_{i=1}^n A_i(x_i) = b, with f_i(x_i) = g_i(x_i) + h_i(x_i),
where both g_i and h_i are convex, g_i is C^{1,1}, i.e., ||∇g_i(x) - ∇g_i(y)|| ≤ L_i ||x - y|| for all x, y ∈ R^{d_i}, and h_i may be non-differentiable but its proximal operation is easily solvable.

25 Proximal LADMPSAP. Linearize the augmented term to obtain
x_i^{k+1} = argmin_{x_i} h_i(x_i) + g_i(x_i) + (σ_i^{(k)}/2)||x_i - x_i^k + A_i^*(λ̂_k)/σ_i^{(k)}||^2, i = 1, ..., n,
where λ̂_k = λ_k + β_k(Σ_{j=1}^n A_j(x_j^k) - b). Further linearize g_i:
x_i^{k+1} = argmin_{x_i} h_i(x_i) + g_i(x_i^k) + ⟨∇g_i(x_i^k) + A_i^*(λ̂_k), x_i - x_i^k⟩ + (σ_i^{(k)}/2)||x_i - x_i^k||^2
          = argmin_{x_i} h_i(x_i) + (σ_i^{(k)}/2)||x_i - x_i^k + [A_i^*(λ̂_k) + ∇g_i(x_i^k)]/σ_i^{(k)}||^2.
Convergence condition: σ_i^{(k)} = τ_i + β_k η_i, where τ_i ≥ L_i and η_i > n||A_i||^2 are both positive constants.
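
The resulting update for block i reduces to one proximal step of h_i. A sketch, assuming grad_g_i computes ∇g_i and prox_h_i(u, t) solves min_x h_i(x) + (1/(2t))||x - u||^2 (names illustrative):

```python
import numpy as np

def proximal_ladmpsap_block(x_i, lam_hat, beta, tau_i, eta_i,
                            AT_i, grad_g_i, prox_h_i):
    # sigma_i^(k) = tau_i + beta_k * eta_i  (tau_i >= L_i, eta_i > n*||A_i||^2)
    sigma = tau_i + beta * eta_i
    # Linearize both the augmented term and g_i around x_i^k:
    u = x_i - (AT_i(lam_hat) + grad_g_i(x_i)) / sigma
    return prox_h_i(u, 1.0 / sigma)
```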

26 Experiment: Group Sparse Logistic Regression with Overlap.
min_{w,b} (1/s) Σ_{i=1}^s log(1 + exp(-y_i (w^T x_i + b))) + μ Σ_{j=1}^t ||S_j w||,   (1)
where x_i and y_i, i = 1, ..., s, are the training data and labels, respectively, and w and b parameterize the linear classifier. The S_j, j = 1, ..., t, are selection matrices, each row of which has a single entry equal to 1 and all other entries 0; the groups of entries S_j w, j = 1, ..., t, may overlap each other.
Introducing w̄ = (w^T, b)^T, x̄_i = (x_i^T, 1)^T, z = (z_1^T, z_2^T, ..., z_t^T)^T, and S̄ = (S, 0), where S = (S_1^T, ..., S_t^T)^T, (1) can be rewritten as
min_{w̄,z} (1/s) Σ_{i=1}^s log(1 + exp(-y_i w̄^T x̄_i)) + μ Σ_{j=1}^t ||z_j||, s.t. z = S̄ w̄.   (2)
The Lipschitz constant of the gradient of the logistic loss with respect to w̄ can be proven to satisfy L_w̄ ≤ (1/(4s)) ||X̄||^2, where X̄ = (x̄_1, x̄_2, ..., x̄_s).
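
As an illustration of the smooth part g, a sketch of the logistic loss, its gradient, and the Lipschitz bound stated above; X_bar is assumed to hold the augmented samples x̄_i as columns (names illustrative):

```python
import numpy as np

def logistic_loss_grad(w_bar, X_bar, y):
    # loss(w_bar) = (1/s) * sum_i log(1 + exp(-y_i * w_bar^T x_bar_i))
    s = X_bar.shape[1]
    margins = -y * (X_bar.T @ w_bar)                 # shape (s,)
    loss = np.mean(np.logaddexp(0.0, margins))       # stable log(1 + exp(.))
    sigma = 1.0 / (1.0 + np.exp(-margins))           # derivative of log(1+e^m)
    grad = -(X_bar @ (y * sigma)) / s
    return loss, grad

def logistic_lipschitz_bound(X_bar):
    # L_wbar <= ||X_bar||_2^2 / (4*s); used to set tau_i in proximal LADMPSAP
    s = X_bar.shape[1]
    return np.linalg.norm(X_bar, 2) ** 2 / (4.0 * s)
```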

27 Experiment. Group sparse logistic regression with overlap on synthetic data: for each size (s, p, t, q) in {(300, 901, 100, 10), (450, 1351, 150, 15), (600, 1801, 200, 20)}, ADM, LADM, LADMPS, LADMPSAP and pLADMPSAP are compared on running time, number of iterations, and the relative errors of the recovered w̄ and z. (Numerical entries are not reproduced here.)

28 Conclusions LADMAP, LADMPSAP, and P-LADMPSAP are very general methods for solving various convex programs. Adaptive penalty is important for fast convergence.

29 Thanks!
