Matrix Factorization with Applications to Clustering Problems: Formulation, Algorithms and Performance


1 Date: Mar. 3rd, 2017
Matrix Factorization with Applications to Clustering Problems: Formulation, Algorithms and Performance
Presenter: Songtao Lu, Department of Electrical and Computer Engineering, Iowa State University

2 Outline
Formulation: Spectral Clustering; Joint Factor Analysis and Latent Clustering
Algorithms: SymNMF; Joint Factor Analysis and Latent Clustering; Deep Neural Networks for Clustering
Other Problems
Conclusion

3 Applications: Graph Partitioning
Figure: The above two graphs are the same graph, re-organized and drawn from the stochastic block model (SBM) with 1000 vertices, 5 balanced communities, within-cluster probability 1/50, and across-cluster probability 1/1000. (Emmanuel Abbe)

4 Clustering Problem Formulation

5 Kernel K-means clustering
K-means clustering:
$$J_K = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - m_k\|^2 \quad (1)$$
$$\phantom{J_K} = c^2 - \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j \in C_k} x_i^T x_j \quad (2)$$
where $m_k = \sum_{i \in C_k} x_i / n_k$ is the centroid of cluster $C_k$ of $n_k$ points, and $c^2 = \sum_i \|x_i\|^2$.

6 Kernel K-means clustering
K-means clustering (matrix form):
$$J_K = \mathrm{Tr}(X^T X) - \mathrm{Tr}(H^T X^T X H) \quad (3)$$
$$H = (h_1, \ldots, h_K), \quad h_k^T h_l = \delta_{kl}, \quad h_k = (0, \ldots, 0, \underbrace{1, \ldots, 1}_{n_k}, 0, \ldots, 0)^T / n_k^{1/2}$$
With $W = X^T X$, minimizing $J_K$ becomes
$$\max_{H^T H = I,\, H \geq 0} J_W(H) = \mathrm{Tr}(H^T W H) \quad (4)$$

7 Kernel K-means clustering
A nonlinear transformation (mapping): $x_i \to \phi(x_i)$ (5). Kernel K-means can be written as
$$\min_{C_k,\, k} \; \sum_{k=1}^{K} \sum_{i \in C_k} \|\phi(x_i) - m_k\|^2, \quad (6)$$
where $m_k = \sum_{i \in C_k} \phi(x_i)/n_k$ is the centroid of cluster $C_k$ of $n_k$ points. Kernel K-means is equivalent to
$$\max_{H} \; \sum_{k} \frac{1}{n_k} \sum_{i,j \in C_k} W_{i,j} = \mathrm{Tr}(H^T W H) \quad (7)$$
Kernel: $W_{i,j} = \phi(x_i)^T \phi(x_j)$. Membership matrix: $H = (h_1, \ldots, h_K)$, $h_k^T h_l = \delta_{kl}$, $h_k = (0, \ldots, 0, \underbrace{1, \ldots, 1}_{n_k}, 0, \ldots, 0)^T / n_k^{1/2}$.
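As a quick numerical check of the identity in (7), the following sketch builds a toy linear-kernel matrix and a normalized hard-assignment matrix $H$ and confirms the two sides agree (the point set, labels, and kernel choice are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))            # 8 points in R^3
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2])
K = 3

W = X @ X.T                            # linear kernel; any PSD kernel works

H = np.zeros((8, K))
for k in range(K):
    idx = labels == k
    H[idx, k] = 1.0 / np.sqrt(idx.sum())   # h_k = indicator / sqrt(n_k)

lhs = sum(W[np.ix_(labels == k, labels == k)].sum() / (labels == k).sum()
          for k in range(K))
rhs = np.trace(H.T @ W @ H)
print(np.isclose(lhs, rhs))            # True: the two expressions agree
```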

8 Spectral Clustering

9 Spectral Clustering vs. K-Means

10 Challenge of Spectral Clustering
Consider a block-diagonal similarity matrix
$$W = \mathrm{diag}(W_1, W_2, W_3) \quad (8)$$
If $\lambda_3(W_1) > \max(\lambda_1(W_2), \lambda_1(W_3))$, then the three leading eigenvectors of the similarity matrix all come from the block of $W_1$, so the spectral embedding captures only one community and misses the other two clusters.

11 Challenge of Spectral Clustering
[Figure: original data; new representation in eigenvectors; spectral clustering (accuracy: 37.95%); SymNMF (accuracy: 88.78%)]
D. Kuang, S. Yun and H. Park, "SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering," Journal of Global Optimization, vol. 62, no. 3, 2015.

12 Kernel K-means clustering
Equivalence between K-means and matrix factorization (using $H^T H = I$, so $\|H^T H\|_F^2$ is a constant):
$$H = \arg\min_{H^T H = I,\, H \geq 0} \; -2\,\mathrm{Tr}(H^T W H) \quad (9)$$
$$\phantom{H} = \arg\min_{H^T H = I,\, H \geq 0} \; \|W\|_F^2 - 2\,\mathrm{Tr}(H^T W H) + \|H^T H\|_F^2 \quad (10)$$
$$\phantom{H} = \arg\min_{H^T H = I,\, H \geq 0} \; \|W - H H^T\|_F^2 \quad (11)$$

13 Motivation
Relaxed versions of K-means, starting from $\min_{H^T H = I,\, H \geq 0} \|H H^T - W\|_F^2$:
Spectral clustering drops the nonnegativity constraint:
$$\max_{H^T H = I} \; \mathrm{Tr}(H^T W H) \quad (12)$$
SymNMF drops the orthogonality constraint:
$$\min_{H \geq 0} \; \|H H^T - W\|_F^2$$
D. Kuang, S. Yun and H. Park, "SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering," Journal of Global Optimization, vol. 62, no. 3, 2015.

14 SymNMF for Clustering
Given $N$ samples with $M$ features, construct a pairwise similarity matrix and factor it:
$$\text{SymNMF:} \quad \min_{X \in \mathbb{R}^{N \times K},\; X \geq 0} \; \|X X^T - Z\|_F^2$$
$Z \in \mathbb{R}^{N \times N}$: pairwise similarity matrix; $X \in \mathbb{R}^{N \times K}$: clustering indicator matrix.
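A minimal projected-gradient sketch of this formulation (the step size, iteration count, and uniform initialization are illustrative choices, not the tuned schemes discussed on later slides):

```python
import numpy as np

def symnmf_pgd(Z, K, steps=500, lr=1e-3, seed=0):
    """min_{X >= 0} 0.5*||X X^T - Z||_F^2 by projected gradient descent."""
    rng = np.random.default_rng(seed)
    N = Z.shape[0]
    X = rng.uniform(0, 1, size=(N, K))
    for _ in range(steps):
        grad = 2.0 * (X @ X.T - Z) @ X      # gradient for symmetric Z
        X = np.maximum(X - lr * grad, 0.0)  # projection onto X >= 0
    return X

# Usage: the cluster label of point i is read off as argmax_k X[i, k].
```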

15 Joint factor analysis and latent clustering
$$\min_{W \in \mathbb{R}^{N \times F},\, H \in \mathbb{R}^{F \times M}} \; \|X - W H\|_F^2 \quad (13)$$
$$\text{s.t.} \quad W \geq 0, \; H \geq 0 \quad (14)$$

16 Joint factor analysis and latent clustering
$$\min_{S, M} \; \|X - S M\|_F^2 \quad (15)$$
$$\text{s.t.} \quad S(i,j) \in \{0, 1\}, \; \|S(i,:)\|_0 = 1 \quad (16)$$

17 Joint factor analysis and latent clustering
Two-step approach (sketched in code below):
Step 1: dimension reduction via factorization (e.g., SVD, NMF)
Step 2: perform K-means clustering on the latent factor W
Drawbacks of the two-step approach:
it ignores the latent cluster structure when performing dimension reduction
it uses a naive factorization when clustering
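A compact sketch of that two-step baseline, assuming truncated SVD for Step 1 and scikit-learn's KMeans for Step 2:

```python
import numpy as np
from sklearn.cluster import KMeans

def two_step(X, F, K):
    """Step 1: rank-F truncated SVD; Step 2: K-means on the left factor."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W = U[:, :F] * s[:F]                  # latent representation of each row of X
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(W)
    return W, labels
```

Note that the SVD in Step 1 never sees the cluster structure, which is exactly the first drawback listed above.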

18 Motivation
A real-world example: 2 clusters of documents taken from the Reuters text corpus, factored by NMF with rank = 2. The figure shows the weights of each document on the two latent topics.

19 Latent clustering
$\{X(i,:)\}_{i=1}^{N}$ have latent representations drawn from $K$ clusters, i.e., the rows of $W$ can be divided into $K$ clusters:
$$\min_{W,H,S,M} \; \|X - W H\|_F^2 + \lambda \|W - S M\|_F^2 \quad (17)$$
$$\text{s.t.} \quad W \geq 0, \; H \geq 0, \; \|S(i,:)\|_0 = 1, \; S(i,k) \in \{0,1\}, \; \forall i, k \quad (18)$$
$\lambda \geq 0$ is a pre-specified regularization parameter. $S \in \mathbb{R}^{N \times K}$ denotes cluster membership: $S(i,k) = 1$ means that $W(i,:)$ belongs to cluster $k$. $M \in \mathbb{R}^{K \times F}$ denotes the centroid matrix, where each centroid is $M(k,:)$.

20 DNN
Replacing the linear factor by a neural mapping $w_i = f(x_i; H)$:
$$\min_{H,S,M} \; \sum_{i=1}^{N} \|f(x_i; H) - S(i,:) M\|_2^2 \quad (19)$$
$$\text{s.t.} \quad \|S(i,:)\|_0 = 1, \; S(i,k) \in \{0,1\}, \; \forall i, k \quad (20)$$
Adding a reconstruction loss with decoder output $\hat{x}_i = g(w_i; Z)$:
$$\min_{H,Z,S,M} \; \sum_{i=1}^{N} \ell\big(g(f(x_i; H); Z), x_i\big) + \frac{\lambda}{2} \|f(x_i; H) - S(i,:) M\|_2^2 \quad (21)$$
$$\text{s.t.} \quad \|S(i,:)\|_0 = 1, \; S(i,k) \in \{0,1\}, \; \forall i, k \quad (22)$$
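A sketch of the joint objective (21)-(22) in PyTorch, alternating gradient steps on the autoencoder with hard assignment (S) and centroid (M) updates; the layer sizes, optimizer, and single-batch loop are illustrative simplifications of the "Towards K-means-friendly Spaces" approach cited on slide 55:

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(20, 5))          # f(.; H)
dec = nn.Sequential(nn.Linear(5, 20))          # g(.; Z)
M = torch.randn(4, 5)                          # K = 4 centroids in latent space
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
lam = 0.5

x = torch.randn(128, 20)                       # a batch of data (toy input)
for _ in range(100):
    w = enc(x)                                 # latent codes w_i = f(x_i; H)
    assign = torch.cdist(w, M).argmin(dim=1)   # S-update: nearest centroid
    loss = nn.functional.mse_loss(dec(w), x) \
         + lam / 2 * ((w - M[assign]) ** 2).sum(1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                      # M-update: recompute centroids
        for k in range(4):
            if (assign == k).any():
                M[k] = w[assign == k].mean(0)
```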

21 Algorithms

22 Symmetric Nonnegative Matrix Factorization
$$\text{P1:} \quad \min_{X \in \mathbb{R}^{N \times K},\; X \geq 0} \; f(X) \triangleq \frac{1}{2} \|X X^T - Z\|_F^2$$
Challenges:
4th-order polynomial (nonconvex) in terms of $X$
no Lipschitz continuous gradient
general matrix $Z \in \mathbb{R}^{N \times N}$: non-symmetric, indefinite, may contain negative entries
$K$ is any integer in $[1, N]$.

23 SymNMF (two-dimensional case, K = 1)
$$\min_{x} \; \|x x^T - Z\|_F^2, \quad \text{where } x = [x_1, x_2]^T$$
Hessian matrix:
$$\nabla^2 f = \begin{pmatrix} 12x_1^2 + 4x_2^2 - 4Z_{11} & 8x_1 x_2 - 4Z_{12} \\ 8x_2 x_1 - 4Z_{21} & 12x_2^2 + 4x_1^2 - 4Z_{22} \end{pmatrix}$$
[Figure: objective landscapes with the stationary point $[0,0]^T$ marked, for $Z$ positive definite, indefinite, and negative definite]
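The landscape behavior can be checked directly: at $x = [0,0]^T$ the Hessian above reduces to $-4Z$, so the curvature at the origin flips with the definiteness of $Z$. A short sketch (the three test matrices are illustrative stand-ins for the three panels):

```python
import numpy as np

def hessian(x, Z):
    """Hessian of ||x x^T - Z||_F^2 at x = [x1, x2]^T, as on the slide."""
    x1, x2 = x
    return np.array([
        [12 * x1**2 + 4 * x2**2 - 4 * Z[0, 0], 8 * x1 * x2 - 4 * Z[0, 1]],
        [8 * x2 * x1 - 4 * Z[1, 0], 12 * x2**2 + 4 * x1**2 - 4 * Z[1, 1]],
    ])

# Z positive definite, indefinite, negative definite
for Z in (np.eye(2), np.diag([1.0, -1.0]), -np.eye(2)):
    print(np.linalg.eigvalsh(hessian([0.0, 0.0], Z)))   # eigenvalues of -4Z
```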

24 Background and Motivation
Clustering via Symmetric Nonnegative Matrix Factorization (SymNMF):
Probabilistic clustering [Zass et al 05]
Community detection [Wang et al 11] [Ma et al 10]
Overlapping community detection [Zhang et al 13]
Graph partitioning and image segmentation [Park et al 15]
[Figure: clustering accuracy of K-means variants, NMF variants, spectral clustering variants, and SymNMF]
D. Kuang, S. Yun and H. Park, "SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering," Journal of Global Optimization, vol. 62, no. 3, 2015.

25 Related Work
Existing algorithms:
Projected Gradient Descent (PGD) [Kuang, Park et al 12]:
$$X^{(t+1)} = \mathrm{proj}_+\big[X^{(t)} - \alpha \nabla f(X^{(t)})\big], \quad \text{where } \mathrm{proj}_+[X] = \max\{X, 0\}$$
Projected Newton Method (PNewton) [Kuang, Park et al 12]:
$$X^{(t+1)} = \mathrm{proj}_+\big[X^{(t)} - \alpha (\nabla^2 f(X^{(t)}))^{-1} \nabla f(X^{(t)})\big]$$
Disadvantage: no global convergence guarantee (no Lipschitz continuous gradient).

26 Related Work
Existing algorithms (continued):
Eigenvalue-decomposition based SymNMF [Huang, Sidiropoulos et al 14]:
decompose $Z = U_K \Sigma_K U_K^T$ and let $B \triangleq U_K \Sigma_K^{1/2}$
assume $Z = X X^T$
$$\min_{X, Q} \; \frac{1}{2} \|X - B Q\|_F^2 \quad \text{s.t.} \quad X \geq 0, \; Q^T Q = Q Q^T = I$$
Disadvantages: it assumes there is an exact decomposition (i.e., $Z = X X^T$); there is no proof that the optimal objective value is 0.
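A sketch of this method under the stated assumptions: the Q-subproblem is an orthogonal Procrustes problem solved by an SVD, and the X-subproblem is a nonnegative projection (the iteration count is illustrative):

```python
import numpy as np

def snmf_eig(Z, K, iters=50):
    vals, vecs = np.linalg.eigh(Z)
    idx = np.argsort(vals)[::-1][:K]            # top-K eigenpairs of Z
    B = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))   # B = U_K Sigma_K^{1/2}
    X = np.maximum(B, 0.0)
    for _ in range(iters):
        U, _, Vt = np.linalg.svd(B.T @ X)       # Q-update: orthogonal Procrustes
        Q = U @ Vt
        X = np.maximum(B @ Q, 0.0)              # X-update: project BQ onto X >= 0
    return X
```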

27 Related Work
Existing algorithms (continued):
Alternating Non-negative Least Squares (ANLS) [Kuang, Park et al 15]:
$$\min_{X, Y} \; \frac{1}{2} \|X Y^T - Z\|_F^2 + \lambda \|X - Y\|_F^2 \quad \text{s.t.} \quad X \geq 0, \; Y \geq 0$$
Disadvantage: the KKT points of this problem differ from those of P1.
Coordinate Descent (CD) [Vandaele, Gillis et al 16]:
$$\min_{X_{:,j} \geq 0} \; \|X_{:,j} X_{:,j}^T - R^{(j)}\|_F^2, \quad \text{where} \quad R^{(j)} = Z - \sum_{k=1, k \neq j}^{K} X_{:,k} X_{:,k}^T$$
Disadvantage: no convergence guarantee to KKT points (the optimal solution of each subproblem is not unique).

28 New Formulation of SymNMF
Our formulation of SymNMF:
$$\text{P2:} \quad \min_{X \in \mathbb{R}^{N \times K},\, Y \in \mathbb{R}^{N \times K}} \; \frac{1}{2} \|X Y^T - Z\|_F^2 \quad \text{s.t.} \quad Y \geq 0, \; X = Y, \; \|Y_{i,:}\|_2^2 \leq \tau, \; \forall i$$
Advantages of the new formulation:
variable splitting
the feasible set is compact (closed and bounded)

29 New Formulation of SymNMF
How to solve this problem?
$$\text{P2:} \quad \min_{X \in \mathbb{R}^{N \times K},\, Y \in \mathbb{R}^{N \times K}} \; \frac{1}{2} \|X Y^T - Z\|_F^2 \quad \text{s.t.} \quad Y \geq 0, \; X = Y, \; \|Y_{i,:}\|_2^2 \leq \tau, \; \forall i$$

30 Alternating Direction Method of Multipliers (ADMM)
Problem:
$$\min_{x, y} \; h(x) + g(y) \quad \text{s.t.} \quad Ax + By = c$$
where $x \in \mathbb{R}^N$, $y \in \mathbb{R}^M$, $A \in \mathbb{R}^{P \times N}$, $B \in \mathbb{R}^{P \times M}$, and $c \in \mathbb{R}^P$. The augmented Lagrangian is
$$L(x, y; \lambda) = h(x) + g(y) + \lambda^T (Ax + By - c) + \frac{\rho}{2} \|Ax + By - c\|_2^2$$
where $\rho > 0$. ADMM consists of the iterations [Boyd et al 11]:
$$x^{(t+1)} = \arg\min_x \; L(x, y^{(t)}; \lambda^{(t)})$$
$$y^{(t+1)} = \arg\min_y \; L(x^{(t+1)}, y; \lambda^{(t)})$$
$$\lambda^{(t+1)} = \lambda^{(t)} + \rho (A x^{(t+1)} + B y^{(t+1)} - c)$$
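A toy instance of these iterations, assuming the separable problem $\min 0.5\|x - a\|^2 + 0.5\|y - b\|^2$ s.t. $x = y$ (so $A = I$, $B = -I$, $c = 0$), where both subproblems have closed-form solutions:

```python
import numpy as np

a, b, rho = np.array([3.0, -1.0]), np.array([1.0, 1.0]), 1.0
x = y = lam = np.zeros(2)
for _ in range(100):
    x = (a - lam + rho * y) / (1 + rho)   # argmin_x L(x, y; lam)
    y = (b + lam + rho * x) / (1 + rho)   # argmin_y L(x, y; lam)
    lam = lam + rho * (x - y)             # dual ascent step
print(x, y)                               # both converge to (a + b) / 2
```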

31 ADMM for SymNMF
Partial augmented Lagrangian:
$$L(X, Y; \Lambda) = \frac{1}{2} \|X Y^T - Z\|_F^2 + \langle Y - X, \Lambda \rangle + \frac{\rho}{2} \|Y - X\|_F^2$$
$\Lambda \in \mathbb{R}^{N \times K}$: dual variables; $\langle \cdot, \cdot \rangle$: inner product operator; $\rho > 0$.
Y-subproblem:
$$\min_{Y \geq 0,\; \|Y_{i,:}\|_2^2 \leq \tau,\, \forall i} \; \frac{1}{2} \|X Y^T - Z\|_F^2 + \langle Y - X, \Lambda \rangle + \frac{\rho}{2} \|Y - X\|_F^2$$
X-subproblem:
$$\min_{X} \; \frac{1}{2} \|X Y^T - Z\|_F^2 + \langle Y - X, \Lambda \rangle + \frac{\rho}{2} \|Y - X\|_F^2$$

32 Comparison with Classical ADMM
Classical ADMM:
$$\min_{x \in \mathcal{X},\, y \in \mathcal{Y}} \; h(x) + g(y) \quad \text{s.t.} \quad Ax + By = c$$
P2:
$$\min_{X, Y} \; \frac{1}{2} \|X Y^T - Z\|_F^2 \quad \text{s.t.} \quad Y \geq 0, \; X = Y, \; \|Y_{i,:}\|_2^2 \leq \tau, \; \forall i$$
Challenges:
nonconvex objective
the objective function is non-separable
recent analysis results of ADMM for nonconvex problems do not apply [Hong et al 16] [Pong et al 15]

33 Nonconvex Splitting SymNMF (NS-SymNMF)
Parameter update (iteration-dependent penalty parameter):
$$\beta^{(t)} = \frac{6}{\rho} \|X^{(t)} (Y^{(t)})^T - Z\|_F^2$$
Primal updates:
$$Y^{(t+1)} = \arg\min_{Y \geq 0,\; \|Y_{i,:}\|_2^2 \leq \tau,\, \forall i} \; \frac{1}{2} \|X^{(t)} Y^T - Z\|_F^2 + \frac{\rho}{2} \|Y - X^{(t)} + \Lambda^{(t)}/\rho\|_F^2 + \underbrace{\frac{\beta^{(t)}}{2} \|Y - Y^{(t)}\|_F^2}_{\text{proximal term}}$$
$$X^{(t+1)} = \arg\min_{X} \; \frac{1}{2} \|X (Y^{(t+1)})^T - Z\|_F^2 + \frac{\rho}{2} \|X - \Lambda^{(t)}/\rho - Y^{(t+1)}\|_F^2$$
Dual update:
$$\Lambda^{(t+1)} = \Lambda^{(t)} + \rho (Y^{(t+1)} - X^{(t+1)})$$
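A sketch of one NS-SymNMF iteration. The X-update has the closed form $X = (ZY + \rho Y + \Lambda)(Y^T Y + \rho I)^{-1}$; for the constrained Y-update this sketch substitutes a few projected-gradient steps rather than an exact subproblem solver, so it is an illustrative approximation of the algorithm, not the exact solver analyzed in the paper:

```python
import numpy as np

def ns_symnmf_step(X, Y, Lam, Z, rho, tau, inner=20, lr=1e-3):
    beta = 6.0 / rho * np.linalg.norm(X @ Y.T - Z) ** 2
    Y_old = Y.copy()
    for _ in range(inner):                      # approximate Y-subproblem
        grad = (Y @ X.T - Z.T) @ X + rho * (Y - X + Lam / rho) + beta * (Y - Y_old)
        Y = np.maximum(Y - lr * grad, 0.0)      # project onto Y >= 0 ...
        norms = np.linalg.norm(Y, axis=1, keepdims=True)
        Y *= np.minimum(1.0, np.sqrt(tau) / np.maximum(norms, 1e-12))  # ... and ||Y_i||^2 <= tau
    K = Y.shape[1]
    X = (Z @ Y + rho * Y + Lam) @ np.linalg.inv(Y.T @ Y + rho * np.eye(K))  # closed-form X-update
    Lam = Lam + rho * (Y - X)                   # dual update
    return X, Y, Lam
```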

34 Convergence Analysis of ADMM (convex)
Convex case [Boyd et al 11], [Hong, Luo 12]:
$$\|X^{(t)} - X^*\|^2 + \|Y^{(t)} - Y^*\|^2 \to 0 \quad \text{and} \quad \|\Lambda^{(t)} - \Lambda^*\|^2 \to 0$$
where $(X^*, Y^*; \Lambda^*)$ is the globally optimal primal-dual pair.

35 Convergence Analysis (Convergence Rate)
Define the proximal gradient of the augmented Lagrangian function as
$$\tilde{\nabla} L(X, Y, \Lambda) \triangleq \begin{bmatrix} Y - \mathrm{proj}_{\mathcal{Y}}\big[Y - \nabla_Y L(Y, X, \Lambda)\big] \\ \nabla_X L(X, Y, \Lambda) \end{bmatrix}$$
where the operator $\mathrm{proj}_{\mathcal{Y}}(W) \triangleq \arg\min_{Y \geq 0,\; \|Y_{i,:}\|_2^2 \leq \tau,\, \forall i} \|W - Y\|_F^2$. We use the following quantity to measure the progress of the algorithm:
$$P(X^{(t)}, Y^{(t)}, \Lambda^{(t)}) \triangleq \underbrace{\|\tilde{\nabla} L(X^{(t)}, Y^{(t)}, \Lambda^{(t)})\|_F^2}_{\text{primal gap}} + \underbrace{\|X^{(t)} - Y^{(t)}\|_F^2}_{\text{dual gap}}$$
If $\lim_{t \to \infty} P(X^{(t)}, Y^{(t)}, \Lambda^{(t)}) = 0$, then a KKT point of P2 is obtained.

36 Numerical Results
Data sets: synthetic data sets; real data sets.
Algorithms compared:
PGD: Projected Gradient Descent [Kuang, Park et al 12]
PNewton: Projected Newton Method [Kuang, Park et al 12]
SNMF: Eigenvalue based SymNMF [Huang, Sidiropoulos 14]
ANLS: Alternating Non-negative Least Squares [Kuang, Park et al 15]
CD: Coordinate Descent [Vandaele, Gillis et al 16]

37 Numerical Results
Initialization: 20 random initializations ($Y$ or $X$ follows an i.i.d. uniform distribution in the range $[0, \tau]$); every algorithm starts from the same initial point.
Parameter choices of NS-SymNMF:
Initialization: $\rho^{(0)} = N\tau$, where $\tau$ is the average of the column norms of $Z$.
Accelerating the convergence rate: $\rho^{(t+1)} = \min\{\rho^{(t)}/(1 - \epsilon/\rho^{(t)}),\; 6.1 N \tau\}$ where $\epsilon = 10^{-3}$, and $\beta^{(t)} = 6 \xi^{(t)} \|X^{(t)} (Y^{(t)})^T - Z\|_F^2 / \rho^{(t)}$ where $\xi^{(t+1)} = \min\{\xi^{(t)}/(1 - \epsilon/\xi^{(t)}),\; 1\}$.

38 Numerical Results (Synthetic Data)
Data Set I: random symmetric matrices: $M \in \mathbb{R}^{N \times N}$ with i.i.d. Gaussian entries, $Z = M + M^T$.
Data Set II: adjacency matrices: $N = 2000$, $K = 4$; the numbers of data points within each cluster are 300, 500, 800, 400; data points $\{x_i\} \subset \mathbb{R}$, $i = 1, \ldots, N$; means 2, 3, 6, 8; variance 0.5; Gaussian similarity function $Z_{i,j} = \exp(-(x_i - x_j)^2 / (2\sigma^2))$ with $\sigma^2 = 0.5$.
Relative objective value: $\|X X^T - Z\|_F^2 / \|Z\|_F^2$.
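Data Set II can be reproduced directly from the description above (the random seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
sizes, means, var = [300, 500, 800, 400], [2.0, 3.0, 6.0, 8.0], 0.5
x = np.concatenate([rng.normal(m, np.sqrt(var), size=n)
                    for m, n in zip(means, sizes)])   # N = 2000 scalar points
sigma2 = 0.5
Z = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma2))   # Gaussian similarity
# Relative objective used on the slides: ||X X^T - Z||_F^2 / ||Z||_F^2.
```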

39 Numerical Results (Random symmetric matrices)
Data set I: random symmetric matrices, $N = 500$; Monte Carlo (MC) trials; full rank.

40 Numerical Results (Adjacency matrices)
[Figure: convergence curves (swamp behavior)]
Data set II: adjacency matrices: $N = 2000$, $K = 4$; 20 MC trials.
Optimality gap: $\|X - \mathrm{proj}_+[X - \nabla_X g(X, Y)]\|$.

41 Numerical Results (Optimality)
Check local optimality: initialize $\delta$ as 1, decrease it by 0.01 each time, and check the minimum eigenvalue of $T$.
More examples: fix the ratio of the numbers of nodes within each cluster (i.e., 3 : 5 : 8 : 4) and test on different total numbers of nodes $N$.
[Table: $\lambda_{\min}(T)$, $\delta$, and local optimality (true) percentages for different $N$]

42 Numerical Results (Real Data Sets)
Text mining (dense similarity matrix): vertices: documents; edges: similarity between two documents. Datasets: Reuters; topic detection and tracking 2 (TDT2) [Cai, et al 11].
Social networks (sparse similarity matrix): vertices: individuals; edges: relationships. Datasets: Email-Enron [Leskovec et al 09]; Brightkite (location-based social networking) [Cho et al 11]; Facebook [McAuley et al 12].

43 Numerical Results
[Table: mean ± variance of the relative objective for NS-SymNMF (1.01e-2), PGD, ANLS, SNMF, and CD]
Text mining data (dense similarity matrix): topic detection and tracking 2 (TDT2), $N = 8939$ documents, $K = 25$ classes; Gaussian similarity function.

44 Numerical Results
[Table: mean ± variance of the relative objective for NS-SymNMF (8.75e-1), ANLS, SNMF, and CD]
Social network data (sparse similarity matrix): Brightkite (location-based social networking), $N = 58{,}228$ people, 428,156 edges, $K = 50$.

45 Joint factor analysis and latent clustering
$$\min_{W,H,S,M,\{d_i\}_{i=1}^{N}} \; \|X - D W H\|_F^2 + \lambda \|W - S M\|_F^2 + \eta \|H\|_F^2 \quad (23)$$
$$\text{s.t.} \quad W \geq 0, \; H \geq 0, \; \|W(i,:)\|_2 = 1, \; \forall i \quad (24)$$
$$D = \mathrm{diag}(d_1, \ldots, d_N), \quad (25)$$
$$\|S(i,:)\|_0 = 1, \; S(i,k) \in \{0,1\}, \; \forall i, k \quad (26)$$
$\eta \geq 0$ is a regularization parameter. Euclidean-distance based K-means clustering on the unit 2-norm ball is equivalent to correlation-based clustering; the term $\eta \|H\|_F^2$ controls the scaling ambiguity.

46 Joint factor analysis and latent clustering
$$\min_{W,H,Z,S,M,\{d_i\}_{i=1}^{N}} \; \|X - D W H\|_F^2 + \lambda \|W - S M\|_F^2 + \eta \|H\|_F^2 + \mu \|W - Z\|_F^2$$
$$\text{s.t.} \quad W \geq 0, \; H \geq 0, \; \|Z(i,:)\|_2 = 1, \; \forall i \quad (27)$$
$$D = \mathrm{diag}(d_1, \ldots, d_N), \quad (28)$$
$$\|S(i,:)\|_0 = 1, \; S(i,k) \in \{0,1\}, \; \forall i, k \quad (29)$$
$\mu \geq 0$ and $Z$ is a slack variable; a large $\mu$ enforces $W \approx Z$.

47 Alternating Optimization
W-update:
$$W = \arg\min_{W \geq 0} \; \|X - D W H\|_F^2 + \lambda \|W - S M\|_F^2 + \mu \|W - Z\|_F^2 \quad (30)$$
H-update:
$$H = \arg\min_{H \geq 0} \; \|X - D W H\|_F^2 + \eta \|H\|_F^2 \quad (31)$$
$d_i$-update:
$$d_i = \arg\min_{d_i} \; \|X(i,:) - d_i W(i,:) H\|_2^2 \quad (32) \qquad \Rightarrow \qquad d_i = \frac{X(i,:)\, b_i}{b_i^T b_i}, \quad \text{where } b_i^T = W(i,:) H$$
Z-update:
$$Z = \arg\min_{\|Z(i,:)\|_2 = 1,\, \forall i} \; \|W - Z\|_F^2 \quad (33)$$
S, M-update: K-means.
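The $d_i$- and Z-updates have the closed forms given above; a sketch of just those pieces (the W- and H-updates are nonnegative least-squares problems and are omitted, and the S, M-update is ordinary K-means, e.g. sklearn.cluster.KMeans on the rows of W):

```python
import numpy as np

def d_update(X, D, W, H):
    """d_i = X(i,:) b_i / (b_i^T b_i), where b_i^T = W(i,:) H."""
    for i in range(X.shape[0]):
        b = W[i, :] @ H
        D[i, i] = X[i, :] @ b / (b @ b)
    return D

def z_update(W):
    """argmin_{||Z(i,:)||_2 = 1} ||W - Z||_F^2: normalize each row of W."""
    return W / np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
```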

48 Numerical Results
Reuters text corpus, top 41 clusters, 8213 documents. Test for various numbers of clusters $k$; for each $k$, 10 Monte Carlo trials by randomly picking $k$ clusters.
Baseline: locally consistent concept factorization (LCCF) [Cai et al 11]:
$$\min_{U, V} \; \|X - U V^T\|_F^2 + \lambda \mathrm{Tr}(V^T L V) \quad \text{s.t.} \quad U \geq 0, \; V \geq 0 \quad (34)$$
$L$: graph Laplacian. Intuition: ambient proximity implies latent proximity.

49 DNN

50 Other Works
Tensor decomposition
Distributed matrix factorization

51 PARAFAC2
PARAllel FACtor analysis 2 (PARAFAC2) [Harshman 72], used for cross-language information retrieval [Chew 07] and multilingual document clustering [Romeo 14]:
$$X(:,:,k) = F_k\, D_k(C)\, A^T, \quad k = 1, \ldots, K$$
Each slab $X(:,:,k)$ is a terms-by-documents matrix; $F_k$ maps terms to concepts, $D_k(C)$ carries the concept weights, and $A$ maps documents to concepts.
[Figure: the factorization applied to five language slabs $X(:,:,1), \ldots, X(:,:,5)$ (English, French, Spanish, Italian, Chinese)]

52 Distributed Matrix Factorization
$$\min_{X, Y} \; \|X Y^T - Z\|_F^2$$
Distributed matrix factorization:
$$\min_{U_i, Y_i,\, \forall i} \; \sum_{i=1}^{N} \|U_i Y_i^T - Z_i\|_F^2 \quad \text{s.t.} \quad U_i = U_j, \; \forall (i,j) \in E$$
[Figure: a network of seven agents with local factors $(U_i, Y_i)$ and local data blocks $Z_i$, $i = 1, \ldots, 7$, connected by edge set $E$]
Convergence analysis: KKT points [Hong 16]; globally optimal solution (future work).
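A simplified consensus sketch of the problem structure, assuming each agent takes local gradient steps on its own block and the shared factor is then averaged to enforce the coupling constraint $U_i = U_j$; this illustrates the formulation only and is not the algorithm analyzed in [Hong 16]:

```python
import numpy as np

def distributed_mf_round(U_list, Y_list, Z_list, lr=1e-3):
    for i, (U, Y, Z) in enumerate(zip(U_list, Y_list, Z_list)):
        R = U @ Y.T - Z
        U_list[i] = U - lr * (R @ Y)        # local gradient step on U_i
        Y_list[i] = Y - lr * (R.T @ U)      # local gradient step on Y_i
    U_bar = sum(U_list) / len(U_list)       # consensus averaging of the shared factor
    return [U_bar.copy() for _ in U_list], Y_list
```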

53 Conclusion
Clustering formulation: Spectral Clustering; Joint Factor Analysis and Latent Clustering
Algorithms: Joint Factor Analysis and Latent Clustering; SymNMF
Other problems: tensor decomposition; distributed matrix factorization

54 Thanks for Your Attention!

55 References
D. Kuang, S. Yun, and H. Park, "SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering," Journal of Global Optimization, vol. 62, no. 3, Jul. 2015.
Songtao Lu, Mingyi Hong, and Zhengdao Wang, "A nonconvex splitting method for symmetric nonnegative matrix factorization: Convergence analysis and optimality," IEEE Transactions on Signal Processing, Feb. 2017.
Bo Yang, Xiao Fu, and Nicholas D. Sidiropoulos, "Towards K-means-friendly spaces: Simultaneous deep learning and clustering," 2016.
B. Yang, X. Fu, and N. D. Sidiropoulos, "Learning from hidden traits: Joint factor analysis and latent clustering," IEEE Transactions on Signal Processing, accepted, Sep. 2016.
