Fast Nonnegative Matrix Factorization with Rank-one ADMM

Dongjin Song, Department of ECE, UCSD, La Jolla, CA 92093-0409, dosong@ucsd.edu
David A. Meyer, Department of Mathematics, UCSD, La Jolla, CA 92093-0112, dmeyer@math.ucsd.edu
Martin Renqiang Min, NEC Labs America, Princeton, NJ 08540, renqiang@nec-labs.com

Abstract

Nonnegative matrix factorization (NMF), which aims to approximate a data matrix with two nonnegative low rank matrix factors, is a popular dimensionality reduction and clustering technique. Due to the non-convex formulation and the nonnegativity constraints over the two low rank matrix factors (with rank $r > 0$), it is often difficult to solve NMF efficiently and accurately. Recently, the alternating direction method of multipliers (ADMM) was shown to be more accurate and efficient than classical approaches such as the multiplicative update rule (MUR) or alternating least squares (ALS). Nevertheless, the computation of ADMM is proportional to the cube of the rank $r$ because of an underlying matrix inversion, and thus it may be inefficient when $r$ is relatively large. In this paper, we propose rank-one ADMM to address this problem. In each step, we search for a rank-one solution of NMF based upon ADMM and utilize greedy search to obtain the low rank matrix factors. In this way, rank-one ADMM avoids the matrix inversion, and its computation is only linearly proportional to $r$. Thorough empirical studies demonstrate that rank-one ADMM is more efficient and accurate than baseline approaches.

1 Introduction

In the past decade, nonnegative matrix factorization (NMF) [5][6] and its extensions have been widely applied in various areas, e.g., face recognition [2][3], scene classification [10], social network analysis [11][12], bioinformatics [9], etc. Essentially, NMF aims to find two nonnegative low rank matrix factors $U \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{r \times d}$ that reconstruct a data matrix $X \in \mathbb{R}^{n \times d}$ of $n$ samples and $d$ features, i.e.,

$$\min_{U, V} \; \frac{1}{2}\|X - UV\|_F^2 \quad \text{s.t.} \quad U \ge 0, \; V \ge 0, \tag{1}$$

where $r$ denotes the rank, $\|\cdot\|_F$ is the Frobenius norm, and the inequalities are element-wise.

Since NMF is non-convex and $U$ and $V$ are constrained to be nonnegative, it is difficult to solve NMF accurately and efficiently. To address this issue, Lee and Seung [6] developed the multiplicative update rule (MUR), which is guaranteed to reach a locally optimal solution. Since MUR converges very slowly, alternating least squares (ALS) [8] and the projected gradient method [7] were subsequently developed. Recently, the alternating direction method of multipliers (ADMM) [1] has shown its superiority over MUR and ALS with respect to both reconstruction accuracy and efficiency [15][14][13]. Nevertheless, since each ADMM step involves inverting an $r \times r$ matrix, its complexity is proportional to the cube of the rank $r$ of the two low rank matrix factors, and it may be inefficient when $r$ is relatively large. To resolve this issue, we propose rank-one ADMM in this paper. The main idea is to find a rank-one solution of NMF based upon ADMM and to utilize greedy search to obtain the low rank matrix factors. In this way, rank-one ADMM avoids the matrix inversion, and its computation is only linearly proportional to $r$.
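To make the objective in Eq. (1) concrete, here is a minimal numpy sketch that evaluates the reconstruction error for given nonnegative factors; the sizes and random data are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def nmf_objective(X, U, V):
    """Evaluate (1/2) * ||X - U V||_F^2, the cost minimized in Eq. (1)."""
    R = X - U @ V
    return 0.5 * np.sum(R * R)

rng = np.random.default_rng(0)
n, d, r = 100, 80, 10            # illustrative sizes, not from the paper
X = rng.random((n, d))           # nonnegative data matrix X in R^{n x d}
U = rng.random((n, r))           # nonnegative factor U in R^{n x r}
V = rng.random((r, d))           # nonnegative factor V in R^{r x d}
print(nmf_objective(X, U, V))    # reconstruction error before any fitting
```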

2 Related Work

In this section, we briefly introduce three representative approaches for optimizing NMF, i.e., the multiplicative update rule (MUR) [6], alternating least squares (ALS) [8], and the alternating direction method of multipliers (ADMM) [1].

Notation: Given a matrix $X \in \mathbb{R}^{n \times d}$ of $n$ rows and $d$ columns, we use $x_j \in \mathbb{R}^n$ to denote its $j$-th column, $x^i \in \mathbb{R}^d$ to denote its $i$-th row, and $X_{ij}$ to denote the entry in the $i$-th row and $j$-th column of $X$.

2.1 Multiplicative Update Rule

MUR, proposed by Lee and Seung [6], is the most popular method for NMF and can be written as:

$$U_{ik} \leftarrow U_{ik} \, \frac{(X V^T)_{ik}}{(U V V^T)_{ik}} \tag{2}$$

and

$$V_{kj} \leftarrow V_{kj} \, \frac{(U^T X)_{kj}}{(U^T U V)_{kj}}. \tag{3}$$

With nonnegative initializations of $U$ and $V$, both factors remain nonnegative throughout the MUR iterations. Since MUR converges slowly in many real world applications, ALS [8] can be used as a substitute.

2.2 Alternating Least Squares

ALS, developed by Paatero and Tapper [8], minimizes the least squares cost function in Eq. (1) with respect to $U$ and $V$ iteratively. The procedure is given as

$$U = P_+\!\left(X V^T (V V^T)^{\dagger}\right) \tag{4}$$

and

$$V = P_+\!\left((U^T U)^{\dagger} U^T X\right), \tag{5}$$

where $\dagger$ denotes the pseudo-inverse and $P_+$ projects a matrix onto the nonnegative set, i.e., $P_+(U) = \max(U, 0)$. Although ALS has shown its efficiency for NMF, recent research indicates that ADMM [1] is more effective and efficient than ALS [15][14].

2.3 Alternating Direction Method of Multipliers

ADMM [1] aims to solve convex optimization problems by splitting them into smaller pieces such that each piece is easier to handle. It can also be used for non-convex problems such as NMF [15][14][13]. Specifically, ADMM introduces two auxiliary variables $S$ and $T$ and considers the following equivalent formulation:

$$\min_{U, V, S, T} \; \frac{1}{2}\|X - UV\|_F^2 \quad \text{s.t.} \quad U - S = 0, \; V - T = 0, \; S \ge 0, \; T \ge 0. \tag{6}$$

The augmented Lagrangian function of Eq. (6) is given as:

$$L(U, V, S, T, \Lambda, \Pi) = \frac{1}{2}\|X - UV\|_F^2 + \langle \Lambda, U - S \rangle + \langle \Pi, V - T \rangle + \frac{\rho}{2}\|U - S\|_F^2 + \frac{\rho}{2}\|V - T\|_F^2, \tag{7}$$

where $\Lambda \in \mathbb{R}^{n \times r}$ and $\Pi \in \mathbb{R}^{r \times d}$ are Lagrange multipliers, $\langle \cdot, \cdot \rangle$ is the matrix inner product, and $\rho > 0$ is the penalty parameter for the constraints. By alternately minimizing $L$ with respect to $U$, $V$, $S$, and $T$ and updating $\Lambda$ and $\Pi$, we obtain the closed form solution for each step, as shown in Algorithm 1 in the Appendix. Note that the main computations in classical ADMM are the inversion of an $r \times r$ matrix and the products $X V^T$ and $U^T X$. When $r$ is relatively large (e.g., 500 or 1000 in some real world applications), the matrix inversion consumes substantial time at each step, and ADMM thus converges slowly. To resolve this issue, we develop rank-one ADMM.
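For reference, the MUR updates of Eqs. (2)-(3) and the ALS updates of Eqs. (4)-(5) translate directly into a few lines of numpy. This is a minimal sketch of those textbook updates, not the authors' implementation; the small epsilon guarding the MUR denominators is a common numerical convention added here.

```python
import numpy as np

EPS = 1e-12  # guards MUR denominators against division by zero (an added convention)

def mur_step(X, U, V):
    """One multiplicative update, Eqs. (2)-(3); preserves nonnegativity."""
    U = U * (X @ V.T) / (U @ V @ V.T + EPS)
    V = V * (U.T @ X) / (U.T @ U @ V + EPS)
    return U, V

def als_step(X, U, V):
    """One alternating least squares sweep, Eqs. (4)-(5), with projection P_+."""
    U = np.maximum((X @ V.T) @ np.linalg.pinv(V @ V.T), 0.0)
    V = np.maximum(np.linalg.pinv(U.T @ U) @ (U.T @ X), 0.0)
    return U, V
```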

3 Rank-one ADMM for NMF

In rank-one ADMM, we successively search for a rank-one solution of NMF based upon ADMM and utilize greedy search to obtain the low rank matrix factors $U \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{r \times d}$. For the initialization ($i = 0$), we can adopt classical ADMM to obtain a rank-one solution of NMF. For step $i + 1$, since $U \in \mathbb{R}^{n \times i}$ and $V \in \mathbb{R}^{i \times d}$ are already known and can be written as concatenations of column vectors $U = [u_1, \ldots, u_i]$ and row vectors $V = [v_1; \ldots; v_i]$ respectively, rank-one ADMM can be formulated as:

$$\min_{u_{i+1}, v_{i+1}, s_{i+1}, t_{i+1}} \; \frac{1}{2}\big\|X - [u_1, \ldots, u_i, u_{i+1}][v_1; \ldots; v_i; v_{i+1}]\big\|_F^2 \quad \text{s.t.} \quad u_{i+1} - s_{i+1} = 0, \; v_{i+1} - t_{i+1} = 0, \; s_{i+1} \ge 0, \; t_{i+1} \ge 0. \tag{8}$$

Since $U = [u_1, \ldots, u_i]$ and $V = [v_1; \ldots; v_i]$ are already known, Eq. (8) can be simplified as:

$$\min_{u_{i+1}, v_{i+1}, s_{i+1}, t_{i+1}} \; \frac{1}{2}\|\bar{X} - u_{i+1} v_{i+1}\|_F^2 \quad \text{s.t.} \quad u_{i+1} - s_{i+1} = 0, \; v_{i+1} - t_{i+1} = 0, \; s_{i+1} \ge 0, \; t_{i+1} \ge 0, \tag{9}$$

where $\bar{X} = X - UV$. The augmented Lagrangian of Eq. (9) can be written as:

$$L(u_{i+1}, v_{i+1}, s_{i+1}, t_{i+1}, p, q) = \frac{1}{2}\|\bar{X} - u_{i+1} v_{i+1}\|_F^2 + \langle p, u_{i+1} - s_{i+1} \rangle + \langle q, v_{i+1} - t_{i+1} \rangle + \frac{\rho}{2}\|u_{i+1} - s_{i+1}\|_2^2 + \frac{\rho}{2}\|v_{i+1} - t_{i+1}\|_2^2, \tag{10}$$

where $p \in \mathbb{R}^n$ and $q \in \mathbb{R}^d$ are Lagrange multipliers and $\rho > 0$ is the penalty parameter for the constraints. By alternately minimizing $L$ with respect to $u_{i+1}$, $v_{i+1}$, $s_{i+1}$, and $t_{i+1}$ and updating $p$ and $q$, we obtain the closed form solution of each step, as shown in Algorithm 2.

Algorithm 2: Rank-one ADMM for NMF
Input: $X \in \mathbb{R}^{n \times d}$, $u_{i+1} \in \mathbb{R}^n$, $v_{i+1} \in \mathbb{R}^{1 \times d}$, $s_{i+1} \in \mathbb{R}^n$, $t_{i+1} \in \mathbb{R}^{1 \times d}$, $p \in \mathbb{R}^n$, $q \in \mathbb{R}^{1 \times d}$, $\rho$
Output: $U$ and $V$
1: Set $\bar{X} = X$
2: Calculate $u_0$, $v_0$ based upon ADMM
3: For $i = 0$ to $r - 2$:
4:   Set $k = 0$ (index of iteration)
5:   $\bar{X} = \bar{X} - u_i v_i$
6:   Repeat:
7:     $u_{i+1}(k+1) = \big[\bar{X} v_{i+1}(k)^T + \rho\, s_{i+1}(k) - p(k)\big] \big/ \big[v_{i+1}(k)\, v_{i+1}(k)^T + \rho\big]$
8:     $v_{i+1}(k+1) = \big[u_{i+1}(k+1)^T \bar{X} + \rho\, t_{i+1}(k) - q(k)\big] \big/ \big[u_{i+1}(k+1)^T u_{i+1}(k+1) + \rho\big]$
9:     $s_{i+1}(k+1) = P_+\big(u_{i+1}(k+1) + p(k)/\rho\big)$
10:    $t_{i+1}(k+1) = P_+\big(v_{i+1}(k+1) + q(k)/\rho\big)$
11:    $p(k+1) = p(k) + \rho\big(u_{i+1}(k+1) - s_{i+1}(k+1)\big)$
12:    $q(k+1) = q(k) + \rho\big(v_{i+1}(k+1) - t_{i+1}(k+1)\big)$
13:    $k = k + 1$
14:  Until convergence.
15:  $U = [U, u_{i+1}(k+1)]$, $V = [V; v_{i+1}(k+1)]$, $S = [S, s_{i+1}(k+1)]$, $T = [T; t_{i+1}(k+1)]$
16: End for

In Algorithm 2, $u_{i+1}$ and $v_{i+1}$ are randomly initialized, and $s_{i+1}$, $t_{i+1}$, $p$, and $q$ are set to zero vectors of appropriate lengths. The stopping criterion for the inner loop of rank-one ADMM is met when the objective in Eq. (8) no longer improves relative to a tolerance value. In the inner loop, the main computations are the $u_{i+1}(k+1)$ update and the $v_{i+1}(k+1)$ update, with computational complexity $O(nd + n)$ and $O(nd + d)$, respectively. Assuming that on average the inner loop takes $\bar{k}$ iterations to converge, the total computational complexity of rank-one ADMM is $O(ndr\bar{k} + nr\bar{k} + dr\bar{k})$, which eliminates the matrix inversion of classical ADMM shown in Algorithm 1 of the Appendix.
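The inner loop of Algorithm 2 involves only matrix-vector products, scalar divisions, and elementwise projections, which is where the $O(nd)$ per-iteration cost comes from. The sketch below follows the update equations of Algorithm 2; for simplicity it initializes every component randomly (the paper initializes the first component with classical ADMM) and runs a fixed inner budget instead of the tolerance test, so treat it as an illustration rather than the authors' code.

```python
import numpy as np

def rank_one_admm(X, r, rho=1.0, inner_iters=100, seed=0):
    """Greedy rank-one ADMM for NMF, following Algorithm 2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = np.zeros((n, 0))
    V = np.zeros((0, d))
    Xbar = X.copy()                          # residual; shrinks as components are peeled off
    for _ in range(r):
        u, v = rng.random(n), rng.random(d)  # random init (paper: ADMM init for i = 0)
        s, t = np.zeros(n), np.zeros(d)      # nonnegative copies
        p, q = np.zeros(n), np.zeros(d)      # Lagrange multipliers
        for _ in range(inner_iters):         # fixed budget replaces the tolerance test
            u = (Xbar @ v + rho * s - p) / (v @ v + rho)  # scalar denominator: no matrix inverse
            v = (u @ Xbar + rho * t - q) / (u @ u + rho)
            s = np.maximum(u + p / rho, 0.0)              # projection P_+
            t = np.maximum(v + q / rho, 0.0)
            p += rho * (u - s)                            # dual updates
            q += rho * (v - t)
        U = np.column_stack([U, s])          # store the nonnegative copies (u ~ s at convergence)
        V = np.vstack([V, t[None, :]])
        Xbar = Xbar - np.outer(s, t)         # subtract the fitted rank-one term
    return U, V
```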

Figure 1: Experiment results on the UMIST dataset. (a) Running time (seconds) vs. rank; (b) Accuracy vs. rank; (c) NMI vs. rank.

Figure 2: Experiment results on the Coil20 dataset. (a) Running time (seconds) vs. rank; (b) Accuracy vs. rank; (c) NMI vs. rank.

4 Experiments

In this section, we evaluate the effectiveness and efficiency of our proposed approach by comparing it with three baseline approaches, i.e., MUR, ALS, and ADMM. In particular, we run all four approaches on two publicly available datasets, UMIST and Coil20. The UMIST dataset contains 575 face images of size 40 × 40 from 20 different persons. The Coil20 dataset includes 1440 object images of size 32 × 32 from 20 different classes.

The experiment results on UMIST and Coil20 are shown in Figure 1 and Figure 2, respectively. Figures 1(a) and 2(a) show the running time to convergence of each approach versus the rank $r$. We observe that as the rank $r$ varies from 100 to 1000, rank-one ADMM generally converges faster than the other approaches, especially when $r$ is large. This is because the computation of ALS and ADMM is proportional to the cube of the rank due to the underlying matrix inversion, while the computation of rank-one ADMM is only linearly proportional to the rank. Moreover, we evaluate the effectiveness of the low rank representations produced by the four approaches in terms of clustering accuracy and normalized mutual information (NMI) in Figures 1(b), 1(c), 2(b), and 2(c). We observe that rank-one ADMM generally achieves better accuracy and NMI than the other approaches. This may be because rank-one ADMM tends to yield sparser low rank representations than the other approaches.

5 Conclusion

In this paper, we proposed a rank-one alternating direction method of multipliers (rank-one ADMM) for nonnegative matrix factorization (NMF). In each step, rank-one ADMM seeks a rank-one solution of NMF based upon ADMM and utilizes greedy search to obtain the low rank matrix factors. In this way, rank-one ADMM avoids the underlying matrix inversion, and its computation is only linearly proportional to the rank $r$. We conducted empirical studies on two publicly available datasets, UMIST and Coil20. Our results demonstrate that rank-one ADMM is more efficient and effective than MUR, ALS, and classical ADMM.

References

[1] Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122.

[2] Guan, N., Tao, D., Luo, Z., and Yuan, B. (2011) Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent. IEEE Transactions on Image Processing, 20(7):2030-2048.

[3] Guan, N., Tao, D., Luo, Z., and Shawe-Taylor, J. (2012) MahNMF: Manhattan non-negative matrix factorization. arXiv preprint arXiv:1207.3438.

[4] Kim, J. and Park, H. (2011) Fast nonnegative matrix factorization: An active-set-like method and comparisons. SIAM Journal on Scientific Computing, 33(6):3261-3281.

[5] Lee, D. and Seung, H. (1999) Learning the parts of objects by non-negative matrix factorization. Nature, 401:788-791.

[6] Lee, D. and Seung, H. (2001) Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems 13, pp. 556-562. Cambridge, MA: MIT Press.

[7] Lin, C.-J. (2007) Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10):2756-2779.

[8] Paatero, P. and Tapper, U. (1994) Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2):111-126.

[9] Min, M. R., Ning, X., Cheng, C., and Gerstein, M. (2014) Interpretable sparse high-order Boltzmann machines. The 17th International Conference on Artificial Intelligence and Statistics (AISTATS).

[10] Song, D. and Tao, D. (2010) Biologically inspired feature manifold for scene classification. IEEE Transactions on Image Processing, 19(1):174-184.

[11] Song, D. and Meyer, D. A. (2014) A model of consistent node types in signed directed social networks. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 72-80.

[12] Song, D. and Meyer, D. A. (2015) Recommending positive links in signed social networks by optimizing a generalized AUC. The Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI).

[13] Sun, D. L. and Févotte, C. (2014) Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

[14] Xu, Y., Yin, W., Wen, Z., and Zhang, Y. (2012) An alternating direction algorithm for matrix completion with nonnegative factors. Frontiers of Mathematics in China, 7(2):365-384.

[15] Zhang, Y. (2010) An alternating direction algorithm for nonnegative matrix factorization. Technical Report, Rice University.

Appendix

Algorithm 1: ADMM for NMF
1: Input: $X \in \mathbb{R}^{n \times d}$, $U \in \mathbb{R}^{n \times r}$, $V \in \mathbb{R}^{r \times d}$, $S \in \mathbb{R}^{n \times r}$, $T \in \mathbb{R}^{r \times d}$, $\rho$
2: Output: $U$ and $V$
3: Set $k = 0$ (index of iteration)
4: Repeat:
5: $U(k+1) = \big(X V(k)^T + \rho\, S(k) - \Lambda(k)\big)\big(V(k) V(k)^T + \rho I\big)^{-1}$
6: $V(k+1) = \big(U(k+1)^T U(k+1) + \rho I\big)^{-1}\big(U(k+1)^T X + \rho\, T(k) - \Pi(k)\big)$
7: $S(k+1) = P_+\big(U(k+1) + \Lambda(k)/\rho\big)$
8: $T(k+1) = P_+\big(V(k+1) + \Pi(k)/\rho\big)$
9: $\Lambda(k+1) = \Lambda(k) + \rho\big(U(k+1) - S(k+1)\big)$
10: $\Pi(k+1) = \Pi(k) + \rho\big(V(k+1) - T(k+1)\big)$
11: $k = k + 1$
12: Until convergence.

In Algorithm 1, $U$ and $V$ are randomly initialized, and $S$, $T$, $\Lambda$, and $\Pi$ are set to zero matrices of appropriate sizes. The stopping criterion for ADMM is met when the objective in Eq. (6) no longer improves relative to a tolerance value. In each step, the main computations are the $U(k+1)$ update and the $V(k+1)$ update, which cost $O(ndr + dr^2 + r^3)$ and $O(ndr + nr^2 + r^3)$, respectively. Assuming ADMM takes $\bar{k}$ iterations to converge, its total computational complexity is $O(ndr\bar{k} + nr^2\bar{k} + dr^2\bar{k} + r^3\bar{k})$.
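For comparison with the rank-one variant, here is a compact numpy sketch of Algorithm 1. Each sweep forms an $r \times r$ system and solves it, which is the source of the $O(r^3)$ term in the complexity above; the fixed iteration budget again stands in for the tolerance-based stopping rule, so this is an illustrative rendering rather than reference code.

```python
import numpy as np

def admm_nmf(X, r, rho=1.0, iters=200, seed=0):
    """Classical ADMM for NMF, following Algorithm 1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U, V = rng.random((n, r)), rng.random((r, d))
    S, T = np.zeros((n, r)), np.zeros((r, d))      # nonnegative copies
    Lam, Pi = np.zeros((n, r)), np.zeros((r, d))   # Lagrange multipliers
    I = np.eye(r)
    for _ in range(iters):                         # fixed budget replaces the tolerance test
        # U(k+1) = (X V^T + rho S - Lam)(V V^T + rho I)^{-1}, via an r x r solve
        U = np.linalg.solve(V @ V.T + rho * I, (X @ V.T + rho * S - Lam).T).T
        # V(k+1) = (U^T U + rho I)^{-1}(U^T X + rho T - Pi)
        V = np.linalg.solve(U.T @ U + rho * I, U.T @ X + rho * T - Pi)
        S = np.maximum(U + Lam / rho, 0.0)         # projection P_+
        T = np.maximum(V + Pi / rho, 0.0)
        Lam += rho * (U - S)                       # dual updates
        Pi += rho * (V - T)
    return S, T                                    # return the nonnegative copies
```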