Fast Nonnegative Matrix Factorization with Rank-one ADMM

Dongjin Song, Department of ECE, UCSD, La Jolla, CA 92093-0409, dosong@ucsd.edu
David A. Meyer, Department of Mathematics, UCSD, La Jolla, CA 92093-0112, dmeyer@math.ucsd.edu
Martin Renqiang Min, NEC Labs America, Princeton, NJ 08540, renqiang@nec-labs.com

Abstract

Nonnegative matrix factorization (NMF), which aims to approximate a data matrix with two nonnegative low rank matrix factors, is a popular dimensionality reduction and clustering technique. Due to the non-convex formulation and the nonnegativity constraints over the two low rank matrix factors (with rank $r > 0$), it is often difficult to solve NMF efficiently and accurately. Recently, the alternating direction method of multipliers (ADMM) was shown to be more accurate and efficient than classical approaches such as the multiplicative update rule (MUR) or alternating least squares (ALS). Nevertheless, the computation of ADMM is proportional to the cube of the rank $r$ because of an underlying matrix inversion, and thus it may be inefficient when $r$ is relatively large. In this paper, we propose rank-one ADMM to address this problem. In each step, we search for a rank-one solution of NMF based upon ADMM and utilize greedy search to obtain the low rank matrix factors. In this way, rank-one ADMM avoids the matrix inversion, and its computation is only linearly proportional to $r$. Thorough empirical studies demonstrate that rank-one ADMM is more efficient and accurate than baseline approaches.

1 Introduction

In the past decade, nonnegative matrix factorization (NMF) [5][6] and its extensions have been widely applied in various areas, e.g., face recognition [2][3], scene classification [10], social network analysis [11][12], bioinformatics [9], etc. Essentially, NMF aims to find two nonnegative low rank matrix factors $U \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{r \times d}$ that reconstruct a data matrix $X \in \mathbb{R}^{n \times d}$ of $n$ samples and $d$ features, i.e.,

$$\min_{U, V} \; \frac{1}{2}\|X - UV\|_F^2 \quad \text{s.t.} \quad U \ge 0, \; V \ge 0, \tag{1}$$

where $r$ denotes the rank, $\|\cdot\|_F$ is the Frobenius norm, and the inequalities are element-wise.

Since NMF is non-convex and $U$ and $V$ are constrained to be nonnegative, it is difficult to solve NMF accurately and efficiently. To address this issue, Lee and Seung [6] developed the multiplicative update rule (MUR), which is guaranteed to reach a locally optimal solution. Since MUR converges very slowly, alternating least squares (ALS) [8] and the projected gradient method [7] were subsequently developed. Recently, the alternating direction method of multipliers (ADMM) [1] has shown its superiority over MUR and ALS with respect to both reconstruction accuracy and efficiency [15][14][13]. Nevertheless, since each ADMM step involves inverting an $r \times r$ matrix, its complexity is proportional to the cube of the rank $r$ of the two low rank matrix factors, and it may be inefficient when $r$ is relatively large. To resolve this issue, we propose rank-one ADMM in this paper. The main idea is to find a rank-one solution of NMF based upon ADMM and to utilize greedy search to obtain the low rank matrix factors. In this way, rank-one ADMM avoids the matrix inversion, and its computation is only linearly proportional to $r$.
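To make the objective in Eq. (1) concrete, here is a minimal numpy sketch that evaluates the reconstruction error for given nonnegative factors; the sizes and random data are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def nmf_objective(X, U, V):
    """Evaluate (1/2) * ||X - U V||_F^2, the cost minimized in Eq. (1)."""
    R = X - U @ V
    return 0.5 * np.sum(R * R)

rng = np.random.default_rng(0)
n, d, r = 100, 80, 10            # illustrative sizes, not from the paper
X = rng.random((n, d))           # nonnegative data matrix X in R^{n x d}
U = rng.random((n, r))           # nonnegative factor U in R^{n x r}
V = rng.random((r, d))           # nonnegative factor V in R^{r x d}
print(nmf_objective(X, U, V))    # reconstruction error before any fitting
```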

2 Related Work

In this section, we briefly introduce three representative approaches for optimizing NMF, i.e., the multiplicative update rule (MUR) [6], alternating least squares (ALS) [8], and the alternating direction method of multipliers (ADMM) [1].

Notation: Given a matrix $X \in \mathbb{R}^{n \times d}$ of $n$ rows and $d$ columns, we use $x_j \in \mathbb{R}^n$ to denote its $j$-th column, $x^i \in \mathbb{R}^d$ to denote its $i$-th row, and $X_{ij}$ to denote the entry in the $i$-th row and $j$-th column of $X$.

2.1 Multiplicative Update Rule

MUR, proposed by Lee and Seung [6], is the most popular method for NMF and can be written as:

$$U_{ik} \leftarrow U_{ik} \, \frac{(X V^T)_{ik}}{(U V V^T)_{ik}} \tag{2}$$

and

$$V_{kj} \leftarrow V_{kj} \, \frac{(U^T X)_{kj}}{(U^T U V)_{kj}}. \tag{3}$$

With nonnegative initializations of $U$ and $V$, both factors remain nonnegative throughout the MUR iterations. Since MUR converges slowly in many real world applications, ALS [8] can be used as a substitute.

2.2 Alternating Least Squares

ALS, developed by Paatero and Tapper [8], minimizes the least squares cost function in Eq. (1) with respect to $U$ and $V$ iteratively. The procedure is given as

$$U = P_+\!\left(X V^T (V V^T)^{\dagger}\right) \tag{4}$$

and

$$V = P_+\!\left((U^T U)^{\dagger} U^T X\right), \tag{5}$$

where $\dagger$ denotes the pseudo-inverse and $P_+$ projects a matrix onto the nonnegative set, i.e., $P_+(U) = \max(U, 0)$. Although ALS has shown its efficiency for NMF, recent research indicates that ADMM [1] is more effective and efficient than ALS [15][14].

2.3 Alternating Direction Method of Multipliers

ADMM [1] aims to solve convex optimization problems by splitting them into smaller pieces such that each piece is easier to handle. It can also be used for non-convex problems such as NMF [15][14][13]. Specifically, ADMM introduces two auxiliary variables $S$ and $T$ and considers the following equivalent formulation:

$$\min_{U, V, S, T} \; \frac{1}{2}\|X - UV\|_F^2 \quad \text{s.t.} \quad U - S = 0, \; V - T = 0, \; S \ge 0, \; T \ge 0. \tag{6}$$

The augmented Lagrangian function of Eq. (6) is given as:

$$L(U, V, S, T, \Lambda, \Pi) = \frac{1}{2}\|X - UV\|_F^2 + \langle \Lambda, U - S \rangle + \langle \Pi, V - T \rangle + \frac{\rho}{2}\|U - S\|_F^2 + \frac{\rho}{2}\|V - T\|_F^2, \tag{7}$$

where $\Lambda \in \mathbb{R}^{n \times r}$ and $\Pi \in \mathbb{R}^{r \times d}$ are Lagrange multipliers, $\langle \cdot, \cdot \rangle$ is the matrix inner product, and $\rho > 0$ is the penalty parameter for the constraints. By alternately minimizing $L$ with respect to $U$, $V$, $S$, and $T$ and updating $\Lambda$ and $\Pi$, we obtain the closed form solution for each step, as shown in Algorithm 1 in the Appendix. Note that the main computations in classical ADMM are the inversion of an $r \times r$ matrix and the products $X V^T$ and $U^T X$. When $r$ is relatively large (e.g., 500 or 1000 in some real world applications), the matrix inversion consumes substantial time at each step, and ADMM thus converges slowly. To resolve this issue, we develop rank-one ADMM.
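For reference, the MUR updates of Eqs. (2)-(3) and the ALS updates of Eqs. (4)-(5) translate directly into a few lines of numpy. This is a minimal sketch of those textbook updates, not the authors' implementation; the small epsilon guarding the MUR denominators is a common numerical convention added here.

```python
import numpy as np

EPS = 1e-12  # guards MUR denominators against division by zero (an added convention)

def mur_step(X, U, V):
    """One multiplicative update, Eqs. (2)-(3); preserves nonnegativity."""
    U = U * (X @ V.T) / (U @ V @ V.T + EPS)
    V = V * (U.T @ X) / (U.T @ U @ V + EPS)
    return U, V

def als_step(X, U, V):
    """One alternating least squares sweep, Eqs. (4)-(5), with projection P_+."""
    U = np.maximum((X @ V.T) @ np.linalg.pinv(V @ V.T), 0.0)
    V = np.maximum(np.linalg.pinv(U.T @ U) @ (U.T @ X), 0.0)
    return U, V
```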

3 Rank-one ADMM for NMF

In rank-one ADMM, we successively search for a rank-one solution of NMF based upon ADMM and utilize greedy search to obtain the low rank matrix factors $U \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{r \times d}$. For the initialization ($i = 0$), we can adopt classical ADMM to obtain a rank-one solution of NMF. For step $i + 1$, since $U \in \mathbb{R}^{n \times i}$ and $V \in \mathbb{R}^{i \times d}$ are already known and can be written as concatenations of column vectors $U = [u_1, \ldots, u_i]$ and row vectors $V = [v_1; \ldots; v_i]$ respectively, rank-one ADMM can be formulated as:

$$\min_{u_{i+1}, v_{i+1}, s_{i+1}, t_{i+1}} \; \frac{1}{2}\big\|X - [u_1, \ldots, u_i, u_{i+1}][v_1; \ldots; v_i; v_{i+1}]\big\|_F^2 \quad \text{s.t.} \quad u_{i+1} - s_{i+1} = 0, \; v_{i+1} - t_{i+1} = 0, \; s_{i+1} \ge 0, \; t_{i+1} \ge 0. \tag{8}$$

Since $U = [u_1, \ldots, u_i]$ and $V = [v_1; \ldots; v_i]$ are already known, Eq. (8) can be simplified as:

$$\min_{u_{i+1}, v_{i+1}, s_{i+1}, t_{i+1}} \; \frac{1}{2}\|\bar{X} - u_{i+1} v_{i+1}\|_F^2 \quad \text{s.t.} \quad u_{i+1} - s_{i+1} = 0, \; v_{i+1} - t_{i+1} = 0, \; s_{i+1} \ge 0, \; t_{i+1} \ge 0, \tag{9}$$

where $\bar{X} = X - UV$. The augmented Lagrangian of Eq. (9) can be written as:

$$L(u_{i+1}, v_{i+1}, s_{i+1}, t_{i+1}, p, q) = \frac{1}{2}\|\bar{X} - u_{i+1} v_{i+1}\|_F^2 + \langle p, u_{i+1} - s_{i+1} \rangle + \langle q, v_{i+1} - t_{i+1} \rangle + \frac{\rho}{2}\|u_{i+1} - s_{i+1}\|_2^2 + \frac{\rho}{2}\|v_{i+1} - t_{i+1}\|_2^2, \tag{10}$$

where $p \in \mathbb{R}^n$ and $q \in \mathbb{R}^d$ are Lagrange multipliers and $\rho > 0$ is the penalty parameter for the constraints. By alternately minimizing $L$ with respect to $u_{i+1}$, $v_{i+1}$, $s_{i+1}$, and $t_{i+1}$ and updating $p$ and $q$, we obtain the closed form solution of each step, as shown in Algorithm 2.

Algorithm 2: Rank-one ADMM for NMF
Input: $X \in \mathbb{R}^{n \times d}$, $u_{i+1} \in \mathbb{R}^n$, $v_{i+1} \in \mathbb{R}^{1 \times d}$, $s_{i+1} \in \mathbb{R}^n$, $t_{i+1} \in \mathbb{R}^{1 \times d}$, $p \in \mathbb{R}^n$, $q \in \mathbb{R}^{1 \times d}$, $\rho$
Output: $U$ and $V$
1: Set $\bar{X} = X$
2: Calculate $u_0$, $v_0$ based upon ADMM
3: For $i = 0$ to $r - 2$:
4:   Set $k = 0$ (index of iteration)
5:   $\bar{X} = \bar{X} - u_i v_i$
6:   Repeat:
7:     $u_{i+1}(k+1) = \big[\bar{X} v_{i+1}(k)^T + \rho\, s_{i+1}(k) - p(k)\big] \big/ \big[v_{i+1}(k)\, v_{i+1}(k)^T + \rho\big]$
8:     $v_{i+1}(k+1) = \big[u_{i+1}(k+1)^T \bar{X} + \rho\, t_{i+1}(k) - q(k)\big] \big/ \big[u_{i+1}(k+1)^T u_{i+1}(k+1) + \rho\big]$
9:     $s_{i+1}(k+1) = P_+\big(u_{i+1}(k+1) + p(k)/\rho\big)$
10:    $t_{i+1}(k+1) = P_+\big(v_{i+1}(k+1) + q(k)/\rho\big)$
11:    $p(k+1) = p(k) + \rho\big(u_{i+1}(k+1) - s_{i+1}(k+1)\big)$
12:    $q(k+1) = q(k) + \rho\big(v_{i+1}(k+1) - t_{i+1}(k+1)\big)$
13:    $k = k + 1$
14:  Until convergence.
15:  $U = [U, u_{i+1}(k+1)]$, $V = [V; v_{i+1}(k+1)]$, $S = [S, s_{i+1}(k+1)]$, $T = [T; t_{i+1}(k+1)]$
16: End for

In Algorithm 2, $u_{i+1}$ and $v_{i+1}$ are randomly initialized, and $s_{i+1}$, $t_{i+1}$, $p$, and $q$ are set to zero vectors of appropriate lengths. The stopping criterion for the inner loop of rank-one ADMM is met when the objective in Eq. (8) no longer improves relative to a tolerance value. In the inner loop, the main computations are the $u_{i+1}(k+1)$ update and the $v_{i+1}(k+1)$ update, with computational complexity $O(nd + n)$ and $O(nd + d)$, respectively. Assuming that on average the inner loop takes $\bar{k}$ iterations to converge, the total computational complexity of rank-one ADMM is $O(ndr\bar{k} + nr\bar{k} + dr\bar{k})$, which eliminates the matrix inversion of classical ADMM shown in Algorithm 1 of the Appendix.
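The inner loop of Algorithm 2 involves only matrix-vector products, scalar divisions, and elementwise projections, which is where the $O(nd)$ per-iteration cost comes from. The sketch below follows the update equations of Algorithm 2; for simplicity it initializes every component randomly (the paper initializes the first component with classical ADMM) and runs a fixed inner budget instead of the tolerance test, so treat it as an illustration rather than the authors' code.

```python
import numpy as np

def rank_one_admm(X, r, rho=1.0, inner_iters=100, seed=0):
    """Greedy rank-one ADMM for NMF, following Algorithm 2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = np.zeros((n, 0))
    V = np.zeros((0, d))
    Xbar = X.copy()                          # residual; shrinks as components are peeled off
    for _ in range(r):
        u, v = rng.random(n), rng.random(d)  # random init (paper: ADMM init for i = 0)
        s, t = np.zeros(n), np.zeros(d)      # nonnegative copies
        p, q = np.zeros(n), np.zeros(d)      # Lagrange multipliers
        for _ in range(inner_iters):         # fixed budget replaces the tolerance test
            u = (Xbar @ v + rho * s - p) / (v @ v + rho)  # scalar denominator: no matrix inverse
            v = (u @ Xbar + rho * t - q) / (u @ u + rho)
            s = np.maximum(u + p / rho, 0.0)              # projection P_+
            t = np.maximum(v + q / rho, 0.0)
            p += rho * (u - s)                            # dual updates
            q += rho * (v - t)
        U = np.column_stack([U, s])          # store the nonnegative copies (u ~ s at convergence)
        V = np.vstack([V, t[None, :]])
        Xbar = Xbar - np.outer(s, t)         # subtract the fitted rank-one term
    return U, V
```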

Figure 1: Experiment results on the UMIST dataset. (a) Running time (seconds) vs. rank; (b) Accuracy vs. rank; (c) NMI vs. rank.

Figure 2: Experiment results on the Coil20 dataset. (a) Running time (seconds) vs. rank; (b) Accuracy vs. rank; (c) NMI vs. rank.

4 Experiments

In this section, we evaluate the effectiveness and efficiency of our proposed approach by comparing it with three baseline approaches, i.e., MUR, ALS, and ADMM. In particular, we run all four approaches on two publicly available datasets, UMIST and Coil20. The UMIST dataset contains 575 face images of size 40 × 40 from 20 different persons. The Coil20 dataset includes 1440 object images of size 32 × 32 from 20 different classes.

The experiment results on UMIST and Coil20 are shown in Figure 1 and Figure 2, respectively. Figures 1(a) and 2(a) show the running time to convergence of each approach versus the rank $r$. We observe that as the rank $r$ varies from 100 to 1000, rank-one ADMM generally converges faster than the other approaches, especially when $r$ is large. This is because the computation of ALS and ADMM is proportional to the cube of the rank due to the underlying matrix inversion, while the computation of rank-one ADMM is only linearly proportional to the rank. Moreover, we evaluate the effectiveness of the low rank representations produced by the four approaches in terms of clustering accuracy and normalized mutual information (NMI) in Figures 1(b), 1(c), 2(b), and 2(c). We observe that rank-one ADMM generally achieves better accuracy and NMI than the other approaches. This may be because rank-one ADMM tends to yield sparser low rank representations than the other approaches.

5 Conclusion

In this paper, we proposed a rank-one alternating direction method of multipliers (rank-one ADMM) for nonnegative matrix factorization (NMF). In each step, rank-one ADMM seeks a rank-one solution of NMF based upon ADMM and utilizes greedy search to obtain the low rank matrix factors. In this way, rank-one ADMM avoids the underlying matrix inversion, and its computation is only linearly proportional to the rank $r$. We conducted empirical studies on two publicly available datasets, UMIST and Coil20. Our results demonstrate that rank-one ADMM is more efficient and effective than MUR, ALS, and classical ADMM.

References

[1] Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122.

[2] Guan, N., Tao, D., Luo, Z., and Yuan, B. (2011) Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent. IEEE Transactions on Image Processing, 20(7):2030-2048.

[3] Guan, N., Tao, D., Luo, Z., and Shawe-Taylor, J. (2012) MahNMF: Manhattan non-negative matrix factorization. arXiv preprint arXiv:1207.3438.

[4] Kim, J. and Park, H. (2011) Fast nonnegative matrix factorization: An active-set-like method and comparisons. SIAM Journal on Scientific Computing, 33(6):3261-3281.

[5] Lee, D. and Seung, H. (1999) Learning the parts of objects by non-negative matrix factorization. Nature, 401:788-791.

[6] Lee, D. and Seung, H. (2001) Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems 13, pp. 556-562. Cambridge, MA: MIT Press.

[7] Lin, C.-J. (2007) Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10):2756-2779.

[8] Paatero, P. and Tapper, U. (1994) Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2):111-126.

[9] Min, M. R., Ning, X., Cheng, C., and Gerstein, M. (2014) Interpretable sparse high-order Boltzmann machines. The 17th International Conference on Artificial Intelligence and Statistics (AISTATS).

[10] Song, D. and Tao, D. (2010) Biologically inspired feature manifold for scene classification. IEEE Transactions on Image Processing, 19(1):174-184.

[11] Song, D. and Meyer, D. A. (2014) A model of consistent node types in signed directed social networks. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 72-80.

[12] Song, D. and Meyer, D. A. (2015) Recommending positive links in signed social networks by optimizing a generalized AUC. The Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI).

[13] Sun, D. L. and Févotte, C. (2014) Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

[14] Xu, Y., Yin, W., Wen, Z., and Zhang, Y. (2012) An alternating direction algorithm for matrix completion with nonnegative factors. Frontiers of Mathematics in China, 7(2):365-384.

[15] Zhang, Y. (2010) An alternating direction algorithm for nonnegative matrix factorization. Technical Report, Rice University.

Appendix

Algorithm 1: ADMM for NMF
1: Input: $X \in \mathbb{R}^{n \times d}$, $U \in \mathbb{R}^{n \times r}$, $V \in \mathbb{R}^{r \times d}$, $S \in \mathbb{R}^{n \times r}$, $T \in \mathbb{R}^{r \times d}$, $\rho$
2: Output: $U$ and $V$
3: Set $k = 0$ (index of iteration)
4: Repeat:
5: $U(k+1) = \big(X V(k)^T + \rho\, S(k) - \Lambda(k)\big)\big(V(k) V(k)^T + \rho I\big)^{-1}$
6: $V(k+1) = \big(U(k+1)^T U(k+1) + \rho I\big)^{-1}\big(U(k+1)^T X + \rho\, T(k) - \Pi(k)\big)$
7: $S(k+1) = P_+\big(U(k+1) + \Lambda(k)/\rho\big)$
8: $T(k+1) = P_+\big(V(k+1) + \Pi(k)/\rho\big)$
9: $\Lambda(k+1) = \Lambda(k) + \rho\big(U(k+1) - S(k+1)\big)$
10: $\Pi(k+1) = \Pi(k) + \rho\big(V(k+1) - T(k+1)\big)$
11: $k = k + 1$
12: Until convergence.

In Algorithm 1, $U$ and $V$ are randomly initialized, and $S$, $T$, $\Lambda$, and $\Pi$ are set to zero matrices of appropriate sizes. The stopping criterion for ADMM is met when the objective in Eq. (6) no longer improves relative to a tolerance value. In each step, the main computations are the $U(k+1)$ update and the $V(k+1)$ update, which cost $O(ndr + dr^2 + r^3)$ and $O(ndr + nr^2 + r^3)$, respectively. Assuming ADMM takes $\bar{k}$ iterations to converge, its total computational complexity is $O(ndr\bar{k} + nr^2\bar{k} + dr^2\bar{k} + r^3\bar{k})$.
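For comparison with the rank-one variant, here is a compact numpy sketch of Algorithm 1. Each sweep forms an $r \times r$ system and solves it, which is the source of the $O(r^3)$ term in the complexity above; the fixed iteration budget again stands in for the tolerance-based stopping rule, so this is an illustrative rendering rather than reference code.

```python
import numpy as np

def admm_nmf(X, r, rho=1.0, iters=200, seed=0):
    """Classical ADMM for NMF, following Algorithm 1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U, V = rng.random((n, r)), rng.random((r, d))
    S, T = np.zeros((n, r)), np.zeros((r, d))      # nonnegative copies
    Lam, Pi = np.zeros((n, r)), np.zeros((r, d))   # Lagrange multipliers
    I = np.eye(r)
    for _ in range(iters):                         # fixed budget replaces the tolerance test
        # U(k+1) = (X V^T + rho S - Lam)(V V^T + rho I)^{-1}, via an r x r solve
        U = np.linalg.solve(V @ V.T + rho * I, (X @ V.T + rho * S - Lam).T).T
        # V(k+1) = (U^T U + rho I)^{-1}(U^T X + rho T - Pi)
        V = np.linalg.solve(U.T @ U + rho * I, U.T @ X + rho * T - Pi)
        S = np.maximum(U + Lam / rho, 0.0)         # projection P_+
        T = np.maximum(V + Pi / rho, 0.0)
        Lam += rho * (U - S)                       # dual updates
        Pi += rho * (V - T)
    return S, T                                    # return the nonnegative copies
```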