Matrix Co-Factorization on Compressed Sensing

Jiho Yoo^1 and Seungjin Choi^{1,2}
^1 Department of Computer Science
^2 Division of IT Convergence Engineering
Pohang University of Science and Technology
San 31 Hyoja-dong, Nam-gu, Pohang, Korea
{zentasis, seungjin}@postech.ac.kr

Abstract

In this paper we address the problem of matrix factorization on compressively-sampled measurements which are obtained by random projections. While this approach improves the scalability of matrix factorization, its performance is not satisfactory. We present a matrix co-factorization method where compressed measurements and a small number of uncompressed measurements are jointly decomposed, sharing a factor matrix. We evaluate the performance of three matrix factorization methods in terms of Cramér-Rao bounds: (1) matrix factorization on uncompressed data (MF); (2) matrix factorization on compressed data (CS-MF); (3) matrix co-factorization on compressed and uncompressed data (CS-MCF). Numerical experiments demonstrate that CS-MCF improves the performance of CS-MF, emphasizing the useful behavior of exploiting side information (a small number of uncompressed measurements).

1 Introduction

Matrix factorization is a decomposition method where a data matrix $X \in \mathbb{R}^{M \times N}$ is approximated as a product of two or more factor matrices which uncover the latent structure of the data. The 2-factor decomposition seeks latent factor matrices $U \in \mathbb{R}^{M \times K}$ and $V \in \mathbb{R}^{N \times K}$ which minimize the objective function

    J_{\text{MF}} = \frac{1}{2} \| X - U V^\top \|_F^2,    (1)

where $\|\cdot\|_F$ denotes the Frobenius norm. When the columns of $X$ are treated as data points in $M$-dimensional space, $U$ and $V$ are referred to as the basis matrix and the encoding matrix, respectively. Exemplary matrix factorization methods include principal component analysis (PCA) [Jolliffe, 2002], nonnegative matrix factorization (NMF) [Lee and Seung, 1999], and probabilistic matrix factorization (PMF) [Srebro et al., 2005], to name a few. Matrix factorization has established itself as a powerful technique in various applications such as collaborative prediction [Koren et al., 2009], document clustering [Xu et al., 2003; Yoo and Choi, 2010], music transcription [Smaragdis and Brown, 2003], and brain wave analysis [Lee and Choi, 2009].

In some of these applications, such as collaborative prediction, the data matrix is sparse, since most of its entries are missing or unobserved. Matrix factorization methods can then be applied to large-scale problems without much extra care, since only the observed elements are used for learning. On the other hand, the data matrix is dense in applications where spectrograms of music or brain wave data are analyzed, and in such applications it is not easy to handle large-scale data.

Compressed sensing is a new sensing/sampling paradigm which ensures near-optimal recovery of sparse signals from a small number of linear measurements [Donoho, 2006; Candes and Tao, 2006]. Compressed sensing is a promising approach to handling a large-scale dense data matrix, since it allows the processing (learning or data analysis) to be performed in the compressed domain, in which the compressively-sampled matrix is much smaller in size than the original input data matrix, rather than in the ambient space. Let $Y \in \mathbb{R}^{P \times N}$ be the compressively-sampled data matrix obtained by $Y = \Phi X$, in which $\Phi \in \mathbb{R}^{P \times M}$ is a linear sensing matrix, often given by a random projection. A direct application of the compressed sensing framework to matrix factorization, referred to as CS-MF, yields the objective function

    J_{\text{CS-MF}} = \frac{1}{2} \| Y - \Phi U V^\top \|_F^2.    (2)
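As a concrete illustration of the measurement model $Y = \Phi X$ used in (2), the following sketch (ours, not the authors' code; the sizes are illustrative, since the paper leaves them generic) forms a random Gaussian sensing matrix and compresses a dense low-rank data matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: M ambient dim, N samples, K latent dim, P < M compressed dim.
M, N, K, P = 100, 100, 20, 50

# Low-rank ground truth X = U V^T plus a small amount of noise.
U_true = rng.normal(size=(M, K))
V_true = rng.normal(size=(N, K))
X = U_true @ V_true.T + 0.01 * rng.normal(size=(M, N))

# Random projection (sensing matrix) and compressed measurements.
Phi = rng.normal(size=(P, M))
Y = Phi @ X                      # P x N, smaller than the M x N input

print(X.shape, Y.shape)          # (100, 100) (50, 100)
```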
CS-MF determines $U$ and $V$ which minimize (2), improving scalability since $Y$ and $\Phi$ together require less storage than $X$. Recently, nonnegative matrix factorization (NMF) in the compressed domain, referred to as CS-NMF, was presented in [O'Grady and Rickard, 2008], where (2) is minimized with nonnegativity constraints imposed on the factor matrices $U$ and $V$ as well as on the data matrix $X$. In contrast to the standard compressed sensing framework, where $U$ is known in advance, CS-MF and CS-NMF estimate both $U$ and $V$ in the compressed domain, which degrades the reconstruction performance.

Matrix co-factorization (MCF) is a technique that decomposes two or more data matrices jointly, sharing some factor matrices across the factorizations. In addition to the data matrix, side information matrices are also considered in the joint decomposition, including label information [Yu et al., 2005; Lee et al., 2010], link information [Zhu et al., 2007], inter-subject variations [Lee and Choi, 2009], and spectrograms of music [Yoo et al., 2010]. MCF has also been applied to general entity-relationship models [Singh and Gordon, 2008; Yoo and Choi, 2009] to learn the characteristics of each entity from the relationships.

In this paper we present a method of MCF on compressively-sampled data, referred to as CS-MCF, where we jointly decompose the compressed data matrix and a small portion of the uncompressed original data in order to improve the performance of matrix factorization in the compressed domain. We compute the Cramér-Rao bound (CRB) for matrix factorization on uncompressed data (MF), CS-MF, and CS-MCF in the case of Gaussian likelihood, showing that CS-MCF indeed improves the performance over CS-MF in terms of reconstruction quality. We consider alternating least squares (ALS) algorithms for MF, CS-MF, and CS-MCF. In addition, we also consider multiplicative updates, as in NMF, comparing NMF, CS-NMF (NMF in the compressed domain), and CS-NMCF (nonnegative matrix co-factorization in the compressed domain with a small amount of uncompressed information).

2 Matrix Co-Factorization on Compressively Sampled Data

Compressed sensing [Donoho, 2006; Candes and Tao, 2006] is a new framework which performs measurement and compression simultaneously to reduce the computation required in the measurement process. The fundamental theoretical results of compressed sensing are based on the sparsity of the representation of the given data, which makes it possible to reduce the required number of measurements below the amount dictated by the classical Shannon-Nyquist sampling theorem. Compressed sensing assumes that the basis which sparsely represents the given data is known; however, an appropriate representation basis is usually not known in advance. In such cases we have to learn a representation basis from the data, yet the problem of learning a representation basis from compressed measurements has not been studied much.

Matrix factorization in the compressed domain considers a decomposition of the compressively-sampled data matrix $Y = \Phi X$, where $\Phi \in \mathbb{R}^{P \times M}$ is a random projection. Thus, given $Y$ and $\Phi$, CS-MF seeks $Y \approx \Phi U V^\top$, determining factor matrices $U$ and $V$ which minimize (2). In contrast to the compressed sensing framework, where the basis matrix $U$ is known in advance, CS-MF has to estimate both $U$ and $V$ from the compressed data $Y$. We believe that this explains the poor reconstruction performance of CS-MF in our experiments.

Now we explain matrix co-factorization in the compressed domain, which exploits partial uncompressed data. We partition the data matrix $X \in \mathbb{R}^{M \times N}$ into two sub-matrices $X_c \in \mathbb{R}^{M \times C}$ and $X_u \in \mathbb{R}^{M \times (N-C)}$. Assuming that $C > (N-C)$, the sub-matrix $X_c$ is compressed by a random projection $\Phi$, leading to $Y = \Phi X_c$ with a slight abuse of notation. CS-MCF considers a joint decomposition of the compressed data matrix $Y$ and the partial uncompressed data $X_u$, minimizing the following objective function:

    J_{\text{CS-MCF}} = \frac{1}{2} \| Y - \Phi U V^\top \|_F^2 + \frac{\lambda}{2} \| X_u - U W^\top \|_F^2,    (3)

where the factor matrix $U \in \mathbb{R}^{M \times K}$ is shared, capturing the common characteristics in representing both $Y$ and $X_u$, and $V \in \mathbb{R}^{C \times K}$ and $W \in \mathbb{R}^{(N-C) \times K}$ capture individual characteristics. The parameter $\lambda$ controls the balance between the two decompositions.

Figure 1: Pictorial illustration of CS-MCF. The input data matrix $X$ is partitioned into $X_c$ (to be compressed) and $X_u$ (partial uncompressed data). Then $Y = \Phi X_c$ and $X_u$ are jointly decomposed, sharing the factor matrix $U$, leading to $Y \approx \Phi U V^\top$ and $X_u \approx U W^\top$.
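To make the objective (3) concrete, here is a minimal sketch (ours; the function name and argument names are illustrative) that evaluates $J_{\text{CS-MCF}}$ for given factors:

```python
import numpy as np

def j_cs_mcf(Y, Phi, X_u, U, V, W, lam):
    """Objective (3): 0.5*||Y - Phi U V^T||_F^2 + 0.5*lam*||X_u - U W^T||_F^2."""
    r_c = Y - Phi @ U @ V.T        # residual of the compressed part
    r_u = X_u - U @ W.T            # residual of the uncompressed part
    return 0.5 * np.sum(r_c**2) + 0.5 * lam * np.sum(r_u**2)
```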
A pictorial illustration of CS-MCF is given in Figure 1.

2.1 ALS Updates

ALS updates for $U$, $V$, and $W$ are easily derived by solving

    \frac{\partial J_{\text{CS-MCF}}}{\partial U} = 0, \quad \frac{\partial J_{\text{CS-MCF}}}{\partial V} = 0, \quad \frac{\partial J_{\text{CS-MCF}}}{\partial W} = 0,

for $U$, $V$, $W$, respectively, where the corresponding gradients of (3) are

    \frac{\partial J}{\partial U} = -\Phi^\top Y V + \Phi^\top \Phi U V^\top V - \lambda X_u W + \lambda U W^\top W,
    \frac{\partial J}{\partial V} = -Y^\top \Phi U + V U^\top \Phi^\top \Phi U,
    \frac{\partial J}{\partial W} = -X_u^\top U + W U^\top U,

leading to the ALS updates

    \mathrm{vec}(U) \leftarrow \left( (V^\top V) \otimes (\Phi^\top \Phi) + \lambda (W^\top W) \otimes I_M \right)^{-1} \mathrm{vec}(\Phi^\top Y V + \lambda X_u W),
    V \leftarrow Y^\top \Phi U (U^\top \Phi^\top \Phi U)^{-1},
    W \leftarrow X_u^\top U (U^\top U)^{-1},

where $I_M$ is the identity matrix of size $M \times M$, $\mathrm{vec}(U)$ is the vec operator which creates a column vector from the matrix $U$ by stacking the columns of $U$, and $\otimes$ denotes the Kronecker product (tensor product).
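A minimal NumPy sketch of one round of these ALS updates (our illustration under the notation of Section 2; no convergence checks or regularization are included):

```python
import numpy as np

def als_step(Y, Phi, X_u, U, V, W, lam):
    """One round of the ALS updates for CS-MCF (Section 2.1)."""
    M, K = U.shape

    # Update U: solve ((V^T V) kron (Phi^T Phi) + lam (W^T W) kron I_M) vec(U)
    #           = vec(Phi^T Y V + lam X_u W), with column-stacking vec.
    A = np.kron(V.T @ V, Phi.T @ Phi) + lam * np.kron(W.T @ W, np.eye(M))
    b = (Phi.T @ Y @ V + lam * X_u @ W).flatten(order="F")
    U = np.linalg.solve(A, b).reshape((M, K), order="F")

    # Update V and W by ordinary least squares.
    PhiU = Phi @ U
    V = Y.T @ PhiU @ np.linalg.inv(PhiU.T @ PhiU)
    W = X_u.T @ U @ np.linalg.inv(U.T @ U)
    return U, V, W
```

Solving the $MK \times MK$ system densely is the costly step of this update; in practice one would exploit the Kronecker structure or use an iterative solver.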

Table 1: Algorithm description for ALS updates and multiplicative updates (all divisions are element-wise).

ALS updates:
    MF:      $U \leftarrow X V (V^\top V)^{-1}$;  $V \leftarrow X^\top U (U^\top U)^{-1}$
    CS-MF:   $\mathrm{vec}(U) \leftarrow ((V^\top V) \otimes (\Phi^\top \Phi))^{-1} \mathrm{vec}(\Phi^\top Y V)$;  $V \leftarrow Y^\top \Phi U (U^\top \Phi^\top \Phi U)^{-1}$
    CS-MCF:  $\mathrm{vec}(U) \leftarrow ((V^\top V) \otimes (\Phi^\top \Phi) + \lambda (W^\top W) \otimes I_M)^{-1} \mathrm{vec}(\Phi^\top Y V + \lambda X_u W)$;
             $V \leftarrow Y^\top \Phi U (U^\top \Phi^\top \Phi U)^{-1}$;  $W \leftarrow X_u^\top U (U^\top U)^{-1}$

Multiplicative updates:
    NMF:     $U \leftarrow U \odot \frac{X V}{U V^\top V}$;  $V \leftarrow V \odot \frac{X^\top U}{V U^\top U}$
    CS-NMF:  $U \leftarrow U \odot \frac{\Phi^\top Y V}{\Phi^\top \Phi U V^\top V}$;  $V \leftarrow V \odot \frac{Y^\top \Phi U}{V U^\top \Phi^\top \Phi U}$
    CS-NMCF: $U \leftarrow U \odot \frac{\Phi^\top Y V + \lambda X_u W}{\Phi^\top \Phi U V^\top V + \lambda U W^\top W}$;
             $V \leftarrow V \odot \frac{Y^\top \Phi U}{V U^\top \Phi^\top \Phi U}$;  $W \leftarrow W \odot \frac{X_u^\top U}{W U^\top U}$

2.2 Multiplicative Updates

As in NMF, we derive multiplicative updates for $U$, $V$, $W$ which iteratively minimize (3) with nonnegativity constraints imposed on the factor matrices, given a nonnegative data matrix. Our derivation follows the technique used in [Yoo and Choi, 2008], where the gradient of an objective function $J$ is decomposed as

    \frac{\partial J}{\partial \Theta} = \left[ \frac{\partial J}{\partial \Theta} \right]^+ - \left[ \frac{\partial J}{\partial \Theta} \right]^-,

where $\left[ \frac{\partial J}{\partial \Theta} \right]^+ > 0$ and $\left[ \frac{\partial J}{\partial \Theta} \right]^- > 0$, and then multiplicative updates for the parameters $\Theta$ are given by

    \Theta \leftarrow \Theta \odot \frac{\left[ \frac{\partial J}{\partial \Theta} \right]^-}{\left[ \frac{\partial J}{\partial \Theta} \right]^+},

where $\odot$ denotes the Hadamard product (element-wise product) and the division is done in an element-wise manner. Applying this technique to the minimization of (3) yields

    U \leftarrow U \odot \frac{\Phi^\top Y V + \lambda X_u W}{\Phi^\top \Phi U V^\top V + \lambda U W^\top W},
    V \leftarrow V \odot \frac{Y^\top \Phi U}{V U^\top \Phi^\top \Phi U},
    W \leftarrow W \odot \frac{X_u^\top U}{W U^\top U}.

ALS and multiplicative updating algorithms for MF and CS-MF are derived in the same way and are summarized in Table 1.

3 Cramér-Rao Bounds

We calculate the Cramér-Rao bound (CRB) to evaluate the performance of matrix factorization, CS-MF, and CS-MCF. The CRB [Kay, 1993] provides a lower bound on the covariance of unbiased estimators of a deterministic parameter, in the following form:

    E\left[ (\hat\theta - \theta)(\hat\theta - \theta)^\top \right] \succeq I^{-1},    (4)

where $\hat\theta$ is the estimated parameter of the model, $\theta$ is the true value of the parameter, $A \succeq B$ denotes that $A - B$ is positive semi-definite, and $I$ is the Fisher information matrix, each element of which is computed as

    I_{ij} = -E\left[ \frac{\partial^2 \log p(x|\theta)}{\partial \theta_i \partial \theta_j} \right],    (5)

where $p(x|\theta)$ is the likelihood. In the case of matrix factorization, the factor matrices $U \in \mathbb{R}^{M \times K}$ and $V \in \mathbb{R}^{N \times K}$ are the model parameters, so $\theta$ becomes

    \theta = [\mathrm{vec}(U^\top)^\top, \mathrm{vec}(V^\top)^\top]^\top = [u_1^\top, u_2^\top, \ldots, u_M^\top, v_1^\top, v_2^\top, \ldots, v_N^\top]^\top,

where $u_i$ and $v_j$ are the column vectors consisting of the elements in the $i$-th row of $U$ and the $j$-th row of $V$, respectively. The Fisher information matrix then has the block structure

    I = \begin{bmatrix} I_U & I_{UV} \\ I_{UV}^\top & I_V \end{bmatrix},

whose blocks are in turn composed of $K \times K$ sub-blocks $I_{u_i u_{i'}}$, $I_{u_i v_j}$, and $I_{v_j v_{j'}}$, each computed for the corresponding pair of parameters; for example,

    [I_{u_i}]_{k,k'} = [I_{u_i u_i}]_{k,k'} = -E\left[ \frac{\partial^2 \log p(X|U,V)}{\partial u_{ik} \partial u_{ik'}} \right],    (6), (7)

where $[A]_{i,j}$ denotes the $(i,j)$-th element of the matrix $A$. The Fisher information matrix of CS-MCF has a different parametrization, with the matrices $U$, $V$, and $W$:

    I = \begin{bmatrix} I_U & I_{UV} & I_{UW} \\ I_{UV}^\top & I_V & I_{VW} \\ I_{UW}^\top & I_{VW}^\top & I_W \end{bmatrix},
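A compact NumPy sketch of the CS-NMCF multiplicative updates of Section 2.2 (our illustration; the small constant `eps` added to the denominators for numerical safety is ours, not discussed in the paper):

```python
import numpy as np

def cs_nmcf_step(Y, Phi, X_u, U, V, W, lam, eps=1e-12):
    """One round of the multiplicative updates of Section 2.2 (CS-NMCF).

    Assumes Y, Phi, X_u, U, V, W are all nonnegative, so that the
    numerator/denominator split of the gradient stays positive and the
    updates preserve nonnegativity.
    """
    num_U = Phi.T @ Y @ V + lam * X_u @ W
    den_U = Phi.T @ Phi @ U @ (V.T @ V) + lam * U @ (W.T @ W) + eps
    U = U * num_U / den_U

    num_V = Y.T @ Phi @ U
    den_V = V @ (U.T @ Phi.T @ Phi @ U) + eps
    V = V * num_V / den_V

    num_W = X_u.T @ U
    den_W = W @ (U.T @ U) + eps
    W = W * num_W / den_W
    return U, V, W
```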

Table 2: Likelihood, Fisher information matrix (FIM), and Cramér-Rao lower bound for matrix factorization (MF), matrix factorization on compressed measurements (CS-MF), and matrix co-factorization on compressed measurements (CS-MCF). The graphical models relate the factors to the observations: $X$ is generated from $(U, V)$ in MF, $Y$ from $(U, V)$ in CS-MF, and $(Y, X_u)$ from $(U, V, W)$ in CS-MCF. Since $\Sigma$ and $\Sigma_a$ are scalar multiples of the identity, identity matrices of conforming size are understood in expressions such as $W^\top \Sigma_a^{-1} W$.

MF:
    Likelihood: $p(X|U,V) = \prod_j N(x_j | U v_j, \Sigma)$
    FIM: $I_U = (V^\top \Sigma^{-1} V) \otimes I_M$;  $I_V = (U^\top \Sigma^{-1} U) \otimes I_N$;
         $I_{UV}$ has $K \times K$ blocks $[I_{UV}]_{ij} = \sigma^{-2} v_j u_i^\top$
    CRB ($U$ known): $[I_V]^{-1}$;  CRB ($U$ unknown): inverse of the full $2 \times 2$ block FIM

CS-MF:
    Likelihood: $p(Y|U,V) = \prod_j N(y_j | \Phi U v_j, \Sigma_b)$
    FIM: $I_U = (V^\top V) \otimes (\Phi^\top \Sigma_b^{-1} \Phi)$;  $I_V = (U^\top \Phi^\top \Sigma_b^{-1} \Phi U) \otimes I_N$;
         $I_{UV}$ has blocks $[I_{UV}]_{ij} = v_j [\Phi^\top \Sigma_b^{-1} \Phi U]_{i\cdot}$
    CRB ($U$ known): $[I_V]^{-1}$;  CRB ($U$ unknown): inverse of the full $2 \times 2$ block FIM

CS-MCF:
    Likelihood: $p(Y, X_u|U,V,W) = \prod_j N(y_j | \Phi U v_j, \Sigma_b) \prod_k N(x_{u,k} | U w_k, \Sigma_a)$
    FIM: $I_U = (V^\top V) \otimes (\Phi^\top \Sigma_b^{-1} \Phi) + (W^\top \Sigma_a^{-1} W) \otimes I_M$;
         $I_V = (U^\top \Phi^\top \Sigma_b^{-1} \Phi U) \otimes I_C$;  $I_W = (U^\top \Sigma_a^{-1} U) \otimes I_{N-C}$;
         $I_{UV}$ has blocks $v_j [\Phi^\top \Sigma_b^{-1} \Phi U]_{i\cdot}$;  $I_{UW}$ has blocks $w_k [\Sigma_a^{-1} U]_{i\cdot}$;  $I_{VW} = 0$
    CRB ($U$ known): inverse of the $(V, W)$ block FIM;  CRB ($U$ unknown): inverse of the full $3 \times 3$ block FIM

Each element of these blocks is computed in a way similar to (6) and (7). The CRBs for matrix factorization, CS-MF, and CS-MCF are computed in the following subsections. We use the maximum likelihood estimator (MLE), which is known to be asymptotically unbiased and efficient under mild regularity conditions [Kay, 1993]. The analysis is based on these asymptotic properties, assuming that a sufficient number of samples is available. The models, the computed Fisher information matrices, and the form of the CRB are summarized in Table 2.

Since the solution of matrix factorization is not uniquely determined, a direct comparison between the estimated parameters and the ground-truth values is meaningless. Instead of comparing the parameters themselves, we use the expected reconstruction error

    E_{ij} = E\left[ (\hat{x}_{ij} - \tilde{x}_{ij})^2 \right] = E\left[ (u_i^\top v_j - \tilde{u}_i^\top \tilde{v}_j)^2 \right],

where $\tilde{x}_{ij}$, $\tilde{u}_i$, and $\tilde{v}_j$ are the ground-truth values of the parameters, and $\hat{x}_{ij} = u_i^\top v_j$ is the corresponding element of the input matrix reconstructed from the estimated parameters $u_i$ and $v_j$. Although the factor matrices are not uniquely determined, the reconstruction error $E_{ij}$ is the same for all possible decompositions. The lower bound on the reconstruction error is expressed via the variances of the estimators:

    E_{ij} = E\left[ (u_i^\top v_j - \tilde{u}_i^\top v_j + \tilde{u}_i^\top v_j - \tilde{u}_i^\top \tilde{v}_j)^2 \right]
           = E\left[ v_j^\top (u_i - \tilde{u}_i)(u_i - \tilde{u}_i)^\top v_j \right] + E\left[ \tilde{u}_i^\top (v_j - \tilde{v}_j)(v_j - \tilde{v}_j)^\top \tilde{u}_i \right] + 2 E\left[ v_j^\top (u_i - \tilde{u}_i)(v_j - \tilde{v}_j)^\top \tilde{u}_i \right].

If we assume that the estimator is unbiased, the last term above becomes zero, and the remaining terms are lower-bounded via the computed CRB,

    E_{ij} \ge E\left[ v_j^\top [I^{-1}]_{u_i} v_j \right] + \tilde{u}_i^\top [I^{-1}]_{v_j} \tilde{u}_i,    (8)

where $[I^{-1}]_{u_i}$ denotes the block of the inverse Fisher information matrix at the position of $I_{u_i}$.
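To get from (8) to the bound (9) below, the first expectation is expanded around the ground truth; a short derivation (our addition, using the standard identity for the expectation of a quadratic form) is:

```latex
% For an unbiased estimator v_j with mean \tilde{v}_j and covariance
% \mathrm{Cov}(v_j), the quadratic-form identity gives
%   E[v_j^\top A v_j] = \tilde{v}_j^\top A \tilde{v}_j + \operatorname{Tr}(A \,\mathrm{Cov}(v_j)).
% With A = [I^{-1}]_{u_i} (positive semi-definite) and the CRB
% \mathrm{Cov}(v_j) \succeq [I^{-1}]_{v_j}:
\begin{aligned}
E\!\left[ v_j^\top [I^{-1}]_{u_i} v_j \right]
  &= \tilde{v}_j^\top [I^{-1}]_{u_i} \tilde{v}_j
     + \operatorname{Tr}\!\left( [I^{-1}]_{u_i}\, \mathrm{Cov}(v_j) \right) \\
  &\ge \tilde{v}_j^\top [I^{-1}]_{u_i} \tilde{v}_j
     + \operatorname{Tr}\!\left( [I^{-1}]_{u_i}\, [I^{-1}]_{v_j} \right).
\end{aligned}
```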

The error bound then becomes

    E_{ij} \ge \tilde{v}_j^\top [I^{-1}]_{u_i} \tilde{v}_j + \operatorname{Tr}\left( [I^{-1}]_{u_i} [I^{-1}]_{v_j} \right) + \tilde{u}_i^\top [I^{-1}]_{v_j} \tilde{u}_i.    (9)

Note that if we assume the basis matrix $U$ is known in advance, we only have to deal with the parameter $V$. With a computation similar to the above, the lower bound on the reconstruction error takes the form

    E_{ij} \ge \tilde{u}_i^\top [I^{-1}]_{v_j} \tilde{u}_i,    (10)

which is smaller than the bound for the unknown-basis case by the amount of the terms corresponding to $U$, namely $\tilde{v}_j^\top [I^{-1}]_{u_i} \tilde{v}_j + \operatorname{Tr}( [I^{-1}]_{u_i} [I^{-1}]_{v_j} )$.

3.1 Cramér-Rao Bounds for Matrix Factorization

The Fisher information matrix is computed from the likelihood of the model. If we assume the noise is additive and Gaussian, the model is written as

    X = U V^\top + N,

where $N_{ij} \sim N(0, \sigma^2)$, and $N(\mu, \sigma^2)$ is the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. The likelihood of the model then becomes

    p(X|U,V) = \prod_j N(x_j | U v_j, \sigma^2 I_M).

Calculation of $I_V$. The derivative of the log-likelihood with respect to $v_j$ is

    \frac{\partial \log p(X|U,V)}{\partial v_j} = U^\top \Sigma^{-1} x_j - U^\top \Sigma^{-1} U v_j,

where $\Sigma = \sigma^2 I_M$. If we differentiate again with respect to $v_{j'}$, the result is zero for $j' \neq j$; if $j' = j$, then

    \frac{\partial^2 \log p(X|U,V)}{\partial v_j \partial v_j^\top} = -U^\top \Sigma^{-1} U.

Therefore, the part of the Fisher information matrix for $V$ is

    I_V = (U^\top \Sigma^{-1} U) \otimes I_N.

Calculation of $I_U$. In this case, we use the likelihood of the model in the form

    p(X|U,V) = \prod_i N(\bar{x}_i | V u_i, \sigma^2 I_N),

where $\bar{x}_i$ represents the $i$-th row of the input matrix (as a column vector). The first derivative with respect to $u_i$ is

    \frac{\partial \log p(X|U,V)}{\partial u_i} = V^\top \Sigma^{-1} \bar{x}_i - V^\top \Sigma^{-1} V u_i,    (11)

where, with a slight abuse of notation, $\Sigma = \sigma^2 I_N$ in this row-wise view. Again, the second derivative is zero for $i' \neq i$, and for $i' = i$,

    \frac{\partial^2 \log p(X|U,V)}{\partial u_i \partial u_i^\top} = -V^\top \Sigma^{-1} V.

Therefore, the Fisher information matrix for $U$ is

    I_U = (V^\top \Sigma^{-1} V) \otimes I_M.

Calculation of $I_{UV}$. To compute $I_{UV}$, we differentiate (11) with respect to $v_j$, which gives

    \frac{\partial^2 \log p(X|U,V)}{\partial u_i \partial v_j^\top} = \frac{1}{\sigma^2} \left( x_{ij} I_K - u_i^\top v_j I_K - v_j u_i^\top \right).

Since $E[x_{ij}] I_K = u_i^\top v_j I_K$, the first two terms vanish when we take the expectation. Therefore, the Fisher information for $U$ and $V$ consists of the $K \times K$ blocks

    [I_{UV}]_{ij} = \sigma^{-2} v_j u_i^\top.

3.2 Cramér-Rao Bound for CS-MF

We model CS-MF with the following two-step procedure:

    X = U V^\top + N_a, \qquad Y = \Phi X + N_b,

where $[N_a]_{i,j} \sim N(0, \sigma_a^2)$ is the representation error and $[N_b]_{i,j} \sim N(0, \sigma_b^2)$ is the measurement error. The likelihood is

    p(Y|U,V) = \int p(Y|X)\, p(X|U,V)\, dX = \prod_j N(y_j | \Phi U v_j, \Sigma_b),

where $\Sigma_b = \sigma_a^2 \Phi \Phi^\top + \sigma_b^2 I_P$ and $\Phi \in \mathbb{R}^{P \times M}$.

Calculation of $I_V$. The derivative of the log-likelihood with respect to $v_j$ is

    \frac{\partial \log p(Y|U,V)}{\partial v_j} = U^\top \Phi^\top \Sigma_b^{-1} y_j - U^\top \Phi^\top \Sigma_b^{-1} \Phi U v_j.

Differentiating again with respect to $v_{j'}$ gives zero when $j' \neq j$, and if $j' = j$, then

    \frac{\partial^2 \log p(Y|U,V)}{\partial v_j \partial v_j^\top} = -U^\top \Phi^\top \Sigma_b^{-1} \Phi U.

Therefore, the Fisher information matrix for $V$ is

    I_V = (U^\top \Phi^\top \Sigma_b^{-1} \Phi U) \otimes I_N.
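As a numerical illustration of these formulas, the following sketch (ours; the small sizes are chosen arbitrarily) forms the per-row Fisher information block of MF for known ground-truth factors and evaluates the known-$U$ bound (10) for one entry:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K, sigma2 = 8, 10, 3, 0.1
U_true = rng.normal(size=(M, K))
V_true = rng.normal(size=(N, K))

# Per-row blocks of the MF Fisher information (Section 3.1):
# I_{v_j} = U^T Sigma^{-1} U is the same K x K block for every j,
# and I_V is block-diagonal in these blocks.
I_vj = U_true.T @ U_true / sigma2

# Known-U bound (10) on the reconstruction error of entry (i, j):
# E_ij >= u_i^T [I^{-1}]_{v_j} u_i.
i, j = 0, 0
u_i = U_true[i]
bound_ij = u_i @ np.linalg.inv(I_vj) @ u_i
print(f"CRB on E_ij with U known: {bound_ij:.4f}")
```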

Calculation of $I_U$. The derivative with respect to $u_i$ is written as

    \frac{\partial \log p(Y|U,V)}{\partial u_i} = \sum_j \left( v_j\, y_j^\top \Sigma_b^{-1} \phi_i - v_j\, \phi_i^\top \Sigma_b^{-1} \Phi U v_j \right),    (12)

where $\phi_i$ denotes the $i$-th column of $\Phi$. The derivative of this with respect to $u_{i'}$ simplifies to

    \frac{\partial^2 \log p(Y|U,V)}{\partial u_i \partial u_{i'}^\top} = -(\Phi^\top \Sigma_b^{-1} \Phi)_{i i'}\, V^\top V,

which leads to the Fisher information matrix for $U$:

    I_U = (V^\top V) \otimes (\Phi^\top \Sigma_b^{-1} \Phi).

Calculation of $I_{UV}$. If we differentiate (12) with respect to $v_j$, we obtain

    \frac{\partial^2 \log p(Y|U,V)}{\partial u_i \partial v_j^\top} = (\phi_i^\top \Sigma_b^{-1} y_j) I_K - (\phi_i^\top \Sigma_b^{-1} \Phi U v_j) I_K - v_j\, \phi_i^\top \Sigma_b^{-1} \Phi U.

The first two terms disappear when we take the expectation, so the Fisher information for $U$ and $V$ consists of the $K \times K$ blocks

    [I_{UV}]_{ij} = v_j\, [\Phi^\top \Sigma_b^{-1} \Phi U]_{i\cdot}.

The theoretical bound of CS-MF is larger than that of matrix factorization. The main difference in the computed Fisher information matrices is the change from the term $\Sigma^{-1}$ of matrix factorization to the term $\Phi^\top \Sigma_b^{-1} \Phi$. Note that $\Sigma^{-1}$ in matrix factorization is $\Sigma^{-1} = \sigma_a^{-2} I_M$, whereas $\Phi^\top \Sigma_b^{-1} \Phi$ in CS-MF is

    \Phi^\top \Sigma_b^{-1} \Phi = \Phi^\top (\sigma_a^2 \Phi \Phi^\top + \sigma_b^2 I_P)^{-1} \Phi.

This is usually smaller than the corresponding term in matrix factorization. For example, if we assume $\sigma_b^2 = 0$ and each element of the sensing matrix is drawn independently from $N(0,1)$, we can use the approximations $\Phi^\top \Phi \approx P I_M$ and $\Phi \Phi^\top \approx M I_P$, such that

    \Phi^\top \Sigma_b^{-1} \Phi \approx \Phi^\top (\sigma_a^2 M I_P)^{-1} \Phi = \frac{1}{M \sigma_a^2} \Phi^\top \Phi \approx \frac{P}{M} \sigma_a^{-2} I_M,

which is $P/M$ times smaller than the corresponding term $\sigma_a^{-2} I_M$ of matrix factorization. Since the CRB is calculated from the inverse of the Fisher information matrix, CS-MF has a larger CRB than matrix factorization.

3.3 Cramér-Rao Bound for CS-MCF

The relations in CS-MCF are modeled as in the previous section:

    X_c = U V^\top + N_a, \qquad Y = \Phi X_c + N_b, \qquad X_u = U W^\top + N_c,

where $[N_a]_{i,j} \sim N(0, \sigma_a^2)$, $[N_c]_{i,j} \sim N(0, \sigma_a^2)$, and $[N_b]_{i,j} \sim N(0, \sigma_b^2)$. In this case we have three parameters $U$, $V$, and $W$, so the Fisher information matrix is

    I = \begin{bmatrix} I_U & I_{UV} & I_{UW} \\ I_{UV}^\top & I_V & I_{VW} \\ I_{UW}^\top & I_{VW}^\top & I_W \end{bmatrix}.

The likelihood is

    p(Y, X_u|U,V,W) = p(Y|U,V)\, p(X_u|U,W) = \prod_j N(y_j | \Phi U v_j, \Sigma_b) \prod_l N(x_{u,l} | U w_l, \Sigma_a),

where $\Sigma_b = \sigma_a^2 \Phi \Phi^\top + \sigma_b^2 I_P$ and $\Sigma_a = \sigma_a^2 I_M$.

Calculation of $I_U$. Differentiating the log-likelihood with respect to $u_i$ gives

    \frac{\partial \log p(Y, X_u|U,V,W)}{\partial u_i} = \sum_j \left( v_j\, y_j^\top \Sigma_b^{-1} \phi_i - v_j\, \phi_i^\top \Sigma_b^{-1} \Phi U v_j \right) + \sigma_a^{-2} \left( W^\top \bar{x}_{u,i} - W^\top W u_i \right),    (13)

where $\bar{x}_{u,i}$ denotes the $i$-th row of $X_u$ (as a column vector). Differentiating again with respect to $u_{i'}$ yields

    \frac{\partial^2 \log p(Y, X_u|U,V,W)}{\partial u_i \partial u_i^\top} = -(\Phi^\top \Sigma_b^{-1} \Phi)_{ii}\, V^\top V - \sigma_a^{-2} W^\top W

for $i' = i$, and

    \frac{\partial^2 \log p(Y, X_u|U,V,W)}{\partial u_i \partial u_{i'}^\top} = -(\Phi^\top \Sigma_b^{-1} \Phi)_{i i'}\, V^\top V

for $i' \neq i$. Therefore, the Fisher information matrix for $U$ becomes

    I_U = (V^\top V) \otimes (\Phi^\top \Sigma_b^{-1} \Phi) + \sigma_a^{-2} (W^\top W) \otimes I_M.

Calculation of $I_V$, $I_W$, $I_{UV}$, and $I_{UW}$. These blocks are computed in a similar way as for matrix factorization and CS-MF, in the following forms:

    I_V = (U^\top \Phi^\top \Sigma_b^{-1} \Phi U) \otimes I_C,
    I_W = (U^\top \Sigma_a^{-1} U) \otimes I_{N-C},
    [I_{UV}]_{ij} = v_j\, [\Phi^\top \Sigma_b^{-1} \Phi U]_{i\cdot},
    [I_{UW}]_{il} = w_l\, [\Sigma_a^{-1} U]_{i\cdot}.
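The $P/M$ scaling above is easy to verify numerically; a quick check (our illustration) with a Gaussian sensing matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
P, M, sigma_a2 = 50, 100, 1.0

Phi = rng.normal(size=(P, M))
# With sigma_b^2 = 0: G = Phi^T (sigma_a^2 Phi Phi^T)^{-1} Phi.
G = Phi.T @ np.linalg.inv(sigma_a2 * (Phi @ Phi.T)) @ Phi
# The mean of the diagonal of G matches the predicted (P/M) / sigma_a^2
# (in fact tr(G) = P / sigma_a^2 exactly); G is approximately that times I_M.
print(np.mean(np.diag(G)), P / (M * sigma_a2))
```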

7 SNR (db) U-known MF CS-MF CS-MCF U-unknown MF CS-MF CS-MCF SNR (db) U-known NMF CS-NMF CS-NMCF U-unknown NMF CS-NMF CS-NMCF 0 and variance 1. For the multiplicative update algorithms which have the nonnegativity constraints, each element is sampled from a uniform distriution etween 0 and 1. For the co-factorizations, 20 out of 100 columns were chosen for the uncompressed prior. As a measure of the reconstruction performances, we used signal-to-noise ratio (SNR) 2 F SNR = 10log 10, UV 2 F (a) Figure 2: SNR computed from the CRB (dark) and SNR achieved from the algorithms (light), for the (a) ALS algorithms with Gaussian data () Multiplicative algorithms with nonnegative data. In each graph, left three are for the case of known representation U, and right three are for the case of unknownu. Calculation ofi V W Fisher information matrix forv andw,i V W, ecomes zero matrix of dimension CK (N C)K ecause the term W anishes in the derivative with respect tov j. The Fisher information matrix of CS-MCF is the comination of the Fisher information matrix of matrix factorization and Fisher information matrix of CS-MF. The parameter V in matrix factorization and CS-MF is divided into V and W in CS-MCF. The I V and I UV in CS-MCF follows the form of the corresponding parts of CS-MF, ut the I W and I UW follows the form of the corresponding parts of matrix factorization. Moreover, I U is the direct comination of the I U of matrix factorization andi U of CS-MF. In the previous section, we showed that in general the CRB of matrix factorization is smaller than the CRB of CS-MF, so this kind of comination leads smaller CRB than that of CS-MF. 4 Numerical Experiments For the numerical experiments, we computed the ounds of the reconstruction performances using the equation (9), and compared them with the actual reconstruction performances of the algorithms listed in the Tale 1. We also computed the ounds for the cases where we assume that the parameter U is known in advance (10). In that cases, the actual reconstruction performances were measured ased on the modified algorithms, where the parameter U is fixed with the groundtruth values. To compute the CRB, we have to know the ground-truth value of the parameters U, V, and W, which makes it difficult to use the real-world data where the ground-truth factor matrices are unknown. Therefore, we used synthetic data for the experiments. The factor matrices U R , V R and the sensing matrix Φ R are randomly generated, and the input matrices were drawn from them with additive Gaussian noise. For the ALS-ased algorithms, each element of the factor matrices and the sensing matrix is sampled from the Gaussian distriution with mean () where is the true value of the input matrix, U and V are the factor matrices otained y running the algorithms. The SNRs were averaged over the 100 different trials (Figure 2). The ounds were etter for the cases where the matrix U is known in advance, than the cases where the matrix U is unknown. This indicates that the estimation of oth U and V is more difficult than the estimation of V only. The ounds of CS-MF and CS-NMF were seriously degraded compared to the ounds of MF and NMF, as well as the actual performances, especially for the cases that the matrix U is unknown. However, the co-factorization ased methods showed much improved ounds, as well as the performances, compared to CS-MF and CS-NMF. 
The use of the uncompressed prior as side information actually helps to improve the reconstruction performance when learning from compressed data. As an illustrative example, we ran the algorithms on compressively-sampled data from the cameraman image (Figure 3). We divided the image into 8-by-8 patches to build an input matrix and compressed each column to a smaller size; a portion of the data was randomly selected to be used for the uncompressed prior. CS-MCF brought a much better reconstruction result than CS-MF.

5 Conclusions

We have presented a method for matrix co-factorization in the compressed domain, where compressively-sampled data and partial uncompressed data are jointly decomposed, sharing a factor matrix. We calculated CRBs for three matrix factorization methods (MF, CS-MF, CS-MCF) in the case of Gaussian likelihood, showing that CS-MCF improves the reconstruction performance over CS-MF. Numerical experiments on images confirmed the better performance of CS-MCF over CS-MF.

Acknowledgments: This work was supported by the NIPA-MSRA Creative IT/SW Research Project, the Converging Research Center Program funded by Korea MEST (No. 2010K001171), the NIPA ITRC Support Program, and the NRF WCU Program.

References

[Candes and Tao, 2006] E. Candes and T. Tao. Near optimal signal recovery from random projections: Universal encoding strategies? IEEE Transactions on Information Theory, 52(12):5406-5425, 2006.

Figure 3: Reconstruction example of an image: (a) original image; (b) the reconstruction result of CS-MF; (c) the reconstruction result of CS-MCF; (d) the reconstruction result of CS-NMF; (e) the reconstruction result of CS-NMCF.

[Donoho, 2006] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289-1306, 2006.

[Jolliffe, 2002] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, 2nd edition, 2002.

[Kay, 1993] S. M. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall, 1993.

[Koren et al., 2009] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30-37, 2009.

[Lee and Choi, 2009] H. Lee and S. Choi. Group nonnegative matrix factorization for EEG classification. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater Beach, Florida, 2009.

[Lee and Seung, 1999] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788-791, 1999.

[Lee et al., 2010] H. Lee, J. Yoo, and S. Choi. Semi-supervised nonnegative matrix factorization. IEEE Signal Processing Letters, 17(1):4-7, 2010.

[O'Grady and Rickard, 2008] P. D. O'Grady and S. T. Rickard. Non-negative matrix factorisation of compressively sampled non-negative signals. In Proceedings of the 8th IMA International Conference on Mathematics in Signal Processing, Cirencester, UK, 2008.

[Singh and Gordon, 2008] A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Las Vegas, Nevada, 2008.

[Smaragdis and Brown, 2003] P. Smaragdis and J. C. Brown. Non-negative matrix factorization for polyphonic music transcription. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, 2003.

[Srebro et al., 2005] N. Srebro, J. D. M. Rennie, and T. Jaakkola. Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems (NIPS), volume 17. MIT Press, 2005.

[Xu et al., 2003] W. Xu, X. Liu, and Y. Gong. Document clustering based on non-negative matrix factorization. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Toronto, Canada, 2003.

[Yoo and Choi, 2008] J. Yoo and S. Choi. Orthogonal nonnegative matrix factorization: Multiplicative updates on Stiefel manifolds. In Proceedings of the 9th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL), Daejeon, Korea, 2008.

[Yoo and Choi, 2009] J. Yoo and S. Choi. Weighted nonnegative matrix co-tri-factorization for collaborative prediction. In Proceedings of the 1st Asian Conference on Machine Learning (ACML), Nanjing, China, 2009.

[Yoo and Choi, 2010] J. Yoo and S. Choi. Orthogonal nonnegative matrix tri-factorization for co-clustering: Multiplicative updates on Stiefel manifolds. Information Processing and Management, 46(5), 2010.

[Yoo et al., 2010] J. Yoo, M. Kim, K. Kang, and S. Choi. Nonnegative matrix partial co-factorization for drum source separation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, Texas, 2010.

[Yu et al., 2005] K. Yu, S. Yu, and V. Tresp. Multi-label informed latent semantic indexing. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Salvador, Brazil, 2005.

[Zhu et al., 2007] S. Zhu, K. Yu, Y. Chi, and Y. Gong. Combining content and link for classification using matrix factorization. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Amsterdam, The Netherlands, 2007.


More information

Two-Dimensional Linear Discriminant Analysis

Two-Dimensional Linear Discriminant Analysis Two-Dimensional Linear Discriminant Analysis Jieping Ye Department of CSE University of Minnesota jieping@cs.umn.edu Ravi Janardan Department of CSE University of Minnesota janardan@cs.umn.edu Qi Li Department

More information

Collaborative Filtering via Ensembles of Matrix Factorizations

Collaborative Filtering via Ensembles of Matrix Factorizations Collaborative Ftering via Ensembles of Matrix Factorizations Mingrui Wu Max Planck Institute for Biological Cybernetics Spemannstrasse 38, 72076 Tübingen, Germany mingrui.wu@tuebingen.mpg.de ABSTRACT We

More information

Simplicial Nonnegative Matrix Tri-Factorization: Fast Guaranteed Parallel Algorithm

Simplicial Nonnegative Matrix Tri-Factorization: Fast Guaranteed Parallel Algorithm Simplicial Nonnegative Matrix Tri-Factorization: Fast Guaranteed Parallel Algorithm Duy-Khuong Nguyen 13, Quoc Tran-Dinh 2, and Tu-Bao Ho 14 1 Japan Advanced Institute of Science and Technology, Japan

More information

MULTIPLICATIVE ALGORITHM FOR CORRENTROPY-BASED NONNEGATIVE MATRIX FACTORIZATION

MULTIPLICATIVE ALGORITHM FOR CORRENTROPY-BASED NONNEGATIVE MATRIX FACTORIZATION MULTIPLICATIVE ALGORITHM FOR CORRENTROPY-BASED NONNEGATIVE MATRIX FACTORIZATION Ehsan Hosseini Asl 1, Jacek M. Zurada 1,2 1 Department of Electrical and Computer Engineering University of Louisville, Louisville,

More information

Large-scale Ordinal Collaborative Filtering

Large-scale Ordinal Collaborative Filtering Large-scale Ordinal Collaborative Filtering Ulrich Paquet, Blaise Thomson, and Ole Winther Microsoft Research Cambridge, University of Cambridge, Technical University of Denmark ulripa@microsoft.com,brmt2@cam.ac.uk,owi@imm.dtu.dk

More information

Class Mean Vector Component and Discriminant Analysis for Kernel Subspace Learning

Class Mean Vector Component and Discriminant Analysis for Kernel Subspace Learning 1 Class Mean Vector Component and Discriminant Analysis for Kernel Suspace Learning Alexandros Iosifidis Department of Engineering, ECE, Aarhus University, Denmark alexandros.iosifidis@eng.au.dk arxiv:1812.05988v1

More information

Gaussian Processes for Big Data. James Hensman

Gaussian Processes for Big Data. James Hensman Gaussian Processes for Big Data James Hensman Overview Motivation Sparse Gaussian Processes Stochastic Variational Inference Examples Overview Motivation Sparse Gaussian Processes Stochastic Variational

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

AN ENHANCED INITIALIZATION METHOD FOR NON-NEGATIVE MATRIX FACTORIZATION. Liyun Gong 1, Asoke K. Nandi 2,3 L69 3BX, UK; 3PH, UK;

AN ENHANCED INITIALIZATION METHOD FOR NON-NEGATIVE MATRIX FACTORIZATION. Liyun Gong 1, Asoke K. Nandi 2,3 L69 3BX, UK; 3PH, UK; 213 IEEE INERNAIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEP. 22 25, 213, SOUHAMPON, UK AN ENHANCED INIIALIZAION MEHOD FOR NON-NEGAIVE MARIX FACORIZAION Liyun Gong 1, Asoke K. Nandi 2,3

More information

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute

More information

TIGHT BOUNDS FOR THE FIRST ORDER MARCUM Q-FUNCTION

TIGHT BOUNDS FOR THE FIRST ORDER MARCUM Q-FUNCTION TIGHT BOUNDS FOR THE FIRST ORDER MARCUM Q-FUNCTION Jiangping Wang and Dapeng Wu Department of Electrical and Computer Engineering University of Florida, Gainesville, FL 3611 Correspondence author: Prof.

More information