Generalized Low Rank Approximations of Matrices Revisited

Size: px
Start display at page:

Download "Generalized Low Rank Approximations of Matrices Revisited"

Transcription

1 Generalzed Low Rank Approxmatons of Matrces Revsted Jun Lu, Songcan Chen, Zh-Hua Zhou, Senor Member, IEEE, and Xaoyang Tan, Member, IEEE Abstract Compared to Sngular Value Decomposton (SVD), Generalzed Low Rank Approxmatons of Matrces (GLRAM) can consume less computaton tme, obtan hgher compresson rato, and yeld compettve classfcaton performance. GLRAM has been successfully appled to applcatons such as mage compresson and retreval, and qute a few extensons have been successvely proposed. However, n lterature, some basc propertes and crucal problems wth regard to GLRAM have not been explored or solved yet. For ths sake, we revst GLRAM n ths paper. Frst, we reveal such a close relatonshp between GLRAM and SVD that GLRAM s objectve functon s dentcal to SVD s objectve functon except the mposed constrants. Second, we derve a lower-bound of GLRAM s objectve functon, and dscuss when the lower-bound can be touched. Moreover, from the vewpont of mnmzng the lower-bound, we answer one open problem rased by Ye (Machne Learnng, 25),.e., a theoretcal justfcaton of the expermental phenomenon that, under gven number of reduced dmenson, the lowest reconstructon error s obtaned when the left and rght transformatons have equal number of columns. Thrd, we explore when and why GLRAM can perform well n terms of compresson, whch s a fundamental problem concernng the usablty of GLRAM. Index Terms Dmensonalty Reducton, Sngular Value Decomposton, Generalzed Low Rank Approxmatons of Matrces, Reconstructon Error I. INTRODUCTION To obtan a compact representaton of data, one usually employs dmensonalty reducton, wth Sngular Value Decomposton (SVD) [] beng one of the most well-known methods. SVD has the appealng property that t can acheve the smallest reconstructon error among all the rank-k approxmatons, and has been successfully appled to face recognton [2], [3], nformaton retreval [4], [5], etc. Applcatons of SVD to hgh-dmensonal data, such as mages and vdeos, quckly run up aganst practcal computatonal lmts, manly due to the hgh tme and space complextes of the SVD computaton for large matrces [], [6]. To deal wth the problem of hgh space complexty, ncremental algorthms have been proposed n [7], [8], [9]. To speed the computaton of SVD, random samplng has been employed n [], [], [2]. Lke the tradtonal SVD, these mprovements on SVD all treat data as one-dmensonal vector patterns. Jun Lu, Songcan Chen, and Xaoyang Tan are wth the Department of Computer Scence & Engneerng, Nanjng Unversty of Aeronautcs & Astronautcs Nanjng, 26, P.R. Chna. Emals: {j.lu, s.chen, x.tan}@nuaa.edu.cn. Zh-Hua Zhou s wth the Natonal Key Laboratory for Novel Software Technology, Nanjng Unversty, Nanjng 293, P.R. Chna. Emal: zhouzh@nju.edu.cn. Correspondng author (S.C. Chen), Tel: , Fax: Recently, Ye [6], [3] proposed the Generalzed Low Rank Approxmatons of Matrces (GLRAM) method, whch treats data as the natve two-dmensonal matrx patterns. Benefted by the two-dmensonal representaton of data, GLRAM was reported to consume less computaton tme and yeld hgher compresson rato than SVD n applcatons such as mage compresson and retreval. Compared to the two-dmensonal methods such as two-dmensonal Prncpal Component Analyss [4], GLRAM employs two-sded transformatons rather than sngle-sded one, and yelds hgher compresson rato. Moreover, GLRAM s appled n [6] as a pre-processng step for SVD to devse the GLRAM+SVD method,.e., performng SVD after the ntermedate dmensonalty reducton stage usng GLRAM. 
Compared to GLRAM, GLRAM+SVD acheves a sgnfcant reducton of the reconstructon error, whle keepng the computaton cost small. GLRAM has been successfully appled n applcatons such as mage compresson and retreval, achevng compettve classfcaton performance to SVD. Snce the man goal of GLRAM tself s to obtan compact representaton of data, we study GLRAM from the vewpont of compresson n ths paper. It s generally beleved that GLRAM does not admt a closed-form soluton, and thus Ye employed an teratve procedure to compute the so-nvolved two-sded transformatons n [6], [3]. Followng Ye s poneer work, researchers have carred out qute a few studes that focus on non-teratve extensons for acceleratng the computaton of the two-sded transformatons. Dng and Ye [5] proposed a two-dmensonal SVD (2dSVD) based on row-row and column-column covarance matrces and studed the optmalty propertes of the 2dSVD; Zhang et al. [6] adopted a natural representaton for mages usng egenmages; Inoue and Urahama [7] proposed a dyadc SVD based on the hgher-order SVD presented by Lathauwer et al. [8]; Lang and Sh [9] clamed to obtan an analytcal GLRAM algorthm, but as ponted out n [2], [2], such a clam s ncorrect and the so-called analytcal GLRAM s n fact a varant of 2dSVD; Lu and Chen [2] proposed a Non-Iteratve GLRAM (NIGLRAM) to approxmately optmze the objectve functon of GLRAM; and Inoue and Urahama [2] proved that some non-teratve methods are n fact equvalent to some extent. Compared to GLRAM, these non-teratve extensons can be computed more effcently and meanwhle can obtan compettve compresson rato and classfcaton performance. GLRAM has also been extended to the scenaro of tensor representaton [22], [23], [24], [25]. For example, the MPCA (Multlnear Prncpal Component Analyss) proposed by Lu et al. [22] extends GLRAM to tensor wth modes over 2; and the

2 2 element rearrangement technque proposed by Yan et al. [25] can be used to mprove the approxmaton performance of GLRAM. In a recent study, Lang and Sh [26] conducted a theoretcal analyss on GLRAM. Frstly, based on the covarance matrx of Prncpal Component Analyss (PCA) [2], they gave the lower and upper bounds of GLRAM s objectve functon, and showed that the reconstructon error of GLRAM s not smaller than that of PCA when the reduced dmenson s same. Secondly, they proposed a non-teratve GLRAM algorthm to compute the transformaton matrces n a smlar way to NIGLRAM [2]. Despte of the successes acheved by GLRAM and ts extensons, some basc propertes and crucal problems wth regard to GLRAM have not been explored or solved yet n lterature. For example, as Ye ponted out n [6], there s not a theoretcal justfcaton of the expermental phenomenon that the lowest reconstructon error s obtaned when the left and rght transformaton matrces have equal number of columns. Lkewse, we also concern the followng two ponts: ) the relatonshp between GLRAM and SVD has not been well studed theoretcally, although GLRAM has been extensvely compared wth SVD emprcally n [6]; and 2) when and why GLRAM can perform well n terms of compresson, whch s a fundamental problem concernng the usablty of GLRAM. In ths paper, we revst GLRAM to answer the aforementoned problems. The studes presented n ths paper can be extended to beneft qute a few two-dmensonal and tensorbased methods [8], [22], [23], [24], [25], [27], [28], whch have attracted much attenton from researches n areas of machne learnng, computer vson, neural computaton, etc. In what follows, we brefly revew SVD and GLRAM n Secton II, dscuss on the relatonshp between GLRAM and SVD n Secton III, gve a new lower-bound of GLRAM s objectve functon to answer the problem why the lowest reconstructon error can be obtaned when the left and rght transformaton matrces have equal number of columns n Secton IV, explore when and why GLRAM can perform well n terms of compresson by studyng the relatonshp among some defned crtera n Secton V, conduct experments n Secton VI, and conclude ths paper n Secton VII. A. Notatons II. A BRIEF REVIEW OF SVD AND GLRAM The major notatons employed n ths paper are summarzed n Table I. Suppose the dataset s composed of N matrx patterns A R e f, =, 2,..., N, and a = vec(a ) denotes the d-dmensonal vector pattern obtaned by sequentally concatenatng the columns of A. Let I q denote a q q dentty matrx, denote the Kronecker product, and. F and. 2 denote the Frobenus norm and the Eucldean vector norm, respectvely. For subsequent dscusson convenence, we gve the followng equatons []: A 2 F = vec(a) 2 2, () vec(abc) = (C T A)vec(B), (2) (AC) (BD) = (A B)(C D), (3) (A B) T = A T B T, (4) I n I m = I m I n = I mn, (5) where A, B, C and D are matrces of proper szes. B. Sngular Value Decomposton In applcatons such as face recognton, the samples A s are naturally represented as matrx patterns. When applyng SVD for dmensonalty reducton (a well-known method that apples SVD for dmensonalty reducton s the Egenfaces [2]), one usually converts the matrx patterns A s to the concatenated vectors a, =, 2,..., N. Based on ths vector representaton, SVD computes k orthonormal projecton vectors n P svd = [p svd, p svd 2,..., p svd k ] Rd k to extract a svd = Psvd T a for a. 
P svd corresponds to the frst k columns of U n X = UΛV T, (6) where X = [a, a 2,..., a N ] s the data matrx, Λ = dag(λ, λ 2,..., λ N ) contans the sngular values n a nonncreasng order, and U and V contan the left and rght sngular vectors of X, respectvely. The reconstructed sample of a by SVD can be denoted as ã svd = P svd a svd, and t s easy to get that P svd provdes the globally optmal soluton to mn J svd (P ) = P T P =I k a P P T a 2 2, (7) where J svd (P svd ) s the reconstructon error brought by SVD s rank-k approxmaton. Therefore, SVD has the property that t can acheve the smallest reconstructon error among all the rank-k approxmatons. C. Generalzed Low Rank Approxmatons of Matrces In contrast to SVD that converts matrces A s to vectors a s, GLRAM drectly manpulates A s, computes two transformatons L = [l, l 2,..., l m ] R e m and R = [r, r 2,..., r n ] R f n wth orthonormal columns, and extracts A glram sample beng Ãglram = L T A R for A, wth the reconstructed = LA glram R T. To compute the blateral transformatons L and R, GLRAM mnmzes the followng reconstructon error: mn L T L=I m R T R=I n J glram(l, R) = A LL T A RR T 2 F. (8) Generally speakng, Eq. (8) does not admt a closed-form soluton, and therefore [6], [3] employed an teratve procedure whose key s to optmze L under fxed R and vce versa. More specfcally, under gven L, R s composed of the n egenvectors of M R = A T LL T A (9)

3 3 TABLE I NOTATIONS Notatons Descrptons Notatons Descrptons A the -th matrx pattern N number of samples e number of rows n A f number of columns n A d sample dmenson (d = ef) L the left transformaton R the rght transformaton l the -th column n L r the -th column n R m number of columns n L n number of columns n R s common value for m and n a concatenated vector pattern of A P svd the projecton matrx obtaned by SVD P glram the projecton matrx obtaned by GLRAM k the number of columns n P svd or P glram a svd extracted features by SVD A glram extracted features by GLRAM NMSE Normalzed Mean Square Error Normalzed Mean Lower-Bound Mean Smlartes of Left Subspaces Mean Smlartes of Rght Subspaces correspondng to the largest n egenvalues; and under gven R, L s composed of the m egenvectors of M L = A RR T A T () correspondng to the largest m egenvalues. III. RELATIONSHIP BETWEEN GLRAM AND SVD Although SVD and GLRAM extract features n dfferent forms (.e., SVD manpulates one-dmensonal vectors to extract a svd = Psvd T a, whle GLRAM manpulates twodmensonal matrces to extract A glram = L T A R), we shall reveal an essental relatonshp between GLRAM and SVD n the followng theorem. Theorem : GLRAM s objectve functon s dentcal to SVD s objectve functon except the mposed constrants. Proof: One one hand, SVD s objectve functon Eq. (7) can be wrtten as: mn J svd (P svd ) = a P svd Psvda T 2 2. () Psvd T P svd=i k On the other hand, we denote P glram as P glram = R L = [p glram, p glram 2,..., p glram mn ], (2) where for any =,..., n and j =,..., m, Employng Eqs. (2-5), we have a glram where a glram p glram ( )m+j = r l j. (3) P T glramp glram = (R L) T (R L) = I k, (4) =vec(a glram ) = (R L) T a = P T glrama, (5) s the concatenated vector of A glram, and k = mn s the number of reduced dmenson. Summarzng the results of Eqs. (2-5), GLRAM s objectve functon Eq. (8) can be wrtten as mn J glram (L, R) = L,R,P glram a P glram Pglrama T 2 2 subject to L T L = I m, R T R = I n, P glram = R L, P T glramp glram = I k. (6) Comparng GLRAM s objectve functon Eq. (6) wth SVD s Eq. (), we can fnd that, GLRAM just optmzes the same objectve functon as SVD except the mposed constrants,.e., SVD s P svd s only subjected to the orthonormal constrant Psvd T P svd = I k, whle, besdes the orthonormal constrant, GLRAM s P glram s subjected to addtonal Kronecker product constrants P glram = R L, L T L = I m and R T R = I n. Ths ends the proof. A. Remarks Frst, by usng Theorem, we can explan such expermental phenomenon reported n [6] that, GLRAM acheves hgher reconstructon error than SVD under the same number of reduced dmenson. Ths phenomenon attrbutes to the facts that: ) both GLRAM and SVD n fact optmzes the same objectve functon, and 2) GLRAM s subjected to addtonal Kronecker product constrants P glram = R L, L T L = I m and R T R = I n than SVD. It s worthwhle to note that, [26] also answered the queston why GLRAM acheves hgher reconstructon error than Prncpal Component Analyss [2] under the same number of reduced dmenson. However, the employed methodologes are dfferent,.e., we answered ths problem by revealng that GLRAM s objectve functon s dentcal to SVD s objectve functon except the mposed constrant, whle [26] answered the problem by explctly formulatng the objectve functon value of PCA as the lowerbound of GLRAM s objectve functon. 
Second, GLRAM does not need to explctly compute or store P glram R d k, but nstead computes and stores L R e m and R R f n, consumng em + fn elements; whle SVD needs to explctly compute and store the projecton matrx P svd R d k, consumng dk = efmn elements. As a result, compared to SVD, GLRAM not only consumes less computaton tme but also obtans hgher compresson rato when appled to the hgh-dmensonal applcatons such as mage compresson. Thrd, a svd by SVD can be denoted as a svd = [a T p svd, a T p svd 2,..., a T p svd k ] T, (7) and we have N (at psvd j ) 2 = λ 2 j, snce P svd corresponds to the frst k columns of U n Eq. (6). Moreover, consderng the fact that λ j s are n a non-ncreasng order, we have (a T p svd j ) 2 (a T p svd j+) 2. (8)

4 4 From Eq. (8), we know that the projecton vector p svd j preserves more nformaton than p svd j+, and thus the nformaton preservng abltes of SVD s projecton vectors are n a nonncreasng order. However, unlke (8) for SVD, the followng nequalty (a T p glram j ) 2 (a T p glram j+ ) 2 (9) generally does not hold for GLRAM (see Secton IV-B for a counter-example, and Secton VI-A for expermental verfcaton). As a result, despte of GLRAM s close relatonshps wth SVD presented n Theorem, GLRAM s stll qute dfferent from SVD. IV. A NEW LOWER-BOUND As reported n [6], [3], under gven number of reduced dmenson k = mn, GLRAM always obtans the lowest reconstructon error when settng s = m = n, but a rgorous theoretcal justfcaton behnd ths s stll not avalable [6]. In ths secton, we try to solve ths problem, before whch, we frst prove a lemma. Lemma : For any matrx A R e f, L R e m and R R f n that satsfy L T L = I m and R T R = I n, we have L T AR 2 F mn(m,n) δ (A) 2, (2) where δ (A) denotes the -th sngular value of matrx A. Proof: Let R R f (f n) be a matrx whose columns are the orthonormal complement of those n R, namely, [R R][R R] T = I f. Snce A R R T A T s postve semdefnte, we have trace(l T A R R T A T L), where trace(.) denotes the trace of a square matrx. Therefore, we have L T AR 2 F =trace(l T ARR T A T L) trace(l T A[R R][R R] T A T L) =trace(l T AA T L) m δ (A) 2. By a smlar dervaton, we have L T AR 2 F (2) n δ (A) 2. (22) From nequaltes (2) and (22), we can easly get the nequalty (2). Ths ends the proof. A. A Lower-bound of J glram (L, R) and a Theoretcal Justfcaton of m = n Theorem 2: Let the SVD of A be denoted as: A = U Λ V T, (23) where Λ = dag(δ (A ), δ 2 (A ),..., δ mn(e,f) (A )) contans the sngular values n a non-ncreasng order, U and V contan the left and rght sngular vectors, respectvely. Let k = mn be the reduced dmenson. Then J LB glram(m, n) = mn(e,f) j=mn(m,n)+ δ j (A ) 2 (24) s a lower-bound of J glram (L, R). Moreover, for gven reduced dmenson k, Jglram LB (m, n) acheves the mnmum when m = n. Proof: Employng Lemma, J glram (L, R) satsfes J glram (L, R) = A LL T A RR T 2 F = A 2 F L T A R 2 F A 2 F = mn(e,f) j=mn(m,n)+ =J LB glram(m, n), mn(m,n) j= δ j (A ) 2 δ j (A ) 2 (25) where the second equalty follows from L T L = I m and R T R = I n. Therefore, Jglram LB (m, n) offers a lower-bound of J glram (L, R). Furthermore, for gven A s and reduced dmenson k, the lower-bound Jglram LB (m, n) s not dependent on the computed L and R, but only dependent on m or n. It s easy to observe that Jglram LB (m, n) acheves the mnmum when m = n. From Theorem 2, we can offer a theoretcal justfcaton of m = n from the vewpont of mnmzng Jglram LB (m, n). Note that, although the justfcaton s derved from the lower-bound, t s generally applcable to J glram (L, R) when GLRAM can perform well n terms of compresson, snce, as wll be emprcally proven n Secton VI-B, J glram (L, R) s close to ts lower-bound n ths case. Pror to our work n ths paper, researches have come up wth some dfferent lower-bounds for GLRAM. The lowerbound provded n [5] s dependent on the obtaned L and R and thus s qute dfferent from ours. The lower-bound gven n [26] s the objectve functon value of PCA, whch s depcted by the egenvalues of the covarance matrx of all the vector patterns a s. Note that, the lower-bound gven by us s depcted by the sngular values of the ndvdual matrx patterns A s, and therefore t s qute dfferent from the one gven n [26]. 
Moreover, to the best of our knowledge, the lower-bounds gven n [5], [26] can not be drectly used to answer why GLRAM can obtan the lowest reconstructon error when m = n for gven reduced dmenson k = mn. In the end of ths subsecton, t s worthwhle to note that, snce the justfcaton of m = n s not from J glram (L, R), but ts lower-bound, t s possble that m = n s not a good choce n some cases. We shall pont out n Secton V-C that, when e and f are extremely mbalanced, m = n mght not be a good choce, and suggest to deal wth ths problem by a technque employed n [29].

5 5 B. When the Lower-bound Can Be Touched In ths subsecton, we explore when the lower-bound gven n Theorem 2 can be touched. In the followng, we assume s = m = n, and denote U s and V s respectvely as the frst s columns n U and V. The result n ths subsecton s gven n the followng theorem. Theorem 3: If the followng two condtons are satsfed:, =, 2,..., N span the same projecton subspace,.e., U s = U s Q, (26) U s where Q R s s s an orthonormal transformaton,, =, 2,..., N span the same projecton subspace,.e., V s = V s W, (27) V s where W R s s s an orthonormal transformaton, then the lower-bound Eq. (24) can be touched by settng L = U s and R = V s. Proof: Let L = U s, R = V s, and employng Eqs. (26) and (27), we have L T A R 2 F = U st A V s 2 F = Q T U st A V s W 2 F = U st A V s 2 F s = δ j (A ) 2, J glram (L, R) = = j= ( A 2 F L T A R 2 F ) mn(e,f) j=s+ δ j (A ) 2. (28) (29) Obvously, the lower-bound Eq. (24) s touched. A specal case of Theorem 3 s that, U s = Uj s, V s = Vj s,, j =, 2,..., N, where L = U s and R = V s enable GLRAM to touch the lower-bound. In ths specal case, L T A R = dag(δ (A ), δ 2 (A ),..., δ s (A )), and ncorporatng Eqs. (2), (3) and (5), we have t= (a T t p glram ( )s+j )2 = { N t= δ (A t ) 2 = j j. (3) In ths specal case, we have that: ) the nequalty (9) n Secton III-A does not hold, 2) the GLRAM projecton vectors that beneft reconstructon are n fact p glram ( )s+, =, 2,..., s, whle the remanng ones are useless, and 3) there are projecton vectors outsde P glram that can contan useful nformaton, so long as the rank of any A s over s. Generally speakng, the condtons offered n Theorem 3 are too strong to be satsfed n real applcatons. However, these two condtons remnd us to measure the relatonshp between A s by the smlartes between the subspaces spanned by U s s s (and V s), whch wll be dscussed n detal n the next secton. V. WHEN AND WHY GLRAM CAN PERFORM WELL IN TERMS OF COMPRESSION In ths secton, we explore a fundamental problem consderng the usablty of GLRAM, namely, when and why GLRAM can perform well n terms of compresson. Ths s qute mportant, snce t wll help practtoners to decde whether to use GLRAM or not for compresson purpose. A. Crtera Related to the Compresson Performance of GLRAM An mportant crteron that measures the compresson performance of GLRAM s the reconstructon error, J glram (L, R). It wll be nce f the value of a crteron s between [ ], whch s easer to use n many tasks, say, emprcal evaluaton. For ths sake, we employ the Normalzed Mean Square Error (NMSE) as NMSE = J glram(l, R)/N N A. (3) 2 F /N Obvously, NMSE s between [ ], and s proportonal to J glram (L, R). The larger NMSE s, the worse the compresson performance s, and vce versa. Based on NMSE, t s easy to defne the Normalzed Mean Retaned Informaton () as = NMSE. In contrast to NMSE, the larger s, the better the compresson performance s. One key parameter n NMSE s the value of s, and we want to obtan low NMSE at small s. The remanng problem s to defne some proper crtera that can help determne when and why low NMSE can be obtaned at small s. Our motvaton comes from the analyss of the normalzed reconstructon error of a gven matrx A. Accordng to Lemma, we have A 2 F LT AR 2 F A 2 F mn(r,c) from whch we can observe that δ (A) 2 s δ (A) 2 mn(r,c), δ (A) 2 The lower-bound of the normalzed reconstructon error s determned by the dstrbuton of the sngular values. Specfcally, f the leadng s sngular values of A does not domnate, the normalzed reconstructon error can not be small, and vce versa. 
Snce ths lower-bound s only determned by the gven sample tself, we call ths factor as the wthn-samples factor. When the lower-bound s small, t s meanngful to consder whether there exst L and R that can make the normalzed reconstructon error close to the lower-bound. Obvously, when L and R correspond to the leadng s left and rght sngular vectors of A, respectvely, the lowerbound s touched. Moreover, consderng the fact that we are lookng for L and R for compressng N samples A s, then L (and R) should be the tradeoff among U s s (and V s s). Snce L and R are tradeoff among all the matrx samples, we call ths factor as the between-samples factor.

6 6 To depct the wthn-samples factor, we make use of the sngular values of the matrces A s to compute the lowerbound Jglram LB (m, n) defned n Eq. (24). Smlar to the ntroducton of NMSE, whch can brng convenence to tasks such as evaluaton, we defne the Normalzed Mean Lower-Bound () as = J glram LB (m, n)/n N A. (32) 2 F /N Lke the relatonshp between J glram (L, R) and Jglram LB (m, n) revealed n Eq. (25), t s easy to get the relatonshp between NMSE and as NMSE. (33) Moreover, based on, t s easy to defne the Normalzed Mean Upper-Bound () as =, and t s easy to get the relatonshp between and as. (34) whch says that, the upper-bound of s. To depct the between-samples factor, we measure the relatonshp between A s by the smlartes between the subspaces spanned by U s s s (and V s). Specfcally, we defne the Mean Smlartes of Left Subspaces () as where = 2 N(N ) SLS(U s, U s j ) = j=+ SLS(U s, U s j ), (35) s U st Uj s 2 F (36) s the smlarty of subspaces spanned by U s and U j s. Smlarly, we defne the Mean Smlartes of Rght Subspaces () as where = 2 N(N ) SRS(V s, V s j ) = j=+ s the smlarty of subspaces spanned by V s SRS(V s, V s j ), (37) s V st Vj s 2 F (38) and Vj s. It s easy to get that, both and are between [ ]. Moreover, n the case descrbed n Theorem 3, both and obtan the maxmal value,.e.,. Wth the crtera defned above, we are ready to dscuss when and why GLRAM can perform well n terms of compresson n the comng subsectons. B. Dscusson on the Normalzed Mean Lower-Bound Crteron Consderng the defnton of of NMSE and ther relatonshp revealed n nequalty (33), t s easy to get the followng proposton Proposton : For gven s: When s large, NMSE s large too, and therefore GLRAM can not work well n terms of compresson. When s small, t s possble that GLRAM can work well n terms of compresson, dependng on whether NMSE can be close to. Note that, decreases wth ncreasng s, and n the extreme case, decreases to zero when s = mn(e, f). However, the purpose of GLRAM s to obtan compact representaton at low reconstructon error, and thus low should be obtaned at relatvely small s. Therefore, we have the followng proposton Proposton 2: For varyng s: To ensure that GLRAM can work well n terms of compresson (obtanng compact representaton at low reconstructon error), the value of N M LB should decrease sharply wth ncreasng s, so that can be low when s s small. To elaborate Proposton 2, we analyze the followng two specal cases. The frst one s that, the sngular values of A ( =,..., N) are the same, so that (s) = mn(e,f) s mn(e,f). Obvously, N M LB s large when s s far smaller than mn(e, f), and GLRAM can not compress well by obtanng low NMSE at low s. The second one s that, δ (A ), the frst sngular value of A ( =,..., N) s extremely larger than the rest sngular values, so that for every s, s equal to numercally. In ths case, one can choose s = to ensure small N M LB (= numercally). The dfference between these two specal cases are that, wth ncreasng s, decreases slowly (lnearly) n the frst case whle extremely sharply n the second one. Of course, for real databases, the dstrbuton of sngular values s between these two specal case, and obvously GLRAM s sutable for the case that the leadng few sngular values domnate. When s low at small s, t s worthwhle to consder whether the obtaned NMSE can be close to ts lower-bound. We wll dscuss ths n the next subsecton. C. 
Dscusson on the Smlartes Among Subspaces Crtera In lookng for L and R that compress the N samples A s, t s obvous that for a gven sample A, f L (and R) spans the same subspace as U s (and V s ), the reconstructon error of A can touch the lower-bound,.e., A LL T A RR T 2 F = mn(e,f) s+ δ j (A ) 2. Snce L (and R) s a tradeoff among the U s s s (and V s), then t s reasonable that, when U s s (and V s s) span close subspaces, the obtaned L (and R) s lkely to be close to the gven U s (and V s ) so that the reconstructon error of A s close to ts lower-bound. Based on ths analyss, we gve the followng proposton: Proposton 3: and Crtera: The larger and are, the more lkely NMSE s close to. We elaborate ths proposton by two examples. For the frst one, consderng the case presented n Theorem 3, t s obvous that = =. In ths example, NMSE equals. For the second one, suppose that there are N rank matrces A = u v T, where ut u j = ( j), v Tv j = ( j), u T u =, and v Tv =. We set s = m = n = and apply GLRAM to compress these N matrces. Obvously, n ths example, =, and = =. By some careful mathematcal deductons, we can get NMSE =

7 7 N N, whch s far larger than =, especally for large N. From Proposton 3, we know that, when M SLS and are large, NMSE s lkely to be close to ts lowerbound N M LB. For better revealng the relatonshp among NMSE,, and, we make use of, whch depcts the rato between the retaned nformaton and the upper-bound N M U B. Obvously, the larger s, the closer NMSE s to. Our experments n Secton VI-B show that, wth ncreasng and, generally ncreases as well. It s meanngful to pont out that, there are some specal cases when and are low, but NMSE s possbly close to. Here we also consder the two specal cases employed n Secton V-B. For the frst one, we assume that A s are matrces wth equal sngular values, and that the = NMSE spaces spanned by U s s s (and V ) are not close so that and are small. In ths case, t s possble that NMSE can be close to. For the second case, we assume that, δ (A ), the frst sngular value of A ( =,..., N) s extremely larger than the rest sngular values, and the frst left (and rght) sngular vectors U s (and V s) are the same, but the remanng left (and rght) sngular vectors dffer qute a lot. When when settng s >, t s possble that NMSE and are both numercally, but that and are small. However, these two cases can be addressed by usng the dstrbuton of under varyng s. For the frst case, N M LB decreases lnearly wth varyng s, and therefore GLRAM does perform well n terms of compresson due to hgh NMSE at low s. For the second one, snce decreases extremely sharply, then s = s enough for compact compresson, wth NMSE beng close to (both equal numercally), and = =. In the end of ths subsecton, t s worthwhle to make use of Proposton 3 to pont out that, when e and f are extremely mbalanced,.e., e f or f e, m = n mght not be a good choce for GLRAM. Wthout loss of generalty, we assume f = 2 and e f. In ths case, although we can set s = m = n = 2 (note that, m e and n f) achevng N M LB =, N M SE can be typcally large when the subspaces spanned by U s s dffer greatly so that s very low. From ths specal case, we can say that GLRAM should be appled to databases wth balanced e and f. Moreover, a possble way to deal wth mbalanced e and f s to reshape the mbalanced matrces to the balanced ones, and ths technque has been employed n [29]. From the dscusson n the prevous subsectons, we can say that when ) decreases sharply wth ncreasng s, and can acheve a low value at small s, and meanwhle 2) and are relatvely large, GLRAM can work well n terms of compresson. VI. EXPERIMENTS To study the nformaton preservng abltes of GLRAM projecton vectors, and to emprcally dscuss the relatonshp among the crtera defned n Secton V, we conduct experments on the followng three datasets (one synthetc and two real-world face datasets): RAND s a synthetc dataset consstng of 4 matrces of sze The 34 entres n each matrx are randomly generated from the normal dstrbuton wth mean and standard dervaton ; ORL contans the face mages of 4 persons, for a total of 4 mages. The resoluton of the face mages s 2 92; FERET [3] s a database that conssts of a total of 4,5 gray-scale mages representng,99 subjects. In ths paper, we conduct experments on a subset of the FERET test 996, fa, whch contans,96 face mages. The resoluton of the face mages s 5 3. When mplementng the GLRAM algorthm, we follow the smlar settng as n [6], namely, the parameter η (whch s defned as the rato between the values of Root Mean Square Reconstructon Error [6] correspondng to two adjacent teratve steps, and controls the convergence precson) s set to 6, and s = m = n =. A. 
On Informaton Preservng Abltes of p glram j s In ths subsecton, we explore the nformaton preservng abltes of p glram j s defned n (3). We conduct experments on RAND and ORL, and report results n Fg., where the x-axs corresponds to j, the number of projecton vector, and the y-axs corresponds to the logarthmc preserved nformaton ln( N j ) 2 ). From Fg., (at pglram we can clearly observe that, unlke SVD, the nformaton preservng abltes of GLRAM projecton vectors are not n a non-ncreasng order. We would lke to pont out that, (at pglram j the preserved nformaton of the projecton vector p glram j measured by N ) 2 has an equvalence relatonshp wth the reconstructon error by ths projecton vector measured by N a p glram j p glram T j a 2 2, as we can easly verfy N a p glram j p glram T j a 2 2 = N a 2 2 N (at pglram j ) 2. We choose the former measurement for better dstngushng the nformaton preservng abltes of each ndvdual projecton vector. Next, we explore whether there are certan projecton vectors that on one hand are orthogonal to GLRAM projecton vectors, and on the other hand can preserve more nformaton than the GLRAM projecton vectors. For ths sake, we compute ) r, = n +,..., f that are the f n egenvectors of M R correspondng to the rest f n egenvectors n a non-ncreasng order, and 2) l j, j = m +,..., e that are the e m egenvectors of M L correspondng to the rest e m egenvectors n a non-ncreasng order. Obvously, the projecton vectors r l j, = n+,..., f, j =,..., e and =,..., f, j = m +,..., e are orthogonal to the GLRAM projecton vectors r l j, =,..., n, j =,..., m. Our expermental results on RAND and ORL show that, there are respectvely 86 and 6 projecton vectors that on one hand are orthogonal to GLRAM projecton vectors n P glram, and on

8 8 Logarthmc preserved nformaton Logarthmc preserved nformaton RAND, GLRAM Number of projecton vector ORL, GLRAM Projecton vector Logarthmc preserved nformaton Logarthmc preserved nformaton RAND, SVD Projecton vector ORL, SVD Projecton vector Fg.. The nformaton preservng abltes of p glram j s. The x-axs corresponds to j, the number of projecton vector, and the y-axs corresponds to the logarthmc preserved nformaton ln( P N (at pglram j ) 2 ). the other hand can preserve more nformaton than the worst GLRAM projecton vector. B. On Dfferent Crtera Frst, we verfy s = m = n from the vewpont of mnmzng the lower-bound of GLRAM s objectve functon, when the GLRAM soluton s close to ts lower-bound. For ths sake, we fx the reduced dmenson k = mn =, and try the followng fve combnatons for m n:, 5 2,, 2 5, and, and conduct experments on FERET. The results are reported n Fg. 2, from whch we can see that, ) NMSE s close to, and 2) both and NMSE obtan the mnmal values when s = m = n = NMSE reasons are that: ) the crteron s very small; and 2) the crtera and are very large. Therefore, f the value of s relatvely small and meanwhle the values of and are relatvely large, GLRAM can acheve relatvely low NMSE. Lookng at = NMSE, whch depcts the rato between the retaned nformaton and the upper-bound, we can see that the values on FERET and ORL s very hgh (over.9), whle the value on RAND s very small (less than.). These results are n accordance wthe results that the and crtera are hgh on FERET and ORL, whle low on RAND. TABLE II VALUES OF THE CRITERIA ON DIFFERENT DATASETS. Crteron RAND ORL FERET NMSE Thrd, we have a look at the dstrbuton of the crteron wth varyng s. We report results on ORL and RAND n Fg. 3, from whch we can see that ) when s =, s already very low (less than.) on ORL but qute hgh (over.9) on RAND, and 2) wth ncreasng s, decreases quckly to below. when s 8 on ORL, but remans over. when s 5. Therefore, to ensure that GLRAM can perform well n terms of compresson, should decrease sharply wth ncreasng s ORL RAND s s.2 x 5x2 x 2x5 x mxn Fg. 2. NMSE and under varyng m n. The x-axs corresponds to dfferent szes of m n, and the y-axs denote the values of NMSE or. Second, we report the values of the crtera on dfferent datasets n Table II. From ths table, we can observe that GLRAM performs poorly on the synthetc dataset RAND, wth the crteron NMSE =.9849 beng typcally hgh. The underlyng reasons are: ) =.6877 s very hgh, whch leads to NMSE =.6877; and 2) =.2982 and =.329 are very small, and thus NMSE s much larger than. Therefore, f the value of N M LB s relatvely large and meanwhle the values of and are relatvely small, GLRAM can not acheve relatvely low NMSE. GLRAM performs well on the real mage datasets ORL and FERET, wth N M SE <.5. The underlyng Fg. 3. The dstrbuton of under varyng s. Fourth, we explore the relatonshp between N M SE an under fxed and. For ths sake, we make use of the fractonal order sngular value representaton of each matrx pattern A as [3] A α = U Λ α V T, (39) where Λ α = dag(δ (A ) α,..., δ mn(e,f) (A ) α ), U, V and δ j (A ) are defned n Eq. (23), and α s a nonnegatve parameter. Obvously, A α under dfferent nonnegatve α shares the same left and rght sngular vectors as A. Therefore, by settng dfferent nonnegatve α s, we wll generate dfferent datasets {A α }N s wth dentcal and. Moreover, for any gven A, δ j (A ) s are postve and n a non-ncreasng order, and thus the larger α s, the hgher the crteron s. We report the results n the frst row of Fg. 
4, from whch we can observe On RAND, the value of decreases wth ncreasng α, but the value of NMSE remans very hgh (over

9 9.9). Specfcally, when α =, N M LB =.2535 s relatvely small, but NMSE =.9688 s far larger than. Ths attrbutes to the fact that and are very small. Therefore, we can say that, when and are relatvely small, t s lkely that GLRAM can not obtan relatvely low NMSE. On ORL and FERET, snce the values of and M SRS are relatvely large, N M SE s very close to. In other words, J glram (L, R) s close to ts lower-bound Jglram LB (m, n) n Eq. (24). Therefore, the justfcaton of m = n (n Secton IV-A) from the vewpont of mnmzng Jglram LB (m, n) s applcable to J glram (L, R) for mage datasets such as ORL and FERET. On ORL and FERET, despte that the values of and are very hgh, when settng α to, the value of NMSE becomes typcally hgh, whch attrbutes to the fact that the value of s typcally large (over.9). Therefore, we can say that, GLRAM can not obtan relatvely low NMSE, when s relatvely large. Ffth, we explore the relatonshp between and the smlartes among subspaces crtera M SLS and M SRS. Here, = NMSE denotes the rato between the retaned nformaton and the upper-bound N M U B. Obvously, the larger s, the closer NMSE s to. To generate varyng and, we let s change from to mn(e, f). We conduct experments on RAND, ORL and FERET, and report results n the second row of Fg. 4. We can observe that, On RAND, when s =, M SLS =.758 and =.836 are very small,.e., the frst left (and rght) sngular vectors of the matrces A s dffer greatly, and therefore =.5 s qute small, whch leads to that NMSE s qute larger than. When s = 92, touches snce the V s are orthonormal matrces, but M SLS does not snce the 2 92 matrces U s s do not span the same subspace. Fnally, t s easy to get that, and consstently ncrease wth ncreasng s, and consstently ncreases when and ncrease. On ORL, when s =, =.977 and =.9844 are very large, whch are qute dfferent from those on RAND. Moreover, benefted by the large and M SRS values, =.9544 s qute close to, whch leads to that N M SE = N M LB s qute close to N M LB. Smlar observaton can be obtaned from FERET. It should be noted that, although NMSE s close to when s =, s not small enough yet, and thus the s we choose s larger than n practce. On ORL and FERET, t s nterestng to note that, when s ncreases from to about, M SLS and M SRS generally decrease; and when s ncreases from to mn(e, f), and consstently ncrease. For, t acheves the mnmum when s = 2, and ncreases wth ncreasng s when s > 2. The reason that does not decrease when s ncreases from 2 to about mght be that, the large values of and at s = beneft the compresson performance at s >. Fnally, ncreases as and ncrease when s >. Comparng the curves on RAND wth those on ORL and FERET, we can fnd that, the curve of s generally below those of and on RAND, but s generally above those of and on ORL and FERET. Moreover, the values of and are always relatvely large (over.7) on FERET and ORL, whch attrbutes to the fact that the matrces share a common characterstc,.e., face mage. In contrast, and are qute low when s s below 2, snce the matrces are random and thus do not share much common characterstcs. Summarzng the results from the above experments, we can say that, ) when does not decrease sharply wth ncreasng s, can not be low at small s, and therefore GLRAM can not perform well n terms of compresson performance; 2) when s low at small s, but and M SRS are very low, GLRAM can not work well n terms of compresson; and 3) when s low at small s and meanwhle and are large (say, >.7 as on ORL and FERET), GLRAM can obtan good compresson performance. VII. 
CONCLUSION In ths paper, we revst GLRAM to reveal ts propertes, answer an open problem rased by [6], and explore when and why GLRAM can perform well n terms of compresson. Our man contrbutons are that: ) We reveal the close relatonshp between GLRAM and SVD that GLRAM optmzes the same objectve functon as SVD except the mposed constrants. Based on the revealed relatonshp, we can theoretcally answer why GLRAM acheves hgher reconstructon error than SVD under the same number of reduced dmenson. Moreover, we show that the nformaton preservng abltes of the projecton vectors p glram j s are not n a non-ncreasng order as those of SVD. 2) We offer a lower-bound of GLRAM s objectve functon, based on whch we answer an open problem rased by [6],.e., gvng a theoretcal justfcaton of m = n from the vewpont of mnmzng the lower-bound of J glram (L, R) n Theorem 2. 3) We explore a fundamental problem wth the usablty of GLRAM, and argue that, when s low at small s and meanwhle and are large, GLRAM can obtan good compresson performance. The arguments made n ths paper are verfed by theoretcal proofs and emprcal evaluatons on one synthetc and two real-world face datasets. In our vewpont, the followng four aspects are worthy of future study: ) Snce the nformaton preservng abltes of GLRAM projecton vectors are not n a non-ncreasng order as SVD,

10 RAND ORL FERET NMSE NMSE NMSE α RAND α.95 ORL α.95 FERET /.8.75 /.8.75 / s s s Fg. 4. Relatonshps among dfferent crtera. Frst row: NMSE versus under fxed and. Second row: wth and. Please refer to Table I for explanaton of the notatons. s relatonshp we can generalze the GLRAM objectve functon Eq. (8) to mn J(L, R, G) = G (A Ã) 2 F, (4) L T L=I e, R T R=I f, G 2 F =k where à = LL T A RR T, G an e f bnary matrx, and s the element-by-element matrx multplcaton. The underlyng motvaton s to employ the k projecton vectors that have the strongest abltes n preservng nformaton. We can also extend the study n ths paper to beneft the extensons of GLRAM, the two-dmensonal dscrmnant methods [27], the tensor based methods [24], [23], etc. 2) It s worthwhle to gve some probablstc nterpretatons for the two-dmensonal and tensor based methods (one related work s [28]). It s also meanngful to derve some smplfed rules for testng when and why GLRAM can obtan good compresson performance, and explore whether there exsts a nce emprcal model wth whch GLRAM outperforms the SVD, and m = n s the optmal choce. 3) Based on the revealed relatonshp between GLRAM and SVD, t s meanngful to explore whether and how the twodmensonal and tensor-based representaton can ncorporate the pror (spatal) nformaton of the data for mage classfcaton, and a smlar study can be carred out on the MatPCA and MatFLDA methods [29], whch convert one-dmensonal data to two-dmensonal ones for dmensonalty reducton. 4) Snce the optmzaton problem n Eq. (8) s non-convex, we can not assure that the soluton to GLRAM s globally optmal. Therefore, t s worthwhle to further consder ths open problem rased n [6],.e., when GLRAM can have the global convergence property. ACKNOWLEDGEMENT We would lke to thank the assocate edtor and the four anonymous revewers for the constructve comments that mproved ths paper. Ths work was supported by the Natonal Scence Foundaton of Chna under Grant Nos , 67736, , 69535, the Natonal Fundamental Research Program of Chna under Grant No. 2CB32793, the Jangsu Scence Foundaton under Grant Nos. BK288, BK29369, and the Hgher Educaton Unversty s Doctors Foundaton under grant No REFERENCES [] G. Golub and C. Van Loan, Matrx Computatons, 3rd ed. Baltmore, MD: The Johns Hopkns Unversty Press, 996. [2] M. Turk and A. Pentland, Egenfaces for recognton, Journal of Cogntve Neuroscence, vol. 3, no., pp. 7 96, 99. [3] P. Belhumeur, J. Hespanha, and D. Kregman, Egenfaces vs. fsherfaces: Recognton usng class specfc lnear projecton, IEEE Transactons on Pattern Analyss and Machne Intellgence, vol. 9, pp. 7 72, 997. [4] M. Berry, S. Dumas, and G. O Bre, Usng lnear algebra for ntellgent nformaton retreval, SIAM Revew, vol. 37, no. 4, pp , 995. [5] S. Deerwester, S. Dumas, T. Landauer, G. Furnas, and R. Harshman, Indexng by latent semantc analyss, Journal of the Socety for Informaton Scence, vol. 4, no. 6, pp , 99. [6] J. Ye, Generalzed low rank approxmatons of matrces, Machne Learnng, vol. 6, no. -3, pp. 67 9, 25. [7] M. Brand, Incremental sngular value decomposton of uncertan data wth mssng values, n Procceedngs of the Seventh European Conference on Computer Vson, 22, pp [8] M. Gu and S. Esenstat, A fast and stable algorthm for updatng the sngular value decomposton, Department of Computer Scence, Yale Unversty, Tech. Rep., 993. [9] K. Kanth, D. Agrawal, and A. Sngh, Dmensonalty reducton for smlarty searchng n dynamc databases, Computer Vson and Image Understandng, vol. 75, no. -2, pp , 999. [] D. Achloptas and F. 
McSherry, Fast computaton of low rank matrx approxmatons, n Procceedngs of the Thrty-thrd annual ACM symposum on the theory of computng, 2, pp [] P. Drneas, A. Freze, R. Kannan, S. Vempala, and V. Vnay, Clusterng n large graphs and matrces, n Procceedngs of the Tenth Annual ACM- SIAM Symposum on Dscrete Algorthms, 999, pp [2] A. Freze, R. Kannan, and S. Vempala, Fast monte-carlo algorthms for fndng low-rank approxmatons, Journal of the ACM, vol. 5, no. 6, pp. 25 4, 24. [3] J. Ye, Generalzed low rank approxmatons of matrces, n Procceedngs of the Twenty-frst Internatonal Conference on Machne learnng, Banff, Canada, 24, pp

11 [4] J. Yang, D. Zhang, A. Frang, and J. Yang, Two-dmensonal PCA: A new approach to appearance-based face representaton and recognton, IEEE Transactons on Pattern Analyss and Machne Intellgence, vol. 26, no., pp. 3 37, 24. [5] C. Dng and J. Ye, 2-dmensonal sngular value decomoston for 2d maps and mages, n Procceedngs of the SIAM Internatonal Conference on Data Mnng, Newport Beach, CA, 25, pp [6] D. Zhang, S. Chen, and J. Lu, Representng mage matrces: Egenmages versus egenvectors, n Procceedngs of the Second Internatonal Symposum on Neural Networks, Chongqng, Chna, 25, pp [7] K. Inoue and K. Urahama, DSVD: A tensor-based mage compresson and recognton method, n Procceedngs of the IEEE Internatonal Symposum on Crcuts and Systems, Kobe, Japan, 25, pp [8] L. Lathauwer, B. Moor, and J. Vandewalle, A mult multlnear sngular value decomposton, SIAM Journal on Matrx Analyss and Applcatons, vol. 2, no. 4, pp , 2. [9] Z. Lang and P. Sh, An analytcal algorthm for generalzed low-rank approxmatons of matrces, Pattern Recognton, vol. 38, no., pp , 25. [2] K. Inoue and K. Urahama, Equvalence of non-teratve algorthms for smultaneous low rank approxmatons of matrces, n Procceedngs of the IEEE Computer Socety Conference on Computer Vson and Pattern Recognton, New York, NY, 26, pp [2] J. Lu and S. Chen, Non-teratve generalzed low rank approxmaton of matrces, Pattern Recognton Letters, vol. 27, no. 9, pp. 2 8, 26. [22] H. Lu, K. N. Platanots, and V. A. N., Mpca: Multlnear prncpal component analyss of tensor objects, IEEE Transactons on Neural Networks, vol. 9, no., pp. 8 39, 28. [23] D. Xu, S. Yan, L. Zhang, H. Zhang, Z. Lu, and H. Shum, Concurrent subspaces analyss, n Procceedngs of the IEEE Computer Socety Conference on Computer Vson and Pattern Recognton, San Dego, CA, 25, pp [24] H. Wang and N. Ahuja, Rank-r approxmaton of tensors usng mageas-matrx representaton, n Procceedngs of the IEEE Computer Socety Conference on Computer Vson and Pattern Recognton, San Dego, CA, 25, pp [25] S. Yan, D. Xu, S. Ln, T. Huang, and S. Chang, Element rearrangement for tensor-based subspace learnng, n Procceedngs of the IEEE Computer Socety Conference on Computer Vson and Pattern Recognton, Mnneapols, MN, 27. [26] Z. Lang, D. Zhang, and P. Sh, The theoretcal analyss of glram and ts applcatons, Pattern Recognton, vol. 4, no. 3, pp. 32 4, 27. [27] J. Ye, R. Janardan, and Q. L, Two-dmensonal lnear dscrmnant analyss, n Advances n Neural Informaton Processng Systems 7, L. Saul, Y. Wess, and L. Bottou, Eds. Cambrdge, MA: MIT Press, 24, pp [28] S. Yu, J. B, and J. Ye, Probablstc nterpretatons and extensons for a famly of 2d pca-style algorthms, n Procceedngs of the KDD 28 Workshop on Data Mnng usng Matrces and Tensors, 28. [29] S. Chen, Y. Zhu, D. Zhang, and J. Yang, Feature extracton approaches based on matrx pattern: Matpca and matflda, Pattern Recognton Letters, vol. 26, no. 8, pp , 25. [3] P. J. Phllps, H. Wechsler, J. Huang, and P. Rauss, The feret database and evaluaton procedure for face recognton algorthms, Image and Vson Computng, vol. 6, no. 5, pp , 998. [3] J. Lu, S. Chen, and X. Tan, Fractonal order sngular value decomposton representaton for face recognton, Pattern Recognton, vol. 4, no., pp , 28. Jun Lu receved hs B.S. degree from Nantong Insttute of Technology (now Nantong Unversty) n 22, and hs Ph.D. degree from Nanjng Unversty of Aeronautcs and Astronautcs (NUAA) n November, 27. He joned the Department of Computer Scence & Engneerng, NUAA, as a Lecturer n 27. 
He s currently a postdoc n the Bodesgn Insttute and the Department of Computer Scence & Egneerng of Arzona State Unversty. Hs research nterests nclude Dmensonalty Reducton, Sparse Learnng, and Large-Scale Optmzaton. He has authored or coauthored over 2 scentfc papers. Songcan Chen receved the B.Sc. degree n mathematcs from Hangzhou Unversty (now merged nto Zhejang Unversty) n 983. In Dec. 985, he completed the M.Sc. degree n computer applcatons at Shangha Jaotong Unversty and then worked at NUAA n Jan. 986 as an assstant lecturer. There he receved a Ph.D. degree n communcaton and nformaton systems n 997. Snce 998, as a full professor, he has been wth the Department of Computer Scence and Engneerng at NUAA. Hs research nterests nclude pattern recognton, machne learnng and neural computng. In these felds, he has authored or coauthored over 7 scentfc journal papers. Xaoyang Tan receved hs B.S. and M.S. degree n computer applcatons from NUAA n 993 and 996, respectvely. Then he worked at NUAA n June 996 as an assstant lecturer. He receved a Ph.D. degree from Department of Computer Scence and Technology of Nanjng Unversty, Chna, n 25. From Sept.26 to OCT.27, he worked as a postdoctoral researcher n the LEAR (Learnng and Recognton n Vson) team at INRIA Rhone- Alpes n Grenoble, France. Hs research nterests are n face recognton, machne learnng, pattern recognton, and computer vson. In these felds, he has authored or coauthored over 2 scentfc papers. Zh-Hua Zhou (S -M -SM 6) receved the BSc, MSc and PhD degrees n computer scence from Nanjng Unversty, Chna, n 996, 998 and 2, respectvely, all wth the hghest honors. He joned the Department of Computer Scence & Technology at Nanjng Unversty as an assstant professor n 2, and s currently Cheung Kong Professor and Drector of the LAMDA group. Hs research nterests are n artfcal ntellgence, machne learnng, data mnng, pattern recognton, nformaton retreval, evolutonary computaton and neural computaton. In these areas he has publshed over 7 papers n leadng nternatonal journals or conference proceedngs. Dr. Zhou has won varous awards/honors ncludng the Natonal Scence & Technology Award for Young Scholars of Chna (26), the Award of Natonal Scence Fund for Dstngushed Young Scholars of Chna (23), the Mcrosoft Young Professorshp Award (26), etc. He s an Assocate Edtor-n-Chef of Chnese Scence Bulletn, Assocate Edtor of IEEE Transactons on Knowledge and Data Engneerng and ACM Transactons on Intellgent Systems and Technology, and on the edtoral boards of varous journals. He s a co-founder of ACML, Steerng Commttee member of PAKDD and PRICAI, Program Commttee Char/Co-Char of PAKDD 7, PRICAI 8 and ACML 9, vce Char or area Char of conferences ncludng IEEE ICDM 6, IEEE ICDM 8, SIAM DM 9, ACM CIKM 9, etc., and General Char/Co-Char or Program Commttee Char/Co-Char of a dozen of natve conferences n Chna. He s the char of the Machne Learnng Socety of the Chnese Assocaton of Artfcal Intellgence (CAAI), vce char of the Artfcal Intellgence & Pattern Recognton Socety of the Chna Computer Federaton (CCF), and char of the IEEE Computer Socety Nanjng Chapter.

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

Lecture 4: Constant Time SVD Approximation

Lecture 4: Constant Time SVD Approximation Spectral Algorthms and Representatons eb. 17, Mar. 3 and 8, 005 Lecture 4: Constant Tme SVD Approxmaton Lecturer: Santosh Vempala Scrbe: Jangzhuo Chen Ths topc conssts of three lectures 0/17, 03/03, 03/08),

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

The lower and upper bounds on Perron root of nonnegative irreducible matrices
