Feature Extraction by Maximizing the Average Neighborhood Margin


Fei Wang, Changshui Zhang
State Key Laboratory of Intelligent Technologies and Systems, Department of Automation, Tsinghua University, Beijing, China

Abstract

A novel algorithm called Average Neighborhood Margin Maximization (ANMM) is proposed for supervised linear feature extraction. For each data point, ANMM aims at pulling the neighboring points with the same class label towards it as near as possible, while simultaneously pushing the neighboring points with different labels away from it as far as possible. We will show that the features extracted by ANMM can separate the data from different classes well, and that ANMM avoids the small sample size problem of traditional Linear Discriminant Analysis (LDA). The kernelized (nonlinear) counterpart of ANMM is also established in this paper. Moreover, since in many computer vision applications the data are more naturally represented by higher-order tensors (e.g. images and videos), we develop a tensorized (multilinear) form of ANMM, which can extract features from tensors directly. Experimental results on face recognition are presented to show the effectiveness of our method.

1. Introduction

Feature extraction (or dimensionality reduction) is an important research topic in computer vision and pattern recognition, since (1) the curse of high dimensionality is usually a major cause of limitations of many practical technologies, and (2) a large number of features may even degrade the performance of a classifier when the size of the training set is small compared to the number of features [1]. In the past several decades many feature extraction methods have been proposed, of which the most well-known are Principal Component Analysis (PCA) [10] and Linear Discriminant Analysis (LDA). However, there are still some limitations when applying them directly to vision problems.

First, although PCA is a popular unsupervised method which aims at extracting a subspace in which the variance of the projected data is maximized (or, equivalently, the reconstruction error is minimized), it does not take the class information into account and thus may not be reliable for classification tasks. On the contrary, LDA is a supervised technique which has been shown to be more effective than PCA in many applications. It aims to maximize the between-class scatter and simultaneously minimize the within-class scatter. Unfortunately, it has also been pointed out that LDA has several drawbacks [13]: (1) it usually suffers from the small sample size problem [18], which makes the within-class scatter matrix singular; (2) it is only optimal when the distribution of the data in each class is Gaussian with an identical covariance matrix; (3) LDA can extract at most c - 1 features (where c is the number of classes), which is suboptimal for many applications.

Another limitation of PCA and LDA is that they are both linear methods. However, many vision problems have been found to be nonlinear [7][20], which makes these linear approaches inefficient. Fortunately, kernel based methods [2] can handle such nonlinear cases very well. The basic idea behind kernel based techniques is to first map the data to a high-dimensional (usually infinite-dimensional) feature space, so that a problem that is nonlinear in the original space becomes linearly solvable in the feature space. It has been shown that kernelized PCA [3] and kernelized LDA [19] can improve the performance of the original PCA and LDA significantly in many computer vision and pattern recognition problems.

Finally, PCA and LDA take vectorial data as their inputs, but in many real-world vision problems the data are more naturally represented as higher-order tensors.
For example, a captured image is a 2nd-order tensor, i.e. a matrix, and sequential data, such as a video sequence for event analysis, take the form of a 3rd-order tensor. Thus it is necessary to derive multilinear forms of these traditional linear feature extraction methods so that they can handle such data as tensors directly. Recently this research topic has received considerable interest from the computer vision and pattern recognition community [5], and the proposed methods have been shown to be much more efficient than the traditional vectorial methods.

In this paper, we propose a novel supervised linear feature extraction method called Average Neighborhood Margin Maximization (ANMM).

For each data point, ANMM aims to pull the neighboring points with the same class label towards it as near as possible, while simultaneously pushing the neighboring points with different labels away from it as far as possible. Compared with traditional LDA, our method has the following advantages:

1. ANMM avoids the small sample size problem [18], since it does not need to compute any matrix inverse;
2. ANMM can find the discriminant directions without assuming any particular form for the class densities;
3. Many more feature dimensions are available in ANMM, since it is not limited to c - 1 dimensions as LDA is.

Moreover, we also derive the nonlinear and multilinear forms of ANMM for handling nonlinear and tensor data. Finally, experimental results on face recognition are presented to show the effectiveness of our method.

The rest of this paper is organized as follows. In Section 2 we briefly review some methods that are closely related to ANMM. The details of the ANMM algorithm are introduced in Section 3. In Sections 4 and 5 we develop the kernelized and tensorized forms of ANMM. Experimental results on face recognition are presented in Section 6, followed by conclusions and discussions in Section 7.

2. Related Works

In this section we briefly review some linear feature extraction methods that are closely related to ANMM. First let us fix the notation and the problem definition. Let {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} be the empirical dataset, where x_i \in R^d is the i-th datum, represented by a d-dimensional column vector, and y_i \in L is the label of x_i, with L = {1, 2, ..., c} the label set. The goal of linear feature extraction is to learn a d x l projection matrix W that projects x_i to y_i = W^T x_i, where y_i \in R^l is the projected datum with l << d, such that in the projected space the data from different classes can be effectively discriminated.

Traditional LDA learns W by maximizing the criterion

    J = |W^T S_b W| / |W^T S_w W|,

where S_b = \sum_{k=1}^{c} p_k (m_k - m)(m_k - m)^T is the between-class scatter matrix, with p_k and m_k the prior and the mean of class k and m the mean of the entire dataset, and S_w = \sum_{k=1}^{c} p_k S_k is the within-class scatter matrix, with S_k the covariance matrix of class k. It has been shown that J is maximized when W is constituted by the eigenvectors of S_w^{-1} S_b corresponding to its l largest eigenvalues [13]. However, when the size of the dataset is small, S_w becomes singular; then S_w^{-1} does not exist and the small sample size (SSS) problem occurs. Many approaches have been proposed to address this problem, such as PCA+LDA [18], null space LDA [14], and direct LDA [9].

Li et al. [6] further proposed an efficient and robust linear feature extraction method which aims to maximize the following criterion, called a margin in [6]:

    J = tr( W^T (S_b - S_w) W ),                                          (1)

where tr(.) denotes the matrix trace. There is no need to compute any matrix inverse when optimizing this criterion; however, such a margin lacks a geometric intuition. Qiu et al. [23] proposed a Nonparametric Margin Maximization Criterion for learning W, which tries to maximize

    J = \sum_{i=1}^{N} w_i ( ||\delta_i^E||^2 - ||\delta_i^I||^2 )        (2)

in the transformed space, where \delta_i^E is the distance between x_i and its nearest neighbor in a different class, and \delta_i^I is the distance between x_i and its furthest neighbor in the same class. The problem is that defining the margin from just the nearest (or furthest) neighbor may make the algorithm sensitive to outliers. Moreover, the stepwise procedure for maximizing J is time consuming.
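For concreteness, here is a minimal NumPy sketch (our illustration, not code from the paper or from [6]) of the two scatter-based criteria above: it builds S_b and S_w from a labeled sample and contrasts the classical LDA directions, which require inverting S_w, with the inverse-free MMC directions of Eq.(1). All function and variable names are our own.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (Sb) and within-class (Sw) scatter of the rows of X."""
    m = X.mean(axis=0)
    d = X.shape[1]
    Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        pc = len(Xc) / len(X)                            # class prior p_k
        mc = Xc.mean(axis=0)                             # class mean m_k
        Sb += pc * np.outer(mc - m, mc - m)
        Sw += pc * np.cov(Xc, rowvar=False, bias=True)   # p_k * S_k
    return Sb, Sw

def top_eigvecs(A, l, symmetric=True):
    """Eigenvectors of A corresponding to its l largest (real) eigenvalues."""
    vals, vecs = np.linalg.eigh(A) if symmetric else np.linalg.eig(A)
    idx = np.argsort(vals.real)[::-1][:l]
    return vecs[:, idx].real

X = np.random.randn(60, 10)
y = np.repeat([0, 1, 2], 20)
Sb, Sw = scatter_matrices(X, y)
W_lda = top_eigvecs(np.linalg.pinv(Sw) @ Sb, 2, symmetric=False)  # classical LDA directions
W_mmc = top_eigvecs(Sb - Sw, 2)                                   # MMC directions of Eq.(1)
```

Note that the MMC variant never forms S_w^{-1}, which is exactly why it does not suffer from the small sample size problem.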
From another point of view, linear feature extraction can also be treated as learning a proper Mahalanobis distance between pairwise points, since

    ||y_i - y_j||^2 = ||W^T (x_i - x_j)||^2 = (x_i - x_j)^T W W^T (x_i - x_j).

Let M = W W^T; then ||y_i - y_j||^2 = (x_i - x_j)^T M (x_i - x_j). Weinberger et al. [15] proposed a large margin criterion to learn a proper M for the k Nearest Neighbor classifier, and optimized it through a Semidefinite Programming (SDP) procedure. Unfortunately, the computational burden of SDP is high, which limits its potential application to high-dimensional datasets.

3. Feature Extraction by Average Neighborhood Margin Maximization (ANMM)

In this section we introduce our Average Neighborhood Margin Maximization (ANMM) algorithm in detail. Like other linear feature extraction methods, ANMM aims to learn a projection matrix W such that the data in the projected space have high within-class similarity and between-class separability. To achieve this goal, we first introduce two types of neighborhoods:

Definition 1 (Homogeneous Neighborhood). For a data point x_i, its ξ nearest homogeneous neighborhood N_i^o is the set of the ξ most similar data points which are in the same class as x_i.

Definition 2 (Heterogeneous Neighborhood). For a data point x_i, its ζ nearest heterogeneous neighborhood N_i^e is the set of the ζ most similar data points which are not in the same class as x_i.

(Throughout this paper, two data vectors are considered to be similar if the Euclidean distance between them is small, and two data tensors are considered to be similar if the Frobenius norm of their difference tensor is small.)

Then the average neighborhood margin γ_i of x_i is defined as

    γ_i = \sum_{k: x_k \in N_i^e} ||y_i - y_k||^2 / |N_i^e| - \sum_{j: x_j \in N_i^o} ||y_i - y_j||^2 / |N_i^o|,

where |·| represents the cardinality of a set. Literally, this margin measures the difference between the average distance from x_i to the data points in its heterogeneous neighborhood and the average distance from it to the data points in its homogeneous neighborhood. Maximizing such a margin pushes the data points whose labels differ from that of x_i away from x_i, while pulling the data points sharing the class label of x_i towards x_i. Fig. 1 gives an intuitive illustration of the ANMM criterion.

[Figure 1. An intuitive illustration of the ANMM criterion: (a) the neighborhood in the original space, (b) the neighborhood in the projected space. The yellow disk in the center represents x_i, the blue disks are the data points in its homogeneous neighborhood, and the red squares are the data points in its heterogeneous neighborhood.]

Therefore, the total average neighborhood margin can be defined as γ = \sum_i γ_i, and the ANMM criterion is to maximize γ. Since

    \sum_i \sum_{k: x_k \in N_i^e} ||y_i - y_k||^2 / |N_i^e|
      = \sum_i \sum_{k: x_k \in N_i^e} tr[ (y_i - y_k)(y_i - y_k)^T ] / |N_i^e|
      = \sum_i \sum_{k: x_k \in N_i^e} tr[ W^T (x_i - x_k)(x_i - x_k)^T W ] / |N_i^e|
      = tr( W^T S W ),                                                    (3)

where the matrix

    S = \sum_{i,k: x_k \in N_i^e} (x_i - x_k)(x_i - x_k)^T / |N_i^e|      (4)

is called the scatterness matrix. Similarly, if we define the compactness matrix as

    C = \sum_{i,j: x_j \in N_i^o} (x_i - x_j)(x_i - x_j)^T / |N_i^o|,     (5)

then \sum_i \sum_{j: x_j \in N_i^o} ||y_i - y_j||^2 / |N_i^o| = tr( W^T C W ). Therefore the total average neighborhood margin can be rewritten as

    γ = tr[ W^T (S - C) W ].                                              (6)

If we expand W as W = (w_1, w_2, ..., w_l), then γ = \sum_{k=1}^{l} w_k^T (S - C) w_k. To eliminate the freedom of scaling W by a nonzero scalar, we add the constraint w_k^T w_k = 1, i.e. we restrict W to consist of unit vectors. Thus our criterion becomes

    max \sum_{k=1}^{l} w_k^T (S - C) w_k    s.t.  w_k^T w_k = 1.          (7)
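To make Eqs.(4)-(7) concrete, the following NumPy sketch (our own illustration, not the authors' code) performs the ANMM training step as described above: for each point it finds the homogeneous and heterogeneous neighborhoods, accumulates the compactness and scatterness matrices, and keeps the top eigenvectors of S - C as the columns of W. Here `xi` and `zeta` play the roles of the neighborhood sizes ξ and ζ.

```python
import numpy as np

def anmm(X, y, l, xi=5, zeta=5):
    """Minimal ANMM sketch: X is N x d, y holds the class labels.
    Returns the d x l projection W (top eigenvectors of S - C)."""
    N, d = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)     # pairwise Euclidean distances
    S, C = np.zeros((d, d)), np.zeros((d, d))
    for i in range(N):
        same = (y == y[i]); same[i] = False
        diff = (y != y[i])
        homo = np.where(same)[0][np.argsort(dist[i, same])[:xi]]     # xi nearest same-class points
        hetero = np.where(diff)[0][np.argsort(dist[i, diff])[:zeta]] # zeta nearest other-class points
        for k in hetero:                          # scatterness matrix, Eq.(4)
            v = X[i] - X[k]
            S += np.outer(v, v) / len(hetero)
        for j in homo:                            # compactness matrix, Eq.(5)
            v = X[i] - X[j]
            C += np.outer(v, v) / len(homo)
    vals, vecs = np.linalg.eigh(S - C)            # S - C is symmetric
    return vecs[:, np.argsort(vals)[::-1][:l]]

X = np.random.randn(100, 20)
y = np.repeat(np.arange(5), 20)
W = anmm(X, y, l=3)
features = X @ W                                  # projected data, y_i = W^T x_i
```

Because the eigendecomposition returns orthonormal eigenvectors, the unit-norm constraint of Eq.(7) is satisfied automatically and no further normalization is needed.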

Using the Lagrangian method, we can easily find that the optimal W is composed of the l eigenvectors corresponding to the l largest eigenvalues of S - C. To summarize, the main procedure of ANMM is shown in Table 1.

Table 1. Average Neighborhood Margin Maximization (ANMM)
Input: Training set D = {(x_i, y_i)}_{i=1}^N, testing set Z = {z_1, z_2, ..., z_M}, neighborhood sizes ξ and ζ, desired dimensionality l.
Output: l x M feature matrix F extracted from Z.
1. Construct the heterogeneous and homogeneous neighborhoods of each x_i;
2. Construct the scatterness matrix S and the compactness matrix C using Eq.(4) and Eq.(5) respectively;
3. Do an eigenvalue decomposition of S - C and form the d x l matrix W whose columns are the eigenvectors of S - C corresponding to its l largest eigenvalues;
4. Output F = W^T Z with Z = [z_1, z_2, ..., z_M].

4. Nonlinearization via Kernelization

In this section we extend the ANMM algorithm to the nonlinear case via the kernel method [2]. More formally, we first map the dataset from the original space R^d to a high-dimensional (usually infinite-dimensional) feature space F through a nonlinear mapping Φ: R^d -> F, and apply linear ANMM there. In the feature space F, the Euclidean distance between Φ(x_i) and Φ(x_j) can be computed as

    ||Φ(x_i) - Φ(x_j)|| = \sqrt{ (Φ(x_i) - Φ(x_j))^T (Φ(x_i) - Φ(x_j)) } = \sqrt{ K_{ii} + K_{jj} - 2K_{ij} },

where K_{ij} = Φ(x_i)^T Φ(x_j) is the (i, j)-th entry of the kernel matrix K. Thus we can use K to find the heterogeneous and homogeneous neighborhoods of each x_i in the feature space, and the total average neighborhood margin becomes

    γ^Φ = \sum_{k=1}^{l} w_k^T (S^Φ - C^Φ) w_k,                           (8)

where

    S^Φ = \sum_{i,k: Φ(x_k) \in N^e_{Φ(x_i)}} (Φ(x_i) - Φ(x_k))(Φ(x_i) - Φ(x_k))^T / |N^e_{Φ(x_i)}|,
    C^Φ = \sum_{i,j: Φ(x_j) \in N^o_{Φ(x_i)}} (Φ(x_i) - Φ(x_j))(Φ(x_i) - Φ(x_j))^T / |N^o_{Φ(x_i)}|,

and N^e_{Φ(x_i)} and N^o_{Φ(x_i)} are the heterogeneous and homogeneous neighborhoods of Φ(x_i). It is impossible to compute S^Φ and C^Φ directly, since we usually do not know the explicit form of Φ. To avoid this problem, we note that each w_k lies in the span of Φ(x_1), Φ(x_2), ..., Φ(x_N), i.e.

    w_k = \sum_{p=1}^{N} α_p^k Φ(x_p).

Therefore

    w_k^T Φ(x_i) = \sum_{p=1}^{N} α_p^k Φ(x_p)^T Φ(x_i) = (α^k)^T K_i,

where α^k is a column vector whose p-th entry equals α_p^k and K_i is the i-th column of K. Thus

    w_k^T (Φ(x_i) - Φ(x_j))(Φ(x_i) - Φ(x_j))^T w_k = (α^k)^T (K_i - K_j)(K_i - K_j)^T α^k.

Define the matrices

    S̃^Φ = \sum_{i,k: Φ(x_k) \in N^e_{Φ(x_i)}} (K_i - K_k)(K_i - K_k)^T / |N^e_{Φ(x_i)}|,       (9)
    C̃^Φ = \sum_{i,j: Φ(x_j) \in N^o_{Φ(x_i)}} (K_i - K_j)(K_i - K_j)^T / |N^o_{Φ(x_i)}|,      (10)

then

    γ^Φ = \sum_{k=1}^{l} w_k^T (S^Φ - C^Φ) w_k = \sum_{k=1}^{l} (α^k)^T (S̃^Φ - C̃^Φ) α^k.

Similar to Eq.(7), we also add the constraints (α^k)^T (α^k) = 1 for k = 1, 2, ..., l. Then the optimal α^k's are the eigenvectors of S̃^Φ - C̃^Φ corresponding to its l largest eigenvalues. For a new test point z, its k-th extracted feature can be computed as

    w_k^T Φ(z) = \sum_{p=1}^{N} α_p^k Φ(x_p)^T Φ(z) = (α^k)^T K^t_z,      (11)

where K^t denotes the kernel matrix between the training set and the testing set. The main procedure of the Kernel Average Neighborhood Margin Maximization (KANMM) algorithm is summarized in Table 2.
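Since the kernel version touches the data only through K, a sketch needs little more than replacing the raw vectors with columns of K. The following NumPy illustration is our own (the Gaussian kernel, the defaults and all names are our choices, not prescribed by the paper): it finds neighborhoods from the kernel-induced distances K_ii + K_jj - 2K_ij, builds the matrices of Eqs.(9)-(10), and extracts test features via Eq.(11).

```python
import numpy as np

def rbf_kernel(A, B, theta=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * theta ** 2))

def kanmm(K, y, l, xi=5, zeta=5):
    """K: N x N training kernel matrix. Returns the N x l matrix whose columns are the alpha^k."""
    N = K.shape[0]
    d2 = np.diag(K)[:, None] + np.diag(K)[None, :] - 2 * K   # ||phi(x_i) - phi(x_j)||^2
    S, C = np.zeros((N, N)), np.zeros((N, N))
    for i in range(N):
        same = (y == y[i]); same[i] = False
        diff = (y != y[i])
        homo = np.where(same)[0][np.argsort(d2[i, same])[:xi]]
        hetero = np.where(diff)[0][np.argsort(d2[i, diff])[:zeta]]
        for k in hetero:                          # Eq.(9), built from kernel columns K_i
            v = K[:, i] - K[:, k]
            S += np.outer(v, v) / len(hetero)
        for j in homo:                            # Eq.(10)
            v = K[:, i] - K[:, j]
            C += np.outer(v, v) / len(homo)
    vals, vecs = np.linalg.eigh(S - C)
    return vecs[:, np.argsort(vals)[::-1][:l]]

Xtr = np.random.randn(80, 10)
ytr = np.repeat(np.arange(4), 20)
Ztst = np.random.randn(5, 10)
alpha = kanmm(rbf_kernel(Xtr, Xtr), ytr, l=3)
F = alpha.T @ rbf_kernel(Xtr, Ztst)               # Eq.(11): l x M feature matrix for the test set
```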

Table 2. Kernel Average Neighborhood Margin Maximization (KANMM)
Input: Training set D = {(x_i, y_i)}_{i=1}^N, testing set Z = {z_1, z_2, ..., z_M}, neighborhood sizes ξ and ζ, kernel parameter θ, desired dimensionality l.
Output: l x M feature matrix F^Φ extracted from Z.
1. Construct the kernel matrix K on the training set;
2. Construct the heterogeneous and homogeneous neighborhoods of each Φ(x_i);
3. Compute S̃^Φ and C̃^Φ using Eq.(9) and Eq.(10) respectively;
4. Do an eigenvalue decomposition of S̃^Φ - C̃^Φ and store the eigenvectors {α^1, α^2, ..., α^l} corresponding to the l largest eigenvalues;
5. Construct the kernel matrix K^t between the training set and the testing set, with (i, j)-th entry K^t_{ij} = Φ(x_i)^T Φ(z_j);
6. Output F^Φ with F^Φ_{ij} = (α^i)^T K^t_j.

5. Multilinearization via Tensorization

Up to now, the method we have introduced assumes that the data are given in vectorized representations. However, as noted in the introduction, in many vision problems the data are more naturally represented as higher-order tensors, so it is necessary to derive the tensor form of our method. First let us introduce some notation and definitions. Let A be a tensor of size d_1 x d_2 x ... x d_K. The order of A is K, and the f-th dimension (or mode) of A has size d_f. A single entry of a tensor is denoted by A_{i_1 i_2 ... i_K}.

Definition 3 (Scalar Product). The scalar product <A, B> of two tensors A, B \in R^{d_1 x d_2 x ... x d_K} is defined as

    <A, B> = \sum_{i_1} \sum_{i_2} ... \sum_{i_K} A_{i_1 i_2 ... i_K} B*_{i_1 i_2 ... i_K},

where * denotes complex conjugation. Furthermore, the Frobenius norm of a tensor A is defined as ||A||_F = \sqrt{<A, A>}.

Definition 4 (f-mode Product). The f-mode product of a tensor A \in R^{d_1 x d_2 x ... x d_K} and a matrix U \in R^{d_f x g_f} is the d_1 x ... x d_{f-1} x g_f x d_{f+1} x ... x d_K tensor denoted A ×_f U, whose entries are given by

    (A ×_f U)_{i_1 ... i_{f-1} j_f i_{f+1} ... i_K} = \sum_{i_f} A_{i_1 ... i_{f-1} i_f i_{f+1} ... i_K} U_{i_f j_f}.

Definition 5 (f-mode Unfolding). Let A be a d_1 x ... x d_K tensor and let (π_1, ..., π_{K-1}) be any permutation of the entries of the set {1, ..., f-1, f+1, ..., K}. The f-mode unfolding of the tensor A into a d_f x \prod_{l=1}^{K-1} d_{π_l} matrix, denoted A^{(f)}, is defined by

    A \in R^{d_1 x ... x d_K}  =>_f  A^{(f)} \in R^{d_f x \prod_{l=1}^{K-1} d_{π_l}},  where  A^{(f)}_{i_f, j} = A_{i_1 ... i_K}  with  j = 1 + \sum_{l=1}^{K-1} (i_{π_l} - 1) \prod_{l'=1}^{l-1} d_{π_{l'}}.

The tensor-based criterion for ANMM is the following: given N data points X_1, ..., X_N embedded in a tensor space R^{d_1 x d_2 x ... x d_K}, we want to pursue K optimal interrelated projection matrices U_i \in R^{l_i x d_i} (l_i < d_i, i = 1, 2, ..., K) which maximize the average neighborhood margin measured in the tensor metric, that is,

    γ = \sum_i [ \sum_{k: X_k \in N_i^e} ||Y_i - Y_k||_F^2 / |N_i^e| - \sum_{j: X_j \in N_i^o} ||Y_i - Y_j||_F^2 / |N_i^o| ],

where Y_i = X_i ×_1 U_1 ×_2 U_2 ... ×_K U_K. Note that directly maximizing γ is almost infeasible, since it is a higher-order optimization problem. Such problems can generally be solved approximately by an iterative scheme which was originally proposed by [12] for low-rank approximation of second-order tensors and later extended to higher-order tensors by [8]. In the following we adopt this iterative scheme to solve our optimization problem. Given U_1, ..., U_{f-1}, U_{f+1}, ..., U_K, let

    Y_i^f = X_i ×_1 U_1 ... ×_{f-1} U_{f-1} ×_{f+1} U_{f+1} ... ×_K U_K.  (12)

Then, by the corresponding f-mode unfolding, we get Y_i^f =>_f Y_i^{(f)}. Moreover, we can easily derive that ||Y_i^f ×_f U_f||_F = ||(Y_i^{(f)})^T U_f||_F. Therefore we have

    ||Y_i - Y_j||_F^2 = ||X_i ×_1 U_1 ... ×_K U_K - X_j ×_1 U_1 ... ×_K U_K||_F^2
                      = ||Y_i^f ×_f U_f - Y_j^f ×_f U_f||_F^2
                      = ||(Y_i^{(f)})^T U_f - (Y_j^{(f)})^T U_f||_F^2
                      = tr[ U_f^T (Y_i^{(f)} - Y_j^{(f)})(Y_i^{(f)} - Y_j^{(f)})^T U_f ].

Then, knowing U_1, ..., U_{f-1}, U_{f+1}, ..., U_K, we can rewrite the scatterness and compactness matrices in the tensor case as

    S̃ = \sum_{i,k: X_k \in N_i^e} (Y_i^{(f)} - Y_k^{(f)})(Y_i^{(f)} - Y_k^{(f)})^T / |N_i^e|,   (13)
    C̃ = \sum_{i,j: X_j \in N_i^o} (Y_i^{(f)} - Y_j^{(f)})(Y_i^{(f)} - Y_j^{(f)})^T / |N_i^o|.   (14)
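As a brief aside before continuing the derivation, the f-mode unfolding and f-mode product of Definitions 4 and 5 can be written in a few lines of NumPy. The sketch below is our own illustration: it passes the projection matrix already transposed (size g_f x d_f, whereas Definition 4 uses d_f x g_f) and fixes one particular column ordering for the unfolding; neither choice affects the Frobenius-norm identities used in the derivation.

```python
import numpy as np

def unfold(A, f):
    """f-mode unfolding: move mode f to the front and flatten the remaining modes."""
    return np.moveaxis(A, f, 0).reshape(A.shape[f], -1)

def fold(M, f, shape):
    """Inverse of unfold for a tensor of the given target shape."""
    rest = list(shape)
    df = rest.pop(f)
    return np.moveaxis(M.reshape([df] + rest), 0, f)

def mode_product(A, U, f):
    """f-mode product A x_f U; here U has size (g_f, d_f), i.e. the transpose of the
    matrix in Definition 4, so it multiplies the unfolding from the left."""
    Y = U @ unfold(A, f)
    shape = list(A.shape)
    shape[f] = U.shape[0]
    return fold(Y, f, shape)

A = np.random.randn(4, 5, 6)                      # a 3rd-order tensor
U1, U2, U3 = np.random.randn(2, 4), np.random.randn(3, 5), np.random.randn(3, 6)
Y = mode_product(mode_product(mode_product(A, U1, 0), U2, 1), U3, 2)
print(Y.shape)                                    # (2, 3, 3): A x_1 U_1 x_2 U_2 x_3 U_3
# The Frobenius norm is unchanged by any unfolding, as used in the derivation:
assert np.isclose(np.linalg.norm(Y), np.linalg.norm(unfold(Y, 1)))
```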

The optimization problem with respect to U_f then becomes

    max_{U_f} tr[ U_f^T (S̃ - C̃) U_f ].                                   (15)

Let us expand U_f as U_f = (u_{f1}, u_{f2}, ..., u_{f l_f}), with u_{fi} the i-th column of U_f; then Eq.(15) can be rewritten as

    max \sum_{i=1}^{l_f} u_{fi}^T (S̃ - C̃) u_{fi}.                        (16)

We also add the constraint u_{fi}^T u_{fi} = 1 to restrict the scale of U_f. The main procedure of the Tensor Average Neighborhood Margin Maximization (TANMM) algorithm is summarized in Table 3.

Table 3. Tensor Average Neighborhood Margin Maximization (TANMM)
Input: Training set D = {(X_i, y_i)}_{i=1}^N, testing set Z = {Z_1, Z_2, ..., Z_M}, where X_i, Z_j \in R^{d_1 x d_2 x ... x d_K}, neighborhood sizes ξ and ζ, desired dimensionalities l_1, l_2, ..., l_K, iteration steps T_max, difference ε.
Output: Feature tensors {F_i}_{i=1}^M extracted from Z, where F_i \in R^{l_1 x l_2 x ... x l_K}.
1. Initialize U_1^0 = I_{d_1}, U_2^0 = I_{d_2}, ..., U_K^0 = I_{d_K}, where I_d represents the d x d identity matrix;
2. For t = 1, 2, ..., T_max do
   For f = 1, 2, ..., K do
   (a) Compute Y_i^f by Eq.(12);
   (b) Unfold Y_i^f =>_f Y_i^{(f)};
   (c) Compute S̃ and C̃ using Eq.(13) and Eq.(14);
   (d) Do an eigenvalue decomposition of S̃ - C̃: (S̃ - C̃) U_f^t = U_f^t Λ_f, with U_f^t \in R^{d_f x l_f};
   (e) If ||U_f^t - U_f^{t-1}|| < ε, break;
   End for
   End for
3. Output F_i = Z_i ×_1 U_1^t ... ×_K U_K^t.

6. Experiments

In this section we investigate the performance of our proposed ANMM, Kernel ANMM (KANMM), and Tensor ANMM (TANMM) methods for face recognition. We performed three groups of experiments:

1. Linear methods. The performance of the original ANMM is compared with the traditional PCA method [16], the LDA (PCA+LDA) method [18], and three margin based methods, namely the Maximum Margin Criterion (MMC) method [6], the Stepwise Nonparametric Maximum Margin Criterion (SNMMC) method [23], and the Marginal Fisher Analysis (MFA) method [21];
2. Kernel methods. The performance of the KANMM method is compared with Kernel PCA (KPCA) and the Kernel Discriminant Analysis (KDA) method [17];
3. Tensor methods. The performance of the Tensor ANMM (TANMM) method is compared with the Tensor PCA (TPCA) and Tensor LDA (TLDA) methods [4].

In this study, three face datasets are used:

1. The ORL face dataset. There are ten images for each of the 40 subjects, taken at different times, with varying lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). The images were taken with a tolerance for some tilting and rotation of the face of up to 20 degrees. The original images (with 256 gray levels) have size 92 x 112 and are resized to 32 x 32 for efficiency;
2. The Yale face dataset. It contains 11 grayscale images for each of the 15 individuals, demonstrating variations in lighting condition (left-light, center-light, right-light), facial expression (normal, happy, sad, sleepy, surprised, and wink), and with/without glasses. In our experiments these images were also resized to 32 x 32;
3. The CMU PIE face dataset [22]. It contains 68 individuals with 41,368 face images in total. The face images were captured by 13 synchronized cameras and 21 flashes under varying pose, illumination, and expression. In our experiments, five near-frontal poses (C05, C07, C09, C27, C29) are selected under different illuminations, lighting and expressions, which leaves 170 near-frontal face images per individual; all of these images were also resized to 32 x 32.

The free parameters of the tested methods were determined in the following ways:

1. For the ANMM-series methods (ANMM, KANMM, TANMM), the sizes of the homogeneous and heterogeneous neighborhoods of each data point are all set to 10;
2. For the kernel methods, we adopt the Gaussian kernel, and the variance of the Gaussian kernel is set by cross-validation;
3. For the tensor methods, we require that the projected images are also square, i.e. of dimension r x r for some r.
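Returning for a moment to the tensor algorithm of Section 5: to ground the alternating scheme of Table 3, here is a compact sketch (our own simplification, not the authors' code) of the 2nd-order case, where every sample is a matrix X_i and we adopt the convention Y_i = U_1^T X_i U_2 with U_f of size d_f x l_f. Neighborhoods are fixed once from Frobenius distances between the raw samples, and the two modes are updated in turn from the eigendecomposition of the corresponding S̃ - C̃; all names and default values are ours.

```python
import numpy as np

def neighborhoods(X, y, xi, zeta):
    """Homogeneous / heterogeneous neighborhoods from Frobenius distances."""
    N = len(X)
    D = np.array([[np.linalg.norm(X[i] - X[j]) for j in range(N)] for i in range(N)])
    homo, hetero = [], []
    for i in range(N):
        same = (y == y[i]); same[i] = False
        diff = (y != y[i])
        homo.append(np.where(same)[0][np.argsort(D[i, same])[:xi]])
        hetero.append(np.where(diff)[0][np.argsort(D[i, diff])[:zeta]])
    return homo, hetero

def top_eigvecs(A, l):
    vals, vecs = np.linalg.eigh(A)
    return vecs[:, np.argsort(vals)[::-1][:l]]

def tanmm_2d(X, y, l1, l2, xi=5, zeta=5, T=5):
    """Alternating TANMM sketch for matrix data, convention Y_i = U1^T X_i U2."""
    N, d1, d2 = X.shape
    U1, U2 = np.eye(d1)[:, :l1], np.eye(d2)[:, :l2]   # truncated-identity initialization
    homo, hetero = neighborhoods(X, y, xi, zeta)
    for _ in range(T):
        # mode-1 update: with U2 fixed, work with X_i U2 (d1 x l2)
        P = X @ U2
        S, C = np.zeros((d1, d1)), np.zeros((d1, d1))
        for i in range(N):
            for k in hetero[i]:
                Dk = P[i] - P[k]; S += Dk @ Dk.T / len(hetero[i])
            for j in homo[i]:
                Dj = P[i] - P[j]; C += Dj @ Dj.T / len(homo[i])
        U1 = top_eigvecs(S - C, l1)
        # mode-2 update: with U1 fixed, work with U1^T X_i (l1 x d2)
        Q = np.einsum('ab,nbc->nac', U1.T, X)
        S, C = np.zeros((d2, d2)), np.zeros((d2, d2))
        for i in range(N):
            for k in hetero[i]:
                Dk = Q[i] - Q[k]; S += Dk.T @ Dk / len(hetero[i])
            for j in homo[i]:
                Dj = Q[i] - Q[j]; C += Dj.T @ Dj / len(homo[i])
        U2 = top_eigvecs(S - C, l2)
    return U1, U2

X = np.random.randn(60, 16, 16)                   # 60 toy "images"
y = np.repeat(np.arange(3), 20)
U1, U2 = tanmm_2d(X, y, l1=4, l2=4)
Y = np.einsum('ab,nbc,cd->nad', U1.T, X, U2)      # projected 4 x 4 feature "images"
```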

[Figure 2. Face recognition accuracies on the ORL dataset with 2, 3, 4 images of each individual randomly selected for training.]
[Figure 3. Face recognition accuracies on the Yale dataset with 2, 3, 4 images per individual randomly selected for training.]
[Figure 4. Face recognition accuracies on the CMU PIE dataset with 5, 10, 20 images per individual randomly selected for training.]

The experimental results of the linear methods on the three datasets are shown in Fig. 2, Fig. 3 and Fig. 4 respectively. In all the figures, the abscissa represents the projected dimension and the ordinate is the average recognition accuracy over 50 independent runs. From the figures we can clearly see that the performance of ANMM is better than that of the other linear methods on all three datasets. Table 4 shows the experimental results of all the methods on the three datasets, where the value in each entry is the average recognition accuracy (in percent) over 50 independent trials and the number in brackets is the corresponding projected dimension. The table shows that the ANMM-series methods perform better than the traditional methods on the three datasets.

7. Conclusions and Discussions

In this paper we proposed a novel supervised linear feature extraction method named Average Neighborhood Margin Maximization (ANMM). For each data point, ANMM aims at pulling the neighboring points with the same class label towards it as near as possible, while simultaneously pushing the neighboring points with different labels away from it as far as possible. Moreover, as many computer vision and pattern recognition problems are intrinsically nonlinear or multilinear, we also derived the kernelized and tensorized counterparts of ANMM. Finally, experimental results on face recognition were presented to show the effectiveness of our proposed approaches.

Table 4. Face recognition results on the three datasets (%). Each entry is the average recognition accuracy, with the corresponding projected dimension in brackets; the three column groups are ORL (2 / 3 / 4 Train), Yale (2 / 3 / 4 Train) and CMU PIE (5 / 10 / 20 Train).

PCA    | 54.35(56)  64.71(64)  71.54(36) | 45.19(37)  51.91(35)  56.3(4)   | 46.64(24)   54.72(213)  67.17(241)
LDA    | 77.36(28)  86.96(39)  91.71(39) | 46.4(9)    59.25(13)  68.9(12)  | 57.5(62)    76.75(62)   88.6(61)
MMC    | 77.73(54)  85.98(29)  91.26(52) | 46.64(54)  58.8(56)   71.67(39) | 57.5(21)    77.56(215)  85.54(195)
SNMMC  | 79.23(49)  87.68(54)  93.59(36) | 49.5(49)   66.31(49)  78.57(47) | 66.45(223)  88(213)     91.2(22)
MFA    | 77.34(41)  87.19(33)  92.19(33) | 49.56(38)  64.6(38)   76.5(39)  | 63.6(21)    89(232)     88.69(25)
ANMM   | 82.13(37)  89.13(41)  95.84(43) | 55(41)     67.87(38)  89(41)    | 7.5(222)    82.8(23)    93.46(25)
KPCA   | 64.23(5)   75.25(54)  79.26(6)  | 49.34(45)  55.78(47)  62(54)    | 52.35(341)  62(384)     72.25(256)
KDA    | 89(38)     89.13(36)  93.12(38) | 52.35(14)  64.89(13)  71.95(14) | 62.13(67)   81.27(66)   92.11(65)
KANMM  | 85.46(5)   92.21(39)  96.13(53) | 54.62(54)  69.25(66)  87(62)    | 72.1(32)    82.41(28)   93.67(218)
TPCA   | 59.22(1²)  71.25(12²) 79.86(1²) | 55(7²)     57.23(11²) 62.3(1²)  | 51.17(1²)   56.65(13²)  69.9(11²)
TLDA   | 88(9²)     89.28(11²) 93.37(8²) | 51.25(9²)  66.19(1²)  75.88(9²) | 61(12²)     85(14²)     92.75(8²)
TANMM  | 85.87(1²)  92.54(9²)  96.22(11²)| 55.31(11²) 73(8²)     81.56(1²) | 73.2(12²)   82.78(9²)   94.32(11²)

As we mentioned in Section 2, linear feature extraction methods can also be viewed as learning a proper Mahalanobis distance in the original data space, so ANMM can also be used for distance metric learning. From this viewpoint, our algorithm is more efficient in that it only needs to learn the transformation matrix, rather than the whole covariance matrix as in traditional metric learning algorithms [15].

References

[1] A. K. Jain, B. Chandrasekaran. Dimensionality and Sample Size Considerations in Pattern Recognition Practice. In Handbook of Statistics. North Holland, Amsterdam.
[2] B. Schölkopf, A. Smola. Learning with Kernels. The MIT Press, Cambridge, Massachusetts.
[3] B. Schölkopf, A. Smola, K.-R. Müller. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10.
[4] D. Cai, X. He, J. Han. Subspace Learning Based on Tensor Analysis. Department of Computer Science Technical Report UIUCDCS-R-2005-2572, University of Illinois at Urbana-Champaign, 2005.
[5] F. De la Torre, M. A. O. Vasilescu. Linear and Multilinear (Tensor) Methods for Vision, Graphics, and Signal Processing. IEEE CVPR Tutorial.
[6] H. Li, T. Jiang, K. Zhang. Efficient and Robust Feature Extraction by Maximum Margin Criterion. In NIPS.
[7] H. S. Seung, D. D. Lee. The manifold ways of perception. Science.
[8] H. Wang, Q. Wu, L. Shi, Y. Yu, N. Ahuja. Out-of-Core Tensor Approximation of Multi-Dimensional Matrices of Visual Data. In Proceedings of ACM SIGGRAPH.
[9] H. Yu, J. Yang. A Direct LDA Algorithm for High Dimensional Data with Application to Face Recognition. Pattern Recognition.
[10] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York.
[11] J. Yang, D. Zhang, A. F. Frangi, J. Yang. Two-Dimensional PCA: A New Approach to Appearance-Based Face Representation and Recognition. IEEE TPAMI, 2004.
[12] J. Ye. Generalized Low Rank Approximations of Matrices. In Proceedings of ICML.
[13] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, New York, 2nd edition.
[14] K. Liu, Y. Cheng, J. Yang. A Generalized Optimal Set of Discriminant Vectors. Pattern Recognition.
[15] K. Q. Weinberger, J. Blitzer, L. K. Saul. Distance Metric Learning for Large Margin Nearest Neighbor Classification. In NIPS.
[16] M. A. Turk, A. P. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1): 71-96.
[17] M.-H. Yang. Kernel Eigenfaces vs.
Kernel Fisherfaces: Face Recognition Using Kernel Methods. In Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition.
[18] P. N. Belhumeur, J. Hespanha, D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. on PAMI.
[19] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, K.-R. Müller. Fisher Discriminant Analysis with Kernels. Neural Networks for Signal Processing IX, IEEE.
[20] S. T. Roweis, L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science.
[21] S. Yan, D. Xu, B. Zhang, H. Zhang. Graph Embedding: A General Framework for Dimensionality Reduction. In Proceedings of IEEE CVPR.
[22] T. Sim, S. Baker, M. Bsat. The CMU pose, illumination, and expression database. IEEE Trans. on PAMI.
[23] X. Qiu, L. Wu. Face Recognition by Stepwise Nonparametric Margin Maximum Criterion. In Proc. ICCV.
