Pattern Recognition 42 (2009) Contents lists available at ScienceDirect. Pattern Recognition. journal homepage:


Perturbation LDA: Learning the difference between the class empirical mean and its expectation

Wei-Shi Zheng a,c, J.H. Lai b,c,*, Pong C. Yuen d, Stan Z. Li e

a School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou, PR China
b Department of Electronics and Communication Engineering, School of Information Science and Technology, Sun Yat-sen University, Guangzhou, PR China
c Guangdong Province Key Laboratory of Information Security, PR China
d Department of Computer Science, Hong Kong Baptist University, Hong Kong
e Center for Biometrics and Security Research and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, PR China

ARTICLE INFO

Article history: Received 4 September 2006; received in revised form 9 July 2008; accepted September 2008.

Keywords: Fisher criterion; Perturbation analysis; Face recognition

ABSTRACT

Fisher's linear discriminant analysis (LDA) is popular for dimension reduction and extraction of discriminant features in many pattern recognition applications, especially biometric learning. In deriving the Fisher's LDA formulation, there is an assumption that the class empirical mean is equal to its expectation. However, this assumption may not be valid in practice. In this paper, from the perturbation perspective, we develop a new algorithm, called perturbation LDA (P-LDA), in which perturbation random vectors are introduced to learn the effect of the difference between the class empirical mean and its expectation in the Fisher criterion. This perturbation learning in the Fisher criterion yields new forms of within-class and between-class covariance matrices integrated with some perturbation factors. Moreover, a method is proposed for estimating the covariance matrices of the perturbation random vectors for practical implementation. The proposed P-LDA is evaluated on both synthetic data sets and real face image data sets.
Experimental results show that P-LDA outperforms the popular Fisher's LDA-based algorithms in the undersampled case. © 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Data in some applications such as biometric learning are of high dimension, while the available samples for each class are always limited. In view of this, dimension reduction is desirable, and at the same time it is also expected that data of different classes can be more easily separated in the lower-dimensional subspace. Among the techniques developed for this purpose, Fisher's linear discriminant analysis (LDA) [1-4] has been widely used as a powerful tool for extraction of discriminant features. The basic principle of Fisher's LDA is to find a projection matrix such that the ratio between the between-class variance and the within-class variance is maximized in a lower-dimensional feature subspace.

* Corresponding author at: Department of Electronics and Communication Engineering, School of Information Science and Technology, Sun Yat-sen University, Guangzhou, Guangdong, PR China. E-mail addresses: wszheng@ieee.org (W.-S. Zheng), stsljh@mail.sysu.edu.cn (J.H. Lai), pcyuen@comp.hkbu.edu.hk (Pong C. Yuen), szli@nlpr.ia.ac.cn (Stan Z. Li).

Footnote 1: LDA in this paper refers to Fisher's LDA. It is not a classifier but a feature extractor learning a low-rank discriminant subspace, in which any classifier can then be used to perform classification.

Due to the curse of high dimensionality and the limited number of training samples, the within-class scatter matrix $S_w$ is often singular, so that classical Fisher's LDA fails. This kind of singularity problem is known as the small sample size problem [5,6] in Fisher's LDA. Some well-known variants of Fisher's LDA have been developed to overcome this problem. Among them, Fisherface (PCA+LDA) [5], null-space LDA (N-LDA) [6-8] and regularized LDA (R-LDA) [9-13] are three representative algorithms. In PCA+LDA, Fisher's LDA is performed in a principal component subspace, in which the within-class covariance matrix is of full rank.
In N-LDA, the null space of the within-class covariance matrix $S_w$ is first extracted; data are then projected onto that subspace, and a discriminant transform is finally found there to maximize the variance among the between-class data. In R-LDA, a regularized term, such as $\lambda I$ with $\lambda > 0$, is added to $S_w$. Some other approaches, such as Direct LDA [14], LDA/QR [15] and some constrained LDA [16,17], have also been developed. Recently, efforts have been made to develop two-dimensional LDA techniques (2D-LDA) [18-21], which operate directly on matrix-form data. A recent study [22] conducts comprehensive theoretical and experimental comparisons between the traditional Fisher's LDA techniques and some representative 2D-LDA algorithms in the undersampled case.

0031-3203/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2008.09.

It is experimentally shown

that some two-dimensional LDA variants may perform better than Fisherface and some other traditional Fisher's LDA approaches in some cases, but R-LDA always performs better. However, estimation of the regularization parameter in R-LDA is hard. Though cross-validation (CV) is popularly used, it is time consuming. Moreover, it is still hard to fully interpret the impact of this regularized term.

From the geometrical view, Fisher's LDA makes different class means scatter and keeps data of the same class close to their corresponding class means. However, since the number of samples for each class is always limited in some applications such as biometric learning, the estimates of the class means are not accurate, and this degrades the power of the Fisher criterion. To specify this problem, we first revisit the derivation of Fisher's LDA.

Consider the classification problem of $L$ classes $C_1, \ldots, C_L$. Suppose the data space $\mathcal{X}$ ($\subset \mathbb{R}^n$) is a compact vector space and $\{(x_1^1, y_1^1), \ldots, (x_1^{N_1}, y_1^{N_1}), \ldots, (x_L^1, y_L^1), \ldots, (x_L^{N_L}, y_L^{N_L})\}$ is a set of finite samples. All data $x_1^1, \ldots, x_L^{N_L}$ are i.i.d., $x_i^j$ ($\in \mathcal{X}$) denotes the $j$th sample of class $C_i$ with class label $y_i^j$ (i.e., $y_i^j = C_i$), and $N_i$ is the number of samples of class $C_i$. The empirical mean of each class is then given by $\hat{u}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_i^j$ and the total sample mean by $\hat{u} = \frac{1}{N}\sum_{i=1}^{L} N_i \hat{u}_i$, where $N = \sum_{i=1}^{L} N_i$ is the total number of training samples. The goal of LDA under the Fisher criterion is to find an optimal projection matrix by optimizing Eq. (1):

$$\hat{W}_{opt} = \arg\max_W \operatorname{trace}(W^T \hat{S}_b W)/\operatorname{trace}(W^T \hat{S}_w W), \qquad (1)$$

where $\hat{S}_b$ and $\hat{S}_w$ are the between-class and within-class covariance (scatter) matrices, respectively, defined as

$$\hat{S}_b = \frac{1}{N}\sum_{i=1}^{L} N_i (\hat{u}_i - \hat{u})(\hat{u}_i - \hat{u})^T, \qquad (2)$$

$$\hat{S}_w = \frac{1}{N}\sum_{i=1}^{L} N_i \hat{S}_i, \quad \hat{S}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} (x_i^j - \hat{u}_i)(x_i^j - \hat{u}_i)^T. \qquad (3)$$

It has been proved that Eq. (2) can be written equivalently as

$$\hat{S}_b = \sum_{i=1}^{L}\sum_{s=1}^{L} \frac{N_i N_s}{2N^2} (\hat{u}_i - \hat{u}_s)(\hat{u}_i - \hat{u}_s)^T. \qquad (4)$$

For the formulation of Fisher's LDA, two basic assumptions are always used. First, the class distribution is assumed to be Gaussian. Second, the class empirical mean is in practice used to approximate its expectation.
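The scatter matrices of Eqs. (1)-(3) are mechanical to compute. The authors' experiments used Matlab (Section 4); the NumPy sketch below is our own illustration, and the function name `fisher_lda` and the pseudo-inverse fallback for a singular $S_w$ are our assumptions, not the paper's implementation.

```python
import numpy as np

def fisher_lda(X, y, dim):
    """Classical Fisher's LDA sketch: maximize the ratio in Eq. (1).

    X: (N, n) data matrix, y: (N,) integer class labels,
    dim: output dimension. Returns a projection W of shape (n, dim)."""
    classes = np.unique(y)
    N, n = X.shape
    mean_all = X.mean(axis=0)
    Sw = np.zeros((n, n))
    Sb = np.zeros((n, n))
    for c in classes:
        Xc = X[y == c]
        Nc = len(Xc)
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c) / N        # Eq. (3), pooled within-class scatter
        d = (mu_c - mean_all)[:, None]
        Sb += Nc / N * (d @ d.T)                     # Eq. (2), between-class scatter
    # Solve Sb w = lambda Sw w; pinv guards against a singular Sw
    # (the small sample size problem discussed in the Introduction).
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)
    return evecs[:, order[:dim]].real
```

With the pseudo-inverse replaced by a regularized inverse of $S_w + \lambda I$, the same sketch becomes R-LDA as described in the Introduction.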
Although Fisher's LDA has been attracting attention for more than thirty years, as far as we know there is little research addressing the second assumption and investigating the effect of the difference between the class empirical mean and its expectation in the Fisher criterion. As we know, $\hat{u}_i$ is the estimate of $E_{x' \in C_i}[x']$, the expectation of class $C_i$, based on the maximum likelihood criterion. The substitution of the expectation $E_{x' \in C_i}[x']$ with its empirical mean $\hat{u}_i$ is based on the assumption that the sample size is large enough to reflect the data distribution of each class. Unfortunately, this assumption is not always true in some applications, especially biometric learning. Hence the impact of the difference between these two terms should not be ignored.

In view of this, this paper studies the effect of the difference between the class empirical mean and its expectation in the Fisher criterion. We note that such a difference is almost impossible to specify, since $E_{x' \in C_i}[x']$ is usually hard (if not impossible) to determine. Hence, from the perturbation perspective, we introduce perturbation random vectors to stochastically describe this difference. Based on the proposed perturbation model, we then analyze how the perturbation random vectors take effect in the Fisher criterion. Perturbation learning yields new forms of within-class and between-class covariance matrices by integrating some perturbation factors, and the new Fisher's LDA formulation based on these two new estimated covariance matrices is therefore called perturbation LDA (P-LDA). In addition, a semi-perturbation LDA, which gives a novel view of R-LDA, will also be discussed. Although there is related work on covariance matrix estimation for designing classifiers, such as RDA [23], its similar work [24], and EDDA [25], the objective of P-LDA is different from theirs: RDA and EDDA are not based on the Fisher criterion and are classifiers, while P-LDA is a feature extractor and does not predict the class label of any data as output.
P-LDA extracts a subspace for dimension reduction, which RDA and EDDA do not. Moreover, the perturbation model used in P-LDA has not been considered in RDA and EDDA, so the methodology of P-LDA is different from theirs. This paper focuses on the Fisher criterion, while classifier analysis is beyond our scope. To the best of our knowledge, there is no similar work addressing the Fisher criterion using the proposed perturbation model.

The remainder of this paper is outlined as follows. The proposed P-LDA is introduced in Section 2. The implementation details are presented in Section 3. P-LDA is then evaluated using three synthetic data sets and three large human face data sets in Section 4. Discussion and conclusion are given in Sections 5 and 6, respectively.

2. P-LDA: a new formulation

The proposed method is developed based on the idea of perturbation analysis. A theoretical analysis is given, and a new formulation is proposed by learning the difference between the class empirical mean and its expectation, as well as its impact on the estimation of the covariance matrices in the Fisher criterion. In Section 2.1, we first consider the case when the data of each class follow a single Gaussian distribution. The theory is then extended to the mixture of Gaussians case in Section 2.2. The implementation details of the proposed formulation are given in Section 3.

2.1. P-LDA under single Gaussian distribution

Assume the data of each class are normally distributed. Given a specific input $(x, y)$, where sample $x \in \mathcal{X}$ and class label $y \in \{C_1, \ldots, C_L\}$, we first try to study the difference between a sample $x$ and $E_{x' \in y}[x']$, the expectation of class $y$, in the Fisher criterion. However, $E_{x' \in y}[x']$ is usually hard (if not impossible) to determine, so it may be impossible to specify this difference exactly. Therefore, our strategy is to stochastically characterize (simulate) the difference between $x$ and $E_{x' \in y}[x']$ by a random vector, and then model a random mean for class $y$ to stochastically describe $E_{x' \in y}[x']$.
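The gap that motivates this strategy, that the empirical mean of a few samples drifts away from the expectation, is easy to see numerically. The toy dimensions, seed and sample sizes below are our own illustrative choices, not values from the paper.

```python
import numpy as np

# Toy illustration of the motivating gap: the empirical class mean u_hat
# deviates from the expectation E[x' | y] when the per-class sample size
# N_i is small; this is the difference the perturbation vectors simulate.
rng = np.random.default_rng(1)
true_mean = np.array([1.0, -2.0, 0.5])  # stands in for E[x' | y]
gaps = {}
for N_i in (2, 100, 10000):
    X = rng.normal(true_mean, 1.0, size=(N_i, 3))
    gaps[N_i] = np.linalg.norm(X.mean(axis=0) - true_mean)
    print(N_i, gaps[N_i])  # the gap shrinks as N_i grows
```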
Define $n_x$ ($\in \mathbb{R}^n$) as a perturbation random vector for stochastic description (simulation) of the difference between $x$ and $E_{x' \in y}[x']$. When the data of each class follow a normal distribution, we can model $n_x$ as a random vector drawn from the normal distribution with mean $\mathbf{0}$ and covariance matrix $X_y$, i.e.,

$$n_x \sim \mathcal{N}(\mathbf{0}, X_y), \quad X_y \in \mathbb{R}^{n \times n}. \qquad (5)$$

We call $X_y$ the perturbation covariance matrix of $n_x$. The above model assumes that the covariance matrix $X_y$ of $n_x$ is the same for any sample $x$ with the same class label $y$. Note that a natural ideal value of $X_y$ would be the expected covariance matrix of class $y$, i.e., $E_{x' \in y}[(x' - E_{x' \in y}[x'])(x' - E_{x' \in y}[x'])^T]$. However, this value

is usually hard to determine, since $E_{x' \in y}[x']$ and the true density function are not available. Actually, this kind of estimation need not be our goal. Note that the perturbation random vector $n_x$ is only used for stochastic simulation of the difference between the specific sample $x$ and its expectation $E_{x' \in y}[x']$. Therefore, in our study, $X_y$ only needs to be estimated well enough to perform such simulation based on the perturbation model specified by Eqs. (6) and (7) below, finally resulting in proper corrections (perturbations) of the empirical between-class and within-class covariance matrices, as shown later. For this goal, a random vector is first formulated for any sample $x$ to stochastically approximate $E_{x' \in y}[x']$:

$$\tilde{x} = x + n_x. \qquad (6)$$

The stochastic approximation of $\tilde{x}$ to $E_{x' \in y}[x']$ means there exists a specific estimate $\hat{n}_x$ of the random vector $n_x$ with respect to the corresponding distribution such that

$$x + \hat{n}_x = E_{x' \in y}[x']. \qquad (7)$$

Formally, we call Eqs. (6) and (7) the perturbation model. It is not hard to see that such a perturbation model is always satisfied. The main problem is how to model $X_y$ properly; a technique for this purpose will be suggested in the next section.

Now, for any training sample $x_i^j$, we can formulate its corresponding perturbation random vector $n_i^j \sim \mathcal{N}(\mathbf{0}, X_{C_i})$ and the random vector $\tilde{x}_i^j = x_i^j + n_i^j$ to stochastically approximate its expectation $E_{x' \in C_i}[x']$. By considering the perturbation impact, $E_{x' \in C_i}[x']$ can be stochastically approximated on average by

$$\tilde{u}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} \tilde{x}_i^j = \hat{u}_i + \frac{1}{N_i}\sum_{j=1}^{N_i} n_i^j. \qquad (8)$$

Note that $\tilde{u}_i$ can only stochastically, not exactly, describe $E_{x' \in C_i}[x']$, so it is called the random mean of class $C_i$ in our study. After introducing the random mean of each class, a new form of Fisher's LDA is developed below by integrating the factors of the perturbation between the class empirical mean and its expectation into the supervised learning process, so that new forms of the between-class and within-class covariance matrices are obtained.
Since $\tilde{u}_i$ and $\tilde{u}$ are both random vectors, we take expectations with respect to the probability measures on their probability spaces. To keep the presentation clear, we denote the sets of random vectors $n_i = \{n_i^1, \ldots, n_i^{N_i}\}$, $i = 1, \ldots, L$, and $n = \{n_1^1, \ldots, n_1^{N_1}, \ldots, n_L^1, \ldots, n_L^{N_L}\}$. Since $x_1^1, \ldots, x_L^{N_L}$ are i.i.d., it is reasonable to assume that $n_1^1, \ldots, n_L^{N_L}$ are also independent. A new within-class covariance matrix of class $C_i$ is then formed:

$$S_i = E_{n_i}\left[\frac{1}{N_i}\sum_{j=1}^{N_i}(x_i^j - \tilde{u}_i)(x_i^j - \tilde{u}_i)^T\right] = \hat{S}_i + \frac{1}{N_i} X_{C_i}. \qquad (9)$$

So a new within-class covariance matrix is established by

$$S_w = \frac{1}{N}\sum_{i=1}^{L} N_i S_i = \hat{S}_w + \frac{1}{N}\sum_{i=1}^{L} X_{C_i} = \hat{S}_w + S_w^{\Delta}, \qquad (10)$$

where $S_w^{\Delta} = \frac{1}{N}\sum_{i=1}^{L} X_{C_i}$.

Footnote 2: In this paper the hat notation is always added over the corresponding random vector to indicate an estimate of that random vector. As analyzed later, $\hat{n}_x$ does not need to be estimated directly; a technique will be introduced to estimate the information about $\hat{n}_x$.

Next, following Eqs. (2) and (4), a new between-class covariance matrix is given by

$$S_b = E_n\left[\sum_{i=1}^{L}\sum_{s=1}^{L}\frac{N_i N_s}{2N^2}(\tilde{u}_i - \tilde{u}_s)(\tilde{u}_i - \tilde{u}_s)^T\right] = \hat{S}_b + S_b^{\Delta}, \qquad (11)$$

where $\tilde{u} = \frac{1}{N}\sum_{i=1}^{L} N_i \tilde{u}_i = \hat{u} + \frac{1}{N}\sum_{i=1}^{L}\sum_{j=1}^{N_i} n_i^j$ and $S_b^{\Delta} = \sum_{i=1}^{L}\frac{N - N_i}{N^2} X_{C_i}$. The details of the derivation of Eqs. (9) and (11) can be found in Appendix A. From the above analysis, a new formulation of Fisher's LDA, called perturbation LDA (P-LDA), is given by the following theorem.

Theorem 1 (P-LDA). Under the Gaussian distribution of within-class data, perturbation LDA (P-LDA) finds a linear projection matrix $W_{opt}$ such that

$$W_{opt} = \arg\max_W \frac{\operatorname{trace}(W^T S_b W)}{\operatorname{trace}(W^T S_w W)} = \arg\max_W \frac{\operatorname{trace}(W^T(\hat{S}_b + S_b^{\Delta})W)}{\operatorname{trace}(W^T(\hat{S}_w + S_w^{\Delta})W)}. \qquad (12)$$

Here, $S_b^{\Delta}$ and $S_w^{\Delta}$ are called the between-class and within-class perturbation covariance matrices, respectively.

Finally, we further interpret the effects of the covariance matrices $S_w$ and $S_b$ based on Eq. (12). Suppose $W = (w_1, \ldots, w_q)$ in Eq. (12), where each $w_m$ ($\in \mathbb{R}^n$) is a feature vector. Then for any $W$ and random vectors $n = \{n_i^j\}_{i=1,\ldots,L}^{j=1,\ldots,N_i}$, we define

$$f_b(W, n) = \sum_{i=1}^{L}\sum_{s=1}^{L}\frac{N_i N_s}{2N^2}\sum_{m=1}^{q}\left(w_m^T(\tilde{u}_i - \tilde{u}_s)\right)^2, \qquad (13)$$

$$f_w(W, n) = \frac{1}{N}\sum_{i=1}^{L}\sum_{j=1}^{N_i}\sum_{m=1}^{q}\left(w_m^T(x_i^j - \tilde{u}_i)\right)^2. \qquad (14)$$

Noting that $\tilde{u}_i = \hat{u}_i + \frac{1}{N_i}\sum_{j=1}^{N_i} n_i^j$ is the random mean of class $C_i$, $f_b(W, n)$ is the (weighted) average pairwise distance between the random means of different classes, and $f_w(W, n)$ is the average distance between any sample and the random mean of its corresponding class in the lower-dimensional space. Define the following model:

$$W_{opt}(n) = \arg\max_W f_b(W, n)/f_w(W, n).$$

Given specific estimates $\hat{n} = \{\hat{n}_i^j\}$, we could then get a projection $W_{opt}(\hat{n})$. In practice, however, it would be hard to find a proper estimate $\hat{n}$ that accurately describes the difference between $x_i^j$ and its expectation $E_{x' \in C_i}[x']$. Rather than accurately estimating such an $\hat{n}$, we instead consider finding the projection by maximizing the ratio

between the expectation values of $f_b(W, n)$ and $f_w(W, n)$ with respect to $n$, so that the uncertainty is considered over the domain of $n$. That is,

$$W_{opt} = \arg\max_W E_n[f_b(W, n)]/E_n[f_w(W, n)] = \arg\max_W f_b(W)/f_w(W).$$

It can be verified that

$$f_b(W) = E_n[f_b(W, n)] = \operatorname{trace}(W^T S_b W), \qquad (15)$$

$$f_w(W) = E_n[f_w(W, n)] = \operatorname{trace}(W^T S_w W). \qquad (16)$$

So this is exactly the optimization model formulated in Eq. (12), which gives a more intuitive understanding of the effects of the covariance matrices $S_w$ and $S_b$. Though in P-LDA $\hat{S}_w$ and $\hat{S}_b$ are perturbed by $S_w^{\Delta}$ and $S_b^{\Delta}$, respectively, in Section 5 we will show that $S_w$ and $S_b$ converge to the precise within-class and between-class covariance matrices, respectively. This shows the rationality of P-LDA, since the class empirical mean is almost equal to its expectation when the sample size is large enough, and the perturbation effect can then be ignored.

2.2. P-LDA under mixture of Gaussian distribution

This section extends Theorem 1 by altering the class distribution from a single Gaussian to a mixture of Gaussians [3]. Therefore, the probability density function of a sample $x$ in class $C_i$ is

$$p(x \mid C_i) = \sum_{k=1}^{I_i} P(\pi_i^k)\,\mathcal{N}(x \mid u_i^k, \Sigma_i^k), \qquad (17)$$

where $u_i^k$ is the expectation of $x$ in the $k$th Gaussian component (GC) $\mathcal{N}(x \mid u_i^k, \Sigma_i^k)$ of class $C_i$, $\Sigma_i^k$ is its covariance matrix, and $P(\pi_i^k)$ is the prior probability of the $k$th GC of class $C_i$. Such a density function indicates that any sample $x$ in class $C_i$ mainly distributes in one of the GCs. Therefore, Theorem 1 under a single Gaussian distribution can be extended to learning the perturbation in each GC. To do so, the clusters within each class should first be determined such that the data in each cluster are approximately normally distributed. Those clusters are then labeled as subclasses, and finally P-LDA is used to learn the discriminant information of all those subclasses. This is similar to the idea of Zhu and Martinez [26], who extended classical Fisher's LDA to the mixture of Gaussians case.

In detail, suppose there are $I_i$ GCs (clusters) in class $C_i$ and $N_i^k$ out of all samples are in the $k$th GC of class $C_i$. Let $C_i^k$ denote the $k$th GC of class $C_i$. If we denote $x_i^{k,s}$ as the $s$th sample of $C_i^k$, $s = 1, \ldots, N_i^k$, then a perturbation random vector $n_i^{k,s}$ can be modeled for $x_i^{k,s}$, where $n_i^{k,s} \sim \mathcal{N}(\mathbf{0}, X_{C_i^k})$, $X_{C_i^k} \in \mathbb{R}^{n \times n}$, so that $\tilde{x}_i^{k,s} = x_i^{k,s} + n_i^{k,s}$ is a random vector that stochastically describes the expectation of subclass $C_i^k$, i.e., $u_i^k$. Then P-LDA can be extended to the mixture of Gaussians case by classifying the subclasses $\{C_i^k\}_{i=1,\ldots,L}^{k=1,\ldots,I_i}$. Thus we get the following theorem, a straightforward extension of Theorem 1; the proof is omitted.

Theorem 2. Under a Gaussian mixture distribution of the data within each class, the projection matrix of perturbation LDA (P-LDA), $W_{opt}$, can be found as

$$W_{opt} = \arg\max_W \frac{\operatorname{trace}(W^T S_b' W)}{\operatorname{trace}(W^T S_w' W)} = \arg\max_W \frac{\operatorname{trace}(W^T(\hat{S}_b' + S_b'^{\Delta})W)}{\operatorname{trace}(W^T(\hat{S}_w' + S_w'^{\Delta})W)}, \qquad (18)$$

where, with $\hat{u}_i^k = \frac{1}{N_i^k}\sum_{s=1}^{N_i^k} x_i^{k,s}$ and $\tilde{u}_i^k = \hat{u}_i^k + \frac{1}{N_i^k}\sum_{s=1}^{N_i^k} n_i^{k,s}$,

$$\hat{S}_b' = \sum_{i=1}^{L}\sum_{k=1}^{I_i}\sum_{j=1}^{L}\sum_{s=1}^{I_j}\frac{N_i^k N_j^s}{2N^2}(\hat{u}_i^k - \hat{u}_j^s)(\hat{u}_i^k - \hat{u}_j^s)^T, \quad S_b'^{\Delta} = \sum_{i=1}^{L}\sum_{k=1}^{I_i}\frac{N - N_i^k}{N^2} X_{C_i^k},$$

$$\hat{S}_w' = \frac{1}{N}\sum_{i=1}^{L}\sum_{k=1}^{I_i} N_i^k \hat{S}_i^k, \quad \hat{S}_i^k = \frac{1}{N_i^k}\sum_{s=1}^{N_i^k}(x_i^{k,s} - \hat{u}_i^k)(x_i^{k,s} - \hat{u}_i^k)^T, \quad S_w'^{\Delta} = \frac{1}{N}\sum_{i=1}^{L}\sum_{k=1}^{I_i} X_{C_i^k}.$$

3. Estimation of perturbation covariance matrices

For the implementation of P-LDA, we need to properly estimate the two perturbation covariance matrices $S_b^{\Delta}$ and $S_w^{\Delta}$. Parameter estimation is challenging, since it is always ill-posed [30,31] due to the limited sample size and the curse of high dimensionality.
A more robust and tractable way to overcome this problem is to perform some regularized estimation, which is indeed the motivation here. A method will be suggested to implement P-LDA with parameter estimation in an entire PCA subspace without discarding any nonzero principal component. Unlike covariance matrix estimation on sample data, we introduce an indirect way to estimate the covariance matrices of the perturbation random vectors, since observation values of the perturbation random vectors are hard to obtain directly. For the derivation, parameter estimation focuses on P-LDA under the single Gaussian distribution; it can easily be generalized to the Gaussian mixture case by Theorem 2.

Footnote 3: The designs of $S_b'$ and $S_w'$ in the criterion are not restricted to the presented forms. The goal here is just to present one way to generalize the analysis under the single Gaussian case.

This section

is divided into two parts. The first part suggests regularized models for estimation of the parameters, and a method for parameter estimation is then presented in the second part.

3.1. Simplified models for regularized estimation

In this paper, we restrict our attention to data that are not very heteroscedastic, i.e., the class covariance matrices are approximately equal (or do not differ too much); see Footnote 4. This is also in line with one of the conditions under which the Fisher criterion is optimal [3]. Under this condition, we consider the case when the perturbation covariance matrices of all classes are approximately equal, so that they can be replaced by their average, the pooled perturbation covariance matrix defined in Eq. (19). We obtain Lemma 1, with its proof provided in Appendix B.

Lemma 1. If the covariance matrices of all perturbation random vectors are replaced by their average, i.e., a pooled perturbation covariance matrix

$$X_{C_1} = X_{C_2} = \cdots = X_{C_L} = \bar{X}, \qquad (19)$$

then $S_b^{\Delta}$ and $S_w^{\Delta}$ can be rewritten as

$$S_b^{\Delta} = \frac{L-1}{N}\bar{X}, \qquad S_w^{\Delta} = \frac{L}{N}\bar{X}. \qquad (20)$$

Note that when the class covariance matrices of the data do not differ too much, using a pooled covariance matrix in place of the individual covariance matrices has been widely used and experimentally suggested to attenuate ill-posed estimation in many existing algorithms [23,24,27-31].

To develop a more simplified model in the entire principal component space, we perform principal component analysis [3] on $\mathcal{X}$ without discarding any nonzero principal component. In practice, the principal components can be acquired from the eigenvectors of the total-class covariance matrix $\hat{S}_t$ ($= \hat{S}_w + \hat{S}_b$). When the data dimension is much larger than the total sample size, the rank of $\hat{S}_t$ is at most $N-1$ [5,31], i.e., $\operatorname{rank}(\hat{S}_t) \le N-1$. In general, $\operatorname{rank}(\hat{S}_t)$ is equal to $N-1$, and for convenience of analysis we assume $\operatorname{rank}(\hat{S}_t) = N-1$. This also implies that no information is lost for Fisher's LDA, since all positive principal components are retained [33].

Suppose we are given the decorrelated data space $\bar{\mathcal{X}}$, the entire PCA space of dimension $\bar{n} = N-1$. Based on Eq. (6) and Lemma 1, for any given input sample $x = (x_1, \ldots, x_{\bar{n}})^T \in \bar{\mathcal{X}}$, its corresponding perturbation random vector is $n_x = (\xi_x^1, \ldots, \xi_x^{\bar{n}})^T \in \mathbb{R}^{\bar{n}}$, where $n_x \sim \mathcal{N}(\mathbf{0}, \bar{X})$. Since $\bar{\mathcal{X}}$ is decorrelated, the coefficients $x_1, \ldots, x_{\bar{n}}$ are approximately uncorrelated. Note that the perturbation variables $\xi_x^1, \ldots, \xi_x^{\bar{n}}$ are apparently only correlated with their corresponding uncorrelated coefficients $x_1, \ldots, x_{\bar{n}}$, respectively. Therefore, $\bar{X}$ can be modeled by assuming the random variables $\xi_x^1, \ldots, \xi_x^{\bar{n}}$ to be mutually uncorrelated (see Footnote 5). Based on this principle, $\bar{X}$ can be modeled by

$$\bar{X} = K, \quad K = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_{\bar{n}}^2), \qquad (21)$$

where $\sigma_k^2$ is the variance of $\xi_x^k$. Furthermore, if the average variance $\bar{\sigma}^2 = \frac{1}{\bar{n}}\sum_{k=1}^{\bar{n}} \sigma_k^2$ is used to replace each individual variance $\sigma_k^2$, $k = 1, \ldots, \bar{n}$, a special model is acquired:

$$\bar{X} = \bar{\sigma}^2 I, \quad \bar{\sigma}^2 \ge 0, \qquad (22)$$

where $I$ is the $\bar{n} \times \bar{n}$ identity matrix.

From the statistical point of view, the above simplified models can be interpreted as regularized estimations [5] of $\bar{X}$ for the perturbation random vectors. It is known that when the dimensionality of the data is high, estimation becomes ill-posed (poorly posed) if the number of parameters to be estimated is larger than (or comparable to) the number of samples [30,31]. Moreover, estimation of $\bar{X}$ relates to information about an expectation value, which is hard to specify in practice. Hence, regularized estimation of $\bar{X}$ is preferred to alleviate the ill-posed problem and obtain a stable estimate in applications. To this end, estimation based on Eq. (22) may be more stable than estimating $K$, since Eq. (22) apparently reduces the number of estimated parameters.

Footnote 4: Discussing variants of Fisher's LDA under unequal class covariance matrices is not in the scope of this paper; it is another research topic [39].

Footnote 5: This might in theory be a suboptimal strategy. However, the assumption is practically useful and reasonable for alleviating the ill-posed estimation problem for high-dimensional data, since it reduces the number of estimated parameters. In Appendix D, we show its practical rationality through an experimental verification of this assumption on the face data sets used in the experiments.
This will be demonstrated and justified on synthetic data in the experiments. Finally, this simplified perturbation model is still in line with the perturbation LDA model, since the perturbation matrices $X_{C_i}$ as well as their average $\bar{X}$ need not be the accurate expected class covariance matrices; they only need to follow the perturbation model given below Eq. (5).

3.2. Estimating parameters

An important remaining issue is to estimate the variance parameters $\sigma_1^2, \ldots, \sigma_{\bar{n}}^2$ and $\bar{\sigma}^2$. The idea is straightforward: the parameters are learned from generated observation values of the perturbation random vectors using maximum likelihood. However, an indirect way is needed, since it is impossible to find realizations of the perturbation random vectors directly. Hence, our idea is to find certain sums of perturbation random vectors based on the perturbation model and then generate their realizations for estimation.

3.2.1. Inferring the sum of perturbation random vectors

Suppose $N_i$, the number of training samples for class $C_i$, is larger than 1. Define the average of the observed samples in class $C_i$ excluding $x_i^j$ as

$$\hat{u}_i^{(j)} = \frac{1}{N_i - 1}\sum_{k=1, k \neq j}^{N_i} x_i^k, \quad j = 1, \ldots, N_i. \qquad (23)$$

It is actually feasible to treat $\hat{u}_i^{(j)}$ as another empirical mean of class $C_i$. Then another random mean of class $C_i$ can be formulated as

$$\tilde{u}_i^{(j)} = \frac{1}{N_i - 1}\sum_{k=1, k \neq j}^{N_i} \tilde{x}_i^k = \hat{u}_i^{(j)} + \frac{1}{N_i - 1}\sum_{k=1, k \neq j}^{N_i} n_i^k. \qquad (24)$$

Comparing with $\tilde{u}_i$, the random mean of class $C_i$ in terms of Eq. (8), and based on the perturbation model, we know that $\tilde{u}_i$ and $\tilde{u}_i^{(j)}$ can both stochastically approximate $E_{x' \in C_i}[x']$ through the following specific estimates, respectively:

$$\hat{\tilde{u}}_i = \frac{1}{N_i}\sum_{k=1}^{N_i} \hat{\tilde{x}}_i^k = E_{x' \in C_i}[x'], \qquad (25)$$

$$\hat{\tilde{u}}_i^{(j)} = \frac{1}{N_i - 1}\sum_{k=1, k \neq j}^{N_i} \hat{\tilde{x}}_i^k = E_{x' \in C_i}[x'], \qquad (26)$$

where $\hat{\tilde{x}}_i^k = x_i^k + \hat{n}_i^k$, and $\hat{n}_i^k$ is an estimate of $n_i^k$ such that $x_i^k + \hat{n}_i^k = E_{x' \in C_i}[x']$ based on the perturbation model. Hence, we have the

relation below:

$$\hat{\tilde{u}}_i = \hat{\tilde{u}}_i^{(j)}. \qquad (27)$$

Fig. 1. Geometric interpretation: $\alpha = \|x_i^{j_1} - x_i^{j_2}\| = \|\hat{n}_i^{j_1} - \hat{n}_i^{j_2}\|$.

A geometric interpretation of Eq. (27) is provided by Fig. 1. Note that $\hat{\tilde{u}}_i = \hat{\tilde{u}}_i^{(j_1)} = \hat{\tilde{u}}_i^{(j_2)}$ for any $j_1, j_2$. It therefore yields $x_i^{j_1} - x_i^{j_2} = \hat{n}_i^{j_2} - \hat{n}_i^{j_1}$. According to Eq. (7), this is obviously true because $\hat{\tilde{x}}_i^j = x_i^j + \hat{n}_i^j = E_{x' \in C_i}[x']$, $j = 1, \ldots, N_i$.

Now return to the methodology. Based on Eq. (27), we then have

$$\frac{1}{N_i}\left[\hat{n}_i^j - \frac{1}{N_i - 1}\sum_{k=1, k \neq j}^{N_i} \hat{n}_i^k\right] = \hat{u}_i^{(j)} - \hat{u}_i. \qquad (28)$$

Define a new random vector as

$$\bar{n}_i^j = \frac{1}{N_i}\left[n_i^j - \frac{1}{N_i - 1}\sum_{k=1, k \neq j}^{N_i} n_i^k\right]. \qquad (29)$$

Based on Lemma 1, we know that the pooled perturbation covariance matrix to be estimated for all $\{n_i^j\}$ is $\bar{X}$. It is therefore easy to verify that

$$\bar{n}_i^j \sim \mathcal{N}\left(\mathbf{0}, \frac{1}{N_i(N_i - 1)}\bar{X}\right). \qquad (30)$$

Actually, $\bar{n}_i^j$ is just the sum of perturbation random vectors we aim to find. Moreover, Eq. (28) provides an estimate of $\bar{n}_i^j$:

$$\hat{\bar{n}}_i^j = \hat{u}_i^{(j)} - \hat{u}_i. \qquad (31)$$

This avoids the difficulty of finding the observation values $\hat{n}_i^j$ directly. Moreover, since the $\{\bar{n}_i^j\}_{j=1,\ldots,N_i}$ follow the same distribution within class $C_i$, i.e., $\mathcal{N}(\mathbf{0}, \frac{1}{N_i(N_i-1)}\bar{X})$, it is feasible to treat $\{\hat{\bar{n}}_i^1, \ldots, \hat{\bar{n}}_i^{N_i}\}$ as observation values generated from this distribution. In fact, the empirical mean of these observation values coincides with their expectation with respect to the distribution because of the following equality:

$$\frac{1}{N_i}\sum_{j=1}^{N_i} \hat{\bar{n}}_i^j = \frac{1}{N_i}\sum_{j=1}^{N_i}\left(\hat{u}_i^{(j)} - \hat{u}_i\right) = \mathbf{0}. \qquad (32)$$

3.2.2. Inferring estimates of $\sigma_1^2, \ldots, \sigma_{\bar{n}}^2$ and $\bar{\sigma}^2$

The estimates of $\sigma_1^2, \ldots, \sigma_{\bar{n}}^2$ and $\bar{\sigma}^2$ are given below based on Eq. (30) and the generated $\{\hat{\bar{n}}_i^j\}_{i=1,\ldots,L}^{j=1,\ldots,N_i}$. First we denote

$$\hat{u}_{\Delta}^{ij} = \hat{u}_i^{(j)} - \hat{u}_i = \left(\hat{u}_{\Delta}^{ij}(1), \ldots, \hat{u}_{\Delta}^{ij}(\bar{n})\right)^T. \qquad (33)$$

Then we define $\hat{\sigma}_k^2(i, j)$ satisfying

$$\frac{1}{N_i(N_i - 1)}\hat{\sigma}_k^2(i, j) = \left(\hat{u}_{\Delta}^{ij}(k)\right)^2. \qquad (34)$$

In the uncorrelated space, $\bar{X}$ is modeled by $\bar{X} = K = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_{\bar{n}}^2)$ for approximation, so $\sigma_1^2, \ldots, \sigma_{\bar{n}}^2$ are estimated as $\hat{\sigma}_1^2, \ldots, \hat{\sigma}_{\bar{n}}^2$ by maximum likelihood:

$$\hat{\sigma}_k^2 = \frac{1}{N}\sum_{i=1}^{L}\sum_{j=1}^{N_i} \hat{\sigma}_k^2(i, j), \quad k = 1, \ldots, \bar{n}. \qquad (35)$$

As suggested by Eq. (22), an average variance of $\sigma_1^2, \ldots, \sigma_{\bar{n}}^2$ is used, so the estimate $\hat{\bar{\sigma}}^2$ of $\bar{\sigma}^2$ is obtained as

$$\hat{\bar{\sigma}}^2 = \frac{1}{\bar{n}}\sum_{k=1}^{\bar{n}} \hat{\sigma}_k^2. \qquad (36)$$

Extensive experiments in Section 4 justify this estimation.

4. Experimental results

The proposed P-LDA algorithm is evaluated on both synthetic data and face image data. Face images are typical biometric data.
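The experiments below evaluate P-LDA with the parameter estimated by Eq. (36), so before presenting them we sketch that estimator (Eqs. (23), (31), (33)-(36)) in NumPy. This is our own illustration with an assumed function name; the PCA decorrelation step of Section 3.1 is presumed to have been applied to `X` already.

```python
import numpy as np

def estimate_sigma_bar_sq(X, y):
    """Estimate the pooled spherical perturbation variance of Eq. (36).

    X: (N, n_bar) data, assumed already decorrelated (entire PCA space,
    Section 3.1); y: (N,) class labels, at least 2 samples per class.
    Each leave-one-out mean shift u_hat_i^(j) - u_hat_i realizes the
    random vector n_bar_i^j, whose k-th coordinate has variance
    sigma_k^2 / (N_i (N_i - 1)) by Eq. (30)."""
    N, n_bar = X.shape
    sigma_k_sq = np.zeros(n_bar)
    for c in np.unique(y):
        Xc = X[y == c]
        Nc = len(Xc)
        assert Nc > 1, "need at least two samples per class"
        mu = Xc.mean(axis=0)
        for j in range(Nc):
            mu_loo = (Nc * mu - Xc[j]) / (Nc - 1)    # Eq. (23), leave-one-out mean
            u_delta = mu_loo - mu                     # Eqs. (31) and (33)
            sigma_k_sq += Nc * (Nc - 1) * u_delta**2  # Eq. (34), accumulated for Eq. (35)
    sigma_k_sq /= N                                   # Eq. (35): average over all (i, j)
    return sigma_k_sq.mean()                          # Eq. (36): spherical average
```

Feeding the returned value into the spherical model of Eq. (22), i.e. using `sigma * np.eye(n_bar)` as the pooled perturbation covariance matrix, completes the estimation pipeline.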
The number of available face training samples per class is typically very small, while the data dimensionality is very high. This section is divided into three parts. The first and second parts report the experimental results on synthetic data and face data, respectively. In the third part, we verify our parameter estimation strategy on high-dimensional face image data. Throughout the experiments, two popular classifiers, namely the nearest class mean classifier (NCMC) and the nearest neighbor classifier (NNC), are used to evaluate the algorithms; both have been widely used with Fisher's LDA in existing publications. All programs are implemented in Matlab and run on a PC with an Intel Pentium(R) D 3.4 GHz processor.

4.1. Synthetic data

This section justifies the performance of the proposed P-LDA under Theorems 1 and 2, and shows the effects of Eqs. (21) and (22) in modeling P-LDA. Three types of synthetic data, following a single Gaussian or a mixture of Gaussians distribution in each class, are generated in a three-dimensional space. As shown in Tables 1 and 2, for the single Gaussian distribution we consider two cases, in which the covariance matrices are (i) identity covariance matrices multiplied by the constant 0.5 and (ii) equal diagonal covariance matrices. For each class, samples are generated. For the mixture of Gaussians distribution, each class consists of three GCs with

Table 1. Overview of the synthetic data (single Gaussian distribution)
Class Id | Mean | Covariance matrix I | Covariance matrix II
Class 1 | (.3, .5, .)T | .5 | .9
Class 2 | (., ., .5)T | .5 | .7
Class 3 | (.9, .7, .)T | .5 | .38

equal covariance matrices. For each GC, there are 4 samples randomly generated. Information about the synthetic data is tabulated in Tables 1 and 2, and the data distributions are illustrated in Fig. 2.

Table 2. Overview of the synthetic data (Gaussian mixture distribution)
Class Id | Mean of first GC | Mean of second GC | Mean of third GC | Covariance matrix
Class 1 | (, .5, )T | (., , .6)T | (.3, .5, .)T | .98
Class 2 | (, .5, )T | (., ., .5)T | (, .9, )T | .6593
Class 3 | (.9, .7, .)T | (.5, .6, .6)T | (, .5, .)T |

Fig. 2. Illustration of synthetic data: (a) equal identity covariance matrices multiplied by 0.5, (b) equal diagonal covariance matrices and (c) Gaussian mixture distribution.

Table 3. Average accuracy results (equal identity covariance matrices); methods: P-LDA with Eq. (21), P-LDA with Eq. (22) and classical Fisher's LDA; classifiers: NCMC and NNC at three training-set sizes p per class.

Table 4. Average accuracy results (equal diagonal covariance matrices); same methods, classifiers and training-set sizes as Table 3.

Table 5. Average accuracy results (Gaussian mixture distribution); methods: P-LDA (GMM) with Eq. (21), P-LDA (GMM) with Eq. (22) and classical Fisher's LDA (GMM); classifiers: NCMC and NNC at p = 6 (2), 9 (3) and 18 (6) and one larger setting.

In Tables 3-5, the accuracies with respect to different numbers of training samples per class are shown, where p indicates the number of training samples per class. In the mixture of Gaussians case, the bracketed number is the number of training samples drawn from each GC of a class (e.g., p = 9 (3) means every three samples out of the nine training samples of each class come from one of its GCs). For each synthetic data set, we repeat the experiments ten times and obtain average accuracies. Since finding GCs is not our focus, we assume the GCs are known for the implementation of P-LDA based on Theorem 2.
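The evaluation protocol just described (draw p training samples per class, classify the rest with NCMC, average over ten repeats) can be sketched as follows. This is our own minimal harness, not the paper's Matlab code; the class means, spread and sample counts are toy placeholders rather than the values of Tables 1-2, and the discriminant projection step is omitted so that only the sampling/evaluation loop is shown.

```python
import numpy as np

def ncmc_accuracy(X_tr, y_tr, X_te, y_te):
    """Nearest class mean classifier (NCMC): assign each test point to
    the class with the closest training-set empirical mean."""
    classes = np.unique(y_tr)
    means = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = ((X_te[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return (classes[d.argmin(axis=1)] == y_te).mean()

# Protocol of Section 4.1 with toy stand-in values: p training samples per
# class, the rest for testing, averaged over ten random repeats.
rng = np.random.default_rng(0)
means = np.array([[0.3, 0.5, 0.1], [0.1, 0.2, 0.5], [0.9, 0.7, 0.2]])
p, per_class = 5, 100
accs = []
for _ in range(10):
    X = np.vstack([rng.normal(m, 0.2, (per_class, 3)) for m in means])
    y = np.repeat([0, 1, 2], per_class)
    idx = np.hstack([rng.permutation(np.where(y == c)[0]) for c in (0, 1, 2)])
    tr = np.hstack([idx[c * per_class: c * per_class + p] for c in (0, 1, 2)])
    te = np.setdiff1d(np.arange(3 * per_class), tr)
    accs.append(ncmc_accuracy(X[tr], y[tr], X[te], y[te]))
print(round(float(np.mean(accs)), 3))
```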
In addition, "P-LDA (GMM), Eq. (21)/(22)" means P-LDA implemented under the Gaussian mixture model (GMM) based on Theorem 2, with the parameter estimated by Eq. (21) or Eq. (22), respectively; "LDA (GMM)" means classical Fisher's LDA implemented using a scheme similar to Eq. (18) but without the perturbation factors. Note that no singularity problem in Fisher's LDA occurs in the experiments on synthetic data.

In the single Gaussian case, we find that P-LDA using Eq. (22) outperforms P-LDA using Eq. (21) and classical Fisher's LDA, especially when only two samples per class are used for training. When the number of training samples per class increases,

P-LDA converges to classical Fisher's LDA, as the class means are estimated more accurately when more samples are available; the theoretical analysis in Section 5 confirms this behavior. Similar results are obtained in the mixture of Gaussians case. These results show that when the number of training samples is small, P-LDA using Eq. (22) gives a more stable and better estimate of the parameter and therefore provides better results.

Fig. 3. Some images from the subset of FERET.
Fig. 4. Some images of one subject from the subset of CMU PIE.
Fig. 5. Images of one subject from the subset of AR.

4.2. Face image data

Fisher's LDA-based algorithms are popularly used for dimension reduction of high-dimensional data, especially face images in biometric learning. In this section, the proposed method is applied to face recognition. Since face images are of high dimensionality and only limited samples are available for each person, we implement P-LDA based on Theorem 1 and Eq. (22), with the parameter estimated by Eq. (36). Three popular face databases, namely the FERET [34] database, the CMU PIE [35] database and the AR database [3], are selected for evaluation.

For FERET, a subset consisting of 255 persons with four faces per individual is established. All images are drawn from four different sets, namely Fa, Fb, Fc and the duplicate. Face images in this FERET subset undergo illumination variation, age variation and some slight expression variation. For CMU PIE, a subset is established by selecting face images under all illumination conditions with flash indoors [35] from the frontal pose, 1/4 left/right profile and below/above frontal views; there are 74 images in total, with 5 face images for each person in this subset. For the AR database, a subset is established by selecting 9 persons, with eight images for each person; face images in this subset undergo notable expression variations. All face images are aligned according to the coordinates of the eyes and face centers.
Each image is linearly stretched to the full range of [,] and its size is simply normalized to 4 5. Some images are illustrated in Figs. 3-5. In order to evaluate the proposed model, P-LDA is compared with several Fisher's LDA-based methods, including Fisherface [5], null-space LDA (N-LDA) [8], Direct LDA [4] and regularized LDA with cross-validation, R-LDA (CV) [3], which are popularly used for solving the small sample size problem in Fisher's LDA for face recognition. On each data set, the experiments are repeated times. Each time, p images for each person are randomly selected for training and the rest are used for testing. The value of p is indicated in the tables. Finally, the average recognition accuracies are obtained. The results are tabulated in Tables 6-8.

Table 6. Average recognition accuracy on the subset of FERET (p = 3)
Columns: Classifier: CMC (%); Classifier: NC (%)
Rows: P-LDA; R-LDA (CV) [3]; N-LDA [8]; Direct LDA [4]; Fisherface [5]

Table 7. Average recognition accuracy on the subset of CMU PIE
Columns: Classifier: CMC (p = 5 (%), p = 10 (%)); Classifier: NC (p = 5 (%), p = 10 (%))
Rows: P-LDA; R-LDA (CV) [3]; N-LDA [8]; Direct LDA [4]; Fisherface [5]

Table 8. Average recognition accuracy on the subset of AR
Columns: Classifier: CMC (p = 3 (%), p = 6 (%)); Classifier: NC (p = 3 (%), p = 6 (%))
Rows: P-LDA; R-LDA (CV) [3]; N-LDA [8]; Direct LDA [4]; Fisherface [5]

We see that P-LDA achieves at least 6% and 3% improvements over Direct LDA and

N-LDA, respectively, on the FERET database, and achieves more than 4% improvement over Fisherface, Direct LDA and N-LDA on the CMU PIE database. On the AR subset, P-LDA also obtains significant improvements over Fisherface and Direct LDA, and more than % improvement over N-LDA. Note that whether NC or CMC is used, the results of N-LDA are the same, because N-LDA maps all training samples of the same class onto the corresponding class empirical mean in the reduced space [7]. In addition, a related method, R-LDA with CV-selected parameter,6 is also included for comparison. On FERET, P-LDA gets more than a one percent improvement when using NC and about a .6% improvement when using CMC. On CMU, when p = 5, P-LDA gets a .4% improvement over R-LDA using NC and a .5% improvement using CMC; when p = 10, P-LDA and R-LDA perform almost the same. On the AR subset, the performances of P-LDA and R-LDA are also similar. Although R-LDA performs similarly to P-LDA in some cases, as reported in Table 9 it is extremely computationally expensive due to the CV process. In our experiments, P-LDA finishes in well under one minute per run, while R-LDA using the CV technique takes more than one hour.

Table 9. Expense of R-LDA (CV)
Time/run (NC/CMC): FERET, p = 3: 9 hours; CMU PIE, p = 5: hours; CMU PIE, p = 10: 7.5 hours; AR, p = 3: hours; AR, p = 6: hours

Table 10. Average recognition accuracy of P-LDA on the FERET data set: P-LDA with manually selected optimal parameter vs. P-LDA with parameter estimation
Columns: Classifier: CMC (Rank 1 (%), Rank 2 (%), Rank 3 (%)); Classifier: NC (Rank 1 (%), Rank 2 (%), Rank 3 (%))
Rows: P-LDA with manually selected optimal parameter; P-LDA with parameter estimation

Table 11. Average recognition accuracy of P-LDA on the CMU PIE data set: P-LDA with manually selected optimal parameter vs. P-LDA with parameter estimation
Columns: Classifier: CMC (Rank 1 (%), Rank 2 (%), Rank 3 (%)); Classifier: NC (Rank 1 (%), Rank 2 (%), Rank 3 (%))
Rows: P-LDA with manually selected optimal parameter; P-LDA with parameter estimation
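The cost gap reported in Table 9 follows from how many discriminant models a repeated k-fold CV grid search must train: one per candidate value of λ, per fold, per repetition. A minimal bookkeeping sketch (the grid size and fold counts below are illustrative, not the paper's exact settings):

```python
def cv_model_fits(n_candidates: int, n_folds: int, n_repeats: int) -> int:
    """Number of discriminant models trained by a repeated k-fold CV grid search."""
    return n_candidates * n_folds * n_repeats

# e.g. a 20-point candidate grid for lambda with three-fold CV repeated ten times:
print(cv_model_fits(n_candidates=20, n_folds=3, n_repeats=10))  # -> 600
# By contrast, P-LDA's closed-form parameter estimate needs one model fit per run.
```

Each of those fits involves building scatter matrices and an eigendecomposition on high-dimensional face data, which is what turns minutes into hours.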
More comparisons between P-LDA and R-LDA can be found in Section 5.2. It will be shown there that R-LDA can be seen as a semi-perturbation LDA, which gives a novel understanding of R-LDA, and that the proposed perturbation model can suggest an effective and efficient way to estimate the regularization parameter in R-LDA. Therefore, P-LDA is much more efficient and still performs better. Although Fisherface, Direct LDA, N-LDA and R-LDA are also proposed for extracting discriminant features in the undersampled case, they mainly address the singularity problem of the within-class matrix, while P-LDA addresses the perturbation problem in the Fisher criterion due to the difference between a class empirical mean and its expectation. Noting that P-LDA using models () and () can also solve the singularity problem, this suggests that alleviating the perturbation problem is useful for further enhancing the Fisher criterion. In addition, the above results, as well as the results on the synthetic data sets, indicate that when the number of training samples is large, the differences between P-LDA and the compared LDA-based algorithms become small. This agrees with the perturbation analysis given in this paper, since the estimates of the class means become more accurate as the training samples for each class become more sufficient.

6 On FERET, three-fold CV is performed; on CMU, five-fold CV is performed when p = 5 and 10-fold CV when p = 10; on AR, three-fold CV is performed when p = 3 and six-fold CV when p = 6. The candidates of the regularization parameter λ are sampled from .5 to with step .5. In the experiments, the three-fold CV is repeated ten times on FERET; on CMU, the five-fold and 10-fold CV are repeated six and three times, respectively; on AR, the three-fold and six-fold CV are repeated 10 and 5 times, respectively. So each CV parameter is determined via its corresponding 30 rounds of CV classification.
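The convergence argument above rests on a standard fact: for an empirical mean of n i.i.d. samples with covariance Σ, E‖μ̂ − μ‖² = tr(Σ)/n, so the perturbation shrinks as per-class sample sizes grow. A quick Monte Carlo check of this decay on synthetic Gaussian data (this illustrates the underlying fact, not the paper's estimator):

```python
import numpy as np

def mean_sq_error(n, dim=5, trials=2000, seed=0):
    """Monte Carlo estimate of E||mu_hat - mu||^2 for the empirical mean
    of n i.i.d. N(0, I_dim) samples (the true mean mu is 0)."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(size=(trials, n, dim))
    mu_hat = samples.mean(axis=1)            # one empirical class mean per trial
    return float((mu_hat ** 2).sum(axis=1).mean())

# The error decays like tr(Sigma)/n = dim/n: quadrupling n roughly quarters it.
for n in (2, 8, 32):
    print(n, round(mean_sq_error(n), 3))
```

With only two samples per class the mean-squared deviation is largest, which matches the regime where P-LDA shows its biggest gains over classical Fisher's LDA.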
Noting also that the difference between P-LDA and R-LDA is small when p is large on CMU and AR, this implies that the impact of the perturbation model on the estimation of the between-class covariance information becomes minor as the number of training samples increases. In Section 5.1, we give more theoretical analysis.

4.3. Parameter verification

In the last two subsections, we showed that P-LDA using Eq. () gives good results on both synthetic and face image data, particularly when the number of training samples is small. In this section, we gather extensive statistics on the performance of P-LDA on FERET and CMU PIE when the parameter σ is set to other values. We compare the proposed P-LDA with parameter estimation against the best scenario selected manually. The detailed experimental procedure is as follows.

Step (1): Prior values of σ are extensively sampled. We let σ = η/(1 − η), 0 < η < 1, so that σ ∈ (0, +∞). Then 1999 points are sampled for η between .0005 and .9995 with interval .0005. Finally, 1999 sampled values of σ are obtained.

Step (2): Evaluate the performance of P-LDA with respect to each sampled value of σ. We call each P-LDA with respect to a sampled value of σ a model.

Step (3): We compare the P-LDA model with parameter σ estimated by the methodology suggested in Section 3 against the best one among all models of P-LDA obtained at Step (1). The average recognition rate of each model of P-LDA is obtained using the same procedure run on the FERET and CMU PIE databases. We consider the case where p, the number of training samples for each class, is equal to three on FERET and to five on CMU. For clarity, the P-LDA model with parameter estimated using the methodology suggested in Section 3 is called P-LDA with parameter estimation, whereas we call the P-LDA model with

Fig. 6. P-LDA with manually selected optimal parameter vs. P-LDA with parameter estimation on FERET.
Fig. 7. P-LDA with manually selected optimal parameter vs. P-LDA with parameter estimation on CMU.
Fig. 8. Classifier: CMC. (a) The performance of P-LDA as a function of σ (x-axis) on FERET, where the horizontal axis is scaled logarithmically, and (b) the enlarged part of (a) near the peak of the curve where σ is small.
Fig. 9. Classifier: NC. (a) The performance of P-LDA as a function of σ (x-axis) on FERET, where the horizontal axis is scaled logarithmically; (b) the enlarged part of (a) near the peak of the curve where σ is small.

Fig. 10. Classifier: CMC. (a) The performance of P-LDA as a function of σ (x-axis) on CMU PIE, where the horizontal axis is scaled logarithmically, and (b) the enlarged part of (a) near the peak of the curve where σ is small.
Fig. 11. Classifier: NC. (a) The performance of P-LDA as a function of σ (x-axis) on CMU PIE, where the horizontal axis is scaled logarithmically; (b) the enlarged part of (a) near the peak of the curve where σ is small.

respect to the best σ selected from the 1999 sampled values P-LDA with manually selected optimal parameter. Comparison results of the rank 1 to rank 3 accuracies are reported in Tables 10 and 11. Figs. 6 and 7 show the ranking accuracies of these two models. The difference in rank accuracies between the two models is less than .% in general. To evaluate the sensitivity of P-LDA to σ, the performance of P-LDA as a function of σ is shown in Figs. 8 and 9 using the CMC and NC classifiers, respectively. The overall sensitivity of P-LDA to σ on the FERET data set is depicted in Fig. 8(a), where the horizontal axis is on a logarithmic scale. Fig. 8(b) shows the enlarged part of Fig. 8(a) near the peak of the curve where σ is small. Similarly, Figs. 10 and 11 show the results on CMU PIE. They indicate that it may be hard to obtain an optimal estimate of σ, but interestingly, Tables 10 and 11 and Figs. 6 and 7 show that the methodology suggested in Section 3 works well. Selecting the best parameter manually via an extensive search would clearly be time consuming, whereas P-LDA using the proposed methodology for parameter estimation costs much less than one minute. So the suggested methodology is computationally efficient.

5. Discussion

As shown in the experiments, the number of training samples for each class really has an impact on the performance of P-LDA.
In this section, we explore some theoretical properties of P-LDA, show its convergence, and discuss P-LDA in relation to some related methods.

5.1. Admissible condition of P-LDA

Suppose L is fixed. Since the entries of all perturbation covariance matrices are bounded,7 it is easy to obtain S_b^Δ = O(1/N) and S_w^Δ = O(1/N), i.e., the perturbation factors S_b^Δ → O and S_w^Δ → O as N → ∞, where O is the zero matrix. Here, for any matrix A = A(β) of which each nonzero entry depends on β, we say A = O(β) if the degree8 of A − O is comparable to the degree of β. However, if L is variable, i.e., the increase of the sample size may be partly due to an increase in the number of classes, then S_b^Δ ≠ O(1/N) and S_w^Δ ≠ O(1/N). Suppose any covariance matrix X_Ci is lower (upper) bounded by X_lower (X_upper) if and only if X_lower(i,j) ≤ X_Ci(i,j) (X_Ci(i,j) ≤ X_upper(i,j)) for any (i,j). Then the following lemma gives an essential view; its proof is given in Appendix C.

Lemma 1. If all nonzero perturbation covariance matrices X_Ci, i = 1,...,L, are lower bounded by X_lower and upper bounded by

7 We say a matrix is bounded if and only if all entries of this matrix are bounded.
8 The degree of A = A(β) ≠ O depending on β is defined to be the smallest degree of A(i,j) depending on β, where A(i,j) is any nonzero entry of A. For example, if A = [β β²], then the degree of A − O is 1 and A = O(β).

X_upper, where X_lower and X_upper are independent of L and N, then it is true that S_b^Δ = O(L/N) and S_w^Δ = O(L/N).

The condition of Lemma 1 is valid in practice, because the data space is always compact and, moreover, it is always a Euclidean space of finite dimension. In particular, from Eq. (), it can be found that the perturbation matrices depend on the average sample size for each class. Based on Theorem, we finally have the following proposition.

Proposition 1 (Admissible condition of P-LDA). P-LDA depends on the average number of samples for each class. That is, S_b^Δ = O(L/N) and S_w^Δ = O(L/N), i.e., S_b^Δ → O and S_w^Δ → O when N/L → ∞.

It is intuitive that some estimated class means are unstable when the average sample size for each class is small.9 This also shows that what P-LDA targets is different from the singularity problem in Fisher's LDA, which is solved if the total sample size is large enough. Moreover, the experiments on synthetic data in Section 4.1 support Proposition 1, as the difference between P-LDA and classical Fisher's LDA becomes smaller when the average sample size for each class becomes larger.

Table 12. Average recognition accuracy of R-LDA on the FERET data set: R-LDA with manually selected optimal parameter vs. R-LDA using the perturbation model (p = 3)
Columns: Classifier: CMC (Rank 1 (%), Rank 2 (%), Rank 3 (%)); Classifier: NC (Rank 1 (%), Rank 2 (%), Rank 3 (%))
Rows: R-LDA with manually selected optimal parameter; R-LDA (CV); R-LDA using perturbation model

Table 13. Average recognition accuracy of R-LDA on the CMU PIE data set: R-LDA with manually selected optimal parameter vs. R-LDA using the perturbation model (p = 5)
Columns: Classifier: CMC (Rank 1 (%), Rank 2 (%), Rank 3 (%)); Classifier: NC (Rank 1 (%), Rank 2 (%), Rank 3 (%))
Rows: R-LDA with manually selected optimal parameter; R-LDA (CV); R-LDA using perturbation model

5.2. Discussion of related approaches

5.2.1. P-LDA vs. R-LDA

Regularized LDA (R-LDA) is usually modeled by the following criterion:

    W_opt = arg max_W trace(W^T Ŝ_b W) / trace(W^T (Ŝ_w + λI) W),  λ > 0.   (37)

Sometimes, a positive diagonal matrix is used to replace λI in the above equality. Generally, the formulation of P-LDA in Section 2 is different from the form of R-LDA. Although the formulation of R-LDA looks similar to the simplified model of P-LDA in Section 3, the motivation and objective are totally different. Details are discussed as follows.

1. P-LDA is proposed by learning the difference between a class empirical mean and its corresponding expectation as well as its impact on the Fisher criterion, whereas R-LDA was originally proposed for the singularity problem [9,,3] because Ŝ_w + λI is positive definite for λ > 0.
2. In P-LDA, the effects of S_b^Δ and S_w^Δ are known in theory from the perturbation analysis. In contrast, R-LDA still does not clearly tell how λI affects Ŝ_w in a pattern recognition sense. Although Zhang et al. [] presented a connection between regularization network algorithms and R-LDA from a least squares view, it still lacks an interpretation of how regularization can affect the within-class and between-class covariance matrices simultaneously, and it also lacks parameter estimation.
3. P-LDA establishes the convergence of the perturbation factors by Proposition 1, whereas R-LDA does not in theory. The singularity problem R-LDA addresses is in nature an implementation problem, and it is solved when the total sample size is sufficiently large; this, however, does not imply that the average sample size for each class is also sufficiently large.
4. P-LDA is developed when the data of each class follow either a single Gaussian distribution or a Gaussian mixture distribution, but R-LDA does not consider the effect of the data distribution.
5. In P-LDA, the scheme for parameter estimation is an intrinsic methodology derived from the perturbation model itself. For R-LDA, a separate algorithm is required, such as the CV method, which is so far popular. However, CV relies heavily on a discrete set of candidate parameters.

9 With suitable training samples, the class means may be well estimated, but the selection of training samples is beyond the scope of this paper.
In general, CV is always time consuming. Interestingly, if the proposed perturbation model is imposed on R-LDA, i.e., R-LDA is treated as a semi-perturbation Fisher's LDA in which only the within-class perturbation S_w^Δ is considered and the factor S_b^Δ is ignored, then the methodology in Section 3 may provide an interpretation of how the term λI takes effect in the entire PCA space. This novel view of R-LDA gives the advantage of applying the proposed perturbation model for an efficient and effective estimation of the regularization parameter λ in R-LDA. To justify this, similar comparisons on the FERET and CMU subsets between R-LDA with manually selected optimal parameter and R-LDA using the perturbation model are performed in Tables 12 and 13, where R-LDA with manually selected optimal parameter is implemented similarly to P-LDA with manually selected optimal parameter as demonstrated in Section 4.3. For reference, the results of R-LDA (CV) are also shown. We find that R-LDA using the perturbation model closely approximates R-LDA with manually selected optimal parameter and achieves

Unified Subspace Analysis for Face Recognition

Unified Subspace Analysis for Face Recognition Unfed Subspace Analyss for Face Recognton Xaogang Wang and Xaoou Tang Department of Informaton Engneerng The Chnese Unversty of Hong Kong Shatn, Hong Kong {xgwang, xtang}@e.cuhk.edu.hk Abstract PCA, LDA

More information

Regularized Discriminant Analysis for Face Recognition

Regularized Discriminant Analysis for Face Recognition 1 Regularzed Dscrmnant Analyss for Face Recognton Itz Pma, Mayer Aladem Department of Electrcal and Computer Engneerng, Ben-Guron Unversty of the Negev P.O.Box 653, Beer-Sheva, 845, Israel. Abstract Ths

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Lecture 12: Classification

Lecture 12: Classification Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Statistical pattern recognition

Statistical pattern recognition Statstcal pattern recognton Bayes theorem Problem: decdng f a patent has a partcular condton based on a partcular test However, the test s mperfect Someone wth the condton may go undetected (false negatve

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

Why Bayesian? 3. Bayes and Normal Models. State of nature: class. Decision rule. Rev. Thomas Bayes ( ) Bayes Theorem (yes, the famous one)

Why Bayesian? 3. Bayes and Normal Models. State of nature: class. Decision rule. Rev. Thomas Bayes ( ) Bayes Theorem (yes, the famous one) Why Bayesan? 3. Bayes and Normal Models Alex M. Martnez alex@ece.osu.edu Handouts Handoutsfor forece ECE874 874Sp Sp007 If all our research (n PR was to dsappear and you could only save one theory, whch

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

STAT 3008 Applied Regression Analysis

STAT 3008 Applied Regression Analysis STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

A Novel Biometric Feature Extraction Algorithm using Two Dimensional Fisherface in 2DPCA subspace for Face Recognition

A Novel Biometric Feature Extraction Algorithm using Two Dimensional Fisherface in 2DPCA subspace for Face Recognition A Novel ometrc Feature Extracton Algorthm usng wo Dmensonal Fsherface n 2DPA subspace for Face Recognton R. M. MUELO, W.L. WOO, and S.S. DLAY School of Electrcal, Electronc and omputer Engneerng Unversty

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

MULTISPECTRAL IMAGE CLASSIFICATION USING BACK-PROPAGATION NEURAL NETWORK IN PCA DOMAIN

MULTISPECTRAL IMAGE CLASSIFICATION USING BACK-PROPAGATION NEURAL NETWORK IN PCA DOMAIN MULTISPECTRAL IMAGE CLASSIFICATION USING BACK-PROPAGATION NEURAL NETWORK IN PCA DOMAIN S. Chtwong, S. Wtthayapradt, S. Intajag, and F. Cheevasuvt Faculty of Engneerng, Kng Mongkut s Insttute of Technology

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013 ISSN: 2277-375 Constructon of Trend Free Run Orders for Orthogonal rrays Usng Codes bstract: Sometmes when the expermental runs are carred out n a tme order sequence, the response can depend on the run

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

Foundations of Arithmetic

Foundations of Arithmetic Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS M. Krshna Reddy, B. Naveen Kumar and Y. Ramu Department of Statstcs, Osmana Unversty, Hyderabad -500 007, Inda. nanbyrozu@gmal.com, ramu0@gmal.com

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

Time-Varying Systems and Computations Lecture 6

Time-Varying Systems and Computations Lecture 6 Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy

More information

Power law and dimension of the maximum value for belief distribution with the max Deng entropy

Power law and dimension of the maximum value for belief distribution with the max Deng entropy Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng

More information

Tensor Subspace Analysis

Tensor Subspace Analysis Tensor Subspace Analyss Xaofe He 1 Deng Ca Partha Nyog 1 1 Department of Computer Scence, Unversty of Chcago {xaofe, nyog}@cs.uchcago.edu Department of Computer Scence, Unversty of Illnos at Urbana-Champagn

More information

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering / Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons

More information

Matrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD

Matrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD Matrx Approxmaton va Samplng, Subspace Embeddng Lecturer: Anup Rao Scrbe: Rashth Sharma, Peng Zhang 0/01/016 1 Solvng Lnear Systems Usng SVD Two applcatons of SVD have been covered so far. Today we loo

More information

The lower and upper bounds on Perron root of nonnegative irreducible matrices

The lower and upper bounds on Perron root of nonnegative irreducible matrices Journal of Computatonal Appled Mathematcs 217 (2008) 259 267 wwwelsevercom/locate/cam The lower upper bounds on Perron root of nonnegatve rreducble matrces Guang-Xn Huang a,, Feng Yn b,keguo a a College

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Matrix Norms. Now we turn to associating a number to each matrix. We could ...

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

In an earlier lecture we studied the statistical assumptions underlying the regression model, including the following points: formal statement of assumptions ...

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

CAP 5610: Machine Learning. Instructor: Guo-Jun Qi. Bayes Classifier: a generative model. Model the posterior distribution P(Y|X); estimate the class-conditional distribution P(X|Y) for each Y; estimate the prior distribution ...

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Steele, M., Chaseling, J. and Hurst, C. School of Mathematical and Physical Sciences, James Cook University; Australian School of Environmental Studies, Griffith ...

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F'17 cos 521: Advanced Algorithm Design. Lecture 7: LP Duality. Lecturer: Matt Weinberg. Scribe: LP duality is an extremely useful tool for analyzing structural properties of linear programs. While there ...

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

ECONOMICS 5* -- NOTE (Summary). 1. Introduction. CLRM stands for the Classical Linear Regression Model. The CLRM is also ...

CHAPTER 14 GENERAL PERTURBATION THEORY

14.1 Introduction. A particle in orbit around a point mass or a spherically symmetric mass distribution is moving in a gravitational potential of the form GM/r. In this potential it moves ...

Multigradient for Neural Networks for Equalizers 1

Chulhee Lee, Jinook Go and Heeyoung Kim. Department of Electrical and Electronic Engineering, Yonsei University, 134 Shinchon-Dong, Seodaemun-Ku, Seoul 1-749, Korea. ABSTRACT ...

Formulas for the Determinant

page 224, CHAPTER 3: Determinants. 38. A = [e^t te^t e^{2t}; e^t 2te^t e^{2t}; e^t te^t 2e^{2t}]. 39. If A = [1 2 3; 3 4 5; 4 5 6], compute the matrix product A adj(A). What can you conclude about det(A)? For Problems 40-43, use ...

Econ Statistical Properties of the OLS estimator. Sanjaya DeSilva

Econ 39 - Statistical Properties of the OLS Estimator. Sanjaya DeSilva, September 2008. 1 Overview. Recall that the true regression model is Y_i = β_0 + β_1 X_i + u_i (1). Applying the OLS method to a sample of data, we estimate ...

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Numerical Analysis by Dr. Anita Pal, Assistant Professor, Department of Mathematics, National Institute of Technology Durgapur, Durgapur-713209, email: anita.bue@gmail.com. Chapter 5: Solution of System of Linear Equations ...

Module 9. Lecture 6. Duality in Assignment Problems

In this lecture we attempt to answer a few other important questions posed in the earlier lecture for (AP) and see how some of them can be explained through the concept ...

Chapter 8 Indicator Variables

In general, the explanatory variables in any regression analysis are assumed to be quantitative in nature. For example, variables like temperature, distance, age, etc. are quantitative in ...

VQ widely used in coding speech, image, and video

Scalar quantizers are special cases of vector quantizers (VQ): they are constrained to look at one sample at a time (memoryless). VQ does not have such a constraint, so better RD performance is expected. Source coding ...

Comparison of Regression Lines

STATGRAPHICS Rev. 9/13/2013. Contents: Summary (p. 1), Data Input (p. 3), Analysis Summary (p. 4), Plot of Fitted Model (p. 6), Conditional Sums of Squares (p. 6), Analysis Options (p. 7), Forecasts (p. 8), Confidence ...

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Analysis of Variance and Design of Experiments-I. MODULE VII, LECTURE - 3: ANALYSIS OF COVARIANCE. Dr. Shalabh, Department of Mathematics and Statistics, Indian Institute of Technology Kanpur. Any scientific experiment is performed ...

More metrics on cartesian products

If (X_i, d_i) are metric spaces for 1 ≤ i ≤ n, then in Section II.4 of the lecture notes we defined three metrics on X whose underlying topologies are the product topology. The purpose of ...

The Expectation-Maximization Algorithm

Charles Elkan, elkan@cs.ucsd.edu, November 16, 2007. This chapter explains the EM algorithm at multiple levels of generality. Section 1 gives the standard high-level version of the algorithm ...

A Local Variational Problem of Second Order for a Class of Optimal Control Problems with Nonsmooth Objective Function

Alexander P. Afanasiev, Institute for Information Transmission Problems, Russian Academy of Sciences ...

One-sided finite-difference approximations suitable for use with Richardson extrapolation

Journal of Computational Physics 219 (2006) 13-20. Short note. Kumar Rahul, S.N. Bhattacharyya, Department of Mechanical Engineering ...

Homework Assignment 3 Due in class, Thursday October 15

SDS 383C Statistical Modeling I. 1. Ridge regression and Lasso. 1. Get the prostate cancer data from http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/prostate.data ...

Notes on Frequency Estimation in Data Streams

In (one of) the data streaming model(s), the data is a sequence of arrivals a_1, a_2, ..., a_m of the form a_j = (i, v), where i is the identity of the item and belongs to ...

Research Article Green s Theorem for Sign Data

International Scholarly Research Network, ISRN Applied Mathematics, Volume 2012, Article ID 539359, 10 pages, doi:10.5402/2012/539359. Louis M. Houston, The University of ...

Explaining the Stein Paradox

Kwong Hu Yung, 1999/06/10. Abstract: This report offers several rationales for the Stein paradox. Sections 1 and 2 define the multivariate normal mean estimation problem and introduce Stein ...

2.3 Nilpotent endomorphisms

... is a block diagonal matrix, with A_i ∈ Mat_{dim U_i}(C). In fact, we can assume that B = B_1 ∪ ... ∪ B_k, with B_i an ordered basis of U_i, and that A_i = [f|_{U_i}]_{B_i}, where f|_{U_i}: U_i → U_i is the restriction of f to U_i ...

Computing MLE Bias Empirically

Kar Wai Lim, Australian National University, January 3, 2017. Abstract: This note studies the bias that arises from the MLE estimate of the rate parameter and the mean parameter of an exponential distribution ...

Linear Regression Analysis: Terminology and Notation

ECON 35* -- Section: Basic Concepts of Regression Analysis. Consider the generic version of the simple (two-variable) linear regression model. It is represented ...

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands

Contents: 1. Inference on regression parameters: a. finding mean, s.d. and covariance amongst estimates. 2. Confidence intervals and Working-Hotelling bands. 3. Cochran's theorem. 4. General linear testing. 5. Measures of ...

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Definition 1 (Euclidean space). A Euclidean space is a finite-dimensional vector space over the reals R, with an inner product ⟨·,·⟩. Definition 2 (Inner product). An inner product ⟨·,·⟩ on a real vector space X is a symmetric, bilinear, ...

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models

BY J. T. NEWCOMER, N. K. NEERCHAL, Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore ...

ENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition

ENG 8801/988 - Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland. Lecture 7, May 3, 2006. http://www.engr.mun.ca/~charlesr. Office Hours: Tuesdays & Thursdays 8:30-9:30 ...

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Paul A. Jensen, copyright July 20, 2003. A system is made up of several operations with flow passing between them. The structure of the system describes the flow paths from inputs to outputs ...

On a direct solver for linear least squares problems

ISSN 2066-6594. Ann. Acad. Rom. Sci. Ser. Math. Appl., Vol. 8, No. 2/2016. Constantin Popa. Abstract: The Null Space (NS) algorithm is a direct solver for linear ...

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1

Vahid Tadayon. Abstract: The Stochastic Approximation EM (SAEM) algorithm, a stochastic-approximation variant of EM, is a versatile tool ...

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper

Elon Kohlberg, Harvard Business School; Abraham Neyman, The Hebrew University of Jerusalem. Working Paper 18-023, copyright 2017 ...

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Analysis of Variance and Design of Experiments-I. MODULE III, LECTURE - 2: EXPERIMENTAL DESIGN MODELS. Dr. Shalabh, Department of Mathematics and Statistics, Indian Institute of Technology Kanpur. We consider the models ...

x_{i1} = 1 for all i (the constant)

Chapter 5: The Multiple Regression Model. Consider an economic model where the dependent variable is a function of K explanatory variables. The economic model has the form: y_i = f(x_{i1}, x_{i2}, ..., x_{iK}). Approximate this by ...

Uncertainty and auto-correlation in Measurement

arXiv:1707.03276v2 [physics.data-an] 30 Dec 2017. Markus Schiebl, Federal Office of Metrology and Surveying (BEV), 1160 Vienna, Austria. E-mail: markus.schiebl@bev.gv.at ...

Lecture 6 More on Complete Randomized Block Design (RBD)

Multiple testing: the multiple comparisons or multiple testing problem occurs when one considers a set of statistical inferences simultaneously. For ...

LECTURE 9 CANONICAL CORRELATION ANALYSIS

Introduction. The concept of canonical correlation arises when we want to quantify the associations between two sets of variables. For example, suppose that the first set of ...

Convexity preserving interpolation by splines of arbitrary degree

Computer Science Journal of Moldova, vol. 18, no. 1(52), 2010. Igor Verlan. Abstract: In the present paper an algorithm of C^2 interpolation of discrete ...

Negative Binomial Regression

STATGRAPHICS Rev. 9/16/2013. Contents: Summary (p. 1), Data Input (p. 3), Statistical Model (p. 3), Analysis Summary (p. 4), Analysis Options (p. 7), Plot of Fitted Model (p. 8), Observed Versus Predicted (p. 10), Predictions ...

Semi-supervised Classification with Active Query Selection

Jiao Wang and Siwei Luo, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China. Wangjiao088@163.com. Abstract: Labeled samples ...

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

Applied Mathematical Sciences, Vol. 7, 2013, no. 47. HIKARI Ltd, www.m-hikari.com ...

Gaussian Mixture Models

Lab Objective: Understand the formulation of Gaussian Mixture Models (GMMs) and how to estimate GMM parameters. You've already seen GMMs as the observation distribution in certain continuous ...

VARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES

BÂRZĂ, Silviu. Faculty of Mathematics-Informatics, Spiru Haret University. barza_silviu@yahoo.com. Abstract: This paper wants to continue ...

Lecture 3 Stat102, Spring 2007

Chapter 3.1-3.2: Introduction to regression analysis. Linear regression as a descriptive technique. The least-squares equations. Chapter 3.3: Sampling distribution of b_0, b_1. Continued in next lecture ...

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION

Advanced Mathematical Models & Applications, Vol. 3, No. 3, 2018, pp. 215-222. ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION ...

MMA and GCMMA two methods for nonlinear optimization

Krister Svanberg, Optimization and Systems Theory, KTH, Stockholm, Sweden. krille@math.kth.se. This note describes the algorithms used in the author's 2007 implementations ...

Bootstrap aggregating (Bagging)

An ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms. Can be used in both regression and classification. Reduces variance and helps to avoid ...

Joint Statistical Meetings - Biopharmaceutical Section

Joint Statistical Meetings - Biopharmaceutical Section Iteratve Ch-Square Test for Equvalence of Multple Treatment Groups Te-Hua Ng*, U.S. Food and Drug Admnstraton 1401 Rockvlle Pke, #200S, HFM-217, Rockvlle, MD 20852-1448 Key Words: Equvalence Testng; Actve

More information

Report on Image warping

Xuan Nie, Dec. 20, 2004. This document summarizes the algorithms of our image warping solution for further study, and there is a detailed description of the implementation of these algorithms ...

Lecture 3. Ax = Σ_i x_i a_i

18.409 The Behavior of Algorithms in Practice, 2/14/02. Lecturer: Dan Spielman. Scribe: Arvind Sankar. 1. Largest singular value. In order to bound the condition number, we need an upper bound on the largest ...

Bayesian predictive Configural Frequency Analysis

Psychological Test and Assessment Modeling, Volume 54, 2012 (3), 285-292. Eduardo Gutiérrez-Peña. Abstract: Configural Frequency Analysis is a method for cell-wise ...

Automatic Object Trajectory- Based Motion Recognition Using Gaussian Mixture Models

Faisal I. Bashir, Ashfaq A. Khokhar, Dan Schonfeld. Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL ...

Support Vector Machines. Vibhav Gogate The University of Texas at dallas

Vibhav Gogate, The University of Texas at Dallas. What we have learned so far: 1. Decision Trees, 2. Naïve Bayes, 3. Linear Regression, 4. Logistic Regression, 5. Perceptron, 6. Neural networks, 7. K-Nearest ...