Pattern Recognition 42 (2009) Contents lists available at ScienceDirect. Pattern Recognition. journal homepage:


Perturbation LDA: Learning the difference between the class empirical mean and its expectation

Wei-Shi Zheng a,c, J.H. Lai b,c,*, Pong C. Yuen d, Stan Z. Li e

a School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou, PR China
b Department of Electronics and Communication Engineering, School of Information Science and Technology, Sun Yat-sen University, Guangzhou, PR China
c Guangdong Province Key Laboratory of Information Security, PR China
d Department of Computer Science, Hong Kong Baptist University, Hong Kong
e Center for Biometrics and Security Research and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, PR China

ARTICLE INFO

Article history: Received 4 September 2006; received in revised form 9 July 2008; accepted September 2008.

Keywords: Fisher criterion; Perturbation analysis; Face recognition

ABSTRACT

Fisher's linear discriminant analysis (LDA) is popular for dimension reduction and extraction of discriminant features in many pattern recognition applications, especially biometric learning. In deriving the Fisher's LDA formulation, there is an assumption that the class empirical mean is equal to its expectation. However, this assumption may not be valid in practice. In this paper, from the perturbation perspective, we develop a new algorithm, called perturbation LDA (P-LDA), in which perturbation random vectors are introduced to learn the effect of the difference between the class empirical mean and its expectation in the Fisher criterion. This perturbation learning in the Fisher criterion yields new forms of within-class and between-class covariance matrices integrated with some perturbation factors. Moreover, a method is proposed for estimating the covariance matrices of the perturbation random vectors for practical implementation. The proposed P-LDA is evaluated on both synthetic data sets and real face image data sets.
Experimental results show that P-LDA outperforms the popular Fisher's LDA-based algorithms in the undersampled case. © 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Data in some applications such as biometric learning are of high dimension, while the available samples for each class are always limited. In view of this, dimension reduction is desirable, and at the same time it is also expected that data of different classes can be more easily separated in the lower-dimensional subspace. Among the techniques developed for this purpose, Fisher's linear discriminant analysis (LDA) [1-4] has been widely used as a powerful tool for extraction of discriminant features. The basic principle of Fisher's LDA is to find a projection matrix such that the ratio between the between-class variance and the within-class variance is maximized in a lower-dimensional feature subspace.

* Corresponding author at: Department of Electronics and Communication Engineering, School of Information Science and Technology, Sun Yat-sen University, Guangzhou, Guangdong, PR China. E-mail addresses: wszheng@ieee.org (W.-S. Zheng), stsljh@mail.sysu.edu.cn (J.H. Lai), pcyuen@comp.hkbu.edu.hk (Pong C. Yuen), szli@nlpr.ia.ac.cn (Stan Z. Li).

Footnote 1: LDA in this paper refers to Fisher's LDA. It is not a classifier but a feature extractor learning a low-rank discriminant subspace, in which any classifier can then be used to perform classification.

Due to the curse of high dimensionality and the limited number of training samples, the within-class scatter matrix $S_w$ is often singular, so that classical Fisher's LDA fails. This kind of singularity problem is known as the small sample size problem [5,6] in Fisher's LDA. Some well-known variants of Fisher's LDA have been developed to overcome this problem. Among them, Fisherface (PCA+LDA) [5], null-space LDA (N-LDA) [6-8] and regularized LDA (R-LDA) [9-13] are three representative algorithms. In PCA+LDA, Fisher's LDA is performed in a principal component subspace, in which the within-class covariance matrix is of full rank.
In N-LDA, the null space of the within-class covariance matrix $S_w$ is first extracted; data are then projected onto that subspace, and a discriminant transform is finally found there to maximize the variance among the between-class data. In R-LDA, a regularized term, such as $\lambda I$ with $\lambda > 0$, is added to $S_w$. Some other approaches, such as Direct LDA [14], LDA/QR [15] and some constrained LDA [16,17], have also been developed. Recently, efforts have been made to develop two-dimensional LDA techniques (2D-LDA) [18-21], which operate directly on matrix-form data. A recent study [22] conducts comprehensive theoretical and experimental comparisons between the traditional Fisher's LDA techniques and some representative 2D-LDA algorithms in the undersampled case.

0031-3203/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2008.09.

It is experimentally shown

that some two-dimensional LDA variants may perform better than Fisherface and some other traditional Fisher's LDA approaches in some cases, but R-LDA always performs better. However, estimation of the regularization parameter in R-LDA is hard. Though cross-validation (CV) is popularly used, it is time consuming. Moreover, it is still hard to fully interpret the impact of this regularized term.

From the geometrical view, Fisher's LDA makes different class means scatter and keeps data of the same class close to their corresponding class means. However, since the number of samples for each class is always limited in some applications such as biometric learning, the estimates of the class means are not accurate, and this degrades the power of the Fisher criterion. To specify this problem, we first revisit the derivation of Fisher's LDA.

Consider the classification problem of $L$ classes $C_1, \ldots, C_L$. Suppose the data space $\mathcal{X}$ ($\subset \mathbb{R}^n$) is a compact vector space and $\{(x_1^1, y_1^1), \ldots, (x_1^{N_1}, y_1^{N_1}), \ldots, (x_L^1, y_L^1), \ldots, (x_L^{N_L}, y_L^{N_L})\}$ is a set of finite samples. All data $x_1^1, \ldots, x_L^{N_L}$ are i.i.d., $x_i^j$ ($\in \mathcal{X}$) denotes the $j$th sample of class $C_i$ with class label $y_i^j$ (i.e., $y_i^j = C_i$), and $N_i$ is the number of samples of class $C_i$. The empirical mean of each class is then given by $\hat{u}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_i^j$ and the total sample mean by $\hat{u} = \frac{1}{N}\sum_{i=1}^{L} N_i \hat{u}_i$, where $N = \sum_{i=1}^{L} N_i$ is the total number of training samples. The goal of LDA under the Fisher criterion is to find an optimal projection matrix by optimizing Eq. (1):

$$\hat{W}_{opt} = \arg\max_W \operatorname{trace}(W^T \hat{S}_b W)/\operatorname{trace}(W^T \hat{S}_w W), \qquad (1)$$

where $\hat{S}_b$ and $\hat{S}_w$ are the between-class and within-class covariance (scatter) matrices, respectively, defined as

$$\hat{S}_b = \frac{1}{N}\sum_{i=1}^{L} N_i (\hat{u}_i - \hat{u})(\hat{u}_i - \hat{u})^T, \qquad (2)$$

$$\hat{S}_w = \frac{1}{N}\sum_{i=1}^{L} N_i \hat{S}_i, \quad \hat{S}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} (x_i^j - \hat{u}_i)(x_i^j - \hat{u}_i)^T. \qquad (3)$$

It has been proved that Eq. (2) can be written equivalently as

$$\hat{S}_b = \sum_{i=1}^{L}\sum_{s=1}^{L} \frac{N_i N_s}{2N^2} (\hat{u}_i - \hat{u}_s)(\hat{u}_i - \hat{u}_s)^T. \qquad (4)$$

For the formulation of Fisher's LDA, two basic assumptions are always used. First, the class distribution is assumed to be Gaussian. Second, the class empirical mean is in practice used to approximate its expectation.
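The scatter matrices of Eqs. (1)-(3) are mechanical to compute. The authors' experiments used Matlab (Section 4); the NumPy sketch below is our own illustration, and the function name `fisher_lda` and the pseudo-inverse fallback for a singular $S_w$ are our assumptions, not the paper's implementation.

```python
import numpy as np

def fisher_lda(X, y, dim):
    """Classical Fisher's LDA sketch: maximize the ratio in Eq. (1).

    X: (N, n) data matrix, y: (N,) integer class labels,
    dim: output dimension. Returns a projection W of shape (n, dim)."""
    classes = np.unique(y)
    N, n = X.shape
    mean_all = X.mean(axis=0)
    Sw = np.zeros((n, n))
    Sb = np.zeros((n, n))
    for c in classes:
        Xc = X[y == c]
        Nc = len(Xc)
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c) / N        # Eq. (3), pooled within-class scatter
        d = (mu_c - mean_all)[:, None]
        Sb += Nc / N * (d @ d.T)                     # Eq. (2), between-class scatter
    # Solve Sb w = lambda Sw w; pinv guards against a singular Sw
    # (the small sample size problem discussed in the Introduction).
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)
    return evecs[:, order[:dim]].real
```

With the pseudo-inverse replaced by a regularized inverse of $S_w + \lambda I$, the same sketch becomes R-LDA as described in the Introduction.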
Although Fisher's LDA has been attracting attention for more than thirty years, as far as we know there is little research addressing the second assumption and investigating the effect of the difference between the class empirical mean and its expectation in the Fisher criterion. As we know, $\hat{u}_i$ is the estimate of $E_{x' \in C_i}[x']$, the expectation of class $C_i$, based on the maximum likelihood criterion. The substitution of the expectation $E_{x' \in C_i}[x']$ with its empirical mean $\hat{u}_i$ is based on the assumption that the sample size is large enough to reflect the data distribution of each class. Unfortunately, this assumption is not always true in some applications, especially biometric learning. Hence the impact of the difference between these two terms should not be ignored.

In view of this, this paper studies the effect of the difference between the class empirical mean and its expectation in the Fisher criterion. We note that such a difference is almost impossible to specify, since $E_{x' \in C_i}[x']$ is usually hard (if not impossible) to determine. Hence, from the perturbation perspective, we introduce perturbation random vectors to stochastically describe this difference. Based on the proposed perturbation model, we then analyze how the perturbation random vectors take effect in the Fisher criterion. Perturbation learning yields new forms of within-class and between-class covariance matrices by integrating some perturbation factors, and the new Fisher's LDA formulation based on these two new estimated covariance matrices is therefore called perturbation LDA (P-LDA). In addition, a semi-perturbation LDA, which gives a novel view of R-LDA, will also be discussed. Although there is related work on covariance matrix estimation for designing classifiers, such as RDA [23], its similar work [24], and EDDA [25], the objective of P-LDA is different from theirs: RDA and EDDA are not based on the Fisher criterion and are classifiers, while P-LDA is a feature extractor and does not predict the class label of any data as output.
P-LDA extracts a subspace for dimension reduction, which RDA and EDDA do not. Moreover, the perturbation model used in P-LDA has not been considered in RDA and EDDA, so the methodology of P-LDA is different from theirs. This paper focuses on the Fisher criterion, while classifier analysis is beyond our scope. To the best of our knowledge, there is no similar work addressing the Fisher criterion using the proposed perturbation model.

The remainder of this paper is outlined as follows. The proposed P-LDA is introduced in Section 2. The implementation details are presented in Section 3. P-LDA is then evaluated using three synthetic data sets and three large human face data sets in Section 4. Discussion and conclusion are given in Sections 5 and 6, respectively.

2. P-LDA: a new formulation

The proposed method is developed based on the idea of perturbation analysis. A theoretical analysis is given, and a new formulation is proposed by learning the difference between the class empirical mean and its expectation, as well as its impact on the estimation of the covariance matrices in the Fisher criterion. In Section 2.1, we first consider the case when the data of each class follow a single Gaussian distribution. The theory is then extended to the mixture of Gaussians case in Section 2.2. The implementation details of the proposed formulation are given in Section 3.

2.1. P-LDA under single Gaussian distribution

Assume the data of each class are normally distributed. Given a specific input $(x, y)$, where sample $x \in \mathcal{X}$ and class label $y \in \{C_1, \ldots, C_L\}$, we first try to study the difference between a sample $x$ and $E_{x' \in y}[x']$, the expectation of class $y$, in the Fisher criterion. However, $E_{x' \in y}[x']$ is usually hard (if not impossible) to determine, so it may be impossible to specify this difference exactly. Therefore, our strategy is to stochastically characterize (simulate) the difference between $x$ and $E_{x' \in y}[x']$ by a random vector, and then model a random mean for class $y$ to stochastically describe $E_{x' \in y}[x']$.
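The gap that motivates this strategy, that the empirical mean of a few samples drifts away from the expectation, is easy to see numerically. The toy dimensions, seed and sample sizes below are our own illustrative choices, not values from the paper.

```python
import numpy as np

# Toy illustration of the motivating gap: the empirical class mean u_hat
# deviates from the expectation E[x' | y] when the per-class sample size
# N_i is small; this is the difference the perturbation vectors simulate.
rng = np.random.default_rng(1)
true_mean = np.array([1.0, -2.0, 0.5])  # stands in for E[x' | y]
gaps = {}
for N_i in (2, 100, 10000):
    X = rng.normal(true_mean, 1.0, size=(N_i, 3))
    gaps[N_i] = np.linalg.norm(X.mean(axis=0) - true_mean)
    print(N_i, gaps[N_i])  # the gap shrinks as N_i grows
```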
Define $n_x$ ($\in \mathbb{R}^n$) as a perturbation random vector for stochastic description (simulation) of the difference between $x$ and $E_{x' \in y}[x']$. When the data of each class follow a normal distribution, we can model $n_x$ as a random vector drawn from the normal distribution with mean $\mathbf{0}$ and covariance matrix $X_y$, i.e.,

$$n_x \sim \mathcal{N}(\mathbf{0}, X_y), \quad X_y \in \mathbb{R}^{n \times n}. \qquad (5)$$

We call $X_y$ the perturbation covariance matrix of $n_x$. The above model assumes that the covariance matrix $X_y$ of $n_x$ is the same for any sample $x$ with the same class label $y$. Note that a natural ideal value of $X_y$ would be the expected covariance matrix of class $y$, i.e., $E_{x' \in y}[(x' - E_{x' \in y}[x'])(x' - E_{x' \in y}[x'])^T]$. However, this value

is usually hard to determine, since $E_{x' \in y}[x']$ and the true density function are not available. Actually, this kind of estimation need not be our goal. Note that the perturbation random vector $n_x$ is only used for stochastic simulation of the difference between the specific sample $x$ and its expectation $E_{x' \in y}[x']$. Therefore, in our study, $X_y$ only needs to be estimated well enough to perform such simulation based on the perturbation model specified by Eqs. (6) and (7) below, finally resulting in proper corrections (perturbations) of the empirical between-class and within-class covariance matrices, as shown later. For this goal, a random vector is first formulated for any sample $x$ to stochastically approximate $E_{x' \in y}[x']$:

$$\tilde{x} = x + n_x. \qquad (6)$$

The stochastic approximation of $\tilde{x}$ to $E_{x' \in y}[x']$ means there exists a specific estimate $\hat{n}_x$ of the random vector $n_x$ with respect to the corresponding distribution such that

$$x + \hat{n}_x = E_{x' \in y}[x']. \qquad (7)$$

Formally, we call Eqs. (6) and (7) the perturbation model. It is not hard to see that such a perturbation model is always satisfied. The main problem is how to model $X_y$ properly; a technique for this purpose will be suggested in the next section.

Now, for any training sample $x_i^j$, we can formulate its corresponding perturbation random vector $n_i^j \sim \mathcal{N}(\mathbf{0}, X_{C_i})$ and the random vector $\tilde{x}_i^j = x_i^j + n_i^j$ to stochastically approximate its expectation $E_{x' \in C_i}[x']$. By considering the perturbation impact, $E_{x' \in C_i}[x']$ can be stochastically approximated on average by

$$\tilde{u}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} \tilde{x}_i^j = \hat{u}_i + \frac{1}{N_i}\sum_{j=1}^{N_i} n_i^j. \qquad (8)$$

Note that $\tilde{u}_i$ can only stochastically, not exactly, describe $E_{x' \in C_i}[x']$, so it is called the random mean of class $C_i$ in our study. After introducing the random mean of each class, a new form of Fisher's LDA is developed below by integrating the factors of the perturbation between the class empirical mean and its expectation into the supervised learning process, so that new forms of the between-class and within-class covariance matrices are obtained.
Since $\tilde{u}_i$ and $\tilde{u}$ are both random vectors, we take expectations with respect to the probability measures on their probability spaces. To keep the presentation clear, we denote the sets of random vectors $n_i = \{n_i^1, \ldots, n_i^{N_i}\}$, $i = 1, \ldots, L$, and $n = \{n_1^1, \ldots, n_1^{N_1}, \ldots, n_L^1, \ldots, n_L^{N_L}\}$. Since $x_1^1, \ldots, x_L^{N_L}$ are i.i.d., it is reasonable to assume that $n_1^1, \ldots, n_L^{N_L}$ are also independent. A new within-class covariance matrix of class $C_i$ is then formed:

$$S_i = E_{n_i}\left[\frac{1}{N_i}\sum_{j=1}^{N_i}(x_i^j - \tilde{u}_i)(x_i^j - \tilde{u}_i)^T\right] = \hat{S}_i + \frac{1}{N_i} X_{C_i}. \qquad (9)$$

So a new within-class covariance matrix is established by

$$S_w = \frac{1}{N}\sum_{i=1}^{L} N_i S_i = \hat{S}_w + \frac{1}{N}\sum_{i=1}^{L} X_{C_i} = \hat{S}_w + S_w^{\Delta}, \qquad (10)$$

where $S_w^{\Delta} = \frac{1}{N}\sum_{i=1}^{L} X_{C_i}$.

Footnote 2: In this paper the hat notation is always added over the corresponding random vector to indicate an estimate of that random vector. As analyzed later, $\hat{n}_x$ does not need to be estimated directly; a technique will be introduced to estimate the information about $\hat{n}_x$.

Next, following Eqs. (2) and (4), a new between-class covariance matrix is given by

$$S_b = E_n\left[\sum_{i=1}^{L}\sum_{s=1}^{L}\frac{N_i N_s}{2N^2}(\tilde{u}_i - \tilde{u}_s)(\tilde{u}_i - \tilde{u}_s)^T\right] = \hat{S}_b + S_b^{\Delta}, \qquad (11)$$

where $\tilde{u} = \frac{1}{N}\sum_{i=1}^{L} N_i \tilde{u}_i = \hat{u} + \frac{1}{N}\sum_{i=1}^{L}\sum_{j=1}^{N_i} n_i^j$ and $S_b^{\Delta} = \sum_{i=1}^{L}\frac{N - N_i}{N^2} X_{C_i}$. The details of the derivation of Eqs. (9) and (11) can be found in Appendix A. From the above analysis, a new formulation of Fisher's LDA, called perturbation LDA (P-LDA), is given by the following theorem.

Theorem 1 (P-LDA). Under the Gaussian distribution of within-class data, perturbation LDA (P-LDA) finds a linear projection matrix $W_{opt}$ such that

$$W_{opt} = \arg\max_W \frac{\operatorname{trace}(W^T S_b W)}{\operatorname{trace}(W^T S_w W)} = \arg\max_W \frac{\operatorname{trace}(W^T(\hat{S}_b + S_b^{\Delta})W)}{\operatorname{trace}(W^T(\hat{S}_w + S_w^{\Delta})W)}. \qquad (12)$$

Here, $S_b^{\Delta}$ and $S_w^{\Delta}$ are called the between-class and within-class perturbation covariance matrices, respectively.

Finally, we further interpret the effects of the covariance matrices $S_w$ and $S_b$ based on Eq. (12). Suppose $W = (w_1, \ldots, w_q)$ in Eq. (12), where each $w_m$ ($\in \mathbb{R}^n$) is a feature vector. Then for any $W$ and random vectors $n = \{n_i^j\}_{i=1,\ldots,L}^{j=1,\ldots,N_i}$, we define

$$f_b(W, n) = \sum_{i=1}^{L}\sum_{s=1}^{L}\frac{N_i N_s}{2N^2}\sum_{m=1}^{q}\left(w_m^T(\tilde{u}_i - \tilde{u}_s)\right)^2, \qquad (13)$$

$$f_w(W, n) = \frac{1}{N}\sum_{i=1}^{L}\sum_{j=1}^{N_i}\sum_{m=1}^{q}\left(w_m^T(x_i^j - \tilde{u}_i)\right)^2. \qquad (14)$$

Noting that $\tilde{u}_i = \hat{u}_i + \frac{1}{N_i}\sum_{j=1}^{N_i} n_i^j$ is the random mean of class $C_i$, $f_b(W, n)$ is the (weighted) average pairwise distance between the random means of different classes, and $f_w(W, n)$ is the average distance between any sample and the random mean of its corresponding class in the lower-dimensional space. Define the following model:

$$W_{opt}(n) = \arg\max_W f_b(W, n)/f_w(W, n).$$

Given specific estimates $\hat{n} = \{\hat{n}_i^j\}$, we could then get a projection $W_{opt}(\hat{n})$. In practice, however, it would be hard to find a proper estimate $\hat{n}$ that accurately describes the difference between $x_i^j$ and its expectation $E_{x' \in C_i}[x']$. Rather than accurately estimating such an $\hat{n}$, we instead consider finding the projection by maximizing the ratio

between the expectation values of $f_b(W, n)$ and $f_w(W, n)$ with respect to $n$, so that the uncertainty is considered over the domain of $n$. That is,

$$W_{opt} = \arg\max_W E_n[f_b(W, n)]/E_n[f_w(W, n)] = \arg\max_W f_b(W)/f_w(W).$$

It can be verified that

$$f_b(W) = E_n[f_b(W, n)] = \operatorname{trace}(W^T S_b W), \qquad (15)$$

$$f_w(W) = E_n[f_w(W, n)] = \operatorname{trace}(W^T S_w W). \qquad (16)$$

So this is exactly the optimization model formulated in Eq. (12), which gives a more intuitive understanding of the effects of the covariance matrices $S_w$ and $S_b$. Though in P-LDA $\hat{S}_w$ and $\hat{S}_b$ are perturbed by $S_w^{\Delta}$ and $S_b^{\Delta}$, respectively, in Section 5 we will show that $S_w$ and $S_b$ converge to the precise within-class and between-class covariance matrices, respectively. This shows the rationality of P-LDA, since the class empirical mean is almost equal to its expectation when the sample size is large enough, and the perturbation effect can then be ignored.

2.2. P-LDA under mixture of Gaussian distribution

This section extends Theorem 1 by altering the class distribution from a single Gaussian to a mixture of Gaussians [3]. Therefore, the probability density function of a sample $x$ in class $C_i$ is

$$p(x \mid C_i) = \sum_{k=1}^{I_i} P(\pi_i^k)\,\mathcal{N}(x \mid u_i^k, \Sigma_i^k), \qquad (17)$$

where $u_i^k$ is the expectation of $x$ in the $k$th Gaussian component (GC) $\mathcal{N}(x \mid u_i^k, \Sigma_i^k)$ of class $C_i$, $\Sigma_i^k$ is its covariance matrix, and $P(\pi_i^k)$ is the prior probability of the $k$th GC of class $C_i$. Such a density function indicates that any sample $x$ in class $C_i$ mainly distributes in one of the GCs. Therefore, Theorem 1 under a single Gaussian distribution can be extended to learning the perturbation in each GC. To do so, the clusters within each class should first be determined such that the data in each cluster are approximately normally distributed. Those clusters are then labeled as subclasses, and finally P-LDA is used to learn the discriminant information of all those subclasses. This is similar to the idea of Zhu and Martinez [26], who extended classical Fisher's LDA to the mixture of Gaussians case.

In detail, suppose there are $I_i$ GCs (clusters) in class $C_i$ and $N_i^k$ out of all samples are in the $k$th GC of class $C_i$. Let $C_i^k$ denote the $k$th GC of class $C_i$. If we denote $x_i^{k,s}$ as the $s$th sample of $C_i^k$, $s = 1, \ldots, N_i^k$, then a perturbation random vector $n_i^{k,s}$ can be modeled for $x_i^{k,s}$, where $n_i^{k,s} \sim \mathcal{N}(\mathbf{0}, X_{C_i^k})$, $X_{C_i^k} \in \mathbb{R}^{n \times n}$, so that $\tilde{x}_i^{k,s} = x_i^{k,s} + n_i^{k,s}$ is a random vector that stochastically describes the expectation of subclass $C_i^k$, i.e., $u_i^k$. Then P-LDA can be extended to the mixture of Gaussians case by classifying the subclasses $\{C_i^k\}_{i=1,\ldots,L}^{k=1,\ldots,I_i}$. Thus we get the following theorem, a straightforward extension of Theorem 1; the proof is omitted.

Theorem 2. Under a Gaussian mixture distribution of the data within each class, the projection matrix of perturbation LDA (P-LDA), $W_{opt}$, can be found as

$$W_{opt} = \arg\max_W \frac{\operatorname{trace}(W^T S_b' W)}{\operatorname{trace}(W^T S_w' W)} = \arg\max_W \frac{\operatorname{trace}(W^T(\hat{S}_b' + S_b'^{\Delta})W)}{\operatorname{trace}(W^T(\hat{S}_w' + S_w'^{\Delta})W)}, \qquad (18)$$

where, with $\hat{u}_i^k = \frac{1}{N_i^k}\sum_{s=1}^{N_i^k} x_i^{k,s}$ and $\tilde{u}_i^k = \hat{u}_i^k + \frac{1}{N_i^k}\sum_{s=1}^{N_i^k} n_i^{k,s}$,

$$\hat{S}_b' = \sum_{i=1}^{L}\sum_{k=1}^{I_i}\sum_{j=1}^{L}\sum_{s=1}^{I_j}\frac{N_i^k N_j^s}{2N^2}(\hat{u}_i^k - \hat{u}_j^s)(\hat{u}_i^k - \hat{u}_j^s)^T, \quad S_b'^{\Delta} = \sum_{i=1}^{L}\sum_{k=1}^{I_i}\frac{N - N_i^k}{N^2} X_{C_i^k},$$

$$\hat{S}_w' = \frac{1}{N}\sum_{i=1}^{L}\sum_{k=1}^{I_i} N_i^k \hat{S}_i^k, \quad \hat{S}_i^k = \frac{1}{N_i^k}\sum_{s=1}^{N_i^k}(x_i^{k,s} - \hat{u}_i^k)(x_i^{k,s} - \hat{u}_i^k)^T, \quad S_w'^{\Delta} = \frac{1}{N}\sum_{i=1}^{L}\sum_{k=1}^{I_i} X_{C_i^k}.$$

3. Estimation of perturbation covariance matrices

For the implementation of P-LDA, we need to properly estimate the two perturbation covariance matrices $S_b^{\Delta}$ and $S_w^{\Delta}$. Parameter estimation is challenging, since it is always ill-posed [30,31] due to the limited sample size and the curse of high dimensionality.
A more robust and tractable way to overcome this problem is to perform some regularized estimation, which is indeed the motivation here. A method will be suggested to implement P-LDA with parameter estimation in an entire PCA subspace without discarding any nonzero principal component. Unlike covariance matrix estimation on sample data, we introduce an indirect way to estimate the covariance matrices of the perturbation random vectors, since observation values of the perturbation random vectors are hard to obtain directly. For the derivation, parameter estimation focuses on P-LDA under the single Gaussian distribution; it can easily be generalized to the Gaussian mixture case by Theorem 2.

Footnote 3: The designs of $S_b'$ and $S_w'$ in the criterion are not restricted to the presented forms. The goal here is just to present one way to generalize the analysis under the single Gaussian case.

This section

is divided into two parts. The first part suggests regularized models for estimation of the parameters, and a method for parameter estimation is then presented in the second part.

3.1. Simplified models for regularized estimation

In this paper, we restrict our attention to data that are not very heteroscedastic, i.e., the class covariance matrices are approximately equal (or do not differ too much); see Footnote 4. This is also in line with one of the conditions under which the Fisher criterion is optimal [3]. Under this condition, we consider the case when the perturbation covariance matrices of all classes are approximately equal, so that they can be replaced by their average, the pooled perturbation covariance matrix defined in Eq. (19). We obtain Lemma 1, with its proof provided in Appendix B.

Lemma 1. If the covariance matrices of all perturbation random vectors are replaced by their average, i.e., a pooled perturbation covariance matrix

$$X_{C_1} = X_{C_2} = \cdots = X_{C_L} = \bar{X}, \qquad (19)$$

then $S_b^{\Delta}$ and $S_w^{\Delta}$ can be rewritten as

$$S_b^{\Delta} = \frac{L-1}{N}\bar{X}, \qquad S_w^{\Delta} = \frac{L}{N}\bar{X}. \qquad (20)$$

Note that when the class covariance matrices of the data do not differ too much, using a pooled covariance matrix in place of the individual covariance matrices has been widely used and experimentally suggested to attenuate ill-posed estimation in many existing algorithms [23,24,27-31].

To develop a more simplified model in the entire principal component space, we perform principal component analysis [3] on $\mathcal{X}$ without discarding any nonzero principal component. In practice, the principal components can be acquired from the eigenvectors of the total-class covariance matrix $\hat{S}_t$ ($= \hat{S}_w + \hat{S}_b$). When the data dimension is much larger than the total sample size, the rank of $\hat{S}_t$ is at most $N-1$ [5,31], i.e., $\operatorname{rank}(\hat{S}_t) \le N-1$. In general, $\operatorname{rank}(\hat{S}_t)$ is equal to $N-1$, and for convenience of analysis we assume $\operatorname{rank}(\hat{S}_t) = N-1$. This also implies that no information is lost for Fisher's LDA, since all positive principal components are retained [33].

Suppose we are given the decorrelated data space $\bar{\mathcal{X}}$, the entire PCA space of dimension $\bar{n} = N-1$. Based on Eq. (6) and Lemma 1, for any given input sample $x = (x_1, \ldots, x_{\bar{n}})^T \in \bar{\mathcal{X}}$, its corresponding perturbation random vector is $n_x = (\xi_x^1, \ldots, \xi_x^{\bar{n}})^T \in \mathbb{R}^{\bar{n}}$, where $n_x \sim \mathcal{N}(\mathbf{0}, \bar{X})$. Since $\bar{\mathcal{X}}$ is decorrelated, the coefficients $x_1, \ldots, x_{\bar{n}}$ are approximately uncorrelated. Note that the perturbation variables $\xi_x^1, \ldots, \xi_x^{\bar{n}}$ are apparently only correlated with their corresponding uncorrelated coefficients $x_1, \ldots, x_{\bar{n}}$, respectively. Therefore, $\bar{X}$ can be modeled by assuming the random variables $\xi_x^1, \ldots, \xi_x^{\bar{n}}$ to be mutually uncorrelated (see Footnote 5). Based on this principle, $\bar{X}$ can be modeled by

$$\bar{X} = K, \quad K = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_{\bar{n}}^2), \qquad (21)$$

where $\sigma_k^2$ is the variance of $\xi_x^k$. Furthermore, if the average variance $\bar{\sigma}^2 = \frac{1}{\bar{n}}\sum_{k=1}^{\bar{n}} \sigma_k^2$ is used to replace each individual variance $\sigma_k^2$, $k = 1, \ldots, \bar{n}$, a special model is acquired:

$$\bar{X} = \bar{\sigma}^2 I, \quad \bar{\sigma}^2 \ge 0, \qquad (22)$$

where $I$ is the $\bar{n} \times \bar{n}$ identity matrix.

From the statistical point of view, the above simplified models can be interpreted as regularized estimations [5] of $\bar{X}$ for the perturbation random vectors. It is known that when the dimensionality of the data is high, estimation becomes ill-posed (poorly posed) if the number of parameters to be estimated is larger than (or comparable to) the number of samples [30,31]. Moreover, estimation of $\bar{X}$ relates to information about an expectation value, which is hard to specify in practice. Hence, regularized estimation of $\bar{X}$ is preferred to alleviate the ill-posed problem and obtain a stable estimate in applications. To this end, estimation based on Eq. (22) may be more stable than estimating $K$, since Eq. (22) apparently reduces the number of estimated parameters.

Footnote 4: Discussing variants of Fisher's LDA under unequal class covariance matrices is not in the scope of this paper; it is another research topic [39].

Footnote 5: This might in theory be a suboptimal strategy. However, the assumption is practically useful and reasonable for alleviating the ill-posed estimation problem for high-dimensional data, since it reduces the number of estimated parameters. In Appendix D, we show its practical rationality through an experimental verification of this assumption on the face data sets used in the experiments.
This will be demonstrated and justified on synthetic data in the experiments. Finally, this simplified perturbation model is still in line with the perturbation LDA model, since the perturbation matrices $X_{C_i}$ as well as their average $\bar{X}$ need not be the accurate expected class covariance matrices; they only need to follow the perturbation model given below Eq. (5).

3.2. Estimating parameters

An important remaining issue is to estimate the variance parameters $\sigma_1^2, \ldots, \sigma_{\bar{n}}^2$ and $\bar{\sigma}^2$. The idea is straightforward: the parameters are learned from generated observation values of the perturbation random vectors using maximum likelihood. However, an indirect way is needed, since it is impossible to find realizations of the perturbation random vectors directly. Hence, our idea is to find certain sums of perturbation random vectors based on the perturbation model and then generate their realizations for estimation.

3.2.1. Inferring the sum of perturbation random vectors

Suppose $N_i$, the number of training samples for class $C_i$, is larger than 1. Define the average of the observed samples in class $C_i$ excluding $x_i^j$ as

$$\hat{u}_i^{(j)} = \frac{1}{N_i - 1}\sum_{k=1, k \neq j}^{N_i} x_i^k, \quad j = 1, \ldots, N_i. \qquad (23)$$

It is actually feasible to treat $\hat{u}_i^{(j)}$ as another empirical mean of class $C_i$. Then another random mean of class $C_i$ can be formulated as

$$\tilde{u}_i^{(j)} = \frac{1}{N_i - 1}\sum_{k=1, k \neq j}^{N_i} \tilde{x}_i^k = \hat{u}_i^{(j)} + \frac{1}{N_i - 1}\sum_{k=1, k \neq j}^{N_i} n_i^k. \qquad (24)$$

Comparing with $\tilde{u}_i$, the random mean of class $C_i$ in terms of Eq. (8), and based on the perturbation model, we know that $\tilde{u}_i$ and $\tilde{u}_i^{(j)}$ can both stochastically approximate $E_{x' \in C_i}[x']$ through the following specific estimates, respectively:

$$\hat{\tilde{u}}_i = \frac{1}{N_i}\sum_{k=1}^{N_i} \hat{\tilde{x}}_i^k = E_{x' \in C_i}[x'], \qquad (25)$$

$$\hat{\tilde{u}}_i^{(j)} = \frac{1}{N_i - 1}\sum_{k=1, k \neq j}^{N_i} \hat{\tilde{x}}_i^k = E_{x' \in C_i}[x'], \qquad (26)$$

where $\hat{\tilde{x}}_i^k = x_i^k + \hat{n}_i^k$, and $\hat{n}_i^k$ is an estimate of $n_i^k$ such that $x_i^k + \hat{n}_i^k = E_{x' \in C_i}[x']$ based on the perturbation model. Hence, we have the

relation below:

$$\hat{\tilde{u}}_i = \hat{\tilde{u}}_i^{(j)}. \qquad (27)$$

Fig. 1. Geometric interpretation: $\alpha = \|x_i^{j_1} - x_i^{j_2}\| = \|\hat{n}_i^{j_1} - \hat{n}_i^{j_2}\|$.

A geometric interpretation of Eq. (27) is provided by Fig. 1. Note that $\hat{\tilde{u}}_i = \hat{\tilde{u}}_i^{(j_1)} = \hat{\tilde{u}}_i^{(j_2)}$ for any $j_1, j_2$. It therefore yields $x_i^{j_1} - x_i^{j_2} = \hat{n}_i^{j_2} - \hat{n}_i^{j_1}$. According to Eq. (7), this is obviously true because $\hat{\tilde{x}}_i^j = x_i^j + \hat{n}_i^j = E_{x' \in C_i}[x']$, $j = 1, \ldots, N_i$.

Now return to the methodology. Based on Eq. (27), we then have

$$\frac{1}{N_i}\left[\hat{n}_i^j - \frac{1}{N_i - 1}\sum_{k=1, k \neq j}^{N_i} \hat{n}_i^k\right] = \hat{u}_i^{(j)} - \hat{u}_i. \qquad (28)$$

Define a new random vector as

$$\bar{n}_i^j = \frac{1}{N_i}\left[n_i^j - \frac{1}{N_i - 1}\sum_{k=1, k \neq j}^{N_i} n_i^k\right]. \qquad (29)$$

Based on Lemma 1, we know that the pooled perturbation covariance matrix to be estimated for all $\{n_i^j\}$ is $\bar{X}$. It is therefore easy to verify that

$$\bar{n}_i^j \sim \mathcal{N}\left(\mathbf{0}, \frac{1}{N_i(N_i - 1)}\bar{X}\right). \qquad (30)$$

Actually, $\bar{n}_i^j$ is just the sum of perturbation random vectors we aim to find. Moreover, Eq. (28) provides an estimate of $\bar{n}_i^j$:

$$\hat{\bar{n}}_i^j = \hat{u}_i^{(j)} - \hat{u}_i. \qquad (31)$$

This avoids the difficulty of finding the observation values $\hat{n}_i^j$ directly. Moreover, since the $\{\bar{n}_i^j\}_{j=1,\ldots,N_i}$ follow the same distribution within class $C_i$, i.e., $\mathcal{N}(\mathbf{0}, \frac{1}{N_i(N_i-1)}\bar{X})$, it is feasible to treat $\{\hat{\bar{n}}_i^1, \ldots, \hat{\bar{n}}_i^{N_i}\}$ as observation values generated from this distribution. In fact, the empirical mean of these observation values coincides with their expectation with respect to the distribution because of the following equality:

$$\frac{1}{N_i}\sum_{j=1}^{N_i} \hat{\bar{n}}_i^j = \frac{1}{N_i}\sum_{j=1}^{N_i}\left(\hat{u}_i^{(j)} - \hat{u}_i\right) = \mathbf{0}. \qquad (32)$$

3.2.2. Inferring estimates of $\sigma_1^2, \ldots, \sigma_{\bar{n}}^2$ and $\bar{\sigma}^2$

The estimates of $\sigma_1^2, \ldots, \sigma_{\bar{n}}^2$ and $\bar{\sigma}^2$ are given below based on Eq. (30) and the generated $\{\hat{\bar{n}}_i^j\}_{i=1,\ldots,L}^{j=1,\ldots,N_i}$. First we denote

$$\hat{u}_{\Delta}^{ij} = \hat{u}_i^{(j)} - \hat{u}_i = \left(\hat{u}_{\Delta}^{ij}(1), \ldots, \hat{u}_{\Delta}^{ij}(\bar{n})\right)^T. \qquad (33)$$

Then we define $\hat{\sigma}_k^2(i, j)$ satisfying

$$\frac{1}{N_i(N_i - 1)}\hat{\sigma}_k^2(i, j) = \left(\hat{u}_{\Delta}^{ij}(k)\right)^2. \qquad (34)$$

In the uncorrelated space, $\bar{X}$ is modeled by $\bar{X} = K = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_{\bar{n}}^2)$ for approximation, so $\sigma_1^2, \ldots, \sigma_{\bar{n}}^2$ are estimated as $\hat{\sigma}_1^2, \ldots, \hat{\sigma}_{\bar{n}}^2$ by maximum likelihood:

$$\hat{\sigma}_k^2 = \frac{1}{N}\sum_{i=1}^{L}\sum_{j=1}^{N_i} \hat{\sigma}_k^2(i, j), \quad k = 1, \ldots, \bar{n}. \qquad (35)$$

As suggested by Eq. (22), an average variance of $\sigma_1^2, \ldots, \sigma_{\bar{n}}^2$ is used, so the estimate $\hat{\bar{\sigma}}^2$ of $\bar{\sigma}^2$ is obtained as

$$\hat{\bar{\sigma}}^2 = \frac{1}{\bar{n}}\sum_{k=1}^{\bar{n}} \hat{\sigma}_k^2. \qquad (36)$$

Extensive experiments in Section 4 justify this estimation.

4. Experimental results

The proposed P-LDA algorithm is evaluated on both synthetic data and face image data. Face images are typical biometric data.
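The experiments below evaluate P-LDA with the parameter estimated by Eq. (36), so before presenting them we sketch that estimator (Eqs. (23), (31), (33)-(36)) in NumPy. This is our own illustration with an assumed function name; the PCA decorrelation step of Section 3.1 is presumed to have been applied to `X` already.

```python
import numpy as np

def estimate_sigma_bar_sq(X, y):
    """Estimate the pooled spherical perturbation variance of Eq. (36).

    X: (N, n_bar) data, assumed already decorrelated (entire PCA space,
    Section 3.1); y: (N,) class labels, at least 2 samples per class.
    Each leave-one-out mean shift u_hat_i^(j) - u_hat_i realizes the
    random vector n_bar_i^j, whose k-th coordinate has variance
    sigma_k^2 / (N_i (N_i - 1)) by Eq. (30)."""
    N, n_bar = X.shape
    sigma_k_sq = np.zeros(n_bar)
    for c in np.unique(y):
        Xc = X[y == c]
        Nc = len(Xc)
        assert Nc > 1, "need at least two samples per class"
        mu = Xc.mean(axis=0)
        for j in range(Nc):
            mu_loo = (Nc * mu - Xc[j]) / (Nc - 1)    # Eq. (23), leave-one-out mean
            u_delta = mu_loo - mu                     # Eqs. (31) and (33)
            sigma_k_sq += Nc * (Nc - 1) * u_delta**2  # Eq. (34), accumulated for Eq. (35)
    sigma_k_sq /= N                                   # Eq. (35): average over all (i, j)
    return sigma_k_sq.mean()                          # Eq. (36): spherical average
```

Feeding the returned value into the spherical model of Eq. (22), i.e. using `sigma * np.eye(n_bar)` as the pooled perturbation covariance matrix, completes the estimation pipeline.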
The number of available face training samples per class is typically very small, while the data dimensionality is very high. This section is divided into three parts. The first and second parts report the experimental results on synthetic data and face data, respectively. In the third part, we verify our parameter estimation strategy on high-dimensional face image data. Throughout the experiments, two popular classifiers, namely the nearest class mean classifier (NCMC) and the nearest neighbor classifier (NNC), are used to evaluate the algorithms; both have been widely used with Fisher's LDA in existing publications. All programs are implemented in Matlab and run on a PC with an Intel Pentium(R) D 3.4 GHz processor.

4.1. Synthetic data

This section justifies the performance of the proposed P-LDA under Theorems 1 and 2, and shows the effects of Eqs. (21) and (22) in modeling P-LDA. Three types of synthetic data, following a single Gaussian or a mixture of Gaussians distribution in each class, are generated in a three-dimensional space. As shown in Tables 1 and 2, for the single Gaussian distribution we consider two cases, in which the covariance matrices are (i) identity covariance matrices multiplied by the constant 0.5 and (ii) equal diagonal covariance matrices. For each class, samples are generated. For the mixture of Gaussians distribution, each class consists of three GCs with

Table 1. Overview of the synthetic data (single Gaussian distribution)
Class Id | Mean | Covariance matrix I | Covariance matrix II
Class 1 | (.3, .5, .)T | .5 | .9
Class 2 | (., ., .5)T | .5 | .7
Class 3 | (.9, .7, .)T | .5 | .38

equal covariance matrices. For each GC, there are 4 samples randomly generated. Information about the synthetic data is tabulated in Tables 1 and 2, and the data distributions are illustrated in Fig. 2.

Table 2. Overview of the synthetic data (Gaussian mixture distribution)
Class Id | Mean of first GC | Mean of second GC | Mean of third GC | Covariance matrix
Class 1 | (, .5, )T | (., , .6)T | (.3, .5, .)T | .98
Class 2 | (, .5, )T | (., ., .5)T | (, .9, )T | .6593
Class 3 | (.9, .7, .)T | (.5, .6, .6)T | (, .5, .)T |

Fig. 2. Illustration of synthetic data: (a) equal identity covariance matrices multiplied by 0.5, (b) equal diagonal covariance matrices and (c) Gaussian mixture distribution.

Table 3. Average accuracy results (equal identity covariance matrices); methods: P-LDA with Eq. (21), P-LDA with Eq. (22) and classical Fisher's LDA; classifiers: NCMC and NNC at three training-set sizes p per class.

Table 4. Average accuracy results (equal diagonal covariance matrices); same methods, classifiers and training-set sizes as Table 3.

Table 5. Average accuracy results (Gaussian mixture distribution); methods: P-LDA (GMM) with Eq. (21), P-LDA (GMM) with Eq. (22) and classical Fisher's LDA (GMM); classifiers: NCMC and NNC at p = 6 (2), 9 (3) and 18 (6) and one larger setting.

In Tables 3-5, the accuracies with respect to different numbers of training samples per class are shown, where p indicates the number of training samples per class. In the mixture of Gaussians case, the bracketed number is the number of training samples drawn from each GC of a class (e.g., p = 9 (3) means every three samples out of the nine training samples of each class come from one of its GCs). For each synthetic data set, we repeat the experiments ten times and obtain average accuracies. Since finding GCs is not our focus, we assume the GCs are known for the implementation of P-LDA based on Theorem 2.
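The evaluation protocol just described (draw p training samples per class, classify the rest with NCMC, average over ten repeats) can be sketched as follows. This is our own minimal harness, not the paper's Matlab code; the class means, spread and sample counts are toy placeholders rather than the values of Tables 1-2, and the discriminant projection step is omitted so that only the sampling/evaluation loop is shown.

```python
import numpy as np

def ncmc_accuracy(X_tr, y_tr, X_te, y_te):
    """Nearest class mean classifier (NCMC): assign each test point to
    the class with the closest training-set empirical mean."""
    classes = np.unique(y_tr)
    means = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = ((X_te[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return (classes[d.argmin(axis=1)] == y_te).mean()

# Protocol of Section 4.1 with toy stand-in values: p training samples per
# class, the rest for testing, averaged over ten random repeats.
rng = np.random.default_rng(0)
means = np.array([[0.3, 0.5, 0.1], [0.1, 0.2, 0.5], [0.9, 0.7, 0.2]])
p, per_class = 5, 100
accs = []
for _ in range(10):
    X = np.vstack([rng.normal(m, 0.2, (per_class, 3)) for m in means])
    y = np.repeat([0, 1, 2], per_class)
    idx = np.hstack([rng.permutation(np.where(y == c)[0]) for c in (0, 1, 2)])
    tr = np.hstack([idx[c * per_class: c * per_class + p] for c in (0, 1, 2)])
    te = np.setdiff1d(np.arange(3 * per_class), tr)
    accs.append(ncmc_accuracy(X[tr], y[tr], X[te], y[te]))
print(round(float(np.mean(accs)), 3))
```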
In addition, "P-LDA (GMM), Eq. (21)/(22)" means P-LDA implemented under the Gaussian mixture model (GMM) based on Theorem 2, with the parameter estimated by Eq. (21) or Eq. (22), respectively; "LDA (GMM)" means classical Fisher's LDA implemented using a scheme similar to Eq. (18) but without the perturbation factors. Note that no singularity problem in Fisher's LDA occurs in the experiments on synthetic data.

In the single Gaussian case, we find that P-LDA using Eq. (22) outperforms P-LDA using Eq. (21) and classical Fisher's LDA, especially when only two samples per class are used for training. When the number of training samples per class increases,

P-LDA converges to classical Fisher's LDA, as the class means are estimated more accurately when more samples are available; the theoretical analysis in Section 5 confirms this behavior. Similar results are obtained in the mixture of Gaussians case. These results show that when the number of training samples is small, P-LDA using Eq. (22) gives a more stable and better estimate of the parameter and therefore provides better results.

Fig. 3. Some images from the subset of FERET.
Fig. 4. Some images of one subject from the subset of CMU PIE.
Fig. 5. Images of one subject from the subset of AR.

4.2. Face image data

Fisher's LDA-based algorithms are popularly used for dimension reduction of high-dimensional data, especially face images in biometric learning. In this section, the proposed method is applied to face recognition. Since face images are of high dimensionality and only limited samples are available for each person, we implement P-LDA based on Theorem 1 and Eq. (22), with the parameter estimated by Eq. (36). Three popular face databases, namely the FERET [34] database, the CMU PIE [35] database and the AR database [3], are selected for evaluation.

For FERET, a subset consisting of 255 persons with four faces per individual is established. All images are drawn from four different sets, namely Fa, Fb, Fc and the duplicate. Face images in this FERET subset undergo illumination variation, age variation and some slight expression variation. For CMU PIE, a subset is established by selecting face images under all illumination conditions with flash indoors [35] from the frontal pose, 1/4 left/right profile and below/above frontal views; there are 74 images in total, with 5 face images for each person in this subset. For the AR database, a subset is established by selecting 9 persons, with eight images for each person; face images in this subset undergo notable expression variations. All face images are aligned according to the coordinates of the eyes and face centers.
Each image is linearly stretched to the full range of [,] and its size is simply normalized to 4 5. Some images are illustrated in Figs. 3-5. In order to evaluate the proposed model, P-LDA is compared with several Fisher's LDA-based methods, including Fisherface [5], null-space LDA (N-LDA) [8], Direct LDA [4] and regularized LDA with cross-validation, R-LDA (CV) [3], which are popularly used for solving the small sample size problem in Fisher's LDA for face recognition. On each data set, the experiments are repeated times. Each time, p images for each person are randomly selected for training and the rest are used for testing. The value of p is indicated in the tables. Finally, the average recognition accuracies are obtained. The results are tabulated in Tables 6-8.

Table 6. Average recognition accuracy on the subset of FERET (p = 3)
Columns: Classifier: CMC (%); Classifier: NC (%)
Rows: P-LDA; R-LDA (CV) [3]; N-LDA [8]; Direct LDA [4]; Fisherface [5]

Table 7. Average recognition accuracy on the subset of CMU PIE
Columns: Classifier: CMC (p = 5 (%), p = 10 (%)); Classifier: NC (p = 5 (%), p = 10 (%))
Rows: P-LDA; R-LDA (CV) [3]; N-LDA [8]; Direct LDA [4]; Fisherface [5]

Table 8. Average recognition accuracy on the subset of AR
Columns: Classifier: CMC (p = 3 (%), p = 6 (%)); Classifier: NC (p = 3 (%), p = 6 (%))
Rows: P-LDA; R-LDA (CV) [3]; N-LDA [8]; Direct LDA [4]; Fisherface [5]

We see that P-LDA achieves at least 6% and 3% improvements over Direct LDA and

N-LDA, respectively, on the FERET database, and achieves more than 4% improvement over Fisherface, Direct LDA and N-LDA on the CMU PIE database. On the AR subset, P-LDA also obtains significant improvements over Fisherface and Direct LDA, and more than % improvement over N-LDA. Note that whether NC or CMC is used, the results of N-LDA are the same, because N-LDA maps all training samples of the same class onto the corresponding class empirical mean in the reduced space [7]. In addition, a related method, R-LDA with CV-selected parameter,6 is also included for comparison. On FERET, P-LDA gets more than a one percent improvement when using NC and about a .6% improvement when using CMC. On CMU, when p = 5, P-LDA gets a .4% improvement over R-LDA using NC and a .5% improvement using CMC; when p = 10, P-LDA and R-LDA perform almost the same. On the AR subset, the performances of P-LDA and R-LDA are also similar. Although R-LDA performs similarly to P-LDA in some cases, as reported in Table 9 it is extremely computationally expensive due to the CV process. In our experiments, P-LDA finishes in well under one minute per run, while R-LDA using the CV technique takes more than one hour.

Table 9. Expense of R-LDA (CV)
Time/run (NC/CMC): FERET, p = 3: 9 hours; CMU PIE, p = 5: hours; CMU PIE, p = 10: 7.5 hours; AR, p = 3: hours; AR, p = 6: hours

Table 10. Average recognition accuracy of P-LDA on the FERET data set: P-LDA with manually selected optimal parameter vs. P-LDA with parameter estimation
Columns: Classifier: CMC (Rank 1 (%), Rank 2 (%), Rank 3 (%)); Classifier: NC (Rank 1 (%), Rank 2 (%), Rank 3 (%))
Rows: P-LDA with manually selected optimal parameter; P-LDA with parameter estimation

Table 11. Average recognition accuracy of P-LDA on the CMU PIE data set: P-LDA with manually selected optimal parameter vs. P-LDA with parameter estimation
Columns: Classifier: CMC (Rank 1 (%), Rank 2 (%), Rank 3 (%)); Classifier: NC (Rank 1 (%), Rank 2 (%), Rank 3 (%))
Rows: P-LDA with manually selected optimal parameter; P-LDA with parameter estimation
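The cost gap reported in Table 9 follows from how many discriminant models a repeated k-fold CV grid search must train: one per candidate value of λ, per fold, per repetition. A minimal bookkeeping sketch (the grid size and fold counts below are illustrative, not the paper's exact settings):

```python
def cv_model_fits(n_candidates: int, n_folds: int, n_repeats: int) -> int:
    """Number of discriminant models trained by a repeated k-fold CV grid search."""
    return n_candidates * n_folds * n_repeats

# e.g. a 20-point candidate grid for lambda with three-fold CV repeated ten times:
print(cv_model_fits(n_candidates=20, n_folds=3, n_repeats=10))  # -> 600
# By contrast, P-LDA's closed-form parameter estimate needs one model fit per run.
```

Each of those fits involves building scatter matrices and an eigendecomposition on high-dimensional face data, which is what turns minutes into hours.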
More comparisons between P-LDA and R-LDA can be found in Section 5.2. It will be shown there that R-LDA can be seen as a semi-perturbation LDA, which gives a novel understanding of R-LDA, and that the proposed perturbation model can suggest an effective and efficient way to estimate the regularization parameter in R-LDA. Therefore, P-LDA is much more efficient and still performs better. Although Fisherface, Direct LDA, N-LDA and R-LDA are also proposed for extracting discriminant features in the undersampled case, they mainly address the singularity problem of the within-class matrix, while P-LDA addresses the perturbation problem in the Fisher criterion due to the difference between a class empirical mean and its expectation. Noting that P-LDA using models () and () can also solve the singularity problem, this suggests that alleviating the perturbation problem is useful for further enhancing the Fisher criterion. In addition, the above results, as well as the results on the synthetic data sets, indicate that when the number of training samples is large, the differences between P-LDA and the compared LDA-based algorithms become small. This agrees with the perturbation analysis given in this paper, since the estimates of the class means become more accurate as the training samples for each class become more sufficient.

6 On FERET, three-fold CV is performed; on CMU, five-fold CV is performed when p = 5 and 10-fold CV when p = 10; on AR, three-fold CV is performed when p = 3 and six-fold CV when p = 6. The candidates of the regularization parameter λ are sampled from .5 to with step .5. In the experiments, the three-fold CV is repeated ten times on FERET; on CMU, the five-fold and 10-fold CV are repeated six and three times, respectively; on AR, the three-fold and six-fold CV are repeated 10 and 5 times, respectively. So each CV parameter is determined via its corresponding 30 rounds of CV classification.
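The convergence argument above rests on a standard fact: for an empirical mean of n i.i.d. samples with covariance Σ, E‖μ̂ − μ‖² = tr(Σ)/n, so the perturbation shrinks as per-class sample sizes grow. A quick Monte Carlo check of this decay on synthetic Gaussian data (this illustrates the underlying fact, not the paper's estimator):

```python
import numpy as np

def mean_sq_error(n, dim=5, trials=2000, seed=0):
    """Monte Carlo estimate of E||mu_hat - mu||^2 for the empirical mean
    of n i.i.d. N(0, I_dim) samples (the true mean mu is 0)."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(size=(trials, n, dim))
    mu_hat = samples.mean(axis=1)            # one empirical class mean per trial
    return float((mu_hat ** 2).sum(axis=1).mean())

# The error decays like tr(Sigma)/n = dim/n: quadrupling n roughly quarters it.
for n in (2, 8, 32):
    print(n, round(mean_sq_error(n), 3))
```

With only two samples per class the mean-squared deviation is largest, which matches the regime where P-LDA shows its biggest gains over classical Fisher's LDA.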
Noting also that the difference between P-LDA and R-LDA is small when p is large on CMU and AR, this implies that the impact of the perturbation model on the estimation of the between-class covariance information becomes minor as the number of training samples increases. In Section 5.1, we give more theoretical analysis.

4.3. Parameter verification

In the last two subsections, we showed that P-LDA using Eq. () gives good results on both synthetic and face image data, particularly when the number of training samples is small. In this section, we gather extensive statistics on the performance of P-LDA on FERET and CMU PIE when the parameter σ is set to other values. We compare the proposed P-LDA with parameter estimation against the best scenario selected manually. The detailed experimental procedure is as follows.

Step (1): Prior values of σ are extensively sampled. We let σ = η/(1 − η), 0 < η < 1, so that σ ∈ (0, +∞). Then 1999 points are sampled for η between .0005 and .9995 with interval .0005. Finally, 1999 sampled values of σ are obtained.

Step (2): Evaluate the performance of P-LDA with respect to each sampled value of σ. We call each P-LDA with respect to a sampled value of σ a model.

Step (3): We compare the P-LDA model with parameter σ estimated by the methodology suggested in Section 3 against the best one among all models of P-LDA obtained at Step (1). The average recognition rate of each model of P-LDA is obtained using the same procedure run on the FERET and CMU PIE databases. We consider the case where p, the number of training samples for each class, is equal to three on FERET and to five on CMU. For clarity, the P-LDA model with parameter estimated using the methodology suggested in Section 3 is called P-LDA with parameter estimation, whereas we call the P-LDA model with

Fig. 6. P-LDA with manually selected optimal parameter vs. P-LDA with parameter estimation on FERET.
Fig. 7. P-LDA with manually selected optimal parameter vs. P-LDA with parameter estimation on CMU.
Fig. 8. Classifier: CMC. (a) The performance of P-LDA as a function of σ (x-axis) on FERET, where the horizontal axis is scaled logarithmically, and (b) the enlarged part of (a) near the peak of the curve where σ is small.
Fig. 9. Classifier: NC. (a) The performance of P-LDA as a function of σ (x-axis) on FERET, where the horizontal axis is scaled logarithmically; (b) the enlarged part of (a) near the peak of the curve where σ is small.

Fig. 10. Classifier: CMC. (a) The performance of P-LDA as a function of σ (x-axis) on CMU PIE, where the horizontal axis is scaled logarithmically, and (b) the enlarged part of (a) near the peak of the curve where σ is small.
Fig. 11. Classifier: NC. (a) The performance of P-LDA as a function of σ (x-axis) on CMU PIE, where the horizontal axis is scaled logarithmically; (b) the enlarged part of (a) near the peak of the curve where σ is small.

respect to the best σ selected from the 1999 sampled values P-LDA with manually selected optimal parameter. Comparison results of the rank 1 to rank 3 accuracies are reported in Tables 10 and 11. Figs. 6 and 7 show the ranking accuracies of these two models. The difference in rank accuracies between the two models is less than .% in general. To evaluate the sensitivity of P-LDA to σ, the performance of P-LDA as a function of σ is shown in Figs. 8 and 9 using the CMC and NC classifiers, respectively. The overall sensitivity of P-LDA to σ on the FERET data set is depicted in Fig. 8(a), where the horizontal axis is on a logarithmic scale. Fig. 8(b) shows the enlarged part of Fig. 8(a) near the peak of the curve where σ is small. Similarly, Figs. 10 and 11 show the results on CMU PIE. They indicate that it may be hard to obtain an optimal estimate of σ, but interestingly, Tables 10 and 11 and Figs. 6 and 7 show that the methodology suggested in Section 3 works well. Selecting the best parameter manually via an extensive search would clearly be time consuming, whereas P-LDA using the proposed methodology for parameter estimation costs much less than one minute. So the suggested methodology is computationally efficient.

5. Discussion

As shown in the experiments, the number of training samples for each class really has an impact on the performance of P-LDA.
In this section, we explore some theoretical properties of P-LDA, show its convergence, and discuss P-LDA in relation to some related methods.

5.1. Admissible condition of P-LDA

Suppose L is fixed. Since the entries of all perturbation covariance matrices are bounded,7 it is easy to obtain S_b^Δ = O(1/N) and S_w^Δ = O(1/N), i.e., the perturbation factors S_b^Δ → O and S_w^Δ → O as N → ∞, where O is the zero matrix. Here, for any matrix A = A(β) of which each nonzero entry depends on β, we say A = O(β) if the degree8 of A − O is comparable to the degree of β. However, if L is variable, i.e., the increase of the sample size may be partly due to an increase in the number of classes, then S_b^Δ ≠ O(1/N) and S_w^Δ ≠ O(1/N). Suppose any covariance matrix X_Ci is lower (upper) bounded by X_lower (X_upper) if and only if X_lower(i,j) ≤ X_Ci(i,j) (X_Ci(i,j) ≤ X_upper(i,j)) for any (i,j). Then the following lemma gives an essential view; its proof is given in Appendix C.

Lemma 1. If all nonzero perturbation covariance matrices X_Ci, i = 1,...,L, are lower bounded by X_lower and upper bounded by

7 We say a matrix is bounded if and only if all entries of this matrix are bounded.
8 The degree of A = A(β) ≠ O depending on β is defined to be the smallest degree of A(i,j) depending on β, where A(i,j) is any nonzero entry of A. For example, if A = [β β²], then the degree of A − O is 1 and A = O(β).

X_upper, where X_lower and X_upper are independent of L and N, then it is true that S_b^Δ = O(L/N) and S_w^Δ = O(L/N).

The condition of Lemma 1 is valid in practice, because the data space is always compact and, moreover, it is always a Euclidean space of finite dimension. In particular, from Eq. (), it can be found that the perturbation matrices depend on the average sample size for each class. Based on Theorem, we finally have the following proposition.

Proposition 1 (Admissible condition of P-LDA). P-LDA depends on the average number of samples for each class. That is, S_b^Δ = O(L/N) and S_w^Δ = O(L/N), i.e., S_b^Δ → O and S_w^Δ → O when N/L → ∞.

It is intuitive that some estimated class means are unstable when the average sample size for each class is small.9 This also shows that what P-LDA targets is different from the singularity problem in Fisher's LDA, which is solved if the total sample size is large enough. Moreover, the experiments on synthetic data in Section 4.1 support Proposition 1, as the difference between P-LDA and classical Fisher's LDA becomes smaller when the average sample size for each class becomes larger.

Table 12. Average recognition accuracy of R-LDA on the FERET data set: R-LDA with manually selected optimal parameter vs. R-LDA using the perturbation model (p = 3)
Columns: Classifier: CMC (Rank 1 (%), Rank 2 (%), Rank 3 (%)); Classifier: NC (Rank 1 (%), Rank 2 (%), Rank 3 (%))
Rows: R-LDA with manually selected optimal parameter; R-LDA (CV); R-LDA using perturbation model

Table 13. Average recognition accuracy of R-LDA on the CMU PIE data set: R-LDA with manually selected optimal parameter vs. R-LDA using the perturbation model (p = 5)
Columns: Classifier: CMC (Rank 1 (%), Rank 2 (%), Rank 3 (%)); Classifier: NC (Rank 1 (%), Rank 2 (%), Rank 3 (%))
Rows: R-LDA with manually selected optimal parameter; R-LDA (CV); R-LDA using perturbation model

5.2. Discussion of related approaches

5.2.1. P-LDA vs. R-LDA

Regularized LDA (R-LDA) is usually modeled by the following criterion:

    W_opt = arg max_W trace(W^T Ŝ_b W) / trace(W^T (Ŝ_w + λI) W),  λ > 0.   (37)

Sometimes, a positive diagonal matrix is used to replace λI in the above equality. Generally, the formulation of P-LDA in Section 2 is different from the form of R-LDA. Although the formulation of R-LDA looks similar to the simplified model of P-LDA in Section 3, the motivation and objective are totally different. Details are discussed as follows.

1. P-LDA is proposed by learning the difference between a class empirical mean and its corresponding expectation as well as its impact on the Fisher criterion, whereas R-LDA was originally proposed for the singularity problem [9,,3] because Ŝ_w + λI is positive definite for λ > 0.
2. In P-LDA, the effects of S_b^Δ and S_w^Δ are known in theory from the perturbation analysis. In contrast, R-LDA still does not clearly tell how λI affects Ŝ_w in a pattern recognition sense. Although Zhang et al. [] presented a connection between regularization network algorithms and R-LDA from a least squares view, it still lacks an interpretation of how regularization can affect the within-class and between-class covariance matrices simultaneously, and it also lacks parameter estimation.
3. P-LDA establishes the convergence of the perturbation factors by Proposition 1, whereas R-LDA does not in theory. The singularity problem R-LDA addresses is in nature an implementation problem, and it is solved when the total sample size is sufficiently large; this, however, does not imply that the average sample size for each class is also sufficiently large.
4. P-LDA is developed when the data of each class follow either a single Gaussian distribution or a Gaussian mixture distribution, but R-LDA does not consider the effect of the data distribution.
5. In P-LDA, the scheme for parameter estimation is an intrinsic methodology derived from the perturbation model itself. For R-LDA, a separate algorithm is required, such as the CV method, which is so far popular. However, CV relies heavily on a discrete set of candidate parameters.

9 With suitable training samples, the class means may be well estimated, but the selection of training samples is beyond the scope of this paper.
In general, CV is always time consuming. Interestingly, if the proposed perturbation model is imposed on R-LDA, i.e., R-LDA is treated as a semi-perturbation Fisher's LDA in which only the within-class perturbation S_w^Δ is considered and the factor S_b^Δ is ignored, then the methodology in Section 3 may provide an interpretation of how the term λI takes effect in the entire PCA space. This novel view of R-LDA gives the advantage of applying the proposed perturbation model for an efficient and effective estimation of the regularization parameter λ in R-LDA. To justify this, similar comparisons on the FERET and CMU subsets between R-LDA with manually selected optimal parameter and R-LDA using the perturbation model are performed in Tables 12 and 13, where R-LDA with manually selected optimal parameter is implemented similarly to P-LDA with manually selected optimal parameter as demonstrated in Section 4.3. For reference, the results of R-LDA (CV) are also shown. We find that R-LDA using the perturbation model closely approximates R-LDA with manually selected optimal parameter and achieves

Unified Subspace Analysis for Face Recognition

Unified Subspace Analysis for Face Recognition Unfed Subspace Analyss for Face Recognton Xaogang Wang and Xaoou Tang Department of Informaton Engneerng The Chnese Unversty of Hong Kong Shatn, Hong Kong {xgwang, xtang}@e.cuhk.edu.hk Abstract PCA, LDA

More information

Regularized Discriminant Analysis for Face Recognition

Regularized Discriminant Analysis for Face Recognition 1 Regularzed Dscrmnant Analyss for Face Recognton Itz Pma, Mayer Aladem Department of Electrcal and Computer Engneerng, Ben-Guron Unversty of the Negev P.O.Box 653, Beer-Sheva, 845, Israel. Abstract Ths

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Lecture 12: Classification

Lecture 12: Classification Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Statistical pattern recognition

Statistical pattern recognition Statstcal pattern recognton Bayes theorem Problem: decdng f a patent has a partcular condton based on a partcular test However, the test s mperfect Someone wth the condton may go undetected (false negatve

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

Why Bayesian? 3. Bayes and Normal Models. State of nature: class. Decision rule. Rev. Thomas Bayes ( ) Bayes Theorem (yes, the famous one)

Why Bayesian? 3. Bayes and Normal Models. State of nature: class. Decision rule. Rev. Thomas Bayes ( ) Bayes Theorem (yes, the famous one) Why Bayesan? 3. Bayes and Normal Models Alex M. Martnez alex@ece.osu.edu Handouts Handoutsfor forece ECE874 874Sp Sp007 If all our research (n PR was to dsappear and you could only save one theory, whch

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

STAT 3008 Applied Regression Analysis

STAT 3008 Applied Regression Analysis STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

A Novel Biometric Feature Extraction Algorithm using Two Dimensional Fisherface in 2DPCA subspace for Face Recognition

A Novel Biometric Feature Extraction Algorithm using Two Dimensional Fisherface in 2DPCA subspace for Face Recognition A Novel ometrc Feature Extracton Algorthm usng wo Dmensonal Fsherface n 2DPA subspace for Face Recognton R. M. MUELO, W.L. WOO, and S.S. DLAY School of Electrcal, Electronc and omputer Engneerng Unversty

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

MULTISPECTRAL IMAGE CLASSIFICATION USING BACK-PROPAGATION NEURAL NETWORK IN PCA DOMAIN

MULTISPECTRAL IMAGE CLASSIFICATION USING BACK-PROPAGATION NEURAL NETWORK IN PCA DOMAIN MULTISPECTRAL IMAGE CLASSIFICATION USING BACK-PROPAGATION NEURAL NETWORK IN PCA DOMAIN S. Chtwong, S. Wtthayapradt, S. Intajag, and F. Cheevasuvt Faculty of Engneerng, Kng Mongkut s Insttute of Technology

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013 ISSN: 2277-375 Constructon of Trend Free Run Orders for Orthogonal rrays Usng Codes bstract: Sometmes when the expermental runs are carred out n a tme order sequence, the response can depend on the run

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

Foundations of Arithmetic

Foundations of Arithmetic Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS M. Krshna Reddy, B. Naveen Kumar and Y. Ramu Department of Statstcs, Osmana Unversty, Hyderabad -500 007, Inda. nanbyrozu@gmal.com, ramu0@gmal.com

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

Time-Varying Systems and Computations Lecture 6

Time-Varying Systems and Computations Lecture 6 Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy

More information

Power law and dimension of the maximum value for belief distribution with the max Deng entropy

Power law and dimension of the maximum value for belief distribution with the max Deng entropy Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng

More information

Tensor Subspace Analysis

Tensor Subspace Analysis Tensor Subspace Analyss Xaofe He 1 Deng Ca Partha Nyog 1 1 Department of Computer Scence, Unversty of Chcago {xaofe, nyog}@cs.uchcago.edu Department of Computer Scence, Unversty of Illnos at Urbana-Champagn

More information

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering / Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons

More information

Matrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD

Matrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD Matrx Approxmaton va Samplng, Subspace Embeddng Lecturer: Anup Rao Scrbe: Rashth Sharma, Peng Zhang 0/01/016 1 Solvng Lnear Systems Usng SVD Two applcatons of SVD have been covered so far. Today we loo

More information

The lower and upper bounds on Perron root of nonnegative irreducible matrices

The lower and upper bounds on Perron root of nonnegative irreducible matrices Journal of Computatonal Appled Mathematcs 217 (2008) 259 267 wwwelsevercom/locate/cam The lower upper bounds on Perron root of nonnegatve rreducble matrces Guang-Xn Huang a,, Feng Yn b,keguo a a College

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Matrix Norms. Now we turn to associating a number to each matrix. We could ...

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

In an earlier lecture we studied the statistical assumptions underlying the regression model, including the following points: formal statement of assumptions ...

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

CAP 5610: Machine Learning. Instructor: Guo-Jun Qi. Bayes Classifier: a generative model. Model the posterior distribution P(Y|X); estimate the class-conditional distribution P(X|Y) for each Y; estimate the prior distribution ...

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Steele, M., Chaseling, J. and Hurst, C. School of Mathematical and Physical Sciences, James Cook University; Australian School of Environmental Studies, Griffith ...

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F'17 cos 521: Advanced Algorithm Design. Lecture 7: LP Duality. Lecturer: Matt Weinberg. Scribe: LP duality is an extremely useful tool for analyzing structural properties of linear programs. While there ...

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

ECONOMICS 5* -- NOTE (Summary). 1. Introduction. CLRM stands for the Classical Linear Regression Model. The CLRM is also ...

CHAPTER 14 GENERAL PERTURBATION THEORY

14.1 Introduction. A particle in orbit around a point mass or a spherically symmetric mass distribution is moving in a gravitational potential of the form GM/r. In this potential it moves ...

Multigradient for Neural Networks for Equalizers 1

Chulhee Lee, Jinook Go and Heeyoung Kim. Department of Electrical and Electronic Engineering, Yonsei University, 134 Shinchon-Dong, Seodaemun-Ku, Seoul 1-749, Korea. ABSTRACT ...

Formulas for the Determinant

page 224, CHAPTER 3: Determinants. 38. A = [e^t te^t e^{2t}; e^t 2te^t e^{2t}; e^t te^t 2e^{2t}]. 39. If A = [1 2 3; 3 4 5; 4 5 6], compute the matrix product A adj(A). What can you conclude about det(A)? For Problems 40-43, use ...

Econ Statistical Properties of the OLS estimator. Sanjaya DeSilva

Econ 39 - Statistical Properties of the OLS Estimator. Sanjaya DeSilva, September 2008. 1 Overview. Recall that the true regression model is Y_i = β_0 + β_1 X_i + u_i (1). Applying the OLS method to a sample of data, we estimate ...

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Numerical Analysis by Dr. Anita Pal, Assistant Professor, Department of Mathematics, National Institute of Technology Durgapur, Durgapur-713209, email: anita.bue@gmail.com. Chapter 5: Solution of System of Linear Equations ...

Module 9. Lecture 6. Duality in Assignment Problems

In this lecture we attempt to answer a few other important questions posed in the earlier lecture for (AP) and see how some of them can be explained through the concept ...

Chapter 8 Indicator Variables

In general, the explanatory variables in any regression analysis are assumed to be quantitative in nature. For example, variables like temperature, distance, age, etc. are quantitative in ...

VQ widely used in coding speech, image, and video

Scalar quantizers are special cases of vector quantizers (VQ): they are constrained to look at one sample at a time (memoryless). VQ does not have such a constraint, so better RD performance is expected. Source coding ...

Comparison of Regression Lines

STATGRAPHICS Rev. 9/13/2013. Contents: Summary (p. 1), Data Input (p. 3), Analysis Summary (p. 4), Plot of Fitted Model (p. 6), Conditional Sums of Squares (p. 6), Analysis Options (p. 7), Forecasts (p. 8), Confidence ...

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Analysis of Variance and Design of Experiments-I. MODULE VII, LECTURE - 3: ANALYSIS OF COVARIANCE. Dr. Shalabh, Department of Mathematics and Statistics, Indian Institute of Technology Kanpur. Any scientific experiment is performed ...

More metrics on cartesian products

If (X_i, d_i) are metric spaces for 1 ≤ i ≤ n, then in Section II.4 of the lecture notes we defined three metrics on X whose underlying topologies are the product topology. The purpose of ...

The Expectation-Maximization Algorithm

Charles Elkan, elkan@cs.ucsd.edu, November 16, 2007. This chapter explains the EM algorithm at multiple levels of generality. Section 1 gives the standard high-level version of the algorithm ...

A Local Variational Problem of Second Order for a Class of Optimal Control Problems with Nonsmooth Objective Function

Alexander P. Afanasiev, Institute for Information Transmission Problems, Russian Academy of Sciences ...

One-sided finite-difference approximations suitable for use with Richardson extrapolation

Journal of Computational Physics 219 (2006) 13-20. Short note. Kumar Rahul, S.N. Bhattacharyya, Department of Mechanical Engineering ...

Homework Assignment 3 Due in class, Thursday October 15

SDS 383C Statistical Modeling I. 1. Ridge regression and Lasso. 1. Get the prostate cancer data from http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/prostate.data ...

Notes on Frequency Estimation in Data Streams

In (one of) the data streaming model(s), the data is a sequence of arrivals a_1, a_2, ..., a_m of the form a_j = (i, v), where i is the identity of the item and belongs to ...

Research Article Green s Theorem for Sign Data

International Scholarly Research Network, ISRN Applied Mathematics, Volume 2012, Article ID 539359, 10 pages, doi:10.5402/2012/539359. Louis M. Houston, The University of ...

Explaining the Stein Paradox

Kwong Hu Yung, 1999/06/10. Abstract: This report offers several rationales for the Stein paradox. Sections 1 and 2 define the multivariate normal mean estimation problem and introduce Stein ...

2.3 Nilpotent endomorphisms

... is a block diagonal matrix, with A_i ∈ Mat_{dim U_i}(C). In fact, we can assume that B = B_1 ∪ ... ∪ B_k, with B_i an ordered basis of U_i, and that A_i = [f|_{U_i}]_{B_i}, where f|_{U_i}: U_i → U_i is the restriction of f to U_i ...

Computing MLE Bias Empirically

Kar Wai Lim, Australian National University, January 3, 2017. Abstract: This note studies the bias that arises from the MLE estimate of the rate parameter and the mean parameter of an exponential distribution ...

Linear Regression Analysis: Terminology and Notation

ECON 35* -- Section: Basic Concepts of Regression Analysis. Consider the generic version of the simple (two-variable) linear regression model. It is represented ...

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands

Contents: 1. Inference on regression parameters: a. finding mean, s.d. and covariance amongst estimates. 2. Confidence intervals and Working-Hotelling bands. 3. Cochran's theorem. 4. General linear testing. 5. Measures of ...

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Definition 1 (Euclidean space). A Euclidean space is a finite-dimensional vector space over the reals R, with an inner product ⟨·,·⟩. Definition 2 (Inner product). An inner product ⟨·,·⟩ on a real vector space X is a symmetric, bilinear, ...

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models

BY J. T. NEWCOMER, N. K. NEERCHAL, Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore ...

ENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition

ENG 8801/988 - Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland. Lecture 7, May 3, 2006. http://www.engr.mun.ca/~charlesr. Office Hours: Tuesdays & Thursdays 8:30-9:30 ...

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Paul A. Jensen, copyright July 20, 2003. A system is made up of several operations with flow passing between them. The structure of the system describes the flow paths from inputs to outputs ...

On a direct solver for linear least squares problems

ISSN 2066-6594. Ann. Acad. Rom. Sci. Ser. Math. Appl., Vol. 8, No. 2/2016. Constantin Popa. Abstract: The Null Space (NS) algorithm is a direct solver for linear ...

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1

Vahid Tadayon. Abstract: The Stochastic Approximation EM (SAEM) algorithm, a stochastic-approximation variant of EM, is a versatile tool ...

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper

Elon Kohlberg, Harvard Business School; Abraham Neyman, The Hebrew University of Jerusalem. Working Paper 18-023, copyright 2017 ...

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Analysis of Variance and Design of Experiments-I. MODULE III, LECTURE - 2: EXPERIMENTAL DESIGN MODELS. Dr. Shalabh, Department of Mathematics and Statistics, Indian Institute of Technology Kanpur. We consider the models ...

x_{i1} = 1 for all i (the constant)

Chapter 5: The Multiple Regression Model. Consider an economic model where the dependent variable is a function of K explanatory variables. The economic model has the form: y_i = f(x_{i1}, x_{i2}, ..., x_{iK}). Approximate this by ...

Uncertainty and auto-correlation in Measurement

arXiv:1707.03276v2 [physics.data-an] 30 Dec 2017. Markus Schiebl, Federal Office of Metrology and Surveying (BEV), 1160 Vienna, Austria. E-mail: markus.schiebl@bev.gv.at ...

Lecture 6 More on Complete Randomized Block Design (RBD)

Multiple testing: the multiple comparisons or multiple testing problem occurs when one considers a set of statistical inferences simultaneously. For ...

LECTURE 9 CANONICAL CORRELATION ANALYSIS

Introduction. The concept of canonical correlation arises when we want to quantify the associations between two sets of variables. For example, suppose that the first set of ...

Convexity preserving interpolation by splines of arbitrary degree

Computer Science Journal of Moldova, vol. 18, no. 1(52), 2010. Igor Verlan. Abstract: In the present paper an algorithm of C^2 interpolation of discrete ...

Negative Binomial Regression

STATGRAPHICS Rev. 9/16/2013. Contents: Summary (p. 1), Data Input (p. 3), Statistical Model (p. 3), Analysis Summary (p. 4), Analysis Options (p. 7), Plot of Fitted Model (p. 8), Observed Versus Predicted (p. 10), Predictions ...

Semi-supervised Classification with Active Query Selection

Jiao Wang and Siwei Luo, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China. Wangjiao088@163.com. Abstract: Labeled samples ...

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

Applied Mathematical Sciences, Vol. 7, 2013, no. 47. HIKARI Ltd, www.m-hikari.com ...

Gaussian Mixture Models

Lab Objective: Understand the formulation of Gaussian Mixture Models (GMMs) and how to estimate GMM parameters. You've already seen GMMs as the observation distribution in certain continuous ...

VARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES

BÂRZĂ, Silviu. Faculty of Mathematics-Informatics, Spiru Haret University. barza_silviu@yahoo.com. Abstract: This paper wants to continue ...

Lecture 3 Stat102, Spring 2007

Chapter 3.1-3.2: Introduction to regression analysis. Linear regression as a descriptive technique. The least-squares equations. Chapter 3.3: Sampling distribution of b_0, b_1. Continued in next lecture ...

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION

Advanced Mathematical Models & Applications, Vol. 3, No. 3, 2018, pp. 215-222. ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION ...

MMA and GCMMA two methods for nonlinear optimization

Krister Svanberg, Optimization and Systems Theory, KTH, Stockholm, Sweden. krille@math.kth.se. This note describes the algorithms used in the author's 2007 implementations ...

Bootstrap aggregating (Bagging)

An ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms. Can be used in both regression and classification. Reduces variance and helps to avoid ...

Joint Statistical Meetings - Biopharmaceutical Section

Joint Statistical Meetings - Biopharmaceutical Section Iteratve Ch-Square Test for Equvalence of Multple Treatment Groups Te-Hua Ng*, U.S. Food and Drug Admnstraton 1401 Rockvlle Pke, #200S, HFM-217, Rockvlle, MD 20852-1448 Key Words: Equvalence Testng; Actve

More information

Report on Image warping

Xuan Nie, Dec. 20, 2004. This document summarizes the algorithms of our image warping solution for further study, and there is a detailed description of the implementation of these algorithms ...

Lecture 3. Ax = Σ_i x_i a_i

18.409 The Behavior of Algorithms in Practice, 2/14/02. Lecturer: Dan Spielman. Scribe: Arvind Sankar. 1. Largest singular value. In order to bound the condition number, we need an upper bound on the largest ...

Bayesian predictive Configural Frequency Analysis

Psychological Test and Assessment Modeling, Volume 54, 2012 (3), 285-292. Eduardo Gutiérrez-Peña. Abstract: Configural Frequency Analysis is a method for cell-wise ...

Automatic Object Trajectory- Based Motion Recognition Using Gaussian Mixture Models

Faisal I. Bashir, Ashfaq A. Khokhar, Dan Schonfeld. Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL ...

Support Vector Machines. Vibhav Gogate The University of Texas at dallas

Vibhav Gogate, The University of Texas at Dallas. What we have learned so far: 1. Decision Trees, 2. Naïve Bayes, 3. Linear Regression, 4. Logistic Regression, 5. Perceptron, 6. Neural networks, 7. K-Nearest ...