Algebraic-Geometric and Probabilistic Approaches for Clustering and Dimension Reduction of Mixtures of Principal Component Subspaces

ECE842 Course Project Report
Changfang Zhu
Dec. 4, 2004

Abstract

Generalized Principal Component Analysis (GPCA) and Probabilistic Principal Component Analysis (PPCA) are two extensions of PCA to mixtures of principal subspaces. GPCA is an algebraic-geometric framework in which the collection of linear subspaces is represented by a set of homogeneous polynomials whose degree corresponds to the number of subspaces and whose factors (roots) encode the subspace parameters. PPCA is a probabilistic approach in which principal component analysis is viewed as a maximum-likelihood procedure based on a probability density model of the observed data. Both techniques are capable of estimating a mixture of subspaces from sample data points, and are therefore useful for data clustering and dimension reduction problems in multivariate data mining. The primary goal of this project is to carry out a conceptual study, to explore the principles and features of the algebraic-geometric and probabilistic approaches to mixtures of principal component subspaces, and to learn from hands-on experience through the computational implementation of these techniques. A polynomial factorization algorithm (PFA) for GPCA and an expectation-maximization (EM) algorithm for PPCA were implemented in MATLAB. The implemented algorithms were tested on synthetic data sets. It was shown that the PFA algorithm for GPCA can successfully identify the number of subspaces in the mixture and, when successful, estimate the normal vectors of the subspaces with a relatively high correlation. However, the implemented algorithm is not robust, as its success depends on the data; the potential problems of this implementation are discussed in the report. The implemented EM algorithm for PPCA showed that a probabilistic mixture model can identify the clusters and assign the cluster association of each data point correctly. Both techniques estimate component subspaces of lower dimensionality, so that the data dimension can be reduced and the underlying clusters recovered. In this project, the implemented algorithms were tested only on synthetic 3-dimensional data, not on higher-dimensional or real data, and they are far from comprehensive enough for practical use. Nevertheless, the computational implementation helped greatly in understanding the two approaches to mixtures of principal component subspaces.

1. Introduction

In the analysis of multivariate (multi-dimensional) data sets, group segmentation and cluster formation often reveal insight that is useful for knowledge discovery from complex data sets, which are often high-dimensional, multi-modal, and lacking in prior knowledge. Clustering decomposition may enable the use of relatively simple models for each of the local clustering structures, offering great ease of interpretation as well as the benefits of analytical and computational simplification [1]. On the other hand, although it is now possible to analyze large amounts of high-dimensional data through the use of high-performance computers, several problems occur in general when the number of dimensions becomes high. These problems include the explosion of execution time and the difficulty of selecting explanatory variables [2]. Therefore, data clustering and dimension reduction are important problems in multivariate data mining.

Data clustering and dimension reduction are correlated with each other. Usually not all of the data are useful for producing a desired clustering, i.e. some features may be redundant and some may be irrelevant. Many clustering algorithms fail when dealing with high-dimensional data. In this case, identifying and retaining only those features that are most relevant to the desired clustering would facilitate multi-dimensional data analysis. If the data clusters can be visualized in a lower-dimensional subspace, they allow better interpretation and demand less computation.

Principal Component Analysis (PCA) [2][3] is a very popular method for dimension reduction, data visualization and exploratory data analysis. The idea is that a d-dimensional data set can be reduced to a set of q-dimensional data using q linear combinations of the d-dimensional basis. The linear combination is considered a linear projection or linear transformation: the original d-dimensional feature space is transformed into a new q-dimensional (q < d) feature subspace, called the principal component subspace. The advantage of PCA is twofold: 1) the original data are represented by fewer variables with minimal mean-square error, which reduces the dimensionality of the data set; and 2) the transformation maximizes the separation of data clusters. However, one limitation of PCA is that it only defines a single global projection of the data; for more complex data, different clusters may require different projection directions. The other limitation is that the original data should have a linear or near-linear structure, to ensure the singularity of the data matrix; if the data have a non-linear structure, linear PCA may not be adequate for exploring the data.
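
As a concrete illustration of this global projection, the following is a minimal MATLAB sketch of PCA via an eigen-analysis of the sample covariance matrix (variable names such as T and q are illustrative and are not taken from the report's actual code; recent MATLAB with implicit expansion is assumed):

    % T: N-by-d data matrix (one observation per row); q: target dimension (q < d)
    mu = mean(T, 1);                      % sample mean
    S  = cov(T);                          % d-by-d sample covariance matrix
    [V, D] = eig(S);                      % eigenvectors (columns of V) and eigenvalues
    [~, idx] = sort(diag(D), 'descend');  % order eigenvalues from largest to smallest
    Uq = V(:, idx(1:q));                  % q dominant eigenvectors = principal axes
    X  = (T - mu) * Uq;                   % N-by-q projected data, x = Uq' * (t - mu)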

Many extensions of PCA have been developed to determine the principal subspaces. In this project, we studied two extensions of PCA to mixtures of subspaces: Generalized Principal Component Analysis (GPCA) [4][5] and Probabilistic Principal Component Analysis (PPCA) [6][7]. Generalized principal component analysis is an algebraic-geometric approach which has been proposed in the computer vision community, primarily in the context of 3-D motion segmentation. Extensive work on GPCA has been carried out by Vidal et al. [5], and two algorithms, the polynomial factorization algorithm (PFA) and the polynomial differentiation algorithm (PDA), have been proposed. Probabilistic principal component analysis is understood as a probabilistic formulation of PCA from a Gaussian latent variable model, which is closely related to statistical factor analysis [6].

The primary goal of this project is to explore the principles and features of the algebraic-geometric and probabilistic approaches for clustering and dimension reduction of mixtures of principal component subspaces, and to learn from hands-on experience through the computational implementation of these techniques. The polynomial factorization algorithm (PFA) for GPCA and an expectation-maximization (EM) algorithm for PPCA were implemented in MATLAB code.

2. Geometric approach to mixtures of principal component subspaces: GPCA

2.1. Principles of GPCA

In generalized principal component analysis, the sample data points {x_j in R^K}, j = 1, 2, ..., N, are drawn from n linear subspaces {S_i}, i = 1, ..., n, of R^K. The problem is to identify each subspace without knowing which sample points belong to which subspace. The union of these n linear subspaces of R^K can be viewed as corresponding to a projective algebraic set defined by one or more homogeneous polynomials of degree n in K variables. Hence, estimating a collection of subspaces is equivalent to estimating the algebraic variety defined by such a set of polynomials.

In the case when each subspace has dimension k_i = K - 1 (i.e., the subspaces are hyperplanes), Vidal et al. [4][5] have shown that the union of such subspaces is defined by a unique homogeneous polynomial p_n(x). The degree of p_n(x) is then the number n of hyperplanes, and each one of the n factors of p_n(x) corresponds to one of the hyperplanes. Therefore the problem of identifying a collection of hyperplanes is reduced to estimating and factoring p_n(x). Since every sample point x in R^K must lie on one of the subspaces S_i, every x must also satisfy p_n(x) = 0. Then one can retrieve p_n(x) directly from the given data samples without knowing the segmentation of the data points. Vidal [5] also showed that in fact the number of subspaces n is exactly the lowest degree such that p_n(x) = 0 for all sample points. This leads to a simple matrix rank condition which determines the number of hyperplanes. Given n, the polynomial is determined from the solution of a set of linear equations. Given p_n(x), the estimation of the hyperplanes is essentially equivalent to factoring p_n(x) into a product of n linear factors.

2.2. Representing mixtures of subspaces as algebraic sets and varieties

One of the important concepts underlying the GPCA problem is representing the mixture of subspaces as an algebraic set or variety. Notice that every (K-1)-dimensional subspace S_i of R^K can be represented by a nonzero normal vector b_i in R^K as S_i = {x in R^K : b_i^T x = 0}. Since the subspaces S_i are all distinct from each other, the normal vectors {b_i}, i = 1, ..., n, are pairwise linearly independent. Given that every sample point x in R^K lies on one of the subspaces S_i, such a point satisfies the formula (b_1^T x = 0) or (b_2^T x = 0) or ... or (b_n^T x = 0), which is equivalent to the following homogeneous polynomial of degree n in x with real coefficients:

p_n(x) = \prod_{i=1}^{n} (b_i^T x) = 0.
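
For instance, membership of a point in the union of the hyperplanes can be checked numerically by evaluating this product of linear forms; a minimal MATLAB sketch (names are illustrative) is:

    % B: K-by-n matrix whose columns are the normal vectors b_i; x: K-by-1 point
    p = prod(B' * x);   % p_n(x); it is (numerically) zero iff x lies on one of the hyperplanes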

This nonlinear equation is the product of n linear equations in x (an n-th order multivariate polynomial), and can be expressed in a linear form as

p_n(x) = v_n(x)^T c = \sum c_{n_1, n_2, ..., n_K} x_1^{n_1} x_2^{n_2} ... x_K^{n_K} = 0,

where v_n : [x_1, ..., x_K]^T -> [..., x_1^{n_1} x_2^{n_2} ... x_K^{n_K}, ...]^T is called the Veronese map of degree n. Each term x_1^{n_1} x_2^{n_2} ... x_K^{n_K} is a monomial with n_1 + n_2 + ... + n_K = n, the monomials being arranged in the degree-lexicographic order, and the coefficients c_{n_1, n_2, ..., n_K} are functions of the entries of {b_i}, i = 1, ..., n. The problem of GPCA is then to recover {b_i} given the coefficient vector c of the polynomial p_n(x).

The nonlinear Veronese map maps the original data {x_j}, j = 1, 2, ..., N, of dimension K into an embedded data space of higher dimension M_n = C(n + K - 1, K - 1), the number of degree-n monomials in K variables, which is very similar to the commonly used kernel approach. But its merit is that it transforms the nonlinear equation p_n(x) = 0 into a linear equation in the vector of coefficients c. When the number of subspaces n is unknown, it can be determined from the rank of the Veronese map (embedded data) matrix L_n = [v_n(x^1), v_n(x^2), ..., v_n(x^N)]^T. The monomials x_1^{n_1} x_2^{n_2} ... x_K^{n_K} can be calculated from the given data samples, so solving for c is actually a problem of solving the set of N linear equations L_n c = 0, where N is the total number of sample points. The remaining problem is to factorize the polynomial p_n(x) with coefficients c to find the entries of {b_i}, i = 1, ..., n. Each factor will give an estimate of one subspace (hyperplane).
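
To make the rank condition concrete, the following is a minimal MATLAB sketch for the noise-free, 3-dimensional case (K = 3) used later in the experiments; the function name veronese3 and all variable names are illustrative, not the report's actual code:

    % X: N-by-3 noise-free samples drawn from a union of planes through the origin
    nMax = 6;                             % largest number of planes considered
    for n = 1:nMax
        Ln = veronese3(X, n);             % N-by-Mn embedded data matrix
        Mn = nchoosek(n + 2, 2);          % number of degree-n monomials in 3 variables
        if rank(Ln) == Mn - 1             % rank condition: p_n is unique, so n is found
            c = null(Ln);                 % coefficient vector of p_n(x), up to scale
            c = c / norm(c);              % normalize so that ||c|| = 1
            break
        end
    end

    function V = veronese3(X, n)
    % Degree-n Veronese map of 3-dimensional points, one embedded point per row,
    % with monomials x1^a * x2^b * x3^c (a + b + c = n) in degree-lexicographic order.
    % (In older MATLAB, this function would live in its own file veronese3.m.)
    V = zeros(size(X, 1), nchoosek(n + 2, 2));
    col = 0;
    for a = n:-1:0
        for b = (n - a):-1:0
            col = col + 1;
            V(:, col) = X(:,1).^a .* X(:,2).^b .* X(:,3).^(n - a - b);
        end
    end
    end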

2.3. Polynomial factorization algorithm (PFA) for GPCA

Vidal et al. described the polynomial factorization algorithm for GPCA in detail [4]. In this project, the algorithm for the noise-free case, with each subspace of dimension k_i = K - 1, has been implemented. The algorithm implemented in this project is summarized as follows. Given sample points {x_j}, j = 1, 2, ..., N, lying on a collection of hyperplanes {S_i in R^K}, i = 1, ..., n, find the number of hyperplanes n and the normal vector to each hyperplane {b_i in R^K}, i = 1, ..., n, as follows:

1) Apply the Veronese map of order i, for i = 1, 2, ..., to the vectors {x_j}, j = 1, 2, ..., N, and form the matrix L_i. Calculate the rank of each obtained L_i. When rank(L_i) = M_i - 1, stop the Veronese mapping and set the number of hyperplanes n to the current i. Then solve for c from L_n c = 0 and normalize so that ||c|| = 1.

2) Get the coefficients of the univariate polynomial q_n(t) from the last n + 1 entries of c.

3) If the first l (0 <= l <= n) coefficients of q_n(t) are equal to zero, set (b_{i,K-1}, b_{i,K}) = (0, 1) for i = 1, ..., l. Then solve the (n - l)-th order polynomial equation q_n(t) = 0, and set (b_{j,K-1}, b_{j,K}) = (1, -t_j) for j = l + 1, ..., n from the n - l roots t_j of q_n(t).

4) If all the coefficients of q_n(t) are zero, just set (b_{i,K-1}, b_{i,K}) = (0, 0) for i = 1, ..., n.

5) After obtaining (b_{i,K-1}, b_{i,K}) for i = 1, ..., n, solve for the remaining entries b_{i,J}, i = 1, ..., n, for J = K - 2, ..., 1, by solving a linear system.

A practical PFA algorithm would also have to consider cases such as (1) subspaces of dimension smaller than K - 1 (k_i < K - 1); (2) degenerate cases in which the partial normal vectors obtained so far are not pairwise linearly independent; and (3) the presence of noise. However, these were not explored in this course project.

3. Probabilistic approach to mixtures of principal component subspaces: PPCA

3.1. Principles of PPCA

Conventional PCA seeks a q-dimensional (q < d) linear projection that best represents the data in a least-squares sense. For a given data set D = {t_i}, i = 1, ..., N, of observed d-dimensional vectors, the sample covariance matrix S is first calculated and used for Singular Value Decomposition (SVD) or eigen-analysis to find a set of eigenvalues and corresponding eigenvectors. Then the q dominant eigenvectors u_j can be used to faithfully represent the original data with minimal loss of information, and provide the q principal projection axes. The projected data are given by x_i = U_q^T (t_i - µ), where U_q = [u_1, u_2, ..., u_q]. This is a linear projection, and it maximizes the variance in the projected space.

Probabilistic PCA defines a probability model [6][7] in which the observation t is a linear transformation of a latent variable x with probability distribution p(x), plus additive noise e:

t = W x + µ + e,

where W is a d x q linear transformation matrix and µ is a d-dimensional vector that allows t to have a non-zero mean. In most studies, x and e are assumed to have Gaussian distributions p(x) ~ N(0, I_q) and p(e) ~ N(0, σ^2 I_d). Then the distribution of t is also Gaussian, with p(t) ~ N(µ, W W^T + σ^2 I_d). Given the above probabilistic model of the data, one can compute the maximum-likelihood estimators of the parameters µ, σ^2 and W from the data samples D, and the maximum-likelihood estimates of these parameters are:

µ_ML = (1/N) \sum_{i=1}^{N} t_i

σ^2_ML = 1/(d - q) \sum_{j=q+1}^{d} λ_j

W_ML = U_q (Λ_q - σ^2_ML I_q)^{1/2} R,

where λ_{q+1}, ..., λ_d are the smallest eigenvalues of the sample covariance matrix S, the q columns of the d x q orthogonal matrix U_q are the q dominant eigenvectors of S, the diagonal matrix Λ_q contains the corresponding q largest eigenvalues, and R is an arbitrary q x q orthogonal matrix. To simplify the problem, R can be chosen as the identity matrix I_q.
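
A minimal MATLAB sketch of these closed-form maximum-likelihood estimates (with R taken as the identity; variable names are illustrative, not the report's actual code):

    % T: N-by-d data matrix; q: latent dimension (q < d)
    [N, d] = size(T);
    mu = mean(T, 1);                            % mu_ML
    S  = cov(T);                                % sample covariance matrix
    [V, D] = eig(S);
    [lam, idx] = sort(diag(D), 'descend');      % eigenvalues in decreasing order
    Uq   = V(:, idx(1:q));                      % q dominant eigenvectors of S
    sig2 = mean(lam(q+1:d));                    % sigma^2_ML: average of discarded eigenvalues
    W    = Uq * sqrt(diag(lam(1:q)) - sig2 * eye(q));   % W_ML with R = I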

3.2. Mixture of PPCA

Usually, data can be generated from a mixture of components with different probability densities. In clustering using finite mixture models, each component density function represents a cluster. With the probabilistic model defined in PPCA, one can model each mixture component as a single PPCA. The observed data then have a probabilistic distribution, and the probability density of the observed data is modeled as a weighted sum of Gaussian distributions, expressed as

p(t) = \sum_{k=1}^{K_0} π_k p(t | µ_k, σ_k^2, W_k),

where p(t | µ_k, σ_k^2, W_k) denotes the PPCA density function of component k, K_0 is the total number of components, and π_k is the mixing proportion (weight) of the k-th mixture component, subject to the constraints π_k >= 0 and \sum_{k=1}^{K_0} π_k = 1. Therefore, the maximum-likelihood estimates of the model parameters should maximize the log-likelihood of the observed data, which is given by

L = \sum_{i=1}^{N} log p(t_i) = \sum_{i=1}^{N} log { \sum_{k=1}^{K_0} π_k p(t_i | µ_k, σ_k^2, W_k) }.

Using an expectation-maximization (EM) algorithm [7][8], we can compute the maximum-likelihood estimates of the parameters π_k, µ_k, σ_k^2 and W_k recursively. This then gives the mixture components and the mixing weight of each component in the mixture. Once the model parameters are determined, the linear relation between the observations and the model components, t = W_k x + µ_k + e, is completely defined. The observed data t_i can then be projected into the x-space of component k as x_ik = z_ik W_k^T (t_i - µ_k), which is a q-dimensional reduced representation of the k-th-cluster-focused vector t_i. Plotting the vectors x_ik creates a k-th-cluster-focused projection subspace, and z_ik (the posterior probability that t_i belongs to component k) gives the proportion of the contribution the point t_i has to the k-th subspace.

3.3. EM algorithm for mixture of PPCA

Expectation-maximization (EM) refers to an iterative optimization method for estimating unknown parameters T given measurement data U [8]. In the mixture-of-PPCA problem, we want to estimate the set {π_k, µ_k, σ_k^2, W_k}, k = 1, ..., K_0, using the observed data D, so EM is an ideal method to solve the problem. A schematic summary of the algorithm is as follows:

1) Initialization: the initial estimates of the parameters {π_k^0, µ_k^0, (σ_k^2)^0, W_k^0} are randomly selected.

2) Use EM to compute the parameter estimates that maximize the log-likelihood of the observed data D.

3) For each iteration i = 1, 2, ...:

E-step: Using the current parameter estimates, calculate the posterior probability R_ki of data point t_i belonging to the k-th component, given by

R_ki = π_k p(t_i | µ_k, σ_k^2, W_k) / p(t_i), k = 1, ..., K_0, i = 1, ..., N.

M-step: Using the posterior probabilities obtained in the E-step, calculate the new parameter estimates as follows:

π_k^new = (1/N) \sum_{i=1}^{N} R_ki

µ_k^new = \sum_{i=1}^{N} R_ki t_i / \sum_{i=1}^{N} R_ki

Then, using the new estimates µ_k^new, k = 1, ..., K_0, compute the weighted sample covariance matrices

S_k = \sum_{i=1}^{N} R_ki (t_i - µ_k^new)(t_i - µ_k^new)^T / \sum_{i=1}^{N} R_ki,

compute the eigenvalues λ_j and eigenvectors of S_k, and update the estimates of σ_k^2 and W_k as

(σ_k^2)^new = 1/(d - q) \sum_{j=q+1}^{d} λ_j

W_k^new = U_q (Λ_q - (σ_k^2)^new I_q)^{1/2}.

4) When the iterations complete, calculate the k-th-cluster-focused projection x_ik of each sample t_i:

x_ik = R_ki W_k^T (t_i - µ_k).
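
The following is a minimal MATLAB sketch of this EM loop for a mixture of PPCA. Convergence checking and numerical safeguards are omitted, mvnpdf requires the Statistics Toolbox, the initialization and settings are illustrative assumptions, and none of the names are taken from the report's actual code:

    % T: N-by-d data matrix of observations
    [N, d] = size(T);
    K0 = 3; q = 2; maxIter = 100;                  % illustrative settings
    pi_k = ones(1, K0) / K0;                       % equal initial mixing weights
    mu   = T(randperm(N, K0), :);                  % initial means: K0 random data points
    sig2 = ones(1, K0);                            % initial noise variances
    W    = randn(d, q, K0);                        % initial loading matrices
    for iter = 1:maxIter
        % E-step: responsibilities R(k,i) = pi_k * N(t_i | mu_k, C_k) / p(t_i)
        P = zeros(K0, N);
        for k = 1:K0
            Ck = W(:,:,k) * W(:,:,k)' + sig2(k) * eye(d);   % PPCA marginal covariance
            P(k,:) = pi_k(k) * mvnpdf(T, mu(k,:), Ck)';
        end
        R = P ./ sum(P, 1);                                  % normalize over the K0 components
        % M-step: update mixing weights, means, and per-component PPCA parameters
        for k = 1:K0
            Nk = sum(R(k,:));
            pi_k(k) = Nk / N;
            mu(k,:) = (R(k,:) * T) / Nk;
            Tc = T - mu(k,:);
            Sk = (Tc' * (Tc .* R(k,:)')) / Nk;               % responsibility-weighted covariance
            [V, D] = eig(Sk);
            [lam, idx] = sort(diag(D), 'descend');
            Uq = V(:, idx(1:q));
            sig2(k)  = mean(lam(q+1:d));
            W(:,:,k) = Uq * sqrt(diag(lam(1:q)) - sig2(k) * eye(q));
        end
    end
    % Cluster-focused projections: x_ik = R(k,i) * W_k' * (t_i - mu_k)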

4. Computational experiments

In the computational experiments, we 1) implemented the polynomial factorization algorithm (PFA) for GPCA and an expectation-maximization (EM) algorithm for PPCA in MATLAB code, and 2) validated the capability of these methods in discovering the clusters in the subspaces.

4.1. Synthetic data sets

The implemented algorithms were tested on simple synthetic data sets. Figure 1(a) shows the data set of 3-dimensional data points generated for the GPCA test (referred to as Set 1). The data were generated from a linear combination of 3 2-dimensional linear subspaces, each subspace represented by a randomly selected normal vector. In order to test whether the algorithm can identify the number of subspaces correctly, data were also generated from linear combinations of n = 2, 3, 4, 6 randomly selected subspaces. In all the cases tested in this study, no noise was added to the generated data. Figure 1(b) displays the data set generated for the PPCA test (referred to as Set 2). This data set consists of 240 data points generated from a mixture of three Gaussians in 3-dimensional space. Two of the clusters are closely spaced and the third is well separated from the first two.
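
A minimal MATLAB sketch of how such synthetic data sets could be generated (the exact parameters and random seeds used in the report are not reproduced; the means, spreads and point counts below are illustrative assumptions):

    % Set 1: noise-free points on n random planes through the origin in R^3
    n = 4; ptsPerPlane = 200;
    X = [];
    for i = 1:n
        b = randn(3, 1); b = b / norm(b);         % random unit normal vector
        B = null(b');                             % 3-by-2 basis of the plane b' * x = 0
        X = [X; (B * randn(2, ptsPerPlane))'];    % points lying in the plane
    end

    % Set 2: 240 points from a mixture of three Gaussians in R^3,
    % two clusters close together and one well separated
    mus = [0 0 0; 1.5 0 0; 8 8 8];                % illustrative cluster means
    T = [];
    for k = 1:3
        T = [T; mus(k,:) + 0.5 * randn(80, 3)];   % 80 points per cluster
    end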

Figure 1. (a) Synthetic data set for the GPCA test; data were generated from a combination of 4 linear subspaces. (b) Synthetic data set for the PPCA test; data were generated from a mixture of 3 Gaussians.

4.2. Applying GPCA to data Set 1

The implemented PFA algorithm for GPCA was applied to the synthetic data set. It showed that for all the cases with n = 2, 3, 4, 6, the algorithm can find the number of subspaces correctly. However, finding the normal vector of each subspace is not a trivial task. The difficulties may come from two facts: 1) the algorithm involves solving for the roots of polynomial equations, and it is likely that in some cases complex roots are obtained; and 2) the algorithm involves solving multivariate linear systems, i.e. solving Ax = b for x, so the successful estimation of the normal vector components depends on the condition number of the matrix A. When the matrix is ill-conditioned, we may not obtain a correct solution for x. If the randomly generated data do not impose these ill-conditioned problems on the GPCA procedure, the normal vector of each subspace can be estimated.

As an example, the 4 randomly selected normal vectors {b_i}, i = 1, 2, 3, 4, of the subspaces from which data Set 1 was generated were compared with the estimated normal vectors {b̂_i}. Note that the estimated normal vectors are not in the same order as the actual normal vectors, and each estimate can differ from the actual vector by a factor of (-1). Table 1 lists the correlations (corr) between the actual normal vectors {b_i} and the estimated normal vectors {b̂_i} in 5 successful estimations of subspaces, for the four cases with the total number of subspaces n = 2, 3, 4, 6, respectively. The averages and standard deviations of the absolute values of the correlations are also listed in the table. The correlation between an actual normal vector b_i and the estimated normal vector b̂_i is calculated as

corr = b_i^T b̂_i.

A minus sign indicates that the estimated normal vector is in the opposite direction (or symmetric about the origin) relative to the actual normal vector.
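
A minimal MATLAB sketch of this comparison, pairing each actual normal with the estimated normal of largest absolute inner product (a simple greedy matching; names are illustrative):

    % B: 3-by-n actual unit normals (columns); Bhat: 3-by-n estimated unit normals
    C = B' * Bhat;                                % C(i,j) = b_i' * bhat_j
    [absCorr, match] = max(abs(C), [], 2);        % best-matching estimate for each actual normal
    corrVals = C(sub2ind(size(C), (1:size(B, 2))', match));   % signed correlations, as in Table 1
    avgCorr = mean(absCorr);                      % AVG(|corr|)
    stdCorr = std(absCorr);                       % STD(|corr|)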

Table 1. Correlation (corr) between the actual normal vectors {b_i} and the estimated normal vectors {b̂_i} in 5 successful estimations of subspaces, for the four cases with the total number of subspaces n = 2, 3, 4, 6, respectively, together with the average AVG(|corr|) and standard deviation STD(|corr|) for each case.

The experiments on the synthetic data showed that the algorithm implemented here can successfully identify the number of subspaces in the mixture and, when successful, estimate the normal vectors of the subspaces with a relatively high correlation (about 0.7 in this study). Once the normal vectors of the subspaces are determined, the original data can be represented in the lower-dimensional subspaces, and further analysis can be carried out on each subspace separately. However, the implementation is not yet robust, since its success depends on the randomly generated data.

4.3. Applying PPCA to Set 2

The generated data Set 2 is a mixture of three Gaussians, with two clusters closely placed and one cluster placed separately. We applied the EM algorithm to this data set, first assuming there are only two clusters (subspaces), and then assuming there are three clusters (subspaces). Figure 2 shows the projected data in x-space for the cases of (a) assuming 2 clusters and (b) assuming 3 clusters. Different colors and markers are used to indicate the group association of each data point with the subspaces. It is shown that the probabilistic mixture model can find the clusters and assign the cluster association of each data point correctly. Also, the original data t can be reduced to a 2-dimensional data set x.

In this computational experiment, we have assumed the number of subspaces. However, this information is usually unknown and cannot be assumed arbitrarily. In a practical unsupervised cluster decomposition, it would be desirable to select the structural parameter K_0 of the model automatically and correctly. Wang et al. [1] proposed using two information-theoretic criteria, the Akaike information criterion (AIC) and the minimum description length (MDL) criterion, to guide the model selection. This allows an optimal model to be selected from several competing model candidates such that the selected model best fits the observed data D. This technique is not implemented in this project.

Figure 2. Projected data in x-space of the observations t for the cases of (a) assuming 2 clusters and (b) assuming 3 clusters.

5. Discussion

In this project, we explored the principles and features of the algebraic-geometric (GPCA) and probabilistic (PPCA) approaches for clustering and dimension reduction of mixtures of principal component subspaces, and implemented these two techniques in MATLAB code for hands-on experience.

In the absence of noise, GPCA can be cast in an algebraic-geometric framework in which the collection of subspaces is represented by a set of homogeneous polynomials whose degree corresponds to the number of subspaces and whose factors (roots) encode the subspace parameters [5]. The number of subspaces can be determined from the rank condition on the Veronese map matrix of the original data, and the estimation of the hyperplanes is equivalent to factoring the polynomial of degree n into a product of n linear factors. The polynomial factorization algorithm (PFA) proposed by Vidal et al. [4][5] was implemented in this project. There is another algorithm, also proposed by Vidal [5], called the polynomial differentiation algorithm (PDA). The PDA algorithm is designed for subspaces of arbitrary dimensions and obtains a basis for each subspace by evaluating the derivatives of the set of polynomials representing the subspaces at a collection of points, one in each of the subspaces. Vidal et al. have shown that the PDA algorithm gives about half the error of the PFA algorithm, and also improves the performance of iterative techniques, such as K-subspace and EM, by about 50% with respect to random initialization. However, this algorithm was not implemented in this study.

The experiments on the synthetic data show that the PFA algorithm implemented in this study can successfully identify the number of subspaces in the mixture and, when successful, estimate the normal vectors of the subspaces with a relatively high correlation (about 0.7 in this study). Once the normal vectors of the subspaces are determined, the original data can be represented in the lower-dimensional subspaces, and further analysis can be carried out on each subspace separately. However, the implementation is not yet robust, since it is data dependent. This may be due to two facts: 1) the algorithm involves solving for the roots of polynomial equations, and it is likely that in some cases complex roots are obtained; and 2) the algorithm involves solving multivariate linear systems, i.e. solving Ax = b for x, so the successful estimation of the normal vector components depends on the condition number of the matrix A. When the matrix is ill-conditioned, we may not obtain a correct solution for x.

In PPCA, principal component analysis is viewed as a maximum-likelihood procedure based on a probability density model of the observed data. The probability model is Gaussian, and the determination of the model parameters only requires computing the eigenvectors and eigenvalues of the sample covariance matrix. A mixture model of PPCA is considered when multiple clusters (subspaces) are present. In this case, an EM algorithm is used to find the principal subspaces by iteratively maximizing the likelihood function.

The EM algorithm was implemented and tested on a synthetic data set in this study. It was shown that the probabilistic mixture model can find the clusters and assign the cluster association of each data point correctly. The PPCA approach, however, has some disadvantages [5]: 1) it is hard to analyze the existence and uniqueness of a solution to the problem; 2) the approach is restricted to certain classes of distributions or independence assumptions; and 3) the convergence of EM is in general very sensitive to initialization, so there is no guarantee that it will converge to the optimal solution.

As a conceptual study, the PPCA decomposition implemented here is only carried out on a single level. Several groups [1][7] have extended the mixture of PPCA models to a hierarchical mixture model. In their methods, each PPCA component at the lower level can be extended to a group g_j, j = 1, ..., J, of PPCA components at the next higher level, and the EM algorithm can be applied again to the decomposition at the higher level. In this way, the multiple clusters can be separated recursively to generate a hierarchy of mixtures of PPCA with a number of levels. This hierarchical model allows the clusters to be visualized at different perceptual levels, and is thus very useful in multi-dimensional data visualization.

GPCA and PPCA are two different views of mixtures of principal components. It is not easy to compare the two methods directly, but both techniques have the capability of identifying clusters and subspaces, so that the original data can be represented in subspaces of lower dimensionality. They can be applied to a variety of estimation problems, such as 3-D motion segmentation in computer vision, and to dimension reduction problems such as data compression and feature extraction. In this project, the implemented algorithms were tested only on synthetic 3-dimensional data, not on higher-dimensional or real data, and they are far from comprehensive enough for practical use. However, the computational implementation helped greatly in understanding the two approaches to mixtures of principal component subspaces.

6. References

[1] Y. Wang, L. Luo, M.T. Freedman and S.Y. Kung, "Probabilistic Principal Component Subspaces: a Hierarchical Finite Mixture Model for Data Visualization," IEEE Transactions on Neural Networks, Vol. 11, No. 3, May 2000.

[2] M. Mizuta, "Dimension Reduction Methods."

[3] R.A. Johnson and D.W. Wichern, Applied Multivariate Statistical Analysis, Prentice-Hall, Englewood Cliffs, N.J., 1982.

[4] R. Vidal, Y. Ma and S. Sastry, "Generalized Principal Component Analysis (GPCA)," 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'03), Vol. 1, June 18-20, 2003, Madison, WI.

[5] R. Vidal, "Generalized Principal Component Analysis (GPCA): an Algebraic Geometric Approach to Subspace Clustering and Motion Segmentation," PhD thesis, University of California at Berkeley, 2003.

[6] M.E. Tipping and C.M. Bishop, "Probabilistic Principal Component Analysis," Technical Report NCRG/97/010, Neural Computing Research Group, Aston University, September 1997.

[7] T. Su and J. Dy, "Automated Hierarchical Mixtures of Probabilistic Principal Component Analyzers," Proceedings of the 21st International Conference on Machine Learning, Article No. 98, Banff, Canada, July 4-8, 2004.

[8] S. Roweis, "EM Algorithms for PCA and SPCA," Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems, Denver, Colorado, 1998.
