Introduction to local (nonparametric) density estimation methods

A slecture by Yu Liu for ECE 662 Spring 2014

1. Introduction

This slecture introduces two local density estimation methods: Parzen density estimation and k-nearest neighbor density estimation. Local density estimation is also referred to as nonparametric density estimation. To make things clear, let us first look at parametric density estimation. In parametric density estimation, we assume that there exists a density function which can be determined by a set of parameters. The parameters are estimated from the sample data and are later used in designing the classifier. However, in some practical situations the assumption that the density function has a parametric form does not hold. For example, it is very hard to fit a multimodal probability distribution with a simple function. In this case, we need to estimate the density function in the nonparametric way, which means that the density function is estimated locally based on a small set of neighboring samples. Because of this locality, local (nonparametric) density estimation is less accurate than parametric density estimation. In the following text the word "local" is preferred over "nonparametric".

It is noteworthy that it is very difficult to obtain an accurate local density estimate, especially when the dimension of the feature space is high. So why do we bother using local density estimation? Because our goal is not to get an accurate estimate, but rather to use the estimate to design a classifier that performs well. The inaccuracy of local density estimation does not necessarily lead to a poor decision rule.

2. General Principle

In local density estimation the density function $p(x)$ at a point $x$ can be approximated by the estimate

$$p_n(x) = \frac{k_n}{n v_n} \qquad (1)$$

where $v_n$ is the volume of a small region $R$ around point $x$, $n$ is the total number of samples $x_i$ ($i = 1, 2, \ldots, n$) drawn according to $p(x)$, and $k_n$ is the number of $x_i$'s which fall into region $R$. The reason why the density can be calculated in this way is that it does not vary much within a relatively small region, thus the probability mass of region $R$ can be approximated by $p_n(x) v_n$, which equals $k_n / n$.

Some examples of region $R$ in different dimensions:
i) line segment in one dimension,
ii) circle or rectangle in two dimensions,
iii) sphere or cube in three dimensions,
iv) hypersphere or hypercube in d dimensions (d > 3).
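The counting rule in formula (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the region R is taken to be a hypercube centred at x, and the names `local_density` and `edge` are hypothetical, not part of the lecture.

```python
import random


def local_density(samples, x, edge):
    """Estimate the density at x as k / (n * v), with R a hypercube of side `edge` centred at x."""
    d = len(x)
    n = len(samples)
    v = edge ** d  # volume of the hypercube region R
    # k = number of samples whose every coordinate lies within edge/2 of x
    k = sum(
        all(abs(s[j] - x[j]) <= edge / 2 for j in range(d))
        for s in samples
    )
    return k / (n * v)


# Tiny 1-D check: two of the four samples fall inside [-0.5, 0.5], so k/(n*v) = 2/(4*1)
print(local_density([(0.1,), (0.2,), (0.9,), (1.5,)], (0.0,), 1.0))  # prints 0.5

# Assumed test case: uniform samples on the unit square, whose true density is 1 everywhere
random.seed(0)
uniform_samples = [(random.random(), random.random()) for _ in range(10000)]
print(local_density(uniform_samples, (0.5, 0.5), 0.2))
```

Shrinking `edge` while keeping the number of samples fixed makes the count k small and the estimate noisy, which is the tension the three conditions on formula (1) are meant to resolve.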
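Three conditions we need to pay attention to when using formula (1) are:
i) $\lim_{n \to \infty} v_n = 0$. If $v_n is fixed, $p_n(x)$ only represents the average probability density over the region as $n$ grows larger, but what we need is the point probability density, so we should have $v_n \to 0$ as $n \to \infty$.
ii) $\lim_{n \to \infty} k_n = \infty$. This is to make sure that we do not get zero probability density.
iii) $\lim_{n \to \infty} k_n / n = 0$. This is to make sure that $p_n(x)$ does not diverge.

3. Parzen Density Estimation

In Parzen density estimation $v_n$ is directly determined by $n$, while $k_n$ is a random variable which denotes the number of samples that fall into $v_n$. Assume that the region $R$ is a d-dimensional hypercube with edge length $h_n$, thus

$$v_n = (h_n)^d$$

The equivalent conditions which meet the aforementioned three conditions are:

$$\lim_{n \to \infty} v_n = 0 \quad \text{and} \quad \lim_{n \to \infty} n v_n = \infty$$

Therefore $v_n$ can be chosen as $v_n = h / \sqrt{n}$ or $v_n = h / \ln n$, where $h$ is an adjustable constant.

Now that the relationship between $v_n$ and $n$ is defined, the next step is to determine $k_n$. To determine $k_n$, we define a window function as follows:

$$\varphi\left(\frac{x_i - x}{h_n}\right) = \begin{cases} 1 & \text{if } \dfrac{|x_i - x|}{h_n} \le \dfrac{1}{2} \\ 0 & \text{else} \end{cases}$$

where the $x_i$'s ($i = 1, 2, \ldots, n$) are the given samples and $x$ is the point where the density is to be estimated. For vectors, the condition is required to hold in every coordinate, so that the window is a hypercube. Thus we have

$$k_n = \sum_{i=1}^{n} \varphi\left(\frac{x_i - x}{h_n}\right)$$

$$p_n(x) = \frac{k_n}{n v_n} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{v_n} \varphi\left(\frac{x_i - x}{h_n}\right)$$

The function $\varphi$ is called a Parzen window function, which enables us to count the number of sample points in the hypercube with edge length $h_n$.

According to [2], using a hypercube as the window function may lead to discontinuity in the estimate. This is due to the superposition of sharp pulses centered at the given sample points when $h_n$ is small. To overcome this shortcoming, we can consider a more general form of window function rather than the hypercube. Note that if the following two conditions are met, the estimated $p_n(x)$ is guaranteed to be proper: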

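$$\varphi(x) \ge 0 \quad \text{and} \quad \int \varphi(x)\,dx = 1$$

Therefore a better choice of window function, one which removes the discontinuity, is the Gaussian window:

$$\varphi\left(\frac{x_i - x}{h_n}\right) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x_i - x}{h_n}\right)^2\right]$$

The estimated density is given by

$$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{v_n} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x_i - x}{h_n}\right)^2\right] \qquad (2)$$

Consider the one-dimensional case and assume that $v_n = h / \sqrt{n}$, thus $h_n = v_n = h / \sqrt{n}$, where $h$ is an adjustable constant. Substituting into formula (2) we have

$$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{v_n} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x_i - x}{h_n}\right)^2\right] = \frac{1}{h\sqrt{n}} \sum_{i=1}^{n} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x_i - x}{h / \sqrt{n}}\right)^2\right]$$

We can see that if $n$ equals one, $p_n(x)$ is just the window function. If $n$ approaches infinity, $p_n(x)$ can converge to any complex form. If $n$ is relatively small, $p_n(x)$ is very sensitive to the value of $h$. In general, a small $h$ leads to noise error while a large $h$ leads to over-smoothing error, which can be illustrated by the following example.

In this experiment the samples are 5000 points on a 2-D plane drawn from a Gaussian distribution. The mean vector is [1 2], and the covariance matrix is [1 0; 0 1]. Choose a rectangular Parzen window with $h_n = 4 / \sqrt{n}$, thus $v_n = (h_n)^2 = 16 / n$. Fig. 1 shows the sample distribution. Fig. 2 shows the ideal probability density distribution. Fig. 3 shows the result of Parzen density estimation.

Figure 1. 5000 sample points on 2-D plane with Gaussian distribution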

Figure 2. The ideal probability density distribution

Figure 3. The result of Parzen density estimation

Next we change the value of h and see how it affects the estimate. Fig. 4 shows the result of Parzen density estimation when h is twice its initial value. Fig. 5 shows the result of Parzen density estimation when h is half its initial value. We can see that the results agree with the aforesaid property of h.
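A rough way to try this experiment is sketched below in plain Python. It is not the code used to produce the figures. It assumes $h_n = h / \sqrt{n}$ and $v_n = (h_n)^d$ as in the text, and it uses a d-dimensional Gaussian window written as a product of one-dimensional Gaussians, which is an assumption beyond the one-dimensional formula (2). The function names are illustrative.

```python
import math
import random


def hypercube_window(u):
    """Return 1 if every coordinate of u lies in [-1/2, 1/2], else 0."""
    return 1.0 if all(abs(c) <= 0.5 for c in u) else 0.0


def gaussian_window(u):
    """Standard d-dimensional Gaussian window (assumed product form)."""
    d = len(u)
    return math.exp(-0.5 * sum(c * c for c in u)) / (2.0 * math.pi) ** (d / 2)


def parzen_density(samples, x, h, window):
    """p_n(x) = (1/n) * sum_i (1/v_n) * window((x_i - x) / h_n), h_n = h / sqrt(n), v_n = h_n^d."""
    n = len(samples)
    d = len(x)
    h_n = h / math.sqrt(n)
    v_n = h_n ** d
    total = sum(window([(s[j] - x[j]) / h_n for j in range(d)]) for s in samples)
    return total / (n * v_n)


# Synthetic data mirroring the experiment: 5000 points, mean [1 2], identity covariance
random.seed(0)
mean = (1.0, 2.0)
samples = [(random.gauss(mean[0], 1.0), random.gauss(mean[1], 1.0)) for _ in range(5000)]
true_peak = 1.0 / (2.0 * math.pi)  # exact density of this Gaussian at its mean

# Half the initial h, the initial h = 4, and twice the initial h
for h in (2.0, 4.0, 8.0):
    cube = parzen_density(samples, mean, h, hypercube_window)
    gauss = parzen_density(samples, mean, h, gaussian_window)
    print(f"h={h}: hypercube={cube:.4f} gaussian={gauss:.4f} true={true_peak:.4f}")
```

Evaluating `parzen_density` over a grid of points instead of only at the mean would give surfaces comparable to Figs. 3 to 5.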

Figure 4. The result of Parzen density estimation when h is twice its initial value

Figure 5. The result of Parzen density estimation when h is its initial value divided by two

To design a classifier using the Parzen window method [3], we estimate the densities for each class and classify the test point by the label corresponding to the maximum posterior. Some advantages and disadvantages of Parzen density estimation are listed below.

Advantages:
i) $p_n(x)$ can converge to any complex form when $n$ approaches infinity;
ii) applicable to data with any distribution.

Disadvantages:
i) needs a large number of samples to obtain an accurate estimate;
ii) computationally expensive, not suitable for feature spaces with very high dimensions;
iii) the adjustable constant $h$ has a relatively heavy influence on the decision boundaries when $n$ is small, and is not easy to choose in practice.

4. K-Nearest Neighbor Density Estimation

In k-nearest neighbor density estimation (abbreviated k-NN in the following text) $k$ is directly determined by $n$, while $v$ is a random variable which denotes the volume that encompasses just $k$ sample points inside $v$ and on its boundary. If $v$ is a sphere, it can be given by

$$v_k(x) = \min_h \frac{\pi^{d/2} h^d}{\Gamma(d/2 + 1)} = \frac{\pi^{d/2} (h_k)^d}{\Gamma(d/2 + 1)}$$

where $h$ is the radius of the sphere with center $x$, the minimum is taken over radii for which the sphere holds $k$ sample points, and $d$ is the dimension of the feature space. $h_k$ equals $\|x_{lk} - x\|$, where $x_{lk}$ is the k-th closest sample point to $x$. Then the probability density at $x$ is approximated by

$$p_n(x) = \frac{k - k_1}{n v_k(x)} \qquad (3)$$

where $k_1$ is the number of sample points on the boundary of $v_k(x)$. Most of the time formula (3) can be rewritten as

$$p_n(x) = \frac{k - 1}{n v_k(x)} \qquad (k \ge 2)$$

It can be proved that $E[p_n(x)] = p(x)$.

In Parzen density estimation $v_n$ only depends on $n$ and is the same for all the test points, while in k-NN $v_n$ is smaller in high-density areas and larger in low-density areas. This strategy seems more reasonable than the one used to determine $v_n$ in Parzen density estimation, since now $v_n$ adapts to the local density.

In practice, when we want to classify data using k-NN estimation, it turns out that we can get the posterior $p(w_i | x)$ directly without worrying about $p(x)$. If $k$ samples fall into volume $v$ around point $x$, and among the $k$ samples there are $k_i$ samples belonging to class $w_i$, then we have

$$p_n(w_i, x) = \frac{k_i}{n v}$$

The posterior $p(w_i | x)$ is given by

$$p_n(w_i | x) = \frac{p_n(w_i, x)}{p_n(x)} = \frac{p_n(w_i, x)}{\sum_{j=1}^{m} p_n(w_j, x)} = \frac{k_i}{k} \qquad (4)$$

where $m$ is the number of classes. Formula (4) tells us one simple decision rule: the class of a test point $x$ is the same as the most frequent one among the $k$ nearest points of $x$. Simple and intuitive, isn't it?

Having said that, choosing $k$ in k-NN is still a nontrivial problem, as is choosing $h$ in Parzen density estimation. Small $k$ leads to noisy decision boundaries while large $k$ leads to over-smoothed boundaries, which is illustrated by the following example. In this experiment the samples are 200 pre-labeled (red or blue) points. The task is to find the classification boundaries under different $k$ values. Figs. 6-9 show the results.

Figure 6. k-NN decision boundaries experiment (k=2)

Figure 7. k-NN decision boundaries experiment (k=3)

Figure 8. k-NN decision boundaries experiment (k=5)

Figure 9. k-NN decision boundaries experiment (k=8)

In practice we can use cross-validation to choose the best k. Some advantages and disadvantages of k-NN are listed below.

Advantages:
i) decision performance is good if n is large enough;
ii) applicable to data with any distribution;
iii) simple and intuitive.

Disadvantages:
i) needs a large number of samples to obtain an accurate estimate, which is inevitable in local density estimation;
ii) computationally expensive, with low efficiency for feature spaces with very high dimensions;
iii) choosing the best k is nontrivial.
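The k-NN estimate, the majority-vote rule, and the choice of k by cross-validation can be put together in a short sketch. Everything here is illustrative: the function names are hypothetical, the density estimate uses the (k - 1) / (n v_k(x)) form with a spherical volume, the 200 labeled points are synthetic stand-ins for the experiment's data (which is not available), and leave-one-out is just one possible form of cross-validation.

```python
import math
import random
from collections import Counter


def ball_volume(radius, d):
    """Volume of a d-dimensional sphere: pi^(d/2) * r^d / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) * radius ** d / math.gamma(d / 2 + 1)


def knn_density(samples, x, k):
    """Estimate (k - 1) / (n * v_k(x)), where v_k(x) reaches the k-th nearest sample.

    Assumes k >= 2 and a non-zero distance to the k-th neighbour.
    """
    dists = sorted(math.dist(s, x) for s in samples)
    h_k = dists[k - 1]  # distance from x to its k-th closest sample
    return (k - 1) / (len(samples) * ball_volume(h_k, len(x)))


def knn_posteriors(samples, labels, x, k):
    """Posterior estimates k_i / k for the classes seen among the k nearest samples."""
    order = sorted(range(len(samples)), key=lambda i: math.dist(samples[i], x))
    counts = Counter(labels[i] for i in order[:k])
    return {label: c / k for label, c in counts.items()}


def knn_classify(samples, labels, x, k):
    """Pick the class with the largest k_i / k.

    Ties go to the class met first when scanning neighbours from nearest to farthest.
    """
    post = knn_posteriors(samples, labels, x, k)
    return max(post, key=post.get)


def leave_one_out_accuracy(samples, labels, k):
    """Classify each sample using all the others and return the fraction classified correctly."""
    hits = 0
    for i, (x, label) in enumerate(zip(samples, labels)):
        rest = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        hits += knn_classify(rest, rest_labels, x, k) == label
    return hits / len(samples)


# Synthetic stand-in data: 100 "red" and 100 "blue" points from two 2-D Gaussians
random.seed(1)
samples = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(100)]
samples += [(random.gauss(2.5, 1.0), random.gauss(2.5, 1.0)) for _ in range(100)]
labels = ["red"] * 100 + ["blue"] * 100

for k in (2, 3, 5, 8):
    print(f"k={k}: leave-one-out accuracy = {leave_one_out_accuracy(samples, labels, k):.3f}")
```

Under this scheme, the k with the highest leave-one-out accuracy would be the one selected. Sorting all distances for every query is the simplest approach and costs O(n log n) per test point, which reflects the computational expense listed among the disadvantages.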

5. Reference

[1] Mireille Boutin, "ECE662: Statistical Pattern Recognition and Decision Making Processes," Purdue University, Spring 2014
[2] http://www.cse.buffalo.edu/~jcorso/t/cse555/files/annote_28feb_nonprm.pdf
[3] http://www.csd.uwo.ca/~olga/courses/cs434a_541a/lecture6.pdf