Nonparametric Density Estimation Intro

Nonparametric Density Estimation Intro: Parzen Windows

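Non-Parametric Methods
Neither the probability distribution nor the discriminant function is known. This happens quite often.
All we have is labeled data (for example: salmon, bass, salmon, salmon).
Estimate the probability distribution from the labeled data.
[Figure: scale running from "a lot is known" (easier) to "little is known" (harder).]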

Nonparametric Techniques: Introduction
In previous lectures we assumed that either:
1. Someone gives us the density $p(x \mid c_j)$. In pattern recognition applications this never happens.
2. Someone gives us $p(x \mid \theta_{c_j})$. This does happen sometimes, but we are likely to doubt whether the given $p(x \mid \theta)$ models the data well.
Most parametric densities are unimodal (have a single local maximum), whereas many practical problems involve multi-modal densities.

Nonparametric Techniques: Introduction
Nonparametric procedures can be used with arbitrary distributions and without any assumption about the forms of the underlying densities.
There are two types of nonparametric methods:
Parzen windows: estimate the likelihood $p(x \mid c_j)$.
Nearest neighbors: bypass the likelihood and go directly to estimating the posterior $P(c_j \mid x)$.

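Nonparametric Techniques: Introduction
Nonparametric techniques attempt to estimate the underlying density functions from the training data.
Idea: the more data in a region, the larger the density function $f(x)$ is there.
$\Pr[X \in R] = \int_R f(x)\,dx$
[Figure: samples and a region $R$ on the salmon length axis.]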

Nonparametric Techniques: Introduction
How can we approximate $\Pr[X \in R_1]$ and $\Pr[X \in R_2]$?
$\Pr[X \in R_1] \approx 6/20$ and $\Pr[X \in R_2] \approx 6/20$
Should the density curves above $R_1$ and $R_2$ be equally high? No, since $R_1$ is smaller than $R_2$:
$\Pr[X \in R_1] = \int_{R_1} f(x)\,dx \approx \int_{R_2} f(x)\,dx = \Pr[X \in R_2]$
To get the density, normalize by region size.
[Figure: regions $R_1$ and $R_2$ on the salmon length axis.]

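Nonparametric Techniques: Introduction
Assuming $f(x)$ is basically flat inside $R$:
$\frac{\#\text{ of samples in } R}{\text{total } \#\text{ of samples}} \approx \Pr[X \in R] = \int_R f(y)\,dy \approx f(x) \cdot \mathrm{Volume}(R)$
Thus the density at a point $x$ inside $R$ can be approximated by
$f(x) \approx \frac{\#\text{ of samples in } R}{\text{total } \#\text{ of samples}} \cdot \frac{1}{\mathrm{Volume}(R)}$
Now let's derive this formula more formally.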
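
A minimal sketch of the counting formula above for the 1-D case. The function name `region_density` and the sample values are illustrative assumptions, not part of the lecture. Two regions holding the same fraction of samples get different densities because their sizes differ.

```python
def region_density(samples, lo, hi):
    """Approximate the density inside the 1-D region [lo, hi) as (k/n) / volume."""
    n = len(samples)
    k = sum(1 for s in samples if lo <= s < hi)  # samples that fall in the region
    volume = hi - lo                             # in 1-D the volume is the length
    return (k / n) / volume

# Hypothetical lengths: 4 of 8 samples fall in each region.
samples = [1.0, 1.5, 2.0, 2.5, 6.0, 7.0, 8.0, 9.0]
dense = region_density(samples, 1, 3)    # (4/8) / 2
sparse = region_density(samples, 5, 10)  # (4/8) / 5
print(dense, sparse)
```

Both regions have estimated probability 4/8, but the narrower region gets the higher density.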

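Binomial Random Variable
Let us flip a coin $n$ times (each flip is called a "trial").
The probability of a head is $\rho$; the probability of a tail is $1 - \rho$.
The binomial random variable $K$ counts the number of heads in $n$ trials:
$P(K = k) = \binom{n}{k} \rho^k (1 - \rho)^{n-k}$, where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
The mean is $E(K) = n\rho$.
The variance is $\mathrm{var}(K) = n\rho(1 - \rho)$.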
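
As a quick numerical check of the binomial facts above, the following sketch sums the probability mass function directly. The values `n = 10` and `rho = 0.3` are arbitrary choices for illustration.

```python
from math import comb

def binomial_pmf(k, n, rho):
    """P(K = k) for n independent trials with head probability rho."""
    return comb(n, k) * rho**k * (1 - rho) ** (n - k)

n, rho = 10, 0.3
pmf = [binomial_pmf(k, n, rho) for k in range(n + 1)]

total = sum(pmf)                                         # should be 1
mean = sum(k * p for k, p in enumerate(pmf))             # should equal n * rho
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))  # should equal n * rho * (1 - rho)
print(total, mean, var)
```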

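Density Estimation: Basic Issues
From the definition of a density function, the probability $\rho$ that a vector $x$ will fall in region $R$ is
$\rho = \Pr[x \in R] = \int_R p(x')\,dx'$
Suppose we have samples $x_1, x_2, \ldots, x_n$ drawn from the distribution $p(x)$. The probability that $k$ points fall in $R$ is then given by the binomial distribution:
$\Pr[K = k] = \binom{n}{k} \rho^k (1 - \rho)^{n-k}$
Suppose that $k$ points fall in $R$. We can use MLE to estimate the value of $\rho$. The likelihood function is
$p(x_1, \ldots, x_n \mid \rho) = \binom{n}{k} \rho^k (1 - \rho)^{n-k}$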

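Density Estimation: Basic Issues
$p(x_1, \ldots, x_n \mid \rho) = \binom{n}{k} \rho^k (1 - \rho)^{n-k}$
This likelihood function is maximized at $\rho = k/n$. Thus the MLE is $\hat{\rho} = k/n$.
Assume that $p(x)$ is continuous and that the region $R$ is so small that $p(x)$ is approximately constant in $R$:
$\int_R p(x')\,dx' \approx p(x)\,V$, where $x$ is in $R$ and $V$ is the volume of $R$.
Recall from the previous slide: $\rho = \int_R p(x')\,dx'$
Thus $p(x)$ can be approximated:
$p(x) \approx \frac{k/n}{V}$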
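
The claim that the likelihood peaks at k/n can be checked numerically. This is only a sketch: it scans a grid of candidate values rather than solving analytically, and `k = 3`, `n = 20` are illustrative numbers.

```python
from math import comb, log

def log_likelihood(rho, k, n):
    """Log of C(n, k) * rho^k * (1 - rho)^(n - k)."""
    return log(comb(n, k)) + k * log(rho) + (n - k) * log(1 - rho)

k, n = 3, 20
grid = [i / 1000 for i in range(1, 1000)]  # candidate rho values in (0, 1)
rho_hat = max(grid, key=lambda r: log_likelihood(r, k, n))
print(rho_hat, k / n)
```

Setting the derivative of the log-likelihood, $k/\rho - (n-k)/(1-\rho)$, to zero gives the same answer in closed form.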

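Density Estimation: Basic Issues
This is exactly what we had before:
$p(x) \approx \frac{k/n}{V}$
where $x$ is inside some region $R$, $k$ is the number of samples inside $R$, $n$ is the total number of samples, and $V$ is the volume of $R$.
Our estimate will always be the average of the true density over $R$:
$p(x) \approx \frac{k/n}{V} = \frac{\hat{\rho}}{V} \approx \frac{\int_R p(x')\,dx'}{V}$
Ideally, $p(x)$ should be constant inside $R$.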

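Density Estimation: Histogram
$p(x) \approx \frac{k/n}{V}$
[Figure: histogram density estimate over three regions $R_1$, $R_2$, $R_3$ on an axis running from 0 to 50; each bar has height $(k/n)/V$ for its region.]
If the regions $R_i$ do not overlap, we have a histogram.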

Density Estimation: Accuracy
How accurate is the density approximation $p(x) \approx \frac{k/n}{V}$?
We have made two approximations:
1. $\hat{\rho} = k/n$: as $n$ increases, this estimate becomes more accurate.
2. $\int_R p(x')\,dx' \approx p(x)\,V$: as $R$ grows smaller, this estimate becomes more accurate.
As we shrink $R$ we have to make sure it still contains samples; otherwise our estimate is $p(x) = 0$ for all $x$ in $R$.
Thus in theory, if we have an unlimited number of samples, we get convergence as we simultaneously increase the number of samples $n$ and shrink the region $R$, but not so much that $R$ no longer contains a lot of samples.

Density Estimation: Accuracy
$p(x) \approx \frac{k/n}{V}$
In practice, the number of samples is always fixed. Thus the only available option to increase the accuracy is to decrease the size of $R$ ($V$ gets smaller).
If $V$ is too small, $p(x) = 0$ for most $x$, because most regions will have no samples.
Thus we have to find a compromise for $V$:
not too small, so that it contains enough samples,
but also not too large, so that $p(x)$ is approximately constant inside $V$.

Density Estimation: Two Approaches
$p(x) \approx \frac{k/n}{V}$
1. Parzen windows: choose a fixed value for the volume $V$ and determine the corresponding $k$ from the data.
2. k-nearest neighbors: choose a fixed value for $k$ and determine the corresponding volume $V$ from the data.
Under appropriate conditions, and as the number of samples goes to infinity, both methods can be shown to converge to the true $p(x)$.
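
The two approaches differ only in which of $k$ and $V$ is fixed. A rough 1-D sketch follows; the function names and sample values are assumptions made for illustration, and the k-NN version would divide by zero if the k-th nearest sample coincided with $x$.

```python
def parzen_estimate(samples, x, h):
    """Fix the volume V = h and count the k samples in [x - h/2, x + h/2]."""
    n = len(samples)
    k = sum(1 for s in samples if abs(s - x) <= h / 2)
    return (k / n) / h

def knn_estimate(samples, x, k):
    """Fix k and take the smallest interval around x that holds k samples."""
    n = len(samples)
    dists = sorted(abs(s - x) for s in samples)
    radius = dists[k - 1]   # distance to the k-th nearest sample
    volume = 2 * radius     # length of [x - radius, x + radius]
    return (k / n) / volume

samples = [2, 3, 4, 8, 10, 11, 12]
p_parzen = parzen_estimate(samples, x=3, h=3)  # k = 3 found from data, V = 3 fixed
p_knn = knn_estimate(samples, x=3, k=3)        # k = 3 fixed, V = 2 found from data
print(p_parzen, p_knn)
```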

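Parzen Windows
$p(x) \approx \frac{k/n}{V}$
where $x$ is inside some region $R$, $k$ is the number of samples inside $R$, $n$ is the total number of samples, and $V$ is the volume of $R$.
To estimate the density at point $x$, simply center the region $R$ at $x$, count the number of samples in $R$, and substitute everything into our formula.
[Figure: region $R$ centered at $x$, giving $p(x) \approx \frac{3/6}{10}$.]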

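Parzen Windows
In the Parzen-window approach to estimating densities, we fix the size and shape of the region $R$.
Let us assume that the region $R$ is a $d$-dimensional hypercube with side length $h$; thus its volume is $h^d$.
[Figure: $R$ in 1 dimension (segment of length $h$), 2 dimensions (square of side $h$), and 3 dimensions (cube of side $h$).]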

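Parzen Windows
Let $u = [u_1, u_2, \ldots, u_d]$ and define a window function
$\varphi(u) = 1$ if $|u_j| \le \frac{1}{2}$ for $j = 1, \ldots, d$, and $\varphi(u) = 0$ otherwise.
[Figure: $\varphi(u)$ in 1 dimension is a box of height 1 on $[-1/2, 1/2]$; in 2 dimensions $\varphi(u)$ is 1 inside the unit square centered at the origin and 0 outside.]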

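Parzen Windows
Recall we have $d$-dimensional samples $x_1, x_2, \ldots, x_n$. Let $x_{ij}$ be the $j$th coordinate of sample $x_i$. Then
$\varphi\left(\frac{x - x_i}{h}\right) = 1$ if $|x_j - x_{ij}| \le \frac{h}{2}$ for $j = 1, \ldots, d$, and 0 otherwise.
That is, $\varphi\left(\frac{x - x_i}{h}\right) = 1$ if $x_i$ is inside the hypercube with width $h$ centered at $x$, and 0 otherwise.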

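Parzen Windows
How do we count the total number of sample points $x_1, x_2, \ldots, x_n$ which are inside the hypercube with side $h$ centered at $x$?
$k = \sum_{i=1}^{n} \varphi\left(\frac{x - x_i}{h}\right)$
Recall $p(x) \approx \frac{k/n}{V}$ with $V = h^d$.
Thus we get the desired analytical expression for the estimate of the density, $p_\varphi(x)$:
$p_\varphi(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^d}\, \varphi\left(\frac{x - x_i}{h}\right)$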
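
The hypercube window and the resulting estimate can be sketched in a few lines of plain Python. The names `box_window` and `parzen_box` and the 2-D sample points are hypothetical; samples and query points are represented as tuples of coordinates.

```python
def box_window(u):
    """phi(u): 1 if every coordinate satisfies |u_j| <= 1/2, else 0."""
    return 1.0 if all(abs(uj) <= 0.5 for uj in u) else 0.0

def parzen_box(x, samples, h):
    """p(x) = (1/n) * sum_i (1/h^d) * phi((x - x_i) / h)."""
    n, d = len(samples), len(x)
    # k counts the samples inside the hypercube of side h centered at x.
    k = sum(box_window([(xj - xij) / h for xj, xij in zip(x, xi)]) for xi in samples)
    return k / (n * h**d)

samples = [(0.0, 0.0), (0.5, 0.5), (2.0, 2.0), (3.0, 0.0)]
x = (0.25, 0.25)
p_h1 = parzen_box(x, samples, h=1.0)  # k = 2, so 2 / (4 * 1)
p_h2 = parzen_box(x, samples, h=2.0)  # k = 2, so 2 / (4 * 4)
print(p_h1, p_h2)
```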

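Parzen Windows
$p_\varphi(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^d}\, \varphi\left(\frac{x - x_i}{h}\right)$
Let's make sure $p_\varphi(x)$ is in fact a density:
1. $p_\varphi(x) \ge 0$ for all $x$.
2. $\int p_\varphi(x)\,dx = \frac{1}{n h^d} \sum_{i=1}^{n} \int \varphi\left(\frac{x - x_i}{h}\right) dx = \frac{1}{n h^d} \sum_{i=1}^{n} h^d = 1$, because each integral equals the volume of the hypercube, $h^d$.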

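Parzen Windows: Example in 1D
$p_\varphi(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^d}\, \varphi\left(\frac{x - x_i}{h}\right)$
Suppose we have 7 samples $D = \{2, 3, 4, 8, 10, 11, 12\}$.
Let the window width be $h = 3$ and estimate the density at $x = 1$:
$p_\varphi(1) = \frac{1}{7} \sum_{i=1}^{7} \frac{1}{3}\, \varphi\left(\frac{1 - x_i}{3}\right) = \frac{1}{21} \left[ \varphi\left(\frac{1-2}{3}\right) + \varphi\left(\frac{1-3}{3}\right) + \varphi\left(\frac{1-4}{3}\right) + \ldots + \varphi\left(\frac{1-12}{3}\right) \right]$
Since $\left|-\frac{1}{3}\right| \le \frac{1}{2}$, while $\left|-\frac{2}{3}\right| > \frac{1}{2}$, $|-1| > \frac{1}{2}$, ..., $\left|-\frac{11}{3}\right| > \frac{1}{2}$:
$p_\varphi(1) = \frac{1}{21} [1 + 0 + 0 + \ldots + 0] = \frac{1}{21}$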
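
The arithmetic of this example can be reproduced directly. The helper names below are illustrative; only the sample set, $h = 3$, and the query point $x = 1$ come from the slide.

```python
def box_window_1d(u):
    """1-D box window: 1 if |u| <= 1/2, else 0."""
    return 1.0 if abs(u) <= 0.5 else 0.0

def parzen_box_1d(x, samples, h):
    """1-D Parzen estimate with the box window (d = 1, so h^d = h)."""
    n = len(samples)
    return sum(box_window_1d((x - xi) / h) for xi in samples) / (n * h)

D = [2, 3, 4, 8, 10, 11, 12]
p_at_1 = parzen_box_1d(1, D, h=3)  # only the sample at 2 lies within 1.5 of x = 1
print(p_at_1, 1 / 21)
```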

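Parzen Windows: Sum of Functions
Now let's look at our density estimate $p_\varphi(x)$ again:
$p_\varphi(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^d}\, \varphi\left(\frac{x - x_i}{h}\right) = \sum_{i=1}^{n} \frac{1}{n h^d}\, \varphi\left(\frac{x - x_i}{h}\right)$
where $\varphi\left(\frac{x - x_i}{h}\right)$ is 1 inside the square centered at $x_i$ and 0 otherwise.
Thus $p_\varphi(x)$ is just a sum of $n$ box-like functions, each of height $\frac{1}{n h^d}$.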

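Parzen Windows: Example in 1D
Let's come back to our example: 7 samples $D = \{2, 3, 4, 8, 10, 11, 12\}$, $h = 3$.
To see what the function $p_\varphi(x)$ looks like, we need to generate 7 boxes and add them up.
The width of each box is $h = 3$ and the height is $\frac{1}{n h^d} = \frac{1}{21}$.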
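
One way to see the sum of boxes without a plot is to build the 7 boxes explicitly and evaluate their sum on a grid. This is a sketch: the grid range and step are arbitrary choices, and the area is only a Riemann-sum approximation.

```python
D = [2, 3, 4, 8, 10, 11, 12]
h = 3
height = 1 / (len(D) * h)  # each box has height 1/(n h) = 1/21

def box(center):
    """One box of width h and height 1/(n h), centered at a sample point."""
    return lambda x: height if abs(x - center) <= h / 2 else 0.0

boxes = [box(xi) for xi in D]

def p(x):
    """Density estimate: the sum of all 7 boxes at x."""
    return sum(b(x) for b in boxes)

step = 0.01
xs = [i * step for i in range(0, 1501)]  # grid on [0, 15], covering every box
values = [p(x) for x in xs]
area = sum(values) * step                # approximate integral of the estimate
peak = max(values)                       # reached where three boxes overlap
print(area, peak)
```

The peak of 3/21 occurs around $x = 3$ and $x = 11$, where three neighbouring boxes overlap.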

Parzen Windows: Interpolation
In essence, the window function $\varphi$ is used for interpolation: each sample $x_i$ contributes to the resulting density at $x$ if $x$ is close enough to $x_i$.

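Parzen Windows: Drawbacks of Hypercube $\varphi$
As long as the sample point $x_i$ and $x$ are in the same hypercube, the contribution of $x_i$ to the density at $x$ is constant, regardless of how close $x_i$ is to $x$:
$\varphi\left(\frac{x - x_1}{h}\right) = \varphi\left(\frac{x - x_2}{h}\right) = 1$
The resulting density $p_\varphi(x)$ is not smooth; it has discontinuities.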

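Parzen Windows: General $\varphi$
$p_\varphi(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^d}\, \varphi\left(\frac{x - x_i}{h}\right)$
We can use a general window $\varphi$ as long as the resulting $p_\varphi(x)$ is a legitimate density, i.e.
1. $p_\varphi(x) \ge 0$, which is satisfied if $\varphi(u) \ge 0$.
2. $\int p_\varphi(x)\,dx = 1$, which is satisfied if $\int \varphi(u)\,du = 1$:
$\int p_\varphi(x)\,dx = \frac{1}{n h^d} \sum_{i=1}^{n} \int \varphi\left(\frac{x - x_i}{h}\right) dx = \frac{1}{n h^d} \sum_{i=1}^{n} h^d \int \varphi(u)\,du = 1$
Here we change coordinates to $u = \frac{x - x_i}{h}$, thus $dx = h^d\,du$.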

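Parzen Windows: General $\varphi$
$p_\varphi(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^d}\, \varphi\left(\frac{x - x_i}{h}\right)$
Notice that with the general window $\varphi$ we are no longer counting the number of samples inside $R$.
We are computing a weighted average over potentially every single sample point (although only those within distance $h$ have any significant weight).
With an infinite number of samples, and under appropriate conditions, it can still be shown that $p_\varphi^n(x) \to p(x)$.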

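Parzen Windows: Gaussian $\varphi$
$p_\varphi(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^d}\, \varphi\left(\frac{x - x_i}{h}\right)$
A popular choice for $\varphi$ is the $N(0,1)$ density:
$\varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$
This solves both drawbacks of the "box" window:
Points $x$ which are close to the sample point $x_i$ receive higher weight.
The resulting density $p_\varphi(x)$ is smooth.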

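Parzen Windows: Example with General $\varphi$
Let's come back to our example: 7 samples $D = \{2, 3, 4, 8, 10, 11, 12\}$, $h = 1$.
$p_\varphi(x) = \frac{1}{7} \sum_{i=1}^{7} \varphi(x - x_i)$
$p_\varphi(x)$ is the sum of 7 Gaussians, each centered at one of the sample points, and each scaled by 1/7.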
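
A small sketch of this Gaussian-window estimate on the same 7 samples. The function names and the integration grid are assumptions for illustration; the area is a Riemann-sum approximation and should come out close to 1.

```python
from math import exp, pi, sqrt

def gaussian_window(u):
    """N(0, 1) density used as the window function phi."""
    return exp(-u * u / 2) / sqrt(2 * pi)

def parzen_gaussian(x, samples, h):
    """1-D Parzen estimate: (1/n) * sum_i (1/h) * phi((x - x_i) / h)."""
    n = len(samples)
    return sum(gaussian_window((x - xi) / h) for xi in samples) / (n * h)

D = [2, 3, 4, 8, 10, 11, 12]
h = 1

step = 0.01
xs = [i * step for i in range(-1000, 2501)]  # grid on [-10, 25]
area = sum(parzen_gaussian(x, D, h) for x in xs) * step

p_cluster = parzen_gaussian(3, D, h)  # in the middle of the samples 2, 3, 4
p_gap = parzen_gaussian(6, D, h)      # in the gap between 4 and 8
print(area, p_cluster, p_gap)
```

Unlike the box window, the estimate varies smoothly and is higher inside the cluster than in the gap.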

Parzen Windows: Did We Solve the Problem?
Let's test whether we solved the problem:
1. Draw samples from a known distribution.
2. Use our density approximation method and compare with the true density.
We will vary the number of samples $n$ and the window size $h$.
We will play with 2 distributions: $N(0,1)$, and a mixture of a triangle and a uniform density.
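
The experiment for the $N(0,1)$ case might be sketched as follows. Everything here is an assumption made for illustration: the Gaussian window, the seed, the sample sizes, the evaluation grid on $[-3, 3]$, and mean absolute error as the comparison measure. The slides compare the curves visually instead.

```python
import random
from math import exp, pi, sqrt

def normal_pdf(x):
    """True N(0, 1) density, also used as the window function."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def parzen_gaussian(x, samples, h):
    return sum(normal_pdf((x - xi) / h) for xi in samples) / (len(samples) * h)

def mean_abs_error(samples, h, grid):
    """Average |estimate - true density| over the grid points."""
    return sum(abs(parzen_gaussian(x, samples, h) - normal_pdf(x)) for x in grid) / len(grid)

random.seed(0)
grid = [-3 + 0.05 * i for i in range(121)]
errors = {}
for n in (10, 100, 1000):
    samples = [random.gauss(0, 1) for _ in range(n)]  # step 1: draw from the known density
    for h in (1.0, 0.5, 0.1):
        errors[(n, h)] = mean_abs_error(samples, h, grid)  # step 2: compare

for (n, h), err in sorted(errors.items()):
    print(f"n={n:5d}  h={h:.1f}  mean |error| = {err:.4f}")
```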

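Parzen Windows: True Density N(0,1)
[Figure: Parzen-window estimates for window widths $h = 1$, $h = 0.5$, and $h = 0.1$, with $n = 10$ samples.]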

Parzen Windows: True Density N(0,1)
[Figure: Parzen-window estimates for window widths $h = 1$, $h = 0.5$, and $h = 0.1$, with $n = 100$ samples.]

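Parzen Windows: True Density Is a Mixture of Uniform and Triangle
[Figure: Parzen-window estimates for window widths $h = 1$, $h = 0.5$, and $h = 0.2$, with $n = 16$ samples.]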

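Parzen Windows: True Density Is a Mixture of Uniform and Triangle
[Figure: Parzen-window estimates for window widths $h = 1$, $h = 0.5$, and $h = 0.2$, with $n = 256$ samples.]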

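Parzen Windows: Effect of Window Width h
By choosing $h$ we are guessing the region where the density is approximately constant.
Without knowing anything about the distribution, it is really hard to guess where the density is approximately constant.
[Figure: a density $p(x)$ with two candidate window widths $h$ marked.]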

Parzen Windows: Effect of Window Width h
If $h$ is small, we superimpose $n$ sharp pulses centered at the data.
Each sample point $x_i$ influences too small a range of $x$.
Smoothed too little: the result will look noisy and not smooth enough.
If $h$ is large, we superimpose broad, slowly changing functions.
Each sample point $x_i$ influences too large a range of $x$.
Smoothed too much: the result looks oversmoothed or "out of focus".
Finding the best $h$ is challenging, and indeed no single $h$ may work well. We may need to adapt $h$ for different sample points.
However, we can try to learn the best $h$ to use from our labeled data.

Learning Window Width h From Labeled Data
Divide the labeled data into a training set, a validation set, and a test set.
For a range of different values of $h$ (possibly using binary search), construct the density estimate $p_\varphi(x)$ using Parzen windows.
Test the classification performance on the validation set for each value of $h$ you tried.
For the final density estimate, choose the $h$ giving the smallest error on the validation set.
Now you can test the performance of the classifier on the test set.
Notice that we need a validation set to find the best parameter $h$; we can't use the test set for this because the test set cannot be used for training.
In general, we need a validation set whenever our classifier has tunable parameters.
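
The procedure above might look like this in outline. This is a sketch under several assumptions that are not in the slides: synthetic 1-D data from two Gaussians, a Gaussian window, class priors estimated from class frequencies, and a plain scan over a handful of candidate widths rather than a binary search.

```python
import random
from math import exp, pi, sqrt

def parzen_gaussian(x, samples, h):
    norm = len(samples) * h * sqrt(2 * pi)
    return sum(exp(-(((x - s) / h) ** 2) / 2) for s in samples) / norm

def classify(x, train, h):
    """Pick the label c maximizing p(x | c) * P(c); train maps label -> samples."""
    total = sum(len(v) for v in train.values())
    return max(train, key=lambda c: parzen_gaussian(x, train[c], h) * len(train[c]) / total)

def error_rate(data, train, h):
    wrong = sum(1 for x, label in data if classify(x, train, h) != label)
    return wrong / len(data)

random.seed(0)
# Hypothetical labeled data: class "a" ~ N(0, 1), class "b" ~ N(2, 1).
labeled = [(random.gauss(0, 1), "a") for _ in range(150)]
labeled += [(random.gauss(2, 1), "b") for _ in range(150)]
random.shuffle(labeled)
train_part, val_part, test_part = labeled[:100], labeled[100:200], labeled[200:]
train = {c: [x for x, lab in train_part if lab == c] for c in ("a", "b")}

candidates = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
val_errors = {h: error_rate(val_part, train, h) for h in candidates}
best_h = min(candidates, key=lambda h: val_errors[h])  # smallest validation error
test_error = error_rate(test_part, train, best_h)      # test set is touched only once
print(val_errors, best_h, test_error)
```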

Parzen Windows: Classification Example
In classifiers based on Parzen-window estimation:
We estimate the densities for each category and classify a test point by the label corresponding to the maximum posterior.
The decision region for a Parzen-window classifier depends upon the choice of window function, as illustrated in the following figure.

Parzen Windows: Classification Example
For a small enough window size $h$, the classification on the training data is perfect.
However, the decision boundaries are complex and this solution is not likely to generalize well to novel data.
For a larger window size $h$, classification on the training data is not perfect.
However, the decision boundaries are simpler and this solution is more likely to generalize well to novel data.
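
The effect of $h$ on training error can be illustrated with a tiny hand-made 1-D data set. The data, the Gaussian window, and the function names are all assumptions for this sketch. The constant $1/\sqrt{2\pi}$ is dropped because it is the same for both classes and does not change the decision.

```python
from math import exp

def kernel_sum(x, samples, h):
    """Gaussian-window Parzen estimate up to the common factor 1/sqrt(2*pi)."""
    return sum(exp(-(((x - s) / h) ** 2) / 2) for s in samples) / (len(samples) * h)

def classify(x, train, h):
    """Label with the largest estimated p(x | c) * P(c)."""
    total = sum(len(v) for v in train.values())
    return max(train, key=lambda c: kernel_sum(x, train[c], h) * len(train[c]) / total)

def training_error(train, h):
    points = [(x, c) for c, xs in train.items() for x in xs]
    wrong = sum(1 for x, c in points if classify(x, train, h) != c)
    return wrong / len(points)

# Hypothetical interleaved classes, so no simple boundary separates them.
train = {"a": [1.0, 2.0, 3.0, 4.5], "b": [2.5, 3.5, 5.0, 6.0]}
err_small = training_error(train, h=0.05)  # each point is dominated by its own kernel
err_large = training_error(train, h=2.0)   # smoother estimate, some points misclassified
print(err_small, err_large)
```

A zero training error for the small window says nothing about novel data, which is why the window width is chosen on a validation set.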

Parzen Windows: Summary
Advantages:
Can be applied to data from any distribution.
In theory, can be shown to converge as the number of samples goes to infinity.
Disadvantages:
The number of training samples is limited in practice, so choosing the appropriate window size $h$ is difficult.
May need a large number of samples for accurate estimates.
Computationally heavy: to classify one point we have to compute a function which potentially depends on all samples,
$p_\varphi(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^d}\, \varphi\left(\frac{x - x_i}{h}\right)$
But we need a lot of samples for accurate density estimation!