Group M D L M
Chapter 4 Non-Parametric Estimation
Xin-Shun Xu @ SDU, School of Computer Science and Technology, Shandong University

Contents
- Introduction
- Parzen Windows
- K-Nearest-Neighbor Estimation
- Classification Techniques
  - The Nearest-Neighbor rule (1-NN)
  - The Nearest-Neighbor rule (k-NN)
- Distance Metrics

Bayes Rule for Classification
$P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)}$
To compute the posterior probability, we need to know the prior probability and the likelihood.
Case I: $p(x \mid \omega_i)$ has a certain parametric form
- Maximum-Likelihood Estimation
- Bayesian Parameter Estimation
Problems: the assumed parametric form may not fit the ground-truth density encountered in practice, e.g., assumed parametric form: unimodal; ground truth: multimodal.

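As a small illustration of the Bayes rule on this slide, the sketch below turns likelihood values and priors into posteriors. The function name `posterior` and the numbers are made up for illustration; they are not from the lecture.

```python
def posterior(likelihoods, priors):
    """Bayes rule: P(w_i | x) = p(x | w_i) P(w_i) / p(x)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x) = sum_j p(x | w_j) P(w_j)
    return [j / evidence for j in joint]


# Hypothetical values: two classes, likelihoods evaluated at a single x.
post = posterior(likelihoods=[0.6, 0.2], priors=[0.5, 0.5])
print(post)  # roughly [0.75, 0.25]
```

Everything that follows in the chapter is about obtaining the likelihood values from data when no parametric form is assumed.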
Non-Parametric Estimation
Case II: $p(x \mid \omega_i)$ does not have a parametric form
How? Let the data speak for themselves!
- Parzen Windows
- $k_n$-Nearest-Neighbor

Goals
- Estimate class-conditional densities $p(x \mid \omega_i)$
- Estimate posterior probabilities $P(\omega_i \mid x)$

Density Estimation
Assume $p(x)$ is continuous and the region $R$ is small.
Fundamental fact: the probability that a vector $x$ falls into a region $R$ is
$P_R = P(X \in R) = \int_R p(x')\,dx' \approx p(x)\, V_R$, where $V_R = \int_R dx'$.
Given $n$ i.i.d. examples $\{x_1, x_2, \ldots, x_n\}$, let $K$ denote the random variable representing the number of samples falling into $R$. $K$ follows a Binomial distribution:
$K \sim B(n, P_R)$, $P(K = k) = \binom{n}{k} P_R^{k} (1 - P_R)^{n-k}$
Since $E[K] = n P_R$, we have $P_R = E[K]/n$ and therefore
$p(x) \approx \frac{P_R}{V_R} = \frac{E[K]/n}{V_R}$
Let $k_R$ denote the actual number of samples in $R$; then
$p(x) \approx \frac{k_R/n}{V_R}$

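The counting estimate $p(x) \approx (k_R/n)/V_R$ can be tried on data whose density is known. The sketch below assumes samples drawn uniformly on $[0, 1)$, so the true density is 1; the region, sample size, and variable names are illustrative choices.

```python
import random

random.seed(0)
n = 100_000
samples = [random.random() for _ in range(n)]  # true density p(x) = 1 on [0, 1)

x, half_width = 0.5, 0.05        # region R = (0.45, 0.55)
V = 2 * half_width               # "volume" of R in one dimension
k = sum(1 for s in samples if abs(s - x) < half_width)  # samples falling in R

p_hat = (k / n) / V              # estimate of p(x) at x = 0.5
print(k, p_hat)
```

The estimate should land close to 1, with a spread governed by the Binomial variance of the count.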
Density Estimation
Use the subscript $n$ to take the sample size into account:
$p_n(x) = \frac{k_n/n}{V_n}$
We hope that $\lim_{n\to\infty} p_n(x) = p(x)$.
To do this, we should have
- $\lim_{n\to\infty} V_n = 0$
- $\lim_{n\to\infty} k_n = \infty$
- $\lim_{n\to\infty} k_n/n = 0$

Density Estimation
$p_n(x) = \frac{k_n/n}{V_n}$
What items can be controlled? How?
- Fix $V_n$ and determine $k_n$: Parzen Windows
- Fix $k_n$ and determine $V_n$: $k_n$-Nearest-Neighbor

Parzen Windows
$p_n(x) = \frac{k_n/n}{V_n}$
Fix $V_n$ and determine $k_n$.
Assume $R_n$ is a d-dimensional hypercube. The length of each edge is $h_n$, so $V_n = h_n^d$.
Determine $k_n$ with a window function, a.k.a. kernel function or potential function.
(Photo: Emanuel Parzen, 1929-)

Window function
$\varphi(u) = 1$ if $|u_j| \le 1/2$ for $j = 1, \ldots, d$; $\varphi(u) = 0$ otherwise.
It defines a unit hypercube centered at the origin.

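Window function
$\varphi\left(\frac{x - x_i}{h_n}\right) = 1$ means that $x_i$ falls within the hypercube of volume $V_n$ centered at $x$, i.e., $|x_j - x_{ij}| \le h_n/2$ for $j = 1, \ldots, d$; otherwise it is 0.
$k_n$, the number of samples inside the hypercube centered at $x$:
$k_n = \sum_{i=1}^{n} \varphi\left(\frac{x - x_i}{h_n}\right)$
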
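The hypercube window and the resulting count $k_n$ translate almost directly into code. This is a minimal sketch; the function names and the four 2-d sample points are invented for illustration.

```python
def hypercube_window(u):
    """phi(u) = 1 if every |u_j| <= 1/2, else 0."""
    return 1 if all(abs(uj) <= 0.5 for uj in u) else 0


def count_in_cube(x, samples, h):
    """k_n = sum_i phi((x - x_i) / h): samples inside the cube of edge h at x."""
    return sum(
        hypercube_window([(xj - sj) / h for xj, sj in zip(x, s)])
        for s in samples
    )


samples = [(0.0, 0.0), (0.4, 0.4), (0.6, 0.0), (2.0, 2.0)]  # made-up 2-d points
h = 1.0
k = count_in_cube((0.0, 0.0), samples, h)
p_hat = (k / len(samples)) / h ** 2   # (k_n / n) / V_n with V_n = h^d, d = 2
print(k, p_hat)
```

With edge length 1 only the first two points fall inside the cube centered at the origin; widening the cube to edge 1.5 also captures the third.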
Parzen Window Estimation
$p_n(x) = \frac{k_n/n}{V_n} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n}\, \varphi\left(\frac{x - x_i}{h_n}\right)$ (the Parzen pdf)
$\varphi$ is not limited to the hypercube window function defined previously. It could be any pdf function:
$\varphi(u) \ge 0$, $\int \varphi(u)\, du = 1$

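Parzen Window Estimation
$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n}\, \varphi\left(\frac{x - x_i}{h_n}\right)$
Here $\varphi$ is the window function (itself a pdf), $h_n$ is the window width, the $x_i$ are the training data, and $p_n$ is the Parzen pdf.
Is $p_n(x)$ a pdf function? Set $(x - x_i)/h_n = u$. Then
$\int p_n(x)\, dx = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n} \int \varphi\left(\frac{x - x_i}{h_n}\right) dx = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n} \int h_n^d\, \varphi(u)\, du = \frac{1}{n} \sum_{i=1}^{n} \int \varphi(u)\, du = 1$
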
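The claim that the Parzen estimate integrates to 1 can be checked numerically. The sketch below assumes a 1-d Gaussian window (so $V_n = h_n$), four made-up training points, and a plain Riemann sum; none of these choices come from the slides.

```python
import math


def gaussian_window(u):
    """Standard normal density, used here as the window function phi."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_pdf(x, samples, h):
    """p_n(x) = (1/n) * sum_i (1/h) * phi((x - x_i) / h) for d = 1."""
    return sum(gaussian_window((x - xi) / h) for xi in samples) / (len(samples) * h)


samples = [-1.0, 0.0, 0.5, 2.0]   # made-up training data
h = 0.5                           # window width

step = 0.01
grid = [-10.0 + step * i for i in range(2001)]          # covers [-10, 10]
area = sum(parzen_pdf(x, samples, h) for x in grid) * step
print(area)
```

The printed area should be very close to 1 regardless of the window width chosen, as long as the grid covers the region where the estimate is non-negligible.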
Parzen Window Estimation
Define $\delta_n(x) = \frac{1}{V_n}\, \varphi\left(\frac{x}{h_n}\right)$, so that $p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_n(x - x_i)$.
$p_n(x)$ is a superposition of $n$ interpolations: each $x_i$ contributes to $p_n(x)$ based on its distance from $x$.
What is the effect of $h_n$ (window width) on the Parzen pdf?

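Parzen Window Estimation
The effect of $h_n$:
$\delta_n(x) = \frac{1}{V_n}\, \varphi\left(\frac{x}{h_n}\right) = \frac{1}{h_n^d}\, \varphi\left(\frac{x}{h_n}\right)$
- The argument $x/h_n$ affects the width (horizontal scale).
- The factor $1/h_n^d$ affects the amplitude (vertical scale).
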
Parzen Window Estimation
$\delta_n(x) = \frac{1}{V_n}\, \varphi\left(\frac{x}{h_n}\right)$
Suppose $\varphi(\cdot)$ is a 2-d Gaussian pdf.
(Figure: the shape of $\delta_n(x)$ with decreasing values of $h_n$.)

Parzen Window Estimation
$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_n(x - x_i)$, $\delta_n(x) = \frac{1}{V_n}\, \varphi\left(\frac{x}{h_n}\right)$
When $h_n$ is very large, $\delta_n(x)$ will be broad with small amplitude. $p_n(x)$ will be the superposition of $n$ broad, slowly changing functions, i.e., smooth with low resolution.
When $h_n$ is very small, $\delta_n(x)$ will be sharp with large amplitude. $p_n(x)$ will be the superposition of $n$ sharp pulses, i.e., variable/unstable with high resolution.

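The width/amplitude trade-off can be seen by evaluating $\delta_n$ for a few window widths. For brevity the sketch uses a 1-d Gaussian window rather than the 2-d one on the slides; that simplification and the chosen values of h are assumptions.

```python
import math


def delta(x, h):
    """delta_n(x) = (1/h) * phi(x / h) for a 1-d standard Gaussian phi."""
    return math.exp(-0.5 * (x / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


for h in (2.0, 0.5, 0.1):
    # Peak height grows as h shrinks, while the value one unit away collapses.
    print(f"h={h}: peak={delta(0.0, h):.3f}, value at x=1: {delta(1.0, h):.3g}")
```

A large h gives a low, wide bump that still has noticeable mass one unit away from the sample; a small h gives a tall spike that is essentially zero there.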
Parzen Window Estimation
$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_n(x - x_i)$, $\delta_n(x) = \frac{1}{V_n}\, \varphi\left(\frac{x}{h_n}\right)$
(Figure: Parzen window estimates for five samples, supposing that $\varphi(\cdot)$ is a 2-d Gaussian pdf.)

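Parzen Window Estimation
Convergence conditions
To ensure convergence, i.e.,
$\lim_{n\to\infty} E[p_n(x)] = p(x)$ and $\lim_{n\to\infty} \mathrm{Var}[p_n(x)] = 0$,
we have the following additional constraints:
- $\sup_{u} \varphi(u) < \infty$
- $\lim_{\|u\|\to\infty} \varphi(u) \prod_{i=1}^{d} u_i = 0$
- $\lim_{n\to\infty} V_n = 0$
- $\lim_{n\to\infty} n V_n = \infty$
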
Illustrations
One-dimensional case, $X \sim N(0, 1)$:
$\varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$
$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n}\, \varphi\left(\frac{x - x_i}{h_n}\right)$, $h_n = h_1/\sqrt{n}$

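The one-dimensional illustration can be reproduced in a few lines. The sketch below draws standard normal samples, shrinks the window as $h_n = h_1/\sqrt{n}$, and evaluates the estimate at $x = 0$; the value $h_1 = 1$, the sample sizes, and the seed are arbitrary assumptions.

```python
import math
import random


def parzen_gauss(x, samples, h):
    """1-d Parzen estimate with a standard Gaussian window of width h."""
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)
    return total / (len(samples) * h * math.sqrt(2.0 * math.pi))


random.seed(0)
h1 = 1.0
true_p0 = 1.0 / math.sqrt(2.0 * math.pi)   # N(0, 1) density at x = 0

estimates = {}
for n in (1, 10, 100, 10000):
    data = [random.gauss(0.0, 1.0) for _ in range(n)]
    hn = h1 / math.sqrt(n)                  # window width shrinks with n
    estimates[n] = parzen_gauss(0.0, data, hn)
    print(n, round(hn, 4), estimates[n], true_p0)
```

For small n the estimate depends heavily on where the few samples happen to fall; for large n it should settle near the true value of about 0.399.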
Illustrations
One-dimensional case:
$\varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$
$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n}\, \varphi\left(\frac{x - x_i}{h_n}\right)$, $h_n = h_1/\sqrt{n}$

Illustrations
Two-dimensional case:
$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n^2}\, \varphi\left(\frac{x - x_i}{h_n}\right)$, $h_n = h_1/\sqrt{n}$

Classification Example
(Figure panels: smaller window; larger window.)

Choosing Window Function
$V_n$ must approach zero when $n \to \infty$, but at a rate slower than $1/n$, e.g., $V_n = V_1/\sqrt{n}$.
The value of the initial volume $V_1$ is important.
In some cases, a cell volume is proper for one region but unsuitable in a different region.

$k_n$-Nearest Neighbor
$p_n(x) = \frac{k_n/n}{V_n}$
Fix $k_n$ and then determine $V_n$.
To estimate $p(x)$, we can center a cell about $x$ and let it grow until it captures $k_n$ samples, where $k_n$ is some specified function of $n$, e.g., $k_n = \sqrt{n}$.
Principled rule to choose $k_n$: $\lim_{n\to\infty} k_n = \infty$ and $\lim_{n\to\infty} V_n = 0$.

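The "grow the cell until it captures $k_n$ samples" idea can be sketched in one dimension, where the cell is an interval and its volume is twice the distance to the $k_n$-th nearest sample. The function name and the nine data points are hypothetical.

```python
import math


def knn_density(x, samples, k):
    """p_n(x) = (k / n) / V_n, with V_n the length of the smallest interval
    centered at x that contains k samples (d = 1)."""
    dists = sorted(abs(x - s) for s in samples)
    radius = dists[k - 1]        # distance to the k-th nearest sample
    volume = 2.0 * radius        # interval [x - radius, x + radius]
    return (k / len(samples)) / volume


samples = [0.1, 0.3, 0.35, 0.5, 0.8, 0.9, 1.4, 2.0, 2.1]  # made-up data, n = 9
k = round(math.sqrt(len(samples)))                        # k_n = sqrt(n) = 3
p_hat = knn_density(0.4, samples, k)
print(k, p_hat)
```

Note that this simple version divides by zero if x coincides with a sample and k is 1, which mirrors the spiky behavior of k-NN density estimates near the data.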
$k_n$-Nearest Neighbor
(Figure: eight points in one dimension, n = 8, d = 1. Red curve: $k_n = 3$; black curve: $k_n = 5$.)
(Figure: thirty-one points in two dimensions, n = 31, d = 2. Black surface: $k_n = 5$.)

Estimation of A Posteriori Probability
$P_n(\omega_i \mid x) = ?$
$P_n(\omega_i \mid x) = \frac{p_n(x, \omega_i)}{\sum_{j=1}^{c} p_n(x, \omega_j)}$, with $p_n(x, \omega_i) = \frac{k_i/n}{V_n}$
$P_n(\omega_i \mid x) = \frac{(k_i/n)/V_n}{\sum_{j=1}^{c} (k_j/n)/V_n} = \frac{k_i}{k}$
Here $k_i$ is the number of samples in the cell that are labeled $\omega_i$, and $k = \sum_{j=1}^{c} k_j$ is the total number of samples in the cell.
The value of $V_n$ or $k_n$ can be determined based on the Parzen window or $k_n$-nearest-neighbor technique.

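The posterior estimate reduces to counting labels among the samples in the cell. The sketch below uses the k nearest samples as the cell, with made-up 1-d data and labels; `knn_posteriors` is an illustrative name.

```python
from collections import Counter


def knn_posteriors(x, samples, labels, k):
    """P_n(w_i | x) = k_i / k over the k samples nearest to x (1-d data)."""
    nearest = sorted(zip(samples, labels), key=lambda pair: abs(pair[0] - x))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: count / k for label, count in counts.items()}


samples = [1.0, 1.5, 1.9, 2.4, 3.1, 3.3]   # hypothetical training points
labels = ["a", "a", "b", "b", "b", "a"]
post = knn_posteriors(2.0, samples, labels, k=3)
print(post)
```

Of the three samples nearest to 2.0, two carry label "b" and one carries label "a", so the estimated posteriors are 2/3 and 1/3.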
Nearest Neighbor Classifier
$P_n(\omega_i \mid x) = \frac{p_n(x, \omega_i)}{\sum_{j=1}^{c} p_n(x, \omega_j)} = \frac{k_i}{k}$
- Store all training examples.
- Given a new example $x$ to be classified, search for the training example $(x_i, y_i)$ whose $x_i$ is most similar (or closest) to $x$, and predict $y_i$. (Lazy Learning)

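A minimal sketch of the nearest-neighbor rule as described: keep every training example and return the label of the closest one. Euclidean distance (via `math.dist`, Python 3.8+) and the three training points are assumptions made for the example.

```python
import math


def nearest_neighbor_predict(x, train):
    """Return the label y_i of the stored example (x_i, y_i) closest to x."""
    _, label = min(train, key=lambda ex: math.dist(ex[0], x))
    return label


# "Training" is just storing the examples (lazy learning).
train = [((0.0, 0.0), "red"), ((1.0, 1.0), "red"), ((4.0, 4.0), "blue")]

print(nearest_neighbor_predict((3.0, 3.5), train))
print(nearest_neighbor_predict((0.2, 0.1), train))
```

All the work happens at query time: each prediction scans the whole stored set, which is the computational cost discussed later in the chapter.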
Decision Boundaries
The Voronoi diagram: given a set of points, a Voronoi diagram describes the areas that are nearest to any given point.
These areas can be viewed as zones of control.

Decision Boundaries
The decision boundary is formed by retaining only those line segments that separate different classes.
The more training examples we have stored, the more complex the decision boundaries can become.

Decision Boundaries
With a large number of examples and noise in the labels, the decision boundary can become nasty!
It can be bad sometimes: note the islands in this figure, which are formed because of noisy examples.
If the nearest neighbor happens to be a noisy point, the prediction will be incorrect.
How to deal with this?

Effect of k
Different k values give different results:
- A large k produces smoother boundaries, because the effects of class label noise cancel one another out.
- What happens when k is too large? Oversimplified boundaries: e.g., with k = N we always predict the majority class.

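The effect of k can be seen on a tiny hand-made data set with one mislabeled ("noisy") point. The sketch below uses majority voting over the k nearest examples; the data, the Euclidean distance, and the function name are illustrative assumptions.

```python
import math
from collections import Counter


def knn_predict(x, train, k):
    """Majority vote among the k training examples closest to x."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
    ((0.5, 0.4), "b"),                      # noisy label inside the "a" cluster
    ((5.0, 5.0), "b"), ((6.0, 5.0), "b"),
]

for k in (1, 3, len(train)):
    print(k, knn_predict((0.5, 0.5), train, k), knn_predict((5.5, 5.0), train, k))
```

With k = 1 the noisy point decides the prediction near (0.5, 0.5); with k = 3 its vote is outnumbered; with k = N every query, including one deep in the "b" cluster, gets the overall majority class "a".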
How to Choose k?
- Can we choose k to minimize the mistakes that we make on training examples (training error)?
  What is the training error of nearest-neighbor?
- Can we choose k to minimize the mistakes that we make on test examples (test error)?

How to Choose k?
How do training error and test error change as we change the value of k?

Model Selection
Choosing k for k-NN is just one of the many model selection problems we face in machine learning.
Model selection is about choosing among different models:
- Linear regression vs. quadratic regression
- K-NN vs. decision tree
It is heavily studied in machine learning and of crucial importance in practice.
If we use training error to select models, we will always choose the more complex ones.

Model Selection
We can keep part of the labeled data apart as validation data.
- Evaluate different k values based on the prediction accuracy on the validation data.
- Choose the k that minimizes validation error.
Validation can be viewed as another name for testing, but the name testing is typically reserved for final evaluation, whereas validation is mostly used for model selection.

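Hold-out validation for choosing k might look like the sketch below. The synthetic two-class Gaussian data, the 150/50 split, and the candidate k values are all assumptions made for the example; it also shows that 1-NN evaluated on its own training set makes no mistakes, which is why training error cannot guide the choice.

```python
import math
import random
from collections import Counter


def knn_predict(x, train, k):
    """Majority vote among the k training examples closest to x."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def error_rate(data, train, k):
    """Fraction of examples in `data` misclassified by k-NN built on `train`."""
    wrong = sum(1 for x, y in data if knn_predict(x, train, k) != y)
    return wrong / len(data)


def make_point(label):
    """Synthetic 2-d point: class 'a' centered at 0, class 'b' at 2."""
    center = 0.0 if label == "a" else 2.0
    return ((random.gauss(center, 1.0), random.gauss(center, 1.0)), label)


random.seed(0)
data = [make_point(random.choice("ab")) for _ in range(200)]
train, valid = data[:150], data[150:]          # hold out 50 points for validation

train_err_1nn = error_rate(train, train, k=1)  # each point is its own neighbor
valid_err = {k: error_rate(valid, train, k) for k in (1, 3, 5, 9, 15)}
best_k = min(valid_err, key=valid_err.get)
print(train_err_1nn, valid_err, best_k)
```

The selected k depends on the particular split, which is one motivation for averaging over several splits.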
Model Selection
The impact of validation set size
- If we only reserve one point in our validation set, should we trust the validation error as a reliable estimate of our classifier's performance?
- The larger the validation set, the more reliable our model selection choices are.
- When the total labeled set is small, we might not be able to get a big enough validation set, leading to unreliable model selection decisions.

Model Selection
K-fold Cross Validation
- Perform learning/testing K times.
- Each time, reserve one subset as the validation set and train on the rest.
- Special case: leave-one-out cross validation.

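The fold bookkeeping for K-fold cross validation can be sketched as an index generator. The interleaved assignment of items to folds is one simple choice among many, and `k_fold_splits` is a hypothetical helper name.

```python
def k_fold_splits(n_items, n_folds):
    """Yield (train_indices, valid_indices) once for each of n_folds folds."""
    indices = list(range(n_items))
    for f in range(n_folds):
        valid = indices[f::n_folds]            # every n_folds-th item, offset f
        valid_set = set(valid)
        train = [i for i in indices if i not in valid_set]
        yield train, valid


splits = list(k_fold_splits(10, 5))
for train_idx, valid_idx in splits:
    print(valid_idx, train_idx)

# Leave-one-out is the special case with as many folds as items.
loo = list(k_fold_splits(4, 4))
```

Each item appears in exactly one validation fold, so averaging the K validation errors uses every labeled example for evaluation once. In practice the data would usually be shuffled before splitting.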
Other issues of kNN
- It can be computationally expensive to find the nearest neighbors!
  Speed up the computation by using smart data structures to quickly search for approximate solutions.
- For a large data set, it requires a lot of memory.
  Remove unimportant examples.

Final words on KNN
KNN is what we call lazy learning (vs. eager learning)
- Lazy: learning only occurs when you see the test example.
- Eager: learn a model before you see the test example; training examples can be thrown away after learning.
Advantages:
- Conceptually simple, easy to understand and explain
- Very flexible decision boundaries
- Not much learning at all!
Disadvantages:
- It can be hard to find a good distance measure.
- Irrelevant features and noise can be very detrimental.
- Typically cannot handle more than 30 attributes.
- Computational cost: requires a lot of computation and memory.

Distance Metrics
Distance measurement is an important factor for the nearest-neighbor classifier, e.g., to achieve invariant pattern recognition and data mining results.
(Figure: the effect of changing units.)

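The effect of changing units is easy to reproduce: rescaling one feature can change which stored point is nearest. The height/weight features and all values below are hypothetical.

```python
import math


def nearest_index(query, points):
    """Index of the point closest to query under Euclidean distance."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))


# Hypothetical features: (height in metres, weight in kilograms).
people_m = [(1.80, 60.0), (1.60, 81.0)]
query_m = (1.62, 70.0)
idx_m = nearest_index(query_m, people_m)

# Same people and query, with height expressed in centimetres instead.
people_cm = [(180.0, 60.0), (160.0, 81.0)]
query_cm = (162.0, 70.0)
idx_cm = nearest_index(query_cm, people_cm)

print(idx_m, idx_cm)
```

In metres the weight difference dominates and the first person is nearest; in centimetres the height difference dominates and the second person is nearest. This is why features are often rescaled before applying a nearest-neighbor rule.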
Properties of a Distance Metric
- Nonnegativity: $D(a, b) \ge 0$
- Reflexivity: $D(a, b) = 0$ iff $a = b$
- Symmetry: $D(a, b) = D(b, a)$
- Triangle Inequality: $D(a, b) + D(b, c) \ge D(a, c)$

Minkowski Metric ($L_p$ Norm)
$L_p(a, b) = \left( \sum_{i=1}^{d} |a_i - b_i|^p \right)^{1/p}$
1. $L_1$ norm, Manhattan or city block distance: $L_1(a, b) = \sum_{i=1}^{d} |a_i - b_i|$
2. $L_2$ norm, Euclidean distance: $L_2(a, b) = \left( \sum_{i=1}^{d} |a_i - b_i|^2 \right)^{1/2}$
3. $L_\infty$ norm, Chessboard distance: $L_\infty(a, b) = \max_{i} |a_i - b_i|$

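The three special cases follow from one general function plus a separate maximum for the limiting case. This is a small sketch with invented function names and a 3-4-5 example pair.

```python
def minkowski(a, b, p):
    """L_p(a, b) = (sum_i |a_i - b_i|^p)^(1/p), for p >= 1."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)


def chessboard(a, b):
    """L_inf(a, b) = max_i |a_i - b_i|, the limit of L_p as p grows."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski(a, b, 1))   # Manhattan: 7.0
print(minkowski(a, b, 2))   # Euclidean: 5.0
print(chessboard(a, b))     # Chessboard: 4.0
```

Larger values of p put more weight on the coordinate with the largest difference, which is why the values decrease from 7 toward 4 as p increases.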
Summary
Basic setting for non-parametric techniques
- Let the data speak for themselves
  - Parametric form not assumed for the class-conditional pdf
  - Estimate the class-conditional pdf from training examples
  - Make predictions based on Bayes Theorem
Fundamental result in density estimation: $p_n(x) = \frac{k_n/n}{V_n}$

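Summary
$p_n(x) = \frac{k_n/n}{V_n}$
Parzen Windows: fix $V_n$ and then determine $k_n$
$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n}\, \varphi\left(\frac{x - x_i}{h_n}\right)$
Here $\varphi$ is the window function (itself a pdf), $h_n$ is the window width, the $x_i$ are the training data, and $p_n$ is the Parzen pdf.
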
Summary
$k_n$-Nearest-Neighbor: fix $k_n$ and then determine $V_n$
To estimate $p(x)$, we can center a cell about $x$ and let it grow until it captures $k_n$ samples, where $k_n$ is some specified function of $n$, e.g., $k_n = \sqrt{n}$.
Principled rule to choose $k_n$: $\lim_{n\to\infty} k_n = \infty$ and $\lim_{n\to\infty} V_n = 0$.

Any Questions?