Machine Learning
Topic 4: Measuring Distance
Bryan Pardo, Machine Learning: EECS 349 Fall 2009

Why measure distance?
Clustering requires distance measures.
Local methods require a measure of locality.
Search engines require a measure of similarity.

Euclidean Distance
What people intuitively think of as distance.
$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$
[Figure axes: Dimension 1, Dimension 2]

Generalized Euclidean Distance
$n$ = the number of dimensions
$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^2 \right)^{1/2}$
where $x = \{x_1, x_2, \ldots, x_n\}$ and $y = \{y_1, y_2, \ldots, y_n\}$

$L^p$ norms
$L^p$ norms are all special cases of this:
$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
$p$ changes the norm.
$L^1$ norm = Manhattan Distance: $p = 1$
$L^2$ norm = Euclidean Distance: $p = 2$
Hamming Distance: $p = 1$ and $x_i, y_i \in \{0, 1\}$
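
The $L^p$ family can be sketched in a few lines of Python. This is a minimal illustration, not course code; the function name `lp_distance` and the sample points are assumptions made for the example.

```python
def lp_distance(x, y, p=2):
    """L^p distance between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    if p < 1:
        raise ValueError("p must be at least 1")
    # Sum |x_i - y_i|^p over all dimensions, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


x, y = (0, 0), (3, 4)
print(lp_distance(x, y, p=1))  # Manhattan distance
print(lp_distance(x, y, p=2))  # Euclidean distance
# With p = 1 on 0/1 vectors, the same formula counts differing bits (Hamming).
print(lp_distance((0, 1, 1, 0), (1, 1, 0, 0), p=1))
```

Changing only `p` switches between the named distances, which is the sense in which they are special cases of one formula.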

Weighting Dimensions
Put the point in the cluster with the closest center of gravity.
Which cluster should the red point go in?
How do I measure distance in a way that gives the right answer for both situations?

Weighted Norms
You can compensate by weighting your dimensions.
$d(x, y) = \left( \sum_{i=1}^{n} w_i |x_i - y_i|^p \right)^{1/p}$
This lets you turn your circle of equal distance into an ellipse with axes parallel to the dimensions of the vectors.
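
A small sketch of a weighted norm, assuming the weights multiply each dimension's term inside the sum. The name `weighted_lp_distance` and the example weights are hypothetical.

```python
def weighted_lp_distance(x, y, w, p=2):
    """Weighted L^p distance; w[i] scales the contribution of dimension i."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    if any(wi < 0 for wi in w):
        raise ValueError("weights must be non-negative")
    return sum(wi * abs(a - b) ** p for a, b, wi in zip(x, y, w)) ** (1.0 / p)


x, y = (0, 0), (3, 4)
print(weighted_lp_distance(x, y, w=(1, 1)))  # equal weights: plain Euclidean
print(weighted_lp_distance(x, y, w=(1, 0)))  # second dimension ignored
print(weighted_lp_distance(x, y, w=(4, 1)))  # first dimension counts more
```

Unequal weights stretch or shrink individual axes, so the set of points at a fixed distance becomes an axis-aligned ellipse rather than a circle.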

Mahalanobis distance
The region of constant Mahalanobis distance around the mean of a distribution forms an ellipsoid.
The axes of this ellipsoid don't have to be parallel to the dimensions describing the vector.
Images from: http://www.aaccess.et/eglsh/glossares/glosmod/e_gm_mahalaobs.htm

Calculating Mahalanobis
$d(x, y) = \sqrt{(x - y)^T S^{-1} (x - y)}$
The matrix $S$ is called the covariance matrix and is calculated from the data distribution; the formula uses its inverse, $S^{-1}$.
Let's look at the demo here: http://www.aaccess.et/eglsh/glossares/glosmod/e_gm_mahalaobs.htm#amato%20mahalaobs
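
To make the calculation concrete, here is a pure-Python sketch restricted to two dimensions, so the covariance matrix can be inverted by hand. The helper names and the sample data are assumptions for illustration; a real project would more likely use a linear algebra library.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix [[sxx, sxy], [sxy, syy]] of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def mahalanobis_2d(x, y, S):
    """sqrt((x - y)^T S^-1 (x - y)) for 2-D points and a 2x2 matrix S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - y[0], x[1] - y[1]
    # S^-1 = (1 / det) * [[d, -b], [-c, a]], expanded into the quadratic form.
    return math.sqrt((d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det)


# Hypothetical data stretched along the diagonal.
data = [(0, 0), (1, 1), (2, 2), (3, 3), (1, 2), (2, 1)]
S = covariance_2d(data)
mean = (1.5, 1.5)
along = mahalanobis_2d((2.5, 2.5), mean, S)   # step along the long axis
across = mahalanobis_2d((2.5, 0.5), mean, S)  # same Euclidean step, across it
print(along, across)
```

Both test points are the same Euclidean distance from the mean, but the one lying across the direction of spread comes out farther away. With the identity matrix for `S`, the function reduces to Euclidean distance.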

Take-away on Mahalanobis
Is good for non-spherically symmetric distributions.
Accounts for scaling of coordinate axes.
Can reduce to Euclidean.

What is a metric?
A metric has these four qualities; otherwise, call it a "measure".
$d(x, y) = 0$ iff $x = y$  (reflexivity)
$d(x, y) \ge 0$  (non-negative)
$d(x, y) = d(y, x)$  (symmetry)
$d(x, y) + d(y, z) \ge d(x, z)$  (triangle inequality)
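
The four qualities can be checked by brute force on a small set of points. The sketch below is only a finite spot check (it can find violations, not prove that something is a metric); `metric_violations` and the example distances are hypothetical.

```python
from itertools import product


def metric_violations(points, d, tol=1e-12):
    """Return the names of the metric properties that d breaks on these points."""
    broken = set()
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:
            broken.add("non-negativity")
        if (abs(d(x, y)) <= tol) != (x == y):
            broken.add("reflexivity")
        if abs(d(x, y) - d(y, x)) > tol:
            broken.add("symmetry")
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            broken.add("triangle inequality")
    return sorted(broken)


# Absolute difference on the number line passes every check.
print(metric_violations([0, 1, 2], lambda x, y: abs(x - y)))

# Squared difference breaks the triangle inequality: 4 > 1 + 1.
print(metric_violations([0, 1, 2], lambda x, y: (x - y) ** 2))

# Made-up driving distances with a one-way street: A to B is short, B to A is not.
one_way = {("A", "A"): 0, ("B", "B"): 0, ("A", "B"): 1, ("B", "A"): 3}
print(metric_violations(["A", "B"], lambda x, y: one_way[(x, y)]))
```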

Metric, or not?
Driving distance with 1-way streets.
Categorical stuff: Is distance (Jazz to Blues to Rock) no less than distance (Jazz to Rock)?

Categorical Variables
Consider feature vectors for genre & vocals:
Genre: {Blues, Jazz, Rock, Zydeco}
Vocals: {vocals, no vocals}
s1 = {rock, vocals}
s2 = {jazz, no vocals}
s3 = {rock, no vocals}
Which two songs are more similar?

One Solution: Hamming distance
s1 = {rock, vocals}
s2 = {jazz, no_vocals}
s3 = {rock, no_vocals}

      Blues  Jazz  Rock  Zydeco  Vocals
s1      0     0     1      0       1
s2      0     1     0      0       0
s3      0     0     1      0       0

Hamming Distance = number of bits different between binary vectors

Hamming Distance
$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
where $x = \{x_1, x_2, \ldots, x_n\}$, $y = \{y_1, y_2, \ldots, y_n\}$ and $x_i, y_i \in \{0, 1\}$
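
Applied to the genre and vocals songs, Hamming distance might be computed as follows. The encoding helper `encode` and its bit order (four genre bits, then one vocals bit) are assumptions made for this sketch.

```python
GENRES = ("blues", "jazz", "rock", "zydeco")


def encode(genre, has_vocals):
    """One bit per genre, plus one bit that is 1 when the song has vocals."""
    bits = [1 if genre == g else 0 for g in GENRES]
    bits.append(1 if has_vocals else 0)
    return bits


def hamming(x, y):
    """Number of positions where two equal-length bit vectors differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(x, y))


s1 = encode("rock", True)
s2 = encode("jazz", False)
s3 = encode("rock", False)
print(hamming(s1, s2), hamming(s1, s3), hamming(s2, s3))
```

Under this encoding a genre mismatch flips two bits while a vocals mismatch flips one, so s1 and s3 come out as the closest pair.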

Defining your own distance (an example)
How often does artist x quote artist y?
Quote Frequency (values listed in the order Beethoven, Beatles, Liz Phair):
Beethoven: 7, 0, 0
Beatles: 4, 5, 0
Liz Phair: ?, 2
Let's build a distance measure!

Defining your own distance (an example)
Quote frequency: $Q_f(x, y)$ = value in table
Distance: $d(x, y) = 1 - \frac{Q_f(x, y)}{\sum_{z \in \text{Artists}} Q_f(x, z)}$
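
A sketch of a quote-based distance, under two loudly stated assumptions: that the distance is one minus the quote frequency normalized over everything artist x quotes, and that the table values are listed in the order Beethoven, Beatles, Liz Phair. The Liz Phair row is left out because the table marks one of its entries as unknown.

```python
# Assumed layout: QUOTES[x][y] = how often artist x quotes artist y.
QUOTES = {
    "Beethoven": {"Beethoven": 7, "Beatles": 0, "Liz Phair": 0},
    "Beatles": {"Beethoven": 4, "Beatles": 5, "Liz Phair": 0},
}


def quote_distance(x, y, table=QUOTES):
    """1 minus the share of x's quotes that go to y (an assumed definition)."""
    row = table[x]
    total = sum(row.values())
    if total == 0:
        raise ValueError(f"{x} quotes nobody, so the distance is undefined")
    return 1 - row[y] / total


print(quote_distance("Beatles", "Beethoven"))
print(quote_distance("Beethoven", "Beatles"))
print(quote_distance("Beatles", "Beatles"))
```

Under these assumptions the measure is not symmetric, and the distance from the Beatles to themselves is not zero, so it is a measure rather than a metric.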

Missing data
What if, for some category, on some examples, there is no value given?
Approaches:
Discard all examples missing the category.
Fill in the blanks with the mean value.
Only use a category in the distance measure if both examples give a value.
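
Dealing with missing data
Give each dimension $i$ a 0/1 weight $w_i$ that records whether both $x_i$ and $y_i$ are defined, and use $w_i$ in $d(x, y)$ so that only dimensions defined in both examples contribute.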
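
One way to act on this is sketched below: skip any dimension where either value is missing (represented here as `None`) and, optionally, rescale the sum by n / (number of dimensions used) so that distances with different amounts of missing data stay comparable. The rescaling is a common convention chosen for this example, not necessarily the formula on the slide, and `distance_with_missing` is a hypothetical name.

```python
def distance_with_missing(x, y, p=2, rescale=False):
    """L^p distance over the dimensions where both x_i and y_i are defined."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    # Keep only the dimensions where neither value is missing.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no dimension is defined in both examples")
    total = sum(abs(a - b) ** p for a, b in pairs)
    if rescale:
        # Assumed convention: scale up as if the skipped dimensions were average.
        total *= len(x) / len(pairs)
    return total ** (1.0 / p)


x = (3, None, 0)
y = (0, 5, 4)
print(distance_with_missing(x, y))                # second dimension skipped
print(distance_with_missing(x, y, rescale=True))  # skipped dimension compensated
```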

Edit Distance
Query = string from finite alphabet
Target = string from finite alphabet
Cost of Edits = Distance
Target: C A G E D - -
Query: C E A E D
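
A minimal dynamic-programming sketch of edit distance, assuming unit costs for insertion, deletion, and substitution (the slide does not state the costs). The function name and cost parameters are hypothetical.

```python
def edit_distance(query, target, ins=1, dele=1, sub=1):
    """Minimum total cost of edits that turn query into target."""
    # prev[j] = cost of turning the first i-1 query symbols into target[:j].
    prev = [j * ins for j in range(len(target) + 1)]
    for i, q in enumerate(query, start=1):
        curr = [i * dele]  # turning query[:i] into the empty string
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + dele,                         # delete q
                curr[j - 1] + ins,                      # insert t
                prev[j - 1] + (0 if q == t else sub),   # match or substitute
            ))
        prev = curr
    return prev[-1]


print(edit_distance("CEAED", "CAGED"))
print(edit_distance("kitten", "sitting"))
```

Changing the three cost arguments changes which alignment is cheapest, which is how the cost of edits defines the distance.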

One more distance measure
Kullback-Leibler divergence
Related to entropy & information gain.
Not a metric, since it is not symmetric.
Take EECS 428: Information Theory to find out more.
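
The asymmetry is easy to see numerically. The sketch below uses the standard definition for discrete distributions, with logarithms in base 2; the function name and the two example distributions are made up for illustration.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in bits for discrete distributions over the same outcomes."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # 0 * log(0 / q) is taken to be 0
        if qi == 0:
            return math.inf   # P puts mass where Q has none
        total += pi * math.log2(pi / qi)
    return total


fair = (0.5, 0.5)
biased = (0.9, 0.1)
print(kl_divergence(fair, biased))
print(kl_divergence(biased, fair))
```

The two printed values differ, so swapping the arguments changes the result, which is why the divergence fails the symmetry requirement of a metric.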