Lecture Slides for INTRODUCTION TO Machine Learning, ETHEM ALPAYDIN, The MIT Press


Lecture Slides for INTRODUCTION TO Machine Learning, ETHEM ALPAYDIN, The MIT Press, 2004. alpaydin@boun.edu.tr, http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 7: Clustering

Semiparametric Density Estimation

Parametric: assume a single model for p(x | C_i) (Chapters 4 and 5).
Semiparametric: p(x | C_i) is a mixture of densities. Multiple possible explanations/prototypes: different handwriting styles, accents in speech.
Nonparametric: no model; the data speaks for itself (Chapter 8).

Mixture Densities

p(x) = \sum_{i=1}^{k} p(x | G_i) P(G_i)

where G_i are the components/groups/clusters, P(G_i) are the mixture proportions (priors), and p(x | G_i) are the component densities.

Gaussian mixture, where p(x | G_i) ~ N(\mu_i, \Sigma_i) with parameters \Phi = {P(G_i), \mu_i, \Sigma_i}_{i=1}^{k}, estimated from an unlabeled sample X = {x^t}_t (unsupervised learning).
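As a concrete illustration of the mixture density above, here is a minimal sketch (not from the slides) that evaluates p(x) = \sum_i P(G_i) p(x | G_i) for a two-component Gaussian mixture; the priors, means, and covariances are made-up illustrative values.

# Evaluate a Gaussian mixture density p(x) = sum_i P(G_i) N(x; mu_i, Sigma_i)
import numpy as np
from scipy.stats import multivariate_normal

priors = np.array([0.4, 0.6])                          # P(G_i), mixture proportions
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # mu_i
covs = [np.eye(2), 0.5 * np.eye(2)]                    # Sigma_i

def mixture_density(x):
    # Sum the weighted component densities at x
    return sum(p * multivariate_normal.pdf(x, mean=m, cov=S)
               for p, m, S in zip(priors, means, covs))

print(mixture_density(np.array([1.0, 1.0])))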

Classes vs. Clusters

Supervised: X = {x^t, r^t}_t with classes C_i, i = 1, ..., K:
p(x) = \sum_{i=1}^{K} p(x | C_i) P(C_i), where p(x | C_i) ~ N(\mu_i, \Sigma_i)
\Phi = {P(C_i), \mu_i, \Sigma_i}_{i=1}^{K}, estimated using the labels r^t:
\hat{P}(C_i) = \sum_t r_i^t / N
m_i = \sum_t r_i^t x^t / \sum_t r_i^t
S_i = \sum_t r_i^t (x^t - m_i)(x^t - m_i)^T / \sum_t r_i^t

Unsupervised: X = {x^t}_t with clusters G_i, i = 1, ..., k:
p(x) = \sum_{i=1}^{k} p(x | G_i) P(G_i), where p(x | G_i) ~ N(\mu_i, \Sigma_i)
\Phi = {P(G_i), \mu_i, \Sigma_i}_{i=1}^{k}, but the labels r_i^t are unknown.
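For the supervised case, a short sketch of the estimates \hat{P}(C_i), m_i, and S_i above from one-hot labels; the function name and the array layout are my own choices.

# Supervised estimates of class priors, means, and covariances
import numpy as np

def supervised_estimates(X, r):
    # X: (N, d) data matrix; r: (N, K) one-hot class labels r_i^t
    N, d = X.shape
    K = r.shape[1]
    priors = r.sum(axis=0) / N                        # P(C_i) = sum_t r_i^t / N
    means = (r.T @ X) / r.sum(axis=0)[:, None]        # m_i
    covs = np.empty((K, d, d))
    for i in range(K):
        D = X - means[i]                              # x^t - m_i
        covs[i] = (r[:, i][:, None] * D).T @ D / r[:, i].sum()   # S_i
    return priors, means, covs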

k-means Clustering

Find k reference vectors (prototypes/codebook vectors/codewords) m_i, i = 1, ..., k, which best represent the data.
Use the nearest (most similar) reference:
b_i^t = 1 if \|x^t - m_i\| = \min_j \|x^t - m_j\|, and 0 otherwise.
Reconstruction error:
E({m_i}_{i=1}^{k} | X) = \sum_t \sum_i b_i^t \|x^t - m_i\|^2

Encoding/Decoding

b_i^t = 1 if \|x^t - m_i\| = \min_j \|x^t - m_j\|, and 0 otherwise.

k-means Clustering [figure slides: algorithm and example]
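A minimal sketch of the k-means procedure described above, using the slides' notation (assignments b_i^t, reference vectors m_i, reconstruction error E); the random initialization and convergence test are my own simple choices, not prescribed by the slides.

import numpy as np

def assign(X, m):
    # b^t: index of the nearest reference vector for each x^t, plus squared distances
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)]   # initial reference vectors m_i
    for _ in range(n_iter):
        b, _ = assign(X, m)                            # nearest-reference assignments b_i^t
        new_m = np.array([X[b == i].mean(axis=0) if np.any(b == i) else m[i]
                          for i in range(k)])          # recompute m_i as cluster means
        if np.allclose(new_m, m):
            break
        m = new_m
    b, d2 = assign(X, m)
    error = d2[np.arange(len(X)), b].sum()             # reconstruction error E({m_i} | X)
    return m, b, error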

Expectation-Maximization (EM)

Log likelihood with a mixture model:
L(\Phi | X) = \sum_t \log p(x^t | \Phi) = \sum_t \log \sum_{i=1}^{k} p(x^t | G_i) P(G_i)
Assume hidden variables z which, when known, make optimization much simpler.
Complete likelihood, L_c(\Phi | X, Z), in terms of x and z.
Incomplete likelihood, L(\Phi | X), in terms of x.

E- and M-steps

Iterate the two steps:
1. E-step: estimate z given X and the current \Phi.
2. M-step: find the new \Phi given z, X, and the old \Phi.
E-step: Q(\Phi | \Phi^l) = E[L_c(\Phi | X, Z) | X, \Phi^l]
M-step: \Phi^{l+1} = \arg\max_\Phi Q(\Phi | \Phi^l)
An increase in Q increases the incomplete likelihood:
L(\Phi^{l+1} | X) \ge L(\Phi^l | X)

EM in Gaussian Mixtures

z_i^t = 1 if x^t belongs to G_i, 0 otherwise (the labels r_i^t of supervised learning); assume p(x | G_i) ~ N(\mu_i, \Sigma_i).
E-step: E[z_i^t | X, \Phi^l] = P(G_i | x^t, \Phi^l) \equiv h_i^t
M-step:
P(G_i) = \sum_t h_i^t / N
m_i^{l+1} = \sum_t h_i^t x^t / \sum_t h_i^t
S_i^{l+1} = \sum_t h_i^t (x^t - m_i^{l+1})(x^t - m_i^{l+1})^T / \sum_t h_i^t
Use the estimated (soft) labels h_i^t in place of the unknown labels.

[Figure: EM example; the contour where P(G_1 | x) = h_1 = 0.5 is marked.]
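A minimal EM sketch for a Gaussian mixture, following the E-step (soft labels h_i^t) and M-step updates above; the diagonal regularization term and the fixed iteration count are my own additions for numerical stability.

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, eps=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    priors = np.full(k, 1.0 / k)                      # P(G_i)
    means = X[rng.choice(N, size=k, replace=False)]   # m_i
    covs = np.array([np.cov(X.T) + eps * np.eye(d) for _ in range(k)])   # S_i
    for _ in range(n_iter):
        # E-step: h_i^t = P(G_i | x^t, Phi)
        h = np.column_stack([priors[i] * multivariate_normal.pdf(X, means[i], covs[i])
                             for i in range(k)])
        h /= h.sum(axis=1, keepdims=True)
        # M-step: re-estimate P(G_i), m_i, S_i with the soft labels h
        Nk = h.sum(axis=0)
        priors = Nk / N
        means = (h.T @ X) / Nk[:, None]
        for i in range(k):
            D = X - means[i]
            covs[i] = (h[:, i][:, None] * D).T @ D / Nk[i] + eps * np.eye(d)
    return priors, means, covs, h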

Mixtures of Latent Variable Models

Regularize the clusters:
1. Assume shared/diagonal covariance matrices.
2. Use PCA/FA to decrease dimensionality: mixtures of PCA/FA, where
p(x^t | G_i) = N(m_i, V_i V_i^T + \Psi_i)
EM can be used to learn V_i (Ghahramani and Hinton, 1997; Tipping and Bishop, 1999).
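As an illustration of option 1 above (shared or diagonal covariances), a brief sketch using scikit-learn's GaussianMixture, which exposes this choice as covariance_type ('tied' = shared, 'diag' = diagonal); this is not the mixture-of-PCA/FA model cited on the slide, and the data here are placeholders.

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 10))    # placeholder data
gm_diag = GaussianMixture(n_components=3, covariance_type='diag').fit(X)   # diagonal S_i
gm_tied = GaussianMixture(n_components=3, covariance_type='tied').fit(X)   # shared S
print(gm_diag.bic(X), gm_tied.bic(X))                   # compare the regularized fits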

After Clustering

Dimensionality reduction methods find correlations between features and group features; clustering methods find similarities between instances and group instances.
Clustering allows knowledge extraction through the number of clusters, the prior probabilities, and the cluster parameters, i.e., center and range of features.
Example: CRM, customer segmentation.

Clustering as Preprocessing

The estimated group labels h_i (soft) or b_i (hard) may be seen as the dimensions of a new k-dimensional space, where we can then learn our discriminant or regressor.
Local representation (only one b_i is 1, all others are 0; only a few h_i are nonzero) vs. distributed representation (after PCA; all z_i are nonzero).
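A minimal sketch of this idea: fit a mixture, use the soft memberships h^t as a new k-dimensional representation, and train a discriminant on it. The data, k = 10, and the logistic-regression discriminant are illustrative choices, not part of the slides.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                     # placeholder inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # placeholder labels

gm = GaussianMixture(n_components=10, random_state=0).fit(X)
H = gm.predict_proba(X)                           # soft labels h^t: the new k-dim space
clf = LogisticRegression(max_iter=1000).fit(H, y) # learn the discriminant on h
print(clf.score(H, y))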

Mixture of Mixtures

In classification, the input comes from a mixture of classes (supervised). If each class is in turn a mixture, e.g., of Gaussians (unsupervised), we have a mixture of mixtures:
p(x | C_i) = \sum_{j=1}^{k_i} p(x | G_{ij}) P(G_{ij})
p(x) = \sum_{i=1}^{K} p(x | C_i) P(C_i)
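A minimal sketch of such a mixture-of-mixtures classifier: fit one Gaussian mixture per class for p(x | C_i) and classify with Bayes' rule. The number of components per class (k_i = 2) is an illustrative choice.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_mixtures(X, y, k_per_class=2):
    # One mixture per class models p(x | C_i); class frequencies give P(C_i)
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    gmms = [GaussianMixture(n_components=k_per_class, random_state=0).fit(X[y == c])
            for c in classes]
    return classes, priors, gmms

def predict(X, classes, priors, gmms):
    # Bayes' rule: pick the class maximizing log p(x | C_i) + log P(C_i)
    scores = np.column_stack([g.score_samples(X) + np.log(p)
                              for g, p in zip(gmms, priors)])
    return classes[scores.argmax(axis=1)]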

Hierarchical Clustering

Cluster based on similarities/distances.
Distance measure between instances x^r and x^s:
Minkowski (L_p) distance (Euclidean for p = 2):
d_m(x^r, x^s) = [\sum_{j=1}^{d} (x_j^r - x_j^s)^p]^{1/p}
City-block distance:
d_{cb}(x^r, x^s) = \sum_{j=1}^{d} |x_j^r - x_j^s|

Agglomerative Clustering

Start with N groups, each with one instance, and merge the two closest groups at each iteration.
Distance between two groups G_i and G_j:
Single-link: d(G_i, G_j) = \min_{x^r \in G_i, x^s \in G_j} d(x^r, x^s)
Complete-link: d(G_i, G_j) = \max_{x^r \in G_i, x^s \in G_j} d(x^r, x^s)
Average-link, centroid.

Example: Single-Link Clustering [dendrogram figure]
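A minimal sketch of single-link agglomerative clustering and its dendrogram using SciPy; the Euclidean metric (city-block would be metric='cityblock'), the sample data, and the cut into 3 clusters are illustrative choices.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.spatial.distance import pdist

X = np.random.default_rng(0).normal(size=(30, 2))      # placeholder data
D = pdist(X, metric='euclidean')                       # pairwise distances d(x^r, x^s)
Z = linkage(D, method='single')                        # single-link merges
labels = fcluster(Z, t=3, criterion='maxclust')        # cut the tree into 3 clusters
dendrogram(Z)                                          # plot the dendrogram
plt.show()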

Choosing k

Defined by the application, e.g., image quantization.
Plot the data (after PCA) and check for clusters.
Incremental (leader-cluster) algorithm: add clusters one at a time until an "elbow" appears (in reconstruction error, log likelihood, or intergroup distances).
Manually check the clusters for meaning.
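A minimal sketch of the elbow heuristic mentioned above: plot the k-means reconstruction error (inertia) against k and look for the bend; the data and the range of k are illustrative.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))     # placeholder data
ks = range(1, 11)
errors = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]
plt.plot(list(ks), errors, marker='o')                 # look for the elbow in this curve
plt.xlabel('k')
plt.ylabel('reconstruction error')
plt.show()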