Lecture Slides for INTRODUCTION TO MACHINE LEARNING, 3RD EDITION
ETHEM ALPAYDIN
The MIT Press, 2014
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 7: CLUSTERING
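Semiparametric Density Estimation
Parametric: Assume a single model for $p(\mathbf{x} \mid C_i)$ (Chapters 4 and 5)
Semiparametric: $p(\mathbf{x} \mid C_i)$ is a mixture of densities
Multiple possible explanations/prototypes: different handwriting styles, accents in speech
Nonparametric: No model; data speaks for itself (Chapter 8)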
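Mixture Densities
$$p(\mathbf{x}) = \sum_{i=1}^{k} p(\mathbf{x} \mid G_i)\, P(G_i)$$
where $G_i$ are the components/groups/clusters, $P(G_i)$ the mixture proportions (priors), and $p(\mathbf{x} \mid G_i)$ the component densities
Gaussian mixture, where $p(\mathbf{x} \mid G_i) \sim N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$, with parameters $\Phi = \{P(G_i), \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i\}_{i=1}^{k}$
Unlabeled sample $X = \{\mathbf{x}^t\}_t$ (unsupervised learning)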
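As a rough illustration of the mixture density on this slide, the sketch below evaluates $p(x) = \sum_i p(x \mid G_i) P(G_i)$ for a univariate Gaussian mixture. The one-dimensional setting, the function names, and the parameter values are assumptions made for the example, not part of the slides.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate normal N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, priors, mus, sigmas):
    """p(x) = sum_i p(x | G_i) P(G_i) for a univariate Gaussian mixture."""
    return sum(P * gaussian_pdf(x, mu, s)
               for P, mu, s in zip(priors, mus, sigmas))

# Hypothetical two-component mixture (values chosen only for illustration).
priors = [0.3, 0.7]   # mixture proportions P(G_i), must sum to 1
mus = [0.0, 4.0]      # component means
sigmas = [1.0, 2.0]   # component standard deviations
density_at_1 = mixture_pdf(1.0, priors, mus, sigmas)
```

Because the proportions sum to one and each component is a proper density, the mixture itself integrates to one.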
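Classes vs. Clusters
Supervised: $X = \{\mathbf{x}^t, \mathbf{r}^t\}_t$
Classes $C_i$, $i = 1, \ldots, K$
$$p(\mathbf{x}) = \sum_{i=1}^{K} p(\mathbf{x} \mid C_i)\, P(C_i)$$
where $p(\mathbf{x} \mid C_i) \sim N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$ and $\Phi = \{P(C_i), \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i\}_{i=1}^{K}$
$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \qquad \mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t}, \qquad \mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$
Unsupervised: $X = \{\mathbf{x}^t\}_t$
Clusters $G_i$, $i = 1, \ldots, k$
$$p(\mathbf{x}) = \sum_{i=1}^{k} p(\mathbf{x} \mid G_i)\, P(G_i)$$
where $p(\mathbf{x} \mid G_i) \sim N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$ and $\Phi = \{P(G_i), \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i\}_{i=1}^{k}$
Labels $r_i^t$?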
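When the labels $r_i^t$ are available, the supervised estimates on this slide are weighted averages with 0/1 weights. The following is a minimal sketch in plain Python; the function name `supervised_estimates`, the list-of-lists data layout, and the toy data are assumptions for illustration, and it assumes every class has at least one instance.

```python
def supervised_estimates(X, R):
    """Maximum likelihood estimates from labeled data.

    X: list of N instances, each a list of d floats.
    R: list of N one-hot label vectors, R[t][i] = 1 if x^t belongs to C_i.
    Returns (priors, means, covs), one entry per class.
    Assumes every class has at least one instance.
    """
    N, d, K = len(X), len(X[0]), len(R[0])
    priors, means, covs = [], [], []
    for i in range(K):
        n_i = sum(R[t][i] for t in range(N))  # sum_t r_i^t
        m_i = [sum(R[t][i] * X[t][j] for t in range(N)) / n_i
               for j in range(d)]
        S_i = [[sum(R[t][i] * (X[t][a] - m_i[a]) * (X[t][b] - m_i[b])
                    for t in range(N)) / n_i
                for b in range(d)]
               for a in range(d)]
        priors.append(n_i / N)
        means.append(m_i)
        covs.append(S_i)
    return priors, means, covs

# Toy data: two instances of class 0, three of class 1 (illustrative only).
X = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0], [12.0, 2.0], [14.0, 4.0]]
R = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
priors, means, covs = supervised_estimates(X, R)
```

In the unsupervised case the same formulas cannot be applied directly, because the labels in `R` are exactly what is missing.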
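k-means Clustering
Find $k$ reference vectors (prototypes/codebook vectors/codewords) which best represent the data
Reference vectors $\mathbf{m}_j$, $j = 1, \ldots, k$
Use the nearest (most similar) reference: $\|\mathbf{x}^t - \mathbf{m}_i\| = \min_j \|\mathbf{x}^t - \mathbf{m}_j\|$
Reconstruction error:
$$E\left(\{\mathbf{m}_i\}_{i=1}^{k} \mid X\right) = \sum_t \sum_i b_i^t \|\mathbf{x}^t - \mathbf{m}_i\|^2, \qquad b_i^t = \begin{cases} 1 & \text{if } \|\mathbf{x}^t - \mathbf{m}_i\| = \min_j \|\mathbf{x}^t - \mathbf{m}_j\| \\ 0 & \text{otherwise} \end{cases}$$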
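Encoding/Decoding
$$b_i^t = \begin{cases} 1 & \text{if } \|\mathbf{x}^t - \mathbf{m}_i\| = \min_j \|\mathbf{x}^t - \mathbf{m}_j\| \\ 0 & \text{otherwise} \end{cases}$$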
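One way to read the encoding/decoding view: the encoder replaces an instance by the index of its nearest reference vector, and the decoder maps that index back to the reference vector. The sketch below assumes this reading; the function names and the three-vector codebook are hypothetical.

```python
def encode(x, refs):
    """Return the index i of the nearest reference vector m_i."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(refs)), key=lambda i: sq_dist(x, refs[i]))

def decode(i, refs):
    """Reconstruct the instance as the reference vector m_i."""
    return refs[i]

refs = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]  # hypothetical codebook
x = [4.0, 6.5]
i = encode(x, refs)        # index of the nearest codeword
x_hat = decode(i, refs)    # reconstruction of x
# Squared reconstruction error for this single instance.
error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
```

Only the index needs to be stored or transmitted, at the cost of the reconstruction error.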
k-means Clustering
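A minimal sketch of the k-means iteration, alternating between assigning each instance to its nearest reference vector and recomputing each reference vector as the mean of its assigned instances. The function signature, the optional `init` argument, and the handling of empty clusters (the old mean is kept) are assumptions made for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def k_means(X, k, init=None, max_iter=100, seed=0):
    """Return (means, labels, reconstruction_error)."""
    # Initialise the means: use `init` if given, otherwise k random instances.
    if init is None:
        init = random.Random(seed).sample(X, k)
    means = [list(m) for m in init]
    labels = None
    for _ in range(max_iter):
        # Assignment step: b_i^t = 1 for the nearest mean, 0 otherwise.
        new_labels = [min(range(k), key=lambda i: sq_dist(x, means[i]))
                      for x in X]
        if new_labels == labels:
            break  # assignments did not change, so the means will not either
        labels = new_labels
        # Update step: each mean becomes the average of its members.
        for i in range(k):
            members = [x for x, b in zip(X, labels) if b == i]
            if members:  # assumption: an empty cluster keeps its old mean
                means[i] = [sum(col) / len(members) for col in zip(*members)]
    error = sum(sq_dist(x, means[b]) for x, b in zip(X, labels))
    return means, labels, error

# Two well-separated toy groups (illustrative data only).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
means, labels, error = k_means(X, k=2, init=[[0.0, 0.0], [1.0, 1.0]])
```

The result depends on the initial means: k-means converges to a local minimum of the reconstruction error, so in practice it is common to try several initializations and keep the best one.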
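Expectation-Maximization (EM)
Log likelihood with a mixture model:
$$\mathcal{L}(\Phi \mid X) = \sum_t \log p(\mathbf{x}^t \mid \Phi) = \sum_t \log \sum_{i=1}^{k} p(\mathbf{x}^t \mid G_i)\, P(G_i)$$
Assume hidden variables $\mathbf{z}$, which when known, make optimization much simpler
Complete likelihood, $\mathcal{L}_c(\Phi \mid X, Z)$, in terms of $\mathbf{x}$ and $\mathbf{z}$
Incomplete likelihood, $\mathcal{L}(\Phi \mid X)$, in terms of $\mathbf{x}$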
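E- and M-steps
Iterate the two steps:
1. E-step: Estimate $\mathbf{z}$ given $X$ and the current $\Phi$
2. M-step: Find the new $\Phi$ given $\mathbf{z}$, $X$, and the old $\Phi$
$$\text{E-step: } Q(\Phi \mid \Phi^{l}) = E\left[\mathcal{L}_c(\Phi \mid X, Z) \mid X, \Phi^{l}\right]$$
$$\text{M-step: } \Phi^{l+1} = \arg\max_{\Phi} Q(\Phi \mid \Phi^{l})$$
An increase in $Q$ increases the incomplete likelihood:
$$\mathcal{L}(\Phi^{l+1} \mid X) \geq \mathcal{L}(\Phi^{l} \mid X)$$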
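EM in Gaussian Mixtures
$z_i^t = 1$ if $\mathbf{x}^t$ belongs to $G_i$, 0 otherwise (the labels $r_i^t$ of supervised learning); assume $p(\mathbf{x} \mid G_i) \sim N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$
E-step:
$$E\left[z_i^t \mid X, \Phi^{l}\right] = \frac{p(\mathbf{x}^t \mid G_i, \Phi^{l})\, P(G_i)}{\sum_j p(\mathbf{x}^t \mid G_j, \Phi^{l})\, P(G_j)} = P(G_i \mid \mathbf{x}^t, \Phi^{l}) \equiv h_i^t$$
M-step:
$$P(G_i) = \frac{\sum_t h_i^t}{N}, \qquad \mathbf{m}_i^{l+1} = \frac{\sum_t h_i^t \mathbf{x}^t}{\sum_t h_i^t}, \qquad \mathbf{S}_i^{l+1} = \frac{\sum_t h_i^t (\mathbf{x}^t - \mathbf{m}_i^{l+1})(\mathbf{x}^t - \mathbf{m}_i^{l+1})^T}{\sum_t h_i^t}$$
Use estimated labels in place of unknown labels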
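The E- and M-step updates can be sketched for the univariate case, where each covariance matrix reduces to a scalar variance. This is a simplified illustration, not the slides' own code: the function name `em_gmm_1d`, the fixed iteration count, and the variance floor `min_var` (a guard against collapsing components) are assumptions.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of a univariate normal with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(xs, priors, mus, variances, n_iter=50, min_var=1e-6):
    """Run EM on a 1-D Gaussian mixture from the given initial parameters.

    Returns (priors, mus, variances, log_liks), where log_liks[l] is the
    incomplete log likelihood under the parameters at the start of iteration l.
    """
    N, k = len(xs), len(priors)
    priors, mus, variances = list(priors), list(mus), list(variances)
    log_liks = []
    for _ in range(n_iter):
        # E-step: h[t][i] = P(G_i | x^t, current parameters).
        H, ll = [], 0.0
        for x in xs:
            joint = [priors[i] * gaussian_pdf(x, mus[i], variances[i])
                     for i in range(k)]
            total = sum(joint)
            ll += math.log(total)
            H.append([j / total for j in joint])
        log_liks.append(ll)
        # M-step: weighted versions of the supervised estimates.
        for i in range(k):
            n_i = sum(h[i] for h in H)
            priors[i] = n_i / N
            mus[i] = sum(h[i] * x for h, x in zip(H, xs)) / n_i
            var = sum(h[i] * (x - mus[i]) ** 2 for h, x in zip(H, xs)) / n_i
            variances[i] = max(var, min_var)  # assumption: floor the variance
    return priors, mus, variances, log_liks

# Toy sample with two groups around 1 and 5 (illustrative only).
xs = [0.8, 1.0, 1.2, 0.9, 1.1, 4.8, 5.0, 5.2, 4.9, 5.1]
priors, mus, variances, log_liks = em_gmm_1d(
    xs, priors=[0.5, 0.5], mus=[0.0, 6.0], variances=[1.0, 1.0])
```

Tracking `log_liks` gives a simple check on an implementation, since the incomplete log likelihood should not decrease from one iteration to the next.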
[Figure: $P(G_1 \mid \mathbf{x}) = h_1 = 0.5$]
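Mixtures of Latent Variable Models
Regularize clusters:
1. Assume shared/diagonal covariance matrices
2. Use PCA/FA to decrease dimensionality: mixtures of PCA/FA
$$p(\mathbf{x}^t \mid G_i) = N\left(\mathbf{m}_i, \mathbf{V}_i \mathbf{V}_i^T + \boldsymbol{\Psi}_i\right)$$
Can use EM to learn $\mathbf{V}_i$ (Ghahramani and Hinton, 1997; Tipping and Bishop, 1999)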
After Clustering
Dimensionality reduction methods find correlations between features and group features
Clustering methods find similarities between instances and group instances
Allows knowledge extraction through the number of clusters, prior probabilities, and cluster parameters, i.e., center, range of features
Example: CRM, customer segmentation
Clustering as Preprocessing
Estimated group labels $h_j$ (soft) or $b_j$ (hard) may be seen as the dimensions of a new $k$-dimensional space, where we can then learn our discriminant or regressor
Local representation (only one $b_j$ is 1, all others are 0; only a few $h_j$ are nonzero) vs. distributed representation (after PCA; all $z_j$ are nonzero)
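Mixture of Mixtures
In classification, the input comes from a mixture of classes (supervised)
If each class is also a mixture, e.g., of Gaussians (unsupervised), we have a mixture of mixtures:
$$p(\mathbf{x} \mid C_i) = \sum_{j=1}^{k_i} p(\mathbf{x} \mid G_{ij})\, P(G_{ij})$$
$$p(\mathbf{x}) = \sum_{i=1}^{K} p(\mathbf{x} \mid C_i)\, P(C_i)$$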
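The two-level sum can be sketched in code for the univariate case: each class likelihood is itself a Gaussian mixture, and the overall density mixes the classes. The class posterior via Bayes' rule is added here only to show how such a model might be used for classification; the function names and all parameter values are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate normal N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def class_likelihood(x, components):
    """p(x | C_i) = sum_j p(x | G_ij) P(G_ij).

    components: list of (P(G_ij), mu_ij, sigma_ij) tuples for one class.
    """
    return sum(P * gaussian_pdf(x, mu, s) for P, mu, s in components)

def class_posteriors(x, class_priors, class_components):
    """P(C_i | x) by Bayes' rule, with p(x) = sum_i p(x | C_i) P(C_i)."""
    joint = [Pc * class_likelihood(x, comps)
             for Pc, comps in zip(class_priors, class_components)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

# Hypothetical model: class 0 is bimodal, class 1 is a single Gaussian.
class_priors = [0.5, 0.5]
class_components = [
    [(0.5, -3.0, 0.5), (0.5, 3.0, 0.5)],  # class 0: two components
    [(1.0, 0.0, 1.0)],                    # class 1: one component
]
post_at_3 = class_posteriors(3.0, class_priors, class_components)
post_at_0 = class_posteriors(0.0, class_priors, class_components)
```

A single Gaussian per class could not represent the bimodal class 0 here, which is the motivation for letting each class be a mixture.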
Spectral Clustering
Cluster using predefined pairwise similarities $B_{rs}$ instead of using Euclidean or Mahalanobis distance
Can be used even if instances are not represented as vectors
Steps:
I. Use Laplacian Eigenmaps (Chapter 6) to map to a new $\mathbf{z}$ space using $B_{rs}$
II. Use k-means in this new $\mathbf{z}$ space for clustering
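Hierarchical Clustering
Cluster based on similarities/distances
Distance measure between instances $\mathbf{x}^r$ and $\mathbf{x}^s$
Minkowski ($L_p$) (Euclidean for $p = 2$):
$$d_m(\mathbf{x}^r, \mathbf{x}^s) = \left[\sum_{j=1}^{d} \left|x_j^r - x_j^s\right|^p\right]^{1/p}$$
City-block distance:
$$d_{cb}(\mathbf{x}^r, \mathbf{x}^s) = \sum_{j=1}^{d} \left|x_j^r - x_j^s\right|$$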
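The two distance measures translate directly into code. The function names below are chosen for this illustration.

```python
def minkowski(xr, xs, p=2):
    """Minkowski (L_p) distance; p = 2 gives the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(xr, xs)) ** (1.0 / p)

def city_block(xr, xs):
    """City-block (L_1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(xr, xs))

xr, xs = [0.0, 0.0], [3.0, 4.0]
d_euclid = minkowski(xr, xs, p=2)
d_cb = city_block(xr, xs)
```

The city-block distance is the Minkowski distance with $p = 1$, so `minkowski(xr, xs, p=1)` gives the same value as `city_block(xr, xs)`.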
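Agglomerative Clustering
Start with $N$ groups, each with one instance, and merge the two closest groups at each iteration
Distance between two groups $G_i$ and $G_j$:
Single-link: $d(G_i, G_j) = \min_{\mathbf{x}^r \in G_i,\, \mathbf{x}^s \in G_j} d(\mathbf{x}^r, \mathbf{x}^s)$
Complete-link: $d(G_i, G_j) = \max_{\mathbf{x}^r \in G_i,\, \mathbf{x}^s \in G_j} d(\mathbf{x}^r, \mathbf{x}^s)$
Average-link: $d(G_i, G_j) = \operatorname{ave}_{\mathbf{x}^r \in G_i,\, \mathbf{x}^s \in G_j} d(\mathbf{x}^r, \mathbf{x}^s)$ (the centroid method instead uses the distance between the two group means)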
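A naive sketch of the agglomerative procedure with the three pairwise linkages. It recomputes all group distances at every step, which is easy to follow but slow for large $N$. The function names, the `LINKAGES` table, and the merge-history return format are assumptions made for this example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# How to combine the pairwise distances between two groups.
LINKAGES = {
    "single": min,                            # closest pair
    "complete": max,                          # farthest pair
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
}

def agglomerative(X, linkage="single", dist=euclidean):
    """Merge the two closest groups until one group remains.

    Returns the merge history as a list of (group_a, group_b, distance),
    where groups are lists of instance indices into X.
    """
    combine = LINKAGES[linkage]
    groups = [[t] for t in range(len(X))]  # start with N singleton groups
    merges = []
    while len(groups) > 1:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                ds = [dist(X[r], X[s]) for r in groups[a] for s in groups[b]]
                d = combine(ds)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((groups[a], groups[b], d))
        merged = groups[a] + groups[b]
        groups = [g for i, g in enumerate(groups) if i not in (a, b)]
        groups.append(merged)
    return merges

X = [[0.0], [1.0], [3.0], [7.0]]  # toy 1-D instances
single = agglomerative(X, "single")
complete = agglomerative(X, "complete")
average = agglomerative(X, "average")
```

The distances recorded in the merge history are the heights at which a dendrogram would join the corresponding groups.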
Example: Single-Link Clustering
[Figure: dendrogram]
Choosing k
Defined by the application, e.g., image quantization
Plot data (after PCA) and check for clusters
Incremental (leader-cluster) algorithm: add one at a time until an "elbow" (reconstruction error/log likelihood/intergroup distances)
Manually check for meaning
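The slides do not define the "elbow" formally. One possible heuristic, shown here purely as an assumption, is to stop adding clusters once the relative drop in reconstruction error falls below a threshold. The function name `pick_elbow`, the `min_gain` threshold, and the error values are all hypothetical.

```python
def pick_elbow(errors, min_gain=0.1):
    """Pick k from reconstruction errors, where errors[i] is for k = i + 1.

    Returns the smallest k for which adding one more cluster reduces the
    error by less than `min_gain` as a fraction of the current error.
    This threshold rule is an illustrative heuristic, not a standard one.
    """
    for i in range(len(errors) - 1):
        if errors[i] == 0:
            return i + 1  # nothing left to improve
        gain = (errors[i] - errors[i + 1]) / errors[i]
        if gain < min_gain:
            return i + 1
    return len(errors)  # no elbow found within the tried range

# Hypothetical errors for k = 1..5: big drops up to k = 3, then a plateau.
errors = [100.0, 40.0, 12.0, 11.0, 10.5]
k_chosen = pick_elbow(errors)
```

Any such rule is only a starting point; the resulting clusters still need the manual check for meaning noted above.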