Soft k-means Clustering. Comp 135 Machine Learning, Computer Science, Tufts University. Mixture Models. Mixture of Normals in 1D

Comp 135 Machine Learning, Computer Science, Tufts University, Fall 2017, Roni Khardon. The EM Algorithm. Mixture Models. Semi-Supervised Learning.

Soft k-means Clustering: Pick k cluster centers. Repeat: associate examples with centers, $p_{i,j} \propto$ similarity between center $j$ and example $x_i$; re-calculate the means as weighted averages of the examples in each cluster. Until convergence.

Mixture Models: Motivated by soft k-means, we develop a generative model for clustering. Assume there are k clusters. Clusters are not required to have the same number of points, and not required to have the same shape.

Mixture of Normals in 1D: Repeat for $i = 1, \ldots, N$: Pick cluster id $z_i$ from a discrete distribution with parameters $p_1, p_2, \ldots, p_k$. Note: $z_i \in \{1, 2, \ldots, k\}$. Pick the example $x_i$ from a normal distribution with parameters $\mu_{z_i}, \sigma_{z_i}$. Example: when $z_i = 3$, use $\mu_3$ and $\sigma_3$. Given a dataset generated by this process, the clustering task is to identify the parameters $\{p_j, \mu_j, \sigma_j\}$, $j = 1, \ldots, k$.

Maximum likelihood estimation: To simplify the analysis in class we assume that $\forall j$, $p_j = 1/k$ and $\forall j$, $\sigma_j = 1$ are known, and that the $x_i$ are 1-dimensional. First analyze assuming the $z_i$ are known. Convenient notation: represent the number $z_i$ as a unit vector (bit sequence). Example for k = 4: $z_i = 1 \Rightarrow 1000$, $z_i = 2 \Rightarrow 0100$, $z_i = 3 \Rightarrow 0010$, $z_i = 4 \Rightarrow 0001$. Notation: $z_{i,j}$ is the j-th bit within $z_i$, so $z_i = 2 \Rightarrow 0100 \Rightarrow z_{i,2} = 1$, $z_{i,3} = 0$.
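The generative process above is short enough to write down directly. A minimal numpy sketch (the function name `sample_mixture` and its signature are my own, not from the slides):

```python
# Sketch of the "Mixture of Normals in 1D" generative process.
import numpy as np

def sample_mixture(N, p, mu, sigma, seed=None):
    """Pick z_i ~ Discrete(p), then x_i ~ Normal(mu[z_i], sigma[z_i])."""
    rng = np.random.default_rng(seed)
    k = len(p)
    z = rng.choice(k, size=N, p=p)      # cluster ids (0-based here, 1-based on the slides)
    x = rng.normal(mu[z], sigma[z])     # one draw per example from its cluster's normal
    return x, z

# Example: k = 3 clusters of unequal sizes, as the model allows.
x, z = sample_mixture(500, p=[0.5, 0.3, 0.2],
                      mu=np.array([-4.0, 0.0, 3.0]),
                      sigma=np.array([1.0, 1.0, 1.0]), seed=0)
```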

Maximum likelihood estimation: First analyze assuming the $z_i$ are known. The Complete Data includes all the $(x_i, z_i)$: Data $= (x_1, z_1), (x_2, z_2), \ldots, (x_N, z_N)$. The likelihood is

$$L = \prod_i p(z_i)\, p(x_i \mid z_i, \mu_{z_i}) = \prod_i (1/k) \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x_i - \mu_{z_i})^2} = \prod_i (1/k) \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\sum_j z_{i,j}(x_i - \mu_j)^2}$$

Notation trick: exactly one term remains from the sum! Taking logs,

$$\log L = \text{const} - \frac{1}{2} \sum_i \sum_j z_{i,j}(x_i - \mu_j)^2$$

and setting $\frac{\partial \log L}{\partial \mu_j} = \ldots = 0$ gives

$$\mu_j = \frac{\sum_i z_{i,j}\, x_i}{\sum_i z_{i,j}}$$

This is not surprising. Why? Because $\mu_j$ is simply the average of the examples assigned to cluster $j$.

The Observed Data includes only the $x_i$: Data $= x_1, x_2, \ldots, x_N$, so we cannot use the previous estimate. What is the likelihood in this case? Maximum likelihood prescribes that we should optimize

$$p(\text{observed}) = p(x_1, \ldots, x_N) = \sum_{z_1} \sum_{z_2} \cdots \sum_{z_N} p(x_1, \ldots, x_N, z_1, \ldots, z_N)$$

The equation for the likelihood needs to sum out (marginalize) over the $z_i$. There is no simple closed form.

The EM Algorithm: a general algorithm for maximizing likelihood when we have hidden random variables. The algorithm has a simple form when applied to mixture models. We will constrain ourselves to that simple form, and mention the general scheme of the EM algorithm briefly.
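The complete-data solution is just a per-cluster average, which a few lines of numpy make concrete. A minimal sketch, assuming the one-hot labels are stored as a matrix Z (a representation choice of mine):

```python
# Complete-data maximum likelihood for the means: with one-hot Z (N x k),
# mu_j = sum_i z_ij * x_i / sum_i z_ij is the average of cluster j's examples.
import numpy as np

def mle_means(x, Z):
    return (Z.T @ x) / Z.sum(axis=0)

# Example with k = 2: the first two points form cluster 0, the rest cluster 1.
x = np.array([1.0, 2.0, 9.0, 11.0])
Z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
print(mle_means(x, Z))   # [ 1.5  10. ] -- the per-cluster averages
```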

The EM Algorithm: EM is an iterative algorithm. Initialize probability model $p'$. Repeat: use $p'$ to calculate an improved model $p''$; set $p' = p''$. Until no further improvement.

EM Algorithm for Mixture Models: [E] Calculate, using $p'$,

$$f_{i,j} = E_{p(z \mid X, \{\mu'_\ell\})}[z_{i,j}] = p(z_i = j \mid \{\mu'_\ell\}, \text{Data})$$

[M] Estimate the $p''$ parameters using the maximum likelihood solution of the complete data, replacing the unknown $z_{i,j}$ by $f_{i,j}$.

EM for Mixtures in 1D: [E] Calculate for all $i, j$:

$$f_{i,j} = \frac{p(z_i = j \text{ and } x_i)}{p(x_i)} = \frac{p(z_i = j \text{ and } x_i)}{\sum_\ell p(z_i = \ell \text{ and } x_i)} = \frac{(1/k) \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x_i - \mu'_j)^2}}{\sum_\ell (1/k) \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x_i - \mu'_\ell)^2}}$$

The first part holds for any mixture model. [M] Estimate parameters using maximum likelihood, replacing the unknown $z_{i,j}$ by $f_{i,j}$: the complete-data solution $\mu_j = \frac{\sum_i z_{i,j} x_i}{\sum_i z_{i,j}}$ becomes

$$\mu''_j = \frac{\sum_i f_{i,j}\, x_i}{\sum_i f_{i,j}}$$

Then assign, for all $j$: $\mu'_j = \mu''_j$.

General form of EM: Define an auxiliary function $Q(p', p'')$ relative to observed variables O and hidden variables H:

$$Q(p', p'') = E_{p'(H \mid O)}[\log p''(H, O)]$$
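Putting the E and M steps together gives the whole algorithm for this simplified setting. A minimal sketch under the slides' simplifications ($p_j = 1/k$, $\sigma_j = 1$), so that only the means are updated; `em_1d` is my own name:

```python
# Minimal EM for a 1D mixture of k unit-variance normals with equal priors.
import numpy as np

def em_1d(x, mu, n_iters=100):
    """x: (N,) data; mu: (k,) initial means. Returns updated means and responsibilities."""
    for _ in range(n_iters):
        # [E] f[i,j] proportional to exp(-(x_i - mu_j)^2 / 2); the 1/k and
        # 1/sqrt(2*pi) factors cancel in the normalization.
        logw = -0.5 * (x[:, None] - mu[None, :]) ** 2
        logw -= logw.max(axis=1, keepdims=True)      # for numerical stability
        f = np.exp(logw)
        f /= f.sum(axis=1, keepdims=True)
        # [M] weighted-average means, with f[i,j] replacing the unknown z[i,j]
        mu = (f.T @ x) / f.sum(axis=0)
    return mu, f

# Usage: recover two cluster means from synthetic data.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(2, 1, 300)])
mu, f = em_1d(x, mu=np.array([-1.0, 1.0]))
print(mu)   # approximately [-3, 2]
```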

The EM Algorithm (general form): EM is an iterative algorithm. Initialize probability model $p'$. Repeat: pick $p''$ so as to maximize $Q(p', p'')$; set $p' = p''$. Until no further improvement.

EM Algorithm for Mixture Models: [E] Calculate, using $p'$, $f_{i,j} = E_{p(z \mid X, \{\mu'_\ell\})}[z_{i,j}] = p(z_i = j \mid \{\mu'_\ell\}, \text{Data})$. [M] Estimate the $p''$ parameters using maximum likelihood, replacing the unknown $z_{i,j}$ by $f_{i,j}$. Using the same methodology on any mixture model (not just Gaussian) yields the same template.

Semi-Supervised Naïve Bayes Model: Naïve Bayes is a probabilistic model with strong simplifying assumptions. Illustrating application: text categorization, where we have data for (document_i, label_i). What if we have many documents but labels for only a few of them? Can the unlabeled documents help? Before exploring this question we will develop the EM algorithm for this model where the labels are not known.

Recall: Naïve Bayes Model. Each class induces a distribution over features. Features are conditionally independent given the class. In these slides I use the model with binary features.
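The exact formulas for this binary-feature model appear on the next slide; as a preview, the joint $p(z_i = j \text{ and } x_i)$ is typically computed in log space to avoid underflow on long feature vectors. A hedged numpy sketch (names are mine):

```python
# Naive Bayes joint probability for binary features, in log space:
# log p(z_i = j and x_i) = log p_j + sum_l [x_il log q_jl + (1 - x_il) log(1 - q_jl)].
import numpy as np

def nb_log_joint(X, p, q):
    """X: (N, L) binary features; p: (k,) priors; q: (k, L) feature probabilities,
    assumed strictly inside (0, 1). Returns an (N, k) matrix of log joints."""
    return np.log(p) + X @ np.log(q).T + (1 - X) @ np.log(1 - q).T
```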

Recall: Naïve Bayes Model. With $p(z_i = j) = p_j$ and $p(x_{i,\ell} = 1 \mid \text{class } j) = q_{j,\ell}$:

$$p(x_i \mid \text{class } j) = \prod_\ell (q_{j,\ell})^{x_{i,\ell}} (1 - q_{j,\ell})^{(1 - x_{i,\ell})}$$

$$p(z_i = j \text{ and } x_i) = p_j \prod_\ell (q_{j,\ell})^{x_{i,\ell}} (1 - q_{j,\ell})^{(1 - x_{i,\ell})}$$

$$p(z_i \text{ and } x_i) = \prod_j \left[ p_j \prod_\ell (q_{j,\ell})^{x_{i,\ell}} (1 - q_{j,\ell})^{(1 - x_{i,\ell})} \right]^{z_{i,j}}$$

Recall: Maximum Likelihood.

$$p_j = p(z = j) = \frac{\text{number of examples with class } j}{\text{number of examples}}, \qquad q_{j,\ell} = p(x_{i,\ell} = 1 \mid z_i = j) = \frac{\text{number of } x_i \text{ with class } j \text{ and } x_{i,\ell} = 1}{\text{number of examples with class } j}$$

Naïve Bayes as Mixture Model: Repeat for $i = 1, \ldots, N$: Pick cluster id $z_i$ from a discrete distribution with parameters $p_1, p_2, \ldots, p_k$. Pick the example $x_i$ from the Naïve Bayes distribution with parameters $q_{z_i}$.

Complete Data Likelihood:

$$L = \prod_i \prod_j \left[ p_j \prod_\ell (q_{j,\ell})^{x_{i,\ell}} (1 - q_{j,\ell})^{(1 - x_{i,\ell})} \right]^{z_{i,j}}$$

Log Likelihood:

$$\log L = \sum_i \sum_j z_{i,j} \left( \log p_j + \sum_\ell x_{i,\ell} \log q_{j,\ell} + (1 - x_{i,\ell}) \log(1 - q_{j,\ell}) \right)$$

EM Algorithm: Maximum likelihood for the complete data [we already solved this a few lectures ago]:

$$p_j = \frac{\sum_i z_{i,j}}{N}, \qquad q_{j,\ell} = \frac{\sum_i z_{i,j}\, x_{i,\ell}}{\sum_i z_{i,j}}$$

E Step: Calculating $f_{i,j}$:

$$f_{i,j} = E_{p'(Z \mid X)}[z_{i,j}] = \frac{p'(z_i = j \text{ and } x_i)}{\sum_c p'(z_i = c \text{ and } x_i)} = \frac{p'_j \prod_\ell (q'_{j,\ell})^{x_{i,\ell}} (1 - q'_{j,\ell})^{(1 - x_{i,\ell})}}{\sum_c p'_c \prod_\ell (q'_{c,\ell})^{x_{i,\ell}} (1 - q'_{c,\ell})^{(1 - x_{i,\ell})}}$$
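The complete-data estimates are simple counts, one matrix product away once the labels are stored one-hot. A minimal sketch under that representation (my choice); in practice one would smooth the counts (e.g., Laplace smoothing) to avoid zero probabilities:

```python
# Complete-data maximum likelihood for Naive Bayes:
# p_j = sum_i z_ij / N and q_jl = sum_i z_ij x_il / sum_i z_ij.
import numpy as np

def nb_mle(X, Z):
    """X: (N, L) binary features; Z: (N, k) one-hot class labels."""
    counts = Z.sum(axis=0)             # number of examples in each class
    p = counts / X.shape[0]            # class priors p_j
    q = (Z.T @ X) / counts[:, None]    # per-class feature frequencies q_{j,l}
    return p, q
```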

EM Algorithm for Naïve Bayes. Calculate:

$$f_{i,j} = \frac{p'_j \prod_\ell (q'_{j,\ell})^{x_{i,\ell}} (1 - q'_{j,\ell})^{(1 - x_{i,\ell})}}{\sum_c p'_c \prod_\ell (q'_{c,\ell})^{x_{i,\ell}} (1 - q'_{c,\ell})^{(1 - x_{i,\ell})}}$$

Calculate:

$$p''_j = \frac{\sum_i f_{i,j}}{N}, \qquad q''_{j,\ell} = \frac{\sum_i f_{i,j}\, x_{i,\ell}}{\sum_i f_{i,j}}$$

Assign: $p' \leftarrow p''$ and $q' \leftarrow q''$.

Semi-Supervised Naïve Bayes Model: Naïve Bayes for text categorization. What if we have many documents but labels for only a few of them? Can the unlabeled documents help? Use EM: for examples where $z_i$ is known, use $f_{i,j} = z_{i,j}$ instead of estimating it. Nothing else changes in the algorithm! (A sketch appears after the summary below.)

[Figure: 20 newsgroups data, two accuracy plots. Left: accuracy vs. number of labeled documents (20 to 5000), comparing training with 10000 unlabeled documents against no unlabeled documents. Right: accuracy vs. number of unlabeled documents (0 to 13000), for 3000, 600, 300, 140, and 40 labeled documents. From Nigam et al., MLJ 1999.]

Summary: EM is a general algorithmic framework for inference with hidden random variables. It takes a simple form for mixture models, alternating between estimating fractional memberships and using these in maximum likelihood calculations. The general derivation through the $Q(p', p'')$ function is applicable in more complex models. The mixture model easily generalizes to capture semi-supervised learning.
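To recap the semi-supervised variant concretely: clamping $f_{i,j} = z_{i,j}$ for the labeled rows is a one-line change inside the EM loop. A hedged end-to-end sketch (names such as `semi_supervised_em`, `labeled_idx`, and `Z_lab` are mine, and the initialization is one arbitrary choice among many):

```python
# Semi-supervised EM for Naive Bayes with binary features.
import numpy as np

def semi_supervised_em(X, k, labeled_idx, Z_lab, n_iters=50, eps=1e-6):
    """X: (N, L) binary features; labeled_idx: row indices with known labels;
    Z_lab: one-hot labels for those rows, shape (len(labeled_idx), k)."""
    N, L = X.shape
    rng = np.random.default_rng(0)
    p = np.full(k, 1.0 / k)                      # initialize p'
    q = rng.uniform(0.25, 0.75, size=(k, L))     # initialize q'
    for _ in range(n_iters):
        # [E] responsibilities from the current model p', q' (log space)
        log_joint = np.log(p) + X @ np.log(q).T + (1 - X) @ np.log(1 - q).T
        f = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
        f /= f.sum(axis=1, keepdims=True)
        f[labeled_idx] = Z_lab                   # clamp known labels: f_ij = z_ij
        # [M] maximum likelihood with f replacing the unknown z
        counts = f.sum(axis=0)
        p = counts / N
        q = np.clip((f.T @ X) / counts[:, None], eps, 1 - eps)
    return p, q, f
```

With `labeled_idx` empty, this reduces to the fully unsupervised EM for Naïve Bayes above; the clip on q plays the role of smoothing, keeping the log terms finite.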