COMP 135 Machine Learning
Computer Science, Tufts University
Fall 2017
Roni Khardon

The EM Algorithm: Mixture Models, Semi-Supervised Learning

Soft k-means Clustering
- Pick k cluster centers.
- Associate examples with centers: $p_{i,j} \approx$ similarity between center $i$ and example $x_j$.
- Re-calculate the means as a weighted average of the examples in each cluster.
- Repeat the last two steps until convergence.

Mixture Models
- Motivated by soft k-means.
- We develop a generative model for clustering:
  - Assume there are k clusters.
  - Clusters are not required to have the same number of points.
  - And they are not required to have the same shape.

Mixture of Normals in 1D
- Repeat for $i = 1, \ldots, N$:
  - Pick cluster ID $z_i$ from a discrete distribution with parameters $p_1, p_2, \ldots, p_k$. Note: $z_i \in \{1, 2, \ldots, k\}$.
  - Pick the example $x_i$ from a normal distribution with parameters $\mu_{z_i}, \sigma_{z_i}$. Example: when $z_i = 3$, use $\mu_3$ and $\sigma_3$.
- Given a dataset generated by this process, the clustering task is to identify the parameters $\{p_j, \mu_j, \sigma_j\}$, $j = 1, \ldots, k$.

Maximum likelihood estimation
- To simplify the analysis in class we assume that $\forall j,\ p_j = 1/k$ and $\forall j,\ \sigma_j = \sigma$ are known, and that the $x_i$ are 1-dimensional.
- First analyze assuming the $z_i$ are known.
- Convenient notation: represent the number $z_i$ as a unit-vector bit sequence. Example with $k = 4$:
    $z_i = 1 \Rightarrow 1000$
    $z_i = 2 \Rightarrow 0100$
    $z_i = 3 \Rightarrow 0010$
    $z_i = 4 \Rightarrow 0001$
- Notation: $z_{i,j}$ is the $j$-th bit within $z_i$. For example, $z_i = 2 \Rightarrow 0100 \Rightarrow z_{i,2} = 1,\ z_{i,3} = 0$.
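The generative process for a mixture of normals and the unit-vector notation can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides: the names `sample_mixture` and `one_hot` are hypothetical, the parameter values in the usage lines are made up, and cluster IDs are 0-based in the code whereas the slides number them from 1.

```python
import random

def sample_mixture(n, p, mu, sigma, seed=0):
    """Draw n (x_i, z_i) pairs from a 1D mixture of normals.

    p, mu, sigma are length-k lists: mixing weights, means, standard deviations.
    Cluster IDs are 0-based here (the slides use 1..k).
    """
    rng = random.Random(seed)
    k = len(p)
    data = []
    for _ in range(n):
        # Pick the cluster ID z_i from the discrete distribution p.
        z = rng.choices(range(k), weights=p)[0]
        # Pick x_i from the normal distribution of that cluster.
        x = rng.gauss(mu[z], sigma[z])
        data.append((x, z))
    return data

def one_hot(z, k):
    """Unit-vector encoding: bit j is 1 exactly when j == z."""
    return [1 if j == z else 0 for j in range(k)]

# Illustrative parameters (assumed, not from the slides).
data = sample_mixture(5, p=[0.2, 0.5, 0.3], mu=[-4.0, 0.0, 5.0], sigma=[1.0, 0.5, 2.0])
print(data)
print(one_hot(1, 4))  # second of four clusters, i.e. z_i = 2 in the slides' numbering
```

In a clustering task only the `x` values would be handed to the learner; the `z` values are kept here so the complete-data case can be examined first.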
Maximum likelihood estimation (complete data)
- First analyze assuming the $z_i$ are known.
- The complete data includes all the $x_i, z_i$:
  Data $= (x_1, z_1), (x_2, z_2), \ldots, (x_N, z_N)$
- The likelihood is a product of terms $p(z_i)\, p(x_i \mid z_i, \mu_{z_i})$:

$$L = \prod_i \frac{1}{k} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x_i - \mu_{z_i})^2} = \prod_i \frac{1}{k} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}\sum_j z_{i,j}(x_i - \mu_j)^2}$$

- Notation trick: exactly one term remains from the sum!
- Taking logs and setting the derivative to zero:

$$\text{LogL} = \text{const} - \sum_i \sum_j \frac{1}{2\sigma^2}\, z_{i,j}\,(x_i - \mu_j)^2$$

$$\frac{\partial\, \text{LogL}}{\partial \mu_j} = \ldots = 0 \;\Rightarrow\; \mu_j = \frac{\sum_i z_{i,j}\, x_i}{\sum_i z_{i,j}}$$

- This is not surprising. Why?

Maximum likelihood estimation (observed data)
- The observed data includes all the $x_i$ (the $z_i$ are hidden):
  Data $= x_1, x_2, \ldots, x_N$
- We cannot use the previous estimate. What is the likelihood in this case?
- Maximum likelihood prescribes that we should optimize:

$$p(\text{observed}) = p(x_1, \ldots, x_N) = \sum_{z_1} \sum_{z_2} \cdots \sum_{z_N} p(x_1, \ldots, x_N, z_1, \ldots, z_N)$$

- The equation for the likelihood needs to sum out (marginalize) over the $z_i$. There is no simple closed form.

The EM Algorithm
- A general algorithm for maximizing likelihood when we have hidden random variables.
- The algorithm has a simple form when applied to mixture models.
- We will constrain ourselves to that simple form,
- and will mention the general scheme of the EM algorithm briefly.
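The two likelihood settings can be contrasted in code. The sketch below is an illustration rather than the lecture's own code: `complete_data_means` computes the closed-form estimate when the cluster IDs are known, and `observed_log_likelihood` evaluates the log-likelihood with the hidden IDs summed out, using the fact that independent examples let the nested sum factor into a per-example sum over clusters. Both function names are hypothetical, cluster IDs are 0-based, and $p_j = 1/k$ with a shared known `sigma` is assumed as in the simplified analysis. The plain exponentials can underflow for points far from every mean.

```python
import math

def complete_data_means(xs, zs, k):
    """ML estimate of each mean when the cluster IDs are known:
    mu_j = sum_i z_ij x_i / sum_i z_ij, the average of cluster j's examples."""
    sums = [0.0] * k
    counts = [0] * k
    for x, z in zip(xs, zs):
        sums[z] += x
        counts[z] += 1
    # An empty cluster has no defined estimate; None marks that case.
    return [s / c if c > 0 else None for s, c in zip(sums, counts)]

def observed_log_likelihood(xs, mus, sigma):
    """log p(x_1..x_N) with the z_i summed out, assuming p_j = 1/k and a shared sigma."""
    k = len(mus)
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    total = 0.0
    for x in xs:
        # Marginalize over the hidden cluster ID of this example.
        px = sum((1.0 / k) * norm * math.exp(-(x - m) ** 2 / (2.0 * sigma ** 2)) for m in mus)
        total += math.log(px)
    return total

print(complete_data_means([1.0, 3.0, 10.0], [0, 0, 1], 2))  # [2.0, 10.0]
print(observed_log_likelihood([1.0, 3.0, 10.0], [2.0, 10.0], 1.0))
```

The second function can be evaluated for any candidate means, but the log of a sum does not separate per cluster, which is why setting its derivative to zero does not give a closed-form answer.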
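The EM Algorithm
- EM is an iterative algorithm:
  - Initialize probability model $p'$.
  - Repeat: use $p'$ to calculate an improved model $p''$; set $p' = p''$.
  - Until no further improvement.

EM Algorithm for Mixture Models
- [E] Calculate using $p'$:

$$f_{i,j} = E_{p'(z_i \mid X, \{\mu'_\ell\})}[z_{i,j}] = p(z_i = j \mid \{\mu'_\ell\}, \text{Data})$$

- [M] Estimate the $p''$ parameters using the max likelihood solution of the complete data, replacing the unknown $z_{i,j}$ by $f_{i,j}$.

EM for Mixtures in 1D
- [E] Calculate

$$f_{i,j} = E_{p'(z_i \mid X, \{\mu'_\ell\})}[z_{i,j}] = p(z_i = j \mid \{\mu'_\ell\}, \text{Data}) = \frac{p((z_i = j) \text{ and } x_i)}{p(x_i)} = \frac{p((z_i = j) \text{ and } x_i)}{\sum_\ell p((z_i = \ell) \text{ and } x_i)}$$

$$f_{i,j} = \frac{\frac{1}{k} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x_i - \mu'_j)^2}}{\sum_\ell \frac{1}{k} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x_i - \mu'_\ell)^2}}$$

  The first part holds for any mixture model.
- [M] Estimate the parameters using max likelihood, replacing the unknown $z_{i,j}$ by $f_{i,j}$:

$$\mu_j = \frac{\sum_i z_{i,j}\, x_i}{\sum_i z_{i,j}} \;\Rightarrow\; \mu''_j = \frac{\sum_i f_{i,j}\, x_i}{\sum_i f_{i,j}}$$

EM for Mixtures in 1D (summary)
- [E] Calculate for all $i, j$: $f_{i,j}$ as given in the E step.
- [M] Calculate for all $j$: $\mu''_j = \dfrac{\sum_i f_{i,j}\, x_i}{\sum_i f_{i,j}}$
- Assign for all $j$: $\mu'_j = \mu''_j$

General form of EM
- Define an auxiliary function $Q(p', p'')$, relative to observed variables $O$ and hidden variables $H$:

$$Q(p', p'') = E_{p'(H \mid O)}[\log p''(H, O)]$$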
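The E, M, and assign steps for the 1D mixture translate almost directly into code. The following is a hedged sketch under the slides' simplifying assumptions ($p_j = 1/k$, a shared known `sigma`), so the constants $\frac{1}{k}$ and $\frac{1}{\sqrt{2\pi\sigma^2}}$ cancel in the E step. The function names, the fixed iteration count (used in place of a convergence test), the max-shift before exponentiating, and the toy data are all assumptions added for illustration.

```python
import math

def e_step(xs, mus, sigma):
    """Fractional memberships f[i][j] = p(z_i = j | x_i) under the current means."""
    f = []
    for x in xs:
        exps = [-(x - m) ** 2 / (2.0 * sigma ** 2) for m in mus]
        top = max(exps)  # shift before exponentiating to avoid underflow
        w = [math.exp(e - top) for e in exps]
        total = sum(w)
        f.append([wj / total for wj in w])
    return f

def m_step(xs, f, old_mus):
    """mu_j = sum_i f_ij x_i / sum_i f_ij (weighted average of the examples)."""
    new_mus = []
    for j, old in enumerate(old_mus):
        den = sum(row[j] for row in f)
        num = sum(row[j] * x for row, x in zip(f, xs))
        # Keep the old mean if a cluster received no weight at all.
        new_mus.append(num / den if den > 0 else old)
    return new_mus

def em_1d(xs, mus, sigma, iters=50):
    """Alternate E and M steps; assigning the result back to mus is mu' = mu''."""
    for _ in range(iters):
        f = e_step(xs, mus, sigma)
        mus = m_step(xs, f, mus)
    return mus

# Toy data with two well-separated groups (made up for illustration).
xs = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
print(em_1d(xs, mus=[0.0, 6.0], sigma=0.5))
```

On this toy data the two estimated means settle very close to 1.0 and 5.0, the averages of the two groups. With less separated data the result depends on the initial means, since EM only improves the likelihood locally.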
The EM Algorithm
- EM is an iterative algorithm:
  - Initialize probability model $p'$.
  - Repeat: use $p'$ to calculate an improved model $p''$; set $p' = p''$.
  - Until no further improvement.

The EM Algorithm
- EM is an iterative algorithm:
  - Initialize probability model $p'$.
  - Repeat: pick $p''$ so as to maximize $Q(p', p'')$; set $p' = p''$.
  - Until no further improvement.

EM Algorithm for Mixture Models
- [E] Calculate using $p'$:

$$f_{i,j} = E_{p'(z_i \mid X, \{\mu'_\ell\})}[z_{i,j}] = p(z_i = j \mid \{\mu'_\ell\}, \text{Data})$$

- [M] Estimate the $p''$ parameters using max likelihood, replacing the unknown $z_{i,j}$ by $f_{i,j}$.
- Using the same methodology on any mixture model (not just Gaussian) yields the same template.

Semi-Supervised Naïve Bayes Model
- Naïve Bayes: a probabilistic model with strong simplifying assumptions.
- Illustrating application: text categorization, where we have data for (document, label).
- What if we have many documents but labels for only a few of them?
- Can the unlabeled documents help?
- Before exploring this question we will develop the EM algorithm for this model where the labels are not known.

Recall: Naïve Bayes Model
- Each class induces a distribution over features.
- Features are conditionally independent given the class.
- In these slides I use the model with binary features.
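Recall: Naïve Bayes Model

$$p(z_i = j) = p_j$$

$$p(x_{i,\ell} = 1 \mid \text{class } j) = q_{j,\ell}$$

$$p(x_i \mid \text{class } j) = \prod_\ell q_{j,\ell}^{x_{i,\ell}} (1 - q_{j,\ell})^{(1 - x_{i,\ell})}$$

$$p(z_i = j \text{ and } x_i) = p_j \prod_\ell q_{j,\ell}^{x_{i,\ell}} (1 - q_{j,\ell})^{(1 - x_{i,\ell})}$$

$$p(z_i \text{ and } x_i) = \prod_j \left[ p_j \prod_\ell q_{j,\ell}^{x_{i,\ell}} (1 - q_{j,\ell})^{(1 - x_{i,\ell})} \right]^{z_{i,j}}$$

Recall: Maximum Likelihood

$$p_j = p(z = j) = \frac{\text{number of examples with class } j}{\text{number of examples}}$$

$$q_{j,\ell} = p(x_{i,\ell} = 1 \mid z = j) = \frac{\text{number of } x_i \text{ with class } j \text{ and } x_{i,\ell} = 1}{\text{number of examples with class } j}$$

Naïve Bayes as Mixture Model
- Repeat for $i = 1, \ldots, N$:
  - Pick cluster ID $z_i$ from a discrete distribution with parameters $p_1, p_2, \ldots, p_k$.
  - Pick the example $x_i$ from the Naïve Bayes distribution with parameters $q_{z_i}$.

EM Algorithm
- Complete data likelihood:

$$L = \prod_i \prod_j \left[ p_j \prod_\ell q_{j,\ell}^{x_{i,\ell}} (1 - q_{j,\ell})^{(1 - x_{i,\ell})} \right]^{z_{i,j}}$$

- Log likelihood:

$$\text{LogL} = \sum_i \sum_j z_{i,j} \left( \log p_j + \sum_\ell \left[ x_{i,\ell} \log q_{j,\ell} + (1 - x_{i,\ell}) \log(1 - q_{j,\ell}) \right] \right)$$

- Maximum likelihood for complete data [we already solved this a few lectures ago]:

$$p_j = \frac{\sum_i z_{i,j}}{N} \qquad q_{j,\ell} = \frac{\sum_i z_{i,j}\, x_{i,\ell}}{\sum_i z_{i,j}}$$

EM Algorithm
- E step: calculating $f_{i,j}$:

$$f_{i,j} = E_{p'(Z \mid X)}[z_{i,j}] = \frac{p'(z_i = j \text{ and } x_i)}{\sum_c p'(z_i = c \text{ and } x_i)} = \frac{p'_j \prod_\ell q'^{\,x_{i,\ell}}_{j,\ell} (1 - q'_{j,\ell})^{(1 - x_{i,\ell})}}{\sum_c p'_c \prod_\ell q'^{\,x_{i,\ell}}_{c,\ell} (1 - q'_{c,\ell})^{(1 - x_{i,\ell})}}$$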
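The E step for Naïve Bayes with binary features can be sketched as follows. This is an illustrative sketch, not the lecture's code: the names `log_joint` and `nb_e_step` are hypothetical, classes are 0-based, and every $q_{j,\ell}$ is assumed to lie strictly between 0 and 1 so that the logarithms are defined. Working in log space and subtracting the largest value before exponentiating is a numerical choice added here; it gives the same ratio as the formula for $f_{i,j}$.

```python
import math

def log_joint(x, p_j, q_j):
    """log p(z = j and x) = log p_j + sum_l [x_l log q_jl + (1 - x_l) log(1 - q_jl)]."""
    s = math.log(p_j)
    for x_l, q_jl in zip(x, q_j):
        s += math.log(q_jl) if x_l == 1 else math.log(1.0 - q_jl)
    return s

def nb_e_step(X, p, q):
    """Return f with f[i][j] = p'(z_i = j and x_i) / sum_c p'(z_i = c and x_i)."""
    f = []
    for x in X:
        logs = [log_joint(x, p_j, q_j) for p_j, q_j in zip(p, q)]
        top = max(logs)  # shift so the largest term becomes exp(0) = 1
        w = [math.exp(v - top) for v in logs]
        total = sum(w)
        f.append([wj / total for wj in w])
    return f

# One example with two binary features and two classes (values made up).
X = [[1, 0]]
p = [0.5, 0.5]
q = [[0.9, 0.1], [0.1, 0.9]]
print(nb_e_step(X, p, q))  # first entry is 0.405 / 0.41, about 0.988
```

Here the joint probabilities are $0.5 \cdot 0.9 \cdot 0.9 = 0.405$ for the first class and $0.5 \cdot 0.1 \cdot 0.1 = 0.005$ for the second, so the example belongs almost entirely to the first class.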
EM Algorithm for Naïve Bayes
- Calculate:

$$f_{i,j} = \frac{p'_j \prod_\ell q'^{\,x_{i,\ell}}_{j,\ell} (1 - q'_{j,\ell})^{(1 - x_{i,\ell})}}{\sum_c p'_c \prod_\ell q'^{\,x_{i,\ell}}_{c,\ell} (1 - q'_{c,\ell})^{(1 - x_{i,\ell})}}$$

- Calculate:

$$p''_j = \frac{\sum_i f_{i,j}}{N} \qquad q''_{j,\ell} = \frac{\sum_i f_{i,j}\, x_{i,\ell}}{\sum_i f_{i,j}}$$

- Assign: $p' \leftarrow p''$ and $q' \leftarrow q''$

Semi-Supervised Naïve Bayes Model
- What if we have many documents but labels for only a few of them?
- Can the unlabeled documents help?
- Use EM: for examples where $z_i$ is known, use $f_{i,j} = z_{i,j}$ instead of estimating it.
- Nothing else changes in the algorithm!

Naïve Bayes for text categorization: 20 newsgroups data
[Figure: accuracy (0% to 100%) versus number of labeled documents (10 to 5000), with two curves: 10000 unlabeled documents and no unlabeled documents. From Nigam et al., MLJ 1999.]
[Figure: accuracy (0% to 100%) versus number of unlabeled documents (0 to 13000), with one curve each for 3000, 600, 300, 140, and 40 labeled documents. From Nigam et al., MLJ 1999.]

Summary
- EM is a general algorithmic framework for inference with hidden random variables.
- It takes a simple form for mixture models, alternating between estimating fractional memberships and using these in maximum likelihood calculations.
- The general derivation through the $Q(p', p'')$ function is applicable in more complex models.
- The mixture model easily generalizes to capture semi-supervised learning.
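The semi-supervised variant can be sketched by clamping the memberships of labeled examples inside an otherwise unchanged EM loop. The code below is a minimal sketch under stated assumptions, not the implementation behind the 20 newsgroups results: all function names are hypothetical, classes are 0-based, `labels[i]` is `None` for an unlabeled example, a fixed iteration count (at least 1) replaces a convergence test, and the small clipping constant `EPS` is an addition not in the slides that keeps the logarithms defined, at the cost of slightly perturbing the exact maximum likelihood values.

```python
import math

EPS = 1e-6  # clipping constant; an assumption added here, not part of the slides

def clip(v):
    return min(max(v, EPS), 1.0 - EPS)

def e_step(X, p, q):
    """f[i][j] proportional to p_j * prod_l q_jl^x_il (1 - q_jl)^(1 - x_il)."""
    f = []
    for x in X:
        logs = []
        for p_j, q_j in zip(p, q):
            s = math.log(p_j)
            for x_l, q_jl in zip(x, q_j):
                s += math.log(q_jl) if x_l == 1 else math.log(1.0 - q_jl)
            logs.append(s)
        top = max(logs)
        w = [math.exp(v - top) for v in logs]
        total = sum(w)
        f.append([wj / total for wj in w])
    return f

def m_step(X, f, k):
    """p_j = sum_i f_ij / N and q_jl = sum_i f_ij x_il / sum_i f_ij, then clipped."""
    n, d = len(X), len(X[0])
    p, q = [], []
    for j in range(k):
        weight = sum(row[j] for row in f)
        p.append(clip(weight / n))
        q.append([
            clip(sum(row[j] * x[l] for row, x in zip(f, X)) / weight) if weight > 0 else 0.5
            for l in range(d)
        ])
    return p, q

def semi_supervised_em(X, labels, p, q, iters=20):
    k = len(p)
    for _ in range(iters):
        f = e_step(X, p, q)
        # For labeled examples use f_ij = z_ij instead of the estimate.
        for i, lab in enumerate(labels):
            if lab is not None:
                f[i] = [1.0 if j == lab else 0.0 for j in range(k)]
        p, q = m_step(X, f, k)
    return p, q, f

# Four toy "documents" with four binary features; two are labeled (values made up).
X = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
labels = [0, None, 1, None]
p0 = [0.5, 0.5]
q0 = [[0.6, 0.6, 0.4, 0.4], [0.4, 0.4, 0.6, 0.6]]
p_hat, q_hat, f_hat = semi_supervised_em(X, labels, p0, q0)
print(f_hat)
```

In this toy run the unlabeled examples end up assigned almost entirely to the class of the labeled example they resemble. Setting every entry of `labels` to `None` recovers the fully unsupervised EM for the Naïve Bayes mixture.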