Mixture of Gaussians Expectation Maximization (EM) Part 2


Mixture of Gaussians Expectation Maximization (EM), Part 2. Most of the slides are due to Christopher Bishop, BCS Summer School, Exeter, 2003. The rest of the slides are based on lecture notes by A. Ng.

Limitations of K-means. Hard assignments of data points to clusters: a small shift of a data point can flip it to a different cluster. Not clear how to choose the value of K. Solution: replace the hard clustering of K-means with soft, probabilistic assignments. Represent the probability distribution of the data as a Gaussian mixture model.

Gaussian Mixtures. Linear superposition of Gaussians: $p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$. Normalization and positivity require $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$. Can interpret the mixing coefficients $\pi_k$ as prior probabilities of the components.
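To make the superposition concrete, here is a minimal NumPy/SciPy sketch that evaluates the mixture density $p(x) = \sum_k \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)$; the two-component parameter values are invented purely for illustration and are not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-component mixture in 2-D; parameter values chosen only for illustration.
pi = np.array([0.4, 0.6])                         # mixing coefficients, sum to 1
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = [np.eye(2), 2.0 * np.eye(2)]

def gmm_density(x):
    """p(x) = sum_k pi_k N(x | mu_k, Sigma_k): linear superposition of Gaussians."""
    return sum(pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
               for k in range(len(pi)))

print(gmm_density(np.array([1.0, 1.0])))
```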

Maximum Likelihood for the GMM. The log likelihood function takes the form $\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)$. Note: the sum over components appears inside the log, so there is no closed-form solution for maximum likelihood. The log likelihood is instead maximized by the expectation-maximization (EM) algorithm.
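A small sketch of this log likelihood, with the sum over components sitting inside the log as noted above; the function signature and the parameter layout (arrays `pi`, `mu`, `Sigma`) are my own choices for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pi, mu, Sigma):
    """ln p(X) = sum_n ln( sum_k pi_k N(x_n | mu_k, Sigma_k) ); the sum over k is inside the log."""
    ll = 0.0
    for x_n in X:
        ll += np.log(sum(pi[k] * multivariate_normal.pdf(x_n, mean=mu[k], cov=Sigma[k])
                         for k in range(len(pi))))
    return ll
```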

EM Algorithm, Informal Derivation. The solutions are not closed form since they are coupled. This suggests an iterative scheme for solving them: make initial guesses for the parameters, then alternate between the following two stages: 1. E-step: evaluate responsibilities. 2. M-step: update parameters using the ML results.

General View of the EM Algorithm. Jensen's inequality: let $f$ be a convex function ($f''(x) \ge 0$ for all $x$) and let $X$ be a random variable. Then $E[f(X)] \ge f(E[X])$. Example: $f$ is convex, $X = a$ with probability 0.5 and $X = b$ with probability 0.5. Then $E[X]$ is the midpoint between $a$ and $b$, $E[f(X)]$ is the midpoint between $f(a)$ and $f(b)$, and $E[f(X)] \ge f(E[X])$.

Jensen's inequality (cont.). Further, if $f$ is a strictly convex function ($f''(x) > 0$), then $E[f(X)] = f(E[X])$ holds true if and only if $X = E[X]$ with probability 1, i.e. if $X$ is a constant. Jensen's inequality also holds for concave functions ($f''(x) \le 0$), but with the direction of all the inequalities reversed: $E[f(X)] \le f(E[X])$, etc.
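The two-point example above is easy to check numerically; in the sketch below the convex function $f(x) = x^2$ and the values of $a$ and $b$ are chosen only for illustration.

```python
import numpy as np

# Two-point random variable: X = a or X = b, each with probability 0.5 (illustrative values).
a, b = 1.0, 5.0
f = lambda x: x ** 2                   # convex: f''(x) = 2 > 0

E_X  = 0.5 * a + 0.5 * b               # E[X] is the midpoint between a and b
E_fX = 0.5 * f(a) + 0.5 * f(b)         # E[f(X)] is the midpoint between f(a) and f(b)

print(E_fX, f(E_X))                    # 13.0 >= 9.0, i.e. E[f(X)] >= f(E[X])
assert E_fX >= f(E_X)

# For a concave function such as log, the inequality reverses: E[log X] <= log E[X].
E_logX = 0.5 * np.log(a) + 0.5 * np.log(b)
assert E_logX <= np.log(E_X)
```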

Problem Definition. Suppose we have an estimation problem in which we have a training set $\{x^{(1)}, \ldots, x^{(m)}\}$ of $m$ i.i.d. samples. We wish to fit the parameters $\theta$ of a model $p(x, z)$ to the data, i.e. to maximize the likelihood, $\ell(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)$, where $x$ is observed, $z$ is hidden, and $\theta$ are the parameters. Maximizing this explicitly may be hard, since the $z^{(i)}$ are not observed. If the $z^{(i)}$ were observed, then maximum likelihood estimation would often be easy.

EM at a glance. Our strategy will be to repeatedly construct a lower bound on $\ell$ (E-step), then optimize that lower bound (M-step). [Figure: successive lower bounds on $\ell(\theta)$ evaluated at $\theta_0, \theta_1, \theta_2, \theta_3, \ldots, \theta^*$.]

EM algorithm derivation. For each $i$, let $Q_i$ be some distribution over the $z$'s, with $\sum_z Q_i(z) = 1$ and $Q_i(z) \ge 0$. Then
$\ell(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} Q_i(z^{(i)}) \, \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \ge \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$,
where the last step follows from Jensen's inequality for the concave function $\log$: $\log E[X] \ge E[\log X]$, with the expectation taken over $z^{(i)}$ drawn according to the distribution given by $Q_i$. This is a lower bound on $\ell(\theta)$.

EM algorithm derivation (cont.). We want the lower bound $\sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ to be equal to $\ell(\theta)$ at the previous value of $\theta$.

EM algorithm derivation (cont.). To ensure the bound is tight, we should choose $Q_i$ such that the Jensen inequality in our derivation above holds with equality. This requires the ratio $\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ to be a constant not depending on $z^{(i)}$, i.e. $Q_i(z^{(i)}) \propto p(x^{(i)}, z^{(i)}; \theta)$. Since we know that $\sum_z Q_i(z) = 1$, this gives $Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_{z} p(x^{(i)}, z; \theta)} = p(z^{(i)} \mid x^{(i)}; \theta)$, the posterior of $z^{(i)}$ given $x^{(i)}$.

General EM Algorithm. Repeat until convergence { E-step: for each $i$, set $Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta)$. M-step: set $\theta := \arg\max_\theta \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ (the lower bound on $\ell$). }
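A schematic rendering of this loop, assuming user-supplied `e_step`, `m_step`, and `log_likelihood` callables (the GMM versions are spelled out on the following slides); this is a structural sketch of the general algorithm, not the slides' own code.

```python
import numpy as np

def run_em(X, theta, e_step, m_step, log_likelihood, max_iters=100, tol=1e-6):
    """Generic EM loop: alternate E- and M-steps until the likelihood stops improving.

    e_step(X, theta)  -> Q     : per-point distributions over the hidden variable z
    m_step(X, Q)      -> theta : argmax_theta of sum_i sum_z Q_i(z) log[p(x_i, z; theta)/Q_i(z)]
    log_likelihood(X, theta)   : monitored for convergence (EM never decreases it)
    """
    prev = -np.inf
    for _ in range(max_iters):
        Q = e_step(X, theta)          # E-step: set Q_i(z) = p(z | x_i; theta)
        theta = m_step(X, Q)          # M-step: maximize the lower bound over theta
        ll = log_likelihood(X, theta)
        if ll - prev < tol:
            break
        prev = ll
    return theta
```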

EM for MoG revisited. For $1 \le i \le N$, $1 \le j \le K$, define hidden variables $z_j^{(i)} = 1$ if sample $x^{(i)}$ was generated by component $j$, and 0 otherwise. The $z_j^{(i)}$ are indicator random variables: they indicate which Gaussian component generated sample $x^{(i)}$. Let the indicator r.v. $z^{(i)}$ correspond to sample $x^{(i)}$; we say $z^{(i)} = j$ when its $j$-th coordinate is 1 and the rest are 0. Conditioned on $z^{(i)} = j$, the distribution of $x^{(i)}$ is Gaussian: $x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}(\mu_j, \Sigma_j)$.

EM for MoG revisited. E-step: $\gamma_j^{(i)} = Q_i(z^{(i)} = j) = \frac{\pi_j \, \mathcal{N}(x^{(i)} \mid \mu_j, \Sigma_j)}{\sum_{k=1}^{K} \pi_k \, \mathcal{N}(x^{(i)} \mid \mu_k, \Sigma_k)}$.

EM for MoG revisited. M-step: maximize the lower bound $\sum_{i=1}^{N} \sum_{j=1}^{K} \gamma_j^{(i)} \log \frac{\pi_j \, \mathcal{N}(x^{(i)} \mid \mu_j, \Sigma_j)}{\gamma_j^{(i)}}$ with respect to the parameters. Setting the derivatives to zero gives $\mu_j = \frac{\sum_{i=1}^{N} \gamma_j^{(i)} x^{(i)}}{\sum_{i=1}^{N} \gamma_j^{(i)}}$, and similarly $\Sigma_j = \frac{\sum_{i=1}^{N} \gamma_j^{(i)} (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^T}{\sum_{i=1}^{N} \gamma_j^{(i)}}$ and $\pi_j = \frac{1}{N} \sum_{i=1}^{N} \gamma_j^{(i)}$.
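Putting the E-step and M-step formulas together, the following NumPy/SciPy sketch implements EM for a Gaussian mixture as described above; the function name, the random initialization, and the small regularization added to the covariances are my own choices, not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=50, seed=0):
    """EM for a K-component Gaussian mixture on data X of shape (N, d)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pi = np.full(K, 1.0 / K)                          # mixing coefficients
    mu = X[rng.choice(N, K, replace=False)]           # means initialized to random samples
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])

    for _ in range(n_iters):
        # E-step: responsibilities gamma[i, j] = pi_j N(x_i | mu_j, Sigma_j) / sum_k pi_k N(x_i | mu_k, Sigma_k)
        gamma = np.column_stack([
            pi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])
            for j in range(K)])
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: responsibility-weighted updates of pi_j, mu_j, Sigma_j
        Nj = gamma.sum(axis=0)                        # effective number of points per component
        pi = Nj / N
        mu = (gamma.T @ X) / Nj[:, None]
        for j in range(K):
            diff = X - mu[j]
            Sigma[j] = (gamma[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return pi, mu, Sigma

# Example usage: pi, mu, Sigma = em_gmm(X, K=3)
```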

K-means Algorithm. Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype $\mu_k$. Initialize prototypes, then iterate between two phases: E-step: assign each data point to the nearest prototype. M-step: update prototypes to be the cluster means.

Responsibilities. Responsibilities $r_{nk} \in \{0, 1\}$ assign data points to clusters such that $\sum_k r_{nk} = 1$ for each data point. Example: 5 data points and 3 clusters.

K-means Cost Function. $J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2$, where $x_n$ are the data, $r_{nk}$ the responsibilities, and $\mu_k$ the prototypes.

Minimizing the Cost Function. E-step: minimize $J$ w.r.t. $r_{nk}$; this assigns each data point to the nearest prototype. M-step: minimize $J$ w.r.t. $\mu_k$; this sets each prototype to the mean of the points in that cluster. Convergence is guaranteed since there is a finite number of possible settings for the responsibilities.
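For comparison with the soft EM updates, here is a minimal K-means sketch implementing the two phases above (hard assignment to the nearest prototype, then moving each prototype to its cluster mean); the initialization and convergence test are my own choices for the example.

```python
import numpy as np

def kmeans(X, K, n_iters=50, seed=0):
    """Plain K-means: E-step = hard assignment, M-step = set prototypes to cluster means."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]        # prototypes initialized to random points
    for _ in range(n_iters):
        # E-step: assign each point to the nearest prototype
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # squared distances, shape (N, K)
        r = d2.argmin(axis=1)
        # M-step: move each prototype to the mean of its assigned points
        new_mu = np.array([X[r == k].mean(axis=0) if np.any(r == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):                      # converged: assignments can no longer change
            break
        mu = new_mu
    return mu, r
```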

EM Example. Example from R. Gutierrez-Osuna. Training set of 900 examples forming an annulus. A mixture model with m = 30 Gaussian components of unknown mean and variance is used. Training: Initialization: means set to 30 random examples; covariance matrices initialized to be diagonal, with large variances on the diagonal compared to the training data variance. During EM training, components with small mixing coefficients were trimmed. This is a trick to get a more compact model with fewer than 30 Gaussian components.
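The trimming trick can be sketched as a simple post-processing step that drops components whose mixing coefficient falls below a threshold and renormalizes the rest; the threshold value here is arbitrary and only illustrates the idea.

```python
import numpy as np

def trim_components(pi, mu, Sigma, threshold=1e-3):
    """Drop mixture components with negligible mixing coefficient and renormalize the rest."""
    keep = pi > threshold                   # threshold chosen arbitrarily for illustration
    pi, mu, Sigma = pi[keep], mu[keep], Sigma[keep]
    return pi / pi.sum(), mu, Sigma
```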

EM Example (from R. Gutierrez-Osuna).

EM Motion Segmentation Example. Three frames from the MPEG flower garden sequence. Figure from "Representing Images with Layers" by J. Wang and E. H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.