Expectation Maximization, Mixture Models, HMMs


11-755 Machine Learning for Signal Processing: Mixture Models and HMMs (Class 9)

Learning Distributions for Data
Problem: Given a collection of examples from some data, estimate its distribution. The basic ideas of Maximum Likelihood and MAP estimation can be found in the Aarti/Paris slides pointed to in a previous class. Solution: Assign a model to the distribution and learn the parameters of the model from the data. Models can be arbitrarily complex: mixture densities, hierarchical models. Learning must be done using EM. The following slides give an intuitive explanation using a simple example of multinomials.

A Thought Experiment
A person shoots a loaded die repeatedly. You observe the series of outcomes (e.g. 6 3 5 4 2 4 ...). You can form a good idea of how the die is loaded: figure out what the probabilities of the various numbers are, P(number) = count(number) / total number of rolls. This is a maximum likelihood estimate: the estimate that makes the observed sequence of numbers most probable.

The Multinomial Distribution
A probability distribution over a discrete collection of items is a multinomial: P(X), where X belongs to a discrete set. E.g. the roll of a die, X in {1, 2, 3, 4, 5, 6}, or the toss of a coin, X in {heads, tails}.

Maximum Likelihood Estimation
P(n_1, n_2, n_3, n_4, n_5, n_6) is proportional to p_1^{n_1} p_2^{n_2} p_3^{n_3} p_4^{n_4} p_5^{n_5} p_6^{n_6}. Basic principle: assign a form to the distribution (e.g. a multinomial, or a Gaussian) and find the distribution that best fits the histogram of the data.

Defining "Best Fit"
The data are generated by draws from the distribution, i.e. the generating process draws from the distribution. Assumption: the distribution has a high probability of generating the observed data (not necessarily true). Select the distribution that has the highest probability of generating the data; it should assign lower probability to less frequent observations and vice versa.
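In code, the ML estimate of a multinomial really is just counting. A minimal sketch (the rolls below are a made-up sample, not actual lecture data):

    import numpy as np

    # ML estimate of a loaded die's multinomial parameters: just counting.
    rolls = np.array([6, 3, 5, 4, 2, 4, 4, 4, 6, 3, 2, 2])   # made-up sample
    counts = np.bincount(rolls, minlength=7)[1:]   # counts for faces 1..6
    p_ml = counts / counts.sum()                   # P(number) = count / total rolls
    for face, p in enumerate(p_ml, start=1):
        print(face, round(float(p), 3))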

Maximum Likelihood Estimation: Multinomial
Probability of generating (n_1, n_2, n_3, n_4, n_5, n_6): P(n_1, ..., n_6) = Const * prod_i p_i^{n_i}. Find p_1, ..., p_6 so that this is maximized. Equivalently, maximize log P(n_1, ..., n_6) = log(Const) + sum_i n_i log p_i; log is a monotonic function, so argmax f = argmax log f. Solving for the probabilities (this requires constrained optimization to ensure the probabilities sum to 1) gives us p_i = n_i / sum_j n_j. EVENTUALLY IT'S JUST COUNTING!

Segue: Gaussians
N(x; mu, Theta) = 1/sqrt((2 pi)^d |Theta|) * exp(-0.5 (x - mu)^T Theta^{-1} (x - mu)). Parameters of a Gaussian: mean mu, covariance Theta.

Maximum Likelihood: Gaussian
Given a collection of observations x_1, x_2, ..., estimate the mean mu and covariance Theta: log P(x_1, x_2, ...) = C - 0.5 sum_i [log|Theta| + (x_i - mu)^T Theta^{-1} (x_i - mu)]. Maximizing w.r.t. mu and Theta gives us mu = (1/N) sum_i x_i and Theta = (1/N) sum_i (x_i - mu)(x_i - mu)^T. IT'S STILL JUST COUNTING!

Laplacian
L(x; mu, b) = (1/(2b)) exp(-|x - mu|/b). Parameters: mean mu, scale b (b > 0).

Maximum Likelihood: Laplacian
Given a collection of observations x_1, x_2, ..., estimate mu and b: log P(x_1, x_2, ...) = C - N log b - sum_i |x_i - mu| / b. Maximizing w.r.t. mu and b gives us mu = median(x_1, x_2, ...) and b = (1/N) sum_i |x_i - mu|.

Dirichlet
(Figures, K = 3: densities for alpha = (6,2,2), (3,7,5), (6,2,6), (2,3,4), clockwise from top left, from Wikipedia; and the log of the density as alpha changes from (0.3, 0.3, 0.3) to (2.0, 2.0, 2.0), keeping all the individual alpha's equal to each other.) The parameters are the alpha's; they determine the mode and curvature. Defined only over probability vectors X = [x_1 x_2 ... x_K], sum_i x_i = 1, x_i >= 0 for all i: D(X; alpha) = (Gamma(sum_i alpha_i) / prod_i Gamma(alpha_i)) * prod_i x_i^{alpha_i - 1}.
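Both "counting" estimates are one-liners. A small sketch on synthetic data (all numbers here are made up):

    import numpy as np

    # Gaussian ML: mean = sample mean, covariance = (1/N) sum (x-mu)(x-mu)^T.
    X = np.random.default_rng(0).normal([3.0, -1.0], [1.0, 2.0], size=(500, 2))
    mu = X.mean(axis=0)
    Theta = (X - mu).T @ (X - mu) / len(X)

    # Laplacian ML: mean = median, scale b = mean absolute deviation from it.
    x = np.random.default_rng(1).laplace(2.0, 0.5, size=500)
    mu_lap = np.median(x)
    b_lap = np.abs(x - mu_lap).mean()
    print(mu, Theta, mu_lap, b_lap)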

Maximum Likelihood: Dirichlet
Given a collection of observations X_1, X_2, ..., estimate alpha: log P(X_1, X_2, ...; alpha) = sum_i sum_j (alpha_j - 1) log x_{i,j} - N sum_j log Gamma(alpha_j) + N log Gamma(sum_j alpha_j). There is no closed-form solution for the alpha's; this needs gradient ascent. Several distributions have this property: the ML estimates of their parameters have no closed-form solution.

Continuing the Thought Experiment
Two persons shoot loaded dice repeatedly. The dice are differently loaded for the two of them. We observe the series of outcomes for both persons. How do we determine the probability distributions of the two dice?

Estimating Probabilities
Observation: the sequence of numbers from the two dice, e.g. 6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6. As indicated by the colors, we know who rolled what number. Segregation: separate the blue observations (6 5 2 4 2 3 6 ...) from the red (4 3 5 2 4 4 2 6 ...). From each set, compute probabilities for each of the 6 possible outcomes: P(number) = (no. of times the number was rolled) / (total number of observed rolls). (Two bar charts over faces 1-6, y-axis 0 to 0.3: the estimated distributions of the two dice.)

A Thought Experiment (continued)
Now imagine that you cannot observe the dice yourself. Instead, there is a caller who randomly calls out the outcomes: 40% of the time he calls out the number from the left shooter, and 60% of the time the one from the right, and you know this. At any time, you do not know which of the two he is calling out. How do you determine the probability distributions for the two dice?
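For contrast with the closed-form cases, here is a bare-bones gradient-ascent sketch for the Dirichlet ML estimate. The step size and iteration count are arbitrary choices, and a practical implementation would use Minka's fixed-point or Newton updates instead:

    import numpy as np
    from scipy.special import digamma

    def dirichlet_ml(X, iters=5000, lr=0.05):
        N, K = X.shape
        alpha = np.ones(K)                       # crude initialization
        mean_log_x = np.log(X).mean(axis=0)      # sufficient statistics
        for _ in range(iters):
            # gradient of the per-observation average log-likelihood
            grad = digamma(alpha.sum()) - digamma(alpha) + mean_log_x
            alpha = np.maximum(alpha + lr * grad, 1e-6)   # keep alpha > 0
        return alpha

    X = np.random.default_rng(0).dirichlet([3.0, 7.0, 5.0], size=1000)
    print(dirichlet_ml(X))   # should land near (3, 7, 5)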

A Mixture Multinomial
The caller will call out a number x in any given callout IF he selects RED and the red die rolls the number x, OR he selects BLUE and the blue die rolls the number x: P(x) = P(Red) P(x|Red) + P(Blue) P(x|Blue). E.g. P(6) = P(Red) P(6|Red) + P(Blue) P(6|Blue). How do you now determine the probability distributions for the two sets of dice, if you do not even know what fraction of the time the blue numbers are called, and what fraction are red? A distribution that combines (or mixes) multiple multinomials is a mixture multinomial: P(x) = sum_z P(z) P(x|z), with mixture weights P(z) and component multinomials P(x|z).

Mixture Distributions
P(x) = sum_z P(z) P(x|z): mixture weights and component distributions. A mixture Gaussian: P(x) = sum_z P(z) N(x; mu_z, Theta_z). A mixture of Gaussians and Laplacians: P(x) = sum_{z in Gaussians} P(z) N(x; mu_z, Theta_z) + sum_{z in Laplacians} P(z) L(x; mu_z, b_z). Mixture distributions mix several component distributions; the component distributions may be of varied types. The mixing weights must sum to 1.0, and the component distributions each integrate to 1.0, so the mixture distribution integrates to 1.0.

Maximum Likelihood Estimation
For our problem, z = the color of the dice: P(n_1, n_2, n_3, n_4, n_5, n_6) = Const * prod_x (sum_z P(z) P(x|z))^{n_x}. The maximum likelihood solution maximizes log P(n_1, ..., n_6) = log(Const) + sum_x n_x log(sum_z P(z) P(x|z)). There is no closed-form solution: a summation sits inside the log! In general, ML estimates for mixtures do not have a closed form. USE EM!

The Expectation Maximization Algorithm
It is possible to estimate all parameters in this setup using the Expectation Maximization (or EM) algorithm, first described in a landmark paper by Dempster, Laird and Rubin, "Maximum Likelihood Estimation from Incomplete Data, via the EM Algorithm", Journal of the Royal Statistical Society, Series B, 1977. There has been much work on the algorithm since then; the principles behind it existed for several years prior to the landmark paper, however.

EM is an iterative solution. Get some initial estimates for all parameters; in the dice-shooter example, this includes the probability distributions for the dice AND the probability with which the caller selects the dice. Two steps are then iterated. Expectation step: statistically estimate the values of the unseen variables. Maximization step: using the estimated values of the unseen variables as truth, obtain estimates of the model parameters.
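To make the "summation inside the log" point concrete, here is a sketch that merely evaluates the mixture-multinomial log-likelihood. The dice probabilities are made up; the face counts are those of the 18-call example coming up:

    import numpy as np

    # A sum over z sits INSIDE the log, so no closed-form maximizer exists.
    counts = np.array([3, 4, 2, 4, 2, 3])        # n_x for faces 1..6
    P_z = np.array([0.5, 0.5])                   # P(z): caller picks red / blue
    P_x_z = np.array([[0.05, 0.10, 0.10, 0.05, 0.10, 0.60],   # P(x|red), made up
                      [0.15, 0.30, 0.10, 0.25, 0.10, 0.10]])  # P(x|blue), made up
    mix = P_z @ P_x_z                            # P(x) = sum_z P(z) P(x|z)
    log_lik = counts @ np.log(mix)               # up to the multinomial constant
    print(log_lik)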

EM: The Auxiliary Function
EM iteratively optimizes the following auxiliary function: Q(theta, theta') = sum_z P(z | x, theta') log P(z, x | theta), where z are the unseen variables (assumed discrete here, though they may not be), theta' are the parameter estimates from the previous iteration, and theta are the estimates to be obtained in the current iteration.

EM as Counting
An instance may come from the blue die or from the red die; for each called number the die is unknown. The hidden variable z is the identity of the die whose number has been called out. If we knew z for every observation, we could estimate all terms simply by adding each observation to the right bin (the collection of blue numbers or the collection of red numbers). Unfortunately, we do not know z; it is hidden from us! Solution: FRAGMENT THE OBSERVATION.

Fragmenting the Observation
EM is an iterative algorithm; at each step there is a current estimate of the parameters. The size of each fragment is proportional to the a posteriori probability of the component distribution. The a posteriori probabilities of the various values of z are computed using Bayes' rule: P(z|x) = C P(x|z) P(z). Every die gets a fragment of size P(die | number).

Hypothetical Dice Shooter Example
We obtain an initial estimate for the probability distributions of the two sets of dice somehow (bar charts over faces 1-6 for blue and red; in particular P(4|blue) = 0.1 and P(4|red) = 0.05), and an initial estimate of the probability with which the caller calls out the two shooters: P(red) = P(blue) = 0.5. Suppose the caller has just called out 4. The posterior probabilities of the colors are P(red|4) = C * P(4|red) P(red) = C * 0.05 * 0.5 = 0.025 C, and P(blue|4) = C * P(4|blue) P(blue) = C * 0.1 * 0.5 = 0.05 C. Normalizing: P(red|4) = 0.33, P(blue|4) = 0.67.
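The same Bayes-rule computation in a few lines of Python, reproducing the 0.33/0.67 split:

    # Fragment sizes: P(z|4) is proportional to P(4|z) P(z), then normalize.
    P_z = {"red": 0.5, "blue": 0.5}
    P_4_given_z = {"red": 0.05, "blue": 0.10}
    unnorm = {z: P_4_given_z[z] * P_z[z] for z in P_z}   # {red: 0.025, blue: 0.05}
    total = sum(unnorm.values())
    post = {z: v / total for z, v in unnorm.items()}
    print(post)   # {'red': 0.333..., 'blue': 0.666...}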

Every observed roll of the die contributes to both Red and Blue. For the called sequence 6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6, with the initial estimates above, the fragments are 6 -> (red 0.8, blue 0.2), 4 -> (0.33, 0.67), 5 -> (0.33, 0.67), 1 -> (0.57, 0.43), 2 -> (0.14, 0.86), 3 -> (0.33, 0.67), and so on for every call:

    Called   red    blue
      6      .8     .2
      4      .33    .67
      5      .33    .67
      1      .57    .43
      2      .14    .86
      3      .33    .67
      4      .33    .67
      5      .33    .67
      2      .14    .86
      2      .14    .86
      1      .57    .43
      4      .33    .67
      3      .33    .67
      4      .33    .67
      6      .8     .2
      2      .14    .86
      1      .57    .43
      6      .8     .2
    Total    7.31   10.69

The total count for Red is the sum of all the posterior probabilities in the red column: 7.31. The total count for Blue is the sum of all the posterior probabilities in the blue column: 10.69. Note: 7.31 + 10.69 = 18, the total number of instances.
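A short sketch that accumulates the two columns of this table and reproduces the totals, the per-face sums listed next, and the updated probabilities (sequence and posteriors are the values from the table above):

    import numpy as np

    seq = [6, 4, 5, 1, 2, 3, 4, 5, 2, 2, 1, 4, 3, 4, 6, 2, 1, 6]
    post_red = {1: 0.57, 2: 0.14, 3: 0.33, 4: 0.33, 5: 0.33, 6: 0.80}

    red = np.zeros(7)                     # per-face fragment mass, index 1..6
    blue = np.zeros(7)
    for x in seq:
        red[x] += post_red[x]
        blue[x] += 1.0 - post_red[x]

    print(red[1:], red.sum())             # per-face red counts; total 7.31
    print(blue[1:], blue.sum())           # per-face blue counts; total 10.69
    print(red[1:] / red.sum())            # updated P(number|Red)
    print(red.sum() / len(seq))           # updated P(Z=Red), about 0.41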

Summing the red column by face gives the per-number counts for Red: total count for 1: 1.71; for 2: 0.56; for 3: 0.66; for 4: 1.32; for 5: 0.66; for 6: 2.4. (These sum to the total count for Red, 7.31.)

Updated probability of the Red die: P(1|Red) = 1.71/7.31 = 0.234; P(2|Red) = 0.56/7.31 = 0.077; P(3|Red) = 0.66/7.31 = 0.090; P(4|Red) = 1.32/7.31 = 0.181; P(5|Red) = 0.66/7.31 = 0.090; P(6|Red) = 2.4/7.31 = 0.328.

Similarly, summing the blue column by face: total count for 1: 1.29; for 2: 3.44; for 3: 1.34; for 4: 2.68; for 5: 1.34; for 6: 0.6. (These sum to the total count for Blue, 10.69.)

Updated probability of the Blue die: P(1|Blue) = 1.29/10.69 = 0.121; P(2|Blue) = 3.44/10.69 = 0.322; P(3|Blue) = 1.34/10.69 = 0.125; P(4|Blue) = 2.68/10.69 = 0.251; P(5|Blue) = 1.34/10.69 = 0.125; P(6|Blue) = 0.6/10.69 = 0.056.

Total count for Red: 7.31; total count for Blue: 10.69; total instances = 18 (note 7.31 + 10.69 = 18). We also revise our estimate of the probability that the caller calls out Red or Blue, i.e. the fraction of times he calls each: P(Z=Red) = 7.31/18 = 0.41 and P(Z=Blue) = 10.69/18 = 0.59.

The updated values: P(number|Red), P(number|Blue), P(Z=Red) and P(Z=Blue) as computed above. THE UPDATED VALUES CAN BE USED TO REPEAT THE PROCESS: ESTIMATION IS AN ITERATIVE PROCESS.

The Dice Shooter Example
1. Initialize P(Z) and P(number|Z).
2. Estimate P(Z|number) for each Z, for each called-out number; associate the number with each value of Z, with weight P(Z|number).
3. Re-estimate P(number|Z) for every value of number and Z.
4. Re-estimate P(Z).
5. If not converged, return to 2.

In Squiggles
Given a sequence of observations O_1, O_2, ..., let N_O be the number of observations of number O. Initialize P(Z) and P(O|Z) for dice Z and numbers O. Then iterate. For each number O: P(Z|O) = P(O|Z) P(Z) / sum_{Z'} P(O|Z') P(Z'). Update: P(O|Z) = N_O P(Z|O) / sum_{O'} N_{O'} P(Z|O'), and P(Z) = sum_O N_O P(Z|O) / sum_{Z'} sum_O N_O P(Z'|O).
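Putting the whole loop together: a minimal sketch of these update equations for the dice-and-caller problem. Initialization is random, so, as the next slide notes, different runs may land on different but equally valid solutions:

    import numpy as np

    def em_dice(counts, n_dice=2, iters=200, seed=0):
        counts = np.asarray(counts, dtype=float)      # N_O for each face O
        rng = np.random.default_rng(seed)
        P_z = np.full(n_dice, 1.0 / n_dice)           # P(Z), uniform init
        P_o_z = rng.dirichlet(np.ones(len(counts)), size=n_dice)  # P(O|Z)
        for _ in range(iters):
            # E-step: P(Z|O) via Bayes' rule, one column per face
            joint = P_z[:, None] * P_o_z              # shape (n_dice, faces)
            P_z_o = joint / joint.sum(axis=0, keepdims=True)
            # M-step: fragment the N_O observations of each face
            frag = P_z_o * counts                     # fragment mass per (Z, O)
            P_o_z = frag / frag.sum(axis=1, keepdims=True)
            P_z = frag.sum(axis=1) / counts.sum()
        return P_z, P_o_z

    counts = [3, 4, 2, 4, 2, 3]                       # the 18 calls of the example
    print(em_dice(counts))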

Solutions May Not Be Unique
The EM algorithm will give us one of many solutions, all equally valid! For instance, the probability of 6 being called out is P(6) = alpha P_r(6) + (1 - alpha) P_b(6), which assigns P_r(6) as the probability of 6 for the red die and P_b(6) as the probability of 6 for the blue die. But the following is also a valid solution: P(6) = 1.0 * (alpha P_r(6) + (1 - alpha) P_b(6)) + 0.0 * anything, which assigns 1.0 as the a priori probability of the red die and 0.0 as the probability of the blue die. The solution is NOT unique.

A More Complex Model: Gaussian Mixtures
P(x) = sum_k P(k) N(x; mu_k, Theta_k), with N(x; mu, Theta) = 1/sqrt((2 pi)^d |Theta|) * exp(-0.5 (x - mu)^T Theta^{-1} (x - mu)). Gaussian mixtures are often good models for the distribution of multivariate data. Problem: estimating the parameters, given a collection of data.

Gaussian Mixtures: Generating Model
The caller now has two Gaussians. At each draw he randomly selects a Gaussian according to the mixture weight distribution, then draws an observation from that Gaussian. This is much like the dice problem, only the outcomes are now real-valued and can be anything, e.g. the sequence 6.1, 1.4, 5.3, 1.9, 4.2, 2.2, 4.9, 0.5.

Estimating a GMM with Complete Information
Observation: a collection of numbers drawn from a mixture of 2 Gaussians. As indicated by the colors, we know which Gaussian generated what number. Segregation: separate the red observations (e.g. 6.1, 5.3, 4.2, 4.9, ...) from the blue (e.g. 1.4, 1.9, 2.2, 0.5, ...). From each set, compute the parameters for that Gaussian: mu_red = (1/N_red) sum_{i in red} x_i and Theta_red = (1/N_red) sum_{i in red} (x_i - mu_red)(x_i - mu_red)^T, and likewise for blue.

Fragmenting the Observation
Here the identity of the Gaussian is not known! Solution: fragment the observation (e.g. the call 4.2 is split between the two Gaussians), with fragment size proportional to the a posteriori probability P(k|x) = P(k) P(x; mu_k, Theta_k) / sum_{k'} P(k') P(x; mu_{k'}, Theta_{k'}). Initialize P(k), mu_k and Theta_k for both Gaussians; how we do this is important. A typical solution: initialize the means randomly, set Theta_k to the global covariance of the data, and set P(k) uniformly. Then compute the fragment sizes for each Gaussian, for each observation:

    Number   red    blue
     6.1     .81    .19
     1.4     .33    .67
     5.3     .75    .25
     1.9     .41    .59
     4.2     .64    .36
     2.2     .43    .57
     4.9     .66    .34
     0.5     .05    .95
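A sketch of this fragment-size computation for the one-dimensional example. The initial means and variances below are stand-ins (the slides only say to initialize "somehow"), so the posteriors will not exactly match the table:

    import numpy as np
    from scipy.stats import norm

    x = np.array([6.1, 1.4, 5.3, 1.9, 4.2, 2.2, 4.9, 0.5])
    mu = np.array([5.0, 1.5])               # assumed initial means
    sd = np.full(2, x.std())                # global std for both components
    w = np.array([0.5, 0.5])                # uniform mixture weights

    like = w * norm.pdf(x[:, None], mu, sd)          # shape (8, 2)
    post = like / like.sum(axis=1, keepdims=True)    # P(k|x): fragment sizes
    print(np.round(post, 2))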

Each observation contributes only as much as its fragment size to each statistic:
mean(red) = (6.1*0.81 + 1.4*0.33 + 5.3*0.75 + 1.9*0.41 + 4.2*0.64 + 2.2*0.43 + 4.9*0.66 + 0.5*0.05) / (0.81 + 0.33 + 0.75 + 0.41 + 0.64 + 0.43 + 0.66 + 0.05) = 17.05 / 4.08 = 4.18
var(red) = ((6.1-4.18)^2*0.81 + (1.4-4.18)^2*0.33 + (5.3-4.18)^2*0.75 + (1.9-4.18)^2*0.41 + (4.2-4.18)^2*0.64 + (2.2-4.18)^2*0.43 + (4.9-4.18)^2*0.66 + (0.5-4.18)^2*0.05) / 4.08
(The red column of the fragment table sums to 4.08, the blue column to 3.92.)

EM for Gaussian Mixtures
1. Initialize P(k), mu_k and Theta_k for all Gaussians.
2. For each observation x, compute the a posteriori probabilities for all Gaussians: P(k|x) = P(x; mu_k, Theta_k) P(k) / sum_{k'} P(x; mu_{k'}, Theta_{k'}) P(k').
3. Update the mixture weights, means and variances for all Gaussians: P(k) = (1/N) sum_x P(k|x); mu_k = sum_x P(k|x) x / sum_x P(k|x); Theta_k = sum_x P(k|x) (x - mu_k)(x - mu_k)^T / sum_x P(k|x).
4. If not converged, return to 2.

EM Estimation of Gaussian Mixtures: An Example
(Figures: a histogram of 4000 instances of randomly generated data; the individual parameters of a two-Gaussian mixture estimated by EM; the two-Gaussian mixture estimated by EM.)

The same principle can be extended to mixtures of other distributions. E.g. for a mixture of Laplacians, the Laplacian parameters become mu_k = the weighted median of the data (with weights P(k|x)) and b_k = sum_x P(k|x) |x - mu_k| / sum_x P(k|x). In a mixture of Gaussians and Laplacians, the Gaussians use the Gaussian update rules and the Laplacians use the Laplacian rules.

The EM algorithm is used whenever proper statistical analysis of a phenomenon requires the knowledge of a hidden or missing variable, or a set of hidden/missing variables; the hidden variable is often called a latent variable. Some examples: estimating mixtures of distributions, where only the data are observed and the individual distributions and mixing proportions must both be learnt; estimating the distribution of data when some attributes are missing; and estimating the dynamics of a system based only on observations that may be a complex function of the system state.

Solve this problem: A caller rolls a die and flips a coin. He calls out the number rolled if the coin shows heads; otherwise he calls the number + 1. Determine P(heads) and P(number) for the die from a collection of outputs. Another: a caller rolls two dice and calls out the sum. Determine the dice distributions from a collection of outputs.

The Dice and the Coin
Unknown: whether each call was a head or a tail. Each called number fragments into a heads count and a tails count; e.g. a called 4 contributes P(heads|4) to the heads count of 4 and P(tails|4) to the tails count of 3, since a called 4 could be a rolled 4 with heads or a rolled 3 with tails.

The Two Dice
Unknown: how to partition the called number. E.g. a called 4 could arise as (1,3), (2,2) or (3,1), so for the first die: Count(3) += P((3,1)|4); Count(2) += P((2,2)|4); Count(1) += P((1,3)|4); and similarly for the other die.

Fragmentation Can Be Hierarchical
P(x) = sum_k P(k) sum_j P(j|k) P(x|k,j), e.g. a mixture of mixtures: fragments are further fragmented. Work this out.

More Later
We will see a couple of other instances of the use of EM. Work out HMM training: assume the state output distributions are multinomials; assume they are Gaussians; assume they are Gaussian mixtures.
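Finally, steps 1-4 of the GMM recipe as runnable code for a one-dimensional mixture, on synthetic data. This is a sketch only: no convergence test, and only a tiny epsilon to guard against collapsing variances:

    import numpy as np
    from scipy.stats import norm

    def em_gmm_1d(x, K=2, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        mu = rng.choice(x, size=K, replace=False)   # random means
        var = np.full(K, x.var())                   # global variance
        w = np.full(K, 1.0 / K)                     # uniform weights
        for _ in range(iters):
            # E-step: a posteriori probabilities (fragment sizes)
            like = w * norm.pdf(x[:, None], mu, np.sqrt(var))
            post = like / like.sum(axis=1, keepdims=True)
            # M-step: weighted counting
            Nk = post.sum(axis=0)
            w = Nk / len(x)
            mu = (post * x[:, None]).sum(axis=0) / Nk
            var = (post * (x[:, None] - mu) ** 2).sum(axis=0) / Nk + 1e-9
        return w, mu, var

    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(4.5, 1.0, 2000), rng.normal(1.5, 0.7, 2000)])
    print(em_gmm_1d(x))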

The dce and the con Heads or tal? The two dce 4 Tals count Heads count 4 3.. 4. 3 3, 4 2,2,3 Unnown: Whether t was head or tals Unnown: How to partton the number Count blue 3 += 3, 4 Count blue 2 += 2,2 4 Count blue +=,3 4 67 68 Fragmentaton can be herarchcal P, 2 2 3 4 More later Wll see a couple of other nstances of the use of EM Wor out HMM tranng Assume state output dstrbutons are multnomals Assume they are Gaussan Assume Gaussan mtures E.g. mture of mtures Fragments are further fragmented.. Wor ths out 69 70 2