Expectation Maximization Mixture Models


11-755 Machine Learning for Signal Processing. Expectation Maximization and Mixture Models. Class 10, 2 Oct.

Understanding (and Predicting) Data: Many different data streams around us. We process, understand and respond. What is the response based on? The data we observed, and the underlying characteristics that we inferred, modeled using latent variables.

Examples. Stock market (from Yahoo! Finance): market sentiment as a latent variable? Sports: what skills in players should be valued? Sidenote, for anyone interested: "Baseball as a Markov Chain".

Examples: Many audio applications use latent variables: signal separation, voice modification, music analysis, music and speech generation.

A Strange Observation: [Figure: pitch (Hz) versus year (AD) for songs by female Indian playback singers.] Shamshad Begum, "Patanga": peak 310 Hz, mean 278 Hz. Lata Mangeshkar, "Anupama": peak 570 Hz, mean 410 Hz. Alka Yagnik, "Dil Ka Rishta": peak 740 Hz, mean 580 Hz. The pitch of female Indian playback singers is on an ever-increasing trajectory.

Comments on the high-pitched singing: Sarah McDonald (Holy Cow): "...shrieking". Khazana.com: "...female Indian movie playback singers who can produce ultra high frequencies which only dogs can hear clearly". "High pitched female singers doing their best to sound like they were seven years old..."

A Disturbing Observation: [Same figure, with reference lines marking the pitch at which glass shatters and the average female talking pitch.]

Let's Fix the Song: The pitch is unpleasant, but the melody isn't bad. Modify the pitch, but retain the melody. Problem: we cannot just shift the pitch of the recording; that would destroy the music. The music is fine, leave it alone. Modify the singing pitch without affecting the music.

Personalizing the Song: Separate the vocals from the background music; modify the separated vocals, keep the music unchanged. The separation need not be perfect: it must only be sufficient to enable pitch modification of the vocals. Pitch modification is tolerant of low-level artifacts; for octave-level pitch modification the artifacts can be undetectable.

Separation example: "Dayya Dayya" original (only vocalized regions); "Dayya Dayya" separated music; "Dayya Dayya" separated vocals.

Some examples: Example 1: vocals shifted down by 4 semitones. Example 2: gender of the singer partially modified.

Techniques Employed: Signal separation: employed a simple latent-variable based separation method. Voice modification: equally simple techniques. We will consider the underlying methods over the next few lectures. Extensive use of Expectation Maximization.

Learning Distributions for Data: Problem: given a collection of examples from some data, estimate its distribution. (The basic ideas of Maximum Likelihood and MAP estimation can be found in the Aarti/Paris slides pointed to in a previous class.) Solution: assign a model to the distribution and learn the parameters of the model from the data. Models can be arbitrarily complex: mixture densities, hierarchical models, etc. Learning can be done using Expectation Maximization.

A Thought Experiment: A person shoots a loaded die repeatedly; you observe the series of outcomes. You can form a good idea of how the die is loaded: figure out what the probabilities of the various numbers are for the die: P(number) = count(number) / sum(rolls). This is a maximum likelihood estimate: the estimate that makes the observed sequence of numbers most probable.

The Generative Model: The data are generated by draws from the distribution, i.e. the generating process draws from the distribution. Assumption: the distribution has a high probability of generating the observed data. Not necessarily true. Select the distribution that has the highest probability of generating the data: it should assign lower probability to less frequent observations and vice versa.

The Multinomial Distribution: A probability distribution over a discrete collection of items is a multinomial: $P(X : X \text{ belongs to a discrete set})$. E.g. the roll of a die: $X \in (1,2,3,4,5,6)$; or the toss of a coin: $X \in (\text{heads}, \text{tails})$.

Maximum Likelihood Estimation: Multinomial. The probability of generating the counts $(n_1, n_2, n_3, n_4, n_5, n_6)$ is
$$P(n_1,\ldots,n_6) = \mathrm{Const}\,\prod_i p_i^{\,n_i}.$$
Find $p_1,\ldots,p_6$ so that the above is maximized. Alternately, maximize
$$\log P(n_1,\ldots,n_6) = \log(\mathrm{Const}) + \sum_i n_i \log p_i$$
(log is a monotonic function, so $\arg\max_x f(x) = \arg\max_x \log f(x)$). Solving for the probabilities, with constrained optimization to ensure the probabilities sum to 1, gives
$$p_i = \frac{n_i}{\sum_j n_j}.$$
EVENTUALLY IT'S JUST COUNTING!

Segue: Gaussians.
$$N(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}}\,\exp\!\big(-0.5\,(x-\mu)^T \Sigma^{-1} (x-\mu)\big)$$
Parameters of a Gaussian: mean $\mu$, covariance $\Sigma$.

Maximum Likelihood: Gaussian. Given a collection of observations $(x_1, x_2, \ldots, x_N)$, estimate the mean $\mu$ and covariance $\Sigma$:
$$\log P(x_1,\ldots,x_N) = C - 0.5\sum_i \big(\log|\Sigma| + (x_i-\mu)^T \Sigma^{-1} (x_i-\mu)\big).$$
Maximizing w.r.t. $\mu$ and $\Sigma$ gives
$$\mu = \frac{1}{N}\sum_i x_i, \qquad \Sigma = \frac{1}{N}\sum_i (x_i-\mu)(x_i-\mu)^T.$$
IT'S STILL JUST COUNTING!
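To make the "it's just counting" point concrete, here is a minimal Python sketch (assuming numpy; the loaded-die probabilities, sample sizes and seed are hypothetical illustrations, not values from the lecture). The multinomial ML estimate is a normalized histogram, and the Gaussian ML estimate is a pair of sample moments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loaded die: faces 1..6 with these true probabilities.
true_p = np.array([0.25, 0.05, 0.10, 0.20, 0.10, 0.30])
rolls = rng.choice(np.arange(1, 7), size=10000, p=true_p)

# Multinomial ML estimate: just counting.
counts = np.bincount(rolls, minlength=7)[1:]   # counts of faces 1..6
p_hat = counts / counts.sum()                  # p_i = n_i / sum_j n_j
print("multinomial ML:", np.round(p_hat, 3))

# Gaussian ML estimate: also just counting (sample mean and variance).
x = rng.normal(loc=3.0, scale=2.0, size=10000)
mu_hat = x.mean()                              # mu = (1/N) sum_i x_i
var_hat = ((x - mu_hat) ** 2).mean()           # Sigma = (1/N) sum_i (x_i - mu)^2
print("Gaussian ML: mean=%.3f var=%.3f" % (mu_hat, var_hat))
```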

Laplacian:
$$L(x; \mu, b) = \frac{1}{2b}\exp\!\Big(-\frac{|x-\mu|}{b}\Big)$$
Parameters: mean $\mu$, scale $b$ ($b > 0$).

Maximum Likelihood: Laplacian. Given a collection of observations $(x_1, x_2, \ldots, x_N)$, estimate the mean $\mu$ and scale $b$:
$$\log P(x_1,\ldots,x_N) = C - N\log b - \sum_i \frac{|x_i - \mu|}{b}.$$
Maximizing w.r.t. $\mu$ and $b$ gives
$$\mu = \mathrm{median}(\{x_i\}), \qquad b = \frac{1}{N}\sum_i |x_i - \mu|.$$

Dirichlet: [Figure: Dirichlet densities for K = 3. Clockwise from top left: $\alpha = (6,2,2), (3,7,5), (6,2,6), (2,3,4)$ (from Wikipedia); and the log of the density as $\alpha$ changes from $(0.3, 0.3, 0.3)$ to $(2.0, 2.0, 2.0)$, keeping all the individual $\alpha$'s equal to each other.] The parameters $\alpha$ determine the mode and curvature. Defined only on probability vectors $x = [x_1\ x_2 \ldots x_K]$, with $\sum_i x_i = 1$ and $x_i \geq 0$ for all $i$:
$$D(x; \alpha) = \frac{\Gamma\big(\sum_i \alpha_i\big)}{\prod_i \Gamma(\alpha_i)}\,\prod_i x_i^{\alpha_i - 1}.$$

Maximum Likelihood: Dirichlet. Given a collection of observations $(x_1, \ldots, x_N)$, estimate $\alpha$:
$$\log P(x_1,\ldots,x_N) = N\Big(\log\Gamma\big(\textstyle\sum_j \alpha_j\big) - \sum_j \log\Gamma(\alpha_j)\Big) + \sum_j (\alpha_j - 1)\sum_i \log x_{i,j}.$$
There is no closed-form solution for the $\alpha$'s; this needs gradient ascent. Several distributions have this property: the ML estimates of their parameters have no closed-form solution.

Continuing the Thought Experiment: Two persons shoot loaded dice repeatedly. The dice are differently loaded for the two of them. We observe the series of outcomes for both persons. How do we determine the probability distributions of the two dice?

Estimating Probabilities: Observation: the sequence of numbers from the two dice. As indicated by the colors, we know who rolled what number.
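With the colors observed, estimation is exactly the segregate-and-count procedure spelled out on the next slide. A minimal sketch with a hypothetical handful of (color, number) observations (assuming numpy):

```python
import numpy as np

# Hypothetical observed (color, number) pairs: we know who rolled what.
observations = [("red", 6), ("blue", 2), ("red", 6), ("blue", 4), ("red", 1),
                ("blue", 2), ("red", 4), ("blue", 3), ("red", 6), ("blue", 2)]

for color in ("red", "blue"):
    nums = [n for c, n in observations if c == color]      # segregation
    counts = np.bincount(nums, minlength=7)[1:]
    # P(number | color) = (# times number was rolled) / (total rolls by shooter)
    print(color, np.round(counts / counts.sum(), 3))
```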

Estimating Probabilities: Observation: the sequence of numbers from the two dice. As indicated by the colors, we know who rolled what number. Segregation: separate the blue observations from the red. From each set, compute the probabilities of each of the 6 possible outcomes:
$$P(\text{number}) = \frac{\text{no. of times the number was rolled}}{\text{total number of observed rolls}}$$

A Thought Experiment: Now imagine that you cannot observe the dice yourself. Instead there is a caller who randomly calls out the outcomes: 40% of the time he calls out the number from the left shooter, and 60% of the time the one from the right (and you know this). At any time, you do not know which of the two he is calling out. How do you determine the probability distributions for the two dice? And how do you determine the probability distributions for the two sets of dice if you do not even know what fraction of the time the blue numbers are called, and what fraction are red?

A Mixture Multinomial: The caller will call out a number X in any given callout IF he selects RED and the red die rolls the number, OR he selects BLUE and the blue die rolls the number:
$$P(X) = P(X|\text{Red})P(\text{Red}) + P(X|\text{Blue})P(\text{Blue}),$$
e.g. $P(6) = P(6|\text{Red})P(\text{Red}) + P(6|\text{Blue})P(\text{Blue})$. A distribution that combines (or mixes) multiple multinomials is a mixture multinomial:
$$P(X) = \sum_Z P(Z)\,P(X|Z)$$
with mixture weights $P(Z)$ and component multinomials $P(X|Z)$.

Mixture Distributions: $P(X) = \sum_Z P(Z)\,P(X|Z)$, with mixture weights and component distributions. Mixture Gaussian: $P(X) = \sum_k P(k)\,N(X; \mu_k, \Sigma_k)$. Mixture distributions mix several component distributions; the component distributions may be of varied types. The mixing weights must sum to 1.0 and the component distributions integrate to 1.0, so the mixture distribution integrates to 1.0.
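A minimal sketch of a mixture multinomial and its generating model (assuming numpy; the 0.4/0.6 weights follow the caller story above, but the two die distributions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters: caller picks red 40% of the time, blue 60%.
P_Z = {"red": 0.4, "blue": 0.6}
P_X_given_Z = {"red":  np.array([0.25, 0.05, 0.10, 0.20, 0.10, 0.30]),
               "blue": np.array([0.10, 0.30, 0.15, 0.25, 0.15, 0.05])}

# Mixture multinomial: P(X) = sum_Z P(Z) P(X|Z)
P_X = sum(P_Z[z] * P_X_given_Z[z] for z in P_Z)
print("P(X):", np.round(P_X, 3), "sums to", P_X.sum())

# Generating model: pick a die by the mixture weights, then roll it.
def call_out():
    z = rng.choice(list(P_Z), p=list(P_Z.values()))
    return rng.choice(np.arange(1, 7), p=P_X_given_Z[z])

print("ten callouts:", [int(call_out()) for _ in range(10)])
```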

Maximum Likelihood Estimation: For our problem, Z = the color of the die, and $P(X) = \sum_Z P(Z)\,P(X|Z)$. The maximum likelihood solution maximizes
$$\log P(n_1,\ldots,n_6) = \log(\mathrm{Const}) + \sum_X n_X \log\Big(\sum_Z P(Z)\,P(X|Z)\Big).$$
There is no closed-form solution (summation inside the log!). In general, ML estimates for mixtures do not have a closed form. USE EM!

Expectation Maximization: It is possible to estimate all parameters in this setup using the Expectation Maximization (or EM) algorithm, first described in a landmark paper by Dempster, Laird and Rubin: "Maximum Likelihood Estimation from incomplete data, via the EM Algorithm", Journal of the Royal Statistical Society, Series B, 1977. There has been much work on the algorithm since then. The principles behind the algorithm existed for several years prior to the landmark paper, however.

Expectation Maximization: An iterative solution. Get some initial estimates for all parameters; in the dice shooter example this includes the probability distributions for the dice AND the probability with which the caller selects the dice. Two steps are then iterated. Expectation step: estimate, statistically, the values of the unseen variables. Maximization step: using the estimated values of the unseen variables as truth, obtain updated estimates of the model parameters.

EM: The auxiliary function. EM iteratively optimizes the following auxiliary function:
$$Q(\theta, \theta') = \sum_Z P(Z|X, \theta')\,\log P(Z, X \mid \theta)$$
where Z are the unseen variables (assumed discrete here, which may not hold in general), $\theta'$ are the parameter estimates from the previous iteration, and $\theta$ are the estimates to be obtained in the current iteration.

Expectation Maximization as counting: The hidden variable is Z, the identity of the die whose number has been called out. If we knew Z for every observation, we could estimate all terms by adding each observation to the right bin (the collection of blue numbers or the collection of red numbers). Unfortunately, we do not know Z: it is hidden from us! Solution: FRAGMENT THE OBSERVATION.

Fragmenting the Observation: EM is an iterative algorithm; at each point there is a current estimate of the parameters. The size of the fragments is proportional to the a posteriori probability of the component distributions. The a posteriori probabilities of the various values of Z are computed using Bayes' rule:
$$P(Z|X) = C \cdot P(X|Z)\,P(Z).$$
Every die gets a fragment of size $P(\text{die} \mid \text{number})$.

Hypothetical Dice Shooter Example: We obtain an initial estimate for the probability distributions of the two sets of dice (somehow), e.g. $P(4|\text{blue}) = 0.1$ and $P(4|\text{red}) = 0.05$, and an initial estimate for the probability with which the caller calls out the two shooters (somehow): $P(Z{=}\text{red}) = P(Z{=}\text{blue}) = 0.5$.

The caller has just called out 4. The posterior probabilities of the colors are:
$$P(\text{red} \mid X{=}4) = C\,P(4|Z{=}\text{red})\,P(Z{=}\text{red}) = C \cdot 0.05 \cdot 0.5 = 0.025\,C$$
$$P(\text{blue} \mid X{=}4) = C\,P(4|Z{=}\text{blue})\,P(Z{=}\text{blue}) = C \cdot 0.1 \cdot 0.5 = 0.05\,C$$
Normalizing: $P(\text{red} \mid 4) = 0.33$, $P(\text{blue} \mid 4) = 0.67$.

Every observed roll of the dice contributes to both Red and Blue. With the initial estimates, each called number X is split into a red fragment and a blue fragment $(P(\text{red}|X),\ P(\text{blue}|X))$: 1 → (0.57, 0.43); 2 → (0.14, 0.86); 3 → (0.33, 0.67); 4 → (0.33, 0.67); 5 → (0.33, 0.67); 6 → (0.8, 0.2). The observed sequence of 18 callouts is 6, 4, 5, 1, 2, 3, 4, 5, 2, 2, 1, 4, 3, 4, 6, 2, 1, 6, and every callout in it is fragmented this way.
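The 0.33/0.67 split can be verified in a few lines of Python; this reproduces the Bayes'-rule computation above with the slide's assumed initial estimates:

```python
# Posterior fragment sizes via Bayes' rule.
# Assumed initial estimates (from the slide): P(4|red)=0.05, P(4|blue)=0.1,
# and the caller picks each die with probability 0.5.
P_Z = {"red": 0.5, "blue": 0.5}
P_4_given_Z = {"red": 0.05, "blue": 0.1}

unnorm = {z: P_4_given_Z[z] * P_Z[z] for z in P_Z}  # C * P(4|Z) * P(Z)
total = sum(unnorm.values())
posterior = {z: unnorm[z] / total for z in P_Z}     # normalize: fragments sum to 1
print(posterior)   # {'red': 0.333..., 'blue': 0.666...}
```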

Every observed roll of the dice contributes to both Red and Blue. Tabulating each callout against its red and blue fragments: the total count for Red is the sum of all the posterior probabilities in the red column, 7.31; the total count for Blue is the sum of all the posterior probabilities in the blue column, 10.69. Note: 7.31 + 10.69 = 18, the total number of instances.

Counting for Red, number by number (summing the red fragments of every callout of each number): total count for 1: 1.71; total count for 2: 0.56; total count for 3: 0.66.

Continuing for Red: total count for 4: 1.32; total count for 5: 0.66; total count for 6: 2.40.

Updated probability of the Red die: P(1|Red) = 1.71/7.31 = 0.234; P(2|Red) = 0.56/7.31 = 0.077; P(3|Red) = 0.66/7.31 = 0.090; P(4|Red) = 1.32/7.31 = 0.181; P(5|Red) = 0.66/7.31 = 0.090; P(6|Red) = 2.40/7.31 = 0.328.

Counting for Blue (total count: 10.69): total count for 1: 1.29; total count for 2: 3.44.

Continuing for Blue: total count for 3: 1.34; total count for 4: 2.68; total count for 5: 1.34; total count for 6: 0.6.

Updated probability of the Blue die: P(1|Blue) = 1.29/10.69 = 0.121; P(2|Blue) = 3.44/10.69 = 0.322; P(3|Blue) = 1.34/10.69 = 0.125; P(4|Blue) = 2.68/10.69 = 0.251; P(5|Blue) = 1.34/10.69 = 0.125; P(6|Blue) = 0.6/10.69 = 0.056.

Total count for Red: 7.31. Total count for Blue: 10.69. Total instances N = 18. We also revise our estimate for the probability that the caller calls out Red or Blue, i.e. the fraction of times he calls Red and the fraction of times he calls Blue: P(Z=Red) = 7.31/18 = 0.41; P(Z=Blue) = 10.69/18 = 0.59.

The updated values. Probability of the Red die: P(1|Red) = 1.71/7.31 = 0.234; P(2|Red) = 0.56/7.31 = 0.077; P(3|Red) = 0.66/7.31 = 0.090; P(4|Red) = 1.32/7.31 = 0.181; P(5|Red) = 0.66/7.31 = 0.090; P(6|Red) = 2.40/7.31 = 0.328. Probability of the Blue die: P(1|Blue) = 1.29/10.69 = 0.121; P(2|Blue) = 3.44/10.69 = 0.322; P(3|Blue) = 1.34/10.69 = 0.125; P(4|Blue) = 2.68/10.69 = 0.251; P(5|Blue) = 1.34/10.69 = 0.125; P(6|Blue) = 0.6/10.69 = 0.056. Mixture weights: P(Z=Red) = 7.31/18 = 0.41; P(Z=Blue) = 10.69/18 = 0.59. THE UPDATED VALUES CAN BE USED TO REPEAT THE PROCESS. ESTIMATION IS AN ITERATIVE PROCESS.

The Dice Shooter Example: 1. Initialize $P(Z)$ and $P(X|Z)$. 2. Estimate $P(Z|X)$ for each Z, for each called-out number X; associate X with each value of Z, with weight $P(Z|X)$. 3. Re-estimate $P(X|Z)$ for every value of X and Z. 4. Re-estimate $P(Z)$. 5. If not converged, return to 2. (A runnable sketch of these five steps follows after this page.)

In Squiggles: Given a sequence of observations $O_1, O_2, \ldots$, where $N_O$ is the number of observations of the number O: initialize $P(Z)$ and $P(O|Z)$ for dice Z and numbers O, and iterate. For each number O:
$$P(Z|O) = \frac{P(O|Z)\,P(Z)}{\sum_{Z'} P(O|Z')\,P(Z')}$$
Update:
$$P(O|Z) = \frac{N_O\,P(Z|O)}{\sum_{O'} N_{O'}\,P(Z|O')}, \qquad P(Z) = \frac{\sum_O N_O\,P(Z|O)}{\sum_{Z'}\sum_O N_O\,P(Z'|O)}$$

Solutions may not be unique: The EM algorithm will give us one of many solutions, all equally valid! Consider the probability of 6 being called out:
$$P(6) = \alpha P_r(6) + (1-\alpha)P_b(6)$$
which assigns $P_r(6)$ as the probability of 6 for the red die and $P_b(6)$ as the probability of 6 for the blue die. The following too is a valid solution:
$$P(6) = 1.0 \cdot \big(\alpha P_r(6) + (1-\alpha)P_b(6)\big) + 0.0 \cdot \text{anything}$$
which assigns 1.0 as the a priori probability of the red die (whose multinomial is the overall mixture) and 0.0 as the probability of the blue die. The solution is NOT unique.

A More Complex Model: Gaussian mixtures.
$$P(x) = \sum_k P(k)\,N(x; \mu_k, \Sigma_k), \qquad N(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d|\Sigma|}}\exp\!\big(-0.5\,(x-\mu)^T\Sigma^{-1}(x-\mu)\big)$$
Gaussian mixtures are often good models for the distribution of multivariate data. Problem: estimating the parameters, given a collection of data.

Gaussian Mixtures: Generating model. The caller now has two Gaussians. At each draw he randomly selects a Gaussian according to the mixture weight distribution, and then draws an observation from that Gaussian. Much like the dice problem (only the outcomes are now real-valued and can be anything).
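Returning to the five-step dice-shooter recipe above, here is a minimal runnable sketch of the whole iteration (assuming numpy; the random initialization and fixed iteration count are arbitrary choices, and, per the non-uniqueness argument above, different initializations can converge to different, equally valid solutions):

```python
import numpy as np

def em_dice(calls, iters=50, seed=0):
    """EM for the dice-shooter problem: a sketch of the update rules above.

    calls: sequence of called-out numbers in 1..6.
    Returns P(Z) (mixture weights) and P(X|Z) (one multinomial per die)."""
    rng = np.random.default_rng(seed)
    n = np.bincount(np.asarray(calls), minlength=7)[1:].astype(float)  # N_O
    P_Z = np.array([0.5, 0.5])                 # initial caller probabilities
    P_X_Z = rng.dirichlet(np.ones(6), size=2)  # initial die multinomials (random)

    for _ in range(iters):
        # E-step: fragment each number O into P(Z|O), proportional to P(O|Z)P(Z).
        joint = P_X_Z * P_Z[:, None]                       # shape (2, 6)
        P_Z_O = joint / joint.sum(axis=0, keepdims=True)
        # M-step: weighted counting, with the fragments as soft counts.
        soft_counts = P_Z_O * n                            # N_O * P(Z|O)
        P_X_Z = soft_counts / soft_counts.sum(axis=1, keepdims=True)
        P_Z = soft_counts.sum(axis=1) / n.sum()
    return P_Z, P_X_Z

# The 18 callouts from the worked example.
calls = [6, 4, 5, 1, 2, 3, 4, 5, 2, 2, 1, 4, 3, 4, 6, 2, 1, 6]
P_Z, P_X_Z = em_dice(calls)
print("P(Z):", np.round(P_Z, 2))
print("P(X|Z):", np.round(P_X_Z, 3))
```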

Estimating a GMM with complete information: Observation: a collection of numbers drawn from a mixture of two Gaussians, where, as indicated by the colors, we know which Gaussian generated what number. Segregation: separate the blue observations from the red. From each set, compute the parameters for that Gaussian:
$$\mu_{\text{red}} = \frac{1}{N_{\text{red}}}\sum_{i \in \text{red}} x_i, \qquad \Sigma_{\text{red}} = \frac{1}{N_{\text{red}}}\sum_{i \in \text{red}} (x_i - \mu_{\text{red}})(x_i - \mu_{\text{red}})^T$$

Fragmenting the observation: Now the identity of the Gaussian is not known! Solution: fragment the observation, with fragment size proportional to the a posteriori probability:
$$P(k|x) = \frac{P(k)\,N(x; \mu_k, \Sigma_k)}{\sum_{k'} P(k')\,N(x; \mu_{k'}, \Sigma_{k'})}$$

Expectation Maximization: Initialize $P(k)$, $\mu_k$ and $\Sigma_k$ for both Gaussians. It is important how we do this; a typical solution is to initialize the means randomly, set each $\Sigma_k$ to the global covariance of the data, and set $P(k)$ uniformly. Then compute the fragment sizes for each Gaussian, for each observation.

Each observation contributes only as much as its fragment size to each statistic. With the numbers on the slide, the updated red mean is the fragment-weighted average
$$\text{mean}(\text{red}) = \frac{\sum_i P(\text{red}|x_i)\,x_i}{\sum_i P(\text{red}|x_i)} = \frac{17.05}{4.08} = 4.18$$
and the updated red variance is the fragment-weighted variance
$$\text{var}(\text{red}) = \frac{\sum_i P(\text{red}|x_i)\,(x_i - \mu_{\text{red}})^2}{\sum_i P(\text{red}|x_i)}.$$

EM for Gaussian Mixtures: 1. Initialize $P(k)$, $\mu_k$ and $\Sigma_k$ for all Gaussians. 2. For each observation $x$, compute the a posteriori probabilities for all Gaussians:
$$P(k|x) = \frac{P(k)\,N(x; \mu_k, \Sigma_k)}{\sum_{k'} P(k')\,N(x; \mu_{k'}, \Sigma_{k'})}$$
3. Update the mixture weights, means and variances for all Gaussians:
$$P(k) = \frac{1}{N}\sum_i P(k|x_i), \qquad \mu_k = \frac{\sum_i P(k|x_i)\,x_i}{\sum_i P(k|x_i)}, \qquad \Sigma_k = \frac{\sum_i P(k|x_i)\,(x_i - \mu_k)(x_i - \mu_k)^T}{\sum_i P(k|x_i)}$$
4. If not converged, return to 2.

EM estimation of Gaussian mixtures, an example: [Figure: histogram of 4000 instances of randomly generated data; the individual components of a two-Gaussian mixture estimated by EM; and the two-Gaussian mixture estimated by EM.]
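A minimal sketch of steps 1 to 4 for a one-dimensional two-Gaussian mixture (assuming numpy; the synthetic data below stands in for the 4000-instance example, and a fixed iteration count replaces a real convergence test):

```python
import numpy as np

def em_gmm_1d(x, K=2, iters=100, seed=0):
    """EM for a 1-D Gaussian mixture: a sketch of steps 1-4 above."""
    rng = np.random.default_rng(seed)
    N = len(x)
    mu = rng.choice(x, size=K, replace=False)  # initialize means randomly
    var = np.full(K, x.var())                  # global variance for every component
    w = np.full(K, 1.0 / K)                    # uniform mixture weights

    for _ in range(iters):
        # E-step: a posteriori fragment sizes P(k|x_i), computed in the log domain.
        logp = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)        # shape (N, K)
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        gamma = p / p.sum(axis=1, keepdims=True)
        # M-step: each observation contributes its fragment size to each statistic.
        Nk = gamma.sum(axis=0)
        w = Nk / N
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return w, mu, var

# Hypothetical data: 4000 points from a two-Gaussian mixture.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 1500), rng.normal(3.0, 1.5, 2500)])
print(em_gmm_1d(x))
```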

Expectation Maximization: The same principle can be extended to mixtures of other distributions, e.g. a mixture of Laplacians. The Laplacian parameter updates become fragment-weighted versions of the ML estimates:
$$\mu_k = \text{weighted median of } \{x_i\} \text{ with weights } P(k|x_i), \qquad b_k = \frac{\sum_i P(k|x_i)\,|x_i - \mu_k|}{\sum_i P(k|x_i)}$$
In a mixture of Gaussians and Laplacians, the Gaussians use the Gaussian update rules and the Laplacians use the Laplacian rules.

Expectation Maximization: The EM algorithm is used whenever proper statistical analysis of a phenomenon requires the knowledge of a hidden or missing variable (or a set of hidden/missing variables). The hidden variable is often called a latent variable. Some examples: estimating mixtures of distributions, where only the data are observed and the individual distributions and mixing proportions must both be learnt; estimating the distribution of data when some attributes are missing; estimating the dynamics of a system based only on observations that may be a complex function of the system state.

Solve this problem (the dice and the coin): A caller rolls a die and flips a coin. He calls out the number rolled if the coin shows heads; otherwise he calls the number plus 1. Determine p(heads) and p(number) for the die from a collection of outputs. E.g. if he calls "4", the heads count for 4 and the tails count for 3 each receive a fragment; the unknown is whether it was heads or tails. (A sketch of this puzzle appears after this page.)

The two dice: A caller rolls two dice and calls out the sum. Determine the dice distributions from a collection of outputs. E.g. if he calls "4", it is unknown how to partition the number among the pairs (1,3), (2,2) and (3,1); the counts are updated with fragments: count_blue(3) += P((1,3) | 4); count_blue(2) += P((2,2) | 4); count_blue(1) += P((3,1) | 4).

Fragmentation can be hierarchical:
$$P(X) = \sum_{Z_1} P(Z_1)\sum_{Z_2} P(Z_2|Z_1)\,P(X|Z_1, Z_2)$$
E.g. a mixture of mixtures: fragments are further fragmented. Work this out.
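A sketch of one way to attack the dice-and-coin puzzle (this is my formulation of the exercise, not worked out in the lecture, and the call sequence is hypothetical). Each call X is fragmented between the two explanations (heads, die rolled X) and (tails, die rolled X-1):

```python
import numpy as np

def em_coin_dice(calls, iters=200):
    """EM sketch for the dice-and-coin puzzle. Calls are in 1..7."""
    h = 0.5                          # initial P(heads)
    p = np.full(6, 1.0 / 6)          # initial P(number) for the die
    calls = np.asarray(calls)
    for _ in range(iters):
        # E-step: posterior of heads for each call, via Bayes' rule.
        # A call of 7 can only be tails; a call of 1 can only be heads.
        p_heads = np.where((calls >= 1) & (calls <= 6),
                           h * p[np.clip(calls, 1, 6) - 1], 0.0)
        p_tails = np.where(calls >= 2,
                           (1 - h) * p[np.clip(calls - 1, 1, 6) - 1], 0.0)
        post_h = p_heads / (p_heads + p_tails)
        # M-step: soft counts for the coin and for each die face.
        h = post_h.mean()
        counts = np.zeros(6)
        for c, ph in zip(calls, post_h):
            if 1 <= c <= 6:
                counts[c - 1] += ph          # heads: die showed c
            if 2 <= c <= 7:
                counts[c - 2] += 1 - ph      # tails: die showed c - 1
        p = counts / counts.sum()
    return h, p

calls = [4, 2, 7, 5, 3, 4, 6, 2, 5, 1, 4, 3]
print(em_coin_dice(calls))
```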

More later: We will see a couple of other instances of the use of EM, and work out HMM training: first assuming the state output distributions are multinomials, then assuming they are Gaussians, then Gaussian mixtures.
