Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.

Clustering with Gaussian Mixtures

Andrew W. Moore
Associate Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
awm@cs.cmu.edu
412-268-7599

Copyright © 2001, Andrew W. Moore. Nov 10th, 2001
Unsupervised Learning

You walk into a bar. A stranger approaches and tells you:
"I've got data from k classes. Each class produces observations with a normal distribution and variance σ²·I. Standard simple multivariate gaussian assumptions. I can tell you all the P(wi)'s."
So far, looks straightforward.
"I need a maximum likelihood estimate of the µi's."
No problem:
"There's just one thing. None of the data are labeled. I have datapoints, but I don't know what class they're from (any of them!)"
Uh oh!!

Copyright © 2001, Andrew W. Moore. Clustering with Gaussian Mixtures: Slide 2
Gaussian Bayes Classifier Reminder

P(y = i | x) = p(x | y = i) P(y = i) / p(x)

             = [ 1 / ((2π)^(m/2) ‖Σᵢ‖^(1/2)) ] exp( −½ (x − µᵢ)ᵀ Σᵢ⁻¹ (x − µᵢ) ) · pᵢ / p(x)

How do we deal with that?
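The reminder above can be made concrete in a few lines. This is my own illustrative 1-d sketch (not from the slides): the awkward p(x) in the denominator is handled by normalizing over the classes, i.e. p(x) = Σᵢ p(x | y = i) P(y = i).

```python
import math

def gauss_pdf(x, mu, var):
    # 1-d Gaussian density (the slide's formula with m = 1, Sigma = var)
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def posterior(x, mus, vars_, priors):
    """P(y = i | x) via Bayes rule; p(x) is the sum of the joints over classes."""
    joint = [gauss_pdf(x, m, v) * p for m, v, p in zip(mus, vars_, priors)]
    px = sum(joint)            # this is the p(x) in the denominator
    return [j / px for j in joint]

# two hypothetical classes with means -2 and 1; class 2 should dominate at x = 0.9
post = posterior(0.9, mus=[-2.0, 1.0], vars_=[1.0, 1.0], priors=[0.5, 0.5])
```

The posterior always sums to 1 over classes, which is exactly what dividing by p(x) buys us.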
Predicting wealth from age
Learning modelyear, mpg ---> maker

Σ = [ σ₁²  σ₁₂  ⋯  σ₁ₘ ]
    [ σ₁₂  σ₂²  ⋯  σ₂ₘ ]
    [  ⋮    ⋮   ⋱   ⋮  ]
    [ σ₁ₘ  σ₂ₘ  ⋯  σₘ² ]
General: O(m²) parameters

Σ = [ σ₁²  σ₁₂  ⋯  σ₁ₘ ]
    [ σ₁₂  σ₂²  ⋯  σ₂ₘ ]
    [  ⋮    ⋮   ⋱   ⋮  ]
    [ σ₁ₘ  σ₂ₘ  ⋯  σₘ² ]
Aligned: O(m) parameters

Σ = [ σ₁²   0    0   ⋯   0     0  ]
    [  0   σ₂²   0   ⋯   0     0  ]
    [  0    0   σ₃²  ⋯   0     0  ]
    [  ⋮    ⋮    ⋮   ⋱   ⋮     ⋮  ]
    [  0    0    0   ⋯  σₘ₋₁²  0  ]
    [  0    0    0   ⋯   0    σₘ² ]
Spherical: O(1) cov parameters

Σ = [ σ²  0   0  ⋯  0   0 ]
    [ 0   σ²  0  ⋯  0   0 ]
    [ ⋮   ⋮   ⋮  ⋱  ⋮   ⋮ ]
    [ 0   0   0  ⋯  0  σ² ]
Making a Classifier from a Density Estimator

Inputs                    Classifier            Density Estimator    Regressor
                          (Predict category)    (Probability)        (Predict real no.)
Categorical inputs only   Joint BC, Naïve BC    Joint DE, Naïve DE
Real-valued inputs only   Gauss BC              Gauss DE
Mixed Real / Cat okay     Dec Tree
Next... back to Density Estimation

What if we want to do density estimation with multimodal or clumpy data?
The GMM assumption

There are k components. The i'th component is called ωi.
Component ωi has an associated mean vector µi.
Each component generates data from a Gaussian with mean µi and covariance matrix σ²·I.

Assume that each datapoint is generated according to the following recipe:
1. Pick a component at random. Choose component i with probability P(ωi).
2. Datapoint ~ N(µi, σ²·I)

(figure: cluster centres µ₁, µ₂, µ₃)
The General GMM assumption

There are k components. The i'th component is called ωi.
Component ωi has an associated mean vector µi.
Each component generates data from a Gaussian with mean µi and covariance matrix Σi.

Assume that each datapoint is generated according to the following recipe:
1. Pick a component at random. Choose component i with probability P(ωi).
2. Datapoint ~ N(µi, Σi)

(figure: cluster centres µ₁, µ₂, µ₃)
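The two-step recipe above fits in a few lines of code. This is an unofficial 1-d sketch (the name `sample_gmm` and the particular priors and means are my own choices, not from the slides):

```python
import random

def sample_gmm(priors, means, sigmas):
    """Draw one datapoint from a 1-d Gaussian mixture:
    1. pick component i with probability P(w_i),
    2. draw x ~ N(mu_i, sigma_i^2)."""
    # Step 1: pick a component at random according to the priors
    i = random.choices(range(len(priors)), weights=priors)[0]
    # Step 2: sample from that component's Gaussian
    return random.gauss(means[i], sigmas[i]), i

# a toy mixture: P(w1) = 1/3 around -2, P(w2) = 2/3 around 1.5
random.seed(0)
draws = [sample_gmm([1/3, 2/3], [-2.0, 1.5], [1.0, 1.0]) for _ in range(10000)]
frac_from_class_2 = sum(1 for _, i in draws if i == 1) / len(draws)
```

Over many draws, the fraction coming from each component approaches its prior, which is exactly what the "pick a component at random" step asserts.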
Unsupervised Learning: not as hard as it looks

Sometimes easy
Sometimes impossible
and sometimes in between

IN CASE YOU'RE WONDERING WHAT THESE DIAGRAMS ARE, THEY SHOW 2-d UNLABELED DATA (X VECTORS) DISTRIBUTED IN 2-d SPACE. THE TOP ONE HAS THREE VERY CLEAR GAUSSIAN CENTERS
Computing likelihoods in unsupervised case

We have x₁, x₂, … x_N.
We know P(w₁), P(w₂), … P(wₖ).
We know σ.

P(x | wᵢ, µ₁, … µₖ) = Prob that an observation from class wᵢ would have value x, given class means µ₁ … µₖ.

Can we write an expression for that?
Likelihoods in unsupervised case

We have x₁ … xₙ.
We have P(w₁) … P(wₖ). We have σ.
We can define, for any x: P(x | wᵢ, µ₁, µ₂ … µₖ)

Can we define P(x | µ₁, µ₂ … µₖ)?

Can we define P(x₁, x₂, … xₙ | µ₁, µ₂ … µₖ)?
[YES, IF WE ASSUME THE X'S WERE DRAWN INDEPENDENTLY]
Unsupervised Learning: Mediumly Good News

We now have a procedure such that if you give me a guess at µ₁, µ₂ … µₖ, I can tell you the prob of the unlabeled data given those µ's.

Suppose x's are 1-dimensional. (From Duda and Hart.) There are two classes, w₁ and w₂. P(w₁) = 1/3, P(w₂) = 2/3, σ = 1. There are 25 unlabeled datapoints:

x₁ = 0.608
x₂ = −1.590
x₃ = 0.235
x₄ = 3.949
  ⋮
x₂₅ = −0.712
Duda & Hart's Example

Graph of log P(x₁, x₂ … x₂₅ | µ₁, µ₂) against µ₁ (→) and µ₂ (↑).

Max likelihood = (µ₁ = −2.13, µ₂ = 1.668)
Local minimum, but very close to global at (µ₁ = 2.085, µ₂ = −1.257)*

* corresponds to switching w₁ + w₂.
Duda & Hart's Example

We can graph the prob. dist. function of data given our µ₁ and µ₂ estimates.
We can also graph the true function from which the data was randomly generated.

They are close. Good.

The 2nd solution tries to put the "2/3" hump where the "1/3" hump should go, and vice versa.

In this example unsupervised is almost as good as supervised. If the x₁ … x₂₅ are given the class which was used to learn them, then the results are (µ₁ = −2.176, µ₂ = 1.684). Unsupervised got (µ₁ = −2.13, µ₂ = 1.668).
Finding the max likelihood µ₁, µ₂ … µₖ

We can compute P(data | µ₁, µ₂ … µₖ).
How do we find the µᵢ's which give max likelihood?

The normal max likelihood trick:
  Set ∂/∂µᵢ log Prob(…) = 0 and solve for the µᵢ's.
  # Here you get non-linear non-analytically-solvable equations.

Use gradient descent: slow but doable.

Use a much faster, cuter, and recently very popular method…
Expectation Maximization
DETOUR: The E.M. Algorithm

We'll get back to unsupervised learning soon.
But now we'll look at an even simpler case with hidden information.

The EM algorithm:
Can do trivial things, such as the contents of the next few slides.
An excellent way of doing our unsupervised learning problem, as we'll see.
Many, many other uses, including inference of Hidden Markov Models (future lecture).
Silly Example

Let events be "grades in a class":
w₁ = Gets an A: P(A) = ½
w₂ = Gets a B: P(B) = µ
w₃ = Gets a C: P(C) = 2µ
w₄ = Gets a D: P(D) = ½ − 3µ
(Note: 0 ≤ µ ≤ 1/6)

Assume we want to estimate µ from data. In a given class there were
a A's, b B's, c C's, d D's.
What's the maximum likelihood estimate of µ given a, b, c, d?
Trivial Statistics

P(A) = ½, P(B) = µ, P(C) = 2µ, P(D) = ½ − 3µ

P(a,b,c,d | µ) = K (½)^a (µ)^b (2µ)^c (½ − 3µ)^d
log P(a,b,c,d | µ) = log K + a log ½ + b log µ + c log 2µ + d log(½ − 3µ)

FOR MAX LIKE µ, SET ∂LogP/∂µ = 0

∂LogP/∂µ = b/µ + 2c/(2µ) − 3d/(½ − 3µ) = 0

Gives max like µ = (b + c) / (6(b + c + d))

So if class got: A = 14, B = 6, C = 9, D = 10
Max like µ = 1/10

Boring, but true!
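The closed form is easy to sanity-check numerically. A quick sketch of my own (not part of the notes): maximize the log-likelihood over a grid of valid µ in (0, 1/6) and compare with (b + c)/(6(b + c + d)), using a class with counts A = 14, B = 6, C = 9, D = 10.

```python
import math

def log_lik(mu, a, b, c, d):
    # log P(a,b,c,d | mu), dropping the constant log K
    return (a * math.log(0.5) + b * math.log(mu)
            + c * math.log(2 * mu) + d * math.log(0.5 - 3 * mu))

a, b, c, d = 14, 6, 9, 10
mu_closed = (b + c) / (6 * (b + c + d))          # = 15/150 = 1/10

# brute-force check: best mu on a fine grid inside (0, 1/6)
grid = [i / 100000 for i in range(1, 16666)]
mu_grid = max(grid, key=lambda m: log_lik(m, a, b, c, d))
```

The grid maximizer lands on the same µ as the calculus, which is the point of the slide: for fully observed counts, the MLE is a one-line formula.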
Same Problem with Hidden Information

Someone tells us that:
Number of High grades (A's + B's) = h
Number of C's = c
Number of D's = d

What is the max. like estimate of µ now?

REMEMBER: P(A) = ½, P(B) = µ, P(C) = 2µ, P(D) = ½ − 3µ
Same Problem with Hidden Information

Someone tells us that:
Number of High grades (A's + B's) = h
Number of C's = c
Number of D's = d
What is the max. like estimate of µ now?

We can answer this question circularly:

EXPECTATION: If we know the value of µ, we could compute the expected values of a and b:
a = (½ / (½ + µ)) h      b = (µ / (½ + µ)) h
since the ratio a : b should be the same as the ratio ½ : µ.

MAXIMIZATION: If we know the expected values of a and b, we could compute the maximum likelihood value of µ:
µ = (b + c) / (6(b + c + d))

REMEMBER: P(A) = ½, P(B) = µ, P(C) = 2µ, P(D) = ½ − 3µ
E.M. for our Trivial Problem

We begin with a guess for µ. We iterate between EXPECTATION and MAXIMIZATION to improve our estimates of µ and of a and b.

Define: µ(t) = the estimate of µ on the t'th iteration
        b(t) = the estimate of b on the t'th iteration

µ(0) = initial guess

E-step: b(t) = µ(t) h / (½ + µ(t)) = E[b | µ(t)]

M-step: µ(t+1) = (b(t) + c) / (6(b(t) + c + d)) = max like est of µ given b(t)

Continue iterating until converged.
Good news: converging to local optimum is assured.
Bad news: I said "local" optimum.

REMEMBER: P(A) = ½, P(B) = µ, P(C) = 2µ, P(D) = ½ − 3µ
E.M. Convergence

Convergence proof based on fact that Prob(data | µ) must increase or remain the same between each iteration [NOT OBVIOUS]
But it can never exceed 1 [OBVIOUS]
So it must therefore converge [OBVIOUS]

In our example, suppose we had h = 20, c = 10, d = 10, µ(0) = 0:

t   µ(t)     b(t)
0   0        0
1   0.0833   2.857
2   0.0937   3.158
3   0.0947   3.185
4   0.0948   3.187
5   0.0948   3.187
6   0.0948   3.187

Convergence is generally linear: error decreases by a constant factor each time step.
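The E-step/M-step pair for this problem is tiny in code. A sketch (the helper name `em_grades` is mine); running it with h = 20, c = 10, d = 10, the counts consistent with the table on this slide, converges to µ ≈ 0.0948:

```python
def em_grades(h, c, d, mu0=0.0, iters=20):
    """EM for the grades problem: only h = a + b, c, d are observed."""
    mu = mu0
    for _ in range(iters):
        b = mu * h / (0.5 + mu)             # E-step: expected count of B's
        mu = (b + c) / (6 * (b + c + d))    # M-step: max-like mu given b
    return mu

mu_star = em_grades(h=20, c=10, d=10)
# first iteration gives mu = 10/120 = 0.0833; the fixed point is near 0.0948
```

Each pass uses the current µ to fill in the hidden split of h into a and b, then re-solves the easy fully-observed MLE.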
Back to Unsupervised Learning of GMMs

Remember: We have unlabeled data x₁, x₂ … x_R.
We know there are k classes.
We know P(w₁), P(w₂), P(w₃) … P(wₖ).
We don't know µ₁, µ₂ … µₖ.

We can write P(data | µ₁ … µₖ)
= p(x₁ … x_R | µ₁ … µₖ)
= ∏ᵢ₌₁..R p(xᵢ | µ₁ … µₖ)
= ∏ᵢ₌₁..R ∑ⱼ₌₁..k p(xᵢ | wⱼ, µ₁ … µₖ) P(wⱼ)
= ∏ᵢ₌₁..R ∑ⱼ₌₁..k K exp( −(1/(2σ²)) (xᵢ − µⱼ)² ) P(wⱼ)
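This likelihood is easy to evaluate directly. A small sketch (my own helper, assuming σ = 1 as in the Duda & Hart example) computes log P(data | µ₁ … µₖ) for a guess at the means:

```python
import math

def gmm_log_lik(xs, mus, priors, sigma=1.0):
    """log P(x1..xR | mu1..muk): for each point, a prior-weighted
    sum of Gaussian densities; log of the product over points."""
    total = 0.0
    for x in xs:
        p = sum(pj * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
                / (sigma * math.sqrt(2 * math.pi))
                for mu, pj in zip(mus, priors))
        total += math.log(p)
    return total

# the first few Duda & Hart points quoted earlier, under two guesses for the means
xs = [0.608, -1.590, 0.235, 3.949]
good = gmm_log_lik(xs, mus=[-2.13, 1.668], priors=[1/3, 2/3])
bad = gmm_log_lik(xs, mus=[10.0, -10.0], priors=[1/3, 2/3])
```

Means placed near the data give a much higher log-likelihood than means placed far away, which is the quantity the next slides maximize.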
E.M. for GMMs

For Max likelihood we know ∂/∂µᵢ log Prob(data | µ₁ … µₖ) = 0.

Some wild'n'crazy algebra turns this into: "For Max likelihood, for each j,

µⱼ = ∑ᵢ₌₁..R P(wⱼ | xᵢ, µ₁ … µₖ) xᵢ / ∑ᵢ₌₁..R P(wⱼ | xᵢ, µ₁ … µₖ)"

This is n nonlinear equations in the µⱼ's.

If, for each xᵢ, we knew that for each wⱼ the prob that xᵢ was in class wⱼ is P(wⱼ | xᵢ, µ₁ … µₖ), then we would easily compute µⱼ.

If we knew each µⱼ, then we could easily compute P(wⱼ | xᵢ, µ₁ … µₖ) for each wⱼ and xᵢ.

I feel an EM experience coming on!!
E.M. for GMMs

Iterate. On the t'th iteration let our estimates be
λt = { µ₁(t), µ₂(t) … µc(t) }

E-step: Compute "expected" classes of all datapoints for each class:

P(wᵢ | xₖ, λt) = p(xₖ | wᵢ, λt) P(wᵢ | λt) / p(xₖ | λt)
               = p(xₖ | wᵢ, µᵢ(t), σ²I) pᵢ(t) / ∑ⱼ₌₁..c p(xₖ | wⱼ, µⱼ(t), σ²I) pⱼ(t)

(Just evaluate a Gaussian at xₖ.)

M-step: Compute max. like µ given our data's class membership distributions:

µᵢ(t+1) = ∑ₖ P(wᵢ | xₖ, λt) xₖ / ∑ₖ P(wᵢ | xₖ, λt)
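The two updates above can be sketched directly. This is an illustrative 1-d implementation, not Moore's code; σ = 1, the equal priors, and the toy datapoints are my own choices:

```python
import math

def e_step(xs, mus, priors, sigma=1.0):
    """P(w_i | x_k, lambda_t): 'just evaluate a Gaussian at x_k',
    weight by the prior, then normalize over classes."""
    resp = []
    for x in xs:
        w = [p * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
             for mu, p in zip(mus, priors)]
        s = sum(w)
        resp.append([wi / s for wi in w])
    return resp

def m_step_means(xs, resp, k):
    """mu_i(t+1): average of the x_k weighted by the responsibilities."""
    return [sum(r[i] * x for r, x in zip(resp, xs)) / sum(r[i] for r in resp)
            for i in range(k)]

# two obvious clusters near -2 and 1.5; start the means somewhere wrong
xs = [-2.2, -1.9, -2.0, 1.4, 1.6, 1.5]
mus = [-1.0, 1.0]
for _ in range(10):
    mus = m_step_means(xs, e_step(xs, mus, [0.5, 0.5]), 2)
```

After a few iterations the estimated means drift to the two cluster centres, with each point's influence on a mean proportional to its posterior class probability.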
E.M. Convergence

Your lecturer will (unless out of time) give you a nice intuitive explanation of why this rule works.
As with all EM procedures, convergence to a local optimum is guaranteed.

This algorithm is REALLY USED. And in high dimensional state spaces, too. E.g. Vector Quantization for Speech Data.
E.M. for General GMMs

Iterate. On the t'th iteration let our estimates be
λt = { µ₁(t), µ₂(t) … µc(t), Σ₁(t), Σ₂(t) … Σc(t), p₁(t), p₂(t) … pc(t) }

(pᵢ(t) is shorthand for the estimate of P(ωᵢ) on the t'th iteration.)

E-step: Compute "expected" classes of all datapoints for each class:

P(wᵢ | xₖ, λt) = p(xₖ | wᵢ, λt) P(wᵢ | λt) / p(xₖ | λt)
               = p(xₖ | wᵢ, µᵢ(t), Σᵢ(t)) pᵢ(t) / ∑ⱼ₌₁..c p(xₖ | wⱼ, µⱼ(t), Σⱼ(t)) pⱼ(t)

(Just evaluate a Gaussian at xₖ.)

M-step: Compute max. like parameters given our data's class membership distributions:

µᵢ(t+1) = ∑ₖ P(wᵢ | xₖ, λt) xₖ / ∑ₖ P(wᵢ | xₖ, λt)

Σᵢ(t+1) = ∑ₖ P(wᵢ | xₖ, λt) [xₖ − µᵢ(t+1)][xₖ − µᵢ(t+1)]ᵀ / ∑ₖ P(wᵢ | xₖ, λt)

pᵢ(t+1) = ∑ₖ P(wᵢ | xₖ, λt) / R,   where R = #records
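For the general case, here is a 1-dimensional sketch (so each Σᵢ is just a variance; the helper `em_full` and the toy data are my own, not from the slides) showing all three M-step updates together:

```python
import math

def em_full(xs, mus, vars_, ps, iters=50):
    """1-d version of the general GMM updates: each pass re-estimates
    means, variances and mixing weights p_i = P(w_i)."""
    R, k = len(xs), len(mus)
    for _ in range(iters):
        # E-step: responsibilities P(w_i | x_k, lambda_t)
        resp = []
        for x in xs:
            w = [ps[i] * math.exp(-0.5 * (x - mus[i]) ** 2 / vars_[i])
                 / math.sqrt(2 * math.pi * vars_[i]) for i in range(k)]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: weighted mean, weighted variance (about the NEW mean),
        # and mixing weight = average responsibility
        for i in range(k):
            n_i = sum(r[i] for r in resp)
            mus[i] = sum(r[i] * x for r, x in zip(resp, xs)) / n_i
            vars_[i] = sum(r[i] * (x - mus[i]) ** 2
                           for r, x in zip(resp, xs)) / n_i
            ps[i] = n_i / R
    return mus, vars_, ps

# two clusters with different spreads: a tight one near -2, a looser one near 1.5
xs = [-2.3, -2.0, -1.7, -2.1, -1.9, 1.0, 1.5, 2.0, 1.2, 1.8]
mus, vars_, ps = em_full(xs, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
```

Note the variance update uses µᵢ(t+1), not µᵢ(t), matching the order of the slide's M-step; the recovered variances also reflect that one cluster is tighter than the other.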
Gaussian Mixture Example: Start

Advance apologies: in Black and White this example will be incomprehensible.
After first iteration

After 2nd iteration

After 3rd iteration

After 4th iteration

After 5th iteration

After 6th iteration

After 20th iteration
Some Bio Assay data

GMM clustering of the assay data

Resulting Density Estimator
Where are we now?

Inputs → Inference Engine (Learn P(E₁ | E₂)): Joint DE, Bayes Net Structure Learning
Inputs → Classifier (Predict category): Dec Tree, Sigmoid Perceptron, Sigmoid N.Net, Gauss/Joint BC, Gauss Naïve BC, N.Neigh, Bayes Net Based BC, Cascade Correlation
Inputs → Density Estimator (Probability): Joint DE, Naïve DE, Gauss/Joint DE, Gauss Naïve DE, Bayes Net Structure Learning, GMMs
Inputs → Regressor (Predict real no.): Linear Regression, Polynomial Regression, Perceptron, Neural Net, N.Neigh, Kernel, LWR, RBFs, Robust Regression, Cascade Correlation, Regression Trees, GMDH, Multilinear Interp, MARS
The old trick

Inputs → Inference Engine (Learn P(E₁ | E₂)): Joint DE, Bayes Net Structure Learning
Inputs → Classifier (Predict category): Dec Tree, Sigmoid Perceptron, Sigmoid N.Net, Gauss/Joint BC, Gauss Naïve BC, N.Neigh, Bayes Net Based BC, Cascade Correlation, GMM-BC
Inputs → Density Estimator (Probability): Joint DE, Naïve DE, Gauss/Joint DE, Gauss Naïve DE, Bayes Net Structure Learning, GMMs
Inputs → Regressor (Predict real no.): Linear Regression, Polynomial Regression, Perceptron, Neural Net, N.Neigh, Kernel, LWR, RBFs, Robust Regression, Cascade Correlation, Regression Trees, GMDH, Multilinear Interp, MARS
Three classes of assay (each learned with its own mixture model)

(Sorry, this will again be semi-useless in black and white.)
Resulting Bayes Classifier

Resulting Bayes Classifier, using posterior probabilities to alert about ambiguity and anomalousness:
Yellow means anomalous.
Cyan means ambiguous.
Unsupervised learning with symbolic attributes

[NATION, # KIDS, MARRIED]

It's just a "learning Bayes net with known structure but hidden values" problem.
Can use Gradient Descent.
EASY, fun exercise: do an EM formulation for this case too.
Final Comments

Remember, E.M. can get stuck in local minima, and empirically it DOES.
Our unsupervised learning example assumed the P(wᵢ)'s known, and variances fixed and known. Easy to relax this.
It's possible to do Bayesian unsupervised learning instead of max. likelihood.
There are other algorithms for unsupervised learning. We'll visit K-means soon. Hierarchical clustering is also interesting.
Neural-net algorithms called "competitive learning" turn out to have interesting parallels with the EM method we saw.
What you should know

How to "learn" maximum likelihood parameters (locally max. like.) in the case of unlabeled data.
Be happy with this kind of probabilistic analysis.
Understand the two examples of E.M. given in these notes.

For more info, see Duda + Hart. It's a great book. There's much more in the book than in your handout.
Other unsupervised learning methods

K-means (see next lecture)
Hierarchical clustering (e.g. Minimum spanning trees) (see next lecture)
Principal Component Analysis: a simple, useful tool
Non-linear PCA: Neural Auto-Associators, Locally weighted PCA, Others