Clustering with Gaussian Mixtures


Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.

Clustering with Gaussian Mixtures
Andrew W. Moore
Associate Professor
School of Computer Science, Carnegie Mellon University
www.cs.cmu.edu/~awm
awm@cs.cmu.edu
412-268-7599

Copyright © Andrew W. Moore

Unsupervised Learning

You walk into a bar. A stranger approaches and tells you:
"I've got data from k classes. Each class produces observations with a normal distribution and variance σ²I. Standard simple multivariate gaussian assumptions. I can tell you all the P(wi)'s."
So far, looks straightforward.
"I need a maximum likelihood estimate of the µi's."
No problem.
"There's just one thing. None of the data are labeled. I have datapoints, but I don't know what class they're from (any of them!)"
Uh oh!!

Gaussian Bayes Classifier Reminder

$$P(y=i \mid \mathbf{x}) \;=\; \frac{p(\mathbf{x} \mid y=i)\,P(y=i)}{p(\mathbf{x})} \;=\; \frac{\dfrac{1}{(2\pi)^{m/2}\,\lVert \Sigma_i \rVert^{1/2}}\exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{T}\Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\Big)\,p_i}{p(\mathbf{x})}$$

How do we deal with that?
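
As a concrete reading of the formula above, here is a minimal sketch (not from the original slides) that evaluates the class posterior for a Gaussian Bayes classifier with scipy; the means, covariances, and priors below are made-up illustrative values.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gauss_bayes_posterior(x, means, covs, priors):
    """Return P(y=i | x) for every class i, given Gaussian class-conditionals."""
    # Unnormalised posteriors: p(x | y=i) * P(y=i)
    unnorm = np.array([
        multivariate_normal.pdf(x, mean=m, cov=S) * p
        for m, S, p in zip(means, covs, priors)
    ])
    return unnorm / unnorm.sum()   # dividing by the sum plays the role of p(x)

# Hypothetical 2-class, 2-d example
means  = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs   = [np.eye(2), 2.0 * np.eye(2)]
priors = [1/3, 2/3]
print(gauss_bayes_posterior(np.array([1.0, 1.0]), means, covs, priors))
```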

Predicting wealth from age

Learning modelyear, mpg ---> maker

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix}$$

General: O(m²) parameters

Σ is the full symmetric m×m covariance matrix shown above: every pairwise covariance σij is a free parameter.

Aligned: O(m) parameters

$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_m^2 \end{pmatrix}$$

Spherical: O(1) cov parameters

$$\Sigma = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I$$

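To make the O(m²), O(m), and O(1) claims above concrete, the exact count of free covariance parameters for m input dimensions is (a standard counting argument, not spelled out on the slides):

$$\text{general: } \frac{m(m+1)}{2}, \qquad \text{aligned (diagonal): } m, \qquad \text{spherical: } 1.$$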

Making a Classifier from a Density Estimator
(columns: categorical inputs only / real-valued inputs only / mixed real & categorical okay)

Inputs -> Classifier -> Predict category:        Joint BC, Naïve BC | Gauss BC | Dec Tree
Inputs -> Density Estimator -> Probability:      Joint DE, Naïve DE | Gauss DE |
Inputs -> Regressor -> Predict real no.:

Next... back to Density Estimation

What if we want to do density estimation with multimodal or clumpy data?

The GMM assumption

There are k components. The i'th component is called ωi.
Component ωi has an associated mean vector µi (µ1, µ2, µ3, ...).
Each component generates data from a Gaussian with mean µi and covariance matrix σ²I.
Assume that each datapoint is generated according to the following recipe:
1. Pick a component at random. Choose component i with probability P(ωi).
2. Datapoint ~ N(µi, σ²I)

The General GMM assumption

There are k components. The i'th component is called ωi.
Component ωi has an associated mean vector µi.
Each component generates data from a Gaussian with mean µi and covariance matrix Σi.
Assume that each datapoint is generated according to the following recipe:
1. Pick a component at random. Choose component i with probability P(ωi).
2. Datapoint ~ N(µi, Σi)
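
A minimal Python sketch of that generative recipe (not from the slides; the component weights, means, and covariances below are made-up illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-component mixture in 2-d
weights = np.array([0.5, 0.3, 0.2])                                 # P(omega_i)
means   = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 3.0]])           # mu_i
covs    = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])   # Sigma_i

def sample_gmm(n):
    """Generate n datapoints: pick a component, then draw from its Gaussian."""
    comps = rng.choice(len(weights), size=n, p=weights)              # step 1
    X = np.array([rng.multivariate_normal(means[c], covs[c]) for c in comps])  # step 2
    return X, comps

X, labels = sample_gmm(500)
print(X.shape, np.bincount(labels))
```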

Unsupervised Learning: not as hard as it looks

Sometimes easy. Sometimes impossible. And sometimes in between.
IN CASE YOU'RE WONDERING WHAT THESE DIAGRAMS ARE, THEY SHOW 2-d UNLABELED DATA (X VECTORS) DISTRIBUTED IN 2-d SPACE. THE TOP ONE HAS THREE VERY CLEAR GAUSSIAN CENTERS.

Computing likelihoods in unsupervised case

We have x1, x2, ... xN.
We know P(w1), P(w2), .. P(wk).
We know σ.
P(x | wi, µ1, ... µk) = Prob that an observation from class wi would have value x, given class means µ1 through µk.
Can we write an expression for that?

Likelihoods in unsupervised case

We have x1, x2, ... xn.
We have P(w1) .. P(wk). We have σ.
We can define, for any x: P(x | wi, µ1, µ2 .. µk).
Can we define P(x | µ1, µ2 .. µk)?
Can we define P(x1, x2, .. xn | µ1, µ2 .. µk)?
[YES, IF WE ASSUME THE X'S WERE DRAWN INDEPENDENTLY]
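
A small sketch of those two definitions for the one-dimensional case used on the next slides: the mixture density P(x | µ1..µk) = Σi P(wi) N(x; µi, σ²), and the joint log-likelihood that follows from independence. The priors and σ follow the Duda & Hart setup (P(w1)=1/3, P(w2)=2/3, σ=1); the data array is a stand-in for whatever unlabeled points you have.

```python
import numpy as np
from scipy.stats import norm

priors = np.array([1/3, 2/3])    # P(w1), P(w2) -- known
sigma = 1.0                      # known, shared

def mixture_density(x, mus):
    """P(x | mu_1..mu_k) = sum_i P(w_i) * N(x; mu_i, sigma^2)."""
    return sum(p * norm.pdf(x, loc=mu, scale=sigma) for p, mu in zip(priors, mus))

def log_likelihood(data, mus):
    """log P(x_1..x_n | mu_1..mu_k), assuming the x's were drawn independently."""
    return np.sum(np.log([mixture_density(x, mus) for x in data]))

data = np.array([0.6, -1.6, 0.2, 3.9, -0.7])   # stand-in unlabeled points
print(log_likelihood(data, mus=[-2.0, 1.5]))
```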

Unsupervised Learning: Mediumly Good News

We now have a procedure such that, if you give me a guess at µ1, µ2 .. µk, I can tell you the prob of the unlabeled data given those µ's.

Suppose x's are 1-dimensional. (From Duda and Hart.)
There are two classes: w1 and w2. P(w1) = 1/3, P(w2) = 2/3.
There are 25 unlabeled datapoints:
x1 = 0.608, x2 = -1.590, x3 = 0.235, x4 = 3.949, ... x25 = -0.712

Duda & Hart's Example

Graph of log P(x1, x2 .. x25 | µ1, µ2) against µ1 (horizontal axis) and µ2 (vertical axis).
Max likelihood = (µ1 = -2.13, µ2 = 1.668).
Local minimum, but very close to global, at (µ1 = 2.085, µ2 = -1.257)*.
* corresponds to switching w1 and w2.

Duda & Hart's Example

We can graph the prob. dist. function of the data given our µ1 and µ2 estimates. We can also graph the true function from which the data was randomly generated. They are close. Good.
The 2nd solution tries to put the "2/3" hump where the "1/3" hump should go, and vice versa.
In this example unsupervised is almost as good as supervised. If the x1 .. x25 are given the class which was used to learn them, then the results are (µ1 = -2.176, µ2 = 1.684). Unsupervised got (µ1 = -2.13, µ2 = 1.668).

Finding the max likelihood µ1, µ2 .. µk

We can compute P(data | µ1, µ2 .. µk).
How do we find the µi's which give max likelihood?
The normal max likelihood trick: set ∂/∂µi log Prob(...) = 0 and solve for the µi's.
# Here you get non-linear, non-analytically-solvable equations.
Use gradient descent: slow but doable.
Or use a much faster, cuter, and recently very popular method...

Expectation Maximization

DETOUR: The E.M. Algorithm

We'll get back to unsupervised learning soon. But now we'll look at an even simpler case with hidden information.
The EM algorithm:
Can do trivial things, such as the contents of the next few slides.
Is an excellent way of doing our unsupervised learning problem, as we'll see.
Has many, many other uses, including inference of Hidden Markov Models (future lecture).

Silly Example

Let events be "grades in a class":
w1 = Gets an A: P(A) = ½
w2 = Gets a B: P(B) = µ
w3 = Gets a C: P(C) = 2µ
w4 = Gets a D: P(D) = ½ - 3µ
(Note: 0 ≤ µ ≤ 1/6)
Assume we want to estimate µ from data. In a given class there were a A's, b B's, c C's, d D's.
What's the maximum likelihood estimate of µ given a, b, c, d?

Trivial Statistics

P(A) = ½, P(B) = µ, P(C) = 2µ, P(D) = ½ - 3µ

$$P(a,b,c,d \mid \mu) = K\,(\tfrac12)^a\,\mu^b\,(2\mu)^c\,(\tfrac12-3\mu)^d$$
$$\log P(a,b,c,d \mid \mu) = \log K + a\log\tfrac12 + b\log\mu + c\log 2\mu + d\log(\tfrac12-3\mu)$$

FOR MAX LIKE µ, SET ∂LogP/∂µ = 0:

$$\frac{\partial \log P}{\partial \mu} = \frac{b}{\mu} + \frac{2c}{2\mu} - \frac{3d}{\tfrac12-3\mu} = 0
\quad\Longrightarrow\quad
\mu_{\text{max like}} = \frac{b+c}{6(b+c+d)}$$

So if the class got A = 14, B = 6, C = 9, D = 10, then max like µ = (6 + 9) / (6 · 25) = 1/10.
Boring, but true!

Same Problem with Hidden Information

Someone tells us that:
Number of High grades (A's + B's) = h
Number of C's = c
Number of D's = d
What is the max. like estimate of µ now?

REMEMBER: P(A) = ½, P(B) = µ, P(C) = 2µ, P(D) = ½ - 3µ

Same Problem with Hidden Information

Someone tells us that:
Number of High grades (A's + B's) = h
Number of C's = c
Number of D's = d
What is the max. like estimate of µ now?
We can answer this question circularly:

EXPECTATION: If we knew the value of µ we could compute the expected values of a and b. Since the ratio a:b should be the same as the ratio ½:µ,
$$a = \frac{\tfrac12}{\tfrac12+\mu}\,h, \qquad b = \frac{\mu}{\tfrac12+\mu}\,h$$

MAXIMIZATION: If we knew the expected values of a and b we could compute the maximum likelihood value of µ:
$$\mu_{\text{max like}} = \frac{b+c}{6(b+c+d)}$$

REMEMBER: P(A) = ½, P(B) = µ, P(C) = 2µ, P(D) = ½ - 3µ

E.M. for our Trivial Problem

We begin with a guess for µ. We iterate between EXPECTATION and MAXIMIZATION to improve our estimates of µ and of a and b.
Define µ(t) = the estimate of µ on the t'th iteration, and b(t) = the estimate of b on the t'th iteration.

µ(0) = initial guess
$$b(t) = \frac{\mu(t)\,h}{\tfrac12+\mu(t)} = \mathrm{E}\big[b \mid \mu(t)\big] \qquad \text{(E-step)}$$
$$\mu(t+1) = \frac{b(t)+c}{6\big(b(t)+c+d\big)} = \text{max like estimate of } \mu \text{ given } b(t) \qquad \text{(M-step)}$$

Continue iterating until converged.
Good news: Converging to a local optimum is assured.
Bad news: I said "local" optimum.

REMEMBER: P(A) = ½, P(B) = µ, P(C) = 2µ, P(D) = ½ - 3µ

E.M. Convergence

Convergence proof is based on the fact that Prob(data | µ) must increase or remain the same between each iteration [NOT OBVIOUS], but it can never exceed 1 [OBVIOUS]. So it must therefore converge [OBVIOUS].

In our example, suppose we had h = 20, c = 10, d = 10, µ(0) = 0:

t    µ(t)     b(t)
0    0        0
1    0.0833   2.857
2    0.0937   3.158
3    0.0947   3.185
4    0.0948   3.187
5    0.0948   3.187
6    0.0948   3.187

Convergence is generally linear: error decreases by a constant factor each time step.
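
A minimal sketch of this E-step/M-step loop in Python; it reproduces the table above for h = 20, c = 10, d = 10 starting from µ(0) = 0, up to rounding in the last digit (a sanity check, not part of the original slides).

```python
def em_grades(h, c, d, mu=0.0, iters=6):
    """EM for the grades example: P(A)=1/2, P(B)=mu, P(C)=2mu, P(D)=1/2-3mu,
    where only h = a + b (count of A's plus B's), c, and d are observed."""
    b = mu * h / (0.5 + mu)                  # b(0): expected number of B's given mu(0)
    print(f"t=0  mu={mu:.4f}  b={b:.3f}")
    for t in range(1, iters + 1):
        mu = (b + c) / (6 * (b + c + d))     # M-step: max-likelihood mu given expected b
        b = mu * h / (0.5 + mu)              # E-step: expected number of B's given new mu
        print(f"t={t}  mu={mu:.4f}  b={b:.3f}")

em_grades(h=20, c=10, d=10)
```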

Back to Unsupervised Learning of GMMs

Remember: We have unlabeled data x1, x2, ... xR. We know there are k classes. We know P(w1), P(w2), ... P(wk). We do not know µ1, µ2 .. µk.
We can write P(data | µ1 .. µk):

$$p(x_1 \ldots x_R \mid \mu_1 \ldots \mu_k) = \prod_{i=1}^{R} p(x_i \mid \mu_1 \ldots \mu_k) = \prod_{i=1}^{R} \sum_{j=1}^{k} p(x_i \mid w_j, \mu_1 \ldots \mu_k)\,P(w_j) = \prod_{i=1}^{R} \sum_{j=1}^{k} K \exp\!\Big(-\frac{1}{2\sigma^2}(x_i-\mu_j)^2\Big)\,P(w_j)$$

E.M. for GMMs

For max likelihood we know that ∂/∂µj log Prob(data | µ1 ... µk) = 0.
Some wild'n'crazy algebra turns this into: "For max likelihood, for each j,

$$\mu_j = \frac{\sum_{i=1}^{R} P(w_j \mid x_i, \mu_1 \ldots \mu_k)\,x_i}{\sum_{i=1}^{R} P(w_j \mid x_i, \mu_1 \ldots \mu_k)}$$"

This is k nonlinear equations in the µj's.
If, for each xi, we knew the prob that xi was in class wj, i.e. P(wj | xi, µ1...µk), then we would easily compute µj.
If we knew each µj, then we could easily compute P(wj | xi, µ1...µk) for each wj and xi.
I feel an EM experience coming on!!

E.M. for GMMs

Iterate. On the t'th iteration let our estimates be λt = { µ1(t), µ2(t), ... µc(t) }.

E-step: Compute "expected" classes of all datapoints for each class:

$$P(w_i \mid x_k, \lambda_t) = \frac{p(x_k \mid w_i, \lambda_t)\,P(w_i \mid \lambda_t)}{p(x_k \mid \lambda_t)} = \frac{p\big(x_k \mid w_i, \mu_i(t), \sigma^2 I\big)\,p_i(t)}{\sum_{j=1}^{c} p\big(x_k \mid w_j, \mu_j(t), \sigma^2 I\big)\,p_j(t)}$$

(Just evaluate a Gaussian at xk.)

M-step: Compute max. like µ given our data's class membership distributions:

$$\mu_i(t+1) = \frac{\sum_k P(w_i \mid x_k, \lambda_t)\,x_k}{\sum_k P(w_i \mid x_k, \lambda_t)}$$
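
A minimal NumPy sketch of exactly these two steps, with fixed, known σ²I covariance and fixed class priors as on this slide. The random initialisation from datapoints and the iteration count are illustrative choices, not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm_means(X, priors, sigma2=1.0, n_iter=50, seed=0):
    """EM that estimates only the means mu_i; covariance sigma2*I and priors are fixed."""
    R, m = X.shape
    c = len(priors)
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(R, size=c, replace=False)]   # init: pick c random datapoints
    cov = sigma2 * np.eye(m)
    for _ in range(n_iter):
        # E-step: P(w_i | x_k, lambda_t); one row per datapoint, one column per class
        resp = np.column_stack([
            priors[i] * multivariate_normal.pdf(X, mean=mus[i], cov=cov) for i in range(c)
        ])
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: mu_i(t+1) = weighted mean of the data, weights P(w_i | x_k, lambda_t)
        mus = (resp.T @ X) / resp.sum(axis=0)[:, None]
    return mus

# Example: mus = em_gmm_means(X, priors=np.array([0.5, 0.3, 0.2]))
```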

E.M. Convergence

Your lecturer will (unless out of time) give you a nice intuitive explanation of why this rule works.
As with all EM procedures, convergence to a local optimum is guaranteed.
This algorithm is REALLY USED. And in high dimensional state spaces, too. E.g. Vector Quantization for Speech Data.

E.M. for General GMMs

Iterate. On the t'th iteration let our estimates be λt = { µ1(t), µ2(t), ... µc(t), Σ1(t), Σ2(t), ... Σc(t), p1(t), p2(t), ... pc(t) }, where pi(t) is shorthand for the estimate of P(ωi) on the t'th iteration.

E-step: Compute "expected" classes of all datapoints for each class:

$$P(w_i \mid x_k, \lambda_t) = \frac{p\big(x_k \mid w_i, \mu_i(t), \Sigma_i(t)\big)\,p_i(t)}{\sum_{j=1}^{c} p\big(x_k \mid w_j, \mu_j(t), \Sigma_j(t)\big)\,p_j(t)}$$

(Just evaluate a Gaussian at xk.)

M-step: Compute max. like µ given our data's class membership distributions:

$$\mu_i(t+1) = \frac{\sum_k P(w_i \mid x_k, \lambda_t)\,x_k}{\sum_k P(w_i \mid x_k, \lambda_t)} \qquad
\Sigma_i(t+1) = \frac{\sum_k P(w_i \mid x_k, \lambda_t)\,\big[x_k-\mu_i(t+1)\big]\big[x_k-\mu_i(t+1)\big]^{T}}{\sum_k P(w_i \mid x_k, \lambda_t)} \qquad
p_i(t+1) = \frac{\sum_k P(w_i \mid x_k, \lambda_t)}{R}$$

where R = #records.
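
In practice these general-GMM updates are what off-the-shelf implementations run for you. A hedged example with scikit-learn's GaussianMixture (the data X and the choice of 3 components are placeholders):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 2))   # stand-in for real data

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(X)                       # runs this kind of E-step / M-step loop internally
print(gmm.means_)                # mu_i
print(gmm.covariances_.shape)    # Sigma_i: shape (3, 2, 2) for covariance_type="full"
print(gmm.predict_proba(X[:5]))  # P(w_i | x_k): the E-step "expected classes"
```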

Gaussian Mixture Example: Start
(Advance apologies: in black and white this example will be incomprehensible.)

After first iteration

After 2nd iteration

After 3rd iteration

After 4th iteration

After 5th iteration

After 6th iteration

After 20th iteration

Some Bio Assay data

GMM clustering of the assay data

Resulting Density Estimator

Where are we now?

Inputs -> Inference Engine -> Learn P(E1|E2): Joint DE, Bayes Net Structure Learning
Inputs -> Classifier -> Predict category: Dec Tree, Sigmoid Perceptron, Sigmoid N.Net, Gauss/Joint BC, Gauss Naïve BC, N.Neigh, Bayes Net Based BC, Cascade Correlation
Inputs -> Density Estimator -> Probability: Joint DE, Naïve DE, Gauss/Joint DE, Gauss Naïve DE, Bayes Net Structure Learning, GMMs
Inputs -> Regressor -> Predict real no.: Linear Regression, Polynomial Regression, Perceptron, Neural Net, N.Neigh, Kernel, LWR, RBFs, Robust Regression, Cascade Correlation, Regression Trees, GMDH, Multilinear Interp, MARS

The old trick

Inputs -> Inference Engine -> Learn P(E1|E2): Joint DE, Bayes Net Structure Learning
Inputs -> Classifier -> Predict category: Dec Tree, Sigmoid Perceptron, Sigmoid N.Net, Gauss/Joint BC, Gauss Naïve BC, N.Neigh, Bayes Net Based BC, Cascade Correlation, GMM-BC
Inputs -> Density Estimator -> Probability: Joint DE, Naïve DE, Gauss/Joint DE, Gauss Naïve DE, Bayes Net Structure Learning, GMMs
Inputs -> Regressor -> Predict real no.: Linear Regression, Polynomial Regression, Perceptron, Neural Net, N.Neigh, Kernel, LWR, RBFs, Robust Regression, Cascade Correlation, Regression Trees, GMDH, Multilinear Interp, MARS

Three classes of assay (each learned with its own mixture model). (Sorry, this will again be semi-useless in black and white.)

Resulting Bayes Classifier

Resulting Bayes Classifier, using posterior probabilities to alert about ambiguity and anomalousness. Yellow means anomalous. Cyan means ambiguous.

Unsupervised learning with symbolic attributes

NATION, # KIDS, MARRIED (with some values missing).
It's just a "learning Bayes net with known structure but hidden values" problem.
Can use Gradient Descent.
EASY, fun exercise to do an EM formulation for this case too.

Final Comments

Remember, E.M. can get stuck in local minima, and empirically it DOES.
Our unsupervised learning example assumed the P(wi)'s were known, and variances fixed and known. It is easy to relax this.
It is possible to do Bayesian unsupervised learning instead of max. likelihood.
There are other algorithms for unsupervised learning. We'll visit K-means soon. Hierarchical clustering is also interesting. Neural-net algorithms called "competitive learning" turn out to have interesting parallels with the EM method we saw.

What you should know

How to "learn" maximum likelihood parameters (locally max. like.) in the case of unlabeled data.
Be happy with this kind of probabilistic analysis.
Understand the two examples of E.M. given in these notes.
For more info, see Duda + Hart. It's a great book. There's much more in the book than in your handout.

Other unsupervised learning methods

K-means (see next lecture)
Hierarchical clustering (e.g. minimum spanning trees) (see next lecture)
Principal Component Analysis: simple, useful tool
Non-linear PCA: Neural Auto-Associators, Locally weighted PCA, others