Clustering with Gaussian Mixtures
Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the link to the source repository of Andrew's tutorials. Comments and corrections gratefully received.

Andrew W. Moore
Associate Professor
School of Computer Science
Carnegie Mellon University
awm@cs.cmu.edu
Copyright © 2001, Andrew W. Moore

Unsupervised Learning
You walk into a bar. A stranger approaches and tells you:
"I've got data from k classes. Each class produces observations with a normal distribution and variance σ²I. Standard simple multivariate Gaussian assumptions. I can tell you all the P(wi)'s."
So far, looks straightforward. "I need a maximum likelihood estimate of the μi's." No problem: "There's just one thing. None of the data are labeled. I have datapoints, but I don't know what class they're from (any of them!)" Uh oh!!

Some data from a GMM

The GMM assumption
There are k components. The i'th component is called ωi. Component ωi has an associated mean vector μi. Each component generates data from a Gaussian with mean μi and covariance matrix σ²I.
Assume that each datapoint is generated according to the following recipe:
1. Pick a component at random: choose component i with probability P(ωi).
2. Datapoint ~ N(μi, σ²I).
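The two-step generative recipe (pick a component, then draw from its Gaussian) can be sketched directly in code. This is an illustrative sketch, not part of the original slides; the component means, priors, and counts below are made-up examples.

```python
import numpy as np

def sample_gmm(n, means, priors, sigma, rng=None):
    """Draw n datapoints from a spherical GMM:
    1. pick component i with probability P(w_i),
    2. draw the point from N(mu_i, sigma^2 I)."""
    rng = np.random.default_rng(rng)
    means = np.asarray(means, dtype=float)   # shape (k, d)
    k, d = means.shape
    comps = rng.choice(k, size=n, p=priors)                       # step 1
    points = means[comps] + sigma * rng.standard_normal((n, d))   # step 2
    return points, comps

# Hypothetical example: k = 3 components in 2-d
means = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]]
priors = [0.5, 0.3, 0.2]
X, z = sample_gmm(1000, means, priors, sigma=1.0, rng=0)
print(X.shape)  # (1000, 2)
```

Plotting `X` colored by `z` reproduces the kind of scatter shown on the "Some data from a GMM" slide.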
The General GMM assumption
There are k components. The i'th component is called ωi. Component ωi has an associated mean vector μi. Each component generates data from a Gaussian with mean μi and covariance matrix Σi.
Assume that each datapoint is generated according to the following recipe:
1. Pick a component at random: choose component i with probability P(ωi).
2. Datapoint ~ N(μi, Σi).

Unsupervised Learning: not as hard as it looks
Sometimes easy. Sometimes impossible. And sometimes in between.
(In case you're wondering, the diagrams on this slide show 2-d unlabeled data, i.e. x vectors distributed in 2-d space; the top one has three very clear Gaussian centers.)

Computing likelihoods in the unsupervised case
We have x1, x2, ... xN. We know P(w1), P(w2), ... P(wk). We know σ.
P(x | wi, μ1, ... μk) = the probability that an observation from class wi would have value x, given the class means μ1 ... μk.
Can we write an expression for that?

Likelihoods in the unsupervised case
We have x1, x2, ... xn. We have P(w1) ... P(wk). We have σ. We can define, for any x, P(x | wi, μ1, μ2, ... μk).
Can we define P(x | μ1, μ2, ... μk)?
Can we define P(x1, x2, ... xn | μ1, μ2, ... μk)?
[Yes, if we assume the x's were drawn independently.]

Unsupervised Learning: Mediumly Good News
We now have a procedure such that if you give me a guess at μ1, μ2, ... μk, I can tell you the probability of the unlabeled data given those μ's.
Suppose the x's are 1-dimensional. There are two classes, w1 and w2, with P(w1) = 1/3, P(w2) = 2/3, and σ = 1. There are 25 unlabeled datapoints x1 ... x25. (From Duda and Hart.)
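Given independence, log P(x1, ..., xn | μ1, μ2) is a sum of log mixture densities, and for a two-class 1-d setup like the one above (P(w1) = 1/3, P(w2) = 2/3, σ = 1) the maximum-likelihood (μ1, μ2) can be found by brute force over a grid. A sketch, with synthetic data standing in for Duda & Hart's actual 25 points (the data, grid, and true means used here are made up):

```python
import numpy as np

def log_likelihood(x, mu1, mu2, p1=1/3, p2=2/3, sigma=1.0):
    """log P(x_1..x_n | mu1, mu2) for a 1-d two-class GMM,
    assuming the points were drawn independently."""
    norm = 1.0 / (np.sqrt(2 * np.pi) * sigma)
    g1 = norm * np.exp(-0.5 * ((x - mu1) / sigma) ** 2)  # p(x | w1)
    g2 = norm * np.exp(-0.5 * ((x - mu2) / sigma) ** 2)  # p(x | w2)
    return np.sum(np.log(p1 * g1 + p2 * g2))

# Synthetic stand-in for the 25 unlabeled datapoints (true means -2 and 2)
rng = np.random.default_rng(0)
from_class1 = rng.random(25) < 1/3
x = np.where(from_class1, rng.normal(-2, 1, 25), rng.normal(2, 1, 25))

# Brute-force the log-likelihood surface over a grid of (mu1, mu2)
grid = np.linspace(-4, 4, 81)
ll = np.array([[log_likelihood(x, m1, m2) for m2 in grid] for m1 in grid])
i, j = np.unravel_index(np.argmax(ll), ll.shape)
print("max-likelihood guess:", grid[i], grid[j])
```

The surface `ll` is exactly what the next slide graphs against μ1 and μ2; note it has a second, nearly-as-good peak with the roles of the classes swapped.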
Duda & Hart's Example
Graph of log P(x1, x2, ... x25 | μ1, μ2) against μ1 and μ2.
Max likelihood = (μ1 = −2.13, μ2 = 1.668).
There is a local minimum, but it is very close to the global one, at (μ1 = 2.085, μ2 = −1.257)*.
(* corresponds to switching w1 ↔ w2.)

We can graph the probability distribution function of the data given our μ1 and μ2 estimates. We can also graph the true function from which the data was randomly generated. They are close. Good.
The 2nd solution tries to put the "2/3" hump where the "1/3" hump should go, and vice versa.
In this example, unsupervised learning is almost as good as supervised. If the x1 ... x25 are given the classes which were used to generate them, then the results are (μ1 = −2.176, μ2 = 1.684). Unsupervised got (−2.13, 1.668).

Finding the max likelihood μ1, μ2, ... μk
We can compute P(data | μ1, μ2, ... μk). How do we find the μi's which give max likelihood?
- The normal max likelihood trick: set ∂/∂μi log Prob(...) = 0 and solve for the μi's. Here you get non-linear, non-analytically-solvable equations.
- Use gradient descent: slow but doable.
- Use a much faster, cuter, and recently very popular method: Expectation Maximization.

DETOUR: The E.M. Algorithm
We'll get back to unsupervised learning soon. But now we'll look at an even simpler case with hidden information. The EM algorithm:
- Can do trivial things, such as the contents of the next few slides.
- Is an excellent way of doing our unsupervised learning problem, as we'll see.
- Has many, many other uses, including inference of Hidden Markov Models (future lecture).

Silly Example
Let events be "grades in a class":
w1 = Gets an A, P(A) = ½
w2 = Gets a B, P(B) = μ
w3 = Gets a C, P(C) = 2μ
w4 = Gets a D, P(D) = ½ − 3μ
(Note 0 ≤ μ ≤ 1/6.)
Assume we want to estimate μ from data. In a given class there were a A's, b B's, c C's and d D's. What is the maximum likelihood estimate of μ given a, b, c, d?
Computing the maximum likelihood μ
P(A) = ½, P(B) = μ, P(C) = 2μ, P(D) = ½ − 3μ
P(a, b, c, d | μ) = K (½)^a (μ)^b (2μ)^c (½ − 3μ)^d
log P(a, b, c, d | μ) = log K + a log ½ + b log μ + c log 2μ + d log(½ − 3μ)
For max likelihood μ, set ∂LogP/∂μ = 0:
∂LogP/∂μ = b/μ + c/μ − 3d/(½ − 3μ) = 0
which gives the max likelihood estimate μ = (b + c) / (6(b + c + d)).
So if the class got 14 A's, 6 B's, 9 C's and 10 D's, the max likelihood μ = 1/10.

Same Problem with Hidden Information
Someone tells us that: the number of high grades (A's + B's) = h, the number of C's = c, the number of D's = d. What is the max likelihood estimate of μ now?
(Remember: P(A) = ½, P(B) = μ, P(C) = 2μ, P(D) = ½ − 3μ.)
We can answer this question circularly:
EXPECTATION: If we knew the value of μ we could compute the expected values of a and b. Since the ratio a : b should be the same as the ratio ½ : μ,
a = h · ½ / (½ + μ),  b = h · μ / (½ + μ).
MAXIMIZATION: If we knew the expected values of a and b we could compute the maximum likelihood value of μ:
μ = (b + c) / (6(b + c + d)).

E.M. for our Trivial Problem
We begin with a guess for μ. We iterate between EXPECTATION and MAXIMIZATION to improve our estimates of μ and of a and b.
Define μ(t) = the estimate of μ on the t'th iteration, and b(t) = the estimate of b on the t'th iteration.
μ(0) = initial guess
b(t) = μ(t) h / (½ + μ(t)) = E[b | μ(t)]                    (E-step)
μ(t+1) = (b(t) + c) / (6(b(t) + c + d)) = max like estimate of μ given b(t)   (M-step)
Continue iterating until converged.
Good news: converging to a local optimum is assured. Bad news: I said "local" optimum.

E.M. Convergence
The convergence proof is based on the fact that Prob(data | μ) must increase or remain the same between each iteration [NOT OBVIOUS], but it can never exceed 1 [OBVIOUS]. So it must therefore converge [OBVIOUS].
In our example, suppose we had h = 20, c = 10, d = 10, μ(0) = 0:

t | μ(t)   | b(t)
0 | 0      | 0
1 | 0.0833 | 2.857
2 | 0.0937 | 3.158
3 | 0.0947 | 3.185
4 | 0.0948 | 3.187
5 | 0.0948 | 3.187
6 | 0.0948 | 3.187

Convergence is generally linear: the error decreases by a constant factor each time step.
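The E-step/M-step iteration for the grades problem is only a couple of lines of code. A minimal sketch (the function name is mine), run on the h = 20, c = 10, d = 10 example, where the iteration converges to μ ≈ 0.0948:

```python
def em_grades(h, c, d, mu0=0.0, iters=20):
    """EM for the grades problem.
    E-step: b(t)   = mu(t)*h / (1/2 + mu(t))       (expected number of B's)
    M-step: mu(t+1) = (b(t)+c) / (6*(b(t)+c+d))    (max-like mu given b(t))"""
    mu = mu0
    for _ in range(iters):
        b = mu * h / (0.5 + mu)           # E-step
        mu = (b + c) / (6 * (b + c + d))  # M-step
    return mu

print(em_grades(20, 10, 10))  # about 0.0948
```

Printing `mu` and `b` inside the loop reproduces the convergence table: μ(1) = 0.0833, μ(2) = 0.0937, and so on.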
Back to Unsupervised Learning of GMMs
Remember: we have unlabeled data x1, x2, ... xR. We know there are k classes. We know P(w1), P(w2), ... P(wk). We don't know μ1, μ2, ... μk.
We can write P(data | μ1 ... μk):

p(x1 ... xR | μ1 ... μk) = ∏_{i=1..R} p(xi | μ1 ... μk)
  = ∏_{i=1..R} ∑_{j=1..k} p(xi | wj, μ1 ... μk) P(wj)
  = ∏_{i=1..R} ∑_{j=1..k} K exp(−(1/(2σ²)) (xi − μj)²) P(wj)
E.M. for GMMs
For max likelihood we know ∂/∂μi log Prob(data | μ1 ... μk) = 0.
Some wild'n'crazy algebra turns this into: "For max likelihood, for each j,

μj = ∑_{i=1..R} P(wj | xi, μ1 ... μk) xi / ∑_{i=1..R} P(wj | xi, μ1 ... μk)."

This is a set of nonlinear equations in the μj's.
If, for each xi, we knew the probability that it was in class wj, P(wj | xi, μ1 ... μk), then we could easily compute μj. If we knew each μj, then we could easily compute P(wj | xi, μ1 ... μk) for each wj and xi.
I feel an EM experience coming on!!

E.M. for GMMs
Iterate. On the t'th iteration let our estimates be λt = { μ1(t), μ2(t), ... μc(t) }.
E-step: compute the "expected" classes of all datapoints for each class:

P(wi | xk, λt) = p(xk | wi, λt) P(wi | λt) / p(xk | λt)
  = p(xk | wi, μi(t), σ²I) pi(t) / ∑_{j=1..c} p(xk | wj, μj(t), σ²I) pj(t)

(Just evaluate a Gaussian at xk.)
M-step: compute the max likelihood μ given our data's class membership distributions:

μi(t+1) = ∑_k P(wi | xk, λt) xk / ∑_k P(wi | xk, λt)

E.M. Convergence
As with all EM procedures, convergence to a local optimum is guaranteed. This algorithm is REALLY USED, and in high dimensional state spaces too, e.g. Vector Quantization for Speech Data.

E.M. for General GMMs
pi(t) is shorthand for the estimate of P(ωi) on the t'th iteration.
λt = { μ1(t), ... μc(t), Σ1(t), ... Σc(t), p1(t), ... pc(t) }.
E-step: compute the "expected" classes of all datapoints for each class (just evaluate a Gaussian at xk):

P(wi | xk, λt) = p(xk | wi, μi(t), Σi(t)) pi(t) / ∑_{j=1..c} p(xk | wj, μj(t), Σj(t)) pj(t)

M-step: compute the max likelihood parameters given our data's class membership distributions:

μi(t+1) = ∑_k P(wi | xk, λt) xk / ∑_k P(wi | xk, λt)
Σi(t+1) = ∑_k P(wi | xk, λt) [xk − μi(t+1)][xk − μi(t+1)]ᵀ / ∑_k P(wi | xk, λt)
pi(t+1) = ∑_k P(wi | xk, λt) / R,  where R = #records

Gaussian Mixture Example: Start
(Advance apologies: in black and white this example will be incomprehensible.)

After first iteration
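The E-step and M-step formulas above translate almost line-for-line into code. A minimal sketch of EM for a general GMM (full covariances, learned priors) under made-up test data; the function names, initialization scheme, and small ridge added to the covariances for numerical safety are my own choices, and a production implementation would also work in log space:

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """Evaluate the density of N(mu, Sigma) at each row of X."""
    d = len(mu)
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    expo = -0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(expo)

def em_gmm(X, k, iters=50, rng=None):
    rng = np.random.default_rng(rng)
    R, d = X.shape
    mus = X[rng.choice(R, k, replace=False)].copy()   # means init: random points
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    ps = np.full(k, 1.0 / k)                          # priors init: uniform
    for _ in range(iters):
        # E-step: responsibilities P(w_i | x_k, lambda_t), shape (R, k)
        resp = np.array([ps[i] * gaussian_pdf(X, mus[i], Sigmas[i])
                         for i in range(k)]).T
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mu_i, Sigma_i, p_i from the responsibilities
        Nk = resp.sum(axis=0)
        mus = (resp.T @ X) / Nk[:, None]
        for i in range(k):
            diff = X - mus[i]
            Sigmas[i] = (resp[:, i, None] * diff).T @ diff / Nk[i] \
                        + 1e-6 * np.eye(d)
        ps = Nk / R
    return mus, Sigmas, ps

# Made-up 2-d data from two well-separated clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-2, 0], 0.5, (100, 2)),
               rng.normal([2, 0], 0.5, (100, 2))])
mus, Sigmas, ps = em_gmm(X, k=2, rng=1)
print(np.round(np.sort(mus[:, 0]), 1))
```

Run on well-separated clusters like these, the recovered means land near the true centers; on harder data the local-optimum caveat from the slides applies, and multiple restarts help.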
After 2nd iteration
After 3rd iteration
After 4th iteration
After 5th iteration
After 6th iteration
After 20th iteration
Some Bio Assay data

GMM clustering of the assay data

Resulting Density Estimator
Three classes of assay, each learned with its own mixture model. (Sorry, this will again be semi-useless in black and white.)

Resulting Bayes Classifier

Resulting Bayes Classifier, using posterior probabilities to alert about ambiguity and anomalousness: yellow means anomalous, cyan means ambiguous.
Unsupervised learning with symbolic attributes
(Example attributes: NATION, MARRIED, # KIDS, with missing values.)
It's just a "learning a Bayes net with known structure but hidden values" problem. Can use gradient descent. It's an easy, fun exercise to do an EM formulation for this case too.

Final Comments
- Remember, E.M. can get stuck in local minima, and empirically it DOES.
- Our unsupervised learning example assumed the P(wi)'s were known, and the variances fixed and known. It is easy to relax this.
- It's possible to do Bayesian unsupervised learning instead of max likelihood.
- There are other algorithms for unsupervised learning. We'll visit K-means soon. Hierarchical clustering is also interesting. Neural-net algorithms called "competitive learning" turn out to have interesting parallels with the EM method we saw.

What you should know
- How to "learn" maximum likelihood parameters (locally max. like.) in the case of unlabeled data.
- Be happy with this kind of probabilistic analysis.
- Understand the two examples of E.M. given in these notes.
For more info, see Duda + Hart. It's a great book. There's much more in the book than in your handout.

Other unsupervised learning methods
- K-means (see next lecture)
- Hierarchical clustering, e.g. minimum spanning trees (see next lecture)
- Principal Component Analysis: a simple, useful tool
- Non-linear PCA: neural auto-associators, locally weighted PCA, others
More informationErratum: A Generalized Path Integral Control Approach to Reinforcement Learning
Journal of Machne Learnng Research 00-9 Submtted /0; Publshed 7/ Erratum: A Generalzed Path Integral Control Approach to Renforcement Learnng Evangelos ATheodorou Jonas Buchl Stefan Schaal Department of
More informationFinding Dense Subgraphs in G(n, 1/2)
Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng
More informationRandomness and Computation
Randomness and Computaton or, Randomzed Algorthms Mary Cryan School of Informatcs Unversty of Ednburgh RC 208/9) Lecture 0 slde Balls n Bns m balls, n bns, and balls thrown unformly at random nto bns usually
More informationLinear Classification, SVMs and Nearest Neighbors
1 CSE 473 Lecture 25 (Chapter 18) Lnear Classfcaton, SVMs and Nearest Neghbors CSE AI faculty + Chrs Bshop, Dan Klen, Stuart Russell, Andrew Moore Motvaton: Face Detecton How do we buld a classfer to dstngush
More informationClustering gene expression data & the EM algorithm
CG, Fall 2011-12 Clusterng gene expresson data & the EM algorthm CG 08 Ron Shamr 1 How Gene Expresson Data Looks Entres of the Raw Data matrx: Rato values Absolute values Row = gene s expresson pattern
More informationMaximum Likelihood Estimation
Maxmum Lkelhood Estmaton INFO-2301: Quanttatve Reasonng 2 Mchael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quanttatve Reasonng 2 Paul and Boyd-Graber Maxmum Lkelhood Estmaton 1 of 9 Why MLE?
More informationThe big picture. Outline
The bg pcture Vncent Claveau IRISA - CNRS, sldes from E. Kjak INSA Rennes Notatons classes: C = {ω = 1,.., C} tranng set S of sze m, composed of m ponts (x, ω ) per class ω representaton space: R d (=
More information14 Lagrange Multipliers
Lagrange Multplers 14 Lagrange Multplers The Method of Lagrange Multplers s a powerful technque for constraned optmzaton. Whle t has applcatons far beyond machne learnng t was orgnally developed to solve
More informationThe EM Algorithm (Dempster, Laird, Rubin 1977) The missing data or incomplete data setting: ODL(φ;Y ) = [Y;φ] = [Y X,φ][X φ] = X
The EM Algorthm (Dempster, Lard, Rubn 1977 The mssng data or ncomplete data settng: An Observed Data Lkelhood (ODL that s a mxture or ntegral of Complete Data Lkelhoods (CDL. (1a ODL(;Y = [Y;] = [Y,][
More informationLecture 4: Universal Hash Functions/Streaming Cont d
CSE 5: Desgn and Analyss of Algorthms I Sprng 06 Lecture 4: Unversal Hash Functons/Streamng Cont d Lecturer: Shayan Oves Gharan Aprl 6th Scrbe: Jacob Schreber Dsclamer: These notes have not been subjected
More informationCHALMERS, GÖTEBORGS UNIVERSITET. SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS. COURSE CODES: FFR 135, FIM 720 GU, PhD
CHALMERS, GÖTEBORGS UNIVERSITET SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS COURSE CODES: FFR 35, FIM 72 GU, PhD Tme: Place: Teachers: Allowed materal: Not allowed: January 2, 28, at 8 3 2 3 SB
More informationConjugacy and the Exponential Family
CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the
More informationBIO Lab 2: TWO-LEVEL NORMAL MODELS with school children popularity data
Lab : TWO-LEVEL NORMAL MODELS wth school chldren popularty data Purpose: Introduce basc two-level models for normally dstrbuted responses usng STATA. In partcular, we dscuss Random ntercept models wthout
More informationCS-433: Simulation and Modeling Modeling and Probability Review
CS-433: Smulaton and Modelng Modelng and Probablty Revew Exercse 1. (Probablty of Smple Events) Exercse 1.1 The owner of a camera shop receves a shpment of fve cameras from a camera manufacturer. Unknown
More informationInternet Engineering. Jacek Mazurkiewicz, PhD Softcomputing. Part 3: Recurrent Artificial Neural Networks Self-Organising Artificial Neural Networks
Internet Engneerng Jacek Mazurkewcz, PhD Softcomputng Part 3: Recurrent Artfcal Neural Networks Self-Organsng Artfcal Neural Networks Recurrent Artfcal Neural Networks Feedback sgnals between neurons Dynamc
More informationEngineering Risk Benefit Analysis
Engneerng Rsk Beneft Analyss.55, 2.943, 3.577, 6.938, 0.86, 3.62, 6.862, 22.82, ESD.72, ESD.72 RPRA 2. Elements of Probablty Theory George E. Apostolaks Massachusetts Insttute of Technology Sprng 2007
More informationCollege of Computer & Information Science Fall 2009 Northeastern University 20 October 2009
College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:
More informationWhy Bayesian? 3. Bayes and Normal Models. State of nature: class. Decision rule. Rev. Thomas Bayes ( ) Bayes Theorem (yes, the famous one)
Why Bayesan? 3. Bayes and Normal Models Alex M. Martnez alex@ece.osu.edu Handouts Handoutsfor forece ECE874 874Sp Sp007 If all our research (n PR was to dsappear and you could only save one theory, whch
More informationENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition
EG 880/988 - Specal opcs n Computer Engneerng: Pattern Recognton Memoral Unversty of ewfoundland Pattern Recognton Lecture 7 May 3, 006 http://wwwengrmunca/~charlesr Offce Hours: uesdays hursdays 8:30-9:30
More informationMACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression
11 MACHINE APPLIED MACHINE LEARNING LEARNING MACHINE LEARNING Gaussan Mture Regresson 22 MACHINE APPLIED MACHINE LEARNING LEARNING Bref summary of last week s lecture 33 MACHINE APPLIED MACHINE LEARNING
More information8 : Learning in Fully Observed Markov Networks. 1 Why We Need to Learn Undirected Graphical Models. 2 Structural Learning for Completely Observed MRF
10-708: Probablstc Graphcal Models 10-708, Sprng 2014 8 : Learnng n Fully Observed Markov Networks Lecturer: Erc P. Xng Scrbes: Meng Song, L Zhou 1 Why We Need to Learn Undrected Graphcal Models In the
More information9.913 Pattern Recognition for Vision. Class IV Part I Bayesian Decision Theory Yuri Ivanov
9.93 Class IV Part I Bayesan Decson Theory Yur Ivanov TOC Roadmap to Machne Learnng Bayesan Decson Makng Mnmum Error Rate Decsons Mnmum Rsk Decsons Mnmax Crteron Operatng Characterstcs Notaton x - scalar
More informationProblem Set 9 Solutions
Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem
More informationLogistic Regression Maximum Likelihood Estimation
Harvard-MIT Dvson of Health Scences and Technology HST.951J: Medcal Decson Support, Fall 2005 Instructors: Professor Lucla Ohno-Machado and Professor Staal Vnterbo 6.873/HST.951 Medcal Decson Support Fall
More informationAnnouncements EWA with ɛ-exploration (recap) Lecture 20: EXP3 Algorithm. EECS598: Prediction and Learning: It s Only a Game Fall 2013.
Lecture 0: EXP3 Algorthm 1 EECS598: Predcton and Learnng: It s Only a Game Fall 013 Prof. Jacob Abernethy Lecture 0: EXP3 Algorthm Scrbe: Zhhao Chen Announcements None 0.1 EWA wth ɛ-exploraton (recap)
More informationSupport Vector Machines. Vibhav Gogate The University of Texas at dallas
Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest
More information