Chapter 08: Direct Maximum Likelihood/MAP Estimation and Incomplete Data Problems
Slide 1: Title

LEARNING AND INFERENCE IN GRAPHICAL MODELS
Chapter 08: Direct Maximum Likelihood/MAP Estimation and Incomplete Data Problems

Dr. Martin Lauer
University of Freiburg, Machine Learning Lab
Karlsruhe Institute of Technology, Institute of Measurement and Control Systems

Learning and Inference in Graphical Models. Chapter 08, p. 1/28
Slide 2: References for this chapter

- Christopher M. Bishop: Pattern Recognition and Machine Learning, ch. 9, Springer, 2006
- Joseph L. Schafer: Analysis of Incomplete Multivariate Data, Chapman & Hall, 1997
- Zoubin Ghahramani, Michael I. Jordan: Learning from Incomplete Data, Technical Report #1509, MIT Artificial Intelligence Laboratory, /AIM-1509.pdf?sequence=2
- Arthur P. Dempster, Nan M. Laird, Donald B. Rubin: Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B, vol. 39, pp. 1-38, 1977
- Xiao-Li Meng, Donald B. Rubin: Maximum Likelihood Estimation via the ECM Algorithm: A General Framework, Biometrika, vol. 80, no. 2, 1993
Slide 3: Motivation

Up to now:
1. calculate/approximate $p(\text{parameters} \mid \text{data})$
2. find a meaningful reference value for $p(\text{parameters} \mid \text{data})$, e.g. $\arg\max_{\text{parameters}} p(\text{parameters} \mid \text{data})$

This requires more calculation than is actually necessary.

This chapter: find $\arg\max_{\text{parameters}} p(\text{parameters} \mid \text{data})$ directly (MAP), or $\arg\max_{\text{parameters}} p(\text{data} \mid \text{parameters})$ directly (ML).

Remark: ML and MAP require basically the same approaches; the only difference is whether we consider priors, which are just additional factors in graphical models. Therefore, we treat both approaches together.
Slide 4: Direct MAP calculation

Posterior distribution in a graphical model:
$$p(u_1,\dots,u_n \mid o_1,\dots,o_m) = \frac{p(u_1,\dots,u_n,o_1,\dots,o_m)}{p(o_1,\dots,o_m)}, \qquad p(u_1,\dots,u_n,o_1,\dots,o_m) = \prod_i f_i(\mathrm{Neighbors}(i))$$

MAP means: solve
$$\arg\max_{u_1,\dots,u_n} \prod_i f_i(\mathrm{Neighbors}(i)) = \arg\max_{u_1,\dots,u_n} \sum_i \log f_i(\mathrm{Neighbors}(i)),$$
using $\prod_i f_i(\mathrm{Neighbors}(i)) = e^{\sum_i \log f_i(\mathrm{Neighbors}(i))}$.
Slide 5: Direct MAP calculation — Ways to find the MAP

- If the whole system of equations $\frac{\partial}{\partial u_j} \sum_i \log f_i(\mathrm{Neighbors}(i)) = 0$ can be resolved analytically → analytical solution for the MAP.
- If each single equation $\frac{\partial}{\partial u_j} \sum_i \log f_i(\mathrm{Neighbors}(i)) = 0$ can be solved analytically → use an iterative approach (next slide).
Slide 6: Direct MAP calculation — Iterative approach

1. repeat
2.   set $u_1 \leftarrow \arg\max_{u_1} \sum_i \log f_i(\mathrm{Neighbors}(i))$
3.   set $u_2 \leftarrow \arg\max_{u_2} \sum_i \log f_i(\mathrm{Neighbors}(i))$
4.   ...
5.   set $u_n \leftarrow \arg\max_{u_n} \sum_i \log f_i(\mathrm{Neighbors}(i))$
6. until convergence
7. return $(u_1,\dots,u_n)$

If only the derivatives $\frac{\partial}{\partial u_j} \sum_i \log f_i(\mathrm{Neighbors}(i))$ can be calculated easily → numerical solution: use a generic gradient descent algorithm. The coordinate-wise approach above often converges faster than generic gradient descent.
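The iterative scheme above is coordinate-wise ascent on the log-posterior. A minimal sketch in Python, using a toy coupled log-posterior chosen purely for illustration (it is not one of the models from this chapter) so that each coordinate-wise argmax has a closed form:

```python
# Toy log-posterior: log p(u1, u2) = -(u1 - 1)^2 - (u1 - u2)^2.
# Setting the partial derivatives to zero gives the coordinate updates:
#   over u1: -2(u1 - 1) - 2(u1 - u2) = 0  =>  u1 = (1 + u2) / 2
#   over u2: -2(u2 - u1)             = 0  =>  u2 = u1
def coordinate_map(iters=60):
    u1, u2 = 0.0, 0.0                 # arbitrary starting point
    for _ in range(iters):
        u1 = (1.0 + u2) / 2.0         # argmax over u1 with u2 fixed
        u2 = u1                       # argmax over u2 with u1 fixed
    return u1, u2

u1, u2 = coordinate_map()             # converges to the joint maximum (1, 1)
```

The iteration contracts geometrically here; in general, convergence to a local maximum is all that coordinate ascent guarantees.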
Slide 7: Example: bearing-only tracking revisited

- We observe a moving object from a fixed position.
- The object moves with constant velocity.
- For every point in time, the observer senses the angle of observation, but only sometimes the distance to the object.

Distributions:
- $\vec{x}_0 \sim N(\vec{a}, R)$ (initial position)
- $\vec{v} \sim N(\vec{b}, S)$ (velocity)
- $\vec{y}_i \mid \vec{x}_0, \vec{v} \sim N(\vec{x}_0 + t_i \vec{v},\ \sigma^2 I)$ (position at time $t_i$)
- $r_i = \|\vec{y}_i\|$ (distance), $\vec{w}_i = \vec{y}_i / \|\vec{y}_i\|$ (direction of observation)

[Figure: geometry of the task — observer, angle of observation, unknown object movement $\vec{x}_0, \vec{v}$, directions $\vec{w}_i$, unknown distances $r_i$ — and the corresponding plate model over $i = 1,\dots,n$.]
Slide 8: Example: bearing-only tracking revisited

Conditional distributions:
$$\vec{x}_0 \mid \vec{v}, (\vec{y}_i), (t_i) \sim N\Big(\big(\tfrac{n}{\sigma^2} I + R^{-1}\big)^{-1}\big(\tfrac{1}{\sigma^2}\textstyle\sum_i(\vec{y}_i - t_i\vec{v}) + R^{-1}\vec{a}\big),\ \big(\tfrac{n}{\sigma^2} I + R^{-1}\big)^{-1}\Big)$$
$$\vec{v} \mid \vec{x}_0, (\vec{y}_i), (t_i) \sim N\Big(\big(\tfrac{\sum_i t_i^2}{\sigma^2} I + S^{-1}\big)^{-1}\big(\tfrac{1}{\sigma^2}\textstyle\sum_i t_i(\vec{y}_i - \vec{x}_0) + S^{-1}\vec{b}\big),\ \big(\tfrac{\sum_i t_i^2}{\sigma^2} I + S^{-1}\big)^{-1}\Big)$$
$$r_i \mid \vec{x}_0, \vec{v}, t_i, \vec{w}_i \sim N\big(\vec{w}_i^T(\vec{x}_0 + t_i\vec{v}),\ \sigma^2\big)$$

Updates derived from the conditionals:
$$\vec{x}_0 \leftarrow \big(\tfrac{n}{\sigma^2} I + R^{-1}\big)^{-1}\big(\tfrac{1}{\sigma^2}\textstyle\sum_i(\vec{y}_i - t_i\vec{v}) + R^{-1}\vec{a}\big)$$
$$\vec{v} \leftarrow \big(\tfrac{\sum_i t_i^2}{\sigma^2} I + S^{-1}\big)^{-1}\big(\tfrac{1}{\sigma^2}\textstyle\sum_i t_i(\vec{y}_i - \vec{x}_0) + S^{-1}\vec{b}\big)$$
$$r_i \leftarrow \vec{w}_i^T(\vec{x}_0 + t_i\vec{v})$$

Matlab demo (using non-informative priors)
Slide 9: Example: Gaussian mixtures revisited

Model:
- $\mu_j \sim N(m_0, r_0)$
- $s_j \sim \Gamma^{-1}(a_0, b_0)$
- $\vec{w} \sim D(\vec{\beta})$
- $Z_i \mid \vec{w} \sim C(\vec{w})$
- $X_i \mid Z_i, \mu_{Z_i}, s_{Z_i} \sim N(\mu_{Z_i}, s_{Z_i})$

[Plate diagram: hyperparameters $m_0, r_0, a_0, b_0, \vec{\beta}$; parameters $\mu_j, s_j$ in a plate over $j = 1,\dots,k$ and mixture weights $\vec{w}$; variables $Z_i, X_i$ in a plate over $i = 1,\dots,n$.]
Slide 10: Example: Gaussian mixtures revisited

Conditional distributions: see slide 07/36.

Derived MAP updates, with $n_j = |\{i : z_i = j\}|$:
$$\vec{w} \leftarrow \Big(\frac{\beta_1 + n_1 - 1}{n - k + \sum_{j=1}^k \beta_j},\ \dots,\ \frac{\beta_k + n_k - 1}{n - k + \sum_{j=1}^k \beta_j}\Big)$$
$$\mu_j \leftarrow \frac{s_j m_0 + r_0 \sum_{i: z_i = j} x_i}{s_j + n_j r_0}$$
$$s_j \leftarrow \frac{b_0 + \frac{1}{2}\sum_{i: z_i = j}(x_i - \mu_j)^2}{1 + a_0 + \frac{n_j}{2}}$$
$$z_i \leftarrow \arg\max_j \frac{w_j}{\sqrt{2\pi s_j}}\, e^{-\frac{(x_i - \mu_j)^2}{2 s_j}}$$

Matlab demo (using priors close to non-informativity)
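A minimal Python sketch of these updates in the simplified case of flat (non-informative) priors, where the MAP updates reduce to empirical means, variances, and class frequencies. The synthetic data and the deterministic initialization are illustrative assumptions, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative 1-D data from two well-separated components
x = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])

def map_hard_assignment(x, k=2, iters=20):
    """Coordinate-wise MAP with hard assignments z_i and flat priors."""
    mu = np.linspace(x.min(), x.max(), k)   # simple deterministic init
    s = np.full(k, x.var())                 # component variances
    w = np.full(k, 1.0 / k)                 # mixture weights
    for _ in range(iters):
        # z-step: z_i <- argmax_j  w_j N(x_i | mu_j, s_j), done in log space
        logp = (np.log(w) - 0.5 * np.log(2 * np.pi * s)
                - 0.5 * (x[:, None] - mu) ** 2 / s)
        z = np.argmax(logp, axis=1)
        # parameter steps: with flat priors these are empirical statistics
        for j in range(k):
            xj = x[z == j]
            if len(xj) > 0:
                mu[j], s[j], w[j] = xj.mean(), xj.var() + 1e-9, len(xj) / len(x)
    return w, mu, s

w, mu, s = map_hard_assignment(x)
```

As the next slide notes, this scheme converges quickly but is very sensitive to the initialization.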
Slide 11: Example: Gaussian mixtures revisited

Observations:
- convergence is very fast
- the result depends very much on the initialization
- we treat the $z_i$ like parameters of the model, although the mixture model is completely specified by $\vec{w}, \mu_1,\dots,\mu_k, s_1,\dots,s_k$
- the $z_i$ are not parameters of the mixture but latent variables, which are only used to simplify our calculations
- why should we maximize the posterior w.r.t. the $z_i$ at all?
Slide 12: Latent variables

Latent variables are
- not part of the stochastic model
- not interesting for the final estimate
- useful to simplify calculations
- often interpreted as missing observations

Examples:
- the class assignment variables $z_i$ in mixture modeling can be interpreted as missing class labels for a multi-class distribution
- the missing distances $r_i$ in the bearing-only tracking task can be interpreted as missing parts of the data
- occluded parts of an object in an image can be seen as missing pixels
- data from a statistical evaluation which have been lost
Slide 13: Incomplete data problems

Let us assume that all data $\vec{x}$ are split into an observed part $\vec{y}$ and a missing part $\vec{z}$, i.e. $\vec{x} = (\vec{y}, \vec{z})$. We can distinguish three cases:

- completely missing at random (CMAR): whether an entry of $\vec{x}$ belongs to $\vec{y}$ or $\vec{z}$ is stochastically independent of both $\vec{y}$ and $\vec{z}$:
  $P(x_i \text{ belongs to } \vec{z}) = P(x_i \text{ belongs to } \vec{z} \mid \vec{y}) = P(x_i \text{ belongs to } \vec{z} \mid \vec{y}, \vec{z})$
- missing at random (MAR): whether an entry of $\vec{x}$ belongs to $\vec{y}$ or $\vec{z}$ is stochastically independent of $\vec{z}$ but might depend on $\vec{y}$:
  $P(x_i \text{ belongs to } \vec{z}) \neq P(x_i \text{ belongs to } \vec{z} \mid \vec{y}) = P(x_i \text{ belongs to } \vec{z} \mid \vec{y}, \vec{z})$
- censored data: whether an entry of $\vec{x}$ belongs to $\vec{y}$ or $\vec{z}$ is stochastically dependent on $\vec{z}$:
  $P(x_i \text{ belongs to } \vec{z} \mid \vec{y}) \neq P(x_i \text{ belongs to } \vec{z} \mid \vec{y}, \vec{z})$
Slide 14: Incomplete data problems

Discuss the following examples of incomplete data:
- the $z_i$ in mixture models
- a sensor that measures values only down to a certain minimal value
- an interrupted connection between a sensor and a host computer, so that some measurements are not transmitted
- a stereo camera system that measures light intensity and distance, but is unable to calculate the distance for overexposed areas
- a sensor that fails often if temperatures are low,
  - if the sensor measures the activity of the sun
  - if the sensor measures the persons on a beach
- non-responses in public opinion polls
Slide 15: Incomplete data problems

Consequences for the stochastic analysis:
- CMAR: no problem at all; the incomplete data do not disturb our results
- MAR: can be treated if we model the stochastic dependency between the observed data and the missing data
- censored data: no general treatment possible at all; results will be disturbed, and no reconstruction of the missing data is possible

We focus on the CMAR and MAR cases here.
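A small simulation (an illustrative sketch, not from the slides) makes these consequences concrete: under CMAR the sample mean of the observed part is unaffected, while censoring biases it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, 100_000)      # complete data, true mean 10

# CMAR: each entry is missing with a fixed probability, independent of x
observed_cmar = x[rng.random(x.size) > 0.3]

# Censored: entries below a sensor threshold are never observed
observed_censored = x[x > 9.0]

print(observed_cmar.mean())             # stays close to the true mean 10
print(observed_censored.mean())         # biased upward
```

The censored mean is the mean of a truncated normal, so it systematically overestimates the true mean; no amount of data removes this bias without modeling the censoring mechanism.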
Slide 16: Inference for incomplete data problems

- variational Bayes, Monte Carlo: model the full posterior over the parameters of the model and the latent (missing) data; afterwards, ignore the latent variables and return the result for the parameters of the model
- direct MAP/ML: do not maximize the posterior/likelihood over the parameters and the latent variables. Instead, consider all possible values that the latent variables can take and maximize the posterior/likelihood only w.r.t. the parameters of the stochastic model:
  - expectation-maximization algorithm (EM)
  - expectation-conditional-maximization algorithm (ECM)
Slide 17: EM algorithm

Notation:
- parameters of the stochastic model (the posterior distribution): $\vec{\theta} = (\theta_1,\dots,\theta_k)$
- latent variables: $\vec{\lambda} = (\lambda_1,\dots,\lambda_m)$
- observed data: $\vec{o} = (o_1,\dots,o_n)$
- log-posterior: $L(\vec{\theta}, \vec{\lambda}, \vec{o}) = \sum_i \log f_i(\mathrm{Neighbors}(i))$
Slide 18: EM algorithm

We aim at maximizing the expected log-posterior over all values of the latent variables:
$$\arg\max_{\vec{\theta}} \int_{\mathbb{R}^m} L(\vec{\theta}, \vec{\lambda}, \vec{o})\, p(\vec{\lambda} \mid \vec{\theta}, \vec{o})\, d\vec{\lambda}$$

An iterative approach to solve it:
1. start with some parameter vector $\vec{\theta}$
2. repeat
3.   $Q(\vec{\theta}') \leftarrow \int_{\mathbb{R}^m} L(\vec{\theta}', \vec{\lambda}, \vec{o})\, p(\vec{\lambda} \mid \vec{\theta}, \vec{o})\, d\vec{\lambda}$
4.   $\vec{\theta} \leftarrow \arg\max_{\vec{\theta}'} Q(\vec{\theta}')$
5. until convergence

This algorithm is known as the expectation-maximization algorithm (Dempster, Laird, Rubin, 1977). Step 3 is the expectation step (E-step), step 4 the maximization step (M-step).
Slide 19: EM algorithm

Remarks:
- during the E-step, intermediate variables are calculated which allow representing $Q$ without relying on the previous value of $\vec{\theta}$
- closed-form expressions for $Q$ and its explicit maximization often require lengthy algebraic calculations
- for some applications, calculating the E-step means calculating the expected values of the latent variables; but this does not hold in general

Famous application areas:
- mixture distributions
- learning hidden Markov models from example sequences (Baum-Welch algorithm)
Slide 20: Example: bearing-only tracking revisited

Conditional distribution:
$$r_i \mid \vec{x}_0, \vec{v}, \vec{w}_i \sim N\big(\vec{w}_i^T(\vec{x}_0 + t_i\vec{v}),\ \sigma^2\big)$$

The posterior distribution:
$$\underbrace{\frac{1}{2\pi\sqrt{|R|}}\, e^{-\frac12(\vec{x}_0-\vec{a})^T R^{-1}(\vec{x}_0-\vec{a})}}_{\text{prior of } \vec{x}_0} \cdot \underbrace{\frac{1}{2\pi\sqrt{|S|}}\, e^{-\frac12(\vec{v}-\vec{b})^T S^{-1}(\vec{v}-\vec{b})}}_{\text{prior of } \vec{v}} \cdot \prod_{i=1}^n \underbrace{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac12 \frac{\|\vec{x}_0 + t_i\vec{v} - r_i\vec{w}_i\|^2}{\sigma^2}}}_{\text{data term}}$$

(For the geometry of the task — angle of observation, unknown object movement $\vec{x}_0, \vec{v}$, unknown distances $r_i$ — compare the figure on slide 7.)
Slide 21: Example: bearing-only tracking revisited

... (after lengthy, error-prone calculations) ...
$$Q(\vec{x}_0', \vec{v}') = \mathrm{const} - \frac12(\vec{x}_0'-\vec{a})^T R^{-1}(\vec{x}_0'-\vec{a}) - \frac12(\vec{v}'-\vec{b})^T S^{-1}(\vec{v}'-\vec{b}) - \frac12 \sum_{i=1}^n \frac{\|\vec{x}_0' + t_i\vec{v}' - \rho_i\vec{w}_i\|^2}{\sigma^2}$$
with
$$\rho_i = \begin{cases} r_i & \text{if } r_i \text{ is observed} \\ (\vec{x}_0 + t_i\vec{v})^T\vec{w}_i & \text{if } r_i \text{ is unobserved} \end{cases}$$

Determining the maximum w.r.t. $\vec{x}_0', \vec{v}'$ leads to the linear system
$$\begin{pmatrix} R^{-1} + \frac{n}{\sigma^2} I & \big(\frac{1}{\sigma^2}\sum_i t_i\big) I \\ \big(\frac{1}{\sigma^2}\sum_i t_i\big) I & S^{-1} + \big(\frac{1}{\sigma^2}\sum_i t_i^2\big) I \end{pmatrix} \begin{pmatrix} \vec{x}_0' \\ \vec{v}' \end{pmatrix} = \begin{pmatrix} R^{-1}\vec{a} + \frac{1}{\sigma^2}\sum_i \rho_i\vec{w}_i \\ S^{-1}\vec{b} + \frac{1}{\sigma^2}\sum_i t_i\rho_i\vec{w}_i \end{pmatrix}$$

Matlab demo (using non-informative priors)
Slide 22: ECM algorithm

We still aim at maximizing the expected log-posterior over all values of the latent variables:
$$\arg\max_{\vec{\theta}} \int_{\mathbb{R}^m} L(\vec{\theta}, \vec{\lambda}, \vec{o})\, p(\vec{\lambda} \mid \vec{\theta}, \vec{o})\, d\vec{\lambda}$$

Sometimes the M-step of the EM algorithm cannot be calculated, i.e. $\arg\max_{\theta_1,\dots,\theta_k} Q(\vec{\theta}')$ cannot be resolved analytically. But it might happen that $\arg\max_{\theta_i} Q(\vec{\theta}')$ can be resolved for each $\theta_i$ individually, or for groups of parameters.
→ expectation-conditional-maximization algorithm (Meng & Rubin, 1993)
Slide 23: ECM algorithm

Define a set of constraints $g_i(\vec{\theta}', \vec{\theta})$ on the parameter set, e.g. $g_i: \theta_j' = \theta_j$ for all $j \neq i$. Replace the single M-step of the EM algorithm by a sequence of CM-steps, one for each constraint:

1. start with some parameter vector $\vec{\theta}$
2. repeat
3.   $Q(\vec{\theta}') \leftarrow \int_{\mathbb{R}^m} L(\vec{\theta}', \vec{\lambda}, \vec{o})\, p(\vec{\lambda} \mid \vec{\theta}, \vec{o})\, d\vec{\lambda}$ (E-step)
4.   $\vec{\theta} \leftarrow \arg\max_{\vec{\theta}'} Q(\vec{\theta}')$ subject to $g_1(\vec{\theta}', \vec{\theta})$ (CM-step)
5.   ...
6.   $\vec{\theta} \leftarrow \arg\max_{\vec{\theta}'} Q(\vec{\theta}')$ subject to $g_\nu(\vec{\theta}', \vec{\theta})$ (CM-step)
7. until convergence
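The CM-step control flow can be sketched as follows. The quadratic Q below is a toy stand-in, chosen only so that each conditional maximization has a closed form; it is not the Q of a particular model, and the E-step is omitted since this Q is fixed:

```python
# Toy Q(t1, t2) = -(t1 - 2*t2)^2 - (t2 - 1)^2, maximized at (2, 1).
# CM-step 1 (constraint g1: t2 held fixed):
#   dQ/dt1 = -2(t1 - 2*t2) = 0              =>  t1 = 2*t2
# CM-step 2 (constraint g2: t1 held fixed):
#   dQ/dt2 = 4(t1 - 2*t2) - 2(t2 - 1) = 0   =>  t2 = (2*t1 + 1) / 5
def ecm(iters=300):
    t1, t2 = 0.0, 0.0
    for _ in range(iters):
        t1 = 2.0 * t2                  # CM-step subject to g1
        t2 = (2.0 * t1 + 1.0) / 5.0    # CM-step subject to g2
    return t1, t2

t1, t2 = ecm()                         # converges to (2, 1)
```

Each CM-step can only increase Q, which is what guarantees the monotone behavior of ECM in the general case.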
Slide 24: Example: Gaussian mixtures revisited

Model (as on slide 9):
- $\mu_j \sim N(m_0, r_0)$, $s_j \sim \Gamma^{-1}(a_0, b_0)$, $\vec{w} \sim D(\vec{\beta})$
- $Z_i \mid \vec{w} \sim C(\vec{w})$, $X_i \mid Z_i, \mu_{Z_i}, s_{Z_i} \sim N(\mu_{Z_i}, s_{Z_i})$

Conditional distribution (cf. slide 07/35):
$$z_i \mid \vec{w}, x_i, \mu_1,\dots,\mu_k, s_1,\dots,s_k \sim C(h_{i,1},\dots,h_{i,k}) \quad\text{with}\quad h_{i,j} \propto \frac{w_j}{\sqrt{2\pi s_j}}\, e^{-\frac{(x_i-\mu_j)^2}{2 s_j}}$$
Slide 25: Example: Gaussian mixtures revisited

$$Q(\vec{w}', \mu_1',\dots,\mu_k', s_1',\dots,s_k') = \sum_{z_1=1}^k \cdots \sum_{z_n=1}^k h_{1,z_1}\cdots h_{n,z_n} \Bigg( \sum_{j=1}^k \underbrace{\log\Big(\frac{1}{\sqrt{2\pi r_0}}\, e^{-\frac{(\mu_j'-m_0)^2}{2 r_0}}\Big)}_{\text{prior of } \mu_j} + \sum_{j=1}^k \underbrace{\log\Big(\frac{b_0^{a_0}}{\Gamma(a_0)} (s_j')^{-a_0-1} e^{-\frac{b_0}{s_j'}}\Big)}_{\text{prior of } s_j} + \underbrace{\log\Big(\frac{\Gamma(\beta_1+\dots+\beta_k)}{\Gamma(\beta_1)\cdots\Gamma(\beta_k)} \prod_{j=1}^k (w_j')^{\beta_j-1}\Big)}_{\text{prior of } \vec{w}} + \sum_{i=1}^n \Big( \underbrace{\log\Big(\frac{1}{\sqrt{2\pi s_{z_i}'}}\, e^{-\frac{(x_i-\mu_{z_i}')^2}{2 s_{z_i}'}}\Big)}_{\text{data term of } x_i} + \underbrace{\log w_{z_i}'}_{\text{data term of } z_i} \Big) \Bigg)$$

We can maximize $Q$ easily (blackboard/homework).
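Carrying out this maximization yields the familiar soft-assignment EM updates for Gaussian mixtures. A minimal ML sketch in Python for the (near-)non-informative-prior case, on 1-D data; the synthetic data and the initialization are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-4, 1.0, 300), rng.normal(3, 1.5, 300)])

def em_gmm(x, k=2, iters=100):
    mu = np.linspace(x.min(), x.max(), k)   # simple deterministic init
    s = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities h[i, j] proportional to w_j N(x_i | mu_j, s_j)
        logh = (np.log(w) - 0.5 * np.log(2 * np.pi * s)
                - 0.5 * (x[:, None] - mu) ** 2 / s)
        h = np.exp(logh - logh.max(axis=1, keepdims=True))
        h /= h.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted ML updates
        n_j = h.sum(axis=0)
        w = n_j / x.size
        mu = (h * x[:, None]).sum(axis=0) / n_j
        s = (h * (x[:, None] - mu) ** 2).sum(axis=0) / n_j
    return w, mu, s

w, mu, s = em_gmm(x)
```

Unlike the hard-assignment MAP scheme of slide 10, each point contributes fractionally to every component, which is exactly the averaging over $z_1,\dots,z_n$ that the Q above performs.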
Slide 26: Example: Gaussian mixtures revisited

Matlab demo (using non-informative priors)

Some observations on EM/ECM for Gaussian mixtures:
- very popular
- very sensitive to the initialization of the parameters
- overfits the data if the mixture is too large (for ML/MAP with non-informative priors)
Slide 27: Laplace approximation

MAP calculates a best estimate. Can we also derive an approximation of the posterior distribution? Idea: determine a Gaussian that is locally most similar to the posterior.

Taylor approximation of the log-posterior around the MAP estimate $\vec{\theta}_{MAP}$:
$$\log p(\vec{\theta}) \approx \log p(\vec{\theta}_{MAP}) + \mathrm{grad}(\log p)(\vec{\theta}_{MAP})^T (\vec{\theta}-\vec{\theta}_{MAP}) + \frac12 (\vec{\theta}-\vec{\theta}_{MAP})^T H (\vec{\theta}-\vec{\theta}_{MAP}) = \log p(\vec{\theta}_{MAP}) + \frac12 (\vec{\theta}-\vec{\theta}_{MAP})^T H (\vec{\theta}-\vec{\theta}_{MAP})$$
with $H$ the Hessian of $\log p$ at $\vec{\theta}_{MAP}$; the gradient term vanishes because $\vec{\theta}_{MAP}$ is a maximum.

Log of a Gaussian around $\vec{\theta}_{MAP}$:
$$-\log\big(\sqrt{(2\pi)^d |\Sigma|}\big) - \frac12 (\vec{\theta}-\vec{\theta}_{MAP})^T \Sigma^{-1} (\vec{\theta}-\vec{\theta}_{MAP})$$

We obtain the same shape of the Gaussian if we choose $\Sigma^{-1} = -H$. This is known as the Laplace approximation.
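A minimal 1-D sketch (an illustrative example, not from the slides): for the unnormalized log-posterior $\log p(\theta) = a \log\theta + b \log(1-\theta)$ (Beta-shaped), the MAP and the second derivative are available in closed form, giving the Laplace approximation $N(\theta_{MAP}, \Sigma)$ with $\Sigma^{-1} = -H$:

```python
# Unnormalized log-posterior of a Beta-shaped density (illustrative choice):
# log p(theta) = a*log(theta) + b*log(1 - theta)
a, b = 30.0, 10.0                        # exponents of theta and (1 - theta)

theta_map = a / (a + b)                  # root of d/dtheta log p = 0

# Second derivative of log p at the MAP (the 1-D "Hessian", negative here)
H = -a / theta_map**2 - b / (1.0 - theta_map)**2

sigma2 = -1.0 / H                        # Laplace: Sigma^{-1} = -H
# Approximate posterior: N(theta_map, sigma2)
```

For these values, $\theta_{MAP} = 0.75$ and $\sigma^2 = 3/640 \approx 0.0047$, close to the variance of the exact Beta posterior, as expected for a peaked, roughly symmetric density.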
Slide 28: Summary

- direct maximization of the likelihood/posterior
- latent variables
- incomplete data problems
- EM/ECM algorithm
- Laplace approximation
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationHidden Markov Models
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 22 April 2, 2018 1 Reminders Homework
More informationHuman-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg
Temporal Reasoning Kai Arras, University of Freiburg 1 Temporal Reasoning Contents Introduction Temporal Reasoning Hidden Markov Models Linear Dynamical Systems (LDS) Kalman Filter 2 Temporal Reasoning
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationDoes the Wake-sleep Algorithm Produce Good Density Estimators?
Does the Wake-sleep Algorithm Produce Good Density Estimators? Brendan J. Frey, Geoffrey E. Hinton Peter Dayan Department of Computer Science Department of Brain and Cognitive Sciences University of Toronto
More informationAccelerating the EM Algorithm for Mixture Density Estimation
Accelerating the EM Algorithm ICERM Workshop September 4, 2015 Slide 1/18 Accelerating the EM Algorithm for Mixture Density Estimation Homer Walker Mathematical Sciences Department Worcester Polytechnic
More informationMachine Learning for natural language processing
Machine Learning for natural language processing Hidden Markov Models Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 33 Introduction So far, we have classified texts/observations
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationUniversity of Cambridge. MPhil in Computer Speech Text & Internet Technology. Module: Speech Processing II. Lecture 2: Hidden Markov Models I
University of Cambridge MPhil in Computer Speech Text & Internet Technology Module: Speech Processing II Lecture 2: Hidden Markov Models I o o o o o 1 2 3 4 T 1 b 2 () a 12 2 a 3 a 4 5 34 a 23 b () b ()
More informationHidden Markov models
Hidden Markov models Charles Elkan November 26, 2012 Important: These lecture notes are based on notes written by Lawrence Saul. Also, these typeset notes lack illustrations. See the classroom lectures
More informationExpectation Propagation for Approximate Bayesian Inference
Expectation Propagation for Approximate Bayesian Inference José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department February 5, 2007 1/ 24 Bayesian Inference Inference Given
More informationEM for ML Estimation
Overview EM for ML Estimation An algorithm for Maximum Likelihood (ML) Estimation from incomplete data (Dempster, Laird, and Rubin, 1977) 1. Formulate complete data so that complete-data ML estimation
More informationA Note on the Expectation-Maximization (EM) Algorithm
A Note on the Expectation-Maximization (EM) Algorithm ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign March 11, 2007 1 Introduction The Expectation-Maximization
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationMachine Learning for Signal Processing Expectation Maximization Mixture Models. Bhiksha Raj 27 Oct /
Machine Learning for Signal rocessing Expectation Maximization Mixture Models Bhiksha Raj 27 Oct 2016 11755/18797 1 Learning Distributions for Data roblem: Given a collection of examples from some data,
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationSeries 6, May 14th, 2018 (EM Algorithm and Semi-Supervised Learning)
Exercises Introduction to Machine Learning SS 2018 Series 6, May 14th, 2018 (EM Algorithm and Semi-Supervised Learning) LAS Group, Institute for Machine Learning Dept of Computer Science, ETH Zürich Prof
More informationLast lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton
EM Algorithm Last lecture 1/35 General optimization problems Newton Raphson Fisher scoring Quasi Newton Nonlinear regression models Gauss-Newton Generalized linear models Iteratively reweighted least squares
More informationThe Expectation Maximization Algorithm
The Expectation Maximization Algorithm Frank Dellaert College of Computing, Georgia Institute of Technology Technical Report number GIT-GVU-- February Abstract This note represents my attempt at explaining
More informationHidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010
Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data
More informationPrincipal Component Analysis (PCA) for Sparse High-Dimensional Data
AB Principal Component Analysis (PCA) for Sparse High-Dimensional Data Tapani Raiko Helsinki University of Technology, Finland Adaptive Informatics Research Center The Data Explosion We are facing an enormous
More informationCOMS 4771 Lecture Course overview 2. Maximum likelihood estimation (review of some statistics)
COMS 4771 Lecture 1 1. Course overview 2. Maximum likelihood estimation (review of some statistics) 1 / 24 Administrivia This course Topics http://www.satyenkale.com/coms4771/ 1. Supervised learning Core
More informationHuman Mobility Pattern Prediction Algorithm using Mobile Device Location and Time Data
Human Mobility Pattern Prediction Algorithm using Mobile Device Location and Time Data 0. Notations Myungjun Choi, Yonghyun Ro, Han Lee N = number of states in the model T = length of observation sequence
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationSTATS 306B: Unsupervised Learning Spring Lecture 2 April 2
STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised
More information