Variational Mixture of Gaussians. Sargur Srihari

1 Variational Mixture of Gaussians
Sargur Srihari

2 Objective
Apply the variational inference machinery to Gaussian Mixture Models
Demonstrates how a Bayesian treatment elegantly resolves the difficulties encountered with maximum likelihood
Many more complex distributions can be handled by straightforward extensions of this analysis

3 Graphical Model for GMM
Graphical model corresponding to the likelihood function of the standard GMM: a directed acyclic graph in plate notation, the plate representing the mixture over the N observations
For each observation x_n we have a corresponding latent variable z_n, a 1-of-K binary vector with elements z_nk for k=1,..,K
Denote the observed data by X={x_1,..,x_N} and the latent variables by Z={z_1,..,z_N}

4 Likelihood Function for GMM
Since z takes values {z_k} with probabilities π_k, the mixture density is p(x) = Σ_{k=1}^K π_k N(x|µ_k,Σ_k)
Therefore the likelihood function is p(X|π,µ,Σ) = Π_{n=1}^N Σ_{k=1}^K π_k N(x_n|µ_k,Σ_k), where the product is over the N i.i.d. samples
Therefore the log-likelihood function is ln p(X|π,µ,Σ) = Σ_{n=1}^N ln { Σ_{k=1}^K π_k N(x_n|µ_k,Σ_k) }
Find the parameters π, µ and Σ that maximize the log-likelihood: a more difficult problem than for a single Gaussian
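As a concrete illustration, here is a minimal NumPy/SciPy sketch that evaluates this log-likelihood; the function name and array shapes are my own choices, not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pi, mu, Sigma):
    """ln p(X | pi, mu, Sigma) for a K-component GMM.
    X: (N, D) data; pi: (K,) mixing coefficients; mu: (K, D) means; Sigma: (K, D, D) covariances."""
    N, K = X.shape[0], len(pi)
    weighted = np.zeros((N, K))
    for k in range(K):
        # pi_k * N(x_n | mu_k, Sigma_k) for every data point
        weighted[:, k] = pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
    # sum over components inside the log, then over the N i.i.d. points
    return np.log(weighted.sum(axis=1)).sum()
```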

5 GMM m.l.e. expressions
Obtained by setting derivatives of the log-likelihood to zero
Means: µ_k = (1/N_k) Σ_{n=1}^N γ(z_nk) x_n
Covariance matrices: Σ_k = (1/N_k) Σ_{n=1}^N γ(z_nk)(x_n − µ_k)(x_n − µ_k)^T
Mixing coefficients: π_k = N_k/N, where N_k = Σ_{n=1}^N γ(z_nk)
All three are in terms of the responsibilities γ(z_nk)
Not closed-form solutions for the parameters, since the responsibilities γ(z_nk) depend on those parameters in a complex way

6 EM For GMM
E step: use the current values of the parameters µ_k, Σ_k, π_k to evaluate the posterior probabilities p(Z|X), i.e., the responsibilities γ(z_nk)
M step: use these posterior probabilities to re-estimate the means, covariances and mixing coefficients by maximizing the expectation of ln p(X,Z) with respect to p(Z|X)
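A hedged sketch of one such EM iteration, combining the E step (responsibilities) and the M step (the closed-form re-estimation equations of the previous slide); variable names are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pi, mu, Sigma):
    """One EM iteration for a maximum-likelihood GMM."""
    N, D = X.shape
    K = len(pi)
    # E step: gamma[n, k] proportional to pi_k N(x_n | mu_k, Sigma_k)
    gamma = np.zeros((N, K))
    for k in range(K):
        gamma[:, k] = pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M step: re-estimate parameters with the responsibilities held fixed
    Nk = gamma.sum(axis=0)                      # effective number of points per component
    mu_new = (gamma.T @ X) / Nk[:, None]
    Sigma_new = np.zeros((K, D, D))
    for k in range(K):
        diff = X - mu_new[k]
        Sigma_new[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    pi_new = Nk / N
    return pi_new, mu_new, Sigma_new, gamma
```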

7 Graphical model for Bayesian GMM
The Bayesian GMM extends the GMM graph with nodes for the mixing coefficients π, the precisions Λ and the means µ
To specify the model we need these conditional probabilities:
1. p(Z|π): conditional distribution of Z given the mixing coefficients
2. p(X|Z,µ,Λ): conditional distribution of the observed data given the latent variables and component parameters
3. p(π): distribution of the mixing coefficients
4. p(µ,Λ): prior governing the mean and precision of each component

8 Conditional Distribution Expressions
1. Conditional distribution of Z={z_1,..,z_N} given the mixing coefficients π: since the components are mutually exclusive, p(Z|π) = Π_{n=1}^N Π_{k=1}^K π_k^{z_nk} (compare the single-sample form p(z|π) = Π_k π_k^{z_k})
2. Conditional distribution of the observed data X={x_1,..,x_N} given the latent variables and component parameters: since the components are Gaussian, p(X|Z,µ,Λ) = Π_{n=1}^N Π_{k=1}^K N(x_n|µ_k,Λ_k^{-1})^{z_nk} (compare p(x|z) = Π_k N(x|µ_k,Σ_k)^{z_k}), where µ={µ_k} and Λ={Λ_k}
Use of the precision matrix simplifies the further analysis

9 Parameter Priors: Mixing Coefficients
3. Distribution of the mixing coefficients p(π): conjugate priors simplify the analysis, so choose a Dirichlet distribution over π
p(π) = Dir(π|α_0) = C(α_0) Π_{k=1}^K π_k^{α_0 − 1}
We have chosen the same parameter α_0 for each of the components; C(α_0) is the normalization constant for the Dirichlet distribution
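A small sketch of this prior using scipy.stats.dirichlet; the values K=3 and α_0=0.1 are illustrative, not taken from the slides.

```python
import numpy as np
from scipy.stats import dirichlet

K, alpha0 = 3, 0.1
alpha = np.full(K, alpha0)                   # same alpha0 for every component
pi_uniform = np.ones(K) / K                  # an interior point of the simplex
print(dirichlet.logpdf(pi_uniform, alpha))   # log Dir(pi | alpha0, ..., alpha0)
print(dirichlet.rvs(alpha, size=3))          # a few prior draws of the mixing coefficients
```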

10 Parameter Priors: Mean, Precision
4. Distribution of the mean and precision of the Gaussian components p(µ,Λ)
Gaussian-Wishart prior: p(µ,Λ) = p(µ|Λ) p(Λ) = Π_{k=1}^K N(µ_k|m_0,(β_0Λ_k)^{-1}) W(Λ_k|W_0,ν_0)
This represents the conjugate prior when both the mean and the precision are unknown
The resulting model has a link between Λ and µ, due to distribution (4) above
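A sketch of drawing one component's parameters from this Gaussian-Wishart prior with scipy; the hyperparameter values (m_0=0, β_0=1, W_0=I, ν_0=D) are my own illustrative choices.

```python
import numpy as np
from scipy.stats import wishart, multivariate_normal

D = 2
m0, beta0 = np.zeros(D), 1.0
W0, nu0 = np.eye(D), float(D)

Lambda_k = wishart.rvs(df=nu0, scale=W0)     # Lambda_k ~ W(W_0, nu_0)
mu_k = multivariate_normal.rvs(              # mu_k | Lambda_k ~ N(m_0, (beta_0 Lambda_k)^-1)
    mean=m0, cov=np.linalg.inv(beta0 * Lambda_k))
print(Lambda_k, mu_k)
```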

11 Bayesian Network for Bayesian GMM
Joint of all random variables: p(X,Z,π,µ,Λ) = p(X|Z,µ,Λ) p(Z|π) p(π) p(µ|Λ) p(Λ)
All the factors were given earlier; only X={x_1,..,x_N} is observed
This BN provides a nice distinction between latent variables and parameters:
Variables such as z_n that appear inside the plate are latent variables; the number of such variables grows with the data set
Variables outside the plate (mixing coefficients, precisions, means) are parameters: fixed in number and independent of the size of the data set
From the viewpoint of PGMs there is no fundamental difference

12 Recall GMM
The GMM: p(x) = Σ_z p(z)p(x|z) = Σ_{k=1}^K π_k N(x|µ_k,Σ_k); here p(z) has parameter π, which itself has distribution p(π)
The EM approach: 1. evaluation of the posterior distribution p(Z|X); 2. evaluation of the expectation of ln p(X,Z) with respect to p(Z|X)
The variational approach: our goal is to specify a variational distribution q(Z,π,µ,Λ) that approximates p(Z,π,µ,Λ|X)
Recall ln p(X) = L(q) + KL(q||p), where L(q) = ∫ q(Z) ln{ p(X,Z)/q(Z) } dZ and KL(q||p) = −∫ q(Z) ln{ p(Z|X)/q(Z) } dZ

13 Variational Distribution
In variational inference we can specify q using a factorized distribution q(Z) = Π_{i=1}^M q_i(Z_i)
For the Bayesian GMM the latent variables and parameters are Z, π, µ and Λ, so we consider the variational distribution q(Z,π,µ,Λ) = q(Z) q(π,µ,Λ) (subscripts on the q's omitted)
Remarkably, this is the only assumption needed for a tractable solution to a Bayesian mixture model
Functional forms of both q(Z) and q(π,µ,Λ) are determined automatically by optimizing the variational distribution

14 Sequential update equations
Using the general result for factorized distributions: when L(q) = ∫ q(Z) ln{ p(X,Z)/q(Z) } dZ = ∫ Π_i q_i { ln p(X,Z) − Σ_i ln q_i } dZ, the factor q_j that makes the functional L(q) largest is ln q*_j(Z_j) = E_{i≠j}[ln p(X,Z)] + const
For the Bayesian GMM the log of the optimized factor is ln q*(Z) = E_{π,µ,Λ}[ln p(X,Z,π,µ,Λ)] + const
Since p(X,Z,π,µ,Λ) = p(X|Z,µ,Λ) p(Z|π) p(π) p(µ|Λ) p(Λ), we have ln q*(Z) = E_π[ln p(Z|π)] + E_{µ,Λ}[ln p(X|Z,µ,Λ)] + const
Note: the expectations are just weighted sums

15 Simplification of q*(Z)
Expression for the factor q*(Z): ln q*(Z) = E_π[ln p(Z|π)] + E_{µ,Λ}[ln p(X|Z,µ,Λ)] + const
Absorbing terms not depending on Z into the constant: ln q*(Z) = Σ_{n=1}^N Σ_{k=1}^K z_nk ln ρ_nk + const
where ln ρ_nk = E[ln π_k] + ½ E[ln|Λ_k|] − (D/2) ln(2π) − ½ E_{µ_k,Λ_k}[(x_n − µ_k)^T Λ_k (x_n − µ_k)] and D is the dimensionality of the data variable x
Taking exponentials of both sides: q*(Z) ∝ Π_{n=1}^N Π_{k=1}^K ρ_nk^{z_nk}
The normalized distribution is q*(Z) = Π_{n=1}^N Π_{k=1}^K r_nk^{z_nk}, where r_nk = ρ_nk / Σ_{j=1}^K ρ_nj
The r_nk are positive, since the ρ_nk are exponentials of real numbers, and sum to one as required

16 Factor q*(Z) has same form as prior
We have found the form of q*(Z) that maximizes the functional L(q); it has the same form as the prior: q*(Z) = Π_{n=1}^N Π_{k=1}^K r_nk^{z_nk}, compared with p(Z|π) = Π_{n=1}^N Π_{k=1}^K π_k^{z_nk}
The distribution q*(Z) is discrete and has the standard result E[z_nk] = r_nk, so the r_nk play the role of responsibilities
Since the equations for q*(Z) depend on moments of the other variables, they are coupled and solved iteratively

17 Variational EM
Variational E-step: determine the responsibilities r_nk
Variational M-step:
1. determine the statistics of the data set:
N_k = Σ_{n=1}^N r_nk (total responsibility of the kth component)
x̄_k = (1/N_k) Σ_{n=1}^N r_nk x_n (mean of the kth component)
S_k = (1/N_k) Σ_{n=1}^N r_nk (x_n − x̄_k)(x_n − x̄_k)^T (covariance matrix of the kth component)
2. find the optimal solution for the factor q(π,µ,Λ)
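A sketch of step 1 of the variational M-step, computing N_k, x̄_k and S_k from the responsibilities; the names and shapes are illustrative, and a tiny floor on N_k guards against numerically empty components.

```python
import numpy as np

def component_statistics(X, r):
    """Weighted statistics N_k, xbar_k, S_k from data X (N, D) and responsibilities r (N, K)."""
    Nk = r.sum(axis=0) + 1e-10                 # N_k = sum_n r_nk (floored for stability)
    xbar = (r.T @ X) / Nk[:, None]             # xbar_k = (1/N_k) sum_n r_nk x_n
    K, D = r.shape[1], X.shape[1]
    S = np.zeros((K, D, D))
    for k in range(K):
        diff = X - xbar[k]
        S[k] = (r[:, k, None] * diff).T @ diff / Nk[k]   # S_k
    return Nk, xbar, S
```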

18 Factorization of q(π,µ,Λ)
Using the general result for factorized distributions, ln q*_j(Z_j) = E_{i≠j}[ln p(X,Z)] + const, we can write
ln q*(π,µ,Λ) = ln p(π) + Σ_{k=1}^K ln p(µ_k,Λ_k) + E_Z[ln p(Z|π)] + Σ_{k=1}^K Σ_{n=1}^N E[z_nk] ln N(x_n|µ_k,Λ_k^{-1}) + const
which decomposes into terms involving only π and terms involving only µ and Λ
The terms involving µ and Λ comprise a sum of terms involving µ_k and Λ_k, leading to the factorization q(π,µ,Λ) = q(π) Π_{k=1}^K q(µ_k,Λ_k)

19 Factor q(π) is a Dirichlet
Given the factorization q(π,µ,Λ) = q(π) Π_{k=1}^K q(µ_k,Λ_k), consider each factor in turn: q(π) and q(µ_k,Λ_k)
(2a) Identifying the terms depending on π, q*(π) has the solution ln q*(π) = (α_0 − 1) Σ_{k=1}^K ln π_k + Σ_{k=1}^K Σ_{n=1}^N r_nk ln π_k + const
Taking the exponential of both sides we get q*(π) as a Dirichlet: q*(π) = Dir(π|α), where α has components α_k = α_0 + N_k
Dirichlet: Dir(µ|α) = (Γ(α̂)/(Γ(α_1)...Γ(α_K))) Π_{k=1}^K µ_k^{α_k − 1}, where α̂ = Σ_{k=1}^K α_k
(Figure: Dirichlet distributions for α_k = 3 and α_k = 0.1)
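A one-line sketch of this update, in the same illustrative style as the earlier code:

```python
import numpy as np

def update_dirichlet(alpha0, Nk):
    """q*(pi) = Dir(pi | alpha) with alpha_k = alpha_0 + N_k."""
    return alpha0 + np.asarray(Nk)
```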

20 Factor q*(µ_k,Λ_k) is a Gaussian-Wishart
(2b) The variational posterior q*(µ_k,Λ_k) does not factorize further into marginals; it is a Gaussian-Wishart distribution
q*(µ_k,Λ_k) = N(µ_k|m_k,(β_kΛ_k)^{-1}) W(Λ_k|W_k,ν_k)
W is the Wishart distribution; it has the form W(Λ|W,ν) = B(W,ν) |Λ|^{(ν−D−1)/2} exp[−½Tr(W^{-1}Λ)]
where ν is the number of degrees of freedom, W is a D x D scale matrix, Tr is the trace and B(W,ν) is a normalization constant
It is the conjugate prior for a Gaussian with known mean and unknown precision matrix Λ

21 Parameters of q*(µ_k,Λ_k)
Gaussian-Wishart: q*(µ_k,Λ_k) = N(µ_k|m_k,(β_kΛ_k)^{-1}) W(Λ_k|W_k,ν_k), where we have defined
β_k = β_0 + N_k
m_k = (1/β_k)(β_0 m_0 + N_k x̄_k)
W_k^{-1} = W_0^{-1} + N_k S_k + (β_0 N_k/(β_0 + N_k))(x̄_k − m_0)(x̄_k − m_0)^T
ν_k = ν_0 + N_k + 1
These update equations are analogous to the M-step of EM for the m.l. solution of the GMM; they involve evaluation of the same sums over the data set as EM
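A sketch of these updates (following the slide's ν_k = ν_0 + N_k + 1); the argument order and function name are my own.

```python
import numpy as np

def update_gauss_wishart(Nk, xbar, S, beta0, m0, W0, nu0):
    """Parameters (beta_k, m_k, W_k, nu_k) of q*(mu_k, Lambda_k)."""
    K, D = xbar.shape
    beta = beta0 + Nk
    m = (beta0 * m0 + Nk[:, None] * xbar) / beta[:, None]
    nu = nu0 + Nk + 1                     # as given on the slide
    W = np.zeros((K, D, D))
    W0_inv = np.linalg.inv(W0)
    for k in range(K):
        d = (xbar[k] - m0)[:, None]       # column vector xbar_k - m_0
        W_inv = W0_inv + Nk[k] * S[k] + (beta0 * Nk[k] / (beta0 + Nk[k])) * (d @ d.T)
        W[k] = np.linalg.inv(W_inv)
    return beta, m, W, nu
```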

22 Expression for Responsibilities
For the M step we need the expectations E[z_nk] = r_nk, which are obtained by normalizing the ρ_nk: r_nk = ρ_nk / Σ_{j=1}^K ρ_nj
where ln ρ_nk = E[ln π_k] + ½ E[ln|Λ_k|] − (D/2) ln(2π) − ½ E_{µ_k,Λ_k}[(x_n − µ_k)^T Λ_k (x_n − µ_k)]
The three expectations with respect to the variational distribution of the parameters are easily evaluated to give:
ln π̃_k ≡ E[ln π_k] = ψ(α_k) − ψ(α̂), where α̂ = Σ_k α_k appears in the definition of the Dirichlet
ln Λ̃_k ≡ E[ln|Λ_k|] = Σ_{i=1}^D ψ((ν_k + 1 − i)/2) + D ln 2 + ln|W_k|, where ν_k is the number of degrees of freedom of the Wishart
E_{µ_k,Λ_k}[(x_n − µ_k)^T Λ_k (x_n − µ_k)] = D β_k^{-1} + ν_k (x_n − m_k)^T W_k (x_n − m_k)
ψ is the digamma function, ψ(a) = (d/da) ln Γ(a)
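A sketch of the first two expectations using scipy.special.digamma; the function names are illustrative.

```python
import numpy as np
from scipy.special import digamma

def expected_log_pi(alpha):
    """E[ln pi_k] = psi(alpha_k) - psi(alpha_hat)."""
    return digamma(alpha) - digamma(alpha.sum())

def expected_log_det_Lambda(W, nu):
    """E[ln |Lambda_k|] = sum_i psi((nu_k + 1 - i)/2) + D ln 2 + ln |W_k|."""
    K, D, _ = W.shape
    i = np.arange(1, D + 1)
    return np.array([digamma((nu[k] + 1 - i) / 2).sum()
                     + D * np.log(2.0) + np.log(np.linalg.det(W[k]))
                     for k in range(K)])
```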

23 Evaluation of Responsibilities
Substituting the three expectations into ln ρ_nk gives
r_nk ∝ π̃_k Λ̃_k^{1/2} exp{ −D/(2β_k) − (ν_k/2)(x_n − m_k)^T W_k (x_n − m_k) }
This is similar to the responsibilities for the maximum-likelihood EM,
γ(z_k) ≡ p(z_k=1|x) = p(z_k=1) p(x|z_k=1) / Σ_{j=1}^K p(z_j=1) p(x|z_j=1) = π_k N(x|µ_k,Σ_k) / Σ_{j=1}^K π_j N(x|µ_j,Σ_j)
which can be written in the form r_nk ∝ π_k |Λ_k|^{1/2} exp{ −½(x_n − µ_k)^T Λ_k (x_n − µ_k) }
where we have used the precision Λ_k instead of the covariance Σ_k to highlight the similarity
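A sketch of the variational E-step that combines the expectations into r_nk; it reuses expected_log_pi and expected_log_det_Lambda from the previous sketch and normalizes in log space for numerical stability.

```python
import numpy as np

def responsibilities(X, alpha, beta, m, W, nu):
    """r_nk from ln rho_nk; X is (N, D)."""
    N, D = X.shape
    K = len(alpha)
    E_ln_pi = expected_log_pi(alpha)
    E_ln_det = expected_log_det_Lambda(W, nu)
    log_rho = np.zeros((N, K))
    for k in range(K):
        diff = X - m[k]
        quad = np.einsum('nd,de,ne->n', diff, W[k], diff)   # (x_n - m_k)^T W_k (x_n - m_k)
        E_quad = D / beta[k] + nu[k] * quad
        log_rho[:, k] = (E_ln_pi[k] + 0.5 * E_ln_det[k]
                         - 0.5 * D * np.log(2 * np.pi) - 0.5 * E_quad)
    log_rho -= log_rho.max(axis=1, keepdims=True)           # log-sum-exp trick
    r = np.exp(log_rho)
    return r / r.sum(axis=1, keepdims=True)
```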

24 Summary of Optimization
Optimization of the variational posterior distribution involves cycling between two stages, analogous to the E and M steps of maximum-likelihood EM
Variational E-step: use the current distribution over the model parameters to evaluate the moments and hence evaluate E[z_nk] = r_nk
Variational M-step: keep the responsibilities fixed and use them to recompute the variational distribution over the parameters using q*(π) = Dir(π|α) and q*(µ_k,Λ_k) = N(µ_k|m_k,(β_kΛ_k)^{-1}) W(Λ_k|W_k,ν_k)
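Putting the pieces together, a sketch of the full cycle that composes the earlier helper sketches (responsibilities, component_statistics, update_dirichlet, update_gauss_wishart); the hyperparameter defaults and initialization are illustrative.

```python
import numpy as np

def variational_gmm(X, K, alpha0=1.0, beta0=1.0, n_iter=100, seed=0):
    """Cycle between the variational E and M steps."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    m0, W0, nu0 = X.mean(axis=0), np.eye(D), float(D)
    r = rng.dirichlet(np.ones(K), size=N)        # random initial responsibilities
    for _ in range(n_iter):
        Nk, xbar, S = component_statistics(X, r)                                # M step: statistics
        alpha = update_dirichlet(alpha0, Nk)                                    # M step: q*(pi)
        beta, m, W, nu = update_gauss_wishart(Nk, xbar, S, beta0, m0, W0, nu0)  # M step: q*(mu_k, Lambda_k)
        r = responsibilities(X, alpha, beta, m, W, nu)                          # E step
    return alpha, beta, m, W, nu, r
```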

25 Variational Bayesian GMM
Old Faithful data set, fitted with K=6 components
After convergence there are effectively only two components
(Figure: the density of red ink inside each ellipse shows the mean value of its mixing coefficient)

26 Similarity of Variational Bayes and EM
There is a close similarity between the variational solution for the Bayesian mixture of Gaussians and the EM algorithm for maximum likelihood
In the limit N → ∞ the Bayesian treatment converges to the maximum-likelihood EM solution
The variational algorithm is more expensive, but the problem of singularities is eliminated

27 Variational Lower Bound
We can straightforwardly evaluate the lower bound L(q) for this model
Recall ln p(X) = L(q) + KL(q||p), where L(q) = ∫ q(Z) ln{ p(X,Z)/q(Z) } dZ and KL(q||p) = −∫ q(Z) ln{ p(Z|X)/q(Z) } dZ
The lower bound is used to monitor re-estimation and to test for convergence

28 Predictive Density
In using a Bayesian GMM we will be interested in the predictive density for a new value x̂ of the observed variable, which has a corresponding latent variable ẑ
Marginalizing over ẑ and the parameters, we can show that p(x̂|X) ≈ (1/α̂) Σ_{k=1}^K α_k St(x̂|m_k,L_k,ν_k + 1 − D)
where the kth component has mean m_k and precision L_k = ((ν_k + 1 − D) β_k / (1 + β_k)) W_k
The mixture of Student's t distributions becomes a GMM as N → ∞
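A sketch of evaluating this predictive density with scipy.stats.multivariate_t (available in SciPy 1.6 and later); scipy parameterizes the Student's t by a scale matrix, so the precision L_k is inverted before being passed in.

```python
import numpy as np
from scipy.stats import multivariate_t

def predictive_density(x_new, alpha, beta, m, W, nu):
    """p(x_new | X) as a mixture of Student's t densities."""
    D = len(x_new)
    alpha_hat = alpha.sum()
    p = 0.0
    for k in range(len(alpha)):
        df = nu[k] + 1 - D
        L = (df * beta[k] / (1 + beta[k])) * W[k]     # precision of the k-th t component
        p += (alpha[k] / alpha_hat) * multivariate_t.pdf(
            x_new, loc=m[k], shape=np.linalg.inv(L), df=df)
    return p
```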

29 Determining no. of components
Plot of the variational lower bound L versus the number of components K: there is a distinct peak at K=2
For each value of K the model is trained from 100 different starts; the results are shown as '+' symbols
